Agent-S: Unleash The Power Of GUI Computer Use Agents!

In this video, I look at the paper "Agent-S", how it approaches GUI agents, and the components that are needed to make them work.

For more tutorials on using LLMs and building agents, check out my Patreon

🕵️ Interested in building LLM Agents? Fill out the form below

👨‍💻Github:

⏱️Time Stamps:
00:00 Intro
00:34 Agent-S Paper
01:03 Example Task
02:08 How it Works
04:49 Experience Augmented Hierarchical Planning
05:38 2 Types of Memory
06:56 Agent Computer Interface
10:27 Paper, Site & Code
Comments

Businesses often have bespoke apps that have user documentation but no API. I can see Agent S being fabulous for this type of thing.

Avman

Thanks Sam. It'll be interesting when you can fine-tune this on your domain specific apps, and what the fine-tuning process would look like.

kenchang

This sounds useful for automated testing: mimicking how a user would behave and then interacting with a web app or desktop app, or even just carrying out workflow tasks as a chaos user.

steelwolf

Awesome explanation, Sam. Can you do more of these videos explaining papers? They really help merge understanding between GA and scientific knowledge. Where do you find worthwhile papers, Hugging Face?

shiv_

I have done so many projects after getting a lot of knowledge from you. We need a new video on an image generation model that can handle the text, facial, and body problems.

muhammadhasnain

Several months ago, when the Rabbit R1 device was announced, there was another wave of "large action models": attempts at training or fine-tuning transformers to do the UI interaction stuff. I wonder where this eventually went. There were a few quite promising products.

alx

You mentioned something I strongly believe in: a generic solution is required. It's just a matter of time before website owners, apps, and platforms realize they need to create specific layers for AI agents and assistants. Rather than creating weird solutions to communicate with apps, this makes sense now, when apps can't provide enough API data for AI applications. Website owners will likely have specific markdown with knowledge and instructions for AI, possibly developing a markup language for AI data. We could even include tools our websites or apps want AI to use. As with robots.txt, website owners will define which parts AI can control. This isn't far off. Even for other products and services like books, music, or movies, authors can include that AI content layer. Until then, IMO we have patchwork solutions that aren't permanent but help us understand the system's needs, weaknesses, and strengths.

unclecode

Thanks Sam. I'm learning everything by myself and I need help in identifying worthy recent research papers to study. How do you know which ones are good?

arungnanaable

Very interesting. Does this compete with Microsoft UFO?

bombala

Create a video on image generation models, please.

muhammadhasnain

Funny that the day after this video came out, Anthropic published their computer use API.

megaklis.vasilakis

24 hours later... Anthropic brings out computer use.

davidmetekingi

I would be super concerned to allow anything to run directly on my desktop. It could see passwords, cryptographic keys, modify the registry, destroy the system.

pensiveintrovert

I don't trust AI enough to give it access to the files and apps on my computer.

micbab-vgmu