AI Agent | Perplexity Alternative Built with LangGraph & Advanced Prompt Engineering (Demo)

Jar3d is an open-source AI agent that leverages sophisticated AI engineering techniques, including meta-prompting, RAG, and advanced prompt engineering orchestrated by LangGraph. Jar3d can perform long-running, research-intensive tasks that require information from the internet. This video demonstrates how Jar3d works with the Llama 3.1 70B model, effectively creating an open-source version of Perplexity. The project integrates with Ollama and can run 100% locally.
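For anyone who wants to try the local setup before the walkthrough, here is a minimal sketch of checking that a local Ollama instance already has the demo model pulled. The endpoint and model tag below are Ollama's defaults, not Jar3d-specific settings; how Jar3d itself is pointed at the local server is covered in the setup chapter and the repo's configuration.

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local endpoint
MODEL = "llama3.1:70b"                 # model tag used in the demo

# Ask Ollama which models are installed locally and check the demo model is among them.
tags = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10).json()
available = {m["name"] for m in tags.get("models", [])}

if MODEL in available:
    print(f"{MODEL} is ready at {OLLAMA_URL}")
else:
    print(f"{MODEL} not found; run `ollama pull {MODEL}` first. Installed: {sorted(available)}")
```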

Check out Professor Synapse for the original chain-of-reasoning implementation.

Chapters
Introduction: 00:00
Setting up Llama 3.1 for Jar3d: 01:34
Jar3d Demo: 08:00

Comments

Thanks for the morals you teach alongside the tech aspects. I'd like to congratulate you on Jar3d. I've been following since the LangGraph videos and have learned a lot. Thank you.

MuhanadAbulHusn

I have yet to go through the GitHub repo to try this out. I will do so with a local Llama 3.1 8B running behind Ollama. Thanks a lot for the video and for sharing your code. The delivery style is excellent, and the absence of music is even more appreciated.

muhannadobeidat

Jarad,

Thank you for being transparent with your open-source model. Fortunately, I believe you can run your solution completely free with an endpoint deployment. I would like to monetize this solution with you exclusively. The cost to the end user would be pennies a day instead of dollars per hour, with an MLM multiplier to help everybody minimize cost and stay scalable. Let me know if this would be of interest to you.

If so, it's just a matter of

Kudos

SolidBuildersInc

It's amazing to build an AI agent with sophisticated prompt engineering! Alternative technologies exist that can provide even greater efficiency and customisation when it comes to AI creation. #AIScience #AIAgents

sirishkumar-mz

Remarkable you got this up and running! Thanks for putting it out there!

actellimQT

Really cool, but it looks like the entire workflow processes sequentially, especially the Google searches and embedding. If you aren't already doing this, you might want to look into threading or multiprocessing so those run in parallel.

A Google search isn't compute-intensive; it just takes a while to send a request to Google's servers and get a response back, about 2 seconds. If you do 10, that's 20 seconds. You can easily run all of them in parallel and get all 10 results in about 2 seconds. You could do something similar with the embedding and chunking, though that depends on the power of the host computer's CPU; doing it in parallel gives a significant speed-up, but there's a limit depending on the machine.

If you are making LLM calls to a server, I also recommend issuing them in parallel when possible, even if the server executes them one at a time, because if the server ever supports batching you can then batch several prompts and get responses for all of them in almost the same time as a single call. If you're using commercial LLMs, they likely already support this.

You could make this whole workflow much faster with not much more effort (see the sketch after this comment).

redthunder
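A minimal sketch of the parallelization the comment above describes, using Python's standard concurrent.futures. The serper_search helper and the endpoint/header shown here are illustrative assumptions rather than Jar3d's actual search code; substitute whichever search call the project really makes.

```python
import os
import requests
from concurrent.futures import ThreadPoolExecutor

SERPER_URL = "https://google.serper.dev/search"  # assumed Serper endpoint; swap in the project's own call


def serper_search(query: str) -> dict:
    """Hypothetical single-query search helper: network-bound, roughly a couple of seconds each."""
    response = requests.post(
        SERPER_URL,
        headers={"X-API-KEY": os.environ["SERPER_API_KEY"]},
        json={"q": query},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()


def parallel_search(queries: list[str], max_workers: int = 10) -> list[dict]:
    """Issue all queries concurrently; total wall time is roughly that of the slowest single request."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(serper_search, queries))


# Ten searches complete in roughly the time of one.
results = parallel_search([f"LangGraph agent tutorials part {i}" for i in range(10)])
```

The same ThreadPoolExecutor pattern applies to the embedding calls and to fanning out LLM requests to a server, as the comment suggests.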

Commenting for visibility. I think this is very cool.

sd

Great video mate, congrats.
I wanted to ask if you could do a tutorial with Chainlit and LangGraph or GraphRAG, i.e. graph-based retrieval instead of plain RAG.

DialogoAi

That's really impressive, thank you.

brucehe

King shit, can’t wait to play around with it myself🔥🔥

mathiasschrooten

Great stuff!
Yes, you could add a chats feature (save to disk and reload from disk). I use a TOML library to save markdown to disk and load it back as tuples, so essentially you could save your chat results as formatted markdown and reload them later. Since everything is markdown, it's easy to re-chunk a whole directory of files, or to create a JSON dataset for fine-tuning a model (a sketch of the idea follows this comment).

I like your conference with the model before the task, as I was also considering how to accomplish this kind of in-task conversation (i.e. conferring with the user as a tool). I will download the code and take a peek.

I do like that your agents connect in the API style, which is a clean way to reach these services (though you perhaps lose out on some things the OpenAI or other client libraries provide). I think it's still fine; I do everything with my own library myself.

I think it would be nice to extract the client component you use, and also to build a FastAPI server that can be reused to host a model instead of Ollama or LM Studio (which also have hidden prompts and guardrails). The prompt is so important nowadays; I have been training models for a while, and you can compare the training results with bad prompts versus good prompts. Sometimes I completely remove those system prompts; but again, being Black English speakers we both have a great command of the meaning of words, so we can create very short and concise prompts that fail for others (due to their weaker command of English).

Perhaps you might think about agent teams and the various ways small structures can be used to solve tasks, such as Planner, Worker, and Refiner; each structure has its merits for specific workflows, i.e. it's good not to overdo it.
Offloading to files is also faster than keeping everything in memory.
I suppose a chat (task) can be considered working memory, a collection of chats short-term memory, the RAG store long-term working memory, and the model itself long-term memory.
Hence the correct setup will speed up the process as well as maintain the past for referencing.

As an app, I would just keep it as simple as possible (this is where I fail myself, with too much complexity), so you should make this part of the app a single tab.

xspydazx
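A minimal sketch of the save-and-reload idea from the comment above, using one markdown file per chat; the chats/ directory and the save_chat and load_chat names are hypothetical illustrations, not part of Jar3d.

```python
from datetime import datetime
from pathlib import Path

CHAT_DIR = Path("chats")  # hypothetical on-disk location for saved conversations


def save_chat(messages: list[tuple[str, str]], chat_id: str | None = None) -> Path:
    """Write a chat as markdown, one '## role' heading per turn."""
    CHAT_DIR.mkdir(exist_ok=True)
    chat_id = chat_id or datetime.now().strftime("%Y%m%d-%H%M%S")
    path = CHAT_DIR / f"{chat_id}.md"
    path.write_text(
        "\n\n".join(f"## {role}\n\n{content}" for role, content in messages),
        encoding="utf-8",
    )
    return path


def load_chat(path: Path) -> list[tuple[str, str]]:
    """Read a saved markdown chat back into (role, content) tuples."""
    blocks = [b for b in path.read_text(encoding="utf-8").split("## ") if b.strip()]
    return [
        (b.splitlines()[0].strip(), "\n".join(b.splitlines()[1:]).strip())
        for b in blocks
    ]


saved = save_chat([("user", "Summarise the latest LangGraph release."), ("assistant", "...")])
print(load_chat(saved))
```

Because each chat ends up as plain markdown on disk, re-chunking a whole directory for RAG or exporting the turns as a JSON fine-tuning dataset is only a few extra lines.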

I'm running Perplexica and I love how you've made your version more feature-rich. What I like about Perplexica is the local SearXNG search. Is there a way to select and use that instead of relying on Serper?

robbateman

Built with LangGraph... but given your video "Agency Swarm: Why It's Better Than CrewAI & AutoGen", does that mean LangGraph > Agency Swarm?

themaxgo

Good work, thanks for open-sourcing it! Why do you need 4 GPUs to run the 4-bit quantized Llama model? The model size is around 40 GB, so one card should be enough. Is there a distributed async search in the background, or do you use the multi-GPU setup for faster inference?

attilalukacs
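A rough back-of-the-envelope for the 40 GB figure in the comment above, counting weights only; KV cache, activations, and runtime overhead push the real requirement higher, which is one possible reason for a multi-GPU setup.

```python
params = 70e9          # Llama 3.1 70B parameter count
bits_per_weight = 4.5  # ~4-bit quantization plus per-group scales (approximate)

weight_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weight_gb:.0f} GB of weights")  # roughly 39 GB before KV cache and overhead
```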

Excellent tutorial! 👏
Is there a way to invoke the initialization of Jar3d without the output it creates? I was just thinking that if this were used in a production environment, I would want to hide that output from the user.

mikew

I'd love to know how you deal with very long articles. Let's say you are hunting for things in PDFs or websites where you have to find certain passages and put them in context with everything else that has been written.

Sort of like writing a story book, where you have to know the full story written so far while writing it out page by page. How do you handle that, given that a 2M context window is only available for Gemini models?

Also, when building chain-of-reasoning or tree-of-thought pipelines, have you thought about having the outputs be a simple one-token yes/no response and then putting that to good use? (A quick sketch of the idea follows this comment.)

criticalnodecapital
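A minimal sketch of the one-token yes/no idea from the comment above, assuming a local Ollama endpoint; the yes_no helper is illustrative and not part of Jar3d.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint


def yes_no(question: str, model: str = "llama3.1:70b") -> bool:
    """Ask for a single-token verdict; cheap enough to call at every branch of a tree of thought."""
    response = requests.post(
        OLLAMA_URL,
        json={
            "model": model,
            "prompt": f"Answer with exactly one word, Yes or No.\n\n{question}",
            "stream": False,
            "options": {"num_predict": 1, "temperature": 0},
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"].strip().lower().startswith("yes")


print(yes_no("Does the retrieved passage answer the user's question?"))
```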

Amazing project. Did you parse the links with an open-source AI scraper that converts pages to LLM-friendly markdown, to increase the embedding quality?

vitalis

Great work here. Is there any possibility of combining this with llamafile, to run the model locally and reduce the machine specs needed for larger models?

john_blues

Isn't Mistral Large 2 better than Llama 70B for that use case?

gileneusz