Chatbot Memory for Chat-GPT, Davinci + other LLMs - LangChain #4

Conversational memory is how a chatbot can respond to multiple queries in a chat-like manner. It enables a coherent conversation, and without it, every query would be treated as an entirely independent input without considering past interactions.

The memory allows a Large Language Model (LLM) to remember previous interactions with the user. By default, LLMs are *stateless* — meaning each incoming query is processed independently of other interactions. The only thing that exists for a stateless agent is the current input, nothing else.

There are many applications where remembering previous interactions is very important, such as chatbots. Conversational memory allows us to do that.

There are several ways that we can implement conversational memory. In the context of LangChain, they are all built on top of the `ConversationChain`.
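As a minimal illustration of that idea, here is a sketch of a `ConversationChain` with the default buffer memory (import paths reflect early-2023 LangChain and may have moved in later versions):

```python
from langchain import OpenAI
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferMemory

# ConversationChain wires an LLM to a memory object; the memory re-injects
# past turns into each new prompt, so the otherwise stateless LLM can
# "remember" the conversation.
llm = OpenAI(temperature=0)
conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())

print(conversation.predict(input="Hi, my name is Ada."))
print(conversation.predict(input="What is my name?"))  # answered from memory
```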

🌲 Pinecone article:

📌 LangChain Handbook Code:

🙋🏽‍♂️ Francisco:

🎙️ AI Dev Studio:

🎉 Subscribe for Article and Video Updates!

👾 Discord:

00:00 Conversational memory for chatbots
00:28 Why we need conversational memory for chatbots
01:45 Implementation of conversational memory
04:05 LangChain's Conversation Chain
12:00 Conversation Summary Memory in LangChain
19:06 Conversation Buffer Window Memory in LangChain
21:35 Conversation Summary Buffer Memory in LangChain
24:33 Other LangChain Memory Types
25:25 Final thoughts on conversational memory

#artificialintelligence #nlp #openai #deeplearning #langchain
Comments

Thanks James for elaborating on LangChain Memory. For the viewers, here are some 🎯 Key Takeaways for quick navigation:

00:00 🧠 Conversational memory is essential for chatbots and AI agents to respond coherently to queries in a conversation.
01:23 📚 Different memory types, like conversational buffer memory and conversational summary memory, help manage and recall previous interactions in chatbots.
05:42 🔄 Conversational buffer memory stores all past interactions in a chat, while conversational summary memory summarizes these interactions, reducing token usage.
14:13 🪟 Conversational buffer window memory limits the number of recent interactions saved, offering a balance between token usage and remembering recent interactions.
23:05 📊 Conversational summary buffer memory combines summarization and saving recent interactions, providing flexibility in managing conversation history.
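For quick reference alongside these takeaways, a hedged sketch of how each memory type is instantiated (import path as of the video's early-2023 LangChain; the `k` and `max_token_limit` values are arbitrary):

```python
from langchain import OpenAI
from langchain.chains.conversation.memory import (
    ConversationBufferMemory,
    ConversationSummaryMemory,
    ConversationBufferWindowMemory,
    ConversationSummaryBufferMemory,
)

llm = OpenAI(temperature=0)

buffer = ConversationBufferMemory()           # keeps every turn verbatim
summary = ConversationSummaryMemory(llm=llm)  # rolling LLM-written summary
window = ConversationBufferWindowMemory(k=5)  # only the last k turns
summary_buffer = ConversationSummaryBufferMemory(  # summary of old turns + recent turns verbatim
    llm=llm, max_token_limit=650
)
```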

We are also doing lots of workshops in this space, looking forward to talking more.

decodingdatascience

Super! For me, it is one of the best tutorials on this subject. Much appreciated, James.

MrFiveDirections

Things really seem to get interesting with the knowledge graph! Saving things that really matter, like relational context, along with a combination of the other methods, starts to sound very powerful. Add in some embedding/vector DB and wow. The other commenter's idea about a system for bots evolving sentiment, or even personality, over time is worth thinking about as well.

daharius

Thank you. I was way behind on LangChain and had no time to read the documentation. This video saved me a lot of time. Subscribed.

GergelyGyurics

Another masterpiece of a tutorial. You’re an absolute gem James!

kevon

Oh wow, you just destroyed my project lol. I gave ChatGPT long-term memory, autonomous memory store and recall, speech recognition, audio output, and self-reflection. Thought I was the only one working on stuff like this. Well, I'm basically trying to build a sentient AI; I need vision though. Hopefully GPT-4 is multimodal, because I'm struggling to give my project vision recognition.

NextGenart

Cool! This video addressed the question that I had posed in your earlier (1st) video about the token size limitations due to adding conversational history. The charts provide a good intuition of the workings of the memory types. Two takeaways: 1. When to use which memory type. 2. How to do performance tuning for a chatbot app, given the overheads posed by token tracking, memory appending, and so on.

cloudshoring

If I understand the graphs correctly, what is represented is the tokens used per interaction: in the case of Buffer Memory (the quasi-linear one), the 25th interaction is about 4K tokens. But the cost (in tokens) of the whole conversation up to the 25th interaction is the sum of the cost of all the interactions up to the 25th. So basically the cost of the conversation, in each case, is the area under the curves you showed, not the highest point reached. For the summarized conversations, the flat tendency towards the end means the cost just keeps adding almost the same number of tokens per new interaction, not that the cost of the conversation has reached a ceiling.
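A quick numeric sketch of that point, with made-up per-turn token counts chosen only to mimic the shapes of the charts:

```python
# Total conversation cost is the SUM (area under the curve) of per-interaction
# token counts, not the final per-turn value. All numbers here are illustrative.
buffer_tokens_per_turn = [160 * i for i in range(1, 26)]                   # grows roughly linearly
summary_tokens_per_turn = [min(400 + 40 * i, 1200) for i in range(1, 26)]  # flattens out

print(max(buffer_tokens_per_turn))   # ~4000 tokens at the 25th interaction
print(sum(buffer_tokens_per_turn))   # total cost = area under the curve, far larger
print(sum(summary_tokens_per_turn))  # summarization keeps the running total lower
```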

adumont

Great explanation of memory in LangChain; when you show the charts, it's much clearer for me.

davidmoran

Skimming through the docs, LangChain seems like a complicated abstraction around what's essentially auto copy and paste.

jason_v

Check out David Shapiro’s latest approach with salient summarization when you get a chance. Essentially: The summarizer can more efficiently pick and choose which context to preserve if it is properly primed with specific objectives/goals for the information.
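One hedged way to approximate salient summarization in LangChain: `ConversationSummaryMemory` accepts a custom `prompt`, so the summarizer can be primed with objectives. The objective wording below is invented for illustration:

```python
from langchain import OpenAI, PromptTemplate
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationSummaryMemory

# Summary prompt primed with explicit objectives so the summarizer knows
# which context to preserve; the objectives listed are illustrative only.
salient_prompt = PromptTemplate(
    input_variables=["summary", "new_lines"],
    template=(
        "Progressively summarize the conversation, prioritizing the user's "
        "stated goals, decisions made, and open questions.\n\n"
        "Current summary:\n{summary}\n\n"
        "New lines of conversation:\n{new_lines}\n\n"
        "New summary:"
    ),
)

llm = OpenAI(temperature=0)
memory = ConversationSummaryMemory(llm=llm, prompt=salient_prompt)
chain = ConversationChain(llm=llm, memory=memory)
```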

THCV

Thanks for your content! Looking forward to watching the knowledge graph video :)

DavidGarcia-gdvq

Great video! I love the graphs for token usage. I kept meaning to graph the trends myself, but I was too lazy!

I was talking to Harrison Chase as he was implementing the latest changes to memory, and it's had me thinking about other unique ways to approach it. I've been using different customized summarizers, and I can bring up any subset of the message history as I like, but I'm thinking also to include some way to flag messages as important or unimportant, dynamically feeding the history. I also haven't really explored my options in terms of local storage and retrieval of old chat history.

One note that I might make for the video too... I noticed you're using LangChain's usual OpenAI class and just adjusting your model to 3.5-turbo. My understanding is that we have been advised to use the new ChatOpenAI class for now when interacting with 3.5-turbo, since that's where they'll be focusing development and they can address changes there without breaking other stuff, necessary since the new model endpoint takes a message list as a parameter instead of a simple string.
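For reference, the switch described above looks roughly like this; a sketch only, with import paths as they were in early-2023 LangChain:

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferMemory

# ChatOpenAI targets the chat endpoint, which takes a list of messages
# rather than a single prompt string.
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())
print(conversation.predict(input="Hello!"))
```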

m.branson

James - are you still planning to work on the KG video? Seems like a powerful method that solves for scale and token limits.

gutgutia

Thanks for this content James, awesome!

matheusrdgsf

In the scenario of conversational bots, how do you limit the token consumption of the entire conversation?

For example, once consumption reaches 1,000 tokens, the bot should respond that the tokens for this conversation have been used up.
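One possible way to enforce such a budget, sketched with LangChain's `get_openai_callback` (which tallies tokens for calls made inside its block); the `ask` helper and the 1,000-token budget are illustrative assumptions:

```python
from langchain import OpenAI
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferMemory
from langchain.callbacks import get_openai_callback

conversation = ConversationChain(
    llm=OpenAI(temperature=0), memory=ConversationBufferMemory()
)

total_tokens, budget = 0, 1_000

def ask(query: str) -> str:
    global total_tokens
    if total_tokens >= budget:
        return "The tokens for this conversation have been used up."
    with get_openai_callback() as cb:  # counts tokens for calls in this block
        reply = conversation.predict(input=query)
    total_tokens += cb.total_tokens    # prompt + completion tokens this turn
    return reply
```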

FCrobot

Thank you! Awesome work!! Appreciate it!

Davipar

Great content, thanks for that.

I'm working on a tweet summarization use case, but I don't want to break the overall corpus into pieces, build a summary for each one, and combine those summaries into a larger one. I want something more clever.

Suppose I have 10 tweets: 6 are related (same topic) and the last 4 are different from each other. I think I can build a better summary than LangChain's chunked summarization by summarizing only the 6 related tweets and appending the 4 raw tweets. This helps avoid losing context for the future.
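A rough sketch of that idea: embed the tweets, collect the ones that sit close together in embedding space, and summarize only that group. The 0.8 similarity threshold and the grouping-around-the-first-tweet heuristic are simplifying assumptions:

```python
import numpy as np
from langchain.embeddings import OpenAIEmbeddings

tweets = ["..."] * 10  # the 10 tweets; placeholders here

# Embed and L2-normalize so dot products are cosine similarities.
vecs = np.array(OpenAIEmbeddings().embed_documents(tweets))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
sims = vecs @ vecs.T

# Naive grouping: a tweet joins the "related" group if it is similar enough
# (threshold is an assumption) to the first tweet.
threshold = 0.8
related = [i for i in range(len(tweets)) if sims[i, 0] > threshold]
outliers = [i for i in range(len(tweets)) if i not in related]
# Summarize only the `related` tweets with a summary chain, then append the
# raw `outliers` to the final output.
```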

isaacyimgaingkuissu

Hi James, great video. This is probably a stupid comment, but here goes… Could you not just ask the LLM to capture some key variables that summarise the completion for the prompt, and then feed that (rather than the full conversation) as 'memory' for subsequent prompts? I'm imagining a 'ghost' question being added to each prompt, like 'Also capture key variables to summarise the response for future recall', and then this being used as the assistant message (per GPT-3.5 Turbo) rather than all of the previous conversation?
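A hedged sketch of that 'ghost question' pattern, using the raw (pre-1.0) OpenAI chat API; the GHOST wording, the KEY VARIABLES: marker, and the chat helper are all invented for illustration:

```python
import openai  # pre-1.0 openai SDK interface assumed

GHOST = ("Also, after your answer, output a line starting with 'KEY VARIABLES:' "
         "listing the key facts needed to recall this exchange later.")
key_vars = ""  # carried forward instead of the full conversation

def chat(user_msg: str) -> str:
    global key_vars
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    if key_vars:
        # Feed back only the extracted variables, not the whole history.
        messages.append({"role": "assistant", "content": key_vars})
    messages.append({"role": "user", "content": f"{user_msg}\n\n{GHOST}"})
    resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    text = resp["choices"][0]["message"]["content"]
    answer, sep, rest = text.partition("KEY VARIABLES:")
    if sep:
        key_vars = sep + rest.strip()  # keep the latest variables for next turn
    return answer.strip()
```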

bwilliams

Hi Sam, how do we keep the conversation context of multiple users on different devices separate?
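A common pattern for this (an assumption, not something shown in the video): keep one memory object per user or session and build the chain with that user's memory:

```python
from langchain import OpenAI
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferMemory

llm = OpenAI(temperature=0)
memories: dict[str, ConversationBufferMemory] = {}  # one memory per user/session

def respond(user_id: str, query: str) -> str:
    # Each user gets their own memory object, so contexts never mix.
    memory = memories.setdefault(user_id, ConversationBufferMemory())
    return ConversationChain(llm=llm, memory=memory).predict(input=query)
```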

sysadmin