5 Levels Of LLM Summarizing: Novice to Expert



0:00 - Intro
0:40 - Level 1: Couple Sentences
2:01 - Level 2: Couple Paragraphs
3:43 - Level 3: Couple Pages
6:05 - Level 4: Entire Book
16:46 - Level 5: Unknown Amount (Agents)
Comments

This is badass. Such a cool approach with the best representation vectors. Thanks for continuing to put out great work!

dennisnichols

This content is incredible Greg. It's helping so many of us build the tools of the future (well at least the future of our own workflows!) Thank you!

taylorerwin

Just wanted to say your code and explanations are so coherent and easy to follow that an innumerate like me who barely knows Python was able to grok the entire video played at 1.5x speed. Well done sir! Truly can't wait to try out the clustering technique.

feralmachine

I just wanted to thank you for your awesome video on text summarization. Your explanations were clear, concise, and informative, and your demonstrations were really helpful in understanding the concept. Your passion and expertise on the subject really shone through and I look forward to seeing more great content from you in the future!

IamalwaysOK

Really a life-changing playlist

Will check after 7 years

ujjaldeb

The level of clarity in your content is just insane. I absolutely love it! If I may make a suggestion though - something to consider... Because this technology is growing, changing and evolving so quickly, it would be soooo good to have something like a concept map showing all the main concepts and use cases of let's say langchain with particular ways of achieving it and links to the videos where this thing is explained :D

krisszostak

I'm immensely grateful for your enlightening series on the 5 Levels Of LLM Summarizing. The concept of chunks nearest to centroids representing summaries is brilliant and has offered me a fresh perspective. I eagerly anticipate your insights on AGENTS!

nattapongthanngam

Hey DI, I am an industrial designer. I watch tons of YouTube every single day for research, and last week I built a bot to download captions from YouTube and store them in JSON. It has already saved me a ton of work, and now I have time for a coffee and half an hour of PS4 every day (even though I had to write more than 10 hours of code every day after and between work last year...). Over the last couple of days I was working until 4 am trying to get it to summarize long captions (many of the lines are not important...) and I kept failing. Then, like magic, your video came up! I haven't even watched it yet, but I'm leaving a comment first to say thanks. I trust you.

The LangChain docs are so messy that I am considering redoing them in Cantonese...

Thank god you make these videos; you save our lives.

chrisl

Your videos are extremely well explained and the use cases and examples are top notch. The vector clustering approach is pretty ingenious. Great stuff, keep it up!

DimitarShishkov

Very practical and informative video. I had been waiting for it since I saw your tweet. Thank you, Greg.

mouadse

Nice video. Clear and concise. Well done

minty

This is a great video and summary of the various options!

ozarkexpeditions

Here's some additional info about this KMeans array, as I struggled to understand it myself (generated by GPT-4):
The output you're seeing is from the labels_ attribute of the trained KMeans model in the sklearn library in Python.

The KMeans algorithm clusters data by trying to separate samples into n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares. This algorithm requires the number of clusters to be specified beforehand, which is what you've done by setting n_clusters to 11.

The labels_ attribute of the KMeans model returns an array where each element is the cluster label of the corresponding data point. These labels range from 0 to n_clusters - 1. So in your case, the labels range from 0 to 10, since you specified 11 clusters.

To put it in the context of your specific problem, you've passed a list of vectors to the KMeans model. Each vector probably represents a portion of the book you're trying to summarize, perhaps a sentence or a paragraph, which has been transformed into a numerical vector (an embedding) via the langchain library.

The array array([ 2, 2, 2, 8, 8, 2, 5, 1, 1, 7, 7, 4, 4, 9, 10, 5, 5, 5, 3, 3, 3, 0, 0, 10, 10, 6], dtype=int32) then represents the cluster assignments for each of these vectors. For example, the first three vectors were assigned to cluster 2, the next two to cluster 8, and so on.
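To make that reading of the array concrete, here is a minimal sketch that groups the positions in the array by cluster label. The label values are copied from the array above; the name chunks_by_cluster is only illustrative and does not come from the video.

```python
import numpy as np

# Cluster labels copied from the array above (one label per chunk, in document order).
labels = np.array([2, 2, 2, 8, 8, 2, 5, 1, 1, 7, 7, 4, 4, 9, 10,
                   5, 5, 5, 3, 3, 3, 0, 0, 10, 10, 6], dtype=np.int32)

# Map each cluster id to the positions (chunk indices) assigned to it.
chunks_by_cluster = {
    int(cluster): np.flatnonzero(labels == cluster).tolist()
    for cluster in np.unique(labels)
}

for cluster, chunk_ids in chunks_by_cluster.items():
    print(f"cluster {cluster}: chunks {chunk_ids}")
```

Note that cluster 2 contains chunks 0, 1, 2 and 5, so members of a cluster are not necessarily contiguous in the document.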

These cluster assignments are based on the distances between the vectors and the cluster centers. Each vector is assigned to the cluster whose center (centroid) is nearest, so vectors that are close to each other (and hence more similar) tend to end up in the same cluster. By identifying these clusters, the KMeans algorithm helps you find groups of similar sentences or paragraphs in the book. This can help in summarizing the book by identifying the key themes or topics covered.
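As a rough sketch of how this could look end to end, the snippet below fits scikit-learn's KMeans and then picks, for each cluster, the vector nearest to its centroid as that cluster's representative chunk. The random vectors are stand-ins for real chunk embeddings, and the cluster count of 3 and the variable names are assumptions for illustration, not the exact code from the video.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in data: 30 fake "chunk embeddings" in 3 well-separated groups.
# In practice these would come from an embedding model.
rng = np.random.default_rng(0)
vectors = np.vstack(
    [rng.normal(loc=center, scale=0.5, size=(10, 4)) for center in (0.0, 5.0, 10.0)]
)

num_clusters = 3  # assumed value for this toy example
kmeans = KMeans(n_clusters=num_clusters, random_state=42, n_init=10).fit(vectors)

# labels_ holds one cluster id (0 .. num_clusters - 1) per input vector.
labels = kmeans.labels_

# For each cluster centroid, find the index of the vector closest to it.
closest_indices = []
for center in kmeans.cluster_centers_:
    distances = np.linalg.norm(vectors - center, axis=1)
    closest_indices.append(int(np.argmin(distances)))

# Sort so the representative chunks stay in document order.
selected_indices = sorted(closest_indices)
print("labels:", labels)
print("representative chunks:", selected_indices)
```

The chunks at selected_indices could then be summarized individually and combined, which keeps the amount of text sent to the model small compared with summarizing every chunk.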

krisszostak

Super cool. Your tutorials are extremely helpful!

vCtrlAltDelv

very cool approaches. thanks for sharing!

kevon

Great video 👋👋 Part 4 was especially illuminating. 👍

henkhbit

I want to tell you that I really appreciate the work you put into these tutorials! Really, really helpful. As an educationalist I am trying to get such a system to work for making learning plans, lessons, learning goals, exam questions, etc. Your work is really helpful and motivates me to start working on an app that can make this stuff just from documents. Really appreciate it!

NS_Miata

Thanks! You helped me get on the right path to my solution!

rodrigoniveyro

I wouldn't have even considered the token limitation. Thank you for another great video.

michaelw

I really like your best representation vector approach!

ChrisadaSookdhis