Structured Output From OpenAI (Clean Dirty Data)



0:00 - Intro
0:40 - Preview of Output
1:34 - Code
2:20 - How To Get Structured Output
4:00 - Prompt Template
7:31 - Call OpenAI API
8:34 - Review Output
10:12 - Next Steps

Comments

Hey, I’m back again. Love this, will definitely be using this video’s ideas.

I would love to see how you would build a question-answering bot that can take an unstructured directory of files and let users ask questions about them.

Thanks as always!

philipsnowden

Great explanations! I love that you show step-by-step outputs. I have been working on returning structured outputs using LangChain + Pydantic, but have found that the output can be a bit unpredictable.

ashleybrooks
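
One way to cope with unpredictable output is to validate every parsed record before using it. The following is a minimal sketch using only the standard library, not the LangChain or Pydantic API. The key `input_industry` comes from the output quoted in this thread; `standardized_industry`, `match_score`, and the helper name `validate_records` are hypothetical.

```python
import json

# Expected keys and types. Only "input_industry" appears in this thread;
# the other two keys are hypothetical placeholders.
EXPECTED_FIELDS = {
    "input_industry": str,
    "standardized_industry": str,
    "match_score": (int, float),
}


def validate_records(raw_json):
    """Parse a JSON list and split it into valid records and error messages."""
    records = json.loads(raw_json)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of objects")
    valid, errors = [], []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            errors.append(f"record {index}: not an object")
            continue
        # Collect every key that is missing or has the wrong type.
        problems = [
            key
            for key, expected_type in EXPECTED_FIELDS.items()
            if not isinstance(record.get(key), expected_type)
        ]
        if problems:
            errors.append(f"record {index}: bad or missing {problems}")
        else:
            valid.append(record)
    return valid, errors
```

The error messages could be sent back to the model in a follow-up prompt, or the failing records could simply be retried.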

Thank you for the idea of returning a list of JSON objects. I wish I had seen this the day before yesterday.
I was facing the same issues with multiple JSON objects. On top of that, ChatGPT was inserting invisible Unicode characters, which turned out to be smiley faces, and I only noticed them after a lot of headache.

aiautoglasscrm
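
Invisible characters like the ones described above can be filtered out before parsing. This is a sketch under the assumption that the offending characters are Unicode format or control characters (for example a zero-width space or a byte-order mark); the sample reply and the helper name `strip_invisible` are made up for illustration.

```python
import json
import unicodedata


def strip_invisible(text):
    """Drop format and control characters, keeping ordinary whitespace."""
    keep = {"\n", "\r", "\t"}
    return "".join(
        ch
        for ch in text
        if ch in keep or unicodedata.category(ch) not in {"Cf", "Cc"}
    )


# Hypothetical model reply with a byte-order mark and a zero-width space.
reply = '\ufeff[{"input_industry": "air\u200b LineZ"}]'
cleaned = strip_invisible(reply)
records = json.loads(cleaned)
```

Visible emoji fall into a different Unicode category and are left untouched by this filter, so they would need separate handling if they are unwanted.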

While your videos are highly insightful, I can hear your heartbeat. In the middle of the video I wondered why I was getting so excited that I could hear my own heartbeat through my headphones.

anujjindal

Thanks for the series of video tutorials on LangChain. I really learnt a lot.

I had some issues loading the string as JSON.

Here is a small tweak to the code that worked for me.

import json

json_string = output.content
if "```json" in json_string:
    # Keep the text after the opening fence and drop the closing one.
    json_string = json_string.split("```json")[1].replace("```", "").strip()
structured_data = json.loads(json_string)

Looking forward to more tutorials on LangChain.

msmmpts
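
A slightly more defensive variant of the same idea uses a regular expression, so that fenced and unfenced replies go through one code path. This is a sketch, not the code from the video; the helper name `parse_model_json` is hypothetical.

```python
import json
import re

FENCE = "`" * 3  # the Markdown code-fence marker

# Opening fence, optional "json" tag, then the body up to the closing fence.
FENCE_PATTERN = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_model_json(text):
    """Parse JSON from a reply that may or may not be wrapped in a fence."""
    match = FENCE_PATTERN.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())
```

If the reply contains neither a fence nor valid JSON, `json.loads` still raises `json.JSONDecodeError`, which the caller can catch to trigger a retry.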

Your videos are fantastic. If you offer consulting, I would, at some point, like to get your advice on a problem we are trying to solve.

elraulcastro

Hey Greg, thanks for the video on structured output!
One quick tip that may help other people. When I run

code
print(output.content)


output
```json
[
{
"input_industry": "air LineZ",
...

and the next line fails, because the fence markers are not valid JSON:
json.loads(output.content)

The fence has to be stripped first:
output_content = output.content.replace("```json", "").replace("```", "").strip()

On a separate note, I'm looking for a video about using LangChain for question answering across multiple documents. Any chance you have one in your playlist?

nattapongthanngam

Never considered this use case before.

we-hate-copy-pasting

Amazing video! I have a question that might be interesting. Do you think we can use this functionality not only to map dirty inputs to more structured data (as in your example) but also to cluster text data? For example, I am dealing with text comments from users, and I might ask: "Find which of these aspects each comment most closely relates to: ["price too high", "low quality", "high delivery time", etc.]".

edoardodenigris
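
The classification idea in the question above could be set up with the same prompt-plus-JSON pattern. The sketch below only builds the prompt and checks the returned labels; it does not call any API. The aspect labels are taken from the comment, while the output keys `comment_id` and `aspect` and both helper names are hypothetical.

```python
import json

ASPECTS = ["price too high", "low quality", "high delivery time"]


def build_aspect_prompt(comments, aspects=ASPECTS):
    """Build a prompt asking for one aspect label per comment, as a JSON list."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(comments, 1))
    return (
        "For each user comment below, pick the single aspect it relates to "
        "most closely.\n"
        f"Allowed aspects: {json.dumps(aspects)}\n"
        'Respond with a JSON list of objects with the keys "comment_id" '
        'and "aspect".\n\n'
        f"Comments:\n{numbered}"
    )


def unknown_aspects(records, aspects=ASPECTS):
    """Return the records whose aspect is not one of the allowed labels."""
    return [r for r in records if r.get("aspect") not in aspects]
```

Checking the labels after parsing matters because a model may invent an aspect that is not in the allowed list.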

Make it decide what is a fruit and what is a vegetable 😅

romandobra

Good videos, thanks. But did you notice that there is a weird background noise? It sounds like a phone or something.

AFSBallin

Hey man. Great content as always. Do you have a Discord, or are there any communities you're a part of that help?

vinosamari

Thanks for the video! I have a question regarding the format. JSON is quite a verbose format where each key name is repeated, so it can use up a lot of tokens depending on the volume. Wouldn't it be better in that case to use TSV, which is much simpler and less verbose?

cyrilgorrieri
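
The size difference raised above is easy to measure on sample data. This sketch compares character counts, which are only a rough proxy for tokens; the records are invented, and apart from `input_industry` the column names are hypothetical.

```python
import csv
import io
import json

# Hypothetical records; only "input_industry" is taken from this thread.
records = [
    {"input_industry": "air LineZ", "standardized_industry": "Airlines", "match_score": 0.9},
    {"input_industry": "bnking", "standardized_industry": "Banking", "match_score": 0.8},
]


def to_tsv(rows):
    """Serialize a list of dicts as TSV with a single header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = json.dumps(records)
tsv_text = to_tsv(records)
# Character counts, a rough proxy for token usage.
print(len(json_text), len(tsv_text))
```

TSV writes each key once in the header, so the saving grows with the number of rows. The trade-off is that values containing tabs or newlines need quoting, and whether a model follows a TSV layout as reliably as JSON would have to be tested.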

Fantastic idea and implementation! Thanks a lot for the inspiration :) Naturally, the next step should be to process the nice, clean dataframe in a pipeline. Could you do a "text to analyze" video after this one?

ChenXibo

Hey Greg, great video!
Just one quick question: at 8:08 you talk about a parsing error. Can you please explain a bit more? You are splitting on ```json, but what was the issue? I would be happy to get more details :)

SteveSolun

Thanks for the awesome content.
Is it possible to get a structured response from the OpenAI API using functions?
So instead of describing the format in the prompt template, we would use the OpenAI API's function feature to specify the response format?

gurmukhsingh-uhqo
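
With function calling, the output format is described as a JSON Schema instead of in the prompt text, and the model returns the arguments as a JSON string. The sketch below shows only the schema and the parsing step, with a simulated reply and no API call, because the exact request parameters depend on the client library version. The function name and every property except `input_industry` are hypothetical.

```python
import json

# JSON Schema describing the desired output. The function name and all
# property names except "input_industry" are hypothetical.
standardize_function = {
    "name": "record_standardized_industries",
    "description": "Record the standardized form of each input industry.",
    "parameters": {
        "type": "object",
        "properties": {
            "results": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "input_industry": {"type": "string"},
                        "standardized_industry": {"type": "string"},
                        "match_score": {"type": "number"},
                    },
                    "required": ["input_industry", "standardized_industry"],
                },
            }
        },
        "required": ["results"],
    },
}

# The model returns the arguments as a JSON string; this one is simulated.
simulated_arguments = (
    '{"results": [{"input_industry": "air LineZ", '
    '"standardized_industry": "Airlines", "match_score": 0.9}]}'
)
results = json.loads(simulated_arguments)["results"]
```

The arguments string should still be validated after parsing, since the schema guides the model but does not by itself guarantee a conforming reply.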

That is a great video. One quick question: how can this be made to work if the input industry list exceeds the token limit of the LLM? (Of course, one solution would be to split the input list and make multiple calls, but I was wondering if LangChain provides anything to help with that.)

vufplen
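
The split-and-call approach mentioned in the question can be sketched in a few lines of plain Python. This is not a LangChain feature; `call_model` stands for whatever function sends one batch to the model and returns a list of records, and the batch size of 50 is an arbitrary assumption that counts items rather than tokens.

```python
def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def standardize_all(industries, call_model, batch_size=50):
    """Run call_model on each batch and concatenate the returned lists."""
    results = []
    for batch in batched(industries, batch_size):
        results.extend(call_model(batch))
    return results


# Stand-in for the real API call so the sketch runs offline.
def fake_call_model(batch):
    return [{"input_industry": name} for name in batch]
```

A call such as `standardize_all(names, fake_call_model, batch_size=50)` returns one combined list, in the original order, regardless of how many batches were needed.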

The one thing stopping me from learning LangChain is the API token 😢

Cr-R

Loving your video. Could you please suggest a model for querying tabular data using natural language and getting the output as a filtered table?

Donn

How is the match score measured? I am curious how it is calculated! Help me!

techsavvy
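
Nothing in this thread says how the match score in the video is produced, so the following is not the author's method. If a reproducible number is needed, one alternative is a plain string-similarity score computed locally; the helper name `similarity_score` and the 0-100 scale are assumptions.

```python
from difflib import SequenceMatcher


def similarity_score(raw, standardized):
    """Return a 0-100 similarity between two strings, ignoring case."""
    ratio = SequenceMatcher(None, raw.lower(), standardized.lower()).ratio()
    return round(ratio * 100)
```

A score like this measures only how similar the spelling is, so it would rate a correct mapping between two differently worded names as a poor match.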