Create Custom Dataset for Question Answering with T5 using HuggingFace, PyTorch Lightning & PyTorch


Learn how to create a dataset for Question Answering with T5 using questions from the BioASQ challenge. Learn the basics of the T5Tokenizer and prepare a data module for fine-tuning on Question Answering tasks.
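To make this concrete, here is a minimal sketch (not the notebook's exact code) of encoding one question/context/answer triple with the T5Tokenizer; the model checkpoint, example texts, and length limits below are illustrative assumptions.

from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")

question = "What organ is affected in hepatitis?"
context = "Hepatitis is an inflammation of the liver."
answer = "the liver"

# Encode question + context as the source sequence.
source_encoding = tokenizer(
    question,
    context,
    max_length=396,             # assumed source length budget
    padding="max_length",
    truncation="only_second",   # truncate the context, never the question
    return_attention_mask=True,
    add_special_tokens=True,
    return_tensors="pt",
)

# Encode the answer as the target sequence.
target_encoding = tokenizer(
    answer,
    max_length=32,              # assumed target length budget
    padding="max_length",
    truncation=True,
    return_attention_mask=True,
    add_special_tokens=True,
    return_tensors="pt",
)

# T5 ignores label positions set to -100 when computing the loss,
# so replace the pad token id in the labels.
labels = target_encoding["input_ids"].clone()
labels[labels == tokenizer.pad_token_id] = -100

print(source_encoding["input_ids"].shape)  # torch.Size([1, 396])
print(labels.shape)                        # torch.Size([1, 32])

A data module for fine-tuning then only has to wrap such examples in a Dataset and serve them through DataLoaders, for example from a LightningDataModule.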

#T5 #QuestionAnswering #HuggingFace #Transformers #PyTorch #PyTorchLightning #MachineLearning #DeepLearning #Python
Comments

I'm working on my final year project, and it involves training a model on custom data. Thanks for this video.

awmawam

Thanks for making such videos; these will most probably be very useful resources for my next job.

sayedathar

Amazing guide, Venelin ❤️. It would be very helpful if you could provide the notebook.

dv

Hi, your videos are very helpful! I would love to see an example of extractive document summarization.

deabsoluteschwarz

Very interesting and detailed description. Thanks a lot for this video!
Here's a question for you:

For zero-shot learning, do we need to modify the embeddings or not?

Details of the question:

I am working with a multilingual BERT model for the question-answering task. The model is already pretrained on an English dataset. Now I want to check its performance on another language (Hindi) in a zero-shot setting.

So, which of the following is the correct approach for zero-shot learning:
1) Give the Hindi evaluation data (dev set) to the model and check the result.
2) Train a tokenizer on the Hindi training data and use that new tokenizer with the previous model (without training mBERT on the Hindi training set) to predict the answer.

Which of these is the correct interpretation of zero-shot learning?

pandya

Thank you for the walkthrough. Appreciate your effort!

gprasadk

Great to know you're an Office fan too.

mahimanzum

Simply amazing tutorial. Thanks a lot for sharing.

ashishbhatnagar

Great tutorial. I'm just missing one piece: I don't understand the step between the encoding and the batching of the inputs. I would love some help with that, please.

davidsimmonds
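For readers with the same question: the encoded examples are just dictionaries of fixed-length tensors, and the DataLoader's default collate function stacks them along a new batch dimension (in the video's setup this happens inside the LightningDataModule's train/val DataLoaders). A minimal, self-contained sketch with made-up tensors, not the notebook's exact code:

import torch
from torch.utils.data import Dataset, DataLoader

class ToyQADataset(Dataset):
    """Returns one already-padded example (a dict of 1-D tensors) per index."""
    def __init__(self, examples):
        self.examples = examples

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, index):
        return self.examples[index]

# Two fake examples, already padded to the same length (8 tokens each).
examples = [
    {
        "input_ids": torch.ones(8, dtype=torch.long),
        "attention_mask": torch.ones(8, dtype=torch.long),
        "labels": torch.full((8,), -100, dtype=torch.long),
    }
    for _ in range(2)
]

loader = DataLoader(ToyQADataset(examples), batch_size=2)

# The default collate_fn stacks same-shaped tensors along a new batch dimension.
batch = next(iter(loader))
print(batch["input_ids"].shape)  # torch.Size([2, 8])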

I have a very important question. How should I approach a dataset containing only question and answer features? For example, if the user inputs a question, the model must generate an answer.

iamrxn
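One common way to handle a dataset with only question and answer columns is closed-book QA: feed the question alone as the source text and the answer as the target, then let the fine-tuned model generate answers from questions at inference time. A hedged sketch (the prefix, lengths, and example text are assumptions, not from the video):

from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")

# Source: the question only (no context). Target: the answer.
source = tokenizer(
    "question: Who wrote Hamlet?",
    max_length=64, padding="max_length", truncation=True, return_tensors="pt",
)
target = tokenizer(
    "William Shakespeare",
    max_length=16, padding="max_length", truncation=True, return_tensors="pt",
)

labels = target["input_ids"].clone()
labels[labels == tokenizer.pad_token_id] = -100
# Fine-tune on (source["input_ids"], source["attention_mask"], labels) pairs;
# at inference time, generate an answer from the encoded question alone.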

Thanks a lot for sharing. What research gaps/improvements could we work on for this type of task?

shivammarathe

Hi! Thanks! Can you please provide us with the notebook so we can experiment with it?

feravladimirovna

Hi, could you please provide a link to the notebook? I can't understand it properly without applying it myself.

JJetinder

Hi, I recently made my own dataset. Everything else is prepared, but answer_start is the problem. How can I add answer_start to the raw data? Do I need to find the position in the context and label it manually?

jiajundeng
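For what it's worth, a common way to fill in answer_start automatically is to search for the answer string inside its context, and fall back to manual labeling when the answer is paraphrased or occurs more than once. A tiny sketch with made-up strings:

# find() returns -1 when the answer is not a literal substring,
# which flags rows that need manual attention.
context = "T5 is a text-to-text transformer released by Google."
answer = "text-to-text transformer"
answer_start = context.find(answer)
print(answer_start)  # 8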

In the BioQADataset __getitem__ function, why don't we use self.tokenizer? (45:41) I added self., but when I run trainer.fit() I get this error:
target_encoding = self.tokenizer(
data_row['answer'],
max_length=self.target_max_token_len,

TypeError: 'tuple' object is not callable

Without self. it just doesn't recognize the tokenizer (which makes sense). Any idea why I get this error?

fatemeh
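For anyone hitting the same TypeError: 'tuple' object is not callable on self.tokenizer(...): it usually means self.tokenizer was accidentally stored as a tuple, most often through a stray trailing comma in __init__ (self.tokenizer = tokenizer,). Below is a hedged sketch of a __getitem__ that works with self.tokenizer; the column names other than 'answer', and the length defaults, are assumptions rather than the notebook's exact code.

import pandas as pd
from torch.utils.data import Dataset
from transformers import T5Tokenizer

class BioQADataset(Dataset):
    def __init__(self, data: pd.DataFrame, tokenizer: T5Tokenizer,
                 source_max_token_len: int = 396, target_max_token_len: int = 32):
        self.data = data
        self.tokenizer = tokenizer  # a trailing comma here would turn this into a tuple
        self.source_max_token_len = source_max_token_len
        self.target_max_token_len = target_max_token_len

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        data_row = self.data.iloc[index]
        source_encoding = self.tokenizer(
            data_row["question"], data_row["context"],
            max_length=self.source_max_token_len,
            padding="max_length", truncation="only_second",
            return_attention_mask=True, add_special_tokens=True,
            return_tensors="pt",
        )
        target_encoding = self.tokenizer(
            data_row["answer"],
            max_length=self.target_max_token_len,
            padding="max_length", truncation=True,
            return_attention_mask=True, add_special_tokens=True,
            return_tensors="pt",
        )
        # Mask out padding in the labels so it does not contribute to the loss.
        labels = target_encoding["input_ids"].clone()
        labels[labels == self.tokenizer.pad_token_id] = -100
        return dict(
            input_ids=source_encoding["input_ids"].flatten(),
            attention_mask=source_encoding["attention_mask"].flatten(),
            labels=labels.flatten(),
        )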

Hi Venelin, I got a "RecursionError: maximum recursion depth exceeded while calling a Python object" error... can you please suggest a solution? Thank you.

arpitshah

Hi Venelin! Thanks for these videos. Is the code for this available anywhere?

preethiseshadri

Can you share the link to the dataset, as I'm not able to download it?

datareactor

I get errors when I install PyTorch Lightning.

flowerboy_

Bro, how do I apply ML algorithms to a dataset?

saurrav