PyTorch Torchtext Tutorial 3: From Text Files to Dataset

In this video I show a more realistic example, closer to what you would face after collecting your own data. Specifically, we have a machine translation task with two text files, one in English and one in German, and I show how to go from loading the text files, to splitting them into a training and test set, to saving them in JSON and CSV format. Once the data is in JSON or CSV format the task becomes easy, and everything we went through in tutorials 1 & 2 applies!
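
For reference, here is a minimal sketch of that pipeline; the file names (train.en, train.de) and column names are assumptions for illustration, not necessarily the exact ones used in the video:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Read the parallel corpus: one sentence per line in each file
with open("train.en", encoding="utf-8") as f_en, open("train.de", encoding="utf-8") as f_de:
    english = [line.rstrip("\n") for line in f_en]
    german = [line.rstrip("\n") for line in f_de]

df = pd.DataFrame({"English": english, "German": german})

# Split into training and test sets
train, test = train_test_split(df, test_size=0.1, random_state=42)

# Save in both formats; lines=True writes one JSON record per line,
# which is what torchtext's TabularDataset expects for format="json"
train.to_csv("train.csv", index=False)
test.to_csv("test.csv", index=False)
train.to_json("train.json", orient="records", lines=True)
test.to_json("test.json", orient="records", lines=True)
```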

Dataset can be found here:

Resources I used to learn about torchtext:

❤️ Support the channel ❤️

Paid Courses I recommend for learning (affiliate links, no extra cost for you):

✨ Free Resources that are great:

💻 My Deep Learning Setup and Recording Setup:

GitHub Repository:

✅ One-Time Donations:

▶️ You Can Connect with me on:
Comments

Your explanation is so clear to me, even though my English is poor!

changgengwei

Very thankful for this useful stuff I needed in my NLP assignment. Thanks a ton!

jayantpriyadarshi

Hello,

Your videos are very informative. I love your videos and learned a lot, especially about designing neural network architectures from scratch. Thank you very much.

pratikhmanas

Great tutorial, but you should definitely be using a context manager there!

AiCore
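
For readers who haven't seen the idiom: a context manager (`with open(...)`) closes the file automatically, even if an exception occurs mid-read. A minimal sketch with an assumed file name:

```python
# Manual open/close: the file stays open if an error occurs before close()
f = open("train.en", encoding="utf-8")
lines = f.readlines()
f.close()

# Context manager: the file is closed automatically, even on an exception
with open("train.en", encoding="utf-8") as f:
    lines = f.readlines()
```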

Excellent content, thank you kindly.
A quick question: I have a unique text dataset that would require a specialized vocabulary and numericalization. Can I call the Field method in the same way, or must I build the vocabulary, numericalize manually, save the numerical values to a CSV file, and then feed them to the model?
Thank you

jesse-mikael
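
You can usually stay with Field: it accepts any callable as a tokenizer and builds the vocabulary itself from your training data, so no manual numericalization to CSV is needed. A hedged sketch against the legacy Field/TabularDataset API used in the video (file, column, and tokenizer names are assumptions):

```python
from torchtext.data import Field, TabularDataset  # torchtext.legacy.data in newer versions

def my_tokenize(text):
    # Stand-in for whatever specialized tokenization your dataset needs
    return text.split()

TEXT = Field(tokenize=my_tokenize, lower=True)

train_data = TabularDataset(
    path="train.csv", format="csv",
    fields=[("text", TEXT)], skip_header=True,
)

# Field handles vocabulary building and numericalization internally
TEXT.build_vocab(train_data, max_size=10_000, min_freq=2)
```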

Can you do a new video with the iterable-style datasets, since they removed Field and TabularDataset?

JackJX
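
Until such a video exists, here is a rough sketch of the newer (roughly 0.12-era) torchtext style, where tokenization and vocabulary building are done explicitly instead of through Field; the file name and example sentence are assumptions:

```python
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

tokenizer = get_tokenizer("basic_english")

def yield_tokens(path):
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield tokenizer(line)

vocab = build_vocab_from_iterator(
    yield_tokens("train.en"),
    specials=["<unk>", "<pad>", "<sos>", "<eos>"],
)
vocab.set_default_index(vocab["<unk>"])

# Numericalize a sentence explicitly (what Field used to do for you)
ids = vocab(tokenizer("two young people are outside"))
```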

Does anyone know how to unpack a torchtext Batch object so you can handle the input and target tensors separately?

maxrutc
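
In the legacy API, the Batch object exposes one attribute per field, named after the keys you passed in the fields mapping; assuming the fields were registered as "src" and "trg", something like:

```python
for batch in train_iterator:
    src = batch.src  # (src_len, batch_size) unless batch_first=True was set on the Field
    trg = batch.trg  # (trg_len, batch_size)
    output = model(src, trg)
```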

Can you explain why the padding is done within a batch and not across the entire dataset? I noticed that different batches have different sentence lengths plus padding. Could the differing lengths distort training and the fitted model, or does it not matter?

orjihvy

What does TabularDataset do exactly? Haven't we already separated the data into train and test using train_test_split earlier in the code?

orjihvy
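
They do different jobs: train_test_split only divides the raw rows into two files, while TabularDataset reads those files back and runs each Field's preprocessing (tokenization, lowercasing, ...) to turn every row into a torchtext Example. A hedged sketch with assumed file and field names:

```python
from torchtext.data import TabularDataset

train_data, test_data = TabularDataset.splits(
    path=".",
    train="train.json", test="test.json",
    format="json",
    # Map each JSON key to a (batch attribute name, Field) pair
    fields={"English": ("src", english_field), "German": ("trg", german_field)},
)
```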

Can you show me how to implement the skip-gram model using your dataset?

shrimonmukherjee

The difference between Iterator and BucketIterator is that the latter uses torchtext.data.pool, which groups examples of similar length into the same batch and shuffles the batches between epochs.

jeremiahjohnson
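
This also answers the earlier question about per-batch padding: because pooling puts sentences of similar length in the same batch, padding each batch to its own longest sentence wastes far fewer <pad> tokens than padding the whole dataset to one global length. A sketch of the usual setup (the field name src is an assumption):

```python
from torchtext.data import BucketIterator

train_iterator, test_iterator = BucketIterator.splits(
    (train_data, test_data),
    batch_size=64,
    # Group examples of similar length so per-batch padding stays small
    sort_key=lambda ex: len(ex.src),
    sort_within_batch=True,
    device="cuda",
)
```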

What about padding? It is also important that we get the same padding for all texts. 'PRE' or 'POST' padding? How can we use that in PyTorch?

potdish
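
Texts only need the same length within a batch, not globally, and the legacy Field controls the side of the padding: as far as I recall, the default pads at the end ("POST"), and pad_first=True pads at the beginning ("PRE"). A minimal sketch:

```python
from torchtext.data import Field

post_padded = Field(tokenize=str.split)                 # default: pad at the end ("POST")
pre_padded = Field(tokenize=str.split, pad_first=True)  # pad at the beginning ("PRE")
```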

Great tutorial with a vivid explanation. But I was trying to load another language [Tigrinya] using spaCy and it didn't work for me. I could really use your help.
Thanks

milkiasbekana
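
spaCy ships no Tigrinya model, but Field only needs some callable tokenizer, so a plain whitespace tokenizer can stand in until a proper one exists; a minimal sketch:

```python
from torchtext.data import Field

# No spaCy model for the language, so fall back to whitespace tokenization
tigrinya = Field(tokenize=lambda text: text.split(), lower=True)
```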

Hello,
What if the dataset I use is too large to load into memory at once?
Is it feasible to split the dataset into shards, as is done in OpenNMT-py?
Are there any other alternatives?

Thanks!

yangxiang
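
One framework-agnostic alternative is a PyTorch IterableDataset that streams line pairs from disk instead of materializing everything; a rough sketch, assuming one sentence per line in two parallel files:

```python
from torch.utils.data import IterableDataset, DataLoader

class StreamingTranslationDataset(IterableDataset):
    """Yields (source, target) string pairs lazily, one line at a time."""

    def __init__(self, src_path, trg_path):
        self.src_path = src_path
        self.trg_path = trg_path

    def __iter__(self):
        with open(self.src_path, encoding="utf-8") as f_src, \
             open(self.trg_path, encoding="utf-8") as f_trg:
            for src_line, trg_line in zip(f_src, f_trg):
                yield src_line.rstrip("\n"), trg_line.rstrip("\n")

loader = DataLoader(StreamingTranslationDataset("train.en", "train.de"), batch_size=32)
```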

Can you please implement the paper 'Structural Scaffolds for Citation Intent Classification in Scientific Publications'?

feravladimirovna

If I have one text file with two languages separated by a tab, how do I apply this? And if my translation language doesn't have a spaCy model, how do I build a dataset?

zawadtahmeed
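
For the first part, a tab-separated file can be read straight into the two-column DataFrame the video builds, and everything after that stays the same; for the second part, see the whitespace-tokenizer workaround a few comments above. File and column names here are assumptions:

```python
import pandas as pd

# One file, source and target on each line separated by a tab
df = pd.read_csv("data.tsv", sep="\t", names=["English", "German"], header=None)

# From here the pipeline is identical: split, then save as JSON/CSV
df.to_json("data.json", orient="records", lines=True)
```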

Hi! Could you please provide a link to the dataset used?

feravladimirovna
feravladimirovna