Pytorch Torchtext Tutorial 1: Custom Datasets and loading JSON/CSV/TSV files

In this video I show you how to load different file formats (JSON, CSV, TSV) in PyTorch Torchtext using Fields, TabularDataset, and BucketIterator. These handle all the heavy preprocessing for NLP tasks, such as numericalizing, padding, and building the vocabulary, which saves us a lot of time to focus on actually training the models! The example uses a toy sentiment-analysis dataset, but everything we go through is general and can be adapted to any dataset.
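The preprocessing steps mentioned above can be sketched in plain Python to show roughly what building a vocabulary, numericalizing, and padding do under the hood (the function names here are illustrative, not the torchtext API):

```python
# Minimal pure-Python sketch of the three preprocessing steps:
# building a vocabulary, numericalizing, and padding a batch.
from collections import Counter

def build_vocab(tokenized_texts, specials=("<unk>", "<pad>")):
    """Map each token to an integer id, reserving ids for special tokens."""
    counter = Counter(tok for text in tokenized_texts for tok in text)
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok, _ in counter.most_common():
        vocab[tok] = len(vocab)
    return vocab

def numericalize(tokens, vocab):
    """Replace tokens with their vocabulary ids (unknown tokens -> <unk>)."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]

def pad_batch(sequences, pad_id):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(s) for s in sequences)
    return [s + [pad_id] * (max_len - len(s)) for s in sequences]

texts = [["i", "love", "this"], ["bad"]]
vocab = build_vocab(texts)
batch = pad_batch([numericalize(t, vocab) for t in texts], vocab["<pad>"])
# batch -> [[2, 3, 4], [5, 1, 1]]
```

In the legacy torchtext API, Field and BucketIterator perform these same steps for you when you call `build_vocab` and iterate over batches.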

Comments

I found it very difficult to get used to torchtext docs, but then I found your video :) Many thanks!

buithanhlam

Could you make a video for the new version of torchtext?

salihbalci

Very helpful tutorial.
Is it possible for you to make a tutorial on how to load data that is stored in a SQL database?

henricbohm

Yo, the new version of torchtext (0.12) does not have Field.

dhawalsalvi

Can you do a video on the updated Torchtext 0.9? I think they revamped much of this, and the new features look pretty awesome with subword tokenization implemented (i.e. 'sub', '_word').

jeremiahjohnson

I somehow found the potato quote very inspiring 🤔.

subhasish

Hi, it seems that torchtext got quite a changeover and this tutorial's contents are outdated. Any chance you might want to update it?

ugestacoolie

I have a question: how can I save the train or test data from the IMDB dataset (train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)) as a CSV file? Every time I need 15 minutes to load the data. Thanks bro

ximingdong

Thanks for doing that. How do we save it once we have created it?

MasterMan

Field and TabularDataset are deprecated now. Are there any alternatives?

sabarishwarang

Please create a video on semantic segmentation using a PyTorch CNN. The dataset should contain cancer images plus ground-truth images, and the trained model should report the best IoU and accuracy of the proposed model.

AhmedIqbal

It seems torchtext has had some changes; I can't import these modules with the recent version.

finix

Very nice tutorial, but I get warnings saying BucketIterator, Field, and TabularDataset are being deprecated... Also, I can't scale BucketIterator to TPUs and multi-GPUs. Any better alternatives?

stephennfernandes

I always get an error, can you help me please?

AttributeError: module 'torchtext.data' has no attribute 'Field'

salihbalci

How can I pass a pandas DataFrame into this process (instead of loading the file)?

m.j.

Very nice tutorial! While I was looking at torchtext, I actually came across the libraries torchnlp and allennlp. I couldn't really tell what the differences between them were. Have you worked with them?

le-ne

Your examples are left-padded, but when I use the same bucket iterator on the IMDB dataset, they are right-padded. This is a bit confusing.

sagsriv
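On the left/right padding question above: in the legacy torchtext API the padding side is controlled by the Field's `pad_first` flag, which defaults to False (right padding), so both layouts are possible. A pure-Python illustration of the difference (not the torchtext API itself):

```python
# Demonstrates left vs. right padding of a token-id sequence.
def pad(seq, length, pad_id=1, pad_first=False):
    """Pad seq to `length`, on the left if pad_first else on the right."""
    fill = [pad_id] * (length - len(seq))
    return fill + seq if pad_first else seq + fill

pad([5, 6], 4)                  # right padding -> [5, 6, 1, 1]
pad([5, 6], 4, pad_first=True)  # left padding  -> [1, 1, 5, 6]
```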

Great video, I really learned a lot, thanks. When I run BucketIterator, it comes up with the error 'int' object is not subscriptable. I checked my code but still have no idea where the fault is.

lingfengshen

Great tutorial, but sadly I think it's already outdated. Torchtext has deprecated 'Field' and some other classes, and printing the keys and values of the dict at 10:27 doesn't give proper representations of the objects anymore; they probably broke something while updating the code.

lazypunk

Great video! I wanted to ask, how do I use TabularDataset to split into train, validation, and test?
Should I use something like this?
train_data, valid_data = TabularDataset.splits(
    path="data", train="train.json", test="valid.json", format="json", fields=fields
)

test_data = TabularDataset.splits(
    path="data", test="test.json", format="json", fields=fields
)

shoebjoarder
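On the three-way-split question above: in the legacy API, `TabularDataset.splits` also accepts a `validation` filename, so all three splits can come from one call. A sketch under those assumptions (`data/valid.json` exists and `fields` is defined as in the video; not runnable without the data files):

```python
# Legacy torchtext fragment: one splits() call returns one dataset
# per filename, in (train, validation, test) order.
train_data, valid_data, test_data = TabularDataset.splits(
    path="data",
    train="train.json",
    validation="valid.json",
    test="test.json",
    format="json",
    fields=fields,
)
```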