Pytorch Torchtext Tutorial 1: Custom Datasets and loading JSON/CSV/TSV files

In this video I show you how to load different file formats (JSON, CSV, TSV) in PyTorch Torchtext using Fields, TabularDataset, and BucketIterator. These handle all the heavy preprocessing for NLP tasks, such as numericalizing, padding, and building the vocabulary, which saves us a lot of time to focus on actually training the models! The example uses a toy sentiment-analysis dataset, but everything we go through is general and can be adapted to any dataset.
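The preprocessing steps mentioned above can be sketched in plain Python to show roughly what building a vocabulary, numericalizing, and padding do under the hood (the function names here are illustrative, not the torchtext API):

```python
# Minimal pure-Python sketch of the three preprocessing steps:
# building a vocabulary, numericalizing, and padding a batch.
from collections import Counter

def build_vocab(tokenized_texts, specials=("<unk>", "<pad>")):
    """Map each token to an integer id, reserving ids for special tokens."""
    counter = Counter(tok for text in tokenized_texts for tok in text)
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok, _ in counter.most_common():
        vocab[tok] = len(vocab)
    return vocab

def numericalize(tokens, vocab):
    """Replace tokens with their vocabulary ids (unknown tokens -> <unk>)."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]

def pad_batch(sequences, pad_id):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(s) for s in sequences)
    return [s + [pad_id] * (max_len - len(s)) for s in sequences]

texts = [["i", "love", "this"], ["bad"]]
vocab = build_vocab(texts)
batch = pad_batch([numericalize(t, vocab) for t in texts], vocab["<pad>"])
# batch -> [[2, 3, 4], [5, 1, 1]]
```

In the legacy torchtext API, Field and BucketIterator perform these same steps for you when you call `build_vocab` and iterate over batches.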

Comments

I found it very difficult to get used to torchtext docs, but then I found your video :) Many thanks!

buithanhlam

Could you make a video for the new version of torchtext?

salihbalci

Very helpful tutorial.
Is it possible for you to make a tutorial on how to load data that is stored in a SQL database?

henricbohm

Yo, the new version of torchtext (0.12) does not have Field.

dhawalsalvi

Can you do a video on the updated Torchtext 0.9? I think they revamped much of this, and the new features look pretty awesome with subword tokenization implemented (i.e. 'sub', '_word').

jeremiahjohnson

I somehow found the potato quote very inspiring 🤔.

subhasish

Hi, it seems that torchtext got quite a changeover and this tutorial's contents are outdated. Any chance you might want to update it?

ugestacoolie

I have a question: how can I save the train or test data from the IMDB dataset (train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)) as a CSV file? Every time I need 15 minutes to load the data. Thanks bro

ximingdong

Thanks for doing that. How do we save it once we have created it?

MasterMan

Field and TabularDataset are deprecated now. Are there any alternatives?

sabarishwarang

Please create a video on semantic segmentation using a PyTorch CNN. The dataset should contain cancer images plus ground-truth images, and the trained model should report the best IoU and accuracy of the proposed model.

AhmedIqbal

It seems torchtext has had some changes; I can't import these modules with the recent version.

finix

Very nice tutorial, but I get warnings saying BucketIterator, Field, and TabularDataset are being deprecated... Also, I can't scale BucketIterator to TPUs and multi-GPUs. Any better alternatives?

stephennfernandes

I always get an error, can you help me please?

AttributeError: module 'torchtext.data' has no attribute 'Field'

salihbalci

How can I pass a pandas DataFrame into this process (instead of loading the file)?

m.j.

Very nice tutorial! While I was looking at torchtext, I actually came across the libraries torchnlp and allennlp. I couldn't really tell what the differences between them were. Have you worked with them?

le-ne

Your examples are left-padded, but when I use the same bucket iterator on the IMDB dataset, they are right-padded. This is a bit confusing.

sagsriv
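On the left/right padding question above: in the legacy torchtext API the padding side is controlled by the Field's `pad_first` flag, which defaults to False (right padding), so both layouts are possible. A pure-Python illustration of the difference (not the torchtext API itself):

```python
# Demonstrates left vs. right padding of a token-id sequence.
def pad(seq, length, pad_id=1, pad_first=False):
    """Pad seq to `length`, on the left if pad_first else on the right."""
    fill = [pad_id] * (length - len(seq))
    return fill + seq if pad_first else seq + fill

pad([5, 6], 4)                  # right padding -> [5, 6, 1, 1]
pad([5, 6], 4, pad_first=True)  # left padding  -> [1, 1, 5, 6]
```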

Great video, I really learned a lot, thanks. When I run BucketIterator, it comes up with the error 'int' object is not subscriptable. I checked my code but still have no idea where the fault is.

lingfengshen

Great tutorial, but sadly I think it's already outdated. Torchtext has deprecated 'Field' and some other classes, and printing the keys and values of the dict at 10:27 doesn't give proper representations of the objects anymore; they probably broke something while updating the code.

lazypunk

Great video! I wanted to ask, how do I use TabularDataset to split into train, validation, and test?
Should I use something like this?
train_data, valid_data = TabularDataset.splits(
    path="data", train="train.json", test="valid.json", format="json", fields=fields
)

test_data = TabularDataset.splits(
    path="data", test="test.json", format="json", fields=fields
)

shoebjoarder
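On the three-way-split question above: in the legacy API, `TabularDataset.splits` also accepts a `validation` filename, so all three splits can come from one call. A sketch under those assumptions (`data/valid.json` exists and `fields` is defined as in the video; not runnable without the data files):

```python
# Legacy torchtext fragment: one splits() call returns one dataset
# per filename, in (train, validation, test) order.
train_data, valid_data, test_data = TabularDataset.splits(
    path="data",
    train="train.json",
    validation="valid.json",
    test="test.json",
    format="json",
    fields=fields,
)
```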