PyTorch Torchtext Tutorial 3: From Text Files to Dataset

In this video I show a more realistic example, closer to what you would face after collecting your own data. Specifically, we have a machine translation task with two text files, one in English and one in German, and I show how to go from loading the text files, to splitting them into a training and test set, to saving them in JSON and CSV format. Once the data is in JSON or CSV format the task becomes easy, and everything we went through in tutorials 1 & 2 applies!
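
For reference, here is a minimal sketch of that pipeline; the file names (train.en, train.de) and column names are assumptions for illustration, not necessarily the exact ones used in the video:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Read the parallel corpus: one sentence per line in each file
with open("train.en", encoding="utf-8") as f_en, open("train.de", encoding="utf-8") as f_de:
    english = [line.rstrip("\n") for line in f_en]
    german = [line.rstrip("\n") for line in f_de]

df = pd.DataFrame({"English": english, "German": german})

# Split into training and test sets
train, test = train_test_split(df, test_size=0.1, random_state=42)

# Save in both formats; lines=True writes one JSON record per line,
# which is what torchtext's TabularDataset expects for format="json"
train.to_csv("train.csv", index=False)
test.to_csv("test.csv", index=False)
train.to_json("train.json", orient="records", lines=True)
test.to_json("test.json", orient="records", lines=True)
```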

Dataset can be found here:

Resources I used to learn about torchtext:

❤️ Support the channel ❤️

Paid Courses I recommend for learning (affiliate links, no extra cost for you):

✨ Free Resources that are great:

💻 My Deep Learning Setup and Recording Setup:

GitHub Repository:

✅ One-Time Donations:

▶️ You Can Connect with me on:
Comments

Your explanation is so clear to me, even though my English is poor!

changgengwei

Very thankful for this useful stuff I needed in my NLP assignment. Thanks a ton!

jayantpriyadarshi

Hello,

Your videos are very informative. I love your videos and learned a lot, especially about designing neural network architectures from scratch. Thank you very much.

pratikhmanas

Great tutorial, but you should definitely be using a context manager there!

AiCore
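
For readers who haven't seen the idiom: a context manager (`with open(...)`) closes the file automatically, even if an exception occurs mid-read. A minimal sketch with an assumed file name:

```python
# Manual open/close: the file stays open if an error occurs before close()
f = open("train.en", encoding="utf-8")
lines = f.readlines()
f.close()

# Context manager: the file is closed automatically, even on an exception
with open("train.en", encoding="utf-8") as f:
    lines = f.readlines()
```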

Excellent content, thank you kindly.
A quick question: I have a unique text dataset that would require a specialized vocabulary and numericalization. Can I call the Field method in the same way, or must I build the vocabulary, numericalize manually, save the numerical values to a CSV file, and then feed them to the model?
Thank you

jesse-mikael
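
You can usually stay with Field: it accepts any callable as a tokenizer and builds the vocabulary itself from your training data, so no manual numericalization to CSV is needed. A hedged sketch against the legacy Field/TabularDataset API used in the video (file, column, and tokenizer names are assumptions):

```python
from torchtext.data import Field, TabularDataset  # torchtext.legacy.data in newer versions

def my_tokenize(text):
    # Stand-in for whatever specialized tokenization your dataset needs
    return text.split()

TEXT = Field(tokenize=my_tokenize, lower=True)

train_data = TabularDataset(
    path="train.csv", format="csv",
    fields=[("text", TEXT)], skip_header=True,
)

# Field handles vocabulary building and numericalization internally
TEXT.build_vocab(train_data, max_size=10_000, min_freq=2)
```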

Can you do a new video with the iterable-style datasets, since they removed Field and TabularDataset?

JackJX
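
Until such a video exists, here is a rough sketch of the newer (roughly 0.12-era) torchtext style, where tokenization and vocabulary building are done explicitly instead of through Field; the file name and example sentence are assumptions:

```python
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

tokenizer = get_tokenizer("basic_english")

def yield_tokens(path):
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield tokenizer(line)

vocab = build_vocab_from_iterator(
    yield_tokens("train.en"),
    specials=["<unk>", "<pad>", "<sos>", "<eos>"],
)
vocab.set_default_index(vocab["<unk>"])

# Numericalize a sentence explicitly (what Field used to do for you)
ids = vocab(tokenizer("two young people are outside"))
```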

Does anyone know how to unpack a torchtext Batch object so you can handle the input and target tensors separately?

maxrutc
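
In the legacy API, the Batch object exposes one attribute per field, named after the keys you passed in the fields mapping; assuming the fields were registered as "src" and "trg", something like:

```python
for batch in train_iterator:
    src = batch.src  # (src_len, batch_size) unless batch_first=True was set on the Field
    trg = batch.trg  # (trg_len, batch_size)
    output = model(src, trg)
```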

Can you explain why the padding is done within a batch and not across the entire dataset? I noticed that different batches have different sentence lengths plus padding. Could the differing lengths distort training and the fitted model, or does it not matter?

orjihvy

What does TabularDataset do exactly? Haven't we already separated the data into train and test using train_test_split earlier in the code?

orjihvy
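
They do different jobs: train_test_split only divides the raw rows into two files, while TabularDataset reads those files back and runs each Field's preprocessing (tokenization, lowercasing, ...) to turn every row into a torchtext Example. A hedged sketch with assumed file and field names:

```python
from torchtext.data import TabularDataset

train_data, test_data = TabularDataset.splits(
    path=".",
    train="train.json", test="test.json",
    format="json",
    # Map each JSON key to a (batch attribute name, Field) pair
    fields={"English": ("src", english_field), "German": ("trg", german_field)},
)
```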

Can you show me how to implement the skip-gram model using your dataset?

shrimonmukherjee

The difference between Iterator and BucketIterator is that the latter uses torchtext.data.pool, which groups examples of similar length into the same batch and shuffles the batches between epochs.

jeremiahjohnson
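
This also answers the earlier question about per-batch padding: because pooling puts sentences of similar length in the same batch, padding each batch to its own longest sentence wastes far fewer <pad> tokens than padding the whole dataset to one global length. A sketch of the usual setup (the field name src is an assumption):

```python
from torchtext.data import BucketIterator

train_iterator, test_iterator = BucketIterator.splits(
    (train_data, test_data),
    batch_size=64,
    # Group examples of similar length so per-batch padding stays small
    sort_key=lambda ex: len(ex.src),
    sort_within_batch=True,
    device="cuda",
)
```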

What about padding? It is also important that we get the same padding for all texts. 'PRE' or 'POST' padding? How can we use that in PyTorch?

potdish
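
Texts only need the same length within a batch, not globally, and the legacy Field controls the side of the padding: as far as I recall, the default pads at the end ("POST"), and pad_first=True pads at the beginning ("PRE"). A minimal sketch:

```python
from torchtext.data import Field

post_padded = Field(tokenize=str.split)                 # default: pad at the end ("POST")
pre_padded = Field(tokenize=str.split, pad_first=True)  # pad at the beginning ("PRE")
```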

Great tutorial with a vivid explanation. But I was trying to load another language [Tigrinya] using spaCy and it didn't work for me. I could really use your help.
Thanks

milkiasbekana
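
spaCy ships no Tigrinya model, but Field only needs some callable tokenizer, so a plain whitespace tokenizer can stand in until a proper one exists; a minimal sketch:

```python
from torchtext.data import Field

# No spaCy model for the language, so fall back to whitespace tokenization
tigrinya = Field(tokenize=lambda text: text.split(), lower=True)
```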

Hello,
What if the dataset I use is too large to load into memory at once?
Is it feasible to split the dataset into shards, as is done in OpenNMT-py?
Are there any other alternatives?

Thanks!

yangxiang
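
One framework-agnostic alternative is a PyTorch IterableDataset that streams line pairs from disk instead of materializing everything; a rough sketch, assuming one sentence per line in two parallel files:

```python
from torch.utils.data import IterableDataset, DataLoader

class StreamingTranslationDataset(IterableDataset):
    """Yields (source, target) string pairs lazily, one line at a time."""

    def __init__(self, src_path, trg_path):
        self.src_path = src_path
        self.trg_path = trg_path

    def __iter__(self):
        with open(self.src_path, encoding="utf-8") as f_src, \
             open(self.trg_path, encoding="utf-8") as f_trg:
            for src_line, trg_line in zip(f_src, f_trg):
                yield src_line.rstrip("\n"), trg_line.rstrip("\n")

loader = DataLoader(StreamingTranslationDataset("train.en", "train.de"), batch_size=32)
```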

Can you please implement the paper 'Structural Scaffolds for Citation Intent Classification in Scientific Publications'?

feravladimirovna

If I have one text file with two languages separated by a tab, how do I apply this? And if my translation language doesn't have a spaCy model, how do I build a dataset?

zawadtahmeed
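
For the first part, a tab-separated file can be read straight into the two-column DataFrame the video builds, and everything after that stays the same; for the second part, see the whitespace-tokenizer workaround a few comments above. File and column names here are assumptions:

```python
import pandas as pd

# One file, source and target on each line separated by a tab
df = pd.read_csv("data.tsv", sep="\t", names=["English", "German"], header=None)

# From here the pipeline is identical: split, then save as JSON/CSV
df.to_json("data.json", orient="records", lines=True)
```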

Hi! Could you please provide a link to the dataset used?

feravladimirovna
feravladimirovna