Multi-Label Classification on Unhealthy Comments - Fine-Tuning RoBERTa with PyTorch - Coding Tutorial

A practical Python coding guide. In this video I fine-tune RoBERTa with PyTorch Lightning on a multi-label classification task, using the Unhealthy Comment Corpus. The result is a language model that can classify whether an online comment contains attributes such as sarcasm, hostility, or dismissiveness.
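The core idea of the multi-label setup can be sketched as follows. This is a minimal, self-contained sketch, not the notebook's code: a random tensor stands in for RoBERTa's pooled output, and the hidden size and attribute count are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Multi-label head sketch: one logit per attribute, trained with
# BCEWithLogitsLoss so each label gets its own independent sigmoid.
HIDDEN_SIZE = 768      # RoBERTa-base hidden size
NUM_ATTRIBUTES = 8     # e.g. sarcasm, hostility, dismissiveness, ...

classifier = nn.Linear(HIDDEN_SIZE, NUM_ATTRIBUTES)
criterion = nn.BCEWithLogitsLoss()

pooled = torch.randn(4, HIDDEN_SIZE)   # stand-in for the encoder's pooled output
labels = torch.randint(0, 2, (4, NUM_ATTRIBUTES)).float()  # multi-hot targets

logits = classifier(pooled)            # shape [4, NUM_ATTRIBUTES]
loss = criterion(logits, labels)       # scalar training loss
probs = torch.sigmoid(logits)          # per-attribute probabilities in [0, 1]
```

Because the labels are independent (a comment can be both sarcastic and hostile), BCE-with-logits is used rather than a softmax over classes.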

---- TUTORIAL NOTEBOOK
Remember to press "Copy to Drive" to save a copy of the notebook for yourself.

Intro: 00:00:00
Video / project outline: 00:00:27
Getting Google Colab set up: 00:02:00
Imports: 00:03:23
Inspect data: 00:07:05
PyTorch dataset: 00:11:15
PyTorch Lightning data module: 00:27:08
Creating the model / classifier: 00:35:45
Training and evaluating model: 01:07:30

This series attempts to offer a casual guide to Hugging Face and Transformer models, focused on implementation rather than theory. Let me know if you enjoy them! I'll be doing future videos on computer vision if that's something people are interested in; let me know in the comments :)

----- Research material for theory

Comments

Great video! I'd never heard of PyTorch Lightning before. It looks really useful!

vincentcoulombe

Fantastic tutorial Rupert! Thank you for putting this together. I was wondering if you might spend some time demonstrating how to save and load the state of the model for inference and, if possible, recover the model state that had the lowest loss on the validation set before overfitting started to creep in?

MattRosinski

Hey, something that would really be helpful is to illustrate how to use a pre-trained model on a single use case, i.e. use the model to classify a single comment according to the attributes.

Thanks again for a very helpful video! Best wishes from across the Atlantic
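Single-comment inference boils down to: tokenize, run the forward pass, apply a sigmoid, and threshold. This sketch uses dummy logits in place of a real `model(tokenizer(comment, return_tensors="pt"))` call so it runs standalone; the attribute names are illustrative, not the exact UCC column names.

```python
import torch

ATTRIBUTES = ["antagonize", "condescending", "dismissive",
              "generalisation", "healthy", "hostile", "sarcastic"]

# Dummy logits standing in for the trained model's output on one comment.
logits = torch.tensor([[2.0, -1.5, 0.3, -3.0, -0.2, 1.1, -0.7]])

probs = torch.sigmoid(logits).squeeze(0)   # independent probability per label
predicted = [a for a, p in zip(ATTRIBUTES, probs) if p > 0.5]
print(predicted)   # -> ['antagonize', 'dismissive', 'hostile']
```

The 0.5 threshold is a default; it can be tuned per attribute on the validation set.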

sune

This guide is really helpful! Your explanations are very easy to understand and everything flows very smoothly.
In future videos, could you please include a little pop-up recording of yourself at the side of the screen like in the previous one? It makes it easier to maintain focus and listen more carefully.
It would also be great if the volume were higher.
Other than that, phenomenal man!

HeadshotComing

Great tutorial, thanks for this. I like your style and setup. Future videos could be improved by sharing a viewable link to the Colab notebook. I kept having to rewind to find mistakes in my code compared to yours, and having a notebook I could pull up would go a long way.

SuirouNoJutsu

@1:12:33 How does classify_raw_comments know to create predictions on the validation data (and not the training data) if ucc_data_module contains both the train_dataset and val_dataset? You're passing ucc_data_module to the datamodule parameter, but both train_dataset and val_dataset were created by the setup() method.

dimabear

Hi Rupert! Thanks for putting this together.

mytabby

Can you add the datasets 'ucc_train.csv', 'ucc_val.csv', and 'ucc_test.csv' to the repo? The Colab notebook cannot copy the datasets from your drive. Thanks.

ppeng

Incredibly good tutorial, but I'm using my own dataset and it's been hanging at [00:00<?, ?it/s] for a while now. Is it possible that it's still computing the embeddings, or should I check whether something is wrong? (I have 50k training examples)

sniffersmc

Thank you for the amazing tutorial. For my use case I have three target labels '0', '1', and '2', where 0 is neutral, 1 is positive, and 2 is negative. I won't be able to use a BCE loss, will I? What might be the alternative?
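When the three labels are mutually exclusive (each comment gets exactly one of neutral/positive/negative), the usual alternative to per-label BCE is cross-entropy over the three classes. A minimal sketch with dummy logits, assuming class-index targets:

```python
import torch
import torch.nn as nn

# Mutually exclusive classes: 0 = neutral, 1 = positive, 2 = negative.
# CrossEntropyLoss applies a softmax across the 3 logits per example,
# instead of BCE's independent sigmoid per label.
logits = torch.tensor([[0.2, 2.5, -1.0],    # model outputs, shape [batch, 3]
                       [1.8, -0.3, 0.1]])
targets = torch.tensor([1, 0])              # class indices, not multi-hot vectors

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)
preds = logits.argmax(dim=1)                # predicted class per example
```

Note the target format change: class indices for cross-entropy versus multi-hot float vectors for BCE.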

mahmudhasan

There was some new information I didn't know, especially the difference between the Hugging Face version and RoBERTa!

I was stuck making a multi-label classification model. I learned a lot from your video!!! THANKS

선형소수

Fantastic tutorial! I have one question: how do I predict on a single text after training?

TanveerAhmed-kneh

This is a great tutorial! I have been able to replicate your idea using my own data. The question I have now is how to go from an input string like “I love this movie!” to a set of predicted labels and their confidence scores (i.e. probabilities)?

mjc

Hi Rupert! Fantastic tutorial. Your videos are excellent and really helpful! I would like to know how to export the trained model. Do you know how it can be done? Thanks!!
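One common way to export a trained PyTorch model is a `state_dict` round trip: save the weights, rebuild the same architecture at inference time, and load them back. A hedged sketch with a stand-in model (with PyTorch Lightning you could equivalently use `trainer.save_checkpoint(...)` and `MyModel.load_from_checkpoint(...)`):

```python
import torch
import torch.nn as nn

# Stand-in for the trained classifier; the real one wraps RoBERTa.
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))

torch.save(model.state_dict(), "classifier.pt")   # export the weights

# At inference time: rebuild the same architecture, then load the weights.
restored = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
restored.load_state_dict(torch.load("classifier.pt"))
restored.eval()   # disable dropout / batch-norm updates for inference
```

Saving only the `state_dict` (rather than the whole module) keeps the file portable across code refactors, as long as the architecture can be reconstructed.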

vanesamena

Great tutorial Rupert. Can we make a multi-label model with around 0.1M labels, like skill extraction from given job postings? More concretely, can we create a BERT model that classifies a job posting based on the skills it contains? Rows: job postings, cols: skill labels?

gauravdev

Did you let the whole model train? What were the considerations between training only the last layers vs. training everything?

nadavge

Great work! Do you have a Colab/GitLab with the code?

soccihighdigger

Hi Rupert, thanks for this nice training video. I have been trying to use your work on my data but I get an error. Is it possible to share the error with you?

majidafra

Excellent video. Please make a video on CodeBERT fine-tuning.

Mst.EshitaKhatun-yu

The video is great, thanks. I have a question: can this example be generalized to a problem with many classes, say 400?

alessiogarau