SentencePiece Tokenizer With Offsets For T5, ALBERT, XLM-RoBERTa And Many More

In this video I show you how to use Google's implementation of the SentencePiece tokenizer for question-answering systems. We will implement the tokenizer with offsets for ALBERT, which you can use with many different transformer-based models, and update the data-processing function from the previous tutorials.
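The offset-recovery idea described above can be sketched as follows. This is a minimal illustration, not the video's exact code: `pieces_to_offsets` is a hypothetical helper name, and it assumes the SentencePiece pieces (with the "▁" word-boundary marker) concatenate back to the input text.

```python
# Minimal sketch (assumption: the pieces reconstruct the input text
# exactly; real inputs may need SentencePiece's normalization first).
def pieces_to_offsets(text, pieces):
    """Map SentencePiece pieces back to (start, end) character offsets."""
    offsets = []
    pos = 0
    for piece in pieces:
        # "▁" marks the start of a word; strip it before matching.
        token = piece[1:] if piece.startswith("▁") else piece
        if not token:  # a lone "▁" carries no characters of its own
            offsets.append((pos, pos))
            continue
        start = text.find(token, pos)  # -1 here would signal a mismatch
        end = start + len(token)
        offsets.append((start, end))
        pos = end
    return offsets

# ALBERT-style pieces for "Hello world"
print(pieces_to_offsets("Hello world", ["▁Hello", "▁world"]))
# → [(0, 5), (6, 11)]
```

With the offsets in hand, a predicted answer span over token indices can be mapped straight back to a character span in the original context, which is the step question-answering pipelines need.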

If you are not familiar with previous videos, watch these:

The code implemented in this video can be found here:

Follow me on:
Comments

Thank you for this tutorial. I figured out the offsets with ALBERT, and I found this video through a Kaggle discussion.

nanto-x

Thanks for teaching the method of using this tool; it's easy and useful.

yaoyuwang

Hello Abhishek, I have learned a lot from your videos as well as from your Kaggle kernels; thank you for that.

What should one do to practice writing high-level code like yours, or to implement papers? And how does one read the code of paper implementations? I find it quite complicated. Could you make a video on that? A post on LinkedIn/Kaggle would work too.

adityadhookia

Could you please cover building an ML model that can be deployed at scale? As you say, real-world scenarios are different from competitions. Or could you point to some material that covers this topic? :)

aadeshdeshmukh

Super, thanks for the T5 tokenizer ;) ;) :)

parthchokhra

Much appreciate this video about the implementation. Do you plan to cover the SentencePiece paper in detail in a subsequent video? I'm curious why one would use this tokenizer over any other existing tokenizer, say the spaCy tokenizer.

atinsood

Would you be able to cover multi-class classification using XLM-Roberta?

sunderrajan

Hi Abhishek, I have a small doubt while using the SentencePiece tokenizer from Google. I wanted to try T5 with offsets, but I found the token IDs are different for the same word if I use the T5 tokenizer instead. Also, could you share where we can find this kind of information about what preprocessing to use for particular tasks involving transformers? That would definitely help. Anyway, love your videos; kudos for that.

Papapancho

Thanks for sharing it. Is there any pre-trained model to recognize handwritten text, or can you suggest some material or links on building a model for handwritten text (ICR)? I tried OCR, but it is not giving good results on handwritten text. Please reply to my query :)

uthamkanth

Hi Abhishek, before starting anything, can you please describe the problem statement that you are trying to solve?

souravghosh

Hi Abhishek, I am Pawan, a machine learning engineer intern. I want to connect with you to discuss a problem I am facing in cleaning the data I pass to my model.

I have issues with ordering the contours I have extracted from forms, since the contours differ in size. Ordering along the x-axis picks the bigger contours first, but I want to order the whole customer name in a line irrespective of the size of the characters. Can I have your thoughts on this problem?
I have also tried connecting with you on LinkedIn. Thank you.

Pawan_Sharmaa

Can you cover an episode regarding Semantic Textual Similarity using T5? Thanks!

mathematicalninja