Python to Optimize the Input Data Pipeline | BERT Transformer Models

Python (TensorFlow 2) code to optimize your tokenizer and vocabulary for your specific dataset. Pre-trained BERT NLP models are trained on a general corpus of documents, so their stock vocabulary will not give good enough performance on your specific deep learning (NLP) task.
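
For reference, a minimal sketch of what training a BERT-style WordPiece tokenizer on your own corpus can look like with the HuggingFace tokenizers package; the file name my_corpus.txt and the vocabulary size are placeholder assumptions, not taken from the video:

# Minimal sketch, assuming the HuggingFace `tokenizers` package and a
# local text file `my_corpus.txt` (hypothetical path).
from tokenizers import BertWordPieceTokenizer

# Start from an empty BERT-style WordPiece tokenizer.
tokenizer = BertWordPieceTokenizer(lowercase=True)

# Learn a vocabulary from your domain-specific documents.
tokenizer.train(
    files=["my_corpus.txt"],
    vocab_size=30_000,          # assumed size, tune for your corpus
    min_frequency=2,
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)

# Persist the learned vocabulary (writes vocab.txt) for reuse.
tokenizer.save_model(".")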

#code_in_real_time
#Tokenizer
#HuggingFace

00:00 Code your Tokenizers
03:58 Tokenization pipeline
06:20 Full service Tokenizer
09:15 Train a new Tokenizer
15:00 Fast Tokenizer
16:16 Encode your sentences with the new tokenizer
18:30 Use a pretrained tokenizer (with vocabulary)
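
As a rough companion to the last two chapters, this sketch encodes a sentence with the newly trained tokenizer and then loads a pretrained one via HuggingFace transformers; vocab.txt refers to the file saved in the sketch above, and bert-base-uncased is just one example checkpoint:

# Minimal sketch, assuming the `tokenizers` and `transformers` packages;
# `vocab.txt` is the vocabulary file produced by the training sketch above.
from tokenizers import BertWordPieceTokenizer
from transformers import BertTokenizerFast

# Encode a sentence with the freshly trained tokenizer.
my_tokenizer = BertWordPieceTokenizer("vocab.txt", lowercase=True)
encoding = my_tokenizer.encode("Transformers tokenize text into subwords.")
print(encoding.tokens)   # subword tokens, wrapped in [CLS] ... [SEP]
print(encoding.ids)      # the matching vocabulary indices

# Or load a pretrained tokenizer (with its general-purpose vocabulary).
pretrained = BertTokenizerFast.from_pretrained("bert-base-uncased")
print(pretrained.tokenize("Transformers tokenize text into subwords."))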
Comments

This channel is a hidden gem. Thank you for putting in the work. I'm really eager to learn about this stuff.
