Python to Optimize the Input Data Pipeline | BERT Transformer Models

Python (TensorFlow 2) code to optimize your tokenizer and vocabulary for your specific dataset. Pre-trained BERT NLP models are trained on a general corpus of documents, so their stock vocabulary will not give good enough performance on your specific deep learning (NLP) task.
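
For reference, a minimal sketch of what training a BERT-style WordPiece tokenizer on your own corpus can look like with the HuggingFace tokenizers package; the file name my_corpus.txt and the vocabulary size are placeholder assumptions, not taken from the video:

# Minimal sketch, assuming the HuggingFace `tokenizers` package and a
# local text file `my_corpus.txt` (hypothetical path).
from tokenizers import BertWordPieceTokenizer

# Start from an empty BERT-style WordPiece tokenizer.
tokenizer = BertWordPieceTokenizer(lowercase=True)

# Learn a vocabulary from your domain-specific documents.
tokenizer.train(
    files=["my_corpus.txt"],
    vocab_size=30_000,          # assumed size, tune for your corpus
    min_frequency=2,
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)

# Persist the learned vocabulary (writes vocab.txt) for reuse.
tokenizer.save_model(".")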

#code_in_real_time
#Tokenizer
#HuggingFace

00:00 Code your Tokenizers
03:58 Tokenization pipeline
06:20 Full service Tokenizer
09:15 Train a new Tokenizer
15:00 Fast Tokenizer
16:16 Encode your sentences with the new tokenizer
18:30 Use a pretrained tokenizer (with vocabulary)
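
As a rough companion to the last two chapters, this sketch encodes a sentence with the newly trained tokenizer and then loads a pretrained one via HuggingFace transformers; vocab.txt refers to the file saved in the sketch above, and bert-base-uncased is just one example checkpoint:

# Minimal sketch, assuming the `tokenizers` and `transformers` packages;
# `vocab.txt` is the vocabulary file produced by the training sketch above.
from tokenizers import BertWordPieceTokenizer
from transformers import BertTokenizerFast

# Encode a sentence with the freshly trained tokenizer.
my_tokenizer = BertWordPieceTokenizer("vocab.txt", lowercase=True)
encoding = my_tokenizer.encode("Transformers tokenize text into subwords.")
print(encoding.tokens)   # subword tokens, wrapped in [CLS] ... [SEP]
print(encoding.ids)      # the matching vocabulary indices

# Or load a pretrained tokenizer (with its general-purpose vocabulary).
pretrained = BertTokenizerFast.from_pretrained("bert-base-uncased")
print(pretrained.tokenize("Transformers tokenize text into subwords."))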
Comments

This channel is a hidden gem. Thank you for putting in the work. I'm really eager to learn about this stuff.
