Language Model Evaluation and Perplexity

Transcript:

In this video, I'll show you how to evaluate a language model. The metric for this is called perplexity, and I will explain what it is. First, you'll divide the text corpus into training, validation, and test data. Then you will dive into the concept of perplexity, an important metric used to evaluate language models.

So, how can you tell how well your language model is performing? Recall from the previous videos that a language model assigns a probability to each sentence. The model was trained on the corpus, so for the training sentences it may assign very high probabilities. You should therefore first split the corpus so that you have some test and validation data that are not used for training. As you may have done in other machine learning projects, you'll create the following splits: training, validation, and test sets. The training set is used to train your model. The validation set is used for things like tuning hyperparameters. The test set is held out for the end, where you test the model once and get a score that reflects how well it performs on unseen data.
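To make the splitting step concrete, here is a minimal sketch in Python. The 80/10/10 ratio, the shuffling, and the function name are my own illustrative choices, not prescribed by the video:

import random

def train_validation_test_split(sentences, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle a list of sentences and split it into train/validation/test."""
    sentences = list(sentences)
    random.Random(seed).shuffle(sentences)          # fixed seed for reproducibility
    n = len(sentences)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = sentences[:n_train]
    validation = sentences[n_train:n_train + n_val]
    test = sentences[n_train + n_val:]              # remainder is held out for the end
    return train, validation, test

# Tiny toy corpus, one sentence per string (hypothetical data):
corpus = ["i like cats", "cats drink milk", "dogs chase cats", "milk is good"]
train, validation, test = train_validation_test_split(corpus)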
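And here is a sketch of how perplexity could be computed for a bigram model, using the definition from the video: the inverse probability of the test set, normalized by the number of words. The add-1 smoothing and the <s>/</s> boundary tokens are my additions to keep the sketch self-contained and avoid zero probabilities; the video does not specify this particular estimator:

import math
from collections import Counter

def bigram_perplexity(train_sentences, test_sentences, k=1.0):
    """Train an add-k smoothed bigram model and return its perplexity on the test set."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for sent in train_sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])                       # context counts
        bigrams.update(zip(tokens[:-1], tokens[1:]))       # (prev, word) counts
    V = len(vocab)

    log_prob, num_words = 0.0, 0
    for sent in test_sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, word in zip(tokens[:-1], tokens[1:]):
            # add-k smoothed conditional probability P(word | prev)
            p = (bigrams[(prev, word)] + k) / (unigrams[prev] + k * V)
            log_prob += math.log(p)
            num_words += 1                                 # normalize by word count
    # perplexity = exp(-(1/M) * total log probability), M = number of predicted words
    return math.exp(-log_prob / num_words)

print(bigram_perplexity(["i like cats", "cats like milk"], ["i like milk"]))

Lower perplexity is better, and the log perplexity mentioned in the chapter list below is simply the logarithm of this value.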
Comments
Author

00:00 - introduction and outline
00:24 - splitting the corpus
01:29 - splitting methods
01:53 - perplexity metric
03:23 - perplexity examples
04:32 - perplexity for bigram models
05:16 - log perplexity; typical values for log perplexity
05:50 - texts generated by models with different perplexity

ДаниилИмани

I believe there might be an issue with the perplexity formula. How can we refer to 'w' as the test set containing 'm' sentences, denoting 'm' as the number of sentences, and then immediately after state that 'm' represents the number of all words in the entire test set? This description lacks clarity and coherence. Could you please clarify this part to make it more understandable?
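For what it's worth, one consistent way to state the formula, using M for the total word count (my notation, not necessarily the video's): if the test set W consists of m sentences s_1, ..., s_m containing M words in total, then

\mathrm{PP}(W) = P(s_1, s_2, \ldots, s_m)^{-1/M}

so m counts sentences while the exponent normalizes by the word count M. Some presentations reuse the letter m for both quantities, which may be the source of the confusion.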

boussouarsari

What does "normalized by the number of words" in the definition of perplexity mean?

karangadgil

If I use the GPT-2 model to predict protein sequences, is perplexity enough to evaluate the model? Should I use regular perplexity or bigram perplexity?
