Lesson 8 - Deep Learning for Coders (2020)


We finish this course with a full lesson on natural language processing (NLP). Modern NLP depends heavily on *self-supervised learning*, and in particular the use of *language models*.

Pretrained language models are fine-tuned in order to benefit from transfer learning. Unlike in computer vision, fine-tuning in NLP can take advantage of an extra step: applying self-supervised learning to the target dataset itself.
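
As a rough sketch of that two-stage pipeline in fastai (not the lesson's exact notebook, and assuming the small IMDb sample dataset): first fine-tune a pretrained AWD_LSTM language model on the target texts themselves, which is self-supervised because the labels are just the next words, then reuse its encoder inside a classifier.

```python
from fastai.text.all import *

# Stage 1: fine-tune a pretrained language model on the target corpus itself.
# Self-supervised: the "labels" are simply the next word of each text.
path = untar_data(URLs.IMDB_SAMPLE)
dls_lm = TextDataLoaders.from_csv(path, 'texts.csv', text_col='text', is_lm=True)
learn_lm = language_model_learner(dls_lm, AWD_LSTM, metrics=accuracy)
learn_lm.fine_tune(1)
learn_lm.save_encoder('ft_enc')          # keep the fine-tuned encoder weights

# Stage 2: reuse that encoder in a classifier and fine-tune on the labelled data.
dls_clas = TextDataLoaders.from_csv(path, 'texts.csv', text_col='text',
                                     label_col='label', text_vocab=dls_lm.vocab)
learn_clas = text_classifier_learner(dls_clas, AWD_LSTM, metrics=accuracy)
learn_clas.load_encoder('ft_enc')
learn_clas.fine_tune(1)
```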

Before we can do any modeling with text data, we first have to tokenize and numericalize it. There are a number of approaches to tokenization, and which you choose will depend on your language and dataset.
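
A minimal sketch of those two steps using fastai's helpers, roughly as shown in the lesson; the sentence and `min_freq=1` are toy choices so this one-line "corpus" keeps every token in its vocabulary.

```python
from fastai.text.all import *

txt = "This movie was absolutely wonderful!"

# Tokenisation: wrap spaCy's word tokenizer in fastai's Tokenizer, which also adds
# special tokens such as xxbos (beginning of stream) and xxmaj (capitalised word).
tok = Tokenizer(WordTokenizer())
tokens = tok(txt)
print(tokens)    # e.g. ['xxbos', 'xxmaj', 'this', 'movie', 'was', 'absolutely', 'wonderful', '!']

# Numericalisation: map each token to its index in a vocabulary built from the corpus.
num = Numericalize(min_freq=1)               # min_freq=1 only because the toy corpus is one sentence
num.setup(L([tokens]))
ids = num(tokens)
print(ids)                                   # tensor of vocabulary indices
print(' '.join(num.vocab[int(i)] for i in ids))   # maps back to the tokens above
```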

NLP models use the same basic approach of *entity embedding* that we've seen before, except that for text data it's called *word embedding*. The method, however, is nearly identical.
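
For illustration, a word embedding is just a lookup table with one learned vector per token index, which is mathematically the same as multiplying a one-hot vector by a weight matrix, only computed much more efficiently.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 10_000, 64
emb = nn.Embedding(vocab_size, emb_dim)      # one learnable 64-dim vector per token

ids = torch.tensor([[2, 5, 9]])              # a batch holding one sequence of token indices
vectors = emb(ids)                           # shape: (1, 3, 64)

# The same computation written as one-hot vectors times a weight matrix;
# an embedding layer is just a faster way of indexing into that matrix.
one_hot = nn.functional.one_hot(ids, vocab_size).float()
assert torch.allclose(one_hot @ emb.weight, vectors)
```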

NLP models have to handle documents of varying sizes, so they require a somewhat different architecture, such as a *recurrent neural network* (RNN). It turns out that an RNN is basically just a regular deep net, which has been refactored using a loop.
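
A minimal sketch of that idea in PyTorch (hypothetical class name, along the lines of the simple models built later in the lesson): the same layers are reused at every position in the sequence, with a hidden state carried through the loop.

```python
import torch
import torch.nn as nn

class SimpleRNN(nn.Module):
    "A recurrent net is a deep net refactored into a loop: the same layers reused at every step."
    def __init__(self, vocab_sz, n_hidden):
        super().__init__()
        self.i_h = nn.Embedding(vocab_sz, n_hidden)   # input (token) -> hidden
        self.h_h = nn.Linear(n_hidden, n_hidden)      # hidden -> hidden, reused each step
        self.h_o = nn.Linear(n_hidden, vocab_sz)      # hidden -> output (next-token scores)

    def forward(self, x):                             # x: (batch, seq_len) of token ids
        h = torch.zeros(x.shape[0], self.h_h.out_features, device=x.device)
        for i in range(x.shape[1]):                   # the loop that replaces stacked layers
            h = h + self.i_h(x[:, i])
            h = torch.relu(self.h_h(h))
        return self.h_o(h)                            # predict the token that comes next

preds = SimpleRNN(vocab_sz=30, n_hidden=64)(torch.randint(0, 30, (8, 3)))
print(preds.shape)                                    # torch.Size([8, 30])
```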

However, simple RNNs suffer from exploding and vanishing gradients, so we have to use methods such as the LSTM cell to avoid these problems.
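
As a sketch of what swapping in an LSTM looks like, using PyTorch's built-in `nn.LSTM` rather than the hand-written cell from the lesson (the class name here is hypothetical):

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    "Same idea as the simple RNN above, but the recurrence is an LSTM with a gated cell state."
    def __init__(self, vocab_sz, n_hidden, n_layers=2):
        super().__init__()
        self.i_h = nn.Embedding(vocab_sz, n_hidden)
        self.rnn = nn.LSTM(n_hidden, n_hidden, n_layers, batch_first=True)
        self.h_o = nn.Linear(n_hidden, vocab_sz)

    def forward(self, x, state=None):
        out, state = self.rnn(self.i_h(x), state)     # out: (batch, seq_len, n_hidden)
        return self.h_o(out), state                   # a prediction for every position

logits, state = LSTMLanguageModel(vocab_sz=30, n_hidden=64)(torch.randint(0, 30, (8, 16)))
print(logits.shape)                                   # torch.Size([8, 16, 30])
```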

Finally, we look at some tricks to improve the results of our NLP models: additional regularization approaches, including various types of *dropout* and activation regularization, as well as weight tying.
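
A simplified sketch of what these tricks look like in code (hypothetical class name, loosely modelled on the AWD-LSTM discussed in the lesson): dropout on the recurrent outputs, weight tying between the input embedding and the output layer, and the AR/TAR penalties added to the loss.

```python
import torch
import torch.nn as nn

class RegularisedLM(nn.Module):
    "Dropout on the recurrent outputs, plus weight tying between the embedding and the output layer."
    def __init__(self, vocab_sz, n_hidden, p=0.4):
        super().__init__()
        self.i_h  = nn.Embedding(vocab_sz, n_hidden)
        self.rnn  = nn.LSTM(n_hidden, n_hidden, num_layers=2, batch_first=True)
        self.drop = nn.Dropout(p)                 # randomly zeroes activations during training
        self.h_o  = nn.Linear(n_hidden, vocab_sz)
        self.h_o.weight = self.i_h.weight         # weight tying: one matrix shared both ways

    def forward(self, x):
        raw, _ = self.rnn(self.i_h(x))
        out = self.drop(raw)
        return self.h_o(out), raw, out            # raw and dropped outputs, kept for AR/TAR

# Activation regularisation (AR) and temporal AR (TAR) are extra penalties on the loss:
#   loss += alpha * out.pow(2).mean()                            # keep activations small
#   loss += beta  * (raw[:, 1:] - raw[:, :-1]).pow(2).mean()     # keep them changing smoothly
```
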
Comments

00:00:00 - Intro and NLP Review
00:01:31 - Language models for NLP
00:04:36 - Review of text classifier in Lesson 1
00:05:08 - Improving results with a domain-specific language model
00:05:58 - Language model from scratch
00:10:27 - Tokenisation
00:12:19 - Word tokeniser
00:17:38 - Subword tokeniser
00:21:21 - Question: how can we determine if pre-trained model is suitable for downstream task?
00:23:25 - Numericalization
00:25:43 - Creating batches for language model
00:29:24 - LMDataLoader
00:31:07 - Creating language model data with DataBlock
00:33:23 - Fine-tuning a language model
00:35:07 - Saving and loading models
00:36:44 - Question: Can language models learn meaning?
00:37:56 - Text generation with language model
00:39:51 - Creating classification model
00:41:04 - Question: Is stemming and lemmatisation still used in practice?
00:42:21 - Handling different sequence lengths
00:45:30 - Fine-tuning classifier
00:48:54 - Questions
00:51:52 - Ethics and risks associated with text generation language models
00:56:22 - Language model from scratch
00:56:52 - Question: are there model interpretability tools for language models?
00:58:11 - Preparing the dataset for RNN: tokenisation and numericalization
01:03:35 - Defining a simple language model
01:04:49 - Question: can you speed up fine-tuning the NLP model?
01:05:44 - Simple language model continued
01:14:41 - Recurrent neural networks (RNN)
01:18:39 - Improving our RNN
01:19:41 - Backpropagation through time
01:22:19 - Ordered sequences and callbacks
01:25:00 - Creating more signal for model
01:28:29 - Multilayer RNN
01:32:39 - Exploding and vanishing gradients
01:36:29 - LSTM
01:40:00 - Questions
01:42:23 - Regularisation using Dropout
01:47:16 - AR and TAR regularisation
01:49:09 - Weight tying
01:51:00 - TextLearner
01:52:48 - Conclusion

lextmb

Big thanks to Jeremy, Rachel, Sylvain, and Alexis for creating such a well-made book/video series that introduces deep learning to those with a coding background! You all make the content so interesting and straightforward; I'm excited to learn more!

davidbyron

To the whole FastAI team, thank you so much for creating such quality material. I love how practical the course's approach of "playing the game first" is. I will adopt this in my future learning.

Bestietvcute

Thanks Jeremy, Rachel, Sylvain, and Alexis. Brilliantly put together.

snowwhitei

I can't believe I actually made it to the end! It took me a year but I made it! Thanks so much, fastai team. Great course.

jmac

Amazing lesson :) A perfect introduction and dive into NLP; I've wanted to get started with it for a while.

johanneslaute

Love the content! When will the second half of the course be held?

daviddeng

Awesome stuff, thanks Jeremy! All in all, this course is amazing, but bear in mind that it takes more self-discipline and engagement than Coursera's course.
I'd also say that Coursera is better suited to high-school kids and students. But for anybody switching careers who is already experienced, I'll be recommending this course from now on. Great teaching methodology - it resonated perfectly with how I learn. Lots of examples, contextual and top-down.

TheAIEpiphany

Amazing course. Thanks for all the thoughtful content and looking forward to part 2!

SP-dszw

Mistake regarding 41:00 - stemming is not something that removes the stem; it's a technique that reduces a word to its base form by removing suffixes and prefixes.

mindasb

Will there be a 2020 version of the image segmentation lesson this year?

cullenharris

Thanks for this great course! What would be covered in Part 2?

LiangyueLi

Correction (11:00): in Polish, words are generally not glued together into one, at least not to the same extreme as in Dutch or German. Please compare:

English: electricity production company
Polish: firma produkująca energię elektryczną
vs.
Dutch:
German: Stromerzeugungsunternehmen

I don't know about Turkish, but from what I googled it also seems like a wrong example.

adrianstaniec

The language model has seen and consumed the text of the test set (and also the validation set) during training. Can that be a factor affecting the accuracy of the movie review classifier later on? Shouldn't the language model be trained only on the text of the train split?

bikashg

A good course to learn the basics of ML.🍠🌍🌝🎰🥇

dsm

I don't really think you've beaten what you got in lesson one. Validation loss being lower than training loss basically means that your validation set and training set are either too similar to each other or not distributed in a similar way. That's a data leak which results in overfitting, and this model would 100% perform worse in production.

michapodlaszuk