Learn BERT - Google's most powerful NLP framework

What is BERT?

BERT has been one of the most significant breakthroughs in NLP.
But what is it? And why is it such a big deal?

Let’s start at the beginning. BERT stands for Bidirectional Encoder Representations from Transformers. Still none the wiser?

Let’s simplify it.

BERT is a deep learning framework, developed by Google, that can be applied to NLP.

Bidirectional (B)

This means that the BERT framework learns information from both the left and the right side of a word (or token, in NLP parlance). This makes it better at understanding context.

For example, consider these two sentences:

Jimmy sat down in an armchair to read his favorite magazine.

Jimmy took a magazine and loaded it into his assault rifle.

Same word, two meanings - also known as a homonym. Because BERT is bidirectional, it takes in both the left-hand and right-hand context in each of these sentences. This allows the framework to predict a token from its context more accurately, and vice versa.
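To make this concrete, here is a minimal sketch of bidirectional prediction in action, assuming the Hugging Face transformers library (with PyTorch) and the publicly available bert-base-uncased checkpoint. BERT's masked-language-model head fills in a hidden token using the words on both sides of it.

```python
# Minimal sketch: BERT predicting a masked token from context on both sides.
# Assumes the Hugging Face "transformers" package and PyTorch are installed.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Only the surrounding context differs, so the top predictions should differ too.
reading = "Jimmy sat down in an armchair to read his favorite [MASK]."
loading = "Jimmy took a [MASK] and loaded it into his assault rifle."

print(fill_mask(reading)[0]["token_str"])  # likely "book" or "magazine"
print(fill_mask(loading)[0]["token_str"])  # likely "gun", "rifle" or "magazine"
```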

Encoder Representations (ER)

This refers to an encoder, which is a program or algorithm used to learn a representation from a set of data. In BERT's case, the training data is vast, drawing on both English Wikipedia (2,500 million words) and the BooksCorpus (800 million words).

The vast number of words used in the pretraining phase means that BERT has developed an intricate understanding of how language works, making it a highly useful tool in NLP.
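As a rough illustration of what an "encoder representation" is, the sketch below (again assuming the Hugging Face transformers library and PyTorch) pulls out the vector BERT assigns to the word "magazine" in each of the two example sentences. Because the surrounding context differs, the two vectors differ as well.

```python
# Minimal sketch: extracting BERT's contextual representation of one token.
# Assumes the Hugging Face "transformers" package and PyTorch are installed.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def magazine_vector(sentence: str) -> torch.Tensor:
    """Return the encoder's hidden state for the token 'magazine'."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]        # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index("magazine")]

v_read = magazine_vector("Jimmy sat down in an armchair to read his favorite magazine.")
v_ammo = magazine_vector("Jimmy took a magazine and loaded it into his assault rifle.")

# A similarity noticeably below 1.0 shows the representation is context-dependent.
print(torch.nn.functional.cosine_similarity(v_read, v_ammo, dim=0).item())
```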

Transformer (T)

This means that BERT is based on the Transformer architecture. We’ll discuss this in more detail in the next section.

Why is BERT so revolutionary?

Not only has the framework been pre-trained on one of the largest text corpora ever used for this purpose, it is also remarkably easy to adapt to different NLP applications by adding additional output layers. This allows users to create sophisticated and precise models to carry out a wide variety of NLP tasks.
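As a rough sketch of what "adding an output layer" means in practice (assuming the Hugging Face transformers library), the snippet below stacks a small, randomly initialised classification head on top of the pre-trained encoder. The two-label setup is a hypothetical binary task such as sentiment classification.

```python
# Minimal sketch: the pre-trained BERT encoder plus a task-specific output layer.
# Assumes the Hugging Face "transformers" package is installed; num_labels=2 is
# a hypothetical binary classification task (e.g. positive vs. negative sentiment).
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,
)
```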

BERT continues the work started by word embedding models such as Word2vec and generative models, but takes a different approach.

There are 2 main steps involved in the BERT approach:

1. Create a language model by pre-training it on a very large text data set.
2. Fine-tune this large, general-purpose model for specific NLP applications (a small sketch of this step follows the list). This lets users benefit from the vast knowledge the model has accumulated during pre-training, without needing the enormous computing power that pre-training itself requires.
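A minimal sketch of step 2, assuming PyTorch, the Hugging Face transformers library and a hypothetical toy data set, might look like the following. The point is that the pre-trained weights are only nudged towards the task, not learned from scratch.

```python
# Minimal sketch: fine-tuning the classification head shown earlier.
# Assumes the Hugging Face "transformers" package and PyTorch are installed;
# the two labelled examples are hypothetical placeholders for a real data set.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

examples = [("A wonderful, heartfelt film.", 1), ("Dull plot and wooden acting.", 0)]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for text, label in examples:          # in practice: several epochs over mini-batches
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=torch.tensor([label])).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```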