Neural networks [10.7] : Natural language processing - hierarchical output layer

Comments

I have two questions:
1. How is the word clustering done? What features were used to cluster the words? Was it based on the original tree [randomly assigned words], with the words then clustered on some word properties and the tree updated recursively?
2. With the hierarchical representation of the output layer we gain performance if we know which words we need probabilities for. However, for a task like next-word prediction we would need probabilities for all words to identify the best candidates, in which case we would still be computing over the whole vocabulary. In what cases are the words whose probabilities are needed known beforehand?

ShubhanshuMishra
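A minimal sketch of the computation behind question 2, assuming a binary tree where each word is a leaf reached by a known sequence of left/right decisions; the names sigmoid, path_nodes, path_directions and node_vectors are illustrative, not from the lecture:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def word_probability(hidden, path_nodes, path_directions, node_vectors):
    """Probability of a single word = product of the binary decisions on its path.

    hidden          : context representation h, shape (d,)
    path_nodes      : indices of the internal nodes from the root to the word's leaf
    path_directions : +1 for "go left", -1 for "go right" at each of those nodes
    node_vectors    : one weight vector per internal node, shape (V - 1, d)
    """
    prob = 1.0
    for node, direction in zip(path_nodes, path_directions):
        prob *= sigmoid(direction * node_vectors[node].dot(hidden))
    return prob

Only the roughly log2(V) nodes on the requested word's path are touched, so the gain shows up whenever the word to score is known in advance, e.g. scoring the observed next word during training or reranking a short list of candidate words, rather than ranking the whole vocabulary.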

Hi, great video! Seriously, hierarchical softmax is explained in various places, but this finally helped me grasp the intuition behind it. So thanks for that.

I still had a few questions:

1. As you mention, this has no value if we need the probabilities for all words in our vocabulary. However, while training with backpropagation, isn't that exactly the case (if we don't use negative sampling)? In my case I'm doing word2vec (so I'm only interested in the hidden-layer representations, not so much in actually using the language model), so I figured speeding things up is only relevant for me during the training phase. So I'd say this technique wouldn't help me unless I start using negative sampling?

2. You mention that using WordNet gives a ~258x speed-up but decreases performance. What's the difference between speed and performance here? Is it about training versus actually using the model?

3. I don't quite understand how different trees (random versus WordNet-based versus learned) can lead to different performance. In each case the length of the path through the tree will be about log2(vocabulary size), right? So why does the performance vary?

jasperdriessens
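On question 1, a worked comparison of the per-example training loss may help (a sketch in standard hierarchical-softmax notation, where h is the context representation, u the output/node vectors, and d_n in {-1, +1} the branching sign at node n on word w's path):

\[
-\log p(w \mid c) \;=\; -\,\mathbf{u}_w^\top \mathbf{h} \;+\; \log \sum_{w'=1}^{|V|} \exp\!\big(\mathbf{u}_{w'}^\top \mathbf{h}\big)
\qquad \text{(full softmax, } O(|V|) \text{ per example)}
\]

\[
-\log p(w \mid c) \;=\; -\sum_{n \,\in\, \mathrm{path}(w)} \log \sigma\!\big(d_n\, \mathbf{u}_n^\top \mathbf{h}\big)
\qquad \text{(hierarchical, } O(\log_2 |V|) \text{ per example)}
\]

Because the training loss only needs the probability of the observed word, the speed-up applies to backpropagation as well, independently of negative sampling: the gradient is non-zero only for the node vectors on that word's path.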

How do we know the correct path from the root node to the leaf node, so that we can calculate the probabilities?

danishrathore

Isn't it a bit more accurate to define the estimated probability as p(context | "cat") instead of p("cat" | context) on slide 3, since the skip-gram task is defined as p(context | target_word)?
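For reference, the skip-gram objective being referred to factorizes over context positions (standard word2vec notation, not the slide's):

\[
p(\text{context} \mid w_t) \;=\; \prod_{\substack{-c \,\le\, j \,\le\, c \\ j \neq 0}} p\big(w_{t+j} \mid w_t\big),
\]

so each factor is still a single "one word given one word" probability, and it is these per-position factors that the (hierarchical) output layer estimates.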


Hugo, I have one question: how is this tree constructed, and how is it linked to the neural network model?

maheshkannan

Questions:
1. Does this slow down prediction runtime from n to n log n?
2. Why would a randomly generated tree be suboptimal?
3. In the third approach, wouldn't the tree still be suboptimal if the original word vectors are not fully trained?

ThomasChen-urgt
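On question 1: if every leaf is scored independently, the full distribution costs roughly V*log2(V) operations, but a single top-down pass that reuses each internal node's sigmoid brings it back to O(V), the same order as a flat softmax. A minimal sketch, with an assumed array encoding of the tree (left, right and node_vectors are illustrative, and node 0 is assumed to be the root):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def full_distribution(hidden, left, right, node_vectors, num_words):
    """All leaf probabilities in one top-down pass over the binary tree.

    left[n] / right[n] : child indices of internal node n; negative values
                         encode leaves, with word id = -child - 1.
    Each internal node's sigmoid is computed exactly once, so the whole
    distribution costs O(V) rather than O(V log V).
    """
    probs = np.zeros(num_words)
    stack = [(0, 1.0)]                      # (node index, probability mass reaching it)
    while stack:
        node, mass = stack.pop()
        p_left = sigmoid(node_vectors[node].dot(hidden))
        for child, p in ((left[node], mass * p_left), (right[node], mass * (1.0 - p_left))):
            if child < 0:
                probs[-child - 1] = p       # leaf: this word's probability
            else:
                stack.append((child, p))    # internal node: keep descending
    return probs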

Thank you for the video. I wonder what the internal node vectors look like. Are they just initialized with some random weights? For example, in skip-gram, if I input the center word and I know the 2 context words, what do the internal node vectors look like?

iidtxbc
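A sketch of what word2vec-style implementations typically do before any training: the word embeddings get small random values and the internal node vectors start at zero (a small random initialization also works). At that point the node vectors carry no meaning at all; they only become informative through the gradient updates along each context word's path.

import numpy as np

rng = np.random.default_rng(0)
embedding_dim, vocab_size = 100, 250_000

# input (word) embeddings: small random values, as in the reference word2vec code
word_vectors = ((rng.random((vocab_size, embedding_dim)) - 0.5) / embedding_dim).astype(np.float32)
# one vector per internal node of the binary tree, initialized to zero
node_vectors = np.zeros((vocab_size - 1, embedding_dim), dtype=np.float32)

For a skip-gram step with one center word and 2 context words, the hidden vector is just word_vectors[center], and only the node vectors on each context word's path (plus that one embedding row) receive updates.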

I have a question:
What is a shortlist, and why do we use it?

chen-ning

Hello dear Hugo, thanks for your course.
I have some questions to help me understand the hierarchical softmax better.
- In a typical softmax, at each step we update the parameters for the correct class and also for the incorrect classes. You said that calculating the softmax is really expensive (e.g. for 250,000 classes), so we are supposed to approximate the probability. If we use a binary tree to do the estimation, then we only update the parameters for the correct class and not for the incorrect classes, is that right?

- And another tiny question: if we have, for example, window_size=5, then the input shape will be (5*250,000, 1) and the output will have a shape of (250,000, 1), is that right?

mahdiamrollahi

Are there any tutorials from people implementing this?

tamimazmain

Hugo,
1. How is the loss function calculated with a hierarchical output layer? In the softmax case, the number of output units equals the vocabulary size, so the target value of each unit would be the probability of that word occurring in that context (which we can estimate from counts in the training data). What would the target probability of each output unit (i.e. node in the tree) be here?

2. For each context, will the number of output units that light up be equal to the depth d of the tree? Does this mean that the total number of output units (the ones that light up plus the ones we don't calculate for that context) equals 2^(d+1) − 1?

gilsinialopez
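A quick count for question 2, assuming a full binary tree of depth d with one word per leaf (so the vocabulary size is 2^d):

\[
\underbrace{2^{d}}_{\text{leaves (words)}} \;+\; \underbrace{2^{d}-1}_{\text{internal nodes}} \;=\; 2^{d+1}-1 \ \text{nodes in total,}
\]

but for a given context only the d internal nodes on the observed word's path are evaluated. Each of them has a binary target (branch left or right), so the loss is a sum of d logistic cross-entropy terms rather than a 2^d-way cross-entropy over the whole vocabulary.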

Great video!! Do you have any pointers on the multi-way branching hierarchical structure you mentioned at the end of the video?

allancici

Great lecture!! I had a simple question.
Could you explain why different tree architectures (WordNet, random, ...) lead to different performance? Thanks in advance!!

user