One-Hot Encoding

Comments
Author

By size, he means magnitude. Multiplying all inputs by 10 would 'scatter' them across a much larger range (think huge standard deviations), so the probabilities returned by softmax would tend towards the extremes, close to 0 or 1. Dividing by 10 brings the inputs very close to each other and to their mean, which gives you softmax probabilities that are close to uniform.

kewlking
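
The scaling effect described in the comment above can be checked with a small sketch. The score values here are hypothetical, chosen only for illustration, and `softmax` is a plain-Python helper written for this example, not code from the video.

```python
import math

def softmax(scores):
    """Return softmax probabilities for a list of scores."""
    # Subtracting the max keeps exp() from overflowing; the result is unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [3.0, 1.0, 0.2]  # hypothetical scores for three classes

base = softmax(scores)
scaled_up = softmax([s * 10 for s in scores])    # scores pushed far apart
scaled_down = softmax([s / 10 for s in scores])  # scores pulled close together

print("original:     ", [round(p, 3) for p in base])
print("times 10:     ", [round(p, 3) for p in scaled_up])
print("divided by 10:", [round(p, 3) for p in scaled_down])
```

Multiplying by 10 should push almost all of the probability onto the largest score, while dividing by 10 should leave the three probabilities close to one another. In both cases the output has one entry per class, so only the magnitude of the scores changes, not the number of inputs.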
Author

"Size of input"? We just multiplied the score values not size. Size would remain the same as classes, right?

ahsanshafiqchaudhry