Anti-Learning (So Bad, it's Good) - Computerphile

How getting something completely wrong can actually help you out. Professor Uwe Aickelin explains anti-learning.

This video was filmed and edited by Sean Riley.

Comments

Reminds me of a joke a business school professor once told: "If you find a financial advisor who consistently gets it wrong, hire him! Do exactly the opposite of what he says and you will get rich. The dangerous ones are the ones who are 50% right and 50% wrong."

ffanatic

Doesn't actually explain anti-learning. Explains how they thought of anti-learning, sure, but the closest we got to an actual explanation was "and then I flip it and it works"

robmckennie

You forgot to talk about anti-learning...

ZachBora

This is the plot to every Star Trek NG episode. "Mr. Crusher, reverse the shield polarity." "It's working!"

lmiddleman

As many others have pointed out in the comments, this video is quite uninformative as to what "Anti-Learning" is actually supposed to be. So I've looked at a couple of the papers of Aickelin and Roadknight and even though those are not particularly clear on what they're doing either, it seems like things are being misrepresented here.

So first of all, what they call "Anti-Learning" is not the reversal of the classifier's predictions, but the phenomenon that a classifier achieves less-than-chance performance on the test set (or sets, as they are cross-validating). They claim that, rather than being a sign of overfitting to the training data, this situation arises because the structure of the population is such that many dissimilar cases are summarized under a single label, so the population is bound to be misrepresented in the sample. It is similar to an XOR situation in which one of the four combinations (00, 01, 10, 11) is missing from the sample, which necessarily leads to incorrect classification of new data points that show the missing combination.
They substantiate their distinction between anti-learning and overfitting by showing that, for a learnable data set, a neural network's performance on both the training AND the test set increases as the model becomes more flexible (more hidden nodes), but only up to the point where the test performance starts decreasing again (as the model starts overfitting), while for a data set that results in anti-learning, the test performance stays below 50% throughout, despite increasing training-set performance. (And for a random data set the performance on the test set stays around 50%.)
(I don't really find this convincing. The classifier picks up on distinctions suggested by the sample that don't hold up in the general population; that's overfitting to me, and this is just a particular case where the sample is systematically unrepresentative.)

The classifiers they tried were not only linear classifiers: they used Bayesian networks, logistic regression, naive Bayes, classification-tree approaches, MLP and SVM (though I don't understand why this list doesn't include KNN), all of which performed poorly when trained on sets of 35, 45 or 55 features. As stages I and IV could be classified reliably, their analyses focus on distinguishing stages II and III, so it's a binary classification problem. This is relevant, because their trick of flipping the classification wouldn't work otherwise: reversing the predictions only makes sense when the reverse is actually specified (0 instead of 1 and vice versa), which it wouldn't be were there four classes.

Lastly, I don't see where the 78% accuracy he reports comes from. In their 2015 paper, all I see is the accuracy they get when they use an ensemble of different classifiers (half of them are trained to perform well, while the other half is trained to perform terribly and gets reversed), and - and this is what he really should have mentioned if that's what he's talking about - they only get this higher accuracy in a subset of the cases, namely those where the respective ensemble agrees. So they get the highest accuracy (~90%) for those cases where all six classifiers give the same label, but that is also a very small subset of the sample (29 data points).

Kuchenrolle
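
To make the XOR point in the comment above concrete, here is a minimal sketch of the missing-combination effect. It is my own toy illustration, not code from the Aickelin and Roadknight papers, and it assumes numpy and scikit-learn: the (1,1) combination never appears in the training sample, so a linear classifier is confidently wrong on it at test time, and flipping the binary 0/1 prediction repairs the accuracy.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_cluster(center, label, n=50):
    # Small Gaussian blob of points around one XOR corner, all with one label.
    X = np.asarray(center) + 0.05 * rng.standard_normal((n, 2))
    return X, np.full(n, label)

# XOR truth table: (0,0)->0, (0,1)->1, (1,0)->1, (1,1)->0.
# The training data contains only three of the four corners.
parts = [make_cluster((0, 0), 0), make_cluster((0, 1), 1), make_cluster((1, 0), 1)]
X_train = np.vstack([X for X, _ in parts])
y_train = np.concatenate([y for _, y in parts])

# The fourth corner, (1,1) with label 0, only shows up at test time.
X_test, y_test = make_cluster((1, 1), 0)

clf = LogisticRegression().fit(X_train, y_train)
accuracy = np.mean(clf.predict(X_test) == y_test)
print("test accuracy:", accuracy)          # far below chance (close to 0)
print("flipped accuracy:", 1 - accuracy)   # reversing the 0/1 prediction fixes it

The flip only helps because the model is wrong systematically rather than randomly, which is exactly the below-chance behaviour the comment describes.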

Where's the part where he explains the method?
It's like having a book made of an introduction only :/

skit

The argument at the ending was so Bad, it was Good.

RSP

I expected an explanation of anti-learning, and all I got was “Just put the hay in the apple and eat the candle”.

Qbe_Root

... that was such an odd video, even as someone who has a fairly good grasp on XOR and logic and such, I'm left pretty confused by the whole thing. He took the wrong answer and he just reversed it? what?

atomheartother

Ugh. Video should be called "Computer science professor rediscovers nonlinear statistics and gets disproportionately excited about it."

jasondalton

So I work in this field... here's the gist of anti-learning using a football analogy. That's 'American football' to our UK friends.

Imagine that you have a friend, let's call him Alan, who grew up in New York and was a huge Bills fan. Your friend never played football in school, doesn't know anything about football strategy, and growing up he only watched games that the Bills were in. Your friend (slowly) notices that the Bills have a great running back, and that when they play against a team with a weak run defense, the Bills usually win. When they play against a team with a good run defense, they usually lose. Using this simple rule, he is able to correctly predict the outcome of Bills games 80% of the time. This is the modeling equivalent of getting a high accuracy score on the 'training' data set.

Your friend Alan then moves to Massachusetts and starts watching Patriots games. He applies the same theory about running backs and run defenses of opposing teams to try and predict the outcome of Patriots games. Because the Patriots rely heavily on the passing game, your friend's predictions are certifiably terrible, and he is only able to predict the outcomes of Patriots games correctly 30% of the time. This is the modeling equivalent of getting a low accuracy score on the 'test' (out of sample) data set.

Because of his incomplete picture of football dynamics, you notice that your friend has become very good at incorrectly predicting the winner of any football game in the NFL, because today the NFL is predominantly a passing league. In fact, for any given football game, your friend is wrong 80% of the time. Still, your friend Alan clings to his theory about running backs and run defense.

Now, it's the Superbowl, and you call up your friend and ask him his opinion of who is going to win. He gives you his opinion, and you bet on the other team.

That is the gist of anti-learning. In machine learning, anti-learning just means that the model is drawing incorrect conclusions from the data, for whatever reason (noisy and incomplete data, small sample size, over-fitting, etc.). By using anti-learning, you are trying to maximize the rate at which the model draws these wrong conclusions, so as to get the worst model possible. Then you simply invert the outcome, and voila!--you have your prediction.

erikm
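
If it helps to see the "then you simply invert the outcome" step from the comment above as code, here is a toy sketch. It is my own illustration rather than the setup used in the video or the papers, and it assumes a fitted scikit-learn-style binary 0/1 classifier plus a held-out validation set:

import numpy as np

class InvertedClassifier:
    # Wraps a fitted binary (0/1) classifier and reverses every prediction.
    def __init__(self, base):
        self.base = base

    def predict(self, X):
        return 1 - self.base.predict(X)

def maybe_invert(clf, X_val, y_val):
    # If the classifier is reliably worse than chance on held-out data,
    # return the inverted version; otherwise keep the original.
    accuracy = np.mean(clf.predict(X_val) == y_val)
    return InvertedClassifier(clf) if accuracy < 0.5 else clf

Note that this only makes sense with two classes; with more than two, "the opposite prediction" is not defined, which is the point the longer comment above makes about the stage II vs III problem being binary.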

Basically the machine was so bad at guessing that they had to start asking 'what don't you think it is?'

MrZebth

My understanding is that in this unique case it is actually quite easy to make a function that plots the wrong answers. When you check data in this case, you are checking to see whether it does not fit the function.

jeremyluna

So kinda like the choice between doors on Numberphile, where you reject the first door you pick and pick a different one.

GingerJack.

This is simple. Data doesn't always fit a particular model (categorization), so by process of elimination you find which model/relationship remains that best fits your data. Anti-learning is essentially knowing you'll get an answer wrong, but using that answer as part of an overall strategy towards finding the solution.

dbsirius

I was trying to understand how the XOR example applies to what he's saying, and I don't really get it.

If you can draw a line that separates the goods and the bads, then you've solved it: anything above the line is good, below is bad. But you can't do that for XOR. So, he drew a line through the middle. This solution is "wrong" in that it's not a good solution, but not in that it's always wrong. If it were always wrong, you could do what he said. It seems like what he's done, based on that visual, is found a "solution" that is right 50% of the time and wrong 50% of the time. So, reversing it is the same as not reversing it. That's how I understood his example to apply to what he was saying. Is this not correct?

joealias
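
A quick numeric check of the 50/50 point raised above (my own toy example, not from the video): a horizontal decision boundary on the full XOR truth table is right on exactly two of the four cases, so reversing it changes nothing; reversing only pays off when accuracy is strictly below chance, as in the missing-combination sketch further up.

# Horizontal-line rule on the full XOR truth table: predict 1 iff the
# second coordinate is >= 0.5.
points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def rule(p):
    return 1 if p[1] >= 0.5 else 0

accuracy = sum(rule(p) == label for p, label in points) / len(points)
print(accuracy, 1 - accuracy)   # 0.5 0.5: flipped or not, it stays at chance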

Wait, you're a professor in data science/machine learning, and you look at this data for 2 years and never even think to check whether it's linearly separable? wth?

Xylankant

How do they "reverse" it? The REVERSAL of XOR is XOR.

miri

Am I the only one who shudders at the waste of paper? He's only drawing on every other sheet, and his examples are simple enough that you could fit many of them onto a single one.

KennethSorling

I might just be being dumb, but I don't really understand what he meant by reversing the wrong answer. If the wrong answer is a horizontal line, what's the reverse of that?

ButzPunk