Machine Learning course - Shai Ben-David: Lecture 4

CS 485/685, University of Waterloo. Jan 16, 2015.
Extensions of the definition of PAC learnability to various more realistic scenarios. The basic notion of representative samples
and how it implies the success of ERM learners (a.k.a. learning by uniform convergence).
Comments

11:32 Weaker notion of learner's success
16:57 Agnostic PAC
23:52 Bigger vs. Smaller class to learn
28:32 General learning setup (other learning tasks)
35:45 More general setup for learning
43:20 Binary & Multilabel prediction
46:30 Regression
49:31 Representing data by k code words (k-means clustering)
56:36 Learning via Uniform Convergence
57:56 ε-representative sample ("good" sample)
1:04:52 ε-representative sample => ERM_H is a good learning strategy (Claim)

dhanajit

I like it when instructors use the blackboard to present lectures rather than predominantly reading off slides; it gives insight into how they are thinking, which is extremely valuable. Thanks, Prof.

satyajit

The assumption in PAC learning that there exists a perfect h* equal to f (the realizability assumption) manifests in that we only consider those h's with L_S(h) = 0, since f itself satisfies L_S(f) = 0, so zero empirical loss is the minimum requirement. Out of these, we compute the probability that a "bad" h (one with true error greater than ε) still achieves zero empirical loss, which is at most (1 - ε)^m; the union bound turns this into a sum over the bad hypotheses, whose number is at most |H|. Hence the probability of this error is at most
|H| (1 - ε)^m.
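
For reference, here is the chain of bounds behind this comment, written out in the lecture's L_S / L_D notation (H_B, for the set of "bad" hypotheses with L_D(h) > ε, is my shorthand, not necessarily the lecture's):

P[some h \in H_B has L_S(h) = 0]
  <= \sum_{h \in H_B} P[L_S(h) = 0]      (union bound)
  <= \sum_{h \in H_B} (1 - ε)^m          (each i.i.d. example agrees with a bad h with probability at most 1 - ε)
  <= |H_B| (1 - ε)^m
  <= |H| (1 - ε)^m  <=  |H| e^{-εm}.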

thesleepyhead

1:14:11 I don't think it's true the way it's written. The h that minimizes L_S(h) is not necessarily the same as the one that minimizes L_D(h), so we can't necessarily apply ε-representativeness there. The cleanest way is to let h* be the minimizer of the true loss L_D(h) and use that already in the third inequality (rather than the minimizer of L_S(h)).
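
For completeness, the intended chain, with h* := argmin_{h \in H} L_D(h) (the minimizer of the true loss, not of the empirical loss) and S assumed ε-representative:

L_D(h_S) <= L_S(h_S) + ε          (ε-representativeness applied to h_S)
         <= L_S(h*) + ε           (h_S is an ERM over H, so L_S(h_S) <= L_S(h*))
         <= (L_D(h*) + ε) + ε     (ε-representativeness applied to h*)
         =  min_{h \in H} L_D(h) + 2ε.

With an ε/2-representative sample, the same chain gives the usual min_{h \in H} L_D(h) + ε bound.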

samlaf

In 26:25, he mentions that the learning algorithm A returns any function from X to {0, 1}. However, in the book definition (3.4), it is stated that A generates a hypothesis within H. Intuitively, I would think that the latter is the correct one, since we're bounding it to that specific class of algorithms wrt to its minimum. Otherwise, I cannot see why couldn't we derive any hypothesis that would do better than the best among a certain class.
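
For context, the agnostic PAC guarantee itself only constrains the quality of the output, not (necessarily) its membership in H: with probability at least 1 - δ over the sample,

L_D(A(S)) <= min_{h \in H} L_D(h) + ε.

Whether A(S) is additionally required to lie in H (proper learning) or may be an arbitrary function from X to {0, 1} (improper learning) is exactly the distinction this comment raises; either way, the benchmark on the right-hand side is the best predictor in H.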

ArturDeluca

Drinking game: Take a shot every time the prof says 'what?'

martonkardos

Do we know all of H or do we know the best h among H?

satyajit

I have a question about the section "ε-representative sample => ERM_H is a good learning strategy (Claim)".
In the step L_S(h_S) + ε <= min_{h \in H} L_S(h) + ε: since h_S is the output of ERM and h_S \in H, it attains the smallest loss on the training set S.
So min_{h \in H} L_S(h) is just L_S(h_S), and that step actually holds with equality, right?
Or am I getting something wrong? Please tell me, thanks.
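
A minimal Python sketch of this point (a toy threshold class of my own, not from the course; the names thresholds, L_S, and h_S_index are hypothetical), just to illustrate that the ERM output satisfies L_S(h_S) = min_{h \in H} L_S(h) by construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy domain X = [0, 1]; labels come from a threshold function plus 10% label noise.
# Finite hypothesis class H = {h_t : h_t(x) = 1[x >= t]} for 21 thresholds t.
thresholds = np.linspace(0.0, 1.0, 21)
true_t = 0.3

def h(t, x):
    return (x >= t).astype(int)

# Draw a training sample S of size m from D.
m = 200
X = rng.uniform(0.0, 1.0, size=m)
y = h(true_t, X)
flip = rng.uniform(size=m) < 0.1
y = np.where(flip, 1 - y, y)

# Empirical (0-1) loss L_S(h_t) for each hypothesis in H.
def L_S(t):
    return np.mean(h(t, X) != y)

losses = np.array([L_S(t) for t in thresholds])

# ERM_H returns a hypothesis h_S minimizing the empirical loss over H,
# so L_S(h_S) equals min_{h in H} L_S(h) by definition.
h_S_index = int(np.argmin(losses))
print("h_S threshold :", thresholds[h_S_index])
print("L_S(h_S)      :", losses[h_S_index])
print("min_h L_S(h)  :", losses.min())  # same value as L_S(h_S)
```

So yes: by the definition of ERM, h_S attains the minimum of L_S over H, and that step in the claim holds with equality.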

shaoe

Shouldn't it be "S is ε-representative of D w.r.t. H"? S should represent D for some chosen H.
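
For reference, the definition as usually stated: a sample S is ε-representative (w.r.t. the hypothesis class H, the loss, and the distribution D) if for every h \in H,

|L_S(h) - L_D(h)| <= ε.

So the dependence on the chosen H is built into the "for every h \in H" quantifier, which matches the point of this comment.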

ramchebrolu