Multiclass Learnability and the ERM Principle
Amit Daniely, Sivan Sabato, Shai Ben-David, Shai Shalev-Shwartz
The Hebrew University of Jerusalem
COLT 2011

The Fundamental Theorem of Learning Theory

For binary classification, three properties of a hypothesis class coincide:

  Learnability $\iff$ Uniform Convergence $\iff$ every ERM algorithm works

Surprise: the fundamental theorem is not true in the multiclass case.
Then how can we learn in the multiclass setting?

Setting

$X$ – instances, $Y$ – labels, $H \subset Y^X$ – hypotheses.

Given a distribution $D$, the true error is
  $\mathrm{err}(h) = \Pr_{(x,y)\sim D}\left[h(x) \neq y\right]$.

Let $S = \{(X_1,Y_1),\dots,(X_m,Y_m)\}$ be an i.i.d. sample, and define the empirical error
  $\mathrm{err}_S(h) = \frac{\#\{i : h(X_i) \neq Y_i\}}{m}$.

PAC algorithm: for $m \ge m_A(\epsilon,\delta)$, with probability $\ge 1-\delta$,
  $\mathrm{err}(A(S)) \le \min_{h \in H} \mathrm{err}(h) + \epsilon$.

ERM algorithm: $A$ returns a hypothesis for which $\mathrm{err}_S(A(S))$ is minimal.

For simplicity, assume the realizable case: $\exists h^* \in H$ with $\mathrm{err}(h^*) = 0$.
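To make the setting concrete, here is a minimal sketch of an ERM learner in the realizable case. It is not from the talk; it assumes, for illustration, a finite instance space, a finite label set, and a hypothesis class H given explicitly as a list of label vectors. The helper names (`true_err`, `empirical_err`, `erm`) are hypothetical, not notation from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_err(h, D_x, h_star):
    # err(h) = P_{(x,y)~D}[h(x) != y]; in the realizable case y = h_star(x)
    return np.sum(D_x * (h != h_star))

def empirical_err(h, X, Y):
    # err_S(h) = #{i : h(X_i) != Y_i} / m
    return np.mean(h[X] != Y)

def erm(H, X, Y):
    # ERM: return any hypothesis in H minimizing the empirical error
    return min(H, key=lambda h: empirical_err(h, X, Y))

# Toy example (all sizes are illustrative assumptions):
# |X| = 10 instances, |Y| = 3 labels, |H| = 50 random hypotheses
n_x, n_y = 10, 3
H = [rng.integers(n_y, size=n_x) for _ in range(50)]
h_star = H[0]                     # realizability: the target h* is in H
D_x = np.full(n_x, 1.0 / n_x)     # uniform marginal distribution over instances

m = 100                           # sample size
X = rng.choice(n_x, size=m, p=D_x)
Y = h_star[X]                     # labels generated by h*

h_hat = erm(H, X, Y)
print("err_S(h_hat) =", empirical_err(h_hat, X, Y))  # 0 in the realizable case
print("err(h_hat)   =", true_err(h_hat, D_x, h_star))
```

For a finite class, this exhaustive search is a valid ERM learner, and in the realizable case it always attains $\mathrm{err}_S = 0$. The talk's surprise is that in the multiclass setting this is no longer the whole story: learnability is not equivalent to uniform convergence, and different ERM learners for the same class can behave very differently.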