
Artificial Neural Networks - Ronan Collobert PDF

34 Pages·2011·7.6 MB·English

Preview: Artificial Neural Networks - Ronan Collobert

Artificial Neural Networks
Ronan Collobert
[email protected]

Introduction: Neural Networks in 1980

Introduction: Neural Networks in 2011

    x → W1 → tanh(·) → W2 → score

Stack matrix-vector multiplications interleaved with a non-linearity. Where does this come from? How to train such models? Why do they generalize? What about real-life inputs (other than vectors x)? Any applications?

Biological Neuron

Dendrites are connected to other neurons through synapses. Excitatory and inhibitory signals are integrated, and if the stimulus reaches a threshold, the neuron fires along the axon.

McCulloch and Pitts (1943)

The neuron is modeled as a linear threshold unit, with binary inputs x ∈ {0, 1}^d, a binary output, and a vector of weights w ∈ R^d:

    f(x) = 1 if w · x > T, and 0 otherwise

A single unit can perform the OR and AND operations, and combining such units can represent any boolean function. But how to train them?

Perceptron: Rosenblatt (1957)

The input is a retina x ∈ R^n, followed by an associative area computing any kind of (fixed) function ϕ(x) ∈ R^d, and the decision function

    f(x) = 1 if w · ϕ(x) > 0, and −1 otherwise

Training: given examples (x^t, y^t) ∈ R^d × {−1, 1}, minimize Σ_t max(0, −y^t w · ϕ(x^t)) with the update rule

    w^{t+1} = w^t + y^t ϕ(x^t) if y^t w^t · ϕ(x^t) ≤ 0, and w^{t+1} = w^t otherwise

Perceptron: Convergence (Novikoff, 1962)

Assume the classes are separable, and let u define the maximum-margin separating hyperplane, normalized so that y^t u · x^t ≥ 1 for every example; the margin is then ρ_max = 2/‖u‖. Assume also ‖x‖ ≤ R for all inputs. Each time we make a "mistake" (i.e. perform an update),

    u · w^t = u · w^{t−1} + y^t u · x^t ≥ u · w^{t−1} + 1 ≥ t

and by Cauchy–Schwarz, u · w^t ≤ ‖u‖ ‖w^t‖, so ‖w^t‖ ≥ t/‖u‖.
At the same time, an update is only made when y^t w^{t−1} · x^t ≤ 0, so

    ‖w^t‖² = ‖w^{t−1}‖² + 2 y^t w^{t−1} · x^t + ‖x^t‖² ≤ ‖w^{t−1}‖² + R² ≤ t R²

Combining the two bounds, t²/‖u‖² ≤ ‖w^t‖² ≤ t R², and we get:

    t ≤ 4 R² / ρ²_max

Adaline: Widrow & Hoff (1960)

Problems of the Perceptron: in the separable case, it does not find a hyperplane equidistant from the two classes; in the non-separable case, it does not converge. Adaline (Widrow & Hoff, 1960) minimizes

    (1/2) Σ_t (y^t − w · ϕ(x^t))²

with the delta rule:

    w^{t+1} = w^t + λ (y^t − w^t · x^t) x^t

Perceptron: Margin

See (Duda & Hart, 1973), (Krauth & Mézard, 1987), (Collobert, 2004). The Perceptron has poor generalization capabilities in practice: there is no control on the margin,

    ρ = 2/‖w^T‖ ≥ ρ_max / R²

The Margin Perceptron instead minimizes Σ_t max(0, 1 − y^t w · ϕ(x^t)), with the update rule

    w^{t+1} = w^t + λ y^t ϕ(x^t) if y^t w^t · ϕ(x^t) ≤ 1, and w^{t+1} = w^t otherwise

It performs a finite number of updates,

    t ≤ (4/ρ²_max) (2/λ + R²)

and gives control on the margin:

    ρ ≥ ρ_max / (2 + R² λ)

Perceptron: In Practice

[Figure: decision boundaries on a 2-D toy problem, for the original Perceptron after 10/40/60 iterations and for the Margin Perceptron after 10/120/2000 iterations.]
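As a concrete illustration of the two update rules above, here is a minimal NumPy sketch, not from the slides, with hypothetical toy data; ϕ is taken to be the identity and a constant feature plays the role of a bias:

```python
import numpy as np

def perceptron(X, y, epochs=100):
    """Rosenblatt's rule: update on mistakes, i.e. when y^t w.x^t <= 0."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for xt, yt in zip(X, y):
            if yt * np.dot(w, xt) <= 0:   # misclassified (or on the boundary)
                w += yt * xt              # w^{t+1} = w^t + y^t x^t
                mistakes += 1
        if mistakes == 0:                 # a full pass with no mistakes: done
            break
    return w

def margin_perceptron(X, y, lam=0.1, epochs=1000):
    """Margin Perceptron: update whenever y^t w.x^t <= 1."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        updates = 0
        for xt, yt in zip(X, y):
            if yt * np.dot(w, xt) <= 1:   # inside the margin, or misclassified
                w += lam * yt * xt        # w^{t+1} = w^t + lambda y^t x^t
                updates += 1
        if updates == 0:                  # every example has functional margin > 1
            break
    return w

# Hypothetical linearly separable toy data (last feature acts as a bias).
X = np.array([[1.0, 2.0, 1.0], [2.0, 3.0, 1.0],
              [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
y = np.array([1, 1, -1, -1])

w_p = perceptron(X, y)
w_m = margin_perceptron(X, y)
print(np.sign(X @ w_p))   # both variants separate the training data
print(y * (X @ w_m))      # margin perceptron stops only once all margins exceed 1
```

The finite-update bounds above guarantee both loops terminate on separable data; the `epochs` caps are just a safeguard for the non-separable case, where the original Perceptron would cycle forever.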

