## Elements

We have

• Unknown target function (underlying the data)
• Training examples (the data)
• Hypothesis set (possible approximations to the target function)
• Learning algorithm (for choosing the “best” from the hypothesis set)
• Final hypothesis (result of learning)

## Perceptrons

### Hypothesis in the vector form

In vector form the hypothesis can be written $h(x) = \text{sign}(w^T x)$, where $x$ is a vector (with a constant coordinate $x_0 = 1$ so the threshold is absorbed into $w$).

### Perceptron Learning Algorithm (PLA)

$w, x$ are vectors here.

1. Start with an arbitrary $w_0$, say $\textbf{0}$.
2. On the first case where $\text{sign}(w_t^T x_n) \ne y_n$, update $w$ like this: $w_{t+1} = w_t + y_n x_n$. This works because $y_n w_{t+1}^T x_n = y_n (w_t + y_n x_n)^T x_n = y_n w_t^T x_n + \|x_n\|^2 \ge y_n w_t^T x_n$ (using $y_n^2 = 1$), which guarantees that each update moves $w$ toward agreeing with $y_n$ on $x_n$.
3. Continue iterating until no mistake occurs for any $(x_n, y_n)$ in a full loop.
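The three steps above can be sketched in Python (a minimal sketch; the toy dataset `X`, `y` and the `max_iters` safety cap are assumptions added for illustration, not part of the original notes):

```python
import numpy as np

def pla(X, y, max_iters=10_000):
    """Perceptron Learning Algorithm (minimal sketch).

    Assumes X carries a bias coordinate x_0 = 1 in its first column
    and y holds labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])              # step 1: arbitrary start, here w_0 = 0
    for _ in range(max_iters):            # safety cap for non-separable data
        for x_n, y_n in zip(X, y):
            if np.sign(w @ x_n) != y_n:   # first mistaken case (sign(0) counts)
                w = w + y_n * x_n         # step 2: w_{t+1} = w_t + y_n x_n
                break                     # restart the full loop
        else:                             # step 3: a clean full pass, no mistakes
            return w
    return w

# Toy linearly separable data (bias column of ones prepended).
X = np.array([[1, 2, 2], [1, 1, 0], [1, -1, -2], [1, -2, 0]])
y = np.array([1, 1, -1, -1])
w = pla(X, y)
```

The `for`/`else` idiom runs the `else` branch only when the inner loop finishes without a `break`, i.e. exactly when a full pass over the data produces no mistake.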

• Will it ever stop?
• When it stops, will the result be close to the (unknown) target function?
• Will it ever make a mistake (inside/outside your dataset)?

### Linear separability

PLA does not always halt: if the data is not linearly separable, no weight vector classifies every example correctly, so step 2 always finds a mistake and the loop never ends.
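A minimal demonstration of non-halting, using the classic XOR labeling (the dataset and the 10,000-update cap are assumptions for illustration): no line separates these four points, so a clean pass never occurs.

```python
import numpy as np

# XOR-style labels: not linearly separable, so every pass of PLA
# finds at least one mistake and the algorithm never halts on its own.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
y = np.array([-1, 1, 1, -1])

w = np.zeros(3)
halted = False
for _ in range(10_000):                  # cap the updates by hand
    wrong = [(x_n, y_n) for x_n, y_n in zip(X, y) if np.sign(w @ x_n) != y_n]
    if not wrong:                        # a clean pass would mean halting
        halted = True
        break
    x_n, y_n = wrong[0]                  # update on the first mistake
    w = w + y_n * x_n

print(halted)                            # never reaches a clean pass
```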

### Solution guaranteed when linearly separable

Agreement between prediction and existing data can be summarized by

$y_n w_f^T x_n$

, where $w_f$ is the "true and unknown" weight vector.

Since $w_f^T x_n$ always has the same sign as $y_n$, this quantity is positive for every $n$.
Hence existence of a solution means that $\min_n y_n w_f^T x_n > 0$.

How do we know the $w_t$ in PLA will get close enough to $w_f$?

We can check that the inner product $w_f^T w_t$ gets larger as $t$ grows: when the update is made on a mistaken $(x_n, y_n)$,

$w_f^T w_{t+1} = w_f^T (w_t + y_n x_n) = w_f^T w_t + y_n w_f^T x_n \ge w_f^T w_t + \min_n y_n w_f^T x_n > w_f^T w_t$

This is a good sign, but we still need to check whether it is because the two vectors become more and more similar in direction, or merely because $w_t$ becomes larger in magnitude.

Since $t$ only grows when there is a mistake, the updated example satisfies $2 y_n w_t^T x_n \le 0$, hence

$\|w_{t+1}\|^2 = \|w_t + y_n x_n\|^2 = \|w_t\|^2 + 2 y_n w_t^T x_n + \|x_n\|^2 \le \|w_t\|^2 + R^2$

, where $R^2 = \max_n \|x_n\|^2$ is the max magnitude over all possible $\|x_n\|^2$. So $\|w_t\|^2$ grows by at most $R^2$ per update — only limited growth is seen in $w_t$.

If we assume $w_0 = \textbf{0}$, using the telescoping technique we get
$\|w_t\|^2 \le tR^2$
, or
$\|w_t\| \le \sqrt{tR^2}$.
If we let
$\rho = \min_n y_n w_f^T x_n > 0$
, then each update gives
$w_f^T w_{t+1} \ge w_f^T w_t + \rho$.
Using telescope collapsing we get $w_f^T w_t \ge t\rho$.

To eliminate influence from magnitude we can normalize both $w_f$ and $w_t$ and check their inner product:

$\frac{w_f^T w_t}{\|w_f\| \|w_t\|} \ge \frac{t\rho}{\|w_f\| \sqrt{tR^2}} = \sqrt{t} \cdot \frac{\rho}{\|w_f\| R}$

So, indeed, $w_f$ and $w_t$ are getting closer and closer to each other in direction. Let

$C = \frac{\rho}{\|w_f\| R}$

And since $\frac{w_f^T w_t}{\|w_f\| \|w_t\|}$ can never exceed $1$ (it is the cosine of the angle between the two vectors), we must have $\sqrt{t}\, C \le 1$, i.e.

$t \le \frac{1}{C^2} = \frac{R^2 \|w_f\|^2}{\rho^2}$

So, not only do we know that PLA will find a solution (provided one exists), we also know it will find one within a finite number of updates.
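The bound can be checked numerically. This sketch assumes a hypothetical target $w_f$ (so the normally unknown $\rho$ and $R^2$ are computable) and keeps only sampled points with a clear margin so PLA halts quickly; it counts updates $t$ and compares against $R^2 \|w_f\|^2 / \rho^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target w_f (an assumption for this experiment); labels are
# generated from it, so rho and R^2 are computable here.
w_f = np.array([0.5, 1.0, -1.0])
X = np.hstack([np.ones((200, 1)), rng.uniform(-1, 1, size=(200, 2))])
X = X[np.abs(X @ w_f) > 0.2]           # keep a clear margin so the bound is small
y = np.sign(X @ w_f)

w = np.zeros(3)
t = 0                                  # number of updates = number of mistakes
while True:
    wrong = np.flatnonzero(np.sign(X @ w) != y)
    if wrong.size == 0:                # a full clean pass: PLA halts
        break
    n = wrong[0]
    w = w + y[n] * X[n]                # PLA update on the first mistake
    t += 1

rho = np.min(y * (X @ w_f))            # min_n y_n w_f^T x_n > 0
R2 = np.max(np.sum(X ** 2, axis=1))    # R^2 = max_n ||x_n||^2
bound = R2 * (w_f @ w_f) / rho ** 2
print(t, "updates; bound:", bound)     # t never exceeds the bound
```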