Sunday, November 22, 2015

Machine learning foundation 1

Elements

We have

  • Unknown target function (underlying the data)
  • Training examples (the data)
  • Hypothesis set (possible approximations to the target function)
  • Learning algorithm (for choosing the “best” from the hypothesis set)
  • Final hypothesis (result of learning)

Perceptrons

Hypothesis in the vector form:

\[h(x) = \mathrm{sign}(w^T x)\]

\(w\) is a vector here.

Perceptron Learning Algorithm (PLA)

\(w\) and \(x_n\) are vectors here.

  1. Start with an arbitrary \(w_0\), say \(w_0 = 0\).
  2. On the first case where \(\mathrm{sign}(w_t^T x_n) \ne y_n\), update like this: \(w_{t+1} = w_t + y_n x_n\). This works because \(y_n w_{t+1}^T x_n = y_n w_t^T x_n + y_n^2 \|x_n\|^2 \ge y_n w_t^T x_n\), which guarantees that the iteration moves \(w_t\) in the direction of classifying \((x_n, y_n)\) correctly (see the sketch after this list).
  3. Continue iterating until no mistake occurs for any \((x_n, y_n)\) in a full loop.
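
A minimal sketch of this loop in Python (assuming NumPy arrays: X of shape (N, d) with a constant bias feature already appended, and labels y in {-1, +1}; all names here are illustrative, not from the original post):

```python
import numpy as np

def sign(v):
    # Treat an output of exactly 0 as a mistake, per the usual PLA convention.
    return 1 if v > 0 else -1

def pla(X, y, max_loops=1000):
    """Perceptron Learning Algorithm: cycle until a full pass has no mistakes."""
    w = np.zeros(X.shape[1])            # start with an arbitrary w, here w_0 = 0
    for _ in range(max_loops):
        mistake_found = False
        for x_n, y_n in zip(X, y):
            if sign(w @ x_n) != y_n:    # misclassified example
                w = w + y_n * x_n       # update in the correcting direction
                mistake_found = True
        if not mistake_found:           # a full loop with no mistake: done
            return w
    return w                            # may not have converged if data are not separable
```

The `max_loops` cap is only there so the function returns even when the data are not linearly separable.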

Questions about the PLA

  • Will it ever stop?
  • When it stops, will the result be close to the (unknown) target function?
  • Will it ever make a mistake (inside/outside your dataset)?

Linear separability

PLA does not always halt: if no line (hyperplane) separates the data, the loop keeps finding mistakes forever.

A solution is guaranteed when the data are linearly separable.

Agreement between prediction and existing data can be summarized by

\[\min_n y_n w_f^T x_n\]

, where \(w_f\) is the "true and unknown" weight vector.

Since \(w_f^T x_n\) always has the same sign as \(y_n\), each term \(y_n w_f^T x_n\) is positive.
Hence existence of a solution means that \(\min_n y_n w_f^T x_n > 0\).
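
As a quick numerical illustration (a hypothetical helper, not from the original post), the same quantity can be computed for any candidate weight vector; a positive result means that vector separates the data:

```python
import numpy as np

def min_margin(w, X, y):
    """min_n y_n * w^T x_n for rows x_n of X and labels y_n in {-1, +1}.

    Positive iff w classifies every training example correctly.
    """
    return np.min(y * (X @ w))
```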

How do we know the \(w_t\) in PLA will get close enough to \(w_f\)?

We can see the inner product \(w_f^T w_t\) gets larger as \(t\) grows:

\[w_f^T w_{t+1} = w_f^T (w_t + y_n x_n) = w_f^T w_t + y_n w_f^T x_n \ge w_f^T w_t + \min_n y_n w_f^T x_n\]

This is a good sign, but we still need to check whether this is because they become more and more similar in direction or because \(w_t\) becomes larger in magnitude.

Since \(t\) only grows when there is a mistake, we know \(2 y_n w_t^T x_n \le 0\), hence

\[\|w_{t+1}\|^2 = \|w_t + y_n x_n\|^2 = \|w_t\|^2 + 2 y_n w_t^T x_n + \|x_n\|^2 \le \|w_t\|^2 + R^2\]

, where \(R = \max_n \|x_n\|\) is the max magnitude of all possible \(x_n\). At least only limited growth is seen in \(\|w_t\|\).

If we assume \(w_0 = 0\), using the telescoping technique we can get
\[\|w_t\|^2 \le tR^2\]
, or
\[\|w_t\| \le \sqrt{tR^2}\]
If we let

\[\rho = \min_n y_n w_f^T x_n\]

, then

\[w_f^T w_{t+1} \ge w_f^T w_t + \rho\]

Using telescope collapsing (again with \(w_0 = 0\)) we get \(w_f^T w_t \ge t\rho\).

To eliminate influence from magnitude we can normalize both \(w_f\) and \(w_t\) and check their inner product:

\[\frac{w_f^T w_t}{\|w_f\|\,\|w_t\|} \ge \frac{t\rho}{\|w_f\|\sqrt{t R^2}} = \sqrt{t}\cdot\frac{\rho}{\|w_f\| R}\]

So, indeed, \(w_f\) and \(w_t\) are getting closer and closer to each other in direction. Let

\[C = \frac{\rho}{\|w_f\| R}\]

And since the normalized inner product can never exceed \(1\), eventually \(\sqrt{t}\cdot C\) would exceed it unless \(t\) is bounded:

\[t \le \frac{1}{C^2} = \frac{\|w_f\|^2 R^2}{\rho^2}\]

So, not only do we know that PLA will find a solution (provided one exists), we also know it will find it within this finite number of steps.
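
For instance, a rough sanity check of the bound on synthetic separable data might look like this (a sketch only; the generating vector below plays the role of \(w_f\), and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
w_f = np.array([0.5, 1.0, -1.0])                   # "true" weights: bias, x1, x2
X = np.column_stack([np.ones(100), rng.uniform(-1, 1, size=(100, 2))])
y = np.sign(X @ w_f)                               # labels come from w_f, so data are separable

rho = np.min(y * (X @ w_f))                        # margin of w_f on this dataset
R = np.max(np.linalg.norm(X, axis=1))              # max magnitude over all x_n
bound = np.linalg.norm(w_f) ** 2 * R ** 2 / rho ** 2
print(f"PLA needs at most ~{bound:.0f} updates on this dataset")
```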
