VC dimension, sample size required and the generalization bound
Starting with the Vapnik-Chervonenkis inequality:
\begin{align} P[|\nu - \mu| > \epsilon] \le 4B(2N, k)e^{-(1/8)\epsilon^2 N} \end{align}where $B(2N, k)$ is the polynomial bound on the growth function for a hypothesis set with break point $k$. Then simplify it a little, keeping only the leading polynomial term:
\begin{align} P[|\nu - \mu| > \epsilon] \le N^d e^{-(1/8)\epsilon^2 N} \end{align}where $d = k - 1$ is the VC dimension ($B(2N, k)$ is a polynomial of degree $d$ in $N$, so $N^d$ is its dominant term). Fixing $d$ at different values and letting
\begin{align} \rho = N^d e^{-(1/8)\epsilon^2 N} \end{align}we can then plot $\rho$ against $N$ (the horizontal line is $\rho = 1$):
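As a rough sketch of such a plot, here is a minimal matplotlib example; the values of $\epsilon$ and $d$ below are illustrative choices, not taken from the original plots.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative values; the specific epsilon and d used for the post's
# plots are assumptions here.
eps = 0.1
N = np.arange(1, 30001, dtype=float)

plt.figure()
for d in (1, 2, 3):
    # rho = N^d * exp(-(1/8) * eps^2 * N)
    rho = N**d * np.exp(-(1 / 8) * eps**2 * N)
    plt.plot(N, rho, label=f"d = {d}")

plt.axhline(1.0, color="black", linewidth=0.8)  # the line rho = 1
plt.xlabel("N")
plt.ylabel(r"$\rho$")
plt.legend()
plt.show()
```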
Using a log scale is a little more convenient, since we are only interested in $\rho < 1$, i.e. $\log \rho < 0$:
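Writing $\ln$ for the natural logarithm, taking logs makes the competition between the two terms explicit: \begin{align} \ln \rho = d \ln N - \frac{1}{8}\epsilon^2 N \end{align} The polynomial part contributes only $d \ln N$, while the exponential part subtracts a term linear in $N$, so for any fixed $d$ and $\epsilon$ the bound eventually drops below $1$ as $N$ grows.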
Rearranging the terms: set the right-hand side equal to $\delta$ and solve for $\epsilon$, calling the result $\Omega(N, H, \delta)$: \begin{align} \delta = N^d e^{-(1/8)\epsilon^2 N} \quad\Longrightarrow\quad \epsilon = \sqrt{\frac{8}{N}\ln\frac{N^d}{\delta}} = \Omega(N, H, \delta) \end{align}
Thus, with probability at least $1-\delta$: $$\nu - \mu \le \Omega(N, H, \delta)$$ or $$\mu \ge \nu - \Omega(N, H, \delta)$$ giving a lower bound on $\mu$, our chance of getting things right.
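Note that $\delta = N^d e^{-(1/8)\epsilon^2 N}$ determines the required sample size $N$ only implicitly (after rearranging, $N$ appears on both sides), so a simple fixed-point iteration can be used to find it. Below is a minimal sketch; the function name and the example numbers are illustrative assumptions.

```python
import math

def sample_size(d, eps, delta, n0=1000.0, tol=1.0):
    """Approximate the smallest N with N**d * exp(-eps**2 * N / 8) <= delta
    by iterating the rearranged equation N = (8 / eps**2) * ln(N**d / delta)."""
    n = n0
    while True:
        n_next = (8 / eps**2) * math.log(n**d / delta)
        if abs(n_next - n) < tol:
            return n_next
        n = n_next

# Illustrative example: VC dimension 3, epsilon 0.1, delta 0.1.
print(sample_size(d=3, eps=0.1, delta=0.1))  # roughly 2.6e4
```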
You can also see it from a different angle: let $E_{out} = 1 - \mu$ and $E_{in} = 1 - \nu$, then $$ (1 - E_{in}) - (1 - E_{out}) \le \Omega(N, H, \delta) \\ E_{out} - E_{in} \le \Omega(N, H, \delta) $$ or $$ E_{out} \le E_{in} + \Omega(N, H, \delta) $$ which is an upper bound on our chance of getting things wrong. This is called the generalization bound, and the process of deriving it is called VC analysis.
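To make the bound concrete, here is a small helper that evaluates $\Omega(N, H, \delta)$ under the $N^d$ simplification used above; the inputs in the example are illustrative.

```python
import math

def omega(n, d, delta):
    """Error bar Omega(N, H, delta) = sqrt((8 / N) * ln(N**d / delta)),
    using the N**d simplification of the growth-function bound."""
    return math.sqrt((8 / n) * math.log(n**d / delta))

# Illustrative example: N = 10000 samples, VC dimension d = 3, delta = 0.05.
# With probability at least 95%, E_out <= E_in + this value (about 0.16).
n, d, delta = 10_000, 3, 0.05
print(f"E_out <= E_in + {omega(n, d, delta):.3f}")
```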