## VC dimension, sample size required and the generalization bound


Starting with the Vapnik-Chervonenkis inequality:

\begin{align} P[|\nu - \mu| > \epsilon] \le 4B(2N, k)e^{-(1/8)\epsilon^2 N} \end{align}

Since $B(2N, k)$ is a polynomial in $N$ of degree $d = k - 1$, we can simplify it a little bit (loosely, dropping constant factors) by keeping only the leading polynomial term:

\begin{align} P[|\nu - \mu| > \epsilon] \le N^d e^{-(1/8)\epsilon^2 N} \end{align}

where $d$ is the VC dimension. Fix $d$ at different levels and let

\begin{align} \rho = N^d e^{-(1/8)\epsilon^2 N} \end{align}

We can then plot $\rho$ against $N$ (the horizontal line is $\rho = 1$):

Using the log scale may be a little more convenient, since we are only interested in $\rho < 1$, i.e. $\log \rho < 0$:
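As a quick numeric sketch of the picture (with an assumed tolerance $\epsilon = 0.1$), we can find where the bound becomes non-trivial, i.e. the $N$ beyond which $\rho$ stays below 1:

```python
import math

def log_rho(N, d, eps=0.1):
    """log of the simplified bound rho = N^d * exp(-(1/8) * eps^2 * N)."""
    return d * math.log(N) - (eps ** 2) * N / 8.0

def n_bound_nontrivial(d, eps=0.1):
    """Smallest N past the peak of log_rho where rho drops below 1.

    log_rho peaks at N = 8d / eps^2, so search upward from there.
    """
    N = max(2, int(8 * d / eps ** 2))
    while log_rho(N, d, eps) >= 0:
        N += 1
    return N

for d in (5, 10, 20):
    print(d, n_bound_nontrivial(d))
```

Note how the required $N$ grows with the VC dimension $d$: a richer hypothesis set needs more data before the bound says anything useful.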

Rearranging the terms: set $\delta = N^d e^{-(1/8)\epsilon^2 N}$ and solve for $\epsilon$:

\begin{align} \epsilon = \sqrt{\frac{8}{N} \ln \frac{N^d}{\delta}} = \Omega(N, H, \delta) \end{align}

Thus, with probability at least $1-\delta$: $$\nu - \mu \le \Omega(N, H, \delta)$$ or $$\mu \ge \nu - \Omega(N, H, \delta)$$ giving a lower bound on our chance of getting things right.
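A minimal sketch of this lower bound, with hypothetical values ($d = 10$, $\delta = 0.05$, in-sample accuracy $\nu = 0.95$), showing how the guarantee on $\mu$ tightens as $N$ grows:

```python
import math

def omega(N, d, delta):
    """Omega(N, H, delta) = sqrt((8/N) * ln(N^d / delta)), d = VC dimension of H."""
    return math.sqrt((8.0 / N) * (d * math.log(N) - math.log(delta)))

# Lower bound on out-of-sample accuracy: mu >= nu - Omega(N, H, delta)
nu, d, delta = 0.95, 10, 0.05  # hypothetical in-sample accuracy, VC dim, confidence
for N in (10_000, 100_000, 1_000_000):
    print(N, round(nu - omega(N, d, delta), 3))
```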

You can also see it from a different angle: let $E_{out} = 1 - \mu$ and $E_{in} = 1 - \nu$. Then $$(1 - E_{in}) - (1 - E_{out}) \le \Omega(N, H, \delta) \\ E_{out} - E_{in} \le \Omega(N, H, \delta)$$ or $$E_{out} \le E_{in} + \Omega(N, H, \delta)$$ which is an upper bound on our chance of getting things wrong. This is called the generalization bound, and the process of deriving it is called VC analysis.
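The bound can also be inverted to estimate the sample size required: for a target $\epsilon$ and $\delta$, $N$ must satisfy $N \ge \frac{8}{\epsilon^2} \ln \frac{N^d}{\delta}$, which can be solved by fixed-point iteration since $N$ appears on both sides. A sketch with illustrative parameters:

```python
import math

def sample_size_required(d, eps, delta, n0=1000.0, iters=100):
    """Iterate N <- (8/eps^2) * ln(N^d / delta) until it settles."""
    N = n0
    for _ in range(iters):
        N = (8.0 / eps ** 2) * (d * math.log(N) - math.log(delta))
    return math.ceil(N)

# e.g. VC dimension 10, generalization error within 0.1, 95% confidence
print(sample_size_required(10, 0.1, 0.05))
```

The iteration converges quickly because the right-hand side depends on $N$ only logarithmically.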