## De Méré's puzzle

    import scala.collection.mutable.ArrayBuffer

    object DeMere extends App {
      // Collect every ordered roll of three dice whose sum is 11 or 12.
      val sum11, sum12 = ArrayBuffer.empty[Array[Int]]
      for {
        i <- 1 to 6
        j <- 1 to 6
        k <- 1 to 6
      } {
        i + j + k match {
          case 11 => sum11 += Array(i, j, k)
          case 12 => sum12 += Array(i, j, k)
          case _  =>
        }
      }
      // 6^3 = 216 equally likely ordered outcomes
      val total = math.pow(6, 3)
      println(sum11.length / total) // P(sum = 11)
      println(sum12.length / total) // P(sum = 12)
    }


## Hoeffding’s inequality

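The sample mean is unlikely to be far from the true mean when the sample size is large:

$$P\left[\,|\nu - \mu| > \epsilon\,\right] \le 2 e^{-2 \epsilon^2 N}$$

, where $\nu$ is the sample mean, $\mu$ is the true mean, $N$ is the sample size and $\epsilon > 0$ is the tolerance. In other words, $\nu$ is probably approximately correct (PAC).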
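As a rough numerical check of this statement, the sketch below computes the exact probability that the mean of $N$ coin flips deviates from the true mean by more than a tolerance, and compares it with the Hoeffding bound $2e^{-2\epsilon^2 N}$. The function names, the fair coin ($\mu = 0.5$) and the tolerance $\epsilon = 0.1$ are assumptions made only for this illustration.

```javascript
// Binomial probabilities P(K = k) for k = 0..n, built iteratively to avoid
// large factorials. Assumes 0 < p < 1.
function binomialPmf(n, p) {
  const pmf = new Array(n + 1);
  pmf[0] = Math.pow(1 - p, n);
  for (let k = 1; k <= n; k++) {
    pmf[k] = pmf[k - 1] * ((n - k + 1) / k) * (p / (1 - p));
  }
  return pmf;
}

// Exact P[|nu - mu| > eps], where nu = K / n is the sample mean of n coin flips.
function exactDeviationProbability(n, mu, eps) {
  const pmf = binomialPmf(n, mu);
  let prob = 0;
  for (let k = 0; k <= n; k++) {
    if (Math.abs(k / n - mu) > eps) prob += pmf[k];
  }
  return prob;
}

// Right-hand side of Hoeffding's inequality.
function hoeffdingBound(n, eps) {
  return 2 * Math.exp(-2 * eps * eps * n);
}

for (const n of [10, 100, 1000]) {
  console.log(n, exactDeviationProbability(n, 0.5, 0.1), hoeffdingBound(n, 0.1));
}
```

For small $N$ the bound can exceed 1 and says nothing; it only becomes informative as $N$ grows.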

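When applied to machine learning, this means that for a fixed hypothesis $h$ and any tolerance $\epsilon > 0$, the in-sample error $E_{in}(h)$ (the error rate on the $N$ training examples) is probably close to the out-of-sample error $E_{out}(h)$:

$$P\left[\,|E_{in}(h) - E_{out}(h)| > \epsilon\,\right] \le 2 e^{-2 \epsilon^2 N}$$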

## Elements

A learning problem consists of the following elements:

• Unknown target function (underlying the data)
• Training examples (the data)
• Hypothesis set (possible approximations to the target function)
• Learning algorithm (for choosing the “best” from the hypothesis set)
• Final hypothesis (result of learning)
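To make the five elements concrete, here is a toy sketch in which each one appears as a named value. Everything in it is made up for illustration: the target function, the inputs, the candidate thresholds and the "fewest training mistakes" rule are assumptions, not part of any particular learning method discussed here.

```javascript
// Unknown target function. It is written out here only so that the toy
// example can generate labels; in a real problem we never get to see it.
const target = (x) => (x > 0.35 ? 1 : -1);

// Training examples: inputs together with the labels the target assigns.
const inputs = [0.05, 0.2, 0.3, 0.4, 0.6, 0.9];
const examples = inputs.map((x) => ({ x, y: target(x) }));

// Hypothesis set: threshold classifiers h_c(x) = +1 if x > c, else -1.
const thresholds = [0.1, 0.33, 0.6, 0.8];
const hypothesisSet = thresholds.map((c) => ({
  c,
  predict: (x) => (x > c ? 1 : -1),
}));

// Learning algorithm: pick the hypothesis with the fewest training mistakes.
function trainingErrors(h) {
  return examples.filter(({ x, y }) => h.predict(x) !== y).length;
}
function learn(hypotheses) {
  return hypotheses.reduce((best, h) =>
    trainingErrors(h) < trainingErrors(best) ? h : best
  );
}

// Final hypothesis: the result of learning.
const finalHypothesis = learn(hypothesisSet);
console.log(finalHypothesis.c, trainingErrors(finalHypothesis));
```

The chosen threshold fits the training examples perfectly even though it differs from the target's threshold, which is exactly the gap between in-sample and out-of-sample behaviour.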

## Perceptrons

### Hypothesis in the vector form

$$h(x) = \text{sign}(w^T x)$$

$w$ and $x$ are vectors here.

### Perceptron Learning Algorithm (PLA)

$w, x$ are vectors here.

1. Start with an arbitrary $w_0$, say $\textbf{0}$.
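2. On the first example where $\text{sign}(w_t^T x_n) \ne y_n$, update $w$ like this: $w_{t+1} = w_t + y_n x_n$. This works because $w_{t+1}^T x_n = w_t^T x_n + y_n \|x_n\|^2$, or, multiplying by $y_n$ and using $y_n^2 = 1$, $y_n w_{t+1}^T x_n = y_n w_t^T x_n + \|x_n\|^2 \ge y_n w_t^T x_n$, which guarantees that each update moves $w^T x_n$ in the direction of $y_n$.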
3. Continue iterating until no mistake occurs for any $(x_n, y_n)$ in a full loop.
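The steps above can be sketched as follows. This is a minimal cyclic variant, not a reference implementation: it sweeps over the examples, applies the update $w \leftarrow w + y_n x_n$ on every mistake, and stops after one full pass without mistakes. The helper names, the convention that `sign(0)` counts as $-1$, and the toy data are assumptions.

```javascript
// Dot product of two equal-length numeric arrays.
const dot = (a, b) => a.reduce((sum, ai, i) => sum + ai * b[i], 0);

// Convention chosen for this sketch: sign(0) counts as -1.
const sign = (v) => (v > 0 ? 1 : -1);

// Assumes the data are linearly separable; otherwise this loop never ends.
function pla(xs, ys) {
  let w = new Array(xs[0].length).fill(0); // w_0 = 0
  let updates = 0;
  let mistakeInPass = true;
  while (mistakeInPass) {
    mistakeInPass = false;
    for (let n = 0; n < xs.length; n++) {
      if (sign(dot(w, xs[n])) !== ys[n]) {
        w = w.map((wi, i) => wi + ys[n] * xs[n][i]); // w <- w + y_n x_n
        updates += 1;
        mistakeInPass = true;
      }
    }
  }
  return { w, updates };
}

// Made-up toy data. Each x starts with a constant 1 so that w[0] acts as a threshold.
const xs = [[1, 2, 2], [1, 3, 1], [1, -1, -2], [1, -2, -1]];
const ys = [1, 1, -1, -1];
const result = pla(xs, ys);
console.log(result.w, result.updates);
```

On this data a single update is enough, because the first corrected weight vector already classifies all four points.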

Questions to ask about PLA:

• Will it ever stop?
• When it stops, will the result be close to the (unknown) target function?
• Will it ever make a mistake (inside/outside your dataset)?

### Linear separability

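PLA does not always halt: if the data are not linearly separable, no $w$ classifies every example correctly, so there is always another mistake to correct.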
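A small experiment shows the difference. The sketch below runs a perceptron loop with an update cap (the cap, the function name `plaWithCap` and the data sets are assumptions for illustration) on XOR-style labels, which are not linearly separable, and on AND-style labels, which are.

```javascript
const dot = (a, b) => a.reduce((sum, ai, i) => sum + ai * b[i], 0);
const sign = (v) => (v > 0 ? 1 : -1); // convention: sign(0) counts as -1

// Correct the first misclassified example until none is left,
// or give up after maxUpdates corrections.
function plaWithCap(xs, ys, maxUpdates) {
  let w = new Array(xs[0].length).fill(0);
  let updates = 0;
  while (true) {
    const n = xs.findIndex((x, i) => sign(dot(w, x)) !== ys[i]);
    if (n === -1) return { w, updates, halted: true };
    if (updates >= maxUpdates) return { w, updates, halted: false };
    w = w.map((wi, i) => wi + ys[n] * xs[n][i]);
    updates += 1;
  }
}

// Inputs with a leading constant 1 for the threshold.
const points = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]];
const xorLabels = [-1, 1, 1, -1]; // not linearly separable
const andLabels = [-1, -1, -1, 1]; // linearly separable

const xorRun = plaWithCap(points, xorLabels, 1000);
const andRun = plaWithCap(points, andLabels, 1000);
console.log("XOR halted:", xorRun.halted, "AND halted:", andRun.halted);
```

Raising the cap does not help in the XOR case: every weight vector gets at least one of the four points wrong.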

### Solution guaranteed when the data are linearly separable

Agreement between the prediction and the existing data can be summarized by:

$$y_n w_f^T x_n$$

, where $w_f$ is the "true and unknown" weight vector.

Since $w_f^T x_n$ always has the same sign as $y_n$, this product is never negative, and it is strictly positive as long as no $x_n$ lies exactly on the boundary $w_f^T x_n = 0$.
Hence the existence of a solution means that $\min_n y_n w_f^T x_n > 0$.

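How do we know the $w_t$ in PLA will get close enough to $w_f$?

With the update $w_{t+1} = w_t + y_n x_n$ we have

$$w_f^T w_{t+1} = w_f^T w_t + y_n w_f^T x_n \ge w_f^T w_t + \min_n y_n w_f^T x_n > w_f^T w_t$$

We can see that the inner product $w_f^T w_t$ gets larger as $t$ grows. This is a good sign, but we still need to check whether it grows because the two vectors become more and more similar in direction or merely because $w_t$ becomes larger in magnitude.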

Since $t$ only grows when there is a mistake, we know $2y_n w_t^T x_n \le 0$, hence with the update $w_{t+1} = w_t + y_n x_n$

$$\|w_{t+1}\|^2 = \|w_t + y_n x_n\|^2 = \|w_t\|^2 + 2 y_n w_t^T x_n + \|x_n\|^2 \le \|w_t\|^2 + \|x_n\|^2 \le \|w_t\|^2 + R^2$$

, where $R^2 = \max_n \|x_n\|^2$. So $w_t$ can only grow slowly in magnitude.

If we assume $w_0 = 0$, telescoping this inequality gives
$\|w_t\|^2 \le tR^2$
, or
$\|w_t\| \le \sqrt{t} R$.
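If we let

$$\rho = \min_n y_n w_f^T x_n$$

, then each update $w_{t+1} = w_t + y_n x_n$ gives

$$w_f^T w_{t+1} = w_f^T w_t + y_n w_f^T x_n \ge w_f^T w_t + \rho$$

Using telescope collapsing (again with $w_0 = 0$) we get $w_f^T w_t \ge t\rho$.

To eliminate the influence of magnitude we can normalize both $w_f$ and $w_t$ and check their inner product, using $\|w_t\| \le \sqrt{t} R$:

$$\frac{w_f^T}{\|w_f\|} \frac{w_t}{\|w_t\|} \ge \frac{t\rho}{\|w_f\| \sqrt{t} R} = \sqrt{t} \, \frac{\rho}{\|w_f\| R}$$

So, indeed, $w_f$ and $w_t$ are getting closer and closer to each other in direction. Let

$$C = \frac{\rho}{\|w_f\| R}$$

And since $\frac{w_f}{\|w_f\|} \cdot \frac{w_t}{\|w_t\|}$ is the inner product of two unit vectors and can never exceed 1, we know

$$\sqrt{t} \, C \le 1 \quad\Longrightarrow\quad t \le \frac{1}{C^2} = \frac{R^2 \|w_f\|^2}{\rho^2}$$

So, not only do we know that PLA will find a solution (provided there is one), we also know it will find it after at most $R^2 \|w_f\|^2 / \rho^2$ updates, which is a finite number.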
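The argument can be checked numerically on a small example. The sketch below counts the updates PLA makes and compares the count with $R^2 \|w_f\|^2 / (\min_n y_n w_f^T x_n)^2$, where $R^2 = \max_n \|x_n\|^2$. The data set and the separating vector `wf` are made up for illustration; in practice $w_f$ is unknown, so the bound shows that PLA halts without telling us when.

```javascript
const dot = (a, b) => a.reduce((sum, ai, i) => sum + ai * b[i], 0);
const sign = (v) => (v > 0 ? 1 : -1); // convention: sign(0) counts as -1

// Run PLA from w_0 = 0, always correcting the first misclassified example,
// and return the number of updates. Assumes linearly separable data.
function countPlaUpdates(xs, ys) {
  let w = new Array(xs[0].length).fill(0);
  let updates = 0;
  let n = xs.findIndex((x, i) => sign(dot(w, x)) !== ys[i]);
  while (n !== -1) {
    w = w.map((wi, i) => wi + ys[n] * xs[n][i]);
    updates += 1;
    n = xs.findIndex((x, i) => sign(dot(w, x)) !== ys[i]);
  }
  return updates;
}

// Upper bound R^2 * ||wf||^2 / rho^2 on the number of updates,
// for a given separating vector wf.
function updateBound(xs, ys, wf) {
  const rho = Math.min(...xs.map((x, n) => ys[n] * dot(wf, x)));
  const rSquared = Math.max(...xs.map((x) => dot(x, x)));
  return (rSquared * dot(wf, wf)) / (rho * rho);
}

// Made-up separable data (leading 1 = threshold coordinate) and one separator.
const xs = [
  [1, 0.5, 2.0], [1, 1.5, 1.0], [1, 2.0, 2.5],
  [1, -0.5, -1.0], [1, 0.5, -1.5], [1, -1.5, 0.5],
];
const ys = [1, 1, 1, -1, -1, -1];
const wf = [0, 1, 1];

const updates = countPlaUpdates(xs, ys);
const bound = updateBound(xs, ys, wf);
console.log("updates:", updates, "bound:", bound);
```

The bound is usually loose: it depends on which separating $w_f$ is plugged in, and any valid separator gives a valid (possibly different) bound.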

## Use Apache Spark in IntelliJ

Add this line to your `build.sbt`:

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2"


Do something like this:

    object Script3 {
      import org.apache.spark.SparkContext
      import org.apache.spark.SparkConf
      // local[4] means that you want Spark to run locally with 4 threads;
      // you can point it at a cluster when your app is production ready, of course
      val conf = new SparkConf().setAppName("appspark").setMaster("local[4]")
      val sc = new SparkContext(conf)
      // mtcars.txt must be on the classpath (e.g. in src/main/resources)
      val lines = sc.textFile(getClass.getResource("/mtcars.txt").toString)
      val lineLengths = lines.map(x => x.length)
      // total number of characters over all lines
      val totalLength = lineLengths.reduce(_ + _)
    }

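    object Script4 {
      import org.apache.log4j.Logger
      import org.apache.log4j.Level

      def main(args: Array[String]): Unit = {
        // Spark logs are too verbose by default;
        // I only want to see messages when something is wrong
        Logger.getLogger("org").setLevel(Level.WARN)
        Logger.getLogger("akka").setLevel(Level.WARN)

        // Script3 is initialized on first access, i.e. after the log levels are set
        import Script3._
        println(lines) // prints the RDD's description, not the file contents
        println(totalLength)
        sc.stop()
      }
    }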


## Visualize planet orbits with three.js

<!--
! Excerpted from "3D Game Programming for Kids",
! Copyrights apply to this code. It may not be used to create training material,
! courses, books, articles, and the like. Contact us if you are in doubt.
! We make no guarantees that this code is fit for any purpose.
! Visit http://www.pragmaticprogrammer.com/titles/csjava for more book information.
-->
<body></body>
<!--<script src="http://gamingJS.com/Three.js"></script>-->
<!--<script src="http://gamingJS.com/ChromeFixes.js"></script>-->
<script src="js/three.min.js"></script>
<script>
// This is where stuff in our game will happen:
var scene = new THREE.Scene();

// This is what sees the stuff:
var aspect_ratio = window.innerWidth / window.innerHeight;
var above_cam = new THREE.PerspectiveCamera(75, aspect_ratio, 1, 1e6);
above_cam.position.z = 1000;

var earth_cam = new THREE.PerspectiveCamera(75, aspect_ratio, 1, 1e6);

var camera = above_cam;

// This will draw what the camera sees onto the screen:
var renderer = new THREE.WebGLRenderer();
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

// ******** START CODING ON THE NEXT LINE ********
document.body.style.backgroundColor = 'black';

// Objects and lights only show up once they are added to the scene.
var surface = new THREE.MeshPhongMaterial({ambient: 0xFFD700});
var star = new THREE.SphereGeometry(50, 28, 21);
var sun = new THREE.Mesh(star, surface);
scene.add(sun);

var ambient = new THREE.AmbientLight(0xffffff);
scene.add(ambient);

var sunlight = new THREE.PointLight(0xffffff, 15, 1000, 1);
scene.add(sunlight);

var surface = new THREE.MeshPhongMaterial({ambient: 0x1a1a1a, color: 0x0000cd});
var planet = new THREE.SphereGeometry(20, 120, 115);
var earth = new THREE.Mesh(planet, surface);
earth.position.set(250, 0, 0);
scene.add(earth);

var surface = new THREE.MeshPhongMaterial({ambient: 0x1a1a1a, color: 0xb22222});
var planet = new THREE.SphereGeometry(20, 120, 115);
var mars = new THREE.Mesh(planet, surface);
mars.position.set(500, 0, 0);
scene.add(mars);

var clock = new THREE.Clock();

function animate() {
requestAnimationFrame(animate);

var time = clock.getElapsedTime();

var e_angle = time * 0.8;
earth.position.set(250 * Math.cos(e_angle), 250 * Math.sin(e_angle), 0);

var m_angle = time * 0.3;
mars.position.set(500 * Math.cos(m_angle), 500 * Math.sin(m_angle), 0);

var y_diff = mars.position.y - earth.position.y,
x_diff = mars.position.x - earth.position.x,
angle = Math.atan2(x_diff, y_diff);

// camera faces the same direction as Z-axis by default
earth_cam.rotation.set(Math.PI / 2, -angle, 0);
earth_cam.position.set(earth.position.x, earth.position.y, 22);

// Now, show what the camera sees on the screen:
renderer.render(scene, camera);
}

animate();

var stars = new THREE.Geometry();
while (stars.vertices.length < 1e4) {
var lat = Math.PI * Math.random() - Math.PI / 2;
var lon = 2 * Math.PI * Math.random();

stars.vertices.push(new THREE.Vector3(
1e5 * Math.cos(lon) * Math.cos(lat),
1e5 * Math.sin(lon) * Math.cos(lat),
1e5 * Math.sin(lat)
));
}
var star_stuff = new THREE.ParticleBasicMaterial({size: 500});
var star_system = new THREE.ParticleSystem(stars, star_stuff);
scene.add(star_system);

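// Switch cameras with the keyboard: A = view from above, E = view from Earth
document.addEventListener('keydown', function (event) {
var code = event.keyCode;

if (code == 65) { // A
camera = above_cam;
}
if (code == 69) { // E
camera = earth_cam;
}
});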

</script>
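The orbit logic inside `animate()` does not depend on three.js, so it can be tried on its own. The sketch below is a stand-alone restatement using the same radii and angular speeds as the listing; the helper names `orbitPosition` and `viewAngle` are made up for this illustration.

```javascript
// Position on a circular orbit of the given radius after `time` seconds,
// moving at `angularSpeed` radians per second (as in animate()).
function orbitPosition(radius, angularSpeed, time) {
  const angle = angularSpeed * time;
  return { x: radius * Math.cos(angle), y: radius * Math.sin(angle) };
}

// Angle used to point the Earth camera at Mars. The argument order
// (x difference first) mirrors the Math.atan2 call in the listing.
function viewAngle(from, to) {
  return Math.atan2(to.x - from.x, to.y - from.y);
}

const earth = orbitPosition(250, 0.8, 2);
const mars = orbitPosition(500, 0.3, 2);
console.log(earth, mars, viewAngle(earth, mars));
```

Because the angular speeds differ (0.8 versus 0.3), Earth repeatedly overtakes Mars, which is what makes the view from the Earth camera interesting to watch.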


## JavaScript execution order looks weird

    var mydata = [];
    d3.csv("https://raw.githubusercontent.com/kindlychung/cytob/master/data/hg19.csv", function (err, data) {
        var data1 = data.filter(function (d) {
            return d.chr == "chr22";
        });
        var innerdata = [];
        for (var i = 0; i < data1.length; i++) {
            mydata.push(data1[i].cyto);
            innerdata.push(data1[i].cyto);
        }
        console.log(innerdata);
        console.log("inside", mydata);
    });
    console.log("outside", mydata);


Result:

    outside []
    test.js:11 ["p13", "p12", "p11.2", "p11.1", "q11.1", "q11.21", "q11.22", "q11.23", "q12.1", "q12.2", "q12.3", "q13.1", "q13.2", "q13.31", "q13.32", "q13.33"]
    test.js:12 inside ["p13", "p12", "p11.2", "p11.1", "q11.1", "q11.21", "q11.22", "q11.23", "q12.1", "q12.2", "q12.3", "q13.1", "q13.2", "q13.31", "q13.32", "q13.33"]


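At first it looks strange that the outside log is executed before the inside log, but this is expected: `d3.csv` loads the file asynchronously. It returns immediately, the script carries on to the outside log while `mydata` is still empty, and the callback only runs once the data has arrived. Any code that depends on `mydata` therefore has to be called from inside the callback.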
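The same ordering can be reproduced without d3 or a network connection. In the sketch below, `fakeCsv` is a hypothetical stand-in for `d3.csv` that delivers a few made-up rows through `setTimeout`; it only illustrates the callback timing and is not how d3 is implemented.

```javascript
// Hypothetical stand-in for d3.csv: hands made-up rows to the callback
// asynchronously, i.e. only after the current script has finished running.
function fakeCsv(url, callback) {
  const rows = [
    { chr: "chr22", cyto: "p13" },
    { chr: "chr1", cyto: "p36.33" },
    { chr: "chr22", cyto: "p12" },
  ];
  setTimeout(() => callback(null, rows), 0);
}

const events = []; // records the order in which things happen
const mydata = [];

fakeCsv("hg19.csv", (err, data) => {
  data
    .filter((d) => d.chr === "chr22")
    .forEach((d) => mydata.push(d.cyto));
  events.push("inside " + mydata.length);
});

// Runs first: the callback above has only been scheduled, not executed.
events.push("outside " + mydata.length);

// Print the recorded order once the callback has had a chance to run.
setTimeout(() => console.log(events), 10);
```

The recorded order is the outside entry (with zero items) followed by the inside entry (with two items), matching the behaviour seen with `d3.csv`.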