Thursday, November 26, 2015

De Méré's puzzle

import scala.collection.mutable.ArrayBuffer

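// Enumerate all 6^3 ordered outcomes of three dice and collect
// the ones that sum to 11 and to 12.
object DeMere extends App {
  val sum11, sum12 = ArrayBuffer.empty[Array[Int]]
  for {
    i <- 1 to 6
    j <- 1 to 6
    k <- 1 to 6
  } {
    val s = i + j + k
    s match {
      case 11 => sum11 += Array(i, j, k)
      case 12 => sum12 += Array(i, j, k)
      case _  => // other sums are ignored
    }
  }
  val total = math.pow(6, 3) // 216 ordered outcomes, as a Double
  println(sum11.length / total) // probability that the sum is 11
  println(sum12.length / total) // probability that the sum is 12
}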
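As a cross-check of the enumeration above, here is a minimal JavaScript sketch of the same count. The function name countOutcomes is made up for this illustration; it just counts the ordered triples of dice that add up to a given target.

```javascript
// Count ordered outcomes (i, j, k) of three fair dice that sum to target.
function countOutcomes(target) {
  let count = 0;
  for (let i = 1; i <= 6; i++) {
    for (let j = 1; j <= 6; j++) {
      for (let k = 1; k <= 6; k++) {
        if (i + j + k === target) count++;
      }
    }
  }
  return count;
}

const total = 6 ** 3; // 216 ordered outcomes
const p11 = countOutcomes(11) / total;
const p12 = countOutcomes(12) / total;
console.log(countOutcomes(11), p11); // 27 outcomes, probability 0.125
console.log(countOutcomes(12), p12); // 25 outcomes
```

Counting ordered triples is the point of the puzzle: 11 and 12 can each be written as six unordered combinations, but those combinations are not equally likely.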

Monday, November 23, 2015

Machine learning foundation 2

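Hoeffding’s inequality

The sample mean is unlikely to be far from the true mean when the sample size \(N\) is large:

\[P\left[|\nu - \mu| > \epsilon\right] \le 2\exp\left(-2\epsilon^2 N\right)\]

, where \(\nu\) is the sample mean and \(\mu\) is the true mean. In other words, the statement \(\nu = \mu\) is probably approximately correct (PAC).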
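To get a feeling for the numbers, here is a small JavaScript sketch. It assumes the usual form of the bound, P[|ν − μ| > ε] ≤ 2 exp(−2ε²N), for a quantity bounded in [0, 1]; the function names hoeffdingBound and sampleSizeFor are made up for this illustration.

```javascript
// Upper bound on P[|nu - mu| > eps] for a sample of size n
// (assumes the bounded quantity lies in [0, 1]).
function hoeffdingBound(eps, n) {
  return 2 * Math.exp(-2 * eps * eps * n);
}

// Smallest sample size for which the bound is at most delta,
// obtained by solving 2 * exp(-2 * eps^2 * n) <= delta for n.
function sampleSizeFor(eps, delta) {
  return Math.ceil(Math.log(2 / delta) / (2 * eps * eps));
}

console.log(hoeffdingBound(0.1, 100));
console.log(sampleSizeFor(0.1, 0.05));
```

The bound does not depend on the true mean at all, which is what makes it usable when the true mean is unknown.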

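When applied to machine learning, this means that for a fixed hypothesis \(h\) and a large enough sample, the in-sample error \(E_{in}(h)\) is probably close to the out-of-sample error \(E_{out}(h)\):

\[P\left[|E_{in}(h) - E_{out}(h)| > \epsilon\right] \le 2\exp\left(-2\epsilon^2 N\right)\]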

Sunday, November 22, 2015

Machine learning foundation 1

Elements

We have

  • Unknown target function (underlying the data)
  • Training examples (the data)
  • Hypothesis set (possible approximations to the target function)
  • Learning algorithm (for choosing the “best” from the hypothesis set)
  • Final hypothesis (result of learning)

Perceptrons

Hypothesis in vector form:

\[h(x) = \mathrm{sign}(w^T x)\]

\(x\) is a vector here, and so is the weight vector \(w\).

Perceptron Learning Algorithm (PLA)

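\(w_t\) and \(x_n\) are vectors here.

  1. Start with an arbitrary \(w_0\), say \(w_0 = 0\).
  2. On the first case where \(\mathrm{sign}(w_t^T x_n) \ne y_n\), update like this: \(w_{t+1} = w_t + y_n x_n\). This works because \(y_n w_{t+1}^T x_n = y_n (w_t + y_n x_n)^T x_n\), or \(y_n w_{t+1}^T x_n = y_n w_t^T x_n + \|x_n\|^2\), or \(y_n w_{t+1}^T x_n \ge y_n w_t^T x_n\), which guarantees that the iteration of \(w_t\) is in the direction of \(y_n\).
  3. Continue iterating until no mistake occurs for any \(n\) in a full loop.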
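The steps above can be sketched in JavaScript as follows. This is a minimal illustration, not a reference implementation: it assumes each input has a leading 1 so that the threshold is absorbed into the weights, treats sign(0) as -1, cycles through the examples in order, and caps the number of updates with a made-up maxUpdates parameter so it cannot loop forever on non-separable data. The toy dataset is invented for the example.

```javascript
function sign(v) { return v > 0 ? 1 : -1; } // assumption: sign(0) counts as -1
function dot(a, b) { return a.reduce((s, ai, i) => s + ai * b[i], 0); }

// Cyclic PLA: sweep over the examples, correct w on every mistake,
// and stop after a full sweep without mistakes.
function pla(xs, ys, maxUpdates = 1000) {
  let w = new Array(xs[0].length).fill(0); // start from w_0 = 0
  let updates = 0;
  let mistake = true;
  while (mistake && updates < maxUpdates) {
    mistake = false;
    for (let n = 0; n < xs.length; n++) {
      if (sign(dot(w, xs[n])) !== ys[n]) {
        w = w.map((wi, i) => wi + ys[n] * xs[n][i]); // w <- w + y_n * x_n
        updates++;
        mistake = true;
      }
    }
  }
  return { w: w, updates: updates, converged: !mistake };
}

// Toy, linearly separable data (first component is the constant 1).
const xs = [[1, 2, 2], [1, 3, 1], [1, -1, -2], [1, -2, -1]];
const ys = [1, 1, -1, -1];
const result = pla(xs, ys);
console.log(result);
```

Returning the update count alongside the weights makes it easy to see how quickly the algorithm stops on separable data.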

Questions about the PLA

  • Will it ever stop?
  • When it stops, will the result be close to the (unknown) target function?
  • Will it ever make a mistake (inside/outside your dataset)?

Linear separability

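PLA does not always halt: if no line separates the two classes, some example is always misclassified and the updates never stop.

A solution is guaranteed when the data are linearly separable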

Agreement between prediction and existing data can be summarized by:

\[y_n w_f^T x_n\]

, where \(w_f\) is the "true and unknown" weight vector.

Since \(y_n\) always has the same sign as \(w_f^T x_n\), this should be non-negative.
Hence existence of a solution means that \(\min_n y_n w_f^T x_n > 0\).

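How do we know the \(w_t\) in PLA will get close enough to \(w_f\)?

\[w_f^T w_{t+1} = w_f^T (w_t + y_n x_n) \ge w_f^T w_t + \min_n y_n w_f^T x_n > w_f^T w_t\]

We can see that the inner product \(w_f^T w_t\) gets larger as \(t\) grows. This is a good sign, but we still need to check whether this is because they become more and more similar in direction or because \(w_t\) becomes larger in magnitude.

Since \(t\) only grows when there is a mistake, we know \(2y_n w_t^T x_n \le 0\), hence

\[\|w_{t+1}\|^2 = \|w_t\|^2 + 2y_n w_t^T x_n + \|y_n x_n\|^2 \le \|w_t\|^2 + \|x_n\|^2 \le \|w_t\|^2 + R^2\]

, where \(R\) is the max magnitude of all possible \(x_n\). At least only limited growth is seen in \(\|w_t\|\).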

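If we assume \(w_0 = 0\), telescoping over \(t\) updates gives
\[\|w_t\|^2 \le tR^2\]
, or
\[\|w_t\| \le \sqrt{t}R\]
If we let

\[\rho = \min_n y_n w_f^T x_n\]

, then

\[w_f^T w_{t+1} \ge w_f^T w_t + \rho\]

Using telescope collapsing we get \(w_f^T w_t \ge t\rho\).

To eliminate influence from magnitude we can normalize both \(w_f\) and \(w_t\) and check their inner product:

\[\frac{w_f^T w_t}{\|w_f\| \|w_t\|} \ge \frac{t\rho}{\|w_f\| \sqrt{t} R} = \sqrt{t} \frac{\rho}{\|w_f\| R}\]

So, indeed, \(w_f\) and \(w_t\) are getting closer and closer to each other in direction. Let

\[C = \frac{\rho}{\|w_f\| R}\]

And since the normalized inner product can never exceed 1, we know that

\[\sqrt{t}\, C \le 1 \quad \text{, i.e.} \quad t \le \frac{1}{C^2} = \frac{R^2 \|w_f\|^2}{\rho^2}\]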

So, not only do we know that PLA will find the solution (provided there is a solution), we also know it will find it within a bounded, finite number of steps.
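As a rough numerical illustration, the sketch below evaluates the standard form of this bound, T ≤ R²‖w_f‖²/ρ², where ρ is the smallest value of y_n w_f^T x_n over the data and R is the largest ‖x_n‖. The function name plaUpdateBound, the toy data and the choice of w_f are all made up for the example; in practice w_f is unknown, so the bound is a guarantee rather than something you can compute up front.

```javascript
function dot(a, b) { return a.reduce((s, ai, i) => s + ai * b[i], 0); }

// Upper bound on the number of PLA updates, starting from w_0 = 0:
// R^2 * ||wf||^2 / rho^2, with rho = min_n y_n * wf . x_n
function plaUpdateBound(xs, ys, wf) {
  const rho = Math.min(...xs.map((x, n) => ys[n] * dot(wf, x)));
  if (rho <= 0) return Infinity; // wf does not separate the data
  const r2 = Math.max(...xs.map((x) => dot(x, x))); // R^2
  return (r2 * dot(wf, wf)) / (rho * rho);
}

// Toy data (first component is the constant 1) and a hypothetical
// separating weight vector wf.
const xs = [[1, 2, 2], [1, 3, 1], [1, -1, -2], [1, -2, -1]];
const ys = [1, 1, -1, -1];
const wf = [0, 1, 1];
const bound = plaUpdateBound(xs, ys, wf);
console.log(bound); // 22 / 9, so at most 2 updates on this toy data
```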

Saturday, November 14, 2015

Use Apache Spark in IntelliJ

Add this line in your build.sbt:

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2"

Do something like this:

object Script3 {
  import org.apache.spark.SparkContext
  import org.apache.spark.SparkConf
  // local[4] means that you want spark to run locally with 4 threads 
  // you can use a cluster when your app is production ready, of course
  val conf = new SparkConf().setAppName("appspark").setMaster("local[4]")
  val sc = new SparkContext(conf)
  val lines = sc.textFile(getClass.getResource("/mtcars.txt").toString)
  val lineLengths = lines.map(x => x.length)
  val totalLength = lineLengths.reduce(_ + _)
}

object Script4 {
  import Script3._

  // Spark logs are too verbose by default;
  // I only want to see messages when something goes wrong
  import org.apache.log4j.Logger
  import org.apache.log4j.Level
  Logger.getLogger("org").setLevel(Level.WARN)
  Logger.getLogger("akka").setLevel(Level.WARN)
  println(lines)
}

Friday, November 13, 2015

Visualize planet orbits with three.js

<!--
 ! Excerpted from "3D Game Programming for Kids",
 ! published by The Pragmatic Bookshelf.
 ! Copyrights apply to this code. It may not be used to create training material,
 ! courses, books, articles, and the like. Contact us if you are in doubt.
 ! We make no guarantees that this code is fit for any purpose.
 ! Visit http://www.pragmaticprogrammer.com/titles/csjava for more book information.
-->
<body></body>
<!--<script src="http://gamingJS.com/Three.js"></script>-->
<!--<script src="http://gamingJS.com/ChromeFixes.js"></script>-->
<script src="js/three.min.js"></script>
<script>
    // This is where stuff in our game will happen:
    var scene = new THREE.Scene();

    // This is what sees the stuff:
    var aspect_ratio = window.innerWidth / window.innerHeight;
    var above_cam = new THREE.PerspectiveCamera(75, aspect_ratio, 1, 1e6);
    above_cam.position.z = 1000;
    scene.add(above_cam);

    var earth_cam = new THREE.PerspectiveCamera(75, aspect_ratio, 1, 1e6);
    scene.add(earth_cam);

    var camera = above_cam;

    // This will draw what the camera sees onto the screen:
    var renderer = new THREE.WebGLRenderer();
    renderer.setSize(window.innerWidth, window.innerHeight);
    document.body.appendChild(renderer.domElement);

    // ******** START CODING ON THE NEXT LINE ********
    document.body.style.backgroundColor = 'black';

    var surface = new THREE.MeshPhongMaterial({ambient: 0xFFD700});
    var star = new THREE.SphereGeometry(50, 28, 21);
    var sun = new THREE.Mesh(star, surface);
    scene.add(sun);

    var ambient = new THREE.AmbientLight(0xffffff);
    scene.add(ambient);

    var sunlight = new THREE.PointLight(0xffffff, 15, 1000, 1);
    sun.add(sunlight);

    var surface = new THREE.MeshPhongMaterial({ambient: 0x1a1a1a, color: 0x0000cd});
    var planet = new THREE.SphereGeometry(20, 120, 115);
    var earth = new THREE.Mesh(planet, surface);
    earth.position.set(250, 0, 0);
    scene.add(earth);

    var surface = new THREE.MeshPhongMaterial({ambient: 0x1a1a1a, color: 0xb22222});
    var planet = new THREE.SphereGeometry(20, 120, 115);
    var mars = new THREE.Mesh(planet, surface);
    mars.position.set(500, 0, 0);
    scene.add(mars);

    // Declare clock explicitly instead of creating an implicit global:
    var clock = new THREE.Clock();

    function animate() {
        requestAnimationFrame(animate);

        var time = clock.getElapsedTime();

        var e_angle = time * 0.8;
        earth.position.set(250 * Math.cos(e_angle), 250 * Math.sin(e_angle), 0);

        var m_angle = time * 0.3;
        mars.position.set(500 * Math.cos(m_angle), 500 * Math.sin(m_angle), 0);

        var y_diff = mars.position.y - earth.position.y,
                x_diff = mars.position.x - earth.position.x,
                angle = Math.atan2(x_diff, y_diff);

        // http://fs5.directupload.net/images/151113/aqz9jn7v.jpg
        // camera faces the same direction as Z-axis by default
        earth_cam.rotation.set(Math.PI / 2, -angle, 0);
        earth_cam.position.set(earth.position.x, earth.position.y, 22);

        // Now, show what the camera sees on the screen:
        renderer.render(scene, camera);
    }

    animate();

    var stars = new THREE.Geometry();
    while (stars.vertices.length < 1e4) {
        var lat = Math.PI * Math.random() - Math.PI / 2;
        var lon = 2 * Math.PI * Math.random();

        stars.vertices.push(new THREE.Vector3(
                1e5 * Math.cos(lon) * Math.cos(lat),
                1e5 * Math.sin(lon) * Math.cos(lat),
                1e5 * Math.sin(lat)
        ));
    }
    var star_stuff = new THREE.ParticleBasicMaterial({size: 500});
    var star_system = new THREE.ParticleSystem(stars, star_stuff);
    scene.add(star_system);

    document.addEventListener("keydown", function (event) {
        var code = event.keyCode;

        if (code == 65) { // A
            camera = above_cam;
        }
        if (code == 69) { // E
            camera = earth_cam;
        }
    });

</script>

Monday, November 2, 2015

Javascript execution weird

var mydata = []
d3.csv("https://raw.githubusercontent.com/kindlychung/cytob/master/data/hg19.csv", function (err, data) {
    var data1 = data.filter(function (d) {
        return d.chr == "chr22";
    })
    var innerdata = [];
    for(var i = 0; i < data1.length; i++) {
        mydata.push(data1[i].cyto);
        innerdata.push(data1[i].cyto);
    }
    console.log(innerdata);
    console.log("inside", mydata);
})
console.log("outside", mydata);

Result:

outside []
test.js:11 ["p13", "p12", "p11.2", "p11.1", "q11.1", "q11.21", "q11.22", "q11.23", "q12.1", "q12.2", "q12.3", "q13.1", "q13.2", "q13.31", "q13.32", "q13.33"]
test.js:12 inside ["p13", "p12", "p11.2", "p11.1", "q11.1", "q11.21", "q11.22", "q11.23", "q12.1", "q12.2", "q12.3", "q13.1", "q13.2", "q13.31", "q13.32", "q13.33"]

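At first it looks strange that the outside log is executed before the inside log. The reason is that d3.csv is asynchronous: it returns immediately, and the callback only runs once the file has been fetched, by which time the last line of the script has already run.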
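The same ordering can be reproduced without d3 or a network request. In the sketch below, fakeCsv is a hypothetical stand-in for an asynchronous loader (it is not part of d3): it schedules the callback with setTimeout and returns right away. The order array is only there to record which line ran first.

```javascript
const order = [];  // records the order in which the two logs run
const mydata = [];

// Hypothetical stand-in for an asynchronous loader such as d3.csv:
// it returns immediately and invokes the callback later.
function fakeCsv(rows, callback) {
  setTimeout(function () {
    callback(null, rows);
  }, 0);
}

fakeCsv(["p13", "p12", "p11.2"], function (err, data) {
  data.forEach(function (d) { mydata.push(d); });
  order.push("inside");
  console.log("inside", mydata); // runs later, once the callback fires
});

order.push("outside");
console.log("outside", mydata); // runs first; mydata is still empty here
```

Anything that needs the loaded data has to live inside the callback, or be called from it.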