Friday, October 16, 2015

Set up Spark 1.4.1 in IntelliJ IDEA 14.1

Here is the build.sbt file:

name := "scalaChartTest"

version := "1.0"

scalaVersion := "2.10.5"

libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.4.1"

Note that Spark 1.4.1 is built against Scala 2.10 by default, so the scalaVersion setting and the _2.10 suffix on the spark-core artifact above have to agree.
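If you would rather not spell the suffix out by hand, sbt's %% operator appends the Scala binary version for you, so the following dependency line should resolve to the same artifact:

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1"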

Now create a file SparkTest.scala:

import org.apache.spark.{SparkContext, SparkConf}

/**
 * A minimal Spark application: configure a local master and create a SparkContext.
 */
object SparkTest {
  def main(args: Array[String]): Unit = {
    // "local" runs Spark in-process with a single worker thread
    val conf = new SparkConf().setAppName("sparktest").setMaster("local")
    val sc = new SparkContext(conf)
    // a local collection to hand to Spark later
    val data = Array(1, 2, 3, 4, 5)
  }
}
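As written, data is just a local array that Spark never touches. To make the job do some actual work, you could extend the end of main along these lines (a sketch of my own, not part of the original program; distData is a name I picked):

    // distribute the local array as an RDD across the (single, local) executor
    val distData = sc.parallelize(data)
    // reduce is an action, so it triggers computation; this should print 15
    println(distData.reduce(_ + _))
    // shut the context down cleanly when done
    sc.stop()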

If you run the program (Ctrl + Shift + R on OS X), you will see a warning among the startup logs:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/09/05 18:18:03 INFO SparkContext: Running Spark version 1.4.1
15/09/05 18:18:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/05 18:18:04 INFO SecurityManager: Changing view acls to: kaiyin
15/09/05 18:18:04 INFO SecurityManager: Changing modify acls to: kaiyin
15/09/05 18:18:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kaiyin); users with modify permissions: Set(kaiyin)
15/09/05 18:18:04 INFO Slf4jLogger: Slf4jLogger started
15/09/05 18:18:04 INFO Remoting: Starting remoting
15/09/05 18:18:04 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.14:62624]
15/09/05 18:18:04 INFO Utils: Successfully started service 'sparkDriver' on port 62624.
15/09/05 18:18:04 INFO SparkEnv: Registering MapOutputTracker
15/09/05 18:18:04 INFO SparkEnv: Registering BlockManagerMaster
15/09/05 18:18:04 INFO DiskBlockManager: Created local directory at /private/var/folders/5d/44ctbbln4dsflgzxph1dm8wr0000gn/T/spark-dd028760-b0af-41f4-b964-684409f4ea69/blockmgr-eec5ca8f-e35f-4d76-a96b-d1323e5fd525
15/09/05 18:18:04 INFO MemoryStore: MemoryStore started with capacity 2.4 GB
15/09/05 18:18:04 INFO HttpFileServer: HTTP File server directory is /private/var/folders/5d/44ctbbln4dsflgzxph1dm8wr0000gn/T/spark-dd028760-b0af-41f4-b964-684409f4ea69/httpd-c9097cfb-fb79-4a2e-82d3-4dcbb5b1ec3b
15/09/05 18:18:04 INFO HttpServer: Starting HTTP Server
15/09/05 18:18:04 INFO Utils: Successfully started service 'HTTP file server' on port 62625.
15/09/05 18:18:04 INFO SparkEnv: Registering OutputCommitCoordinator
15/09/05 18:18:04 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/09/05 18:18:04 INFO SparkUI: Started SparkUI at http://192.168.1.14:4040
15/09/05 18:18:05 INFO Executor: Starting executor ID driver on host localhost
15/09/05 18:18:05 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 62626.
15/09/05 18:18:05 INFO NettyBlockTransferService: Server created on 62626
15/09/05 18:18:05 INFO BlockManagerMaster: Trying to register BlockManager
15/09/05 18:18:05 INFO BlockManagerMasterEndpoint: Registering block manager localhost:62626 with 2.4 GB RAM, BlockManagerId(driver, localhost, 62626)
15/09/05 18:18:05 INFO BlockManagerMaster: Registered BlockManager

So the native Hadoop library can't be loaded on OS X. That's fine: Hadoop doesn't ship a native library for OS X in the first place, so Spark simply falls back to the built-in Java classes, just as the warning says.
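If the INFO chatter bothers you, Spark 1.4 added SparkContext.setLogLevel, so you can turn the verbosity down programmatically right after creating the context (a sketch; "WARN" is one of the standard log4j level names):

// only show WARN and above from Spark's loggers
sc.setLogLevel("WARN")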

Now create a toy file:

echo -e 'a\nb\nc\nd' > /tmp/readme

Start a Scala console in IntelliJ and do the following:
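A minimal sketch of such a session, assuming the same local SparkContext sc as in SparkTest above (the val name lines is my own):

// read the file as an RDD, one element per line
val lines = sc.textFile("/tmp/readme")
// the toy file has four lines: a, b, c, d
println(lines.count())   // should print 4
println(lines.first())   // should print a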
