ubuntu - Installing Spark on a single machine

Tags: ubuntu apache-spark

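I need to install Spark on a single machine running Ubuntu 14.04. I need it mainly for educational purposes, so I am not very interested in high performance.

I don't know enough to follow the tutorial at http://spark.apache.org/docs/1.2.0/spark-standalone.html, and I don't know which version of Spark I should install.

Can someone explain, step by step, how to set up a working Spark system on my machine?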

Edit: Following the comments and the current answer, I can now start the Spark shell and use it.

    donbeo@donbeo-HP-EliteBook-Folio-9470m:~/Applications/spark/spark-1.1.0$ ./bin/spark-shell
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    15/02/04 10:20:20 INFO SecurityManager: Changing view acls to: donbeo,
    15/02/04 10:20:20 INFO SecurityManager: Changing modify acls to: donbeo,
    15/02/04 10:20:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(donbeo, ); users with modify permissions: Set(donbeo, )
    15/02/04 10:20:20 INFO HttpServer: Starting HTTP Server
    15/02/04 10:20:20 INFO Utils: Successfully started service 'HTTP class server' on port 48135.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 1.1.0

    Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_75)
    Type in expressions to have them evaluated.
    Type :help for more information.
    15/02/04 10:20:23 WARN Utils: Your hostname, donbeo-HP-EliteBook-Folio-9470m resolves to a loopback address:; using instead (on interface wlan0)
    15/02/04 10:20:23 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
    15/02/04 10:20:23 INFO SecurityManager: Changing view acls to: donbeo,
    15/02/04 10:20:23 INFO SecurityManager: Changing modify acls to: donbeo,
    15/02/04 10:20:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(donbeo, ); users with modify permissions: Set(donbeo, )
    15/02/04 10:20:23 INFO Slf4jLogger: Slf4jLogger started
    15/02/04 10:20:23 INFO Remoting: Starting remoting
    15/02/04 10:20:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.45:34171]
    15/02/04 10:20:23 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@192.168.1.45:34171]
    15/02/04 10:20:23 INFO Utils: Successfully started service 'sparkDriver' on port 34171.
    15/02/04 10:20:23 INFO SparkEnv: Registering MapOutputTracker
    15/02/04 10:20:23 INFO SparkEnv: Registering BlockManagerMaster
    15/02/04 10:20:24 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150204102024-1e7b
    15/02/04 10:20:24 INFO Utils: Successfully started service 'Connection manager for block manager' on port 44926.
    15/02/04 10:20:24 INFO ConnectionManager: Bound socket to port 44926 with id = ConnectionManagerId(,44926)
    15/02/04 10:20:24 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
    15/02/04 10:20:24 INFO BlockManagerMaster: Trying to register BlockManager
    15/02/04 10:20:24 INFO BlockManagerMasterActor: Registering block manager with 265.4 MB RAM
    15/02/04 10:20:24 INFO BlockManagerMaster: Registered BlockManager
    15/02/04 10:20:24 INFO HttpFileServer: HTTP File server directory is /tmp/spark-58772693-4106-4ff0-a333-6512bcfff504
    15/02/04 10:20:24 INFO HttpServer: Starting HTTP Server
    15/02/04 10:20:24 INFO Utils: Successfully started service 'HTTP file server' on port 51677.
    15/02/04 10:20:24 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    15/02/04 10:20:24 INFO SparkUI: Started SparkUI at
    15/02/04 10:20:24 INFO Executor: Using REPL class URI:
    15/02/04 10:20:24 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.1.45:34171/user/HeartbeatReceiver
    15/02/04 10:20:24 INFO SparkILoop: Created spark context..
    Spark context available as sc.

    scala> val x = 3
    x: Int = 3


Now suppose I want to use Spark from a Scala file, for example

    /* SimpleApp.scala */
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf

    object SimpleApp {
      def main(args: Array[String]): Unit = {
        val logFile = "YOUR_SPARK_HOME/README.md" // Should be some file on your system
        // No master URL is set here, so it has to come from the launcher
        val conf = new SparkConf().setAppName("Simple Application")
        val sc = new SparkContext(conf)
        // Read the file into 2 partitions and cache it, since it is scanned twice
        val logData = sc.textFile(logFile, 2).cache()
        val numAs = logData.filter(line => line.contains("a")).count()
        val numBs = logData.filter(line => line.contains("b")).count()
        println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
      }
    }



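If you only plan to run it on a single machine, for learning and the like, you can use local (1 core) or local[*] (all cores) as the value of the "master". It then runs like an ordinary JVM process, even inside an IDE, a debugger, and so on. I wrote a do-it-yourself workshop that works this way, https://github.com/deanwampler/spark-workshop, if you need examples.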
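As a rough sketch of that advice, the application from the question could set the master in code. The object name LocalSimpleApp, the optional command-line argument for the input path, and the fallback to README.md are assumptions made for illustration; they do not come from the question or the workshop.

```scala
/* LocalSimpleApp.scala (hypothetical name, adapted from SimpleApp in the question) */
import org.apache.spark.{SparkConf, SparkContext}

object LocalSimpleApp {
  def main(args: Array[String]): Unit = {
    // Assumption: the first argument, if given, is a text file on this machine.
    val logFile = args.headOption.getOrElse("README.md")

    val conf = new SparkConf()
      .setAppName("Simple Application")
      .setMaster("local[*]") // run inside this JVM on all cores; "local" uses one core

    val sc = new SparkContext(conf)
    try {
      // Cache the lines because they are scanned twice below
      val logData = sc.textFile(logFile, 2).cache()
      val numAs = logData.filter(line => line.contains("a")).count()
      val numBs = logData.filter(line => line.contains("b")).count()
      println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    } finally {
      sc.stop() // shut down the local Spark context
    }
  }
}
```

With the master fixed to local[*], the object can be started like any other main class, for instance from an IDE run configuration, provided the Spark jars are on the classpath. Hard-coding the master is convenient for learning; for anything launched on a cluster it is usually left out of the code and supplied at launch time.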


Regarding "ubuntu - Installing Spark on a single machine", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28308785/

