
Spark Part 1: Quick Start

@(SPARK)[spark, big data]

Spark can be used in two ways: through the interactive command line or programmatically:
the former supports Scala and Python;
the latter supports Scala, Python, and Java.

This article follows https://spark.apache.org/docs/latest/quick-start.html and can serve as a quick start.

For more detailed material and usage, see https://spark.apache.org/docs/latest/programming-guide.html

Suggested learning path:

1. Set up a standalone environment: http://blog.csdn.net/jediael_lu/article/details/45310321

2. Go through the quick start to get a rough picture: this article, http://blog.csdn.net/jediael_lu/article/details/45333195

3. Learn Scala

4. Go a bit deeper: https://spark.apache.org/docs/latest/programming-guide.html

5. Look for other specialized material, or learn as you use Spark

I. Basics

1. All Spark operations are performed on RDDs (Resilient Distributed Datasets). "Resilient" refers to fault tolerance: lost partitions can be recomputed from the RDD's lineage, and the data can be kept in memory or spilled to storage as needed.

2. RDD operations fall into two categories: transformations and actions. A transformation produces a new RDD from an existing one (e.g., filter), while an action computes a result from an RDD (e.g., count). Transformations are lazy; nothing is computed until an action asks for a result, as the sketch below illustrates.
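
A minimal sketch of the distinction, written for the spark-shell (where a SparkContext named sc is already defined); the input path is hypothetical:

// filter is a transformation: it only records the computation and returns a new RDD
val lines = sc.textFile("/tmp/input.txt")                      // hypothetical path
val errorLines = lines.filter(line => line.contains("ERROR"))  // nothing has run yet

// count is an action: it triggers the actual computation and returns a value to the driver
val numErrors = errorLines.count()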

II. The Interactive Shell

As long as Java and Scala are available on the machine, Spark needs no further installation or configuration and can be run directly. First run:

./bin/run-example SparkPi 10

to check that everything works, then use the shell described below to run jobs.

1. Quick Start

$ ./bin/spark-shell

(1) First read a file into an RDD, then count the number of lines in the file and display the first line.

scala> var textFile = sc.textFile("/mnt/jediael/spark-1.3.1-bin-hadoop2.6/README.md")
textFile: org.apache.spark.rdd.RDD[String] = /mnt/jediael/spark-1.3.1-bin-hadoop2.6/README.md MapPartitionsRDD[1] at textFile at <console>:21

scala> textFile.count()
res0: Long = 98

scala> textFile.first();
res1: String = # Apache Spark

(2) Count the number of lines containing "Spark"

scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at <console>:23

scala> linesWithSpark.count()
res0: Long = 19

(3) The filter and count above can be chained together

scala> textFile.filter(line => line.contains("Spark")).count()
res1: Long = 19

2. Going a Bit Deeper

(1) Use map to compute the number of words on each line, and reduce to find the word count of the line with the most words

scala> textFile.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b)
res2: Int = 14

(2) Java packages can be called directly from Scala

scala> import java.lang.Math
import java.lang.Math

scala> textFile.map(line => line.split(" ").size).reduce((a, b) => Math.max(a, b))
res2: Int = 14

(3) Implementing word count

scala> val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[8] at reduceByKey at <console>:24

scala> wordCounts.collect()
res4: Array[(String, Int)] = Array((package,1), (For,2), (processing.,1), (Programs,1), (Because,1), (The,1), (cluster.,1), (its,1), ([run,1), (APIs,1), (computation,1), (Try,1), (have,1), (through,1), (several,1), (This,2), ("yarn-cluster",1), (graph,1), (Hive,2), (storage,1), (["Specifying,1), (To,2), (page](http://spark.apache.org/documentation.html),1), (Once,1), (application,1), (prefer,1), (SparkPi,2), (engine,1), (version,1), (file,1), (documentation,,1), (processing,,2), (the,21), (are,1), (systems.,1), (params,1), (not,1), (different,1), (refer,2), (Interactive,2), (given.,1), (if,4), (build,3), (when,1), (be,2), (Tests,1), (Apache,1), (all,1), (./bin/run-example,2), (programs,,1), (including,3), (Spark.,1), (package.,1), (1000).count(),1), (HDFS,1), (Versions,1), (Data.,1), (>...

3. Caching: caching an RDD greatly improves the efficiency of repeated processing

scala> linesWithSpark.cache()
res5: linesWithSpark.type = MapPartitionsRDD[2] at filter at <console>:23

scala> linesWithSpark.count()
res8: Long = 19
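
cache() is shorthand for persisting an RDD with the default MEMORY_ONLY storage level; a storage level can also be chosen explicitly. A minimal sketch for the spark-shell, with a hypothetical input path:

import org.apache.spark.storage.StorageLevel

// hypothetical input file
val logs = sc.textFile("/tmp/app.log")

// keep the RDD in memory, spilling partitions to disk if they do not fit
logs.persist(StorageLevel.MEMORY_AND_DISK)

logs.count()   // the first action materializes and caches the RDD
logs.count()   // later actions reuse the cached partitions

// release the cached data once it is no longer needed
logs.unpersist()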

III. Writing a Program

The following is Scala code; I am not yet familiar with Scala, so I will run it later.

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "YOUR_SPARK_HOME/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
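
To actually run this standalone application, the quick-start guide referenced above packages it with sbt and submits it with spark-submit. A rough sketch, assuming Spark 1.3.1 (the version in the shell transcript above) and a hypothetical project name and jar path:

// simple.sbt (sbt build definition)
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1"

Build the jar with `sbt package`, then submit it (the jar path depends on your project layout):

$ ./bin/spark-submit --class "SimpleApp" --master local[4] target/scala-2.10/simple-project_2.10-1.0.jar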
