Building a Spark Project with sbt (scala+spark+sbt)


Preparation: the file layout is as follows:

(python2.7) appleyuchi@ubuntu:~/Desktop/WordCount$ tree
.
├── build.sbt
└── src
    └── main
        └── scala
            └── WordCount.scala

WordCount.scala contains:

import org.apache.spark._
import org.apache.spark.SparkContext._

object WordCount {
  def main(args: Array[String]) {
    val inputFile = args(0)
    val outputFile = args(1)
    val conf = new SparkConf().setAppName("wordCount")
    // Create a Scala Spark Context.
    val sc = new SparkContext(conf)
    // Load our input data.
    val input = sc.textFile(inputFile)
    // Split up into words.
    val words = input.flatMap(line => line.split(" "))
    // Transform into word and count.
    val counts = words.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }
    // Save the word count back out to a text file, causing evaluation.
    counts.saveAsTextFile(outputFile)
  }
}
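Side note: SparkContext is the classic entry point and works fine here. Since the build targets Spark 2.2.0, the same program could also be written against the Spark 2.x SparkSession API. This is a minimal sketch, not the original code, and it would additionally need the spark-sql dependency alongside spark-core in build.sbt (shown next):

import org.apache.spark.sql.SparkSession

object WordCountSession {
  def main(args: Array[String]): Unit = {
    // SparkSession wraps SparkContext in Spark 2.x.
    val spark = SparkSession.builder().appName("wordCount").getOrCreate()
    val sc = spark.sparkContext
    sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)      // same aggregation as the case-pattern version above
      .saveAsTextFile(args(1))
    spark.stop()
  }
}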

The contents of build.sbt:

name := "learning-spark-mini-example"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0" % "provided"

The "provided" scope keeps spark-core out of the packaged jar, since spark-submit supplies Spark's classes at runtime. Note also that Spark 2.2.0 is built for Scala 2.11, so scalaVersion must stay on a 2.11.x release.


Note that the HDFS file system is not visible from the ordinary Linux file system: a directory created with hdfs dfs -mkdir will not show up anywhere under Linux.

Both the input and the output of this experiment live in HDFS, so they cannot be browsed directly from Linux.
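Once HDFS is running (step 1 below), you can also confirm this separation programmatically: Spark ships with the Hadoop FileSystem API, so a few lines in spark-shell will list the HDFS namespace directly. This is a sketch; the URI and path assume the setup used in this walkthrough:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Connect to the HDFS namenode, a namespace separate from the local Linux file system.
val fs = FileSystem.get(new java.net.URI("hdfs://localhost:9000"), new Configuration())
fs.listStatus(new Path("/user/appleyuchi")).foreach(status => println(status.getPath))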


The detailed steps follow.

1.
Before running this example, HDFS must be started; otherwise the job fails with "connection refused". The command is:
./start-dfs.sh
Then run jps to check that both the NameNode and DataNode processes are up.
2. Copy the file README.txt from the Linux file system into HDFS:

hdfs dfs -mkdir /user/appleyuchi
hdfs dfs -put README.txt /user/appleyuchi

3. Run sbt package. This resolves the dependencies and builds the jar at target/scala-2.11/learning-spark-mini-example_2.11-1.0.jar (the file name comes from the name, scalaVersion, and version settings in build.sbt).

4.
Submit the job:
/home/appleyuchi/bigdata/spark-2.3.1-bin-hadoop2.7/bin/spark-submit --class "WordCount" --master local /home/appleyuchi/Desktop/WordCount/target/scala-2.11/learning-spark-mini-example_2.11-1.0.jar hdfs://localhost:9000/user/appleyuchi/README.txt ./wordcounts

Note that
hdfs://localhost:9000/user/appleyuchi/README.txt
is not a path on the Linux file system; it is the HDFS path of the file uploaded in step 2, so while Spark processes it we cannot see it directly under Linux. The output argument ./wordcounts is likewise resolved inside HDFS, which is why step 5 fetches it from /user/appleyuchi/wordcounts.
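To sanity-check the logic before (or after) packaging, roughly the same computation can be run interactively in spark-shell, where sc is predefined. A sketch, assuming the HDFS file uploaded in step 2:

val input = sc.textFile("hdfs://localhost:9000/user/appleyuchi/README.txt")
val counts = input.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
// Print a small sample instead of writing a new output directory.
counts.take(10).foreach(println)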

5.
Copy the run results from HDFS back into the Linux file system:
hadoop fs -get /user/appleyuchi/wordcounts ~/wordcounts

cd ~/wordcounts
cat part-00000
(Hadoop,1)
(Commodity,1)
(For,1)
(this,3)
(country,1)
(under,1)
(it,1)
(The,4)
(Jetty,1)
(Software,2)
(Technology,1)
(<http://www.wassenaar.org/>,1)
(have,1)
(http://wiki.apache.org/hadoop/,1)
(BIS,1)
(classified,1)
(This,1)
(following,1)
(which,2)
(security,1)
(See,1)
(encryption,3)
(Number,1)
(export,1)
(reside,1)
(for,3)
((BIS),,1)
(any,1)
(at:,2)
(software,2)
(makes,1)
(algorithms.,1)
(re-export,2)
(latest,1)
(your,1)
(SSL,1)
(the,8)
(Administration,1)
(includes,2)
(import,,2)
(provides,1)
(Unrestricted,1)
(country's,1)
(if,1)
(740.13),1)
(Commerce,,1)
(country,,1)
(software.,2)
(concerning,1)
(laws,,1)
(source,1)
(possession,,2)
(Apache,1)
(our,2)
(written,1)
(as,1)
(License,1)
(regulations,1)
(libraries,1)
(by,1)
(please,2)
(form,1)
(BEFORE,1)
(ENC,1)
(code.,1)
(both,1)
(5D002.C.1,,1)
(distribution,2)
(visit,1)
(is,1)
(about,1)
(website,1)
(currently,1)
(permitted.,1)
(check,1)
(Security,1)
(Section,1)
(on,2)
(performing,1)
((see,1)
(U.S.,1)
(with,1)
(in,1)
((ECCN),1)
(object,1)
(using,2)
(cryptographic,3)
(mortbay.org.,1)
(and/or,1)
(Department,1)
(manner,1)
(from,1)
(Core,1)
(has,1)
(may,1)
(Exception,1)
(Industry,1)
(restrictions,1)
(details,1)
(http://hadoop.apache.org/core/,1)
(project,1)
(you,1)
(another,1)
(or,2)
(use,,2)
(policies,1)
(uses,1)
(information,2)
(Hadoop,,1)
(to,2)
(code,1)
(software,,2)
(Regulations,,1)
(more,2)
(software:,1)
(see,1)
(,18)
(of,5)
(wiki,,1)
(Bureau,1)
(Control,1)
(exception,1)
(Government,1)
(eligible,1)
(Export,2)
(information.,1)
(Foundation,1)
(functions,1)
(and,6)
(included,1)
((TSU),1)
(asymmetric,1)

Alternatively, view the results directly in HDFS:

hdfs dfs -cat /user/appleyuchi/wordcounts/*
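Note that the (word, count) pairs above come back in no particular order. If you wanted the most frequent words first, one hypothetical tweak to WordCount.scala is to sort before saving; sortBy is a standard RDD operation:

// Sort by descending count before saving (a sketch, not part of the original program).
val sorted = counts.sortBy(_._2, ascending = false)
sorted.saveAsTextFile(outputFile)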

-------------------------------------------------------------
Other commands used:
hdfs dfs -rmr input    (deletes the input folder under the user's HDFS home directory; newer Hadoop versions use hdfs dfs -rm -r)
hdfs dfs -ls


To summarize:
First start HDFS; then upload the input file README.txt into HDFS; use sbt to resolve the dependencies and build the jar; and finally run the job. The results initially live in HDFS, so to inspect them you copy them from HDFS back into the Linux file system.


