Running Spark WordCount in IntelliJ with Maven


IntelliJ version: 2018.3.2
Scala plugin: scala-intellij-bin-2018.3.6.zip

1. Create New Project

2. Maven
Select "Create from archetype" and choose org.scala-tools.archetypes:scala-archetype-simple.

3. Set both GroupId and ArtifactId to scala-learn, then click Next.

4. Next

5. Finish

6. After the main window opens, a prompt appears in the bottom-right corner: "Maven projects need to be imported"; choose Import Changes.
Another prompt appears at the top: "No Scala SDK in module"; choose Setup Scala SDK.
(Make sure the Scala plugin is already installed under Settings > Plugins.)
7. Delete the entire test folder and the entire scala-learn folder.
(If the Scala plugin was not installed beforehand, the test folder will not appear at all.)

8. Under the main folder, create a new object named WordCountLocal.scala; its code is given in the appendix.

9. Modify pom.xml; the full content is given in the appendix.

The resulting project structure:
$ tree
.
└── scala-learn
├── pom.xml
├── scala-learn.iml
└── src
└── main
└── WordCountLocal.scala

10. Start HDFS (and YARN) in Hadoop. A convenient way is to define an alias:

alias start="/bigdata/hadoop-2.7.7/sbin/start-dfs.sh&&/bigdata/hadoop-2.7.7/sbin/start-yarn.sh"

Then typing start in a terminal brings everything up.
(If this is the first time HDFS is started, you need to format the NameNode beforehand.)
Next, create a hello.txt and upload it to HDFS:

hdfs dfs -mkdir -p /user/ds
hdfs dfs -put hello.txt /user/ds

11. Press Alt+Shift+F10 and select WordCountLocal to run it.
The output looks like this:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/03/21 19:12:14 INFO Remoting: Starting remoting
19/03/21 19:12:14 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.85.10.236:33403]
(eat,1)
(I,1)
(to,1)
(apple,2)
(an,1)
(yuchi,1)
(want,1)

Process finished with exit code 0
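
Note that the order of the (word, count) pairs above is not deterministic: reduceByKey emits them in whatever order the shuffle produces. If you want a stable listing, you can sort before collecting. A minimal sketch, assuming data is the RDD read via sc.textFile in the appendix code:

val counts = data.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
// Sort by count, descending, so the most frequent words print first
// (sortBy is standard RDD API; this line is not part of the original program).
counts.sortBy(_._2, ascending = false).collect().foreach(println)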

-----------------------------------------------Appendix-----------------------------------

WordCountLocal.scala:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.log4j.{Level, Logger}

/**
 * Created by Administrator on 2017/4/20.
 * xudong
 */
object WordCountLocal {
  def main(args: Array[String]) {
    // Initializing a SparkContext requires a SparkConf object,
    // which holds the configuration parameters of the Spark cluster.
    Logger.getLogger("org").setLevel(Level.OFF)
    Logger.getLogger("akka").setLevel(Level.OFF)
    Logger.getRootLogger().setLevel(Level.ERROR)
    val conf = new SparkConf()
      .setMaster("local")    // run the computation locally
      .setAppName("testRdd") // set the application name
    // Every Spark program starts from a SparkContext.
    val sc = new SparkContext(conf)
    // The line above is equivalent to: val sc = new SparkContext("local", "testRdd")
    val data = sc.textFile("hdfs://master:9000/user/ds/hello.txt") // read the input file from HDFS
    data.flatMap(_.split(" ")) // the underscore is a placeholder; split each line into words
      .map((_, 1))             // turn each word into a (word, 1) key-value pair
      .reduceByKey(_ + _)      // merge pairs with the same key by summing their values
      .collect()               // return the distributed RDD as a local Scala array on the driver
      .foreach(println)        // print each (word, count) pair
  }
}
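
To see what each transformation contributes, here is a minimal standalone sketch that traces the same pipeline on an in-memory collection, so no HDFS is needed; the input lines are hypothetical, chosen to mirror the hello.txt used above:

import org.apache.spark.{SparkConf, SparkContext}

object WordCountTrace {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("trace"))
    // Hypothetical in-memory input, standing in for hello.txt:
    val lines = sc.parallelize(Seq("I want to eat an apple", "apple yuchi"))
    val words = lines.flatMap(_.split(" ")) // "I", "want", "to", ..., "apple", "yuchi"
    val pairs = words.map((_, 1))           // ("I",1), ("want",1), ..., ("apple",1), ("apple",1)
    val counts = pairs.reduceByKey(_ + _)   // ("apple",2) plus one pair per remaining word
    counts.collect().foreach(println)
    sc.stop()
  }
}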

pom.xml is shown below. Two details worth noting: the scala.version property actually holds the Scala binary version (2.10), which serves as the suffix of each Spark artifact ID, and no Scala compiler plugin is configured, so compilation here is handled by IntelliJ's Scala SDK rather than by mvn itself.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.xudong</groupId>
    <artifactId>xudong</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <spark.version>1.6.0</spark.version>
        <scala.version>2.10</scala.version>
        <hadoop.version>2.6.0</hadoop.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.39</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
    </dependencies>

    <!-- Official Maven repositories: http://repo1.maven.org/maven2/ or http://repo2.maven.org/maven2/ (lower latency) -->
    <repositories>
        <repository>
            <id>central</id>
            <name>Maven Repository Switchboard</name>
            <layout>default</layout>
            <url>http://repo2.maven.org/maven2</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
    </repositories>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
    </build>
</project>
