Spark WordCount complete project code (including pom.xml)
Project directory overview
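Based on the paths referenced in the code and pom.xml below, the project layout looks roughly like this (the src/main/scala location is the scala-maven-plugin default and is assumed here):

SparkDemo1/
├── pom.xml
└── src/
    └── main/
        ├── input/
        │   └── word.txt
        └── scala/
            └── com/zxl/spark/atguigu/
                └── L01_WordCount.scala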
Code
package com.zxl.spark.atguigu

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object L01_WordCount {

  def main(args: Array[String]): Unit = {
    // Create the Spark configuration object
    val sparkConf = new SparkConf().setMaster("local").setAppName("L01_WordCount")
    // Create the Spark context (the connection object)
    val sparkContext = new SparkContext(sparkConf)
    // Reduce log noise so the result is easier to read
    sparkContext.setLogLevel("ERROR")

    // Read the file data
    val fileRDD: RDD[String] = sparkContext.textFile("src/main/input/word.txt")
    // Split each line into words
    val wordRDD: RDD[String] = fileRDD.flatMap(_.split(","))
    // Transform the data structure: word => (word, 1)
    val word2OneRDD: RDD[(String, Int)] = wordRDD.map((_, 1))
    // Group and aggregate the pairs by word
    val word2CountRDD: RDD[(String, Int)] = word2OneRDD.reduceByKey(_ + _)
    // Collect the aggregated result to the driver
    val word2Count: Array[(String, Int)] = word2CountRDD.collect()
    // Print the result
    word2Count.foreach(println)

    // Optionally block here to keep the application alive for inspecting the Spark UI/logs;
    // note that an infinite loop would prevent sparkContext.stop() from ever running.
    // while (true) {}

    sparkContext.stop()
  }
}
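Since the pom.xml below also pulls in spark-sql, the same count can be expressed with the DataFrame API. A minimal sketch (the object name L01_WordCountSQL is hypothetical; it assumes the same comma-separated input file):

package com.zxl.spark.atguigu

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, split}

object L01_WordCountSQL {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("L01_WordCountSQL").getOrCreate()
    import spark.implicits._

    // Read the file as a Dataset[String], one row per line (column name: "value")
    val lines = spark.read.textFile("src/main/input/word.txt")
    // Split each line on "," and flatten into one word per row
    val words = lines.select(explode(split($"value", ",")).as("word"))
    // Aggregate the count per word and print the result table
    val counts = words.groupBy("word").count()
    counts.show()

    spark.stop()
  }
}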
pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.zxl</groupId>
    <artifactId>SparkDemo1</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <spark.version>3.1.1</spark.version>
        <spark.scala.version>2.12</spark.scala.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${spark.scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${spark.scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${spark.scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!--
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
            <version>2.4.4</version>
        </dependency>
        -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${spark.scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.48</version>
        </dependency>
        <!-- Avoids a possible "compiler" error when integrating with other frameworks -->
        <dependency>
            <groupId>org.codehaus.janino</groupId>
            <artifactId>commons-compiler</artifactId>
            <version>3.1.0</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- This plugin compiles the Scala sources into class files -->
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <!-- Bind to Maven's compile/test-compile phases -->
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.1.1</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
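With this pom.xml, mvn package should produce a fat jar in target/ via the assembly plugin. A sketch of how the job could then be built and submitted, run from the project root so the relative path src/main/input/word.txt resolves (the jar name follows the default <artifactId>-<version>-jar-with-dependencies.jar convention and changes if you change artifactId or version):

mvn package
spark-submit --class com.zxl.spark.atguigu.L01_WordCount --master local target/SparkDemo1-1.0-SNAPSHOT-jar-with-dependencies.jar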
input/word.txt

hello,zxl
hello,zhangxueliang
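Assuming word.txt contains the two lines above and each line is split on ",", the collected result should contain the following pairs (their order may vary):

(hello,2)
(zxl,1)
(zhangxueliang,1)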
Summary

The above is the complete Spark WordCount project: the driver code, the pom.xml, and the sample input file. Hopefully it helps you solve the problem you ran into.