Spark WordCount complete project code (including pom.xml)
Project directory overview
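Based on the paths referenced in the code and pom.xml below, the project layout looks roughly like this (the src/main/scala location is the scala-maven-plugin default and is assumed here):

SparkDemo1/
├── pom.xml
└── src/
    └── main/
        ├── input/
        │   └── word.txt
        └── scala/
            └── com/zxl/spark/atguigu/
                └── L01_WordCount.scala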
Code
package com.zxl.spark.atguigu

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object L01_WordCount {

  def main(args: Array[String]): Unit = {
    // Create the Spark configuration object
    val sparkConf = new SparkConf().setMaster("local").setAppName("L01_WordCount")
    // Create the Spark context (the connection object)
    val sparkContext = new SparkContext(sparkConf)
    // Reduce log noise so the result is easier to read
    sparkContext.setLogLevel("ERROR")

    // Read the file data
    val fileRDD: RDD[String] = sparkContext.textFile("src/main/input/word.txt")
    // Split each line into words
    val wordRDD: RDD[String] = fileRDD.flatMap(_.split(","))
    // Transform the data structure: word => (word, 1)
    val word2OneRDD: RDD[(String, Int)] = wordRDD.map((_, 1))
    // Group and aggregate the pairs by word
    val word2CountRDD: RDD[(String, Int)] = word2OneRDD.reduceByKey(_ + _)
    // Collect the aggregated result to the driver
    val word2Count: Array[(String, Int)] = word2CountRDD.collect()
    // Print the result
    word2Count.foreach(println)

    // Optionally block here to keep the application alive for inspecting the Spark UI/logs;
    // note that an infinite loop would prevent sparkContext.stop() from ever running.
    // while (true) {}

    sparkContext.stop()
  }
}
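Since the pom.xml below also pulls in spark-sql, the same count can be expressed with the DataFrame API. A minimal sketch (the object name L01_WordCountSQL is hypothetical; it assumes the same comma-separated input file):

package com.zxl.spark.atguigu

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, split}

object L01_WordCountSQL {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("L01_WordCountSQL").getOrCreate()
    import spark.implicits._

    // Read the file as a Dataset[String], one row per line (column name: "value")
    val lines = spark.read.textFile("src/main/input/word.txt")
    // Split each line on "," and flatten into one word per row
    val words = lines.select(explode(split($"value", ",")).as("word"))
    // Aggregate the count per word and print the result table
    val counts = words.groupBy("word").count()
    counts.show()

    spark.stop()
  }
}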
pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.zxl</groupId>
    <artifactId>SparkDemo1</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <spark.version>3.1.1</spark.version>
        <spark.scala.version>2.12</spark.scala.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${spark.scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${spark.scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${spark.scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!--
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
            <version>2.4.4</version>
        </dependency>
        -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${spark.scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.48</version>
        </dependency>
        <!-- Avoids a possible "compiler" error when integrating with other frameworks -->
        <dependency>
            <groupId>org.codehaus.janino</groupId>
            <artifactId>commons-compiler</artifactId>
            <version>3.1.0</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- This plugin compiles the Scala sources into class files -->
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <!-- Bind to Maven's compile/test-compile phases -->
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.1.1</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
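With this pom.xml, mvn package should produce a fat jar in target/ via the assembly plugin. A sketch of how the job could then be built and submitted, run from the project root so the relative path src/main/input/word.txt resolves (the jar name follows the default <artifactId>-<version>-jar-with-dependencies.jar convention and changes if you change artifactId or version):

mvn package
spark-submit --class com.zxl.spark.atguigu.L01_WordCount --master local target/SparkDemo1-1.0-SNAPSHOT-jar-with-dependencies.jar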
input/word.txt

hello,zxl
hello,zhangxueliang
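Assuming word.txt contains the two lines above and each line is split on ",", the collected result should contain the following pairs (their order may vary):

(hello,2)
(zxl,1)
(zhangxueliang,1)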
Summary

The above is the complete Spark WordCount project: the driver code, the pom.xml, and the sample input file. Hopefully it helps you solve the problem you ran into.