

Getting Started with Spark (8): WordCount


1. WordCount

Count how many times each word occurs in a text file and print the results.
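For example, given an input containing just "to be or not to be", the program would print be:2, to:2, or:1 and not:1, in no particular order.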


2. Maven setup

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.mk</groupId>
    <artifactId>spark-test</artifactId>
    <version>1.0</version>
    <name>spark-test</name>
    <url>http://spark.mk.com</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <scala.version>2.11.1</scala.version>
        <spark.version>2.4.4</spark.version>
        <hadoop.version>2.6.0</hadoop.version>
    </properties>

    <dependencies>
        <!-- Scala dependency -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <!-- Spark dependencies -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <pluginManagement>
            <plugins>
                <plugin>
                    <artifactId>maven-clean-plugin</artifactId>
                    <version>3.1.0</version>
                </plugin>
                <plugin>
                    <artifactId>maven-resources-plugin</artifactId>
                    <version>3.0.2</version>
                </plugin>
                <plugin>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.8.0</version>
                </plugin>
                <plugin>
                    <artifactId>maven-surefire-plugin</artifactId>
                    <version>2.22.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-jar-plugin</artifactId>
                    <version>3.0.2</version>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>
</project>
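One deployment note: with the coordinates above, mvn package produces target/spark-test-1.0.jar by default. The code in the next section instead points setJars at .\out\artifacts\spark_test\spark-test.jar, an IDE build-artifact path, so adjust that path to wherever your build actually writes the jar.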


3. Code

// WordCountApp.java
import java.util.Arrays;
import java.util.Map;

import org.apache.spark.sql.SparkSession;

import scala.Tuple2;

public class WordCountApp implements SparkConfInfo {

    public static void main(String[] args) {
        String filePath = "F:\\test\\log.txt";
        SparkSession sparkSession = new WordCountApp().getSparkConf("WordCount");

        Map<String, Integer> wordCountMap = sparkSession.sparkContext()
                .textFile(filePath, 4)
                .toJavaRDD()
                // split on whitespace, line breaks and punctuation (including the full-width 。)
                .flatMap(v -> Arrays.asList(v.split("[(\\s+)(\r?\n),.。'’]")).iterator())
                // keep only alphabetic tokens (hyphens allowed, e.g. "high-level")
                .filter(v -> v.matches("[a-zA-Z-]+"))
                .map(String::toLowerCase)
                // pair each word with 1, then sum the counts per word
                .mapToPair(v -> new Tuple2<>(v, 1))
                .reduceByKey(Integer::sum)
                .collectAsMap();

        wordCountMap.forEach((k, v) -> System.out.println(k + ":" + v));
        sparkSession.stop();
    }
}

// SparkConfInfo.java
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;

public interface SparkConfInfo {

    default SparkSession getSparkConf(String appName) {
        SparkConf sparkConf = new SparkConf();
        if (System.getProperty("os.name").toLowerCase().contains("win")) {
            sparkConf.setMaster("local[4]");
            System.out.println("Running Spark in local mode");
        } else {
            sparkConf.setMaster("spark://hadoop01:7077,hadoop02:7077,hadoop03:7077");
            // Local IP of the driver; it must be reachable from the Spark cluster,
            // e.g. on the same LAN
            sparkConf.set("spark.driver.host", "192.168.150.1");
            // Path of the jar produced by the project build
            sparkConf.setJars(new String[]{".\\out\\artifacts\\spark_test\\spark-test.jar"});
        }
        return SparkSession.builder()
                .appName(appName)
                .config(sparkConf)
                .getOrCreate();
    }
}
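A note on the tokenizing regex: inside a character class, (, ), + and ? are ordinary characters, so the pattern effectively splits on any single whitespace or punctuation character rather than on runs of them. Consecutive delimiters therefore produce empty strings, but those (along with numbers and other stray tokens) are discarded by the filter(v -> v.matches("[a-zA-Z-]+")) step.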

File contents

Spark Streaming is an extension of the core Spark API that enables scalable,high-throughput, fault-tolerant stream processing of live 。data streams. Data, can be ,ingested from many sources like Kafka, Flume, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join and window. Finally, processed data can be pushed out to filesystems,Spark Streaming provides a high-level abstraction called discretized stream or DStream, which represents a continuous stream of data. DStreams can be created either from input data streams from sources such as Kafka, Flume, and Kinesis, or by applying high-level operations on other DStreams. Internally, a DStream is represented as a sequence of RDDs.This guide shows you how to start writing Spark Streaming programs with DStreams. You can write Spark Streaming programs in Scala, Java or Python (introduced in Spark 1.2), all of which are presented in this guide. You will find tabs throughout this guide that let you choose between code snippets of different languages. databases, and live dashboards. In fact, you can apply Spark’s machine learning and graph processing algorithms on data streams.

Output

created:1 continuous:1 high-level:3 either:1 reduce:1 many:1 writing:1 learning:1 sources:2 is:2 spark:7 can:6 high-throughput:1 filesystems:1 using:1 rdds:1 of:6 input:1 scala:1 you:5 operations:1 kinesis:2 fact:1 or:4 provides:1 pushed:1 how:1 will:1 join:1 databases:1 window:1 be:4 data:5 from:3 s:1 abstraction:1 languages:1 to:2 all:1 and:5 that:2 fault-tolerant:1 core:1 a:4 expressed:1 internally:1 streaming:4 on:2 dashboards:1 java:1 let:1 processed:2 with:2 write:1 by:1 between:1 in:4 live:2 like:2 represented:1 code:1 are:1 stream:3 algorithms:2 sequence:1 streams:3 graph:1 an:1 flume:2 apply:1 kafka:2 sockets:1 the:1 out:1 presented:1 snippets:1 extension:1 scalable:1 guide:3 dstream:2 choose:1 represents:1 dstreams:3 find:1 shows:1 programs:2 such:1 functions:1 called:1 tcp:1 machine:1 api:1 throughout:1 tabs:1 start:1 which:2 this:3 different:1 processing:2 applying:1 enables:1 complex:1 finally:1 introduced:1 python:1 other:1 discretized:1 map:1 as:2
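The order above is arbitrary: collectAsMap() returns an unordered map, so repeated runs can print the words in a different sequence. Below is a minimal sketch of printing the top-N words sorted by count instead; printTopWords is a hypothetical helper, not part of the original program, and simply reuses the pipeline from WordCountApp with an extra key swap and sortByKey:

// Hypothetical helper (not in the original article): prints the n most frequent words.
static void printTopWords(SparkSession sparkSession, String filePath, int n) {
    sparkSession.sparkContext().textFile(filePath, 4)
            .toJavaRDD()
            .flatMap(v -> Arrays.asList(v.split("[(\\s+)(\r?\n),.。'’]")).iterator())
            .filter(v -> v.matches("[a-zA-Z-]+"))
            .map(String::toLowerCase)
            .mapToPair(v -> new Tuple2<>(v, 1))
            .reduceByKey(Integer::sum)
            // swap to (count, word) so sortByKey orders by frequency, descending
            .mapToPair(t -> new Tuple2<>(t._2(), t._1()))
            .sortByKey(false)
            .take(n)
            .forEach(t -> System.out.println(t._2() + ":" + t._1()));
}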


Summary

This walkthrough read a text file into an RDD, split each line into tokens, kept only alphabetic words, lowercased them, and summed a count per word with reduceByKey before printing the result.
