Spark Streaming: Counting Words from a Socket
1. Counting words from a socket
Count the words in text data received from a data server that is listening on a TCP socket.
2. Maven configuration
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.mk</groupId>
  <artifactId>spark-test</artifactId>
  <version>1.0</version>
  <name>spark-test</name>
  <url>http://spark.mk.com</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <scala.version>2.11.1</scala.version>
    <spark.version>2.4.4</spark.version>
    <hadoop.version>2.6.0</hadoop.version>
  </properties>

  <dependencies>
    <!-- Scala dependency -->
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <!-- Spark dependencies -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.projectlombok</groupId>
      <artifactId>lombok</artifactId>
      <version>1.18.10</version>
    </dependency>
  </dependencies>

  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <artifactId>maven-clean-plugin</artifactId>
          <version>3.1.0</version>
        </plugin>
        <plugin>
          <artifactId>maven-resources-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.8.0</version>
        </plugin>
        <plugin>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.22.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-jar-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>
3. Code
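// SocketApp.java
import java.util.Arrays;

import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class SocketApp implements SparkConfInfo {

    public static void main(String[] args) throws InterruptedException {
        // One batch every 5 seconds.
        JavaStreamingContext streamingContext = new SocketApp().getStreamingContext("SocketApp", 5);

        // Connect to the data server on localhost:8891 and read it line by line.
        JavaReceiverInputDStream<String> lines = streamingContext.socketTextStream("localhost", 8891);

        // Split each line on whitespace and drop empty tokens.
        JavaDStream<String> words = lines.flatMap(
                x -> Arrays.asList(x.split("\\s+")).stream().filter(v -> v.length() > 0).iterator());

        // Map each word to (word, 1), then sum the counts per word within each batch.
        JavaPairDStream<String, Integer> pairs = words.mapToPair(s -> new Tuple2<>(s, 1));
        JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey(Integer::sum);

        // Print the counts of each batch, followed by a separator line.
        wordCounts.foreachRDD(v -> {
            v.foreach(s -> System.out.println(s._1 + ":" + s._2));
            System.out.println("---------------------------");
        });

        streamingContext.start();
        streamingContext.awaitTermination();
    }
}

// SparkConfInfo.java
import org.apache.spark.SparkConf;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public interface SparkConfInfo {

    default JavaStreamingContext getStreamingContext(String appName, int second) {
        SparkConf sparkConf = getSparkConf();
        sparkConf.setAppName(appName);
        return new JavaStreamingContext(sparkConf, Durations.seconds(second));
    }

    default SparkSession getSparkSession(String appName) {
        SparkConf sparkConf = getSparkConf();
        return SparkSession.builder().appName(appName).config(sparkConf).getOrCreate();
    }

    default SparkConf getSparkConf() {
        SparkConf sparkConf = new SparkConf();
        if (System.getProperty("os.name").toLowerCase().contains("win")) {
            // On Windows, run Spark locally with 4 threads.
            sparkConf.setMaster("local[4]");
            System.out.println("Using local-mode Spark");
        } else {
            sparkConf.setMaster("spark://hadoop01:7077,hadoop02:7077,hadoop03:7077");
            // Local IP; it must be mutually reachable with the Spark cluster, e.g. on the same LAN.
            sparkConf.set("spark.driver.host", "192.168.150.1");
            // Path of the jar produced by the project build.
            sparkConf.setJars(new String[] {".\\out\\artifacts\\spark_test\\spark-test.jar"});
        }
        return sparkConf;
    }
}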
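`socketTextStream("localhost", 8891)` connects as a client, so something must already be listening on that port before the job can receive any text. The article does not show the data server it used. The following is a minimal sketch of one in plain Java, assuming a single client and input typed on standard input; the class name `LineServer` and its `serve` method are illustrative and not part of the original project. Where netcat is available, `nc -lk 8891` serves the same purpose.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.Reader;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper, not part of the original project.
public class LineServer {

    /** Waits for one client, then forwards every line read from source to it. */
    static void serve(ServerSocket server, Reader source) throws IOException {
        try (Socket client = server.accept();
             BufferedReader in = new BufferedReader(source);
             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Autoflush is enabled, so each line is sent as soon as it is read.
                out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // 8891 matches the port passed to socketTextStream in SocketApp.
        try (ServerSocket server = new ServerSocket(8891)) {
            serve(server, new InputStreamReader(System.in));
        }
    }
}
```

Start `LineServer` first, then `SocketApp`, and type lines of words into the server's console. The sketch accepts exactly one connection and exits when standard input ends, so it has to be restarted if the Spark receiver reconnects.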
Input
Tom Lucy Tom Jack Jone Lucy Jone Jack Lucy Mary Lucy Ben Jack Alice Jack Jesse Terry Alice Terry Jesse Philip Terry Philip Alma Mark Terry Mark Alma
Output
Mark:2
Tom:2
Jesse:2
Philip:2
Alice:2
Jone:2
Terry:4
Alma:2
Ben:1
Lucy:4
Mary:1
Jack:4
---------------------------
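The per-batch computation (split on whitespace, drop empty tokens, sum a 1 for every occurrence) can be reproduced without Spark to check the counts. The sketch below does so with plain Java streams; `BatchWordCount` and `count` are illustrative names, and the `TreeMap` is an assumption made only to get a stable, alphabetical print order, so the order differs from the Spark output even though the counts should match.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original project.
public class BatchWordCount {

    /** Counts the words in one batch of lines, using the same split and filter as SocketApp. */
    static Map<String, Integer> count(String... lines) {
        return Arrays.stream(lines)
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .filter(word -> word.length() > 0)
                // Equivalent of mapToPair(word -> (word, 1)) followed by reduceByKey(Integer::sum).
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(
                "Tom Lucy Tom Jack Jone Lucy Jone Jack Lucy Mary Lucy Ben Jack Alice Jack "
                + "Jesse Terry Alice Terry Jesse Philip Terry Philip Alma Mark Terry Mark Alma");
        counts.forEach((word, n) -> System.out.println(word + ":" + n));
    }
}
```

Unlike this sketch, the streaming job counts each 5-second batch separately, so the totals above only match when all of the input arrives within a single batch.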