

Getting Started with Spark (12): Extreme Values

Published: 2023/12/3

This article introduces how to compute extreme values (maximum, minimum, average) with Spark.

1. Extreme Values

計算文本里面的最值(最大值、最小值、平均值),輸出結果。


2. Maven Setup

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.mk</groupId>
    <artifactId>spark-test</artifactId>
    <version>1.0</version>
    <name>spark-test</name>
    <url>http://spark.mk.com</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <scala.version>2.11.1</scala.version>
        <spark.version>2.4.4</spark.version>
        <hadoop.version>2.6.0</hadoop.version>
    </properties>

    <dependencies>
        <!-- Scala dependency -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <!-- Spark dependencies -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <pluginManagement>
            <plugins>
                <plugin>
                    <artifactId>maven-clean-plugin</artifactId>
                    <version>3.1.0</version>
                </plugin>
                <plugin>
                    <artifactId>maven-resources-plugin</artifactId>
                    <version>3.0.2</version>
                </plugin>
                <plugin>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.8.0</version>
                </plugin>
                <plugin>
                    <artifactId>maven-surefire-plugin</artifactId>
                    <version>2.22.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-jar-plugin</artifactId>
                    <version>3.0.2</version>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>
</project>


3. Code

import java.io.Serializable;
import java.util.Arrays;
import java.util.Comparator;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.SparkSession;

public class MaxApp implements SparkConfInfo {

    // Comparator must implement Serializable so Spark can ship it to executors
    public static class IntegerComp implements Comparator<Integer>, Serializable {
        @Override
        public int compare(Integer o1, Integer o2) {
            return o1.compareTo(o2);
        }
    }

    public static void main(String[] args) {
        String filePath = "E:\\spark\\number.txt";
        SparkSession sparkSession = new MaxApp().getSparkConf("MaxApp");

        // One number per line; parse each line into an Integer and cache the RDD,
        // since it is reused by four separate actions below
        JavaRDD<Integer> numbers = sparkSession.sparkContext()
                .textFile(filePath, 4)
                .toJavaRDD()
                .flatMap(v -> Arrays.asList(v.split("\n")).iterator())
                .map(Integer::valueOf)
                .cache();

        Integer max = numbers.max(new IntegerComp());
        Integer min = numbers.min(new IntegerComp());
        Integer sum = numbers.reduce(Integer::sum);
        long count = numbers.count();

        System.out.println("max:" + max);
        System.out.println("min:" + min);
        System.out.println("sum:" + sum);
        System.out.println("count:" + count);
        System.out.println("avg:" + sum * 1.0 / count);

        sparkSession.stop();
    }
}

public interface SparkConfInfo {
    default SparkSession getSparkConf(String appName) {
        SparkConf sparkConf = new SparkConf();
        if (System.getProperty("os.name").toLowerCase().contains("win")) {
            sparkConf.setMaster("local[4]");
            System.out.println("Running Spark in local mode");
        } else {
            sparkConf.setMaster("spark://hadoop01:7077,hadoop02:7077,hadoop03:7077");
            // Local IP; must be reachable from the Spark cluster, e.g. on the same LAN
            sparkConf.set("spark.driver.host", "192.168.150.1");
            // Path of the jar produced by the project build
            sparkConf.setJars(new String[] {".\\out\\artifacts\\spark_test\\spark-test.jar"});
        }
        SparkSession session = SparkSession.builder()
                .appName(appName)
                .config(sparkConf)
                .getOrCreate();
        return session;
    }
}

Contents of number.txt (one number per line):

100
24
43
774
43
37
78
42
68
89
49
543
36
888
258
538
79
6
67
99

Output:

max:888
min:6
sum:3861
count:20
avg:193.05


Problem Encountered

Passing a bare functional interface (a method reference) as the comparator throws an error:

Integer max = numbers.max(Integer::compareTo);

org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: com.mk.MaxApp$$Lambda$11/501991708
Serialization stack:
	- object not serializable (class: com.mk.MaxApp$$Lambda$11/501991708, value: com.mk.MaxApp$$Lambda$11/501991708@7fd26ad8)
	- field (class: scala.math.LowPriorityOrderingImplicits$$anon$7, name: cmp$2, type: interface java.util.Comparator)
	- object (class scala.math.LowPriorityOrderingImplicits$$anon$7, scala.math.LowPriorityOrderingImplicits$$anon$7@63b3ee82)
	- field (class: org.apache.spark.rdd.RDD$$anonfun$max$1, name: ord$10, type: interface scala.math.Ordering)
	- object (class org.apache.spark.rdd.RDD$$anonfun$max$1, <function0>)
	- field (class: org.apache.spark.rdd.RDD$$anonfun$max$1$$anonfun$apply$50, name: $outer, type: class org.apache.spark.rdd.RDD$$anonfun$max$1)
	- object (class org.apache.spark.rdd.RDD$$anonfun$max$1$$anonfun$apply$50, <function2>)

The cause is that the compiler-generated lambda object does not implement Serializable, so Spark cannot ship it to the executors. The fix is to use a comparator class that implements the Serializable interface:

Integer max = numbers.max(new IntegerComp());

public static class IntegerComp implements Comparator<Integer>, Serializable {
    @Override
    public int compare(Integer o1, Integer o2) {
        return o1.compareTo(o2);
    }
}
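An alternative to a named comparator class is an intersection cast, which tells the compiler to generate a lambda that also implements Serializable. The following standalone sketch (plain JDK, no Spark needed) shows the difference and round-trips the comparator through Java serialization the same way Spark would when shipping the closure:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Comparator;

public class SerializableLambda {
    public static void main(String[] args) throws Exception {
        // A plain method reference is NOT Serializable
        Comparator<Integer> plain = Integer::compareTo;
        System.out.println(plain instanceof Serializable); // false

        // An intersection cast makes the generated lambda Serializable as well
        Comparator<Integer> ser =
                (Comparator<Integer> & Serializable) Integer::compareTo;
        System.out.println(ser instanceof Serializable); // true

        // Round-trip through Java serialization, as Spark does with closures
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(ser);
        oos.flush();
        @SuppressWarnings("unchecked")
        Comparator<Integer> back = (Comparator<Integer>) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
        System.out.println(back.compare(6, 888)); // -1, i.e. 6 < 888
    }
}
```

With this trick the original call compiles to `numbers.max((Comparator<Integer> & Serializable) Integer::compareTo)` without defining a named class; the named IntegerComp class above is the more explicit equivalent.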
