Setting Spark partitions (parallelism): saving the partition files

This post walks through how Spark decides the number of partitions (parallelism) when an RDD is created from an in-memory collection, and saves the partitioned data to files so that the resulting partitioning can be inspected.

Code

package com.atguigu.bigdata.spark.core.rdd.builder

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Spark01_RDD_Memory_Par {

    def main(args: Array[String]): Unit = {

        // TODO Prepare the environment
        val sparkConf = new SparkConf().setMaster("local[*]").setAppName("RDD")
        sparkConf.set("spark.default.parallelism", "5")
        val sc = new SparkContext(sparkConf)

        // TODO Create the RDD
        // RDD parallelism & partitioning
        // makeRDD can take a second argument that specifies the number of partitions.
        // The second argument is optional; if omitted, makeRDD uses the default value: defaultParallelism.
        //     scheduler.conf.getInt("spark.default.parallelism", totalCores)
        // By default, Spark reads the configuration parameter spark.default.parallelism from the configuration object.
        // If it is not set, Spark falls back to totalCores, the maximum number of cores available in the current runtime environment.
        //val rdd = sc.makeRDD(List(1, 2, 3, 4), 2)
        val rdd = sc.makeRDD(List(1, 2, 3, 4))

        // Save the processed data as partition files (one part file per partition)
        rdd.saveAsTextFile("output")

        // Keep the application alive so the Spark web UI can be inspected
        while (true) {}

        // TODO Stop the environment (not reached while the loop above is running)
        sc.stop()
    }
}
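For comparison, here is a minimal sketch, intended to run inside the same main method above, that sets the partition count explicitly instead of relying on defaultParallelism. The partition count of 3 and the output directory output_explicit are illustrative assumptions, not part of the original code.

// Sketch: pass the numSlices argument explicitly to override the default parallelism.
val rddExplicit: RDD[Int] = sc.makeRDD(List(1, 2, 3, 4, 5, 6), 3)

// getNumPartitions reports how many partitions were actually created.
println(s"numPartitions = ${rddExplicit.getNumPartitions}") // expected: 3

// saveAsTextFile writes one part file per partition:
// output_explicit/part-00000, part-00001, part-00002
rddExplicit.saveAsTextFile("output_explicit")

With spark.default.parallelism set to 5 as in the code above, omitting the second argument should instead produce five part files under output (some of them may be empty, since only four elements are distributed across five partitions).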

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.zxl.bigdata</groupId>
    <artifactId>spark-core</artifactId>
    <version>1.0.0</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-yarn_2.12</artifactId>
            <version>3.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>3.0.0</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.27</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.12</artifactId>
            <version>3.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.2.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.12</artifactId>
            <version>3.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.12</artifactId>
            <version>3.0.0</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.10.1</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/com.alibaba/druid -->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>druid</artifactId>
            <version>1.1.10</version>
        </dependency>
    </dependencies>
</project>

log4j.properties

log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set the default spark-shell log level to ERROR. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=ERROR

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=ERROR
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=ERROR
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=ERROR
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

Summary

When makeRDD is called without a second argument, Spark uses spark.default.parallelism as the number of partitions, falling back to the total number of available cores if that property is not set. saveAsTextFile then writes one part file per partition, which makes the resulting partitioning easy to inspect on disk.
