Maven dependencies, Spark SQL: pitfalls when running spark-xgboost 0.81 on Windows

Published: 2023/12/10

Running spark-xgboost on Windows runs into a few problems. These are my notes on them.

Environment: Windows 7 + Spark 2.3.1 + xgboost 0.81 + IDEA + Maven

I. Dependencies and code

Dataset download

UCI Machine Learning Repository: Iris Data Set (archive.ics.uci.edu)

pom dependencies

<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j</artifactId>
    <version>0.81</version>
</dependency>
<dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j-spark</artifactId>
    <version>0.81</version>
</dependency>

Test code

import ml.dmlc.xgboost4j.scala.spark.{XGBoostClassificationModel, XGBoostClassifier}
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

/**
 * author: wy
 * todo: XGBoost iris classification
 * Created by pc-admin on 2020-03-12 11:21
 */
object xgboostIrisDataTest {
  def main(args: Array[String]): Unit = {
    val ss = SparkSession.builder()
      .master("local[4]")
      .appName("xgboostIrisDataTest")
      .getOrCreate()

    val dataPath = "iris.data"
    val schema = new StructType(Array(
      StructField("sepal length", DoubleType, true),
      StructField("sepal width", DoubleType, true),
      StructField("petal length", DoubleType, true),
      StructField("petal width", DoubleType, true),
      StructField("class", StringType, true)))
    val rawInput = ss.read.schema(schema).csv(dataPath)

    // Map the string column "class" to a numeric column "classIndex".
    val stringIndexer = new StringIndexer()
      .setInputCol("class")
      .setOutputCol("classIndex")
      .fit(rawInput)

    // Apply the mapping and drop the original string column.
    val labelTransformed = stringIndexer.transform(rawInput).drop("class")

    // Merge the four measurement columns into a single "features" vector.
    val vectorAssembler = new VectorAssembler()
      .setInputCols(Array("sepal length", "sepal width", "petal length", "petal width"))
      .setOutputCol("features")

    // Split the data into a training set and a test set.
    val xgbInput = vectorAssembler.transform(labelTransformed).select("features", "classIndex")
    val splitXgbInput = xgbInput.randomSplit(Array(0.9, 0.1))
    val trainXgbInput = splitXgbInput(0)
    val testXgbInput = splitXgbInput(1)

    // Note: num_workers must be less than or equal to the thread count in
    // local[4], otherwise the program hangs.
    val xgbParam = Map(
      "eta" -> 0.1f,
      "max_depth" -> 2,
      "objective" -> "multi:softprob",
      "num_class" -> 3,
      "num_round" -> 100,
      "num_workers" -> 4)

    // Create the XGBoost classifier and set the feature and label columns.
    val xgbClassifier = new XGBoostClassifier(xgbParam)
      .setFeaturesCol("features")
      .setLabelCol("classIndex")

    // Train.
    val xgbClassificationModel: XGBoostClassificationModel = xgbClassifier.fit(trainXgbInput)

    // Predict.
    val result = xgbClassificationModel.transform(testXgbInput)

    // Show the predictions.
    result.show(1000)
    ss.stop()
  }
}

II. Bugs and fixes

1. java.io.FileNotFoundException: File /lib/xgboost4j.dll was not found inside JAR.

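Go to $MAVEN_HOME\conf\repository\ml\dmlc and find xgboost4j.

Open the folder for the version you use (0.81 here).

Open the jar with WinRAR.

The file /lib/xgboost4j.dll is indeed missing.

Open the following link and pick the version you use:

criteo-forks/xgboost-jars (github.com)

Download the win64 jar for that version.

After the download finishes, extract it. You will find xgboost4j.dll in its lib folder.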

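With xgboost4j-0.81.jar open in WinRAR, drag xgboost4j.dll from xgboost4j-0.81-criteo-20180821_2.11-win64.jar\lib straight into $MAVEN_HOME\conf\repository\ml\dmlc\xgboost4j\0.81\xgboost4j-0.81.jar\lib, so that it sits at the /lib/xgboost4j.dll path the error message asks for.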

Run the program again and the problem is gone.

If WinRAR reports that the file is in use and cannot be modified, close IDEA and try again.
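The WinRAR inspection above can also be done from code. The following is a minimal sketch, assuming only the JDK's java.util.zip classes; the object name NativeLibCheck and the helper jarContainsEntry are hypothetical names introduced for illustration, and the jar path has to be supplied by you.

```scala
import java.io.File
import java.util.zip.ZipFile

// Hypothetical helper, not part of xgboost4j: checks a jar for one entry.
object NativeLibCheck {

  /** Returns true if the jar (a zip archive) at jarPath contains entryName. */
  def jarContainsEntry(jarPath: String, entryName: String): Boolean = {
    val zip = new ZipFile(new File(jarPath))
    try zip.getEntry(entryName) != null
    finally zip.close()
  }

  def main(args: Array[String]): Unit = {
    // The first argument is assumed to be the path to xgboost4j-0.81.jar.
    val jarPath = args.headOption.getOrElse(sys.error("usage: NativeLibCheck <path-to-jar>"))
    // Zip entry names carry no leading slash, so /lib/xgboost4j.dll is "lib/xgboost4j.dll".
    val found = jarContainsEntry(jarPath, "lib/xgboost4j.dll")
    println(s"lib/xgboost4j.dll present: $found")
  }
}
```

Running it before and after copying the DLL shows whether the jar on disk is really the one that was patched.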

2. XGBoostModel training failed

Exception in thread "main" ml.dmlc.xgboost4j.java.XGBoostError: XGBoostModel training failed
    at ml.dmlc.xgboost4j.scala.spark.XGBoost$.postTrackerReturnProcessing(XGBoost.scala:363)
    at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainDistributed(XGBoost.scala:334)
    at ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator.train(XGBoostEstimator.scala:139)
    at ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator.train(XGBoostEstimator.scala:36)
    at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
    at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainWithDataFrame(XGBoost.scala:191)

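If you hit this bug too, congratulations, we are in sync. It cost me a whole afternoon. Here are the explanations I found online:

The runtime environment contains more than one Scala or Java version.
The Spark and xgboost versions do not match. For example, xgboost 0.90 is said to require Spark 2.4 or later, and xgboost 0.81 to require Spark 2.3.1 or later.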
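One quick way to rule out the first explanation is to print the versions the running program actually sees. This is a minimal sketch using only the Scala standard library and JVM system properties; the object name VersionReport is a hypothetical name introduced for illustration. Run from inside IDEA, it reports the Scala library and JVM on the project classpath, which can then be compared with the _2.11 suffix in the win64 jar name.

```scala
// Hypothetical helper: reports the Scala library, JVM and OS seen at runtime.
object VersionReport {

  /** Collects the versions visible to the running program. */
  def versions(): Map[String, String] = Map(
    "scala" -> scala.util.Properties.versionNumberString,
    "java"  -> System.getProperty("java.version"),
    "os"    -> System.getProperty("os.name")
  )

  def main(args: Array[String]): Unit =
    versions().foreach { case (name, value) => println(s"$name: $value") }
}
```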

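I checked each of them and none was the cause. Out of frustration I rebooted the computer, and guess what? The problem disappeared.

Conclusion: reboot first. If that fixes it, you have saved yourself an afternoon.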

3. The program hangs

This happens when the number of threads given to the Spark master at initialization is smaller than num_workers. Remember:

master("local[m]")

num_workers = n

Always make sure m >= n.
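The m >= n rule can be checked before training starts. The sketch below is an illustration only: WorkerCheck, localThreads and enoughThreads are hypothetical names, and the check only understands master strings of the exact form local[m] with a number in the brackets.

```scala
// Hypothetical pre-flight check for the rule m >= n.
object WorkerCheck {
  private val LocalN = """local\[(\d+)\]""".r

  /** Extracts m from a master string of the form "local[m]"; None for any other form. */
  def localThreads(master: String): Option[Int] = master match {
    case LocalN(m) => Some(m.toInt)
    case _         => None
  }

  /** True when the master offers at least numWorkers threads; false if m cannot be parsed. */
  def enoughThreads(master: String, numWorkers: Int): Boolean =
    localThreads(master).exists(_ >= numWorkers)
}
```

For the test code in this post, WorkerCheck.enoughThreads("local[4]", 4) holds, while local[2] with four workers would fail the check.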

III. Results

Original label: classIndex

Predicted label: prediction
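Comparing the two columns by eye works for a small test set; a single accuracy number is easier to track. This is a minimal sketch in plain Scala, assuming the label and prediction values have already been gathered as pairs of doubles; AccuracySketch and accuracy are hypothetical names introduced for illustration.

```scala
// Hypothetical helper: accuracy over (classIndex, prediction) pairs.
object AccuracySketch {

  /** Fraction of pairs whose label equals the prediction; 0.0 for empty input. */
  def accuracy(pairs: Seq[(Double, Double)]): Double =
    if (pairs.isEmpty) 0.0
    else pairs.count { case (label, prediction) => label == prediction }.toDouble / pairs.size
}
```

In the test program, the pairs could presumably be obtained with result.select("classIndex", "prediction").collect() and reading both columns as doubles. That step is an assumption about how you wire it in and is not part of the original code.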

That really was not easy.

References:

Official XGBoost API documentation

XGBoost4J-Spark Tutorial (version 0.8+) (xgboost.readthedocs.io)

Someone abroad who ran into a similar situation

https://github.com/dmlc/xgboost/issues/2780
