
[Algorithm Competition Learning] Heartbeat Signal Classification Prediction - Model Fusion


Task 5: Model Fusion

This is the Task 5 (model fusion) part of the beginner-level ECG-classification data-mining series. It walks you through common model-fusion methods and strategies; questions and discussion are welcome.

Competition: Beginner-Level Data Mining - Heartbeat Signal Classification Prediction

Project link:

Competition link:

5.1 Learning Goals

  • Learn model-fusion strategies
  • Complete the corresponding check-in task

5.2 Content Overview

https://mlwave.com/kaggle-ensembling-guide/
https://github.com/MLWave/Kaggle-Ensemble-Guide

Model fusion is an important part of the later stages of a competition. Broadly speaking, the main types are:

  • Simple weighted fusion:

    • Regression (or classification probabilities): arithmetic-mean fusion, geometric-mean fusion;
    • Classification: voting;
    • Hybrid: rank averaging and log fusion (a rank-averaging sketch follows this list).
  • Stacking/blending:

    • Build multi-layer models and fit a second-level model on the first level's predictions.
  • Boosting/bagging (already used inside XGBoost, AdaBoost, and GBDT):

    • Boosted or bagged ensembles of many trees.
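Rank averaging is listed above but never shown in code; here is a minimal sketch (the data and names are made up for illustration). Each model's scores are replaced by their percentile ranks before averaging, which makes the blend robust to models whose outputs live on different scales:

import numpy as np
import pandas as pd

# Hypothetical predicted scores from two models on different scales
pred_a = np.array([0.10, 0.80, 0.55, 0.30])
pred_b = np.array([0.02, 0.99, 0.60, 0.45])

def rank_average(*preds):
    # Replace each model's scores by percentile ranks, then average the ranks
    ranks = [pd.Series(p).rank(pct=True) for p in preds]
    return pd.concat(ranks, axis=1).mean(axis=1).values

print(rank_average(pred_a, pred_b))  # [0.25 1.   0.75 0.5 ]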
5.3 Related Theory

Detailed explanations of how stacking works:

  • https://www.cnblogs.com/yumoye/p/11024137.html
  • https://zhuanlan.zhihu.com/p/26890738
5.4 Code Examples

5.4.1 Fusing regression / classification-probability outputs

(1) Simple weighted average: fuse the results directly

import numpy as np
import pandas as pd
from sklearn import metrics

## Generate some simple sample data; test_pre_i holds the i-th model's predictions
test_pre1 = [1.2, 3.2, 2.1, 6.2]
test_pre2 = [0.9, 3.1, 2.0, 5.9]
test_pre3 = [1.1, 2.9, 2.2, 6.0]

# y_test_true holds the ground-truth values
y_test_true = [1, 3, 2, 6]

## Weighted average of the model predictions
def Weighted_method(test_pre1, test_pre2, test_pre3, w=[1/3, 1/3, 1/3]):
    Weighted_result = w[0]*pd.Series(test_pre1) + w[1]*pd.Series(test_pre2) + w[2]*pd.Series(test_pre3)
    return Weighted_result

# MAE of each individual model
print('Pred1 MAE:', metrics.mean_absolute_error(y_test_true, test_pre1))
print('Pred2 MAE:', metrics.mean_absolute_error(y_test_true, test_pre2))
print('Pred3 MAE:', metrics.mean_absolute_error(y_test_true, test_pre3))

## MAE of the weighted blend
w = [0.3, 0.4, 0.3]  # hand-picked weights
Weighted_pre = Weighted_method(test_pre1, test_pre2, test_pre3, w)
print('Weighted_pre MAE:', metrics.mean_absolute_error(y_test_true, Weighted_pre))

Pred1 MAE: 0.1750000000000001
Pred2 MAE: 0.07499999999999993
Pred3 MAE: 0.10000000000000009
Weighted_pre MAE: 0.05750000000000027

The weighted blend improves on each individual model; this is what we call simple weighted averaging.
There are also special cases, such as the plain mean and the median:

## Mean fusion
def Mean_method(test_pre1, test_pre2, test_pre3):
    Mean_result = pd.concat([pd.Series(test_pre1), pd.Series(test_pre2), pd.Series(test_pre3)], axis=1).mean(axis=1)
    return Mean_result

Mean_pre = Mean_method(test_pre1, test_pre2, test_pre3)
print('Mean_pre MAE:', metrics.mean_absolute_error(y_test_true, Mean_pre))

## Median fusion
def Median_method(test_pre1, test_pre2, test_pre3):
    Median_result = pd.concat([pd.Series(test_pre1), pd.Series(test_pre2), pd.Series(test_pre3)], axis=1).median(axis=1)
    return Median_result

Median_pre = Median_method(test_pre1, test_pre2, test_pre3)
print('Median_pre MAE:', metrics.mean_absolute_error(y_test_true, Median_pre))

Mean_pre MAE: 0.06666666666666693
Median_pre MAE: 0.07500000000000007

(2) Stacking fusion (regression)

from sklearn import linear_model

def Stacking_method(train_reg1, train_reg2, train_reg3, y_train_true,
                    test_pre1, test_pre2, test_pre3,
                    model_L2=linear_model.LinearRegression()):
    # Fit the second-layer model on the first layer's training predictions
    model_L2.fit(pd.concat([pd.Series(train_reg1), pd.Series(train_reg2), pd.Series(train_reg3)], axis=1).values,
                 y_train_true)
    # Apply it to the first layer's test predictions
    Stacking_result = model_L2.predict(pd.concat([pd.Series(test_pre1), pd.Series(test_pre2), pd.Series(test_pre3)], axis=1).values)
    return Stacking_result

## Simple sample data; train_reg_i holds the i-th model's predictions on the training set
train_reg1 = [3.2, 8.2, 9.1, 5.2]
train_reg2 = [2.9, 8.1, 9.0, 4.9]
train_reg3 = [3.1, 7.9, 9.2, 5.0]
# y_train_true holds the training ground truth
y_train_true = [3, 8, 9, 5]

test_pre1 = [1.2, 3.2, 2.1, 6.2]
test_pre2 = [0.9, 3.1, 2.0, 5.9]
test_pre3 = [1.1, 2.9, 2.2, 6.0]
# y_test_true holds the test ground truth
y_test_true = [1, 3, 2, 6]

model_L2 = linear_model.LinearRegression()
Stacking_pre = Stacking_method(train_reg1, train_reg2, train_reg3, y_train_true,
                               test_pre1, test_pre2, test_pre3, model_L2)
print('Stacking_pre MAE:', metrics.mean_absolute_error(y_test_true, Stacking_pre))

Stacking_pre MAE: 0.04213483146067404

Stacking improves the result further. One caveat: the second-layer model should not be too complex, otherwise it will overfit the training set and fail to generalize to the test set; see the sketch below.
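As a concrete version of that advice, the Stacking_method above can be handed a regularized linear model instead of plain LinearRegression. Ridge is our substitution here, not part of the original tutorial; the L2 penalty shrinks the weights the stacker puts on any single first-layer model:

from sklearn import linear_model

# Same stacking call, but with an L2-regularized second layer
model_L2 = linear_model.Ridge(alpha=1.0)
Stacking_pre_ridge = Stacking_method(train_reg1, train_reg2, train_reg3, y_train_true,
                                     test_pre1, test_pre2, test_pre3, model_L2)
print('Ridge Stacking MAE:', metrics.mean_absolute_error(y_test_true, Stacking_pre_ridge))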

5.4.2 Classification model fusion

import numpy as np
import lightgbm as lgb
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold

(1) Voting

Voting comes in two flavors, hard and soft. Hard voting follows majority rule: each model casts one vote and the class with the most votes wins. Soft voting instead averages the models' predicted class probabilities and picks the most probable class. The example below uses hard voting; a soft-voting sketch follows it.

'''
Hard voting: each model votes directly, with no weighting of the models;
the class with the most votes is the final prediction.
'''
iris = datasets.load_iris()

x = iris.data
y = iris.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

# Note: iris is a 3-class problem, so the XGBoost-style objective
# 'binary:logistic' in the original is invalid for LightGBM and is dropped here.
clf1 = lgb.LGBMClassifier(learning_rate=0.1, n_estimators=150, max_depth=3,
                          min_child_weight=2, subsample=0.7, colsample_bytree=0.6)
# min_samples_leaf=63 is very large for 150 samples, which is why the
# random forest scores only 0.33 below.
clf2 = RandomForestClassifier(n_estimators=200, max_depth=10, min_samples_split=10,
                              min_samples_leaf=63, oob_score=True)
clf3 = SVC(C=0.1)

# Hard voting
eclf = VotingClassifier(estimators=[('lgb', clf1), ('rf', clf2), ('svc', clf3)], voting='hard')
for clf, label in zip([clf1, clf2, clf3, eclf], ['LGB', 'Random Forest', 'SVM', 'Ensemble']):
    scores = cross_val_score(clf, x, y, cv=5, scoring='accuracy')
    print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label))

Accuracy: 0.95 (+/- 0.05) [LGB]
Accuracy: 0.33 (+/- 0.00) [Random Forest]
Accuracy: 0.92 (+/- 0.03) [SVM]
Accuracy: 0.95 (+/- 0.05) [Ensemble]
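For comparison, here is a minimal soft-voting sketch reusing the classifiers above; it is our addition, not part of the original notebook. SVC must be rebuilt with probability=True so it exposes predict_proba:

# Soft voting: average each model's predicted class probabilities,
# then take the class with the highest mean probability.
clf3_soft = SVC(C=0.1, probability=True)  # probability=True enables predict_proba
eclf_soft = VotingClassifier(estimators=[('lgb', clf1), ('rf', clf2), ('svc', clf3_soft)],
                             voting='soft')
scores = cross_val_score(eclf_soft, x, y, cv=5, scoring='accuracy')
print("Accuracy: %0.2f (+/- %0.2f) [Soft Ensemble]" % (scores.mean(), scores.std()))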

(2) Stacking / blending fusion for classification

Stacking is a layered model-ensembling framework.

Take two layers as an example: the first layer consists of several base learners whose input is the original training set, and the second-layer model is trained on the first-layer learners' outputs. Both layers of the stacking model use all of the training data.

'''
5-Fold Stacking
'''
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier
import pandas as pd

# Build the training data (first two iris classes only, so the task is binary)
data_0 = iris.data
data = data_0[:100, :]

target_0 = iris.target
target = target_0[:100]

# Individual models used in the fusion
clfs = [LogisticRegression(solver='lbfgs'),
        RandomForestClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        ExtraTreesClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        ExtraTreesClassifier(n_estimators=5, n_jobs=-1, criterion='entropy'),
        GradientBoostingClassifier(learning_rate=0.05, subsample=0.5, max_depth=6, n_estimators=5)]

# Hold out part of the data as a test set
X, X_predict, y, y_predict = train_test_split(data, target, test_size=0.3, random_state=2020)

dataset_blend_train = np.zeros((X.shape[0], len(clfs)))
dataset_blend_test = np.zeros((X_predict.shape[0], len(clfs)))

# 5-fold stacking
n_splits = 5
skf = StratifiedKFold(n_splits)
# Caution: skf.split() returns a one-shot generator. Because it is created
# once outside the model loop, only the first model actually iterates over
# the folds; for the remaining models the inner loop runs zero times and
# their test features stay all-zero (hence the 0.5 AUCs in the output).
# Building folds = list(skf.split(X, y)) and looping over that list inside
# the model loop would cross-train every model.
skf = skf.split(X, y)

for j, clf in enumerate(clfs):
    # Train each individual model in turn
    dataset_blend_test_j = np.zeros((X_predict.shape[0], 5))
    for i, (train, test) in enumerate(skf):
        # 5-fold cross-training: predict fold i with a model trained on the
        # remaining folds, and use those predictions as fold i's new feature.
        X_train, y_train, X_test, y_test = X[train], y[train], X[test], y[test]
        clf.fit(X_train, y_train)
        y_submission = clf.predict_proba(X_test)[:, 1]
        dataset_blend_train[test, j] = y_submission
        dataset_blend_test_j[:, i] = clf.predict_proba(X_predict)[:, 1]
    # For the test set, average the k fold-models' predictions as the new feature.
    dataset_blend_test[:, j] = dataset_blend_test_j.mean(1)
    print("val auc Score: %f" % roc_auc_score(y_predict, dataset_blend_test[:, j]))

clf = LogisticRegression(solver='lbfgs')
clf.fit(dataset_blend_train, y)
y_submission = clf.predict_proba(dataset_blend_test)[:, 1]

print("Val auc Score of Stacking: %f" % (roc_auc_score(y_predict, y_submission)))

val auc Score: 1.000000
val auc Score: 0.500000
val auc Score: 0.500000
val auc Score: 0.500000
val auc Score: 0.500000
Val auc Score of Stacking: 1.000000

Blending is a multi-layer fusion scheme similar to stacking.

  • The main idea is to split the original training set into two parts, e.g. 70% as the new training set and the remaining 30% as a hold-out set.
  • In the first layer, we train several models on the 70% and use them to predict the labels of the 30% hold-out, and also the labels of the test set.
  • In the second layer, we train a new model on the hold-out set's first-layer predictions as features, then feed the test set's first-layer predictions into that model for the final prediction.

Its advantages:

  • It is simpler than stacking (no k-fold cross-validation is needed to build the stacker features).
  • It sidesteps an information leak: the generalizers and the stacker use different data.

Its disadvantages:

  • It uses only a small slice of the data (the second-stage blender sees just the hold-out fraction, the 30% above, or as little as 10% in some write-ups).
  • The blender may overfit.
  • Stacking, with its repeated cross-validation, tends to be more robust.
'''
Blending
'''
# Build the training data (first two iris classes only)
data_0 = iris.data
data = data_0[:100, :]

target_0 = iris.target
target = target_0[:100]

# Individual models used in the fusion
clfs = [LogisticRegression(solver='lbfgs'),
        RandomForestClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        RandomForestClassifier(n_estimators=5, n_jobs=-1, criterion='entropy'),
        ExtraTreesClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        GradientBoostingClassifier(learning_rate=0.05, subsample=0.5, max_depth=6, n_estimators=5)]

# Hold out part of the data as a test set
X, X_predict, y, y_predict = train_test_split(data, target, test_size=0.3, random_state=2020)

# Split the training data into two halves, d1 and d2
X_d1, X_d2, y_d1, y_d2 = train_test_split(X, y, test_size=0.5, random_state=2020)
dataset_d1 = np.zeros((X_d2.shape[0], len(clfs)))
dataset_d2 = np.zeros((X_predict.shape[0], len(clfs)))

for j, clf in enumerate(clfs):
    # Train each individual model on d1, then predict on d2 and on the test set
    clf.fit(X_d1, y_d1)
    y_submission = clf.predict_proba(X_d2)[:, 1]
    dataset_d1[:, j] = y_submission
    # For the test set, the models' predictions become the new features
    dataset_d2[:, j] = clf.predict_proba(X_predict)[:, 1]
    print("val auc Score: %f" % roc_auc_score(y_predict, dataset_d2[:, j]))

# Second-layer (blender) model
clf = GradientBoostingClassifier(learning_rate=0.02, subsample=0.5, max_depth=6, n_estimators=30)
clf.fit(dataset_d1, y_d2)
y_submission = clf.predict_proba(dataset_d2)[:, 1]
print("Val auc Score of Blending: %f" % (roc_auc_score(y_predict, y_submission)))

val auc Score: 1.000000
val auc Score: 1.000000
val auc Score: 1.000000
val auc Score: 1.000000
val auc Score: 1.000000
Val auc Score of Blending: 1.000000

5.4.3 Some Other Methods

A stacking variant: run the features through models, transform the predictions, and append them to the original features as new columns before predicting again. (This can be repeated several times, each round adding its predictions to the feature set.)

def Ensemble_add_feature(train, test, target, clfs):
    # Two new feature columns per model: its squared prediction and exp(prediction)
    train_ = np.zeros((train.shape[0], len(clfs) * 2))
    test_ = np.zeros((test.shape[0], len(clfs) * 2))

    for j, clf in enumerate(clfs):
        # Train each individual model on the full training set
        clf.fit(train, target)
        y_train = clf.predict(train)
        y_test = clf.predict(test)

        # Generate the new features (the original indexed the second column
        # as j+1, which overwrote earlier columns; j*2+1 is the intended slot)
        train_[:, j * 2] = y_train ** 2
        test_[:, j * 2] = y_test ** 2
        train_[:, j * 2 + 1] = np.exp(y_train)
        test_[:, j * 2 + 1] = np.exp(y_test)

        print('Method ', j)

    train_ = pd.DataFrame(train_)
    test_ = pd.DataFrame(test_)
    return train_, test_

from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()

data_0 = iris.data
data = data_0[:100, :]

target_0 = iris.target
target = target_0[:100]

x_train, x_test, y_train, y_test = train_test_split(data, target, test_size=0.3)
x_train = pd.DataFrame(x_train); x_test = pd.DataFrame(x_test)

# Individual models used in the fusion
clfs = [LogisticRegression(),
        RandomForestClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        ExtraTreesClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        ExtraTreesClassifier(n_estimators=5, n_jobs=-1, criterion='entropy'),
        GradientBoostingClassifier(learning_rate=0.05, subsample=0.5, max_depth=6, n_estimators=5)]

New_train, New_test = Ensemble_add_feature(x_train, x_test, y_train, clfs)

clf = LogisticRegression()
clf.fit(New_train, y_train)
y_emb = clf.predict_proba(New_test)[:, 1]

print("Val auc Score of stacking: %f" % (roc_auc_score(y_test, y_emb)))

Method  0
Method  1
Method  2
Method  3
Method  4
Val auc Score of stacking: 1.000000

5.5 Applying It to This Competition

5.5.1 Setup

The setup consists of:

  • Loading the dataset and doing simple preprocessing
  • Splitting the data into training and validation sets
  • Building the single models: random forest, LGB, and NN
  • Showing how to turn the fused models' predictions into a submittable file

import pandas as pd
import numpy as np
import warnings
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

warnings.filterwarnings('ignore')
%matplotlib inline

import itertools
import matplotlib.gridspec as gridspec
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
# from mlxtend.classifier import StackingClassifier
from sklearn.model_selection import cross_val_score, train_test_split
# from mlxtend.plotting import plot_learning_curves
# from mlxtend.plotting import plot_decision_regions

from sklearn.model_selection import StratifiedKFold
import lightgbm as lgb
from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error

Here we introduce a memory-reduction helper.

def reduce_mem_usage(df):
    start_mem = df.memory_usage().sum() / 1024**2
    print('Memory usage of dataframe is {:.2f} MB'.format(start_mem))

    for col in df.columns:
        col_type = df[col].dtype

        if col_type != object:
            c_min = df[col].min()
            c_max = df[col].max()
            if str(col_type)[:3] == 'int':
                if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                    df[col] = df[col].astype(np.int8)
                elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                    df[col] = df[col].astype(np.int16)
                elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                    df[col] = df[col].astype(np.int32)
                elif c_min > np.iinfo(np.int64).min and c_max < np.iinfo(np.int64).max:
                    df[col] = df[col].astype(np.int64)
            else:
                if c_min > np.finfo(np.float16).min and c_max < np.finfo(np.float16).max:
                    df[col] = df[col].astype(np.float16)
                elif c_min > np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:
                    df[col] = df[col].astype(np.float32)
                else:
                    df[col] = df[col].astype(np.float64)
        else:
            df[col] = df[col].astype('category')

    end_mem = df.memory_usage().sum() / 1024**2
    print('Memory usage after optimization is: {:.2f} MB'.format(end_mem))
    print('Decreased by {:.1f}%'.format(100 * (start_mem - end_mem) / start_mem))
    return df

train = pd.read_csv('./data/train.csv')
test = pd.read_csv('./data/testA.csv')

# Simple preprocessing: expand the comma-separated signal string into columns
train_list = []
for items in train.values:
    train_list.append([items[0]] + [float(i) for i in items[1].split(',')] + [items[2]])

test_list = []
for items in test.values:
    test_list.append([items[0]] + [float(i) for i in items[1].split(',')])

train = pd.DataFrame(np.array(train_list))
test = pd.DataFrame(np.array(test_list))

# The id column is not a feature
features = ['s_' + str(i) for i in range(len(train_list[0]) - 2)]
train.columns = ['id'] + features + ['label']
test.columns = ['id'] + features

train = reduce_mem_usage(train)
test = reduce_mem_usage(test)

Memory usage of dataframe is 157.93 MB
Memory usage after optimization is: 39.67 MB
Decreased by 74.9%
Memory usage of dataframe is 31.43 MB
Memory usage after optimization is: 7.90 MB
Decreased by 74.9%

# Split the training and validation sets 8:2
X_train = train.drop(['id', 'label'], axis=1)
y_train = train['label']

# Test set
X_test = test.drop(['id'], axis=1)

# For a first run you can use a small subset of the data to speed things up
X_train = X_train.iloc[:50000, :20]
y_train = y_train.iloc[:50000]
X_test = X_test.iloc[:, :20]

# Split off the validation set
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2)

# Single-model builders
def build_model_rf(X_train, y_train):
    model = RandomForestRegressor(n_estimators=100)
    model.fit(X_train, y_train)
    return model

def build_model_lgb(X_train, y_train):
    model = lgb.LGBMRegressor(num_leaves=63, learning_rate=0.1, n_estimators=100)
    model.fit(X_train, y_train)
    return model

def build_model_nn(X_train, y_train):
    model = MLPRegressor(alpha=1e-05, hidden_layer_sizes=(5, 2), random_state=1, solver='lbfgs')
    model.fit(X_train, y_train)
    return model

# Train the three single models; each of subA_rf/lgb/nn could be submitted on its own.
# The models are untuned weak learners, so their individual scores may be modest.
print('predict rf...')
model_rf = build_model_rf(X_train, y_train)
val_rf = model_rf.predict(X_val)
subA_rf = model_rf.predict(X_test)

print('predict lgb...')
model_lgb = build_model_lgb(X_train, y_train)
val_lgb = model_lgb.predict(X_val)
subA_lgb = model_lgb.predict(X_test)  # the original called model_rf here, a copy-paste slip

print('predict NN...')
model_nn = build_model_nn(X_train, y_train)
val_nn = model_nn.predict(X_val)
subA_nn = model_nn.predict(X_test)  # likewise corrected from model_rf

predict rf...
predict lgb...
predict NN...

5.5.2 Weighted Fusion

First we try a weighted fusion of the models:

  • If no weight vector is given, it degenerates to mean fusion.
  • The weights can be customized; here we fuse three single models. With more models, resize the weight vector accordingly. (A sketch for tuning the weights on the validation set follows the code below.)

# Weighted fusion; with unchanged (equal) weights w this is mean fusion
def Weighted_method(test_pre1, test_pre2, test_pre3, w=[1/3, 1/3, 1/3]):
    Weighted_result = w[0]*pd.Series(test_pre1) + w[1]*pd.Series(test_pre2) + w[2]*pd.Series(test_pre3)
    return Weighted_result

# Initial weights, freely customizable; here we just pick something
w = [0.2, 0.3, 0.5]

val_pre = Weighted_method(val_rf, val_lgb, val_nn, w)
MAE_Weighted = mean_absolute_error(y_val, val_pre)
print('MAE of Weighted of val:', MAE_Weighted)

MAE of Weighted of val: 0.09326
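The weights above are picked by hand. A common refinement is to search for them on the validation set; here is a minimal sketch using scipy.optimize (the scipy dependency and the constraint setup are our additions, not part of the original notebook):

from scipy.optimize import minimize

preds = [val_rf, val_lgb, val_nn]

def blend_mae(w):
    # MAE of the weighted blend on the validation set
    blend = sum(wi * np.array(p) for wi, p in zip(w, preds))
    return mean_absolute_error(y_val, blend)

# Constrain the weights to be non-negative and to sum to 1
res = minimize(blend_mae, x0=[1/3, 1/3, 1/3], method='SLSQP',
               bounds=[(0, 1)] * len(preds),
               constraints={'type': 'eq', 'fun': lambda w: 1 - sum(w)})
print('searched weights:', res.x, ' val MAE:', res.fun)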

Separately, here is how the single models' test predictions are fused and written out as a submission:

## Fuse the test-set predictions
subA = Weighted_method(subA_rf, subA_lgb, subA_nn, w)

## Generate the submission file
# (the SaleID/price column names look inherited from another competition's
# template; adjust them to this competition's required submission format)
sub = pd.DataFrame()
sub['SaleID'] = X_test.index
sub['price'] = subA
sub.to_csv('./sub_Weighted.csv', index=False)

5.5.3 Stacking Fusion

## Stacking

## First layer
train_rf_pred = model_rf.predict(X_train)
train_lgb_pred = model_lgb.predict(X_train)
train_nn_pred = model_nn.predict(X_train)

stacking_X_train = pd.DataFrame()
stacking_X_train['Method_1'] = train_rf_pred
stacking_X_train['Method_2'] = train_lgb_pred
stacking_X_train['Method_3'] = train_nn_pred

stacking_X_val = pd.DataFrame()
stacking_X_val['Method_1'] = val_rf
stacking_X_val['Method_2'] = val_lgb
stacking_X_val['Method_3'] = val_nn

stacking_X_test = pd.DataFrame()
stacking_X_test['Method_1'] = subA_rf
stacking_X_test['Method_2'] = subA_lgb
stacking_X_test['Method_3'] = subA_nn

stacking_X_test.head()

   Method_1  Method_2  Method_3
0       0.0       0.0       0.0
1       2.0       2.0       2.0
2       3.0       3.0       3.0
3       0.0       0.0       0.0
4       0.0       0.0       0.0
# The second layer uses a random forest
model_lr_stacking = build_model_rf(stacking_X_train, y_train)

## Training set
train_pre_Stacking = model_lr_stacking.predict(stacking_X_train)
print('MAE of stacking:', mean_absolute_error(y_train, train_pre_Stacking))

## Validation set
val_pre_Stacking = model_lr_stacking.predict(stacking_X_val)
print('MAE of stacking:', mean_absolute_error(y_val, val_pre_Stacking))

## Test set
print('Predict stacking...')
subA_Stacking = model_lr_stacking.predict(stacking_X_test)

MAE of stacking: 0.0
MAE of stacking: 0.03384
Predict stacking...
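Note the training MAE of 0.0 above: the first-layer models are predicting the very rows they were trained on, so the second layer learns from leaked, over-optimistic features. A leakage-free variant builds the second layer's training features out-of-fold. This sketch with cross_val_predict is our addition, reusing the single-model settings from 5.5.1:

from sklearn.model_selection import cross_val_predict

# Each row's first-layer feature comes from a model that never saw that row
oof_rf = cross_val_predict(RandomForestRegressor(n_estimators=100), X_train, y_train, cv=5)
oof_lgb = cross_val_predict(lgb.LGBMRegressor(num_leaves=63, learning_rate=0.1, n_estimators=100),
                            X_train, y_train, cv=5)
oof_nn = cross_val_predict(MLPRegressor(alpha=1e-05, hidden_layer_sizes=(5, 2),
                                        random_state=1, solver='lbfgs'),
                           X_train, y_train, cv=5)

stacking_X_train_oof = pd.DataFrame({'Method_1': oof_rf, 'Method_2': oof_lgb, 'Method_3': oof_nn})
model_stacking_oof = build_model_rf(stacking_X_train_oof, y_train)
print('MAE of OOF stacking (val):',
      mean_absolute_error(y_val, model_stacking_oof.predict(stacking_X_val)))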

5.6 Lessons Learned

Model fusion is the main way to gain score in the late stages of a data-mining competition, especially after merging teams. It can happen at three levels:

  • Fusion at the result level. This is the most common approach, with many concrete methods: weighting by each result's score, applying log or exp transforms before blending, and so on. One important condition for result-level fusion is that the models' scores should be close while their predictions differ substantially; then fusion usually brings a solid improvement. When that condition fails, the gain is small or even negative.

  • Fusion at the feature level. Calling this "fusion" is not quite accurate; mostly it means that after teams merge, members learn from each other's feature engineering. If the same model type is used, the features can also be split across different models, with model- or result-level fusion applied afterwards, which sometimes works well.

  • Fusion at the model level. This involves stacking and architectural design, e.g. adding a stacking layer or feeding some models' outputs in as features, which takes experimentation and thought. For model-level fusion, the model types should differ meaningfully; fusing the same model type with different hyperparameters usually yields little.
