[Python Learning Series 17] Training a Logistic Regression Model with scikit-learn (delta competition code 2)

Published: 2025/4/16 python 25 豆豆

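Machine learning task workflow: define the learning task -> mathematical modeling -> sample the training data -> feature analysis and extraction -> algorithm design and coding -> model training and optimization (performance evaluation and metrics) -> generalization assessment (resampling and re-modeling).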


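Approach: apply a semi-supervised learning idea. First train a model on the training set, then use that model to label the prediction set, then add the newly labeled prediction set to the training set and train again, using the F1 score as the basis for performance evaluation. Compared with the previous version, the main change is the use of model.predict_proba() to return the positive-class probability, with a hand-set threshold to select the positive samples. The code is as follows: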

# -*- coding: utf-8 -*-
import time

import pandas as pd
from sklearn import metrics
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression
# from sklearn.tree import DecisionTreeClassifier

# Features used for training and prediction
FEATURES = ['pro_code', 'city_id', 'age', 'sex', 'account_age',
            'txn_count', 'txn_amount_mean', 'txn_min_amount']
# Columns cast to int / float in both the labeled and the unlabeled data
INT_COLS = ['denomination', 'min_amount', 'pro_code', 'age', 'sex', 'account_age',
            'txn_count', 'use_nums', 'txn_min_amount', 'txn_amount_mean',
            'avg_discount', 'voucher_num', 'avg_txn_amt', 'batch_no', 'city_id']
FLOAT_COLS = ['use_ratio', 'voucher_ratio']


def cast_columns(df, int_cols, float_cols, str_cols):
    """Return a copy of df restricted to the given columns, with dtypes applied."""
    dtypes = {c: 'int' for c in int_cols}
    dtypes.update({c: 'float' for c in float_cols})
    dtypes.update({c: 'str' for c in str_cols})
    return df[list(dtypes)].astype(dtypes)


def main():
    # Province-to-code mapping
    data = {"province": ['河北省', '山西省', '内蒙古自治区', '辽宁省', '吉林省', '黑龙江省',
                         '江苏省', '浙江省', '安徽省', '福建省', '江西省', '山东省',
                         '河南省', '湖北省', '湖南省', '广东省', '广西壮族自治区', '海南省',
                         '四川省', '贵州省', '云南省', '西藏自治区', '陕西省', '甘肃省',
                         '青海省', '宁夏回族自治区', '新疆维吾尔自治区',
                         '北京市', '天津市', '上海市', '重庆市'],
            "pro_code": [13, 14, 15, 21, 22, 23, 32, 33, 34, 35, 36, 37,
                         41, 42, 43, 44, 45, 46, 51, 52, 53, 54, 61, 62,
                         63, 64, 65, 11, 12, 31, 50]}
    province = pd.DataFrame(data, columns=["province", "pro_code"])
    citydata = pd.read_csv(r"D:\city.csv")  # city mapping table

    # Load the labeled data
    label_ds = pd.read_csv(r"D:\label.csv")
    label_ds = pd.merge(label_ds, province, how="left", on="province")
    label_ds = pd.merge(label_ds, citydata, how="left", on="city")
    label_df = cast_columns(label_ds, INT_COLS + ['label'], FLOAT_COLS, ['voucher_no'])

    # Load the unlabeled data
    unlabel_ds = pd.read_csv(r"D:\unlabel.csv")
    unlabel_ds = pd.merge(unlabel_ds, province, how="left", on="province")
    unlabel_ds = pd.merge(unlabel_ds, citydata, how="left", on="city")
    unlabel_df = cast_columns(unlabel_ds, INT_COLS, FLOAT_COLS, ['phone', 'voucher_no'])

    # Model training and prediction
    f1_score_old = 0.0
    f1_score = 0.3  # starting value: higher than the score from labeling everything as 1
    outset = []
    flag = 1  # 1 on the first pass, 0 afterwards
    label_df_cons = label_df  # the original labeled samples never change
    while (f1_score - f1_score_old) > 0.0001:  # iterate until F1 stops improving
        if flag == 0:  # skipped on the first pass, which is not scored
            f1_score_old = f1_score

        # The whole set is used for training (rather than an 80/20 split);
        # F1 is measured on a 30% sample of the original labeled data.
        print("All samples:", label_df.shape[0], "rows,", label_df.shape[1], "columns")
        train_label_df = label_df  # full training set, instead of sample(frac=0.8)
        print("Training set:", train_label_df.shape[0], "rows,", train_label_df.shape[1], "columns")
        test_label_df = label_df_cons.sample(frac=0.3)
        print("Validation set:", test_label_df.shape[0], "rows,", test_label_df.shape[1], "columns")

        # Train the model
        label_X = preprocessing.scale(train_label_df[FEATURES])  # standardize
        label_y = train_label_df['label']
        model = LogisticRegression()
        # if flag == 0:
        #     model = LogisticRegression()  # logistic regression for the first pre-training pass
        # else:
        #     model = DecisionTreeClassifier()  # decision tree
        model.fit(label_X, label_y)

        if flag == 0:  # validate the model; the first pass is not scored
            expected = test_label_df['label']
            predicted_X = preprocessing.scale(test_label_df[FEATURES])  # standardize
            predicted = model.predict(predicted_X)
            f1_score = metrics.f1_score(expected, predicted)  # model evaluation
            print(f1_score)
        flag = 0

        if f1_score_old < f1_score:
            # Label the unlabeled samples, then add them to the training set
            unlabel_X_noScale = unlabel_df[FEATURES]
            unlabel_X = preprocessing.scale(unlabel_X_noScale)  # standardize
            unlabel_y = model.predict(unlabel_X)
            out_y = pd.DataFrame(unlabel_y.reshape(-1, 1), columns=['label'])
            unlabel_X_new = unlabel_X_noScale.join(out_y, how='left')
            # New training set: original labeled data plus the pseudo-labeled data
            label_df = pd.concat([label_df_cons, unlabel_X_new])
        else:
            # Iterative training has finished: write out the result
            unlabel_info = unlabel_df[['phone', 'voucher_no']]
            unlabel_X = preprocessing.scale(unlabel_df[FEATURES])  # standardize
            # Predict the positive-class probability and select positives by threshold
            unlabel_y = model.predict_proba(unlabel_X)[:, 1]
            out_y = pd.DataFrame(unlabel_y, columns=['prob'])
            outset = unlabel_info.join(out_y, how='left')
            outset["label"] = outset.apply(lambda x: 0 if x["prob"] < 0.57 else 1, axis=1)
            outset = outset[outset['label'] == 1]
            outset = outset[['phone', 'voucher_no', 'label']]
            outsetds = pd.DataFrame(outset)
            outsetds.to_csv(r'D:\gd_delta.csv', index=False, header=False)  # write predictions

            # Evaluate F1 (disabled)
            # unlabel_X = pd.DataFrame(unlabel_X, columns=FEATURES)
            # print(unlabel_X.head(5))
            # outset = unlabel_X.join(out_y, how='left')
            # outset["label"] = outset.apply(lambda x: 0 if x["prob"] < 0.57 else 1, axis=1)
            # expected = outset['label']
            # predicted_X = preprocessing.scale(outset[FEATURES])  # standardize
            # predicted = model.predict(predicted_X)
            # f1_score = metrics.f1_score(expected, predicted)  # model evaluation
            # print(f1_score)  # 0.855946148093

            # Leave the loop
            break


# Run
if __name__ == '__main__':
    start = time.perf_counter()
    main()
    end = time.perf_counter()
    print('finish all in %s' % str(end - start))

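The final step of the code keeps only the rows whose predicted positive-class probability is at least 0.57. As a rough sketch of how such a cutoff could be compared against alternatives instead of being fixed by hand, the snippet below computes F1 for a few candidate thresholds on a small labeled validation set. It uses only the standard library; the probabilities, labels and candidate list are made-up illustrative values, and `f1_at_threshold` and `best_threshold` are hypothetical helper names that are not part of scikit-learn or of the code above.

```python
def f1_at_threshold(probs, labels, threshold):
    """F1 of the positive class when prob >= threshold is predicted as 1."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0  # no true positives: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(probs, labels, candidates):
    """Return (threshold, f1) for the candidate with the highest F1."""
    scored = [(f1_at_threshold(probs, labels, t), t) for t in candidates]
    best_f1, best_t = max(scored, key=lambda pair: pair[0])
    return best_t, best_f1


# Illustrative values only (not taken from the competition data)
val_probs = [0.10, 0.35, 0.52, 0.58, 0.61, 0.80, 0.90]
val_labels = [0, 0, 1, 0, 1, 1, 1]
threshold, score = best_threshold(val_probs, val_labels, [0.3, 0.5, 0.57, 0.7])
print(threshold, round(score, 3))  # prints: 0.5 0.889
```

In practice the probabilities would come from `model.predict_proba(...)[:, 1]` on held-out labeled rows, so that the threshold is not tuned on the same rows the model was trained on.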
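Three directions for further improvement:

1) Try using one model to label the prediction set and a different model for the iterative training;

2) Try extracting different features for modeling, and additionally discretize the feature values;

3) Try pre-training on one subset of the features and training the final model on the remaining features, which may reduce overfitting.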
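For point 2, one simple way to discretize a numeric feature is equal-width binning. The snippet below is a minimal standard-library sketch under assumed inputs: the `ages` values and the bin count are illustrative, and `equal_width_bins` is a hypothetical helper name. The post does not say which discretization scheme was intended.

```python
def equal_width_bins(values, n_bins):
    """Map each value to a bin index in [0, n_bins - 1] using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]  # all values identical: a single bin
    width = (hi - lo) / n_bins
    # The maximum value would fall into bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [18, 22, 25, 31, 40, 47, 58]  # illustrative values only
print(equal_width_bins(ages, 4))  # prints: [0, 0, 0, 1, 2, 2, 3]
```

The resulting bin indices could replace a raw column such as `age` before training. Whether this helps a logistic regression model depends on the data and would need to be checked with the F1 evaluation.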
