

[Python Learning Series 24] Predicting Vipshop User Purchase Behavior with Logistic Regression in scikit-learn

Published: 2025/4/16

1. Background: http://www.datafountain.cn/#/competitions/260/intro

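This is the Vipshop user purchase behavior prediction competition on DataFountain. I implemented it with logistic regression and scored 0.48, which is fairly weak; the code is given here for reference.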


2. The features extracted for the competition are as follows:

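Feature category    Feature name        Description                                                       Training notes
Basic features      u_id                Unique user identifier
                    spu_id              Unique product (SPU) identifier
                    brand_id            Identifier of the brand the product belongs to
                    cat_id              Identifier of the category the product belongs to
User features       u_buy_num           Number of purchases
                    u_click_num         Number of clicks
                    u_buy_date          Number of days with a purchase
                    u_click_date        Number of days with a click
                    u_num_ratio         Purchase-to-click count ratio: purchases / clicks
                    u_date_ratio        Purchase-to-click day ratio: purchase days / click days
                    u_buy_freq          Purchase frequency: purchases / 90 days
                    u_click_freq        Click frequency: clicks / 90 days
Product features    spu_buy_num         Number of purchases
                    spu_click_num       Number of clicks
                    spu_buy_date        Number of days with a purchase
                    spu_click_date      Number of days with a click
                    spu_num_ratio       Purchase-to-click count ratio: purchases / clicks
                    spu_date_ratio      Purchase-to-click day ratio: purchase days / click days
                    spu_buy_freq        Purchase frequency: purchases / 90 days
                    spu_click_freq      Click frequency: clicks / 90 days
Brand features      brand_buy_num       Number of purchases
                    brand_click_num     Number of clicks
                    brand_buy_date      Number of days with a purchase
                    brand_click_date    Number of days with a click
                    brand_num_ratio     Purchase-to-click count ratio: purchases / clicks
                    brand_date_ratio    Purchase-to-click day ratio: purchase days / click days
                    brand_buy_freq      Purchase frequency: purchases / 90 days
                    brand_click_freq    Click frequency: clicks / 90 days
Category features   cat_buy_num         Number of purchases
                    cat_click_num       Number of clicks
                    cat_buy_date        Number of days with a purchase
                    cat_click_date      Number of days with a click
                    cat_num_ratio       Purchase-to-click count ratio: purchases / clicks
                    cat_date_ratio      Purchase-to-click day ratio: purchase days / click days
                    cat_buy_freq        Purchase frequency: purchases / 90 days
                    cat_click_freq      Click frequency: clicks / 90 days
Label               action_type         Whether the user will buy this product on the given day (0 = no, 1 = yes)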
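
The ratio and frequency features in the table follow directly from per-entity counts. The post does not show the raw competition log, so the following is only a minimal sketch under assumptions: a hypothetical log with columns `u_id`, `date` and `action_type` (0 = click, 1 = purchase), and a hypothetical helper `entity_features`. It derives the eight user-level statistics; the same helper could be reused for `spu_id`, `brand_id` and `cat_id`.

```python
import pandas as pd

WINDOW_DAYS = 90  # the table divides counts by 90 days


def entity_features(log, key, prefix):
    """Aggregate buy/click statistics per value of `key` (hypothetical log schema)."""
    buys = log[log["action_type"] == 1].groupby(key)
    clicks = log[log["action_type"] == 0].groupby(key)
    feats = pd.DataFrame({
        prefix + "_buy_num": buys.size(),
        prefix + "_click_num": clicks.size(),
        prefix + "_buy_date": buys["date"].nunique(),
        prefix + "_click_date": clicks["date"].nunique(),
    }).fillna(0)  # entities with no buys (or no clicks) get a count of 0

    def safe_ratio(num, den):
        # Avoid division by zero: an entity with no clicks gets a ratio of 0.
        return (num / den.where(den > 0)).fillna(0.0)

    feats[prefix + "_num_ratio"] = safe_ratio(
        feats[prefix + "_buy_num"], feats[prefix + "_click_num"])
    feats[prefix + "_date_ratio"] = safe_ratio(
        feats[prefix + "_buy_date"], feats[prefix + "_click_date"])
    feats[prefix + "_buy_freq"] = feats[prefix + "_buy_num"] / WINDOW_DAYS
    feats[prefix + "_click_freq"] = feats[prefix + "_click_num"] / WINDOW_DAYS
    return feats.rename_axis(key).reset_index()


# Tiny made-up log: user 1 clicks twice on one day and buys once; user 2 only clicks.
log = pd.DataFrame({
    "u_id": [1, 1, 1, 2, 2],
    "date": ["2017-01-01", "2017-01-01", "2017-01-02", "2017-01-01", "2017-01-03"],
    "action_type": [0, 0, 1, 0, 0],
})
user_features = entity_features(log, "u_id", "u")
print(user_features)
```

Setting the ratio to 0 when there are no clicks is a design choice of this sketch; the post does not say how the original features handle that case.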

3. The logistic regression reference code is as follows:

# -*- coding: utf-8 -*-
import time

import pandas as pd
from sklearn import metrics, preprocessing
from sklearn.linear_model import LogisticRegression

# Statistics stored for each entity, in file order.
STATS = ["buy_num", "click_num", "buy_date", "click_date",
         "num_ratio", "date_ratio", "buy_freq", "click_freq", "last_date"]
ENTITIES = ["u", "spu", "brand", "cat"]  # user, product, brand, category
MODEL_FEATURES = ["u_num_ratio", "spu_num_ratio", "brand_num_ratio", "cat_num_ratio"]


def stat_names(entity):
    # e.g. "u" -> ["u_buy_num", ..., "u_last_date"]
    return ["%s_%s" % (entity, stat) for stat in STATS]


def cast_columns(df):
    # Ratios and frequencies are floats; ids, counts, dates and the label are ints.
    for col in df.columns:
        if col.endswith(("_ratio", "_freq")):
            df[col] = df[col].astype("float")
        else:
            df[col] = df[col].astype("int")
    return df


def main():
    # Step 1: load the training and test sets
    # Labelled data: for each entity, its id followed by its statistics, then the label
    train_names = []
    for entity in ENTITIES:
        train_names += [entity + "_id"] + stat_names(entity)
    train_names.append("action_type")
    label_ds = pd.read_csv(r"train_features_0714.txt", sep='\t', encoding='utf8',
                           names=train_names)
    label_ds = cast_columns(label_ds)
    print("Training set: %d rows, %d columns" % label_ds.shape)

    # Unlabelled data: row id, the four entity ids, then the statistics of each entity
    # (the original listed the user id as "uid" but then read "u_id", which raises KeyError)
    test_names = ["id"] + [entity + "_id" for entity in ENTITIES]
    for entity in ENTITIES:
        test_names += stat_names(entity)
    unlabel_ds = pd.read_csv(r"test_features_0714.txt", sep='\t', encoding='utf8',
                             names=test_names)
    # Author's inline notes in the original: "391萬" (3.91 million) next to u_buy_num
    # and "241萬" (2.41 million) next to spu_num_ratio.
    unlabel_ds = cast_columns(unlabel_ds)
    print("Test set: %d rows, %d columns" % unlabel_ds.shape)

    # Model training
    ds_0 = label_ds[label_ds['action_type'] == 0]  # samples labelled 0
    ds_0_train = ds_0.sample(frac=0.01)            # keep 1% of the negatives for training
    ds_1 = label_ds[label_ds['action_type'] == 1]  # samples labelled 1
    ds_train = pd.concat([ds_1, ds_0_train])       # DataFrame.append was removed in pandas 2.0
    label_X = ds_train[MODEL_FEATURES]
    label_X_scale = preprocessing.scale(label_X)   # standardize to zero mean, unit variance
    label_y = ds_train['action_type']              # class label
    model = LogisticRegression()                   # alternative: ensemble.GradientBoostingClassifier()
    model.fit(label_X_scale, label_y)

    # Step 5: model validation and selection
    # Note: this 20% sample is drawn from the rows used for fitting, not from held-out data.
    test_df = ds_train.sample(frac=0.2)
    test_X = test_df[MODEL_FEATURES]
    test_X_scale = preprocessing.scale(test_X)
    test_y = test_df['action_type']
    predicted = model.predict(test_X_scale)
    f1_score = metrics.f1_score(test_y, predicted)  # model evaluation
    print(f1_score)

    # Step 6: model prediction
    unlabel_X = unlabel_ds[MODEL_FEATURES]
    unlabel_X_scale = preprocessing.scale(unlabel_X)
    # Predict probabilities; positives are selected by thresholding the probability
    unlabel_y = model.predict_proba(unlabel_X_scale)[:, 1]
    out_y = pd.DataFrame(unlabel_y, columns=['prob'])
    out_1 = out_y[out_y["prob"] > 0.5]  # count predictions above 0.5 (compared as numbers, not strings)
    print(out_1.shape)
    print(out_y['prob'].round(3).value_counts())  # distribution of predicted values
    # Write the predictions with three decimal places
    out_y.to_csv('fangjs/outvip.txt', index=False, header=False, float_format='%.3f')


# Run
if __name__ == '__main__':
    start = time.perf_counter()  # time.clock() was removed in Python 3.8
    main()
    end = time.perf_counter()
    print('finish all in %s' % str(end - start))
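
The script above standardizes the training, validation and prediction sets separately with `preprocessing.scale`, and it draws its validation sample from the same rows it fitted on, so the F1 it prints is likely optimistic. One possible alternative, shown here only as a sketch, is to hold out rows before fitting and let a scikit-learn `Pipeline` reuse the training-set scaling statistics. The competition files are not included in this post, so the data below are synthetic; the four feature names match the script, and everything else (sample size, label rule, random seeds) is an assumption for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["u_num_ratio", "spu_num_ratio", "brand_num_ratio", "cat_num_ratio"]

# Synthetic stand-in for the labelled feature table (not competition data).
rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame(rng.random((n, len(FEATURES))), columns=FEATURES)
# Made-up rule: a purchase is more likely when the user's buy/click ratio is high.
y = (X["u_num_ratio"] + 0.3 * rng.standard_normal(n) > 0.7).astype(int)

# Hold out 20% of the rows BEFORE fitting, keeping the class balance in both parts.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The scaler learns mean and variance on the training rows only and
# applies those same statistics to any data passed in later.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

valid_prob = model.predict_proba(X_valid)[:, 1]  # probability of class 1
valid_pred = (valid_prob > 0.5).astype(int)      # same 0.5 threshold as the script
score = f1_score(y_valid, valid_pred)
print("hold-out F1: %.3f" % score)
```

With this setup the unlabelled test features would be passed to `model.predict_proba` directly, without a separate scaling call.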
