Generating Word Clouds of Renrendai (人人贷) Loan Reasons with Python
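Contents
1. Introduction
2. About the code
3. Some word clouds
3.1 Filter: none
3.2 Filter: gender - male
3.3 Filter: gender - female
3.4 Filter: education level - postgraduate and above
3.5 Filter: education level - bachelor's degree
3.6 Filter: hometown - Fujian
3.7 Filter: hometown - Guangdong
3.8 Filter: loan reason - contains "苹果" (Apple)
4. Code
4.1 Importing libraries
4.2 Loading the data
4.3 Setting the stop words
4.4 Word cloud code
5. Final words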
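1. Introduction
Earlier posts about Renrendai, all by 小zhan柯基 on CSDN: "Renrendai loan-listing scraper example", "Renrendai loan-listing scraper, advanced: using async IO", and "Processing 280,000 Renrendai records with Python: the most detailed breakdown of borrower demographics".
The previous Renrendai post mentioned three follow-ups. First, the data can be mined further, for example by analyzing the distribution of education levels within each age group. Second, the data can be used to train a neural-network model for credit scoring. Third, the loan-reason column can be turned into a word cloud.
I have been busy with research on blockchain and supply-chain finance lately, so this time I am picking the low-hanging fruit: generating a word cloud is quick.
Finally, send me a private message if you need the Renrendai loan data!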
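2. About the code
I won't go over how to generate a word cloud here, since a quick search turns up plenty of tutorials, for example "Making cool word clouds with Python (including stop words and word-frequency statistics)" on gjgfjgy's CSDN blog, and "EDG wins the championship, analyzed with Python: the fans went wild" on the 数据分析与统计学之美 CSDN blog.
One thing worth mentioning is a fairly common pandas idiom: selecting the rows/columns that contain a given keyword.
As the figure above shows, the data contains 284,316 loan-reason records in total. How would I find the records whose loan reason contains the two characters "苹果" (Apple)?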
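    conciseData[conciseData["借款理由"].str.contains("苹果", na=False)]["借款理由"]
As the figure above shows, only 646 records mention borrowing to buy an Apple phone, about 0.23% of the total, so it seems not many people borrow to buy an iPhone after all, hahaha.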
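As a minimal sketch of the keyword filter above, the snippet below runs the same `str.contains` call on a tiny invented table. The toy `df` and its rows are assumptions for illustration only; the column name `借款理由` and the keyword follow the post.

```python
import pandas as pd

# Toy stand-in for conciseData; the real table has 284,316 rows.
df = pd.DataFrame({"借款理由": ["买苹果手机", "装修", None, "资金周转", "想换苹果电脑"]})

# na=False makes missing reasons count as "no match" instead of yielding NaN,
# so the result is a clean boolean mask that can be used for indexing.
mask = df["借款理由"].str.contains("苹果", na=False)
matches = df.loc[mask, "借款理由"]

# The mean of a boolean mask is the fraction of rows that matched.
share = mask.mean()
print(len(matches), f"{share:.2%}")

# The same arithmetic for the figures quoted in the post: 646 of 284,316 rows.
post_share_percent = round(646 / 284316 * 100, 2)
```

Note that `str.contains` treats the pattern as a regular expression by default; passing `regex=False` forces a literal match, which may be safer for keywords containing characters such as `.` or `(`.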
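3. Some word clouds
3.1 Filter: none
3.2 Filter: gender - male
3.3 Filter: gender - female
3.4 Filter: education level - postgraduate and above
3.5 Filter: education level - bachelor's degree
3.6 Filter: hometown - Fujian
3.7 Filter: hometown - Guangdong
3.8 Filter: loan reason - contains "苹果" (Apple)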
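Each filter in sections 3.1 to 3.8 can be expressed as a pandas boolean mask. The sketch below is one possible way to organize them, not the author's code: the toy rows, the `filters` dictionary, and the exact category labels (such as `本科`) are assumptions, and the real data may label categories differently or need `str.contains` instead of equality.

```python
import pandas as pd

# Toy rows; the column names follow the post, the values are invented.
df = pd.DataFrame({
    "性别": ["男", "女", "男", "女"],
    "教育程度": ["本科", "研究生及以上", "高中或以下", "本科"],
    "籍贯": ["福建", "广东", "广东", "北京"],
    "借款理由": ["资金周转", "买苹果手机", "装修", None],
})

# Hypothetical mapping from an output name to the mask that selects its rows.
filters = {
    "all": pd.Series(True, index=df.index),
    "male": df["性别"] == "男",
    "female": df["性别"] == "女",
    "bachelor": df["教育程度"] == "本科",
    "fujian": df["籍贯"] == "福建",
    "guangdong": df["籍贯"] == "广东",
    "apple": df["借款理由"].str.contains("苹果", na=False),
}

# Number of rows each filter would feed into its word cloud.
counts = {name: int(mask.sum()) for name, mask in filters.items()}
print(counts)
```

Looping over such a dictionary would let one script produce every image in this section, with `df.loc[mask, "借款理由"]` as the text source for each.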
4. Code
4.1 Importing libraries
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    import matplotlib.ticker as ticker
    import mpl_toolkits.axisartist as AA
    from mpl_toolkits.axisartist.axislines import SubplotZero
    import pylab
    import jieba
    from wordcloud import WordCloud

    pylab.mpl.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese characters
    plt.rcParams['axes.unicode_minus'] = False  # fix minus signs not rendering
4.2 Loading the data
    data = pd.read_csv("all.csv", encoding="gbk", header=None, parse_dates=True)
    data.columns = ["id", "借款时间(月)", "剩余还款时间(月)", "借款金额", "notPayInterest",
                    "productRepayType", "贷款类型", "利率", "性别", "籍贯", "出生日期", "教育程度",
                    "工作单位", "行业", "公司规模", "职位", "收入", "车贷", "汽车数量", "婚姻状况",
                    "房贷", "房子数量", "信用等级", "none", "none", "none", "借款理由"]
    # Keep only the columns that are used later
    conciseData = data[["id", "借款时间(月)", "剩余还款时间(月)", "借款金额", "贷款类型", "利率",
                        "性别", "籍贯", "出生日期", "教育程度", "工作单位", "行业", "公司规模", "职位",
                        "收入", "车贷", "汽车数量", "婚姻状况", "房贷", "房子数量", "信用等级", "借款理由"]]
    conciseData = conciseData.set_index("id")
    conciseData = conciseData.dropna(how="all")  # drop rows that are entirely empty
4.3 Setting the stop words
    stopWords = ["人人", "真实有效", "同时", "符合", "借款人", "提供", "上述", "考察", "实地", "已经",
                 "希望", "大家", "认证", "审核", "此次", "公司", "众信", "借款", "谢谢", "比较",
                 "第一次", "压力", "贷", "的", "标准", "方友", "业", "还款", "收入", "用于",
                 "信息", "以上", "问题", "好", "一下", "通过", "稳定", "全国", "企业", "位于",
                 "该", "为", "自己", "现居", "工作", "单位", "但", "高", "一些", "还清",
                 "行业", "主要", "从事", "有", "无", "良好", "贷款", "累计", "自", "放心",
                 "家里", "吱吱", "为了", "放款", "多", "在", "年", "所", "抵押", "无担保",
                 "服务", "本人", "多多", "小额贷款", "想", "与", "借", "给", "建立", "支持",
                 "至今", "安信", "良好", "最", "多", "探索", "大", "小", "证大速贷", "成立",
                 "于", "信用", "成立", "每月", "流水", "一家", "因为", "我", "和", "是",
                 "做", "所以", "迅速", "以来", "需", "快速", "简便", "可以", "专门", "资料",
                 "经", "了", "也", "现在", "由于", "测试", "需要", "元", "也", "还",
                 "个", "月", "人", "申请", "等", "能", "了", "及", "没有", "现在",
                 "就", "进行", "都", "各位", "急急", "每个", "准备", "有限公司", "目前", "保证",
                 "按时", "因", "可", "持续", "一个", "上", "到", "万", "要", "现",
                 "来", "想", "个人", "左右", "不", "年底", "能力"]
4.4 Word cloud code
Because 280,000 records is a lot, the data is sliced with a step of 3 here, so only every third record is used!
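To make the step-3 slice concrete, here is a minimal sketch on invented data. The toy Series, the `stop_words` set, and the whitespace split (a stand-in for `jieba.cut`, so the snippet runs without extra packages) are assumptions for illustration, not the author's code.

```python
import pandas as pd

# Toy stand-in for the 借款理由 column, already separated by spaces.
reasons = pd.Series(["装修 借款", None, "买车", "资金 周转", "结婚", "旅游", "买 手机"])

# [::3] keeps positions 0, 3, 6, ... so roughly one third of the rows survive.
sampled = reasons[::3]

# Missing values are not str instances, hence the isinstance check.
texts = [r for r in sampled if isinstance(r, str)]
txt = " ".join(texts)

# Hypothetical stop words; the split below stands in for jieba.cut(txt).
stop_words = {"借款", "买"}
tokens = [t for t in txt.split() if t not in stop_words]
print(tokens)
```

A set is used for the stop words here because membership tests on a set take constant time, which could matter when the text is long; a plain list behaves the same way, only more slowly.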
    # Concatenate the loan reasons of male borrowers, taking every third record
    txt = ""
    for each in conciseData[conciseData["性别"] == "男"]["借款理由"][::3]:
        if isinstance(each, str):  # skip missing values
            txt += each + " "

    words = jieba.cut(txt)  # word segmentation
    result = ""
    for each in words:
        if each not in stopWords:
            result += each + " "

    wordshow = WordCloud(
        background_color='black',
        width=800,
        height=800,
        max_words=800,
        max_font_size=100,
        font_path="msyh.ttc",  # a font with Chinese glyphs is required
    ).generate(result)
    wordshow.to_file('男.png')
5. Final words
All living beings suffer, and you are not the only one; letting go is freedom.