
Python credit card default prediction analysis: Python Data Analysis and Visualization Examples, Bank Credit Card Default Prediction (24)...

Published: 2024/9/27 | python | 19 | 豆豆


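1. Project background:

The task is to predict whether a bank's credit card customers will default. The first five rows of the raw dataset are shown in the df.head() output below.

2. Analysis steps:

(1) Data Cleaning

(2) Exploratory Visualization

(3) Feature Engineering

(4) Basic Modeling & Evaluation

3. Source code:

Dataset download: 易一網絡科技 - paid article (www.intumu.com)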

Load the data

import pandas as pd

# Reading a legacy .xls file requires the xlrd package
df = pd.read_excel('LRGWFB.xls')
df.head()  # first five rows

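   年齡  教育  工齡  地址  收入  負債率  信用卡負債  其他負債  違約
0    41     3    17    12   176     9.3   11.359392  5.008608     1
1    27     1    10     6    31    17.3    1.362202  4.000798     0
2    40     1    15    14    55     5.5    0.856075  2.168925     0
3    41     1    15    14   120     2.9    2.658720  0.821280     0
4    24     2     2     0    28    17.3    1.787436  3.056564     1

(Columns: 年齡 age, 教育 education level, 工齡 years employed, 地址 address (likely years at the current address), 收入 income, 負債率 debt ratio, 信用卡負債 credit card debt, 其他負債 other debt, 違約 default, 1 = default.)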
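In the five rows above, 負債率 equals total debt (信用卡負債 + 其他負債) divided by 收入, expressed as a percentage. For row 0, (11.359392 + 5.008608) / 176 × 100 = 9.3. The sketch below checks only those five rows, typed in by hand so it runs without LRGWFB.xls. Whether the relationship holds for every row of the full file is an assumption, so rerun the check on `df` to confirm it. If it does hold, 負債率 is partly derived from other columns, which is worth keeping in mind when reading the correlation plot.

```python
import numpy as np
import pandas as pd

# The five rows printed by df.head() above, typed in by hand
cols = ['年齡', '教育', '工齡', '地址', '收入', '負債率', '信用卡負債', '其他負債', '違約']
rows = [
    [41, 3, 17, 12, 176, 9.3, 11.359392, 5.008608, 1],
    [27, 1, 10, 6, 31, 17.3, 1.362202, 4.000798, 0],
    [40, 1, 15, 14, 55, 5.5, 0.856075, 2.168925, 0],
    [41, 1, 15, 14, 120, 2.9, 2.658720, 0.821280, 0],
    [24, 2, 2, 0, 28, 17.3, 1.787436, 3.056564, 1],
]
head = pd.DataFrame(rows, columns=cols)

# Total debt as a percentage of income, rounded to one decimal like 負債率
implied = ((head['信用卡負債'] + head['其他負債']) / head['收入'] * 100).round(1)
print(pd.DataFrame({'負債率': head['負債率'], 'implied': implied}))
print('match:', np.allclose(implied, head['負債率']))
```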

Check for missing values

df.isnull().any()

年齡        False
教育        False
工齡        False
地址        False
收入        False
負債率      False
信用卡負債  False
其他負債    False
違約        False
dtype: bool
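`isnull().any()` only reports whether each column has a gap. On data that does have gaps, `isnull().sum()` gives the count per column, which helps decide between dropping rows and imputing values. The sketch below runs on a hypothetical toy frame with one deliberately missing income; the credit data itself has no missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one deliberately missing income value
toy = pd.DataFrame({
    '收入': [176, np.nan, 55],
    '負債率': [9.3, 17.3, 5.5],
    '違約': [1, 0, 0],
})

print(toy.isnull().any())    # per column: is anything missing?
counts = toy.isnull().sum()  # per column: how many values are missing
print(counts)

# Two common remedies: drop incomplete rows, or fill each gap with its column median
dropped = toy.dropna()
filled = toy.fillna(toy.median())
print(filled)
```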

Target classes

df['違約'].unique()

array([1, 0], dtype=int64)
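`unique()` confirms the task is binary, but it does not show how balanced the two classes are. `value_counts(normalize=True)` reports the share of each label, which is worth checking before reading an accuracy score. The sketch uses only the 違約 values of the five head rows; run `df['違約'].value_counts(normalize=True)` on the full data for the real proportions.

```python
import pandas as pd

# 違約 values of the five df.head() rows shown above
y_head = pd.Series([1, 0, 0, 0, 1], name='違約')

print(y_head.unique())                       # labels present
counts = y_head.value_counts()               # rows per label
share = y_head.value_counts(normalize=True)  # fraction per label
print(counts)
print(share)
```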

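Split into features (X) and target (y)

# Column 0 is 年齡 (age) and the last column is the target 違約 (default).
# Note: 1:-1 also skips 年齡, leaving 7 features; use df.iloc[:, :-1] to keep all 8.
X, y = df.iloc[:, 1:-1], df.iloc[:, -1]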
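The slice `1:-1` starts at the second column, so 年齡 (age) is left out and only 7 features remain. This matches the `classes` list and the "7 features" plot title below. If age is meant to be a feature, use `iloc[:, :-1]` or drop the target by name to keep all eight. The sketch compares the three options on one of the head rows.

```python
import pandas as pd

cols = ['年齡', '教育', '工齡', '地址', '收入', '負債率', '信用卡負債', '其他負債', '違約']
# A single row from df.head() is enough to see which columns each slice keeps
df_demo = pd.DataFrame([[41, 3, 17, 12, 176, 9.3, 11.359392, 5.008608, 1]], columns=cols)

X_seven = df_demo.iloc[:, 1:-1]  # slice used in this article: drops 年齡 and 違約
X_eight = df_demo.iloc[:, :-1]   # keeps every feature, drops only the target 違約
y_demo = df_demo.iloc[:, -1]

# Selecting by name avoids positional surprises if the column order changes
X_named = df_demo.drop(columns='違約')

print(X_seven.columns.tolist())
print(X_eight.columns.tolist())
```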

Feature correlation

classes = X.columns.tolist()

classes

['教育', '工齡', '地址', '收入', '負債率', '信用卡負債', '其他負債']

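from yellowbrick.features import Rank2D

# Heatmap of the pairwise Pearson correlations between the 7 features
visualizer = Rank2D(algorithm='pearson', size=(800, 600), title="Pearson correlation coefficients of the 7 features")
visualizer.fit(X, y)
visualizer.transform(X)
visualizer.poof()  # render the figure; Yellowbrick 1.0+ renames poof() to show()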

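E:\Anaconda3\lib\site-packages\yellowbrick\features\rankd.py:262: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  X = X.as_matrix()

(This warning comes from the older Yellowbrick release used here, which calls the pandas method .as_matrix(). It is harmless in this run, but pandas 1.0 removed .as_matrix(), so a current pandas needs a newer Yellowbrick release.)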
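Rank2D with `algorithm='pearson'` draws the pairwise Pearson correlation matrix of the features. Pandas `DataFrame.corr` computes the same numbers directly, which is handy for listing the most correlated pairs as text. The sketch uses only the five head rows, so the values illustrate the mechanics and say nothing reliable about the full dataset.

```python
import numpy as np
import pandas as pd

cols = ['教育', '工齡', '地址', '收入', '負債率', '信用卡負債', '其他負債']
# Feature columns of the five df.head() rows (年齡 and 違約 excluded, as in X)
X_head = pd.DataFrame([
    [3, 17, 12, 176, 9.3, 11.359392, 5.008608],
    [1, 10, 6, 31, 17.3, 1.362202, 4.000798],
    [1, 15, 14, 55, 5.5, 0.856075, 2.168925],
    [1, 15, 14, 120, 2.9, 2.658720, 0.821280],
    [2, 2, 0, 28, 17.3, 1.787436, 3.056564],
], columns=cols)

# 7x7 matrix of pairwise Pearson correlation coefficients
corr = X_head.corr(method='pearson')
print(corr.round(2))

# Every distinct feature pair (upper triangle), strongest absolute correlation first
pairs = [(a, b, corr.loc[a, b]) for i, a in enumerate(cols) for b in cols[i + 1:]]
pairs.sort(key=lambda t: abs(t[2]), reverse=True)
for a, b, r in pairs[:3]:
    print(f'{a} / {b}: {r:+.2f}')
```

The Rank2D axis labels are the Chinese column names. If they render as empty boxes, matplotlib probably needs a CJK font, for example `plt.rcParams['font.sans-serif'] = ['SimHei']` (assuming SimHei is installed) together with `plt.rcParams['axes.unicode_minus'] = False`.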

Feature importance

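from sklearn.ensemble import RandomForestClassifier
# In Yellowbrick 1.0+ this class is imported from yellowbrick.model_selection
from yellowbrick.features.importances import FeatureImportances

model = RandomForestClassifier(n_estimators=10)
viz = FeatureImportances(model, size=(800, 600), title="Random forest feature importance", xlabel='Importance score')
viz.fit(X, y)  # fits the forest and reads its feature importances
viz.poof()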
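FeatureImportances reads the fitted forest's `feature_importances_` (mean decrease in impurity). With its default `relative=True`, it plots each value as a percentage of the largest. The raw values can also be inspected directly in scikit-learn. The sketch uses synthetic data from `make_classification` as a hypothetical stand-in for the credit table. The article's seven column names are attached only as labels, so the printed ranking means nothing for the real data. Fixing `random_state` makes the ranking reproducible, and using more than the article's 10 trees makes it less noisy.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

features = ['教育', '工齡', '地址', '收入', '負債率', '信用卡負債', '其他負債']
# Synthetic stand-in data (NOT the credit dataset): 700 rows, 7 features, about 75% class 0
X_syn, y_syn = make_classification(n_samples=700, n_features=7, n_informative=4,
                                   weights=[0.75], random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_syn, y_syn)

# Mean decrease in impurity per feature; the values are normalized to sum to 1
importances = pd.Series(model.feature_importances_, index=features).sort_values(ascending=False)
print(importances.round(3))
```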

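Classification report

Train/test split

from sklearn.model_selection import train_test_split as tts

# Hold out 20% of the rows for testing; random_state makes the shuffle repeatable
X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2, random_state=10)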
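Default cases are typically the minority class in credit data. A purely random 80/20 split can therefore leave the test set with a different default rate than the training set. Passing `stratify=y` to `train_test_split` keeps the class proportions the same in both parts. The sketch uses hypothetical labels (70 non-defaults, 30 defaults).

```python
import numpy as np
from sklearn.model_selection import train_test_split as tts

# Hypothetical labels: 70 non-defaults (0) followed by 30 defaults (1)
y = np.array([0] * 70 + [1] * 30)
X = np.arange(100).reshape(-1, 1)

# stratify=y keeps the 70/30 ratio in both the training and the test part
X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2, random_state=10, stratify=y)
print(len(y_test), y_test.mean())
```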

Classification results report

from sklearn.ensemble import RandomForestClassifier
from yellowbrick.classifier import ClassificationReport

# No random_state on the forest, so the score below may vary between runs
model = RandomForestClassifier(n_estimators=10)
visualizer = ClassificationReport(model, support=True, size=(800, 600), title="Random forest classification report")
visualizer.fit(X_train.values, y_train)
print('Score (accuracy):', visualizer.score(X_test.values, y_test))  # accuracy on the test set
visualizer.poof()

Score (accuracy): 0.7714285714285715
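ClassificationReport's score is the wrapped estimator's accuracy. Here 0.7714285714285715 = 108/140, where 140 is the length of the predict() output printed below, so the test set has 140 rows and the full table roughly 700. On imbalanced data, accuracy should be compared against a trivial baseline that always predicts the majority class, "no default". Per-class precision and recall, which the Yellowbrick heatmap shows, are usually more informative than accuracy alone. The sketch computes all of these with scikit-learn on synthetic data (make_classification, roughly 75% class 0, a hypothetical stand-in for the credit table), so its numbers will not match the article.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): 700 rows, about 75% class 0
X, y = make_classification(n_samples=700, n_features=7, weights=[0.75], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10, stratify=y)

model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)
baseline = DummyClassifier(strategy='most_frequent').fit(X_train, y_train)

y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
base_acc = accuracy_score(y_test, baseline.predict(X_test))
cm = confusion_matrix(y_test, y_pred)  # rows: true 0/1, columns: predicted 0/1

print('model accuracy:   ', round(acc, 3))
print('majority baseline:', round(base_acc, 3))
print(cm)
print(classification_report(y_test, y_pred, digits=3))
```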

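Persist the model

from sklearn.ensemble import RandomForestClassifier

# Train a fresh forest on the training split; this is the model that gets saved
model = RandomForestClassifier(n_estimators=10)
model.fit(X_train.values, y_train)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)

(This full parameter listing is printed by an older scikit-learn. Since version 0.23, only non-default parameters are shown, e.g. RandomForestClassifier(n_estimators=10).)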

import joblib  # older code used sklearn.externals.joblib, which scikit-learn 0.23 removed

joblib.dump(model, 'model.pickle')  # save

['model.pickle']
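The import `from sklearn.externals import joblib` no longer works in current scikit-learn, which removed `sklearn.externals.joblib` in version 0.23. The standalone `joblib` package, installed as a scikit-learn dependency, provides the same `dump` and `load`. The round-trip sketch below uses synthetic data and writes to a temporary directory instead of the article's `model.pickle`. It checks that the reloaded forest predicts exactly like the original. Pickled models are generally only guaranteed to load under the scikit-learn version that saved them, so recording that version alongside the file is a good habit.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (hypothetical)
X, y = make_classification(n_samples=200, n_features=7, random_state=0)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'model.pickle')
    written = joblib.dump(model, path)  # returns the list of file names written
    restored = joblib.load(path)
    same = np.array_equal(model.predict_proba(X), restored.predict_proba(X))

print('identical predictions after reload:', same)
```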

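Load the trained model

model = joblib.load('model.pickle')  # load
model.predict(X_test.values)  # predicted class label (0 or 1) for each test row; .values matches how the model was fitted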

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,
       1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 1, 1, 0, 0, 0, 0], dtype=int64)

model.predict_proba(X_test.values)  # 2-D array: row i, column j is the probability that test row i belongs to class model.classes_[j]

array([[1. , 0. ],
       [0.9, 0.1],
       [0.8, 0.2],
       [1. , 0. ],
       [0.9, 0.1],
       [1. , 0. ],
       [0.5, 0.5],
       [0.8, 0.2],
       [0.9, 0.1],
       [1. , 0. ],
       [0.4, 0.6],
       [1. , 0. ],
       [0.6, 0.4],
       [0.3, 0.7],
       [1. , 0. ],
       [0.6, 0.4],
       [0.9, 0.1],
       [0.7, 0.3],
       [1. , 0. ],
       [0.9, 0.1],
       [0.4, 0.6],
       [0.5, 0.5],
       [1. , 0. ],
       [0.8, 0.2],
       [1. , 0. ],
       [0.9, 0.1],
       [0.5, 0.5],
       [0.1, 0.9],
       [0.9, 0.1],
       [0.8, 0.2],
       [0.6, 0.4],
       [0.8, 0.2],
       [0.9, 0.1],
       [0.7, 0.3],
       [1. , 0. ],
       [0.2, 0.8],
       [0.9, 0.1],
       [1. , 0. ],
       [0.9, 0.1],
       [0.4, 0.6],
       [0.7, 0.3],
       [0.4, 0.6],
       [0.9, 0.1],
       [0.5, 0.5],
       [0.1, 0.9],
       [1. , 0. ],
       [0.8, 0.2],
       [0.7, 0.3],
       [1. , 0. ],
       [0.5, 0.5],
       [0.8, 0.2],
       [0.7, 0.3],
       [0.9, 0.1],
       [0.8, 0.2],
       [0.3, 0.7],
       [0.9, 0.1],
       [1. , 0. ],
       [0.9, 0.1],
       [0.8, 0.2],
       [0.9, 0.1],
       [1. , 0. ],
       [0.9, 0.1],
       [0.4, 0.6],
       [0.5, 0.5],
       [0.9, 0.1],
       [0.8, 0.2],
       [0.6, 0.4],
       [0.8, 0.2],
       [1. , 0. ],
       [0.8, 0.2],
       [1. , 0. ],
       [0.9, 0.1],
       [0.6, 0.4],
       [1. , 0. ],
       [0.7, 0.3],
       [1. , 0. ],
       [0.8, 0.2],
       [1. , 0. ],
       [0.3, 0.7],
       [0.9, 0.1],
       [0.7, 0.3],
       [0.5, 0.5],
       [0.4, 0.6],
       [1. , 0. ],
       [0.9, 0.1],
       [0.8, 0.2],
       [0.9, 0.1],
       [0.8, 0.2],
       [0.2, 0.8],
       [0.7, 0.3],
       [0.4, 0.6],
       [0.6, 0.4],
       [0.7, 0.3],
       [0.8, 0.2],
       [1. , 0. ],
       [0.5, 0.5],
       [0.8, 0.2],
       [1. , 0. ],
       [0.9, 0.1],
       [0.5, 0.5],
       [0.8, 0.2],
       [0.6, 0.4],
       [0.8, 0.2],
       [0.9, 0.1],
       [0.6, 0.4],
       [0.8, 0.2],
       [0.9, 0.1],
       [0.1, 0.9],
       [1. , 0. ],
       [0.9, 0.1],
       [0.6, 0.4],
       [1. , 0. ],
       [0.8, 0.2],
       [0.7, 0.3],
       [0.9, 0.1],
       [0.5, 0.5],
       [1. , 0. ],
       [0.2, 0.8],
       [0.9, 0.1],
       [0.4, 0.6],
       [0.2, 0.8],
       [0.8, 0.2],
       [1. , 0. ],
       [0.8, 0.2],
       [0.8, 0.2]])
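For a binary forest, predict() returns the class with the larger column in predict_proba(), i.e. `classes_[argmax]`. With only 10 fully grown trees, the probabilities typically move in steps of 0.1. `np.argmax` resolves an exact 0.5/0.5 tie in favour of the first class, 0. The printed values are rounded, though, so a displayed 0.5/0.5 need not be an exact tie. A bank that cares more about catching defaulters than about false alarms could instead flag every customer whose default probability exceeds a lower cut-off. The sketch below uses five probability rows copied from the output above. The 0.3 threshold is hypothetical and would need tuning on validation data.

```python
import numpy as np

classes = np.array([0, 1])  # model.classes_ for this binary task
# Five rows copied from the predict_proba output above
proba = np.array([[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.4, 0.6], [0.3, 0.7]])

# What predict() does: pick the class with the highest probability (ties go to class 0)
pred_default = classes[np.argmax(proba, axis=1)]

# A custom cut-off on P(default) that flags more customers as risky
threshold = 0.3  # hypothetical value, to be tuned on validation data
pred_lenient = (proba[:, 1] >= threshold).astype(int)

print(pred_default)
print(pred_lenient)
```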

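Beginners can browse the full series index: yeayee: Python數據分析及可視化實例目錄 (Python Data Analysis and Visualization Examples, table of contents), zhuanlan.zhihu.com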

