Titanic Project

Published: 2024/3/12
In [1]:
import pandas as pd
df_train, df_test = pd.read_csv("F:/Python CODE/Kaggle_Titanic/train.csv"), pd.read_csv("F:/Python CODE/Kaggle_Titanic/test.csv")

In [2]:
df_train.head()  # show the first 5 rows of the table

Out[2]:
  | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S

SibSp -- number of siblings and spouses aboard

Parch -- number of parents and children aboard

Ticket -- ticket number

Fare -- ticket fare

Cabin -- cabin

Embarked -- port of embarkation

In [3]:
df_train.info()  # overall information about the data table

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB

In [4]:
df_train.describe()  # descriptive statistics

Out[4]:
      | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare
count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000
mean  | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208
std   | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429
min   | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000
25%   | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400
50%   | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200
75%   | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000
max   | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200
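The info() output above lists fewer non-null entries for Age (714), Cabin (204) and Embarked (889) than the 891 rows, so those three columns contain missing values. A minimal sketch of counting them directly follows; the small frame `demo` is made-up illustration data standing in for df_train, not the Titanic file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for df_train (values are illustrative only)
demo = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, "C123"],
    "Embarked": ["S", "C", None, "S"],
})

# Number of missing entries per column, largest first
missing = demo.isnull().sum().sort_values(ascending=False)
print(missing)

# Share of missing entries per column
missing_ratio = demo.isnull().mean()
print(missing_ratio)
```

Running the same two calls on df_train should give the per-column gaps without reading them off the info() listing by hand.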
In [5]:
df_train[["Name","Sex","Ticket","Cabin","Embarked"]].describe()  # describe() also works on object-type (Python object) columns

Out[5]:
       | Name | Sex | Ticket | Cabin | Embarked
count  | 891 | 891 | 891 | 204 | 889
unique | 891 | 2 | 681 | 147 | 3
top    | Green, Mr. George Henry | male | CA. 2343 | G6 | S
freq   | 1 | 577 | 7 | 4 | 644
In [6]:
# Feature analysis: find which of the 11 features are related to survival
import numpy as np
import matplotlib.pyplot as plt
Pclass_Survied = pd.crosstab(df_train['Pclass'],df_train['Survived'])  # build the Pclass_Survied contingency table

In [7]:
Pclass_Survied

Out[7]:
Pclass \ Survived | 0 | 1
1 | 80 | 136
2 | 97 | 87
3 | 372 | 119
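Raw counts are hard to compare across classes of different size, so it can help to turn each row of the contingency table above into proportions. The sketch below rebuilds the table from the counts shown in Out[7] (the name `counts` is introduced here for illustration) and divides each row by its total.

```python
import pandas as pd

# Counts copied from the contingency table above (rows: Pclass, columns: Survived)
counts = pd.DataFrame(
    {0: [80, 97, 372], 1: [136, 87, 119]},
    index=pd.Index([1, 2, 3], name="Pclass"),
)
counts.columns.name = "Survived"

# Divide each row by its row total to get per-class proportions
rates = counts.div(counts.sum(axis=1), axis=0)
print(rates.round(3))
```

With these counts the survival share is roughly 63% in first class, 47% in second and 24% in third. On the raw data, pd.crosstab(df_train['Pclass'], df_train['Survived'], normalize='index') should give the same proportions in one call.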
In [8]:
Pclass_Survied.plot(kind = 'bar',stacked = True)  # stacked bar chart
plt.show()

In [9]:
Pclass_Survied.count()

Out[9]:
Survived
0    3
1    3
dtype: int64

In [10]:
Pclass_Survied.index

Out[10]:
Int64Index([1, 2, 3], dtype='int64', name='Pclass')

In [11]:
Survied_len = len(Pclass_Survied.count())
Pclass_index = np.arange(len(Pclass_Survied.index))

In [12]:
Pclass_index

Out[12]:
array([0, 1, 2])

In [13]:
Pclass_Survied

Out[13]:
Pclass \ Survived | 0 | 1
1 | 80 | 136
2 | 97 | 87
3 | 372 | 119
In [14]:
Pclass_Survied.plot(kind = 'bar',stacked = True)  # stacked bar chart
Sum1 = 0
for i in range(Survied_len):
    SurvivedName = Pclass_Survied.columns[i]
    PclassCount = Pclass_Survied[SurvivedName]
    # Sum1 is the top of the current segment, Sum2 its bottom
    Sum1,Sum2 = Sum1+PclassCount,Sum1
    # the label sits at the vertical midpoint of the segment
    Zsum =Sum2+(Sum1 - Sum2)/2
    for x,y,z in zip(Pclass_index,PclassCount,Zsum):
        plt.text(x,z, '%.0f'%y, ha = 'center',va='center' )  # add data labels
# relabel the x axis
plt.xticks(Pclass_Survied.index-1, Pclass_Survied.index, rotation=360)
plt.title('Survived status by pclass')
plt.show()

In [15]:
a = df_train.Pclass[df_train['Survived']==0].value_counts()
b = df_train.Pclass[df_train['Survived']==1].value_counts()
Pclass_Survived = pd.DataFrame({ 0: a, 1: b})

In [16]:
Pclass_Survived

Out[16]:
  | 0 | 1
1 | 80 | 136
2 | 97 | 87
3 | 372 | 119
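The label placement in the stacked bar chart above amounts to "bottom of the segment plus half its height". As a rough sketch, the same midpoints can be computed for all segments at once with a cumulative sum; the frame `counts` below is rebuilt from the counts shown above and is only an illustration of the arithmetic, not a replacement for the plotting loop.

```python
import pandas as pd

# Counts taken from the table above (rows: Pclass, columns: Survived)
counts = pd.DataFrame({0: [80, 97, 372], 1: [136, 87, 119]}, index=[1, 2, 3])

# Top edge of each stacked segment: running total across the columns
tops = counts.cumsum(axis=1)
# Bottom edge is the top minus the segment's own height
bottoms = tops - counts
# Label height: midpoint of each segment
mids = bottoms + counts / 2
print(mids)
```

For first class, for example, the "0" segment spans 0 to 80 and is labelled at 40, while the "1" segment spans 80 to 216 and is labelled at 148.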
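In [17]:
import re
# take the first word followed by a period as the title, then strip the period;
# a raw string avoids an invalid-escape warning, and regex=False treats '.' literally
df_train['Appellation'] = df_train.Name.apply(lambda x: re.search(r'\w+\.', x).group()).str.replace('.', '', regex=False)
df_train.Appellation.unique()

Out[17]:
array(['Mr', 'Mrs', 'Miss', 'Master', 'Don', 'Rev', 'Dr', 'Mme', 'Ms','Major', 'Lady', 'Sir', 'Mlle', 'Col', 'Capt', 'Countess','Jonkheer'], dtype=object)

In [18]:
Application_Sex = pd.crosstab(df_train.Sex,df_train.Appellation)
Application_Sex

Out[18]:
Sex \ Appellation | Capt | Col | Countess | Don | Dr | Jonkheer | Lady | Major | Master | Miss | Mlle | Mme | Mr | Mrs | Ms | Rev | Sir
female | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 182 | 2 | 1 | 0 | 125 | 1 | 0 | 0
male | 1 | 2 | 0 | 1 | 6 | 1 | 0 | 2 | 40 | 0 | 0 | 0 | 517 | 0 | 0 | 6 | 1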
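The title extraction above can also be written without apply, using the vectorized string accessor. This is a small sketch on four names taken from the tables in this post; `names` and `titles` are names chosen here for illustration.

```python
import pandas as pd

names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Palsson, Master. Gosta Leonard",
    "Montvila, Rev. Juozas",
])

# Capture the word immediately before the first period;
# the capture group excludes the period, so no replace step is needed
titles = names.str.extract(r"(\w+)\.", expand=False)
print(titles.tolist())
```

One difference worth keeping in mind: str.extract yields NaN for a name with no match, whereas re.search(...).group() raises an error in that case.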
In [19]:
df_train['Appellation'] = df_train['Appellation'].replace(['Capt','Col','Countess','Don','Dr','Jonkheer','Lady','Major','Rev','Sir'], 'Rare')
df_train['Appellation'] = df_train['Appellation'].replace(['Mlle','Ms'], 'Miss')
df_train['Appellation'] = df_train['Appellation'].replace('Mme', 'Mrs')
df_train.Appellation.unique()

Out[19]:
array(['Mr', 'Mrs', 'Miss', 'Master', 'Rare'], dtype=object)

In [44]:
Appellation_Survived = pd.crosstab(df_train['Appellation'], df_train['Survived'])
Appellation_Survived.plot(kind = 'bar')
plt.xticks(np.arange(len(Appellation_Survived.index)), Appellation_Survived.index, rotation = 360)
plt.title('Survived status by Appellation')
plt.show()

In [24]:
Sex_Survived = pd.crosstab(df_train['Sex'],df_train['Survived'])

In [45]:
# build the contingency table
Sex_Survived = pd.crosstab(df_train['Sex'], df_train['Survived'])
Survived_len = len(Sex_Survived.count())
Sex_index = np.arange(len(Sex_Survived.index))
single_width = 0.35
for i in range(Survived_len):
    SurvivedName = Sex_Survived.columns[i]
    SexCount = Sex_Survived[SurvivedName]
    SexLocation = Sex_index * 1.05 + (i - 1/2)*single_width
    # draw the bars
    plt.bar(SexLocation, SexCount, width = single_width)
    for x, y in zip(SexLocation, SexCount):
        # add data labels
        plt.text(x, y, '%.0f'%y, ha='center', va='bottom')
index = Sex_index * 1.05
plt.xticks(index, Sex_Survived.index, rotation=360)
plt.title('Survived status by sex')
plt.show()

In [46]:
SibSp_Survived = pd.crosstab(df_train['SibSp'], df_train['Survived'])
SibSp_Survived.plot(kind = 'bar')
# bars are drawn at positions 0..n-1, not at the SibSp values themselves
plt.xticks(np.arange(len(SibSp_Survived.index)), SibSp_Survived.index, rotation = 360)
plt.title('Survived status by SibSp')
plt.show()

In [47]:
SibSp_Survived = pd.crosstab(df_train.SibSp[df_train['SibSp']>2], df_train['Survived'])
SibSp_Survived.plot(kind = 'bar')
plt.xticks([0,1,2,3],SibSp_Survived.index,rotation = 360)
plt.title('Survived status by SibSp')
plt.show()

In [28]:
Ticket_Count = df_train.groupby('Ticket',as_index=False)['PassengerId'].count()

In [29]:
Ticket_Count.head()

Out[29]:
  | Ticket | PassengerId
0 | 110152 | 3
1 | 110413 | 3
2 | 110465 | 2
3 | 110564 | 1
4 | 110813 | 1
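The per-ticket counts in Ticket_Count can also be attached to every passenger row in a single step, which yields a shared-ticket flag without an intermediate lookup table. The sketch below uses a made-up five-row frame `toy` (ticket strings borrowed from the table above, passenger ids invented); the flag is called GroupTicket to match the column name the notebook uses in the following cells.

```python
import pandas as pd

# Hypothetical toy rows; ticket strings reuse values shown in the table above
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Ticket": ["110152", "110152", "110152", "110564", "110813"],
})

# Broadcast each ticket's passenger count back onto every row
toy["TicketCount"] = toy.groupby("Ticket")["PassengerId"].transform("count")
# 1 if the ticket is shared with at least one other passenger, else 0
toy["GroupTicket"] = (toy["TicketCount"] > 1).astype(int)
print(toy)
```

transform returns a result aligned with the original rows, so no merge or isin lookup is needed afterwards.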
In [30]:
# illustrate the as_index=False argument of the groupby call above
df = pd.DataFrame(data={'books':['bk1','bk1','bk1','bk2','bk2','bk3'], 'price': [12,12,12,15,15,17]})
print(df)
print("*********************")
print (df.groupby('books', as_index=True).sum())
print("*********************")
print (df.groupby('books', as_index=False).sum())

  books  price
0   bk1     12
1   bk1     12
2   bk1     12
3   bk2     15
4   bk2     15
5   bk3     17
*********************
       price
books
bk1       36
bk2       30
bk3       17
*********************
  books  price
0   bk1     36
1   bk2     30
2   bk3     17

In [31]:
Ticket_Count_0 = Ticket_Count[Ticket_Count.PassengerId == 1]['Ticket']

In [32]:
Ticket_Count_0.head()

Out[32]:
3    110564
4    110813
5    111240
6    111320
8    111369
Name: Ticket, dtype: object

In [33]:
df_train['GroupTicket'] = np.where(df_train.Ticket.isin(Ticket_Count_0),0,1)

In [34]:
GroupTicket_Survived = pd.crosstab(df_train['GroupTicket'],df_train['Survived'])
GroupTicket_Survived.plot(kind='bar')
plt.xticks(rotation =360)

Out[34]:
(array([0, 1]), <a list of 2 Text xticklabel objects>)

In [35]:
bins = [0, 60, 120, 180, 240, 300, 360, 420, 480, 540, 600]
df_train['GroupFare'] = pd.cut(df_train.Fare,bins,right=False)
GroupFare_Survived = pd.crosstab(df_train['GroupFare'],df_train['Survived'])
GroupFare_Survived.plot(kind = 'bar')

Out[35]:
<matplotlib.axes._subplots.AxesSubplot at 0xac47eb8>

In [36]:
GroupFare_Survived.iloc[2:].plot(kind = 'bar')

Out[36]:
<matplotlib.axes._subplots.AxesSubplot at 0xa7a4ef0>

In [ ]:
# Everything above analyzes features through their non-missing values only
# The next step, feature engineering, handles the features with missing values: Age, Cabin, Embarked

In [37]:
df_train['Embarked'].mode()

Out[37]:
0    S
dtype: object

In [38]:
# df_train['Embarked'].mode()[0]: there can be several modes; [0] takes the first one
train = df_train.copy()
train['Embarked'] = train['Embarked'].fillna(train['Embarked'].mode()[0])

In [39]:
train['Cabin'] = train['Cabin'].fillna('NO')

In [40]:
Age_Appellation_median = train.groupby('Appellation')['Age'].median()

In [52]:
Age_Appellation_median

Out[52]:
Appellation
Master     3.5
Miss      21.0
Mr        30.0
Mrs       35.0
Rare      48.5
Name: Age, dtype: float64

In [59]:
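train.set_index('Appellation', inplace = True)
# fill the missing ages with the median of the matching title (aligned on the index);
# assigning back avoids a chained inplace fillna, which newer pandas versions may not propagate to the frame
train['Age'] = train['Age'].fillna(Age_Appellation_median)
# reset the index
train.reset_index(inplace = True)

In [60]:
train

Out[60]:
    | Appellation | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | GroupTicket | GroupFare
0   | Mr | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NO | S | 0 | [0, 60)
1   | Mrs | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | 0 | [60, 120)
2   | Miss | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NO | S | 0 | [0, 60)
3   | Mrs | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S | 1 | [0, 60)
4   | Mr | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NO | S | 0 | [0, 60)
5   | Mr | 6 | 0 | 3 | Moran, Mr. James | male | 30.0 | 0 | 0 | 330877 | 8.4583 | NO | Q | 0 | [0, 60)
6   | Mr | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S | 0 | [0, 60)
7   | Master | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NO | S | 1 | [0, 60)
8   | Mrs | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NO | S | 1 | [0, 60)
9   | Mrs | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NO | C | 1 | [0, 60)
10  | Miss | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.0 | 1 | 1 | PP 9549 | 16.7000 | G6 | S | 1 | [0, 60)
11  | Miss | 12 | 1 | 1 | Bonnell, Miss. Elizabeth | female | 58.0 | 0 | 0 | 113783 | 26.5500 | C103 | S | 0 | [0, 60)
12  | Mr | 13 | 0 | 3 | Saundercock, Mr. William Henry | male | 20.0 | 0 | 0 | A/5. 2151 | 8.0500 | NO | S | 0 | [0, 60)
13  | Mr | 14 | 0 | 3 | Andersson, Mr. Anders Johan | male | 39.0 | 1 | 5 | 347082 | 31.2750 | NO | S | 1 | [0, 60)
14  | Miss | 15 | 0 | 3 | Vestrom, Miss. Hulda Amanda Adolfina | female | 14.0 | 0 | 0 | 350406 | 7.8542 | NO | S | 0 | [0, 60)
15  | Mrs | 16 | 1 | 2 | Hewlett, Mrs. (Mary D Kingcome) | female | 55.0 | 0 | 0 | 248706 | 16.0000 | NO | S | 0 | [0, 60)
16  | Master | 17 | 0 | 3 | Rice, Master. Eugene | male | 2.0 | 4 | 1 | 382652 | 29.1250 | NO | Q | 1 | [0, 60)
17  | Mr | 18 | 1 | 2 | Williams, Mr. Charles Eugene | male | 30.0 | 0 | 0 | 244373 | 13.0000 | NO | S | 0 | [0, 60)
18  | Mrs | 19 | 0 | 3 | Vander Planke, Mrs. Julius (Emelia Maria Vande... | female | 31.0 | 1 | 0 | 345763 | 18.0000 | NO | S | 0 | [0, 60)
19  | Mrs | 20 | 1 | 3 | Masselmani, Mrs. Fatima | female | 35.0 | 0 | 0 | 2649 | 7.2250 | NO | C | 0 | [0, 60)
20  | Mr | 21 | 0 | 2 | Fynney, Mr. Joseph J | male | 35.0 | 0 | 0 | 239865 | 26.0000 | NO | S | 1 | [0, 60)
21  | Mr | 22 | 1 | 2 | Beesley, Mr. Lawrence | male | 34.0 | 0 | 0 | 248698 | 13.0000 | D56 | S | 0 | [0, 60)
22  | Miss | 23 | 1 | 3 | McGowan, Miss. Anna "Annie" | female | 15.0 | 0 | 0 | 330923 | 8.0292 | NO | Q | 0 | [0, 60)
23  | Mr | 24 | 1 | 1 | Sloper, Mr. William Thompson | male | 28.0 | 0 | 0 | 113788 | 35.5000 | A6 | S | 0 | [0, 60)
24  | Miss | 25 | 0 | 3 | Palsson, Miss. Torborg Danira | female | 8.0 | 3 | 1 | 349909 | 21.0750 | NO | S | 1 | [0, 60)
25  | Mrs | 26 | 1 | 3 | Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... | female | 38.0 | 1 | 5 | 347077 | 31.3875 | NO | S | 1 | [0, 60)
26  | Mr | 27 | 0 | 3 | Emir, Mr. Farred Chehab | male | 30.0 | 0 | 0 | 2631 | 7.2250 | NO | C | 0 | [0, 60)
27  | Mr | 28 | 0 | 1 | Fortune, Mr. Charles Alexander | male | 19.0 | 3 | 2 | 19950 | 263.0000 | C23 C25 C27 | S | 1 | [240, 300)
28  | Miss | 29 | 1 | 3 | O'Dwyer, Miss. Ellen "Nellie" | female | 21.0 | 0 | 0 | 330959 | 7.8792 | NO | Q | 0 | [0, 60)
29  | Mr | 30 | 0 | 3 | Todoroff, Mr. Lalio | male | 30.0 | 0 | 0 | 349216 | 7.8958 | NO | S | 0 | [0, 60)
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
861 | Mr | 862 | 0 | 2 | Giles, Mr. Frederick Edward | male | 21.0 | 1 | 0 | 28134 | 11.5000 | NO | S | 0 | [0, 60)
862 | Mrs | 863 | 1 | 1 | Swift, Mrs. Frederick Joel (Margaret Welles Ba... | female | 48.0 | 0 | 0 | 17466 | 25.9292 | D17 | S | 0 | [0, 60)
863 | Miss | 864 | 0 | 3 | Sage, Miss. Dorothy Edith "Dolly" | female | 21.0 | 8 | 2 | CA. 2343 | 69.5500 | NO | S | 1 | [60, 120)
864 | Mr | 865 | 0 | 2 | Gill, Mr. John William | male | 24.0 | 0 | 0 | 233866 | 13.0000 | NO | S | 0 | [0, 60)
865 | Mrs | 866 | 1 | 2 | Bystrom, Mrs. (Karolina) | female | 42.0 | 0 | 0 | 236852 | 13.0000 | NO | S | 0 | [0, 60)
866 | Miss | 867 | 1 | 2 | Duran y More, Miss. Asuncion | female | 27.0 | 1 | 0 | SC/PARIS 2149 | 13.8583 | NO | C | 0 | [0, 60)
867 | Mr | 868 | 0 | 1 | Roebling, Mr. Washington Augustus II | male | 31.0 | 0 | 0 | PC 17590 | 50.4958 | A24 | S | 0 | [0, 60)
868 | Mr | 869 | 0 | 3 | van Melkebeke, Mr. Philemon | male | 30.0 | 0 | 0 | 345777 | 9.5000 | NO | S | 0 | [0, 60)
869 | Master | 870 | 1 | 3 | Johnson, Master. Harold Theodor | male | 4.0 | 1 | 1 | 347742 | 11.1333 | NO | S | 1 | [0, 60)
870 | Mr | 871 | 0 | 3 | Balkic, Mr. Cerin | male | 26.0 | 0 | 0 | 349248 | 7.8958 | NO | S | 0 | [0, 60)
871 | Mrs | 872 | 1 | 1 | Beckwith, Mrs. Richard Leonard (Sallie Monypeny) | female | 47.0 | 1 | 1 | 11751 | 52.5542 | D35 | S | 1 | [0, 60)
872 | Mr | 873 | 0 | 1 | Carlsson, Mr. Frans Olof | male | 33.0 | 0 | 0 | 695 | 5.0000 | B51 B53 B55 | S | 0 | [0, 60)
873 | Mr | 874 | 0 | 3 | Vander Cruyssen, Mr. Victor | male | 47.0 | 0 | 0 | 345765 | 9.0000 | NO | S | 0 | [0, 60)
874 | Mrs | 875 | 1 | 2 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.0 | 1 | 0 | P/PP 3381 | 24.0000 | NO | C | 1 | [0, 60)
875 | Miss | 876 | 1 | 3 | Najib, Miss. Adele Kiamie "Jane" | female | 15.0 | 0 | 0 | 2667 | 7.2250 | NO | C | 0 | [0, 60)
876 | Mr | 877 | 0 | 3 | Gustafsson, Mr. Alfred Ossian | male | 20.0 | 0 | 0 | 7534 | 9.8458 | NO | S | 1 | [0, 60)
877 | Mr | 878 | 0 | 3 | Petroff, Mr. Nedelio | male | 19.0 | 0 | 0 | 349212 | 7.8958 | NO | S | 0 | [0, 60)
878 | Mr | 879 | 0 | 3 | Laleff, Mr. Kristo | male | 30.0 | 0 | 0 | 349217 | 7.8958 | NO | S | 0 | [0, 60)
879 | Mrs | 880 | 1 | 1 | Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) | female | 56.0 | 0 | 1 | 11767 | 83.1583 | C50 | C | 1 | [60, 120)
880 | Mrs | 881 | 1 | 2 | Shelley, Mrs. William (Imanita Parrish Hall) | female | 25.0 | 0 | 1 | 230433 | 26.0000 | NO | S | 1 | [0, 60)
881 | Mr | 882 | 0 | 3 | Markun, Mr. Johann | male | 33.0 | 0 | 0 | 349257 | 7.8958 | NO | S | 0 | [0, 60)
882 | Miss | 883 | 0 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22.0 | 0 | 0 | 7552 | 10.5167 | NO | S | 0 | [0, 60)
883 | Mr | 884 | 0 | 2 | Banfield, Mr. Frederick James | male | 28.0 | 0 | 0 | C.A./SOTON 34068 | 10.5000 | NO | S | 0 | [0, 60)
884 | Mr | 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25.0 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | NO | S | 0 | [0, 60)
885 | Mrs | 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 0 | 5 | 382652 | 29.1250 | NO | Q | 1 | [0, 60)
886 | Rare | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NO | S | 0 | [0, 60)
887 | Miss | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S | 0 | [0, 60)
888 | Miss | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | 21.0 | 1 | 2 | W./C. 6607 | 23.4500 | NO | S | 1 | [0, 60)
889 | Mr | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C | 0 | [0, 60)
890 | Mr | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NO | Q | 0 | [0, 60)

891 rows × 15 columns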
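Filling missing ages with the median of each title group can also be done without moving Appellation into the index and back. The following is a sketch of that alternative on a made-up six-row frame `toy`; it is not the notebook's own code, and the ages are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows (not taken from the dataset)
toy = pd.DataFrame({
    "Appellation": ["Mr", "Mr", "Mr", "Miss", "Miss", "Miss"],
    "Age": [20.0, 40.0, np.nan, 10.0, np.nan, 30.0],
})

# Median age of each title group, aligned row by row with the original frame
group_median = toy.groupby("Appellation")["Age"].transform("median")
# Fill only the missing ages; observed ages are left untouched
toy["Age"] = toy["Age"].fillna(group_median)
print(toy)
```

Here the missing Mr age becomes 30.0 (median of 20 and 40) and the missing Miss age becomes 20.0 (median of 10 and 30). Because transform keeps the original row order, the column order and index of the frame do not change.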

In [62]:
train.Age.isnull().sum()

Out[62]:
0

In [64]:
train.Age.isnull().any()

Out[64]:
False

In [65]:
train.Age.describe()

Out[65]:
count    891.000000
mean      29.392447
std       13.268389
min        0.420000
25%       21.000000
50%       30.000000
75%       35.000000
max       80.000000
Name: Age, dtype: float64

In [66]:
Embarked_Survived = pd.crosstab(train['Embarked'],train['Survived'])

In [68]:
Embarked_Survived.plot(kind = 'bar')
plt.xticks(rotation = 360)
plt.title('Survived status by Embarked')
plt.show()

In [69]:
train

Out[69]:
(same table as Out[60])

891 rows × 15 columns

In [80]:
train['GroupCabin'] = np.where(train['Cabin'] == 'NO',0,1)

In [82]:
GroupCabin_Survived = pd.crosstab(train['GroupCabin'],train['Survived'])
GroupCabin_Survived.plot(kind = 'bar')
plt.title('Survived status by GroupCabin')
plt.xticks(rotation=360)
plt.show()

In [86]:
# Bin Age: since 2**10 > 891, use 10 groups; bin width = (max 80 - min 0)/10 = 8, rounded up to 9
bins = [0, 9, 18, 27, 36, 45, 54, 63, 72, 81, 90]
train['GroupAge'] = pd.cut(train.Age, bins)
GroupAge_Survived = pd.crosstab(train['GroupAge'], train['Survived'])
GroupAge_Survived.plot(kind = 'bar')
plt.title('Survived status by GroupAge')
plt.show()

In [87]:
train['Appellation'] = train.Appellation.map({'Mr': 0, 'Mrs': 1, 'Miss': 2, 'Master': 3, 'Rare': 4})
train.Appellation.unique()

Out[87]:
array([0, 1, 2, 3, 4], dtype=int64)

In [89]:
train['Sex'] = train.Sex.map({'female':0,'male':1})

In [90]:
train.head()

Out[90]:
  | Appellation | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | GroupTicket | GroupFare | GroupCabin | GroupAge
0 | 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | 1 | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NO | S | 0 | [0, 60) | 0 | (18, 27]
1 | 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 0 | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | 0 | [60, 120) | 1 | (36, 45]
2 | 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | 0 | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NO | S | 0 | [0, 60) | 0 | (18, 27]
3 | 1 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 0 | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S | 1 | [0, 60) | 1 | (27, 36]
4 | 0 | 5 | 0 | 3 | Allen, Mr. William Henry | 1 | 35.0 | 0 | 0 | 373450 | 8.0500 | NO | S | 0 | [0, 60) | 0 | (27, 36]
In [95]:
train.loc[train['Age'] < 9, 'Age']=0
train.loc[(train['Age'] >= 9) & (train['Age'] < 18), 'Age'] = 1
train.loc[(train['Age'] >= 18) & (train['Age'] < 27), 'Age'] = 2
train.loc[(train['Age'] >= 27) & (train['Age'] < 36), 'Age'] = 3
train.loc[(train['Age'] >= 36) & (train['Age'] < 45), 'Age'] = 4
train.loc[(train['Age'] >= 45) & (train['Age'] < 54), 'Age'] = 5
train.loc[(train['Age'] >= 54) & (train['Age'] < 63), 'Age'] = 6
train.loc[(train['Age'] >= 63) & (train['Age'] < 72), 'Age'] = 7
train.loc[(train['Age'] >= 72) & (train['Age'] < 81), 'Age'] = 8
train.loc[(train['Age'] >= 81) & (train['Age'] < 90), 'Age'] = 9
train.Age.unique()

Out[95]:
array([ 2., 4., 3., 6., 0., 1., 7., 5., 8.])

In [96]:
train.head()

Out[96]:
  | Appellation | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | GroupTicket | GroupFare | GroupCabin | GroupAge
0 | 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | 1 | 2.0 | 1 | 0 | A/5 21171 | 7.2500 | NO | S | 0 | [0, 60) | 0 | (18, 27]
1 | 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 0 | 4.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | 0 | [60, 120) | 1 | (36, 45]
2 | 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | 0 | 2.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NO | S | 0 | [0, 60) | 0 | (18, 27]
3 | 1 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 0 | 3.0 | 1 | 0 | 113803 | 53.1000 | C123 | S | 1 | [0, 60) | 1 | (27, 36]
4 | 0 | 5 | 0 | 3 | Allen, Mr. William Henry | 1 | 3.0 | 0 | 0 | 373450 | 8.0500 | NO | S | 0 | [0, 60) | 0 | (27, 36]
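The ten train.loc assignments above encode Age as the number of its 9-year bin. As a sketch, the same codes can be produced in one call with pd.cut; the `ages` series below holds the five ages from the head() output plus the minimum and maximum reported by describe(), and `age_code` is a name introduced here.

```python
import pandas as pd

ages = pd.Series([22.0, 38.0, 26.0, 35.0, 35.0, 0.42, 80.0])
bins = [0, 9, 18, 27, 36, 45, 54, 63, 72, 81, 90]

# right=False makes each bin [lower, upper), matching the >= / < comparisons above;
# labels=False returns the integer bin number instead of an interval label
age_code = pd.cut(ages, bins, right=False, labels=False)
print(age_code.tolist())
```

The first five codes agree with the Age column in Out[96] (2, 4, 2, 3, 3). Note that the GroupAge column was built with the default right-closed bins, so an age exactly on a boundary such as 27 lands in (18, 27] there but gets code 3 from the manual rules. The same pattern would apply to the Fare bins with a width of 60.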
In [97]:
# When SibSp and Parch are both 0, the passenger is travelling alone.
train['FamilySize'] = train['SibSp'] + train['Parch'] + 1
train.FamilySize.unique()

Out[97]:
array([ 2, 1, 5, 3, 7, 6, 4, 8, 11], dtype=int64)

In [98]:
train.loc[train['Fare'] < 60, 'Fare'] = 0
train.loc[(train['Fare'] >= 60) & (train['Fare'] < 120), 'Fare'] = 1
train.loc[(train['Fare'] >= 120) & (train['Fare'] < 180), 'Fare'] = 2
train.loc[(train['Fare'] >= 180) & (train['Fare'] < 240), 'Fare'] = 3
train.loc[(train['Fare'] >= 240) & (train['Fare'] < 300), 'Fare'] = 4
train.loc[(train['Fare'] >= 300) & (train['Fare'] < 360), 'Fare'] = 5
train.loc[(train['Fare'] >= 360) & (train['Fare'] < 420), 'Fare'] = 6
train.loc[(train['Fare'] >= 420) & (train['Fare'] < 480), 'Fare'] = 7
train.loc[(train['Fare'] >= 480) & (train['Fare'] < 540), 'Fare'] = 8
train.loc[(train['Fare'] >= 540) & (train['Fare'] < 600), 'Fare'] = 9
train.Fare.unique()

Out[98]:
array([ 0., 1., 4., 2., 8., 3.])

In [99]:
train['Embarked'] = train.Embarked.map({'S': 0, 'C': 1, 'Q': 2})

In [100]:
train.drop(['PassengerId', 'Name', 'GroupAge', 'SibSp', 'Parch', 'Ticket', 'GroupFare', 'Cabin'], axis = 1, inplace =True)

In [110]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
X=train[['Pclass', 'Appellation', 'Sex', 'Age', 'FamilySize', 'GroupTicket', 'Fare', 'GroupCabin', 'Embarked']]
y=train['Survived']
# randomly split into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# initialize the logistic regression model
lg = LogisticRegression()
# train the logistic regression model
lg.fit(X_train, y_train)
# check the model against the test data
lg.score(X_test, y_test)

Out[110]:
0.78212290502793291

In [111]:
from sklearn.tree import DecisionTreeClassifier
# maximum tree depth 15, at least 2 samples to split an internal node, at least 1 sample per leaf,
# at most 10 leaf nodes, at most 6 features considered per split
dt = DecisionTreeClassifier(max_depth=15, min_samples_split=2, min_samples_leaf=1, max_leaf_nodes=10, max_features=6)
dt.fit(X_train, y_train)
dt.score(X_test, y_test)

Out[111]:
0.79329608938547491

In [126]:
# Support vector machine (SVM)
# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection provides the same tools
from sklearn.model_selection import cross_val_score, KFold
from sklearn.svm import SVC  # this import was missing from the original cell
from scipy.stats import sem
# helper function for cross-validating a model
def evaluate_cross_validation(clf, X, y, K):
    # KFold takes the number of folds and whether to shuffle
    cv = KFold(n_splits=K, shuffle=True, random_state = 0)
    # cross-validate with the split above; for classification the default score is accuracy, and it can be changed
    scores = cross_val_score(clf, X, y, cv=cv)
    print (scores)
    print ('Mean score: %.3f (+/-%.3f)' % (scores.mean(), sem(scores)))
# SVC with an RBF kernel (the variable is named svc_linear, but kernel='rbf' is what is passed; other kernels can give very different results)
svc_linear = SVC(kernel='rbf')  # 'linear': linear kernel, 'poly': polynomial kernel, 'rbf': radial basis function (Gaussian) kernel, 'sigmoid': sigmoid kernel, 'precomputed': precomputed kernel matrix
# 5-fold cross-validation
K = 5
evaluate_cross_validation(svc_linear, X_train, y_train, 5)

[ 0.82517483 0.86013986 0.80985915 0.83802817 0.87323944]
Mean score: 0.841 (+/-0.011)

In [118]:
# Linear classifier
from sklearn.linear_model import SGDClassifier
# SGD classifier: suited to large data sets, estimates parameters by stochastic gradient descent
clf = SGDClassifier()
clf.fit(X_train, y_train)
# import the metrics package
from sklearn import metrics
y_train_predict = clf.predict(X_train)
# in-sample check: accuracy on the training samples
print(metrics.accuracy_score(y_train, y_train_predict))
# out-of-sample check: accuracy on the test samples
y_predict = clf.predict(X_test)
print(metrics.accuracy_score(y_test, y_predict))

0.651685393258
0.659217877095

In [123]:
# Naive Bayes classifier
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(X_train, y_train)
y_predict =clf.predict(X_test)
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_predict))

0.765363128492
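The scores above come from different procedures: logistic regression, the decision tree, SGD and naive Bayes are scored on a single random hold-out split, while the SVM is scored by 5-fold cross-validation on the training part, so the numbers are not directly comparable. The sketch below shows one way to put several models on the same footing. It runs on synthetic data from make_classification as a stand-in (an assumption, chosen so the block runs on its own), and the model settings such as max_iter and max_depth are illustrative, not the notebook's.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 9 features to mirror the 9 columns of X
X_demo, y_demo = make_classification(n_samples=300, n_features=9, random_state=0)

# One shared splitter so every model is scored on identical folds
cv = KFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "naive_bayes": GaussianNB(),
}

results = {}
for name, model in models.items():
    # accuracy on each of the 5 folds
    scores = cross_val_score(model, X_demo, y_demo, cv=cv)
    results[name] = scores
    print("%s: mean accuracy %.3f" % (name, scores.mean()))
```

To apply this to the notebook's data, X_demo and y_demo would be replaced by X and y. Fixing random_state in the splitter (and in train_test_split, if a hold-out split is kept) makes the reported scores reproducible between runs.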

Reposted from: https://www.cnblogs.com/USTC-ZCC/p/10018777.html
