
8. Basic Usage of Pandas

Published: 2024/7/5 | Programming Q&A | 豆豆

4.1.0 Overview

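Pandas basics
  • What is Pandas, and why use it?
  • Core data structures: DataFrame, Panel, Series
  • Basic operations
  • Computation
  • Plotting
  • Reading and writing files
Advanced processing

4.1 Introduction to Pandas
  4.1.1 Pandas, a data-processing tool
    • Name: panel + data + analysis
    • Panel data: an econometrics term for three-dimensional data
  4.1.2 Why use Pandas
    • Convenient data-processing capabilities
    • Easy file reading
    • Wraps Matplotlib plotting and NumPy computation
  4.1.3 DataFrame
    • Structure: a two-dimensional array with both a row index and a column index
    • Attributes: shape, index, columns, values, T
    • Methods: head(), tail()
    • Setting the DataFrame index: 1) modify the row/column index labels, 2) reset the index, 3) set a new index
  Panel
    • A container for DataFrames
  Series
    • A one-dimensional array with an index
    • Attributes: index, values
  Summary
    • A DataFrame is a container for Series
    • A Panel is a container for DataFrames

4.2 Basic data operations
  4.2.1 Indexing
    1) Direct indexing: column first, then row
    2) Indexing by label: loc
    3) Indexing by position: iloc
    4) Combined indexing: positions and labels
  4.2.3 Sorting
    • Sorting by values: DataFrame, Series
    • Sorting by index: DataFrame, Series

4.3 DataFrame operations
  • Arithmetic operations
  • Logical operations
    • Logical operators: boolean indexing
    • Logical functions: query(), isin()
  • Statistical operations
    • min, max, mean, median, var, std
    • np.argmax(), np.argmin()
  • Custom operations
    • apply(func, axis=0), where func is a user-defined function

4.4 Plotting with Pandas
  • sr.plot()

4.5 Reading and writing files
  4.5.1 CSV
    • pd.read_csv(path): usecols=, names=
    • dataframe.to_csv(path): columns=[], index=False, header=False
  4.5.2 HDF5
    • An HDF5 file can hold three-dimensional data: each key stores one two-dimensional DataFrame (key1 → dataframe1, key2 → dataframe2)
    • pd.read_hdf(path, key=)
    • df.to_hdf(path, key=)
  4.5.3 JSON
    • pd.read_json(path): orient="records", lines=True
    • df.to_json(path): orient="records", lines=True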

4.1.3 DataFrame

    import numpy as np
    # Generate normally distributed daily price changes for 10 stocks over 5 days
    stock_change = np.random.normal(0, 1, (10, 5))
    stock_change

    array([[ 0.77072465,  1.30408183, -0.44043464,  0.8900768 , -0.80947118],
           [ 0.92407994,  0.01646795, -1.26614793,  1.52393669, -0.85373051],
           [-1.68378051,  0.4302981 ,  0.8069393 ,  0.60557427, -0.03960376],
           [ 0.75708007, -0.39899325,  0.23027082, -0.89585658, -1.86590247],
           [-0.41516245, -1.31841546,  0.16256478, -0.67449097, -1.26234013],
           [-0.27687242, -0.74154521, -0.03755446,  1.24182603, -0.79444361],
           [-0.2549323 , -0.41034663, -1.85076521, -1.28663451, -0.28566877],
           [ 1.22453612, -1.60200055, -1.83171522, -0.85322799, -1.70950421],
           [ 2.00461483,  1.49338564,  0.33928513, -0.1776084 , -0.39698965],
           [ 0.2184662 , -0.03868143, -0.21432675,  0.00604093,  1.35011139]])

    import pandas as pd
    pd.DataFrame(stock_change)

              0         1         2         3         4
    0  0.770725  1.304082 -0.440435  0.890077 -0.809471
    1  0.924080  0.016468 -1.266148  1.523937 -0.853731
    2 -1.683781  0.430298  0.806939  0.605574 -0.039604
    3  0.757080 -0.398993  0.230271 -0.895857 -1.865902
    4 -0.415162 -1.318415  0.162565 -0.674491 -1.262340
    5 -0.276872 -0.741545 -0.037554  1.241826 -0.794444
    6 -0.254932 -0.410347 -1.850765 -1.286635 -0.285669
    7  1.224536 -1.602001 -1.831715 -0.853228 -1.709504
    8  2.004615  1.493386  0.339285 -0.177608 -0.396990
    9  0.218466 -0.038681 -0.214327  0.006041  1.350111
    # Build the row labels
    stock_code = ['股票' + str(i) for i in range(stock_change.shape[0])]
    stock_code

    ['股票0', '股票1', '股票2', '股票3', '股票4', '股票5', '股票6', '股票7', '股票8', '股票9']

    # Add the row index
    data = pd.DataFrame(stock_change, index=stock_code)
    data

                  0         1         2         3         4
    股票0  0.770725  1.304082 -0.440435  0.890077 -0.809471
    股票1  0.924080  0.016468 -1.266148  1.523937 -0.853731
    股票2 -1.683781  0.430298  0.806939  0.605574 -0.039604
    股票3  0.757080 -0.398993  0.230271 -0.895857 -1.865902
    股票4 -0.415162 -1.318415  0.162565 -0.674491 -1.262340
    股票5 -0.276872 -0.741545 -0.037554  1.241826 -0.794444
    股票6 -0.254932 -0.410347 -1.850765 -1.286635 -0.285669
    股票7  1.224536 -1.602001 -1.831715 -0.853228 -1.709504
    股票8  2.004615  1.493386  0.339285 -0.177608 -0.396990
    股票9  0.218466 -0.038681 -0.214327  0.006041  1.350111
    # Build the column labels
    # start: the first date; periods: how many dates to generate;
    # freq: the spacing between dates, e.g. "D" (day), "W" (week), "5h" (5 hours), "B" (business day)
    date = pd.date_range(start="20200618", periods=5, freq="B")
    date

    DatetimeIndex(['2020-06-18', '2020-06-19', '2020-06-22', '2020-06-23',
                   '2020-06-24'],
                  dtype='datetime64[ns]', freq='B')

    # Add the column index
    data = pd.DataFrame(stock_change, index=stock_code, columns=date)
    data

           2020-06-18  2020-06-19  2020-06-22  2020-06-23  2020-06-24
    股票0    0.770725    1.304082   -0.440435    0.890077   -0.809471
    股票1    0.924080    0.016468   -1.266148    1.523937   -0.853731
    股票2   -1.683781    0.430298    0.806939    0.605574   -0.039604
    股票3    0.757080   -0.398993    0.230271   -0.895857   -1.865902
    股票4   -0.415162   -1.318415    0.162565   -0.674491   -1.262340
    股票5   -0.276872   -0.741545   -0.037554    1.241826   -0.794444
    股票6   -0.254932   -0.410347   -1.850765   -1.286635   -0.285669
    股票7    1.224536   -1.602001   -1.831715   -0.853228   -1.709504
    股票8    2.004615    1.493386    0.339285   -0.177608   -0.396990
    股票9    0.218466   -0.038681   -0.214327    0.006041    1.350111
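Because `np.random.normal` draws new numbers on every call, the values above will differ each time the code runs. A minimal sketch, assuming reproducible output is wanted: it swaps in a seeded NumPy `Generator` (`np.random.default_rng`; the seed 42 is an arbitrary choice) and otherwise rebuilds the same labelled frame.

```python
import numpy as np
import pandas as pd

# Seeded generator: the same seed yields the same "random" price changes on every run
# (seed 42 is an arbitrary choice for this sketch)
rng = np.random.default_rng(42)
stock_change = rng.normal(0, 1, (10, 5))

# Row labels: one per stock; column labels: 5 consecutive business days
stock_code = ["股票" + str(i) for i in range(stock_change.shape[0])]
date = pd.date_range(start="20200618", periods=5, freq="B")

data = pd.DataFrame(stock_change, index=stock_code, columns=date)

# A second generator with the same seed reproduces the identical array
again = np.random.default_rng(42).normal(0, 1, (10, 5))
print(data.shape, np.array_equal(stock_change, again))
```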

DataFrame attributes

    data.shape

    (10, 5)

    data.index

    Index(['股票0', '股票1', '股票2', '股票3', '股票4', '股票5', '股票6', '股票7', '股票8', '股票9'], dtype='object')

    data.columns

    DatetimeIndex(['2020-06-18', '2020-06-19', '2020-06-22', '2020-06-23',
                   '2020-06-24'],
                  dtype='datetime64[ns]', freq='B')

    data.values

    array([[ 0.77072465,  1.30408183, -0.44043464,  0.8900768 , -0.80947118],
           [ 0.92407994,  0.01646795, -1.26614793,  1.52393669, -0.85373051],
           [-1.68378051,  0.4302981 ,  0.8069393 ,  0.60557427, -0.03960376],
           [ 0.75708007, -0.39899325,  0.23027082, -0.89585658, -1.86590247],
           [-0.41516245, -1.31841546,  0.16256478, -0.67449097, -1.26234013],
           [-0.27687242, -0.74154521, -0.03755446,  1.24182603, -0.79444361],
           [-0.2549323 , -0.41034663, -1.85076521, -1.28663451, -0.28566877],
           [ 1.22453612, -1.60200055, -1.83171522, -0.85322799, -1.70950421],
           [ 2.00461483,  1.49338564,  0.33928513, -0.1776084 , -0.39698965],
           [ 0.2184662 , -0.03868143, -0.21432675,  0.00604093,  1.35011139]])

    data.T

                   股票0     股票1     股票2     股票3     股票4     股票5     股票6     股票7     股票8     股票9
    2020-06-18  0.770725  0.924080 -1.683781  0.757080 -0.415162 -0.276872 -0.254932  1.224536  2.004615  0.218466
    2020-06-19  1.304082  0.016468  0.430298 -0.398993 -1.318415 -0.741545 -0.410347 -1.602001  1.493386 -0.038681
    2020-06-22 -0.440435 -1.266148  0.806939  0.230271  0.162565 -0.037554 -1.850765 -1.831715  0.339285 -0.214327
    2020-06-23  0.890077  1.523937  0.605574 -0.895857 -0.674491  1.241826 -1.286635 -0.853228 -0.177608  0.006041
    2020-06-24 -0.809471 -0.853731 -0.039604 -1.865902 -1.262340 -0.794444 -0.285669 -1.709504 -0.396990  1.350111
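To see how these attributes relate to each other, here is a small sketch on a hypothetical 3 x 2 frame (the labels a, b, c and x, y are made up for illustration). It checks that `T` reverses the shape and swaps the row and column labels.

```python
import numpy as np
import pandas as pd

# A small hypothetical 3 x 2 frame, just to show how the attributes relate
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], index=["a", "b", "c"], columns=["x", "y"])

print(df.shape)           # (number of rows, number of columns)
print(list(df.index))     # row labels
print(list(df.columns))   # column labels
print(type(df.values))    # the data as a NumPy ndarray

# .T swaps rows and columns: the shape is reversed and the labels trade places
t = df.T
print(t.shape, list(t.index), list(t.columns))
```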

DataFrame methods

    data.head()  # returns the first 5 rows by default

           2020-06-18  2020-06-19  2020-06-22  2020-06-23  2020-06-24
    股票0    0.770725    1.304082   -0.440435    0.890077   -0.809471
    股票1    0.924080    0.016468   -1.266148    1.523937   -0.853731
    股票2   -1.683781    0.430298    0.806939    0.605574   -0.039604
    股票3    0.757080   -0.398993    0.230271   -0.895857   -1.865902
    股票4   -0.415162   -1.318415    0.162565   -0.674491   -1.262340

    data.tail()  # returns the last 5 rows by default

           2020-06-18  2020-06-19  2020-06-22  2020-06-23  2020-06-24
    股票5   -0.276872   -0.741545   -0.037554    1.241826   -0.794444
    股票6   -0.254932   -0.410347   -1.850765   -1.286635   -0.285669
    股票7    1.224536   -1.602001   -1.831715   -0.853228   -1.709504
    股票8    2.004615    1.493386    0.339285   -0.177608   -0.396990
    股票9    0.218466   -0.038681   -0.214327    0.006041    1.350111

3 Setting the DataFrame index

  • Modifying the row and column index labels
    data.index[2]

    '股票2'

    # Note: a single label cannot be changed in place; a DataFrame's index can only be replaced as a whole
    data.index[2] = "股票88"

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-19-9e95917cc4d9> in <module>
    ----> 1 data.index[2] = "股票88"

    D:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
       3908
       3909     def __setitem__(self, key, value):
    -> 3910         raise TypeError("Index does not support mutable operations")
       3911
       3912     def __getitem__(self, key):

    TypeError: Index does not support mutable operations

    # Replace the whole index instead
    stock_ = ["股票_{}".format(i) for i in range(10)]
    data.index = stock_
    data.index

    Index(['股票_0', '股票_1', '股票_2', '股票_3', '股票_4', '股票_5', '股票_6', '股票_7', '股票_8',
           '股票_9'],
          dtype='object')
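Because an `Index` is immutable, there are two common ways to change just one label: rebuild the whole list of labels and assign it back, or use `DataFrame.rename`, which maps selected old labels to new ones and returns a new frame. A minimal sketch on a hypothetical three-row frame (the `price` column and its values are made up):

```python
import pandas as pd

data = pd.DataFrame({"price": [0.77, 0.92, -1.68]}, index=["股票0", "股票1", "股票2"])

# Item assignment on an Index is rejected because Index objects are immutable
try:
    data.index[2] = "股票88"
except TypeError as exc:
    print("TypeError:", exc)

# Option 1: copy the labels into a list, change one, and assign the whole index back
labels = list(data.index)
labels[2] = "股票88"
data.index = labels

# Option 2: rename() maps selected old labels to new ones and returns a new frame
renamed = data.rename(index={"股票88": "股票99"})
```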

Resetting the index

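  • reset_index(drop=False)
  • Replaces the index with a new default integer index (0, 1, 2, ...)
  • drop: defaults to False, which keeps the old index values as a new column; if True, the old index values are discarded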
    # Reset the index, drop=False
    data.reset_index()

        index  2020-06-18 00:00:00  2020-06-19 00:00:00  2020-06-22 00:00:00  2020-06-23 00:00:00  2020-06-24 00:00:00
    0  股票_0             0.770725             1.304082            -0.440435             0.890077            -0.809471
    1  股票_1             0.924080             0.016468            -1.266148             1.523937            -0.853731
    2  股票_2            -1.683781             0.430298             0.806939             0.605574            -0.039604
    3  股票_3             0.757080            -0.398993             0.230271            -0.895857            -1.865902
    4  股票_4            -0.415162            -1.318415             0.162565            -0.674491            -1.262340
    5  股票_5            -0.276872            -0.741545            -0.037554             1.241826            -0.794444
    6  股票_6            -0.254932            -0.410347            -1.850765            -1.286635            -0.285669
    7  股票_7             1.224536            -1.602001            -1.831715            -0.853228            -1.709504
    8  股票_8             2.004615             1.493386             0.339285            -0.177608            -0.396990
    9  股票_9             0.218466            -0.038681            -0.214327             0.006041             1.350111
    # Reset the index, drop=True
    data.reset_index(drop=True)

       2020-06-18  2020-06-19  2020-06-22  2020-06-23  2020-06-24
    0    0.770725    1.304082   -0.440435    0.890077   -0.809471
    1    0.924080    0.016468   -1.266148    1.523937   -0.853731
    2   -1.683781    0.430298    0.806939    0.605574   -0.039604
    3    0.757080   -0.398993    0.230271   -0.895857   -1.865902
    4   -0.415162   -1.318415    0.162565   -0.674491   -1.262340
    5   -0.276872   -0.741545   -0.037554    1.241826   -0.794444
    6   -0.254932   -0.410347   -1.850765   -1.286635   -0.285669
    7    1.224536   -1.602001   -1.831715   -0.853228   -1.709504
    8    2.004615    1.493386    0.339285   -0.177608   -0.396990
    9    0.218466   -0.038681   -0.214327    0.006041    1.350111
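A quick way to see the effect of `drop` is to compare the columns of the two results. This sketch uses a hypothetical two-row frame; when the old index has no name, `reset_index` stores it in a column called `index`, matching the output above.

```python
import pandas as pd

df = pd.DataFrame({"price": [0.77, 0.92]}, index=["股票_0", "股票_1"])

kept = df.reset_index()              # drop=False: old labels move into a column named "index"
dropped = df.reset_index(drop=True)  # drop=True: old labels are discarded

# Both calls return new frames; df itself keeps its original labels
print(list(kept.columns), list(dropped.columns))
```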

Using a column's values as the new index

  • set_index(keys, drop=True)
  • keys: a column label, or a list of column labels
  • drop: boolean, default True; the column(s) used as the new index are removed from the regular columns

Example: setting a new index

  • 1. Create the DataFrame
    df = pd.DataFrame({'month': [1, 4, 7, 10], 'year': [2012, 2014, 2013, 2014], 'sale': [55, 40, 84, 31]})
    df

       month  year  sale
    0      1  2012    55
    1      4  2014    40
    2      7  2013    84
    3     10  2014    31

  • 2. Use the month as the new index
    df.set_index('month')

           year  sale
    month
    1      2012    55
    4      2014    40
    7      2013    84
    10     2014    31

  • 3. Set a multi-level index from year and month
    new_df = df.set_index(['year', 'month'])
    new_df

                sale
    year month
    2012 1        55
    2014 4        40
    2013 7        84
    2014 10       31

    new_df.index

    MultiIndex([(2012,  1),
                (2014,  4),
                (2013,  7),
                (2014, 10)],
               names=['year', 'month'])
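The same three calls can be checked programmatically. The sketch below also shows `drop=False`, which the bullet list above describes but the example does not use: the key column is then kept as a regular column as well as becoming the index.

```python
import pandas as pd

df = pd.DataFrame({"month": [1, 4, 7, 10],
                   "year": [2012, 2014, 2013, 2014],
                   "sale": [55, 40, 84, 31]})

by_month = df.set_index("month")              # default drop=True: "month" leaves the columns
keep_col = df.set_index("month", drop=False)  # drop=False: "month" stays as a column too
multi = df.set_index(["year", "month"])       # a list of keys builds a MultiIndex

print(multi)
```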

4.1.4 The relationship between MultiIndex and Panel

1 MultiIndex: a multi-level (hierarchical) index object

  • index attributes

names: the names of the levels

levels: the sorted unique values of each level

    new_df.index.names

    FrozenList(['year', 'month'])

    new_df.index.levels

    FrozenList([[2012, 2013, 2014], [1, 4, 7, 10]])
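`levels` lists the sorted unique values of each level, not one value per row; `get_level_values` gives the per-row labels. A minimal sketch reusing the example data, with one assumption: the index is sorted first with `sort_index`, since selecting by the outer level on an unsorted MultiIndex can trigger a performance warning.

```python
import pandas as pd

new_df = (pd.DataFrame({"month": [1, 4, 7, 10],
                        "year": [2012, 2014, 2013, 2014],
                        "sale": [55, 40, 84, 31]})
          .set_index(["year", "month"])
          .sort_index())  # sorted so that partial selection on "year" is efficient

# levels: the sorted unique values of each level
years = new_df.index.levels[0].tolist()
months = new_df.index.levels[1].tolist()

# get_level_values: one label per row, repeats included
year_per_row = new_df.index.get_level_values("year").tolist()

# Selecting an outer-level value returns the matching rows, indexed by the inner level
sales_2014 = new_df.loc[2014]
print(years, months, year_per_row)
print(sales_2014)
```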

2 Panel

    # Panel was deprecated in pandas 0.20 and removed in 0.25; current versions no longer provide pd.Panel
    p = pd.Panel()
    p

    D:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
      """Entry point for launching an IPython kernel.
    <pandas.__getattr__.<locals>.Panel at 0x203fd31ea08>

    data

            2020-06-18  2020-06-19  2020-06-22  2020-06-23  2020-06-24
    股票_0    0.770725    1.304082   -0.440435    0.890077   -0.809471
    股票_1    0.924080    0.016468   -1.266148    1.523937   -0.853731
    股票_2   -1.683781    0.430298    0.806939    0.605574   -0.039604
    股票_3    0.757080   -0.398993    0.230271   -0.895857   -1.865902
    股票_4   -0.415162   -1.318415    0.162565   -0.674491   -1.262340
    股票_5   -0.276872   -0.741545   -0.037554    1.241826   -0.794444
    股票_6   -0.254932   -0.410347   -1.850765   -1.286635   -0.285669
    股票_7    1.224536   -1.602001   -1.831715   -0.853228   -1.709504
    股票_8    2.004615    1.493386    0.339285   -0.177608   -0.396990
    股票_9    0.218466   -0.038681   -0.214327    0.006041    1.350111
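Since `Panel` is gone, the pandas documentation recommended representing three-dimensional data with a MultiIndex on a DataFrame (or with the separate xarray package). A minimal sketch of the MultiIndex route: `pd.concat` with a dict of frames adds the dict keys as an outer index level. The `close` and `volume` tables and their values are hypothetical.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2020-06-18", periods=3, freq="B")
stocks = ["股票_0", "股票_1"]

# Two hypothetical 2-D tables that a Panel would once have held along a third axis
close = pd.DataFrame(np.arange(6.0).reshape(2, 3), index=stocks, columns=dates)
volume = pd.DataFrame(np.arange(6.0, 12.0).reshape(2, 3), index=stocks, columns=dates)

# The dict keys become the outer level of a MultiIndex on the rows
combined = pd.concat({"close": close, "volume": volume})
print(combined.shape, combined.index.nlevels)

# Selecting an outer key recovers one of the original 2-D tables
volume_back = combined.loc["volume"]
```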

Series

    data.iloc[1, :]  # a one-dimensional array with an index

    2020-06-18    0.924080
    2020-06-19    0.016468
    2020-06-22   -1.266148
    2020-06-23    1.523937
    2020-06-24   -0.853731
    Freq: B, Name: 股票_1, dtype: float64

    type(data.iloc[1, :])

    pandas.core.series.Series

Attributes

    data.iloc[1, :].index

    DatetimeIndex(['2020-06-18', '2020-06-19', '2020-06-22', '2020-06-23',
                   '2020-06-24'],
                  dtype='datetime64[ns]', freq='B')

    data.iloc[1, :].values

    array([ 0.92407994,  0.01646795, -1.26614793,  1.52393669, -0.85373051])

1. Creating a Series

From existing data

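  • Specify only the values; a default integer index is used
    pd.Series(np.arange(10))

    0    0
    1    1
    2    2
    3    3
    4    4
    5    5
    6    6
    7    7
    8    8
    9    9
    dtype: int32

    (int32 is NumPy's default integer on Windows before NumPy 2.0; most other platforms show int64.)

  • Specify the index
    pd.Series([6.7, 5.6, 3, 10, 2], index=[1, 2, 3, 4, 5])

    1     6.7
    2     5.6
    3     3.0
    4    10.0
    5     2.0
    dtype: float64

  • Create from a dictionary
    pd.Series({'red': 100, 'blue': 200, 'green': 500, 'yellow': 1000})

    red        100
    blue       200
    green      500
    yellow    1000
    dtype: int64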
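The three construction routes above can be verified in a few lines. Mixing integers and floats in one list (as in `[6.7, 5.6, 3, 10, 2]`) produces a `float64` Series, which is why the output above shows `3.0` and `10.0`.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(5))                                   # default index 0..4
s2 = pd.Series([6.7, 5.6, 3, 10, 2], index=[1, 2, 3, 4, 5])    # explicit labels, same length as the data
s3 = pd.Series({"red": 100, "blue": 200, "green": 500, "yellow": 1000})  # dict keys become the index

print(s2.dtype)          # ints and floats mixed in one list give float64
print(s3["green"])       # label-based lookup
```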

Summary

  • A DataFrame is a container for Series
  • A Panel is a container for DataFrames

4.2 Basic data operations

    # Reading a legacy .xls file requires the xlrd package
    datas = pd.read_excel("./datas/szfj_baoan.xls")
    datas

          district  roomnum  hall   AREA  C_floor  floor_num  school  subway  per_price
    0        baoan        3     2   89.3   middle         31       0       0     7.0773
    1        baoan        4     2  127.0     high         31       0       0     6.9291
    2        baoan        1     1   28.0      low         39       0       0     3.9286
    3        baoan        1     1   28.0   middle         30       0       0     3.3568
    4        baoan        2     2   78.0   middle          8       1       1     5.0769
    ...        ...      ...   ...    ...      ...        ...     ...     ...        ...
    1246     baoan        4     2   89.3      low          8       0       0     4.2553
    1247     baoan        2     1   67.0   middle         30       0       0     3.8060
    1248     baoan        2     2   67.4   middle         29       1       0     5.3412
    1249     baoan        2     2   73.1      low         15       1       0     5.9508
    1250     baoan        3     2   86.2   middle         32       0       1     4.5244

    1251 rows × 9 columns

    datas.columns

    Index(['district', 'roomnum', 'hall', 'AREA', 'C_floor', 'floor_num', 'school',
           'subway', 'per_price'],
          dtype='object')

    # Drop columns; with columns= the axis argument is not needed
    datas = datas.drop(columns=['school', 'subway'])
    datas

          district  roomnum  hall   AREA  C_floor  floor_num  per_price
    0        baoan        3     2   89.3   middle         31     7.0773
    1        baoan        4     2  127.0     high         31     6.9291
    2        baoan        1     1   28.0      low         39     3.9286
    3        baoan        1     1   28.0   middle         30     3.3568
    4        baoan        2     2   78.0   middle          8     5.0769
    ...        ...      ...   ...    ...      ...        ...        ...
    1246     baoan        4     2   89.3      low          8     4.2553
    1247     baoan        2     1   67.0   middle         30     3.8060
    1248     baoan        2     2   67.4   middle         29     5.3412
    1249     baoan        2     2   73.1      low         15     5.9508
    1250     baoan        3     2   86.2   middle         32     4.5244

    1251 rows × 7 columns
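`drop` accepts the columns to remove in several equivalent forms; the sketch below compares them on a hypothetical two-row frame with the same column names. It also shows that `drop` returns a new frame, which is why the result above is assigned back to `datas`.

```python
import pandas as pd

# Hypothetical rows with the same column names as datas
datas = pd.DataFrame({"district": ["baoan", "baoan"],
                      "school": [0, 1],
                      "subway": [0, 1],
                      "per_price": [7.0773, 5.0769]})

a = datas.drop(columns=["school", "subway"])                  # keyword form
b = datas.drop(["school", "subway"], axis=1)                  # labels plus axis=1
c = datas.drop(labels=["school", "subway"], axis="columns")   # axis given by name

print(list(a.columns))       # ['district', 'per_price']
print(list(datas.columns))   # unchanged: drop returns a new frame
```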

4.2.1 Indexing

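1. Direct indexing with column and row labels (column first, then row)

    # [0] is a row label here; it matches position 0 only because datas has the default RangeIndex
    datas["per_price"][0]

    7.0773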

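2. Label-based indexing with loc (row first, then column)

    datas.loc[0]["per_price"]

    7.0773

    # A single loc call with both labels is preferred over chaining two lookups
    datas.loc[0, "per_price"]

    7.0773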

3. Position-based indexing with iloc

    datas.iloc[0, 6]

    7.0773

    # Get row labels from row positions
    datas.index[0:4]

    RangeIndex(start=0, stop=4, step=1)

    datas.loc[datas.index[0:4], ["district", "roomnum"]]

      district  roomnum
    0    baoan        3
    1    baoan        4
    2    baoan        1
    3    baoan        1

    # datas.columns.get_indexer() converts column names to column positions
    datas.columns.get_indexer(["district", "roomnum"])

    array([0, 1], dtype=int64)

    datas.iloc[0:4, datas.columns.get_indexer(["district", "roomnum"])]

      district  roomnum
    0    baoan        3
    1    baoan        4
    2    baoan        1
    3    baoan        1
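In the article, `datas` has the default `RangeIndex`, so row labels and row positions coincide and `datas["per_price"][0]`, `datas.loc[0, "per_price"]` and `datas.iloc[0, 6]` all agree. The sketch below uses a hypothetical frame whose index is 10, 20, 30 to show where they diverge, and how `get_indexer` and `df.index[...]` bridge labels and positions.

```python
import pandas as pd

# Hypothetical rows with a non-default index, so row labels and row positions differ
df = pd.DataFrame({"district": ["baoan", "baoan", "baoan"],
                   "roomnum": [3, 4, 1],
                   "per_price": [7.0773, 6.9291, 3.9286]},
                  index=[10, 20, 30])

by_label = df.loc[20, "per_price"]   # row label 20, column label "per_price"
by_pos = df.iloc[1, 2]               # row position 1, column position 2

# Mixing the two: convert column names to positions, or row positions to labels
cols = df.columns.get_indexer(["district", "roomnum"])
mixed = df.iloc[0:2, cols]
mixed2 = df.loc[df.index[0:2], ["district", "roomnum"]]

# With an integer index, [] on a Series looks up labels, so 0 is not found here
try:
    df["per_price"][0]
    found_label_0 = True
except KeyError:
    found_label_0 = False
```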

4.2.2 Assignment

    # Overwrite every value in an existing column
    datas["hall"] = 5
    datas.head()

       district  roomnum  hall   AREA  C_floor  floor_num  per_price
    0     baoan        3     5   89.3   middle         31     7.0773
    1     baoan        4     5  127.0     high         31     6.9291
    2     baoan        1     5   28.0      low         39     3.9286
    3     baoan        1     5   28.0   middle         30     3.3568
    4     baoan        2     5   78.0   middle          8     5.0769

    # Equivalent attribute syntax (works only for an existing column)
    datas.hall = 1
    datas.head()

       district  roomnum  hall   AREA  C_floor  floor_num  per_price
    0     baoan        3     1   89.3   middle         31     7.0773
    1     baoan        4     1  127.0     high         31     6.9291
    2     baoan        1     1   28.0      low         39     3.9286
    3     baoan        1     1   28.0   middle         30     3.3568
    4     baoan        2     1   78.0   middle          8     5.0769

    # Overwrite a single cell by position
    datas.iloc[0, 0] = "zzzz"
    datas.head()

       district  roomnum  hall   AREA  C_floor  floor_num  per_price
    0      zzzz        3     1   89.3   middle         31     7.0773
    1     baoan        4     1  127.0     high         31     6.9291
    2     baoan        1     1   28.0      low         39     3.9286
    3     baoan        1     1   28.0   middle         30     3.3568
    4     baoan        2     1   78.0   middle          8     5.0769
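A compact sketch of the assignment forms used above, on a hypothetical two-row frame. It prefers a single `loc` or `iloc` call for one cell; a chained form such as `datas["hall"][0] = 3` may write to a temporary copy (pandas warns with `SettingWithCopyWarning`), and with Copy-on-Write enabled it never modifies `datas`.

```python
import pandas as pd

# Hypothetical two-row frame with some of the columns used above
datas = pd.DataFrame({"district": ["baoan", "baoan"],
                      "hall": [2, 2],
                      "AREA": [89.3, 127.0]})

datas["hall"] = 5                        # a scalar is broadcast to the whole column
datas.hall = 1                           # attribute syntax: existing columns only
datas.loc[0, "district"] = "zzzz"        # one cell, addressed by labels
datas.iloc[1, 2] = 130.0                 # one cell, addressed by positions
datas["hall_x2"] = datas["hall"] * 2     # new columns must be created with []
```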

4.2.3 Sorting

    # Sort by values; ascending=False sorts in descending order (the default, True, sorts ascending)
    datas.sort_values(by="per_price", ascending=False)

          district  roomnum  hall    AREA  C_floor  floor_num  per_price
    917      baoan        4     1   93.59     high         28    21.9040
    356      baoan        8     1  248.99      low          7    21.2860
    576      baoan        1     1   21.95   middle         22    19.3622
    296      baoan        4     1   93.59     high         28    19.2328
    186      baoan        3     1  113.60   middle         31    16.5493
    ...        ...      ...   ...     ...      ...        ...        ...
    911      baoan        2     1   89.00   middle         16     1.6854
    841      baoan        2     1   75.00     high          7     1.6667
    1188     baoan        3     1  110.00   middle         33     1.5909
    684      baoan        3     1   89.00   middle         26     1.2247
    1047     baoan        3     1   98.90   middle         26     1.1931

    1251 rows × 7 columns

    datas.sort_values(by="per_price")

          district  roomnum  hall    AREA  C_floor  floor_num  per_price
    1047     baoan        3     1   98.90   middle         26     1.1931
    684      baoan        3     1   89.00   middle         26     1.2247
    1188     baoan        3     1  110.00   middle         33     1.5909
    841      baoan        2     1   75.00     high          7     1.6667
    911      baoan        2     1   89.00   middle         16     1.6854
    ...        ...      ...   ...     ...      ...        ...        ...
    186      baoan        3     1  113.60   middle         31    16.5493
    296      baoan        4     1   93.59     high         28    19.2328
    576      baoan        1     1   21.95   middle         22    19.3622
    356      baoan        8     1  248.99      low          7    21.2860
    917      baoan        4     1   93.59     high         28    21.9040

    1251 rows × 7 columns

    # Sort by several columns: first by "district"; rows with the same district are then sorted by "per_price"
    datas.sort_values(by=["district", "per_price"])

          district  roomnum  hall    AREA  C_floor  floor_num  per_price
    1047     baoan        3     1   98.90   middle         26     1.1931
    684      baoan        3     1   89.00   middle         26     1.2247
    1188     baoan        3     1  110.00   middle         33     1.5909
    841      baoan        2     1   75.00     high          7     1.6667
    911      baoan        2     1   89.00   middle         16     1.6854
    ...        ...      ...   ...     ...      ...        ...        ...
    296      baoan        4     1   93.59     high         28    19.2328
    576      baoan        1     1   21.95   middle         22    19.3622
    356      baoan        8     1  248.99      low          7    21.2860
    917      baoan        4     1   93.59     high         28    21.9040
    0         zzzz        3     1   89.30   middle         31     7.0773

    1251 rows × 7 columns

    # Sort by the row index, ascending by default
    datas.sort_index()

          district  roomnum  hall   AREA  C_floor  floor_num  per_price
    0         zzzz        3     1   89.3   middle         31     7.0773
    1        baoan        4     1  127.0     high         31     6.9291
    2        baoan        1     1   28.0      low         39     3.9286
    3        baoan        1     1   28.0   middle         30     3.3568
    4        baoan        2     1   78.0   middle          8     5.0769
    ...        ...      ...   ...    ...      ...        ...        ...
    1246     baoan        4     1   89.3      low          8     4.2553
    1247     baoan        2     1   67.0   middle         30     3.8060
    1248     baoan        2     1   67.4   middle         29     5.3412
    1249     baoan        2     1   73.1      low         15     5.9508
    1250     baoan        3     1   86.2   middle         32     4.5244

    1251 rows × 7 columns

    sr = datas["per_price"]
    sr

    0       7.0773
    1       6.9291
    2       3.9286
    3       3.3568
    4       5.0769
             ...
    1246    4.2553
    1247    3.8060
    1248    5.3412
    1249    5.9508
    1250    4.5244
    Name: per_price, Length: 1251, dtype: float64

    # Sort a Series by its values
    sr.sort_values()

    1047     1.1931
    684      1.2247
    1188     1.5909
    841      1.6667
    911      1.6854
              ...
    186     16.5493
    296     19.2328
    576     19.3622
    356     21.2860
    917     21.9040
    Name: per_price, Length: 1251, dtype: float64

    # Sort a Series by its index
    sr.sort_index()

    0       7.0773
    1       6.9291
    2       3.9286
    3       3.3568
    4       5.0769
             ...
    1246    4.2553
    1247    3.8060
    1248    5.3412
    1249    5.9508
    1250    4.5244
    Name: per_price, Length: 1251, dtype: float64
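`ascending` can also be a list with one entry per sort key, so each column can be sorted in its own direction. A minimal sketch on hypothetical data (the district codes a and b and the prices are made up):

```python
import pandas as pd

# Hypothetical rows with a shuffled index
df = pd.DataFrame({"district": ["b", "a", "b", "a"],
                   "per_price": [5.0, 7.0, 3.0, 1.0]},
                  index=[3, 1, 0, 2])

desc = df.sort_values(by="per_price", ascending=False)
# One direction per key: district ascending, then per_price descending within each district
multi = df.sort_values(by=["district", "per_price"], ascending=[True, False])
by_index = df.sort_index()
sr_sorted = df["per_price"].sort_values()

print(multi)
```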

4.3 DataFrame operations

  • Arithmetic operations
    # Arithmetic on a Series
    datas["roomnum"] + 3

    0       6
    1       7
    2       4
    3       4
    4       5
           ..
    1246    7
    1247    5
    1248    5
    1249    5
    1250    6
    Name: roomnum, Length: 1251, dtype: int64

    # The add() method gives the same result
    datas["roomnum"].add(3).head()

    0    6
    1    7
    2    4
    3    4
    4    5
    Name: roomnum, dtype: int64

    datas.iloc[:, 1:4]

          roomnum  hall   AREA
    0           3     1   89.3
    1           4     1  127.0
    2           1     1   28.0
    3           1     1   28.0
    4           2     1   78.0
    ...       ...   ...    ...
    1246        4     1   89.3
    1247        2     1   67.0
    1248        2     1   67.4
    1249        2     1   73.1
    1250        3     1   86.2

    1251 rows × 3 columns

    # Arithmetic on a DataFrame is applied element-wise
    datas.iloc[:, 1:4] + 10

          roomnum  hall   AREA
    0          13    11   99.3
    1          14    11  137.0
    2          11    11   38.0
    3          11    11   38.0
    4          12    11   88.0
    ...       ...   ...    ...
    1246       14    11   99.3
    1247       12    11   77.0
    1248       12    11   77.4
    1249       12    11   83.1
    1250       13    11   96.2

    1251 rows × 3 columns
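Arithmetic in pandas aligns on index labels before computing, which matters when two Series or DataFrames are combined rather than a scalar. The method forms (`add`, `sub`, `mul`, `div`) accept a `fill_value` for labels that exist on only one side. A minimal sketch with hypothetical Series:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["y", "z"])

plus3 = a + 3                     # a scalar is applied to every element
aligned = a + b                   # matched by label; "x" has no partner, so it becomes NaN
filled = a.add(b, fill_value=0)   # the missing partner is treated as 0 instead

df = pd.DataFrame({"roomnum": [3, 4], "AREA": [89.3, 127.0]})
df_plus10 = df + 10               # element-wise on every column

print(aligned.tolist(), filled.tolist())
```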

  • Logical operations
    # A comparison returns a boolean Series that can be used as a filter
    datas['AREA'] > 100

    0       False
    1        True
    2       False
    3       False
    4       False
            ...
    1246    False
    1247    False
    1248    False
    1249    False
    1250    False
    Name: AREA, Length: 1251, dtype: bool

    # Boolean indexing keeps the rows where the mask is True
    datas[datas['AREA'] > 100]

          district  roomnum  hall    AREA  C_floor  floor_num  per_price
    1        baoan        4     1  127.00     high         31     6.9291
    5        baoan        4     1  125.17   middle         15     5.8161
    16       baoan        3     1  151.00     high         20     4.9669
    25       baoan        3     1  116.00     high         18     5.0000
    26       baoan        5     1  151.25     high         30     7.6033
    ...        ...      ...   ...     ...      ...        ...        ...
    1232     baoan        5     1  127.17      low         24     5.1113
    1238     baoan        4     1  130.74      low         30    13.0029
    1239     baoan        3     1  102.10   middle         28    10.8717
    1241     baoan        5     1  151.30     high         29     7.2703
    1243     baoan        4     1  142.25     high         32     6.3269

    322 rows × 7 columns

    # Several conditions: area above 100 and price below 40000
    # (per_price peaks at about 21.9 in this data, so the second condition is always true here,
    # which is why the result has the same 322 rows as the single condition above)
    (datas["AREA"] > 100) & (datas["per_price"] < 40000)

    0       False
    1        True
    2       False
    3       False
    4       False
            ...
    1246    False
    1247    False
    1248    False
    1249    False
    1250    False
    Length: 1251, dtype: bool

    # Boolean indexing
    datas[(datas["AREA"] > 100) & (datas["per_price"] < 40000)]

          district  roomnum  hall    AREA  C_floor  floor_num  per_price
    1        baoan        4     1  127.00     high         31     6.9291
    5        baoan        4     1  125.17   middle         15     5.8161
    16       baoan        3     1  151.00     high         20     4.9669
    25       baoan        3     1  116.00     high         18     5.0000
    26       baoan        5     1  151.25     high         30     7.6033
    ...        ...      ...   ...     ...      ...        ...        ...
    1232     baoan        5     1  127.17      low         24     5.1113
    1238     baoan        4     1  130.74      low         30    13.0029
    1239     baoan        3     1  102.10   middle         28    10.8717
    1241     baoan        5     1  151.30     high         29     7.2703
    1243     baoan        4     1  142.25     high         32     6.3269

    322 rows × 7 columns
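When combining conditions, use `&` (and), `|` (or) and `~` (not), and wrap each comparison in parentheses because these operators bind more tightly than `>` and `<`; Python's `and`/`or` raise a `ValueError` on a Series. A minimal sketch on four hypothetical rows:

```python
import pandas as pd

datas = pd.DataFrame({"AREA": [89.3, 127.0, 151.25, 28.0],
                      "per_price": [7.0773, 6.9291, 7.6033, 3.9286]})

# Each comparison yields a boolean Series; the parentheses around each one are required
mask = (datas["AREA"] > 100) & (datas["per_price"] < 7)
big_or_cheap = datas[(datas["AREA"] > 100) | (datas["per_price"] < 4)]
not_big = datas[~(datas["AREA"] > 100)]

print(mask.tolist())
```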

    Logical operation functions

    # Conditional query with query(); equivalent to the boolean indexing above
    datas.query("AREA>100 & per_price<40000")

          district  roomnum  hall    AREA  C_floor  floor_num  per_price
    1        baoan        4     1  127.00     high         31     6.9291
    5        baoan        4     1  125.17   middle         15     5.8161
    16       baoan        3     1  151.00     high         20     4.9669
    25       baoan        3     1  116.00     high         18     5.0000
    26       baoan        5     1  151.25     high         30     7.6033
    ...        ...      ...   ...     ...      ...        ...        ...
    1232     baoan        5     1  127.17      low         24     5.1113
    1238     baoan        4     1  130.74      low         30    13.0029
    1239     baoan        3     1  102.10   middle         28    10.8717
    1241     baoan        5     1  151.30     high         29     7.2703
    1243     baoan        4     1  142.25     high         32     6.3269

    322 rows × 7 columns

    datas["roomnum"].isin([4, 5])

    0       False
    1        True
    2       False
    3       False
    4       False
            ...
    1246     True
    1247    False
    1248    False
    1249    False
    1250    False
    Name: roomnum, Length: 1251, dtype: bool

    # isin() tests membership in a list of values, and the result can be used for filtering
    # Select the rows with 4 or 5 rooms
    datas[datas["roomnum"].isin([4, 5])]

          district  roomnum  hall    AREA  C_floor  floor_num  per_price
    1        baoan        4     1  127.00     high         31     6.9291
    5        baoan        4     1  125.17   middle         15     5.8161
    26       baoan        5     1  151.25     high         30     7.6033
    29       baoan        4     1  143.45   middle         25     6.9711
    36       baoan        4     1  134.60   middle         32     9.1828
    ...        ...      ...   ...     ...      ...        ...        ...
    1232     baoan        5     1  127.17      low         24     5.1113
    1238     baoan        4     1  130.74      low         30    13.0029
    1241     baoan        5     1  151.30     high         29     7.2703
    1243     baoan        4     1  142.25     high         32     6.3269
    1246     baoan        4     1   89.30      low          8     4.2553

    224 rows × 7 columns
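`query` and boolean indexing select the same rows; `query` can also refer to local Python variables with `@`. A minimal sketch on four hypothetical rows, checking that the two forms agree:

```python
import pandas as pd

datas = pd.DataFrame({"roomnum": [3, 4, 1, 5],
                      "AREA": [89.3, 127.0, 28.0, 151.25],
                      "per_price": [7.0773, 6.9291, 3.9286, 7.6033]})

via_query = datas.query("AREA > 100 & per_price < 7")
via_mask = datas[(datas["AREA"] > 100) & (datas["per_price"] < 7)]

# A local variable can be used inside the query string with @
limit = 100
via_var = datas.query("AREA > @limit")

# isin: keep rows whose roomnum is one of the listed values
four_or_five = datas[datas["roomnum"].isin([4, 5])]
```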

  • Statistical operations
    # Count, mean, standard deviation, minimum, quartiles and maximum of each numeric column
    datas.describe()

               roomnum    hall         AREA    floor_num    per_price
    count  1251.000000  1251.0  1251.000000  1251.000000  1251.000000
    mean      2.906475     1.0    92.409976    24.598721     6.643429
    std       0.940663     0.0    37.798122     9.332119     2.435132
    min       1.000000     1.0    21.950000     1.000000     1.193100
    25%       2.000000     1.0    75.000000    17.000000     5.075850
    50%       3.000000     1.0    87.800000    28.000000     5.906800
    75%       3.000000     1.0   101.375000    31.000000     7.761950
    max       8.000000     1.0   352.900000    53.000000    21.904000

    Statistical functions

    # axis=0 gives the maximum of each column; axis=1 gives the maximum of each row
    datas.max(axis=0)

    district       zzzz
    roomnum           8
    hall              1
    AREA          352.9
    C_floor      middle
    floor_num        53
    per_price    21.904
    dtype: object

    # Variance (numeric_only=True skips the text columns; pandas 2.x raises a TypeError without it)
    datas.var(axis=0, numeric_only=True)

    roomnum         0.884846
    hall            0.000000
    AREA         1428.698032
    floor_num      87.088446
    per_price       5.929870
    dtype: float64

    # Standard deviation
    datas.std(axis=0, numeric_only=True)

    roomnum       0.940663
    hall          0.000000
    AREA         37.798122
    floor_num     9.332119
    per_price     2.435132
    dtype: float64

    datas.iloc[:, 3]

    0        89.3
    1       127.0
    2        28.0
    3        28.0
    4        78.0
            ...
    1246     89.3
    1247     67.0
    1248     67.4
    1249     73.1
    1250     86.2
    Name: AREA, Length: 1251, dtype: float64

    # Index label of the maximum value
    datas.iloc[:, 3].idxmax(axis=0)

    759

    # idxmax returns a label and iloc expects a position; they coincide here because of the default RangeIndex
    datas.iloc[759, 3]

    352.9

    # Index label of the minimum value
    datas.iloc[:, 3].idxmin(axis=0)

    576

    datas.iloc[576, 3]

    21.95
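`idxmax`/`idxmin` return the index label of the extreme value, whereas `argmax`/`argmin` (including `np.argmax`, listed in the outline) return its integer position. `datas.iloc[759, 3]` above works only because the default `RangeIndex` makes labels and positions equal. The sketch below uses a hypothetical Series whose labels differ from its positions; it also shows `numeric_only=True`, which pandas 2.x requires for `var()` and `std()` on a frame that still contains text columns.

```python
import numpy as np
import pandas as pd

# Hypothetical AREA values with non-default labels, so labels and positions differ
area = pd.Series([89.3, 352.9, 21.95], index=[100, 759, 576])

label_of_max = area.idxmax()                 # index label of the largest value
pos_of_max = int(area.argmax())              # integer position of the largest value
np_pos = int(np.argmax(area.to_numpy()))     # NumPy's argmax also returns a position

# A label goes with .loc, a position goes with .iloc
print(area.loc[label_of_max], area.iloc[pos_of_max])

# var() on a frame with a text column: restrict it to the numeric columns
df = pd.DataFrame({"district": ["a", "b", "c"], "AREA": [89.3, 352.9, 21.95]})
spread = df.var(numeric_only=True)
```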

    Cumulative statistical functions

    datas["per_price"]

    0       7.0773
    1       6.9291
    2       3.9286
    3       3.3568
    4       5.0769
             ...
    1246    4.2553
    1247    3.8060
    1248    5.3412
    1249    5.9508
    1250    4.5244
    Name: per_price, Length: 1251, dtype: float64

    # Running (cumulative) sum
    datas["per_price"].cumsum()

    0          7.0773
    1         14.0064
    2         17.9350
    3         21.2918
    4         26.3687
              ...
    1246    8291.3076
    1247    8295.1136
    1248    8300.4548
    1249    8306.4056
    1250    8310.9300
    Name: per_price, Length: 1251, dtype: float64

    datas["per_price"].sort_index().cumsum().plot()

    <matplotlib.axes._subplots.AxesSubplot at 0x2039a3a3dc8>

    import matplotlib.pyplot as plt
    datas["per_price"].sort_index().cumsum().plot()
    plt.show()
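Besides `cumsum`, pandas provides `cummax`, `cummin` and `cumprod`. A minimal sketch on the first five `per_price` values shown above; the `cumprod` example uses a separate made-up Series.

```python
import pandas as pd

# The first five per_price values from the output above
sr = pd.Series([7.0773, 6.9291, 3.9286, 3.3568, 5.0769])

running_total = sr.cumsum()   # sum of all values so far
running_max = sr.cummax()     # largest value so far
running_min = sr.cummin()     # smallest value so far
running_prod = pd.Series([1, 2, 3, 4]).cumprod()  # product so far: 1, 2, 6, 24

print(running_total.round(4).tolist())
```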

  • Custom operations
    # Apply a custom function that computes max minus min for each column
    datas[["per_price"]].apply(lambda x: x.max() - x.min(), axis=0)

    per_price    20.7109
    dtype: float64
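`apply` passes whole columns to the function when `axis=0` (the default) and whole rows when `axis=1`. A minimal sketch on three rows taken from the sorted tables above; the per-row product is only an illustration of row-wise access.

```python
import pandas as pd

# Three rows taken from the sorted tables above
df = pd.DataFrame({"per_price": [7.0773, 1.1931, 21.904],
                   "AREA": [89.3, 98.9, 93.59]})

# axis=0 (default): the function receives one column at a time
col_range = df.apply(lambda x: x.max() - x.min(), axis=0)

# axis=1: the function receives one row at a time, indexed by column name
row_product = df.apply(lambda row: row["per_price"] * row["AREA"], axis=1)

print(col_range)
```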

4.4 Plotting with Pandas

    # Relationship between area and price
    datas.plot(x="AREA", y="per_price", kind="scatter")

    <matplotlib.axes._subplots.AxesSubplot at 0x203a343dec8>

    # Relationship between floor number and price
    datas.plot(x="floor_num", y="per_price", kind="scatter")

    <matplotlib.axes._subplots.AxesSubplot at 0x203a3a81bc8>

    # Horizontal bar chart of price against area
    datas.plot(x="AREA", y="per_price", kind="barh")

    <matplotlib.axes._subplots.AxesSubplot at 0x203a2147f08>

4.5 Reading and writing files

1. Reading CSV files with read_csv()

    iris_data = pd.read_csv("./datas/iris.data.csv")
    iris_data.head()

       feature1  feature2  feature3  feature4       result
    0       5.1       3.5       1.4       0.2  Iris-setosa
    1       4.9       3.0       1.4       0.2  Iris-setosa
    2       4.7       3.2       1.3       0.2  Iris-setosa
    3       4.6       3.1       1.5       0.2  Iris-setosa
    4       5.0       3.6       1.4       0.2  Iris-setosa

    # usecols: the columns to read, given as a list of column names
    iris_data1 = pd.read_csv("./datas/iris.data.csv", usecols=["feature1", "feature2", "result"])
    iris_data1.head()

       feature1  feature2       result
    0       5.1       3.5  Iris-setosa
    1       4.9       3.0  Iris-setosa
    2       4.7       3.2  Iris-setosa
    3       4.6       3.1  Iris-setosa
    4       5.0       3.6  Iris-setosa
    # This file has no header row, so read_csv treats the first data row as the column names
    iris_data2 = pd.read_csv("./datas/iris.data2.csv")
    iris_data2.head()

       5.1  3.5  1.4  0.2  Iris-setosa
    0  4.9  3.0  1.4  0.2  Iris-setosa
    1  4.7  3.2  1.3  0.2  Iris-setosa
    2  4.6  3.1  1.5  0.2  Iris-setosa
    3  5.0  3.6  1.4  0.2  Iris-setosa
    4  5.4  3.9  1.7  0.4  Iris-setosa

    # names: if the file has no column names, supply them yourself
    iris_data2 = pd.read_csv("./datas/iris.data2.csv", names=["feature1", "feature2", "feature3", "feature4", "result"])
    iris_data2.head()

       feature1  feature2  feature3  feature4       result
    0       5.1       3.5       1.4       0.2  Iris-setosa
    1       4.9       3.0       1.4       0.2  Iris-setosa
    2       4.7       3.2       1.3       0.2  Iris-setosa
    3       4.6       3.1       1.5       0.2  Iris-setosa
    4       5.0       3.6       1.4       0.2  Iris-setosa
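    datas.head(5)

       district  roomnum  hall   AREA  C_floor  floor_num  per_price
    0      zzzz        3     1   89.3   middle         31     7.0773
    1     baoan        4     1  127.0     high         31     6.9291
    2     baoan        1     1   28.0      low         39     3.9286
    3     baoan        1     1   28.0   middle         30     3.3568
    4     baoan        2     1   78.0   middle          8     5.0769

    # Save the per_price column
    # index=False: do not write the row index
    # mode="a": append to the file instead of overwriting it
    # header=False: do not write the column name again on each append
    datas[:-1].to_csv("./price_test", columns=['per_price'], index=False, mode="a", header=False)

    # Read the file back and inspect it
    # mode="a" appends on every run, so the file grows each time the cell is executed;
    # this is why far more than 1250 rows come back here
    price_test = pd.read_csv("./price_test")
    price_test

          per_price
    0        7.0773
    1        6.9291
    2        3.9286
    3        3.3568
    4        5.0769
    ...         ...
    3746     6.1932
    3747     4.2553
    3748      3.806
    3749     5.3412
    3750     5.9508

    3751 rows × 1 columns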
