Data Analysis: pandas Notes

Published: 2023/12/20 · Programming Q&A · 28 · 豆豆

This article, collected and compiled by 生活随笔, is a set of pandas notes for data analysis. The editor found it useful and shares it here for reference.

Pandas

A library for representing tabular data.

Lesson 4: Jupyter (21 min 22 s)
Lesson 5: About pandas (24 min 31 s)
Lesson 6: Series (38 min 19 s)
Lesson 7: DataFrame (25 min 50 s)

    # Load the pandas library (and NumPy, used later)
    import pandas as pd
    import numpy as np

    s = pd.Series([2, 4, 6, 8, 10])
    s
    # 0     2
    # 1     4
    # 2     6
    # 3     8
    # 4    10
    # dtype: int64

    d = pd.DataFrame([[2, 4, 6, 8, 10], [7, 3, 4, 7, 15]])
    d
    #    0  1  2  3   4
    # 0  2  4  6  8  10
    # 1  7  3  4  7  15
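The DataFrame above gets default integer labels (0, 1, ...) for both its rows and its columns. A minimal sketch, with hypothetical column names and row labels, of how the same data can carry explicit labels through the `columns` and `index` parameters of the constructor:

```python
import pandas as pd

# Same two rows as above, but with explicit labels.
# The column names and row labels are hypothetical, chosen only for illustration.
labeled = pd.DataFrame(
    [[2, 4, 6, 8, 10], [7, 3, 4, 7, 15]],
    columns=["a", "b", "c", "d", "e"],
    index=["row1", "row2"],
)

print(labeled)
print(labeled.shape)          # (number of rows, number of columns)
print(list(labeled.columns))  # column labels
print(list(labeled.index))    # row labels
```

With named columns, selecting a column reads like a field name, e.g. `labeled["e"]`.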

    d[0]
    # 0    2
    # 1    7
    # Name: 0, dtype: int64

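Note that indexing a DataFrame directly with square brackets selects a column, not a row. This matches how tables are usually used: if a table has an age field, we typically want the whole age column. To get a single value, index the resulting column again, e.g. d[0][1] (column 0, row 1).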

How do we get a single row?

    d.loc[0]
    # 0     2
    # 1     4
    # 2     6
    # 3     8
    # 4    10
    # Name: 0, dtype: int64

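This returns a Series. In fact, a DataFrame can be thought of as a collection of Series, so we can also build one from a list of Series, each of which becomes a row: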

    d2 = pd.DataFrame([pd.Series([2, 4, 6, 8, 10]), pd.Series([7, 3, 4, 7, 15])])
    d2
    #    0  1  2  3   4
    # 0  2  4  6  8  10
    # 1  7  3  4  7  15
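To see the Series/DataFrame relationship in both directions, here is a sketch: a list of Series becomes rows (as in d2), while a dict of Series becomes columns, with the dict keys as column names. The names `x` and `y` are hypothetical. `.loc` selects a row by label and `.iloc` by position, and a single row comes back as a Series:

```python
import pandas as pd

s1 = pd.Series([2, 4, 6, 8, 10])
s2 = pd.Series([7, 3, 4, 7, 15])

by_rows = pd.DataFrame([s1, s2])            # each Series becomes a ROW
by_cols = pd.DataFrame({"x": s1, "y": s2})  # each Series becomes a COLUMN

print(by_rows.shape)  # 2 rows, 5 columns
print(by_cols.shape)  # 5 rows, 2 columns

# Row access: .loc by label, .iloc by position
first_row = by_rows.loc[0]
last_row = by_rows.iloc[-1]
print(type(first_row).__name__, first_row.tolist(), last_row.tolist())

# Transposing one layout gives the other (ignoring the labels)
print((by_rows.T.values == by_cols.values).all())
```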

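    class1 = pd.Series({'hong': 50, 'huang': 90, 'qing': 60})

    # Change the index of a dict-based Series
    class1_values = {'hong': 50, 'huang': 90, 'qing': 60}
    class1_index = ['hong', 'lv', 'lan']
    # The labels now come from the index argument: dict keys not listed there
    # ('huang', 'qing') are dropped, and labels missing from the dict get NaN
    class1 = pd.Series(class1_values, index=class1_index)
    class1
    # hong    50.0
    # lv       NaN
    # lan      NaN
    # dtype: float64

    # The values: a NumPy ndarray
    class1.values
    # The index: an Index object (pandas' own index type), backed by a NumPy array
    class1.index
    class1.index[2]
    class1.index.values
    # array(['hong', 'lv', 'lan'], dtype=object)
    class1_index

    # Attribute-style access by label
    class1.hong
    # 50.0

    # Positional access; .iloc makes the intent explicit (plain class1[[1, 2, 0]]
    # relies on a positional fallback that recent pandas versions deprecate)
    class1.iloc[[1, 2, 0]]
    # lv       NaN
    # lan      NaN
    # hong    50.0
    # dtype: float64
    class1.iloc[0:1]
    # hong    50.0
    # dtype: float64

    # Comparisons apply element-wise; any comparison with NaN (except !=) is False
    class1 > 6
    # hong     True
    # lv      False
    # lan     False
    # dtype: bool

    # Boolean indexing, much like a WHERE clause in SQL
    class1[class1 > 6]
    # hong    50.0
    # dtype: float64

    # Add 1 to every element at once; NaN stays NaN
    class1 + 1
    # hong    51.0
    # lv       NaN
    # lan      NaN
    # dtype: float64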
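A compact sketch of the label rules above, using the same class1 data: index labels missing from the dict become NaN (which forces a float64 dtype), `.loc` looks up by label, `.iloc` by position, and NaN drops out of boolean filters. `isna()` and `dropna()` are standard Series methods brought in here for illustration:

```python
import pandas as pd

values = {"hong": 50, "huang": 90, "qing": 60}
class1 = pd.Series(values, index=["hong", "lv", "lan"])

print(class1.dtype)            # float64, because NaN needs a float dtype
print(class1.isna().tolist())  # which labels had no value in the dict

by_label = class1.loc["hong"]          # label-based lookup
by_position = class1.iloc[[1, 2, 0]]   # position-based; reorders the rows
print(by_label, by_position.index.tolist())

# NaN > 6 is False, so the boolean mask keeps only 'hong'
print(class1[class1 > 6].index.tolist())

# Vectorized arithmetic keeps NaN as NaN; dropna() removes it afterwards
print((class1 + 1).dropna().to_dict())
```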

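Adding 1 to the whole Series like this is very efficient. With a Python list, we would have to loop: take one element, add 1, and append the result to a new list, one element at a time.
pandas instead passes the entire underlying NumPy array to compiled code in one call, so the per-element work runs as a tight loop in C rather than as Python bytecode (NumPy can also use SIMD instructions to process several elements per instruction). This is called vectorization; it is not the same as running the additions concurrently.
We can check the difference with the %%timeit magic command: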
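The list approach and the vectorized approach described above can be written side by side to confirm they compute the same thing. This sketch only checks that the results match; it says nothing about speed:

```python
import pandas as pd

data = [1024, 3, 5, 7, 9, 10, 13, 115, 127, 149, 221]

# List approach: take each element, add 1, append to a new list
looped = []
for x in data:
    looped.append(x + 1)

# Vectorized approach: one expression over the whole Series
vectorized = pd.Series(data) + 1

print(looped == vectorized.tolist())
```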

    %%timeit
    # Time building an 11-element Series and adding 1 to it
    class2_values = [1024, 3, 5, 7, 9, 10, 13, 115, 127, 149, 221]
    class2 = pd.Series(class2_values)
    class2 + 1
    # 198 μs ± 9.37 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    # Variables assigned inside %%timeit are not kept, so define class2 in a normal cell first
    class2 = pd.Series([1024, 3, 5, 7, 9, 10, 13, 115, 127, 149, 221])

    %%timeit
    class2 + 1
    # 100 μs ± 3.56 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    %%timeit
    for i in range(100000):
        i += 1
    # 4.12 ms ± 108 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    %%timeit
    a = pd.Series(range(100000))
    a + 1
    # 562 μs ± 72 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

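For the 11-element Series, the roughly 100 µs per operation is likely dominated by pandas' fixed per-call overhead, so a dataset this small cannot show the library's advantage, and larger sizes are worth trying. At 100,000 elements the difference is clear: building the Series and adding 1 takes about 0.56 ms, roughly 7 times faster than the plain Python loop at about 4.12 ms.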
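For very large element-wise workloads, a GPU is sometimes used instead of the CPU (pandas itself runs on the CPU). A GPU suits huge numbers of simple, independent calculations: if the CPU is a mathematician, the GPU is a crowd of elementary-school students, each weaker but together faster at plain addition, subtraction, multiplication, and division.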
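Trying several sizes, as suggested above, is easier in a plain script. A sketch using the standard-library `timeit` module; the sizes, the repeat count, and the helper name `compare` are arbitrary choices for illustration. Unlike the last %%timeit cell, the Series is built outside the timed code, and the loop side is a list comprehension so both sides produce a result. Timings depend on the machine, so none are shown:

```python
import timeit

import pandas as pd


def compare(n: int, number: int = 20) -> tuple[float, float]:
    """Return average seconds per run for a list comprehension and for Series + 1."""
    data = list(range(n))
    series = pd.Series(data)  # built once, outside the timed code

    loop_time = timeit.timeit(lambda: [x + 1 for x in data], number=number) / number
    vec_time = timeit.timeit(lambda: series + 1, number=number) / number
    return loop_time, vec_time


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        loop_t, vec_t = compare(n)
        print(f"n={n:>7}: loop {loop_t * 1e6:9.1f} µs, Series {vec_t * 1e6:9.1f} µs")
```

On small inputs the fixed overhead of a pandas call tends to dominate; as `n` grows, the per-element cost takes over.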

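    class2 = pd.Series([1024, 3, 5, 7, 9, 10, 13, 115, 127, 149, 221])

    # Besides + and -, Series also support *, /, % (modulo) and // (floor division)
    print(class2 // 2)

    # Mean, via the Series method or the NumPy function
    print(class2.mean())
    print(np.mean(class2))
    class2
    # 153.0
    # 153.0
    # 0     1024
    # 1        3
    # 2        5
    # 3        7
    # 4        9
    # 5       10
    # 6       13
    # 7      115
    # 8      127
    # 9      149
    # 10     221
    # dtype: int64

    class3 = pd.Series([1024, 13, 5, 7, 9, 10, 1, 115, 127, 149, 221])
    # Median, via the NumPy function or the Series method
    print(np.median(class3))
    print(class3.median())
    # 13.0
    # 13.0
    # With an even number of values, the median is the mean of the two middle values

    # Variance (sample variance: pandas divides by n - 1 by default)
    class2.var()
    # 89190.6

    # Standard deviation
    class2.std()
    # 298.6479532827908

    print(class2)
    print("-" * 50)
    print(class2 + 1)
    print("-" * 50)
    # `in` checks the index labels (like the keys of a dict), not the values
    print(10 in class2)
    print("-" * 50)
    # `+` binds tighter than `in`, so this is 5 in (class2 + 1): again an index check
    print(5 in class2 + 1)
    # (class2, as above)
    # --------------------------------------------------
    # 0     1025
    # 1        4
    # 2        6
    # 3        8
    # 4       10
    # 5       11
    # 6       14
    # 7      116
    # 8      128
    # 9      150
    # 10     222
    # dtype: int64
    # --------------------------------------------------
    # True
    # --------------------------------------------------
    # True

    # To test the values instead, check against .values
    print(4 in class2)          # True: 4 is an index label
    print(4 in class2.values)   # False: no value equals 4

    # Modifying values; assigning to a label that does not exist yet appends it
    class2['ming'] = 0
    class2['hua'] = 0
    class2['hong'] = 0
    class2[['hua', 'hong']] = 55         # several labels, one value
    class2[['hua', 'hong']] = [35, 55]   # several labels, one value each
    # The author also wrote class2['hua', 'hong'] = [1, 2] with one pair of brackets;
    # recent pandas versions may reject that tuple key on a plain index, so use a list:
    class2[['hua', 'hong']] = [1, 2]
    class2
    # 0       1024
    # ...     (labels 1 to 10 unchanged)
    # ming       0
    # hua        1
    # hong       2
    # dtype: int64

    # copy() makes a deep copy by default, so the two Series are independent.
    # (class4 + 1 creates a new Series anyway; the copy matters for in-place edits
    # such as class4['ming'] = 100, which then leave class2 untouched.)
    class4 = class2.copy()
    class4 = class4 + 1
    print(class2)
    class4
    # (class2 unchanged, same 14 rows as above)
    # 0       1025
    # 1          4
    # 2          6
    # 3          8
    # 4         10
    # 5         11
    # 6         14
    # 7        116
    # 8        128
    # 9        150
    # 10       222
    # ming       1
    # hua        2
    # hong       3
    # dtype: int64

    # The index can also be replaced on its own; labels need not be unique (24 appears twice)
    class2.index = [22, 23, 24, 28, 24, 29, 1, 2, 3, 4, 8, 5, 9, 21]
    class2
    # 22    1024
    # 23       3
    # 24       5
    # 28       7
    # 24       9
    # 29      10
    # 1       13
    # 2      115
    # 3      127
    # 4      149
    # 8      221
    # 5        0
    # 9        1
    # 21       2
    # dtype: int64

    # In the author's setup, a CSV path containing Chinese characters failed to load
    df = pd.read_csv("./source/test.csv")
    df
    #    ro  c1  c2  c3  c4  c5  c6  c7  c8  c9  c10  c11  c12  c13  c14  c15  c16  c17  c18
    # 0   a   0   5  10  10  10  10  10  10  10   10   10   10   10   10   10   10   10   10
    # 1   b   1   6  11  11  11  11  11  11  11   11   11   11   11   11   11   11   11   11
    # 2   c   2   7  12  12  12  12  12  12  12   12   12   12   12   12   12   12   12   12
    # 3   d   3   8  13  13  13  13  13  13  13   13   13   13   13   13   13   13   13   13
    # 4   e   4   9  14  14  14  14  14  14  14   14   14   14   14   14   14   14   14   14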
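A few of the behaviors in the code above can be checked directly. This sketch recomputes the variance by hand to show that `Series.var()` uses the sample formula (dividing by n - 1, i.e. `ddof=1`), while `np.var` on a plain array divides by n by default. It also shows the even-length median case, and that `in` looks at the index while `isin()` compares each value:

```python
import numpy as np
import pandas as pd

class2 = pd.Series([1024, 3, 5, 7, 9, 10, 13, 115, 127, 149, 221])
n = len(class2)
mean = class2.sum() / n                 # 1683 / 11
squared = ((class2 - mean) ** 2).sum()  # sum of squared deviations

sample_var = squared / (n - 1)          # what Series.var() computes (ddof=1)
population_var = squared / n            # what np.var() computes by default (ddof=0)

print(class2.var(), sample_var)
print(np.var(class2.values), population_var)
print(class2.std() ** 2)                # std is the square root of var

# Even number of values: the median is the mean of the two middle values
print(pd.Series([1, 3, 5, 7]).median())

# Membership: `in` checks index labels; isin() compares each value
print(4 in class2, 4 in class2.values)
print(class2.isin([5, 7]).sum())        # how many values equal 5 or 7
```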

Values in a CSV file are separated by commas (source: the article "python:pandas - read_csv method").
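Since a CSV file is just comma-separated text, `read_csv` can also read from an in-memory text buffer, which makes it easy to experiment without a file on disk. A sketch using `io.StringIO` with a small hypothetical table shaped like the first three columns of test.csv; the `sep` parameter handles files that use a different delimiter:

```python
import io

import pandas as pd

# Hypothetical CSV text mimicking the first three columns of test.csv
csv_text = """ro,c1,c2
a,0,5
b,1,6
c,2,7
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)

# The same data with semicolons as the delimiter, read via the sep parameter
df_semicolon = pd.read_csv(io.StringIO(csv_text.replace(",", ";")), sep=";")
print(df.equals(df_semicolon))
```

If a path containing non-ASCII characters fails to load in your environment, one workaround is to open the file with Python's built-in `open()` and pass the resulting file object to `read_csv`.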
