
Python data analysis review: the Pandas library

Published: 2025/3/15 python 20 豆豆


Editor's note:

This article originally appeared on CSDN. It covers Pandas, Python's third-party library of high-performance, easy-to-use data structures and data analysis tools.

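1. Introduction to Pandas

1.1 What is pandas

Pandas is a third-party Python library that provides high-performance, easy-to-use data structures and data analysis tools.

Pandas is built on top of NumPy and is commonly used together with NumPy and Matplotlib.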

1.2 pandas vs numpy

2. The Series type in Pandas

2.1 Structure of a Series

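# One-dimensional: a single column of values, each paired with an index label (index + value). If index is omitted, labels are generated automatically, starting from 0.
>>> import numpy as np
>>> import pandas as pd
>>> pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
a 1
b 2
c 3
d 4
e 5
dtype: int64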

2.2 Creating a Series

A Series can be created from:

1. A Python list
2. A scalar value
3. A Python dict
4. An ndarray
5. Other functions, such as range()

# Scalar value
>>> pd.Series(5)
0 5
dtype: int64
# Scalar + index: the scalar is repeated once for every label, in the order given (nothing is reordered)
>>> pd.Series(5,index=['a','v','c','d','e'])
a 5
v 5
c 5
d 5
e 5
dtype: int64

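# Dict: the keys become the index. pandas 0.23+ on Python 3.6+ keeps the dict's insertion order; older versions sorted the keys
>>> pd.Series({'a':999,'v':888,'c':756,'d':7,'e':437})
a 999
v 888
c 756
d 7
e 437
dtype: int64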

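# Dict + index: index selects which keys to keep, in that order (a label missing from the dict would get NaN)
>>> pd.Series({'a':999,'v':888,'c':756,'d':7,'e':437},index=['a','v'])
a 999
v 888
dtype: int64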

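# From ndarrays (for both the data and the index). The integer dtype is platform dependent: typically int64, but int32 on Windows with NumPy < 2.0
>>> pd.Series(np.arange(5),index=np.arange(14,9,-1))
14 0
13 1
12 2
11 3
10 4
dtype: int64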

Summary of Series creation. A Series can be created from:

1. A Python list: index must have the same number of elements as the list
2. A scalar value: index determines the size of the Series
3. A Python dict: the keys become the index, and index selects entries from the dict
4. An ndarray: both the index and the data can be ndarrays
5. Other functions, such as range()
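The list and range() cases in the summary have no example in this section. A minimal sketch, assuming a recent pandas release; the variable names (s_list, s_labeled, s_range) are purely illustrative:

```python
import pandas as pd

# From a Python list: without index, labels default to 0, 1, 2, ...
s_list = pd.Series([10, 20, 30])

# With index, it must have the same length as the list
s_labeled = pd.Series([10, 20, 30], index=["x", "y", "z"])

# From range(): any iterable of values works the same way
s_range = pd.Series(range(3))

print(s_list.index.tolist())   # [0, 1, 2]
print(s_labeled["y"])          # 20
print(s_range.tolist())        # [0, 1, 2]

# A length mismatch between data and index is rejected with ValueError
length_error = None
try:
    pd.Series([10, 20, 30], index=["x", "y"])
except ValueError as err:
    length_error = str(err)
print("ValueError:", length_error)
```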

2.3 Basic Series operations

1. A Series has two parts: index and values
2. Series operations resemble those of ndarray
3. Series operations also resemble those of Python dicts

(1) Basic Series operations

# The two parts of a Series: index and values
>>> a=pd.Series({'a':1,'v':2,'c':3,'d':4,'e':5})
>>> a.index
Index(['a', 'v', 'c', 'd', 'e'], dtype='object')
>>> a.values
array([1, 2, 3, 4, 5])

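# Two indexes coexist: the custom labels and the automatic integer positions
>>> a[['a','v']]
a 1
v 2
dtype: int64
# Positions 0 and 1 reach the same elements. Use .iloc for positions; plain [] with integers on a label index is deprecated since pandas 2.1
>>> a.iloc[[0,1]]
a 1
v 2
dtype: int64
# The two cannot be mixed in one lookup: a list is read as labels, so 4 is looked up as a label. Older pandas returned NaN for it; pandas 1.0+ raises KeyError
>>> a[['a',4]]
Traceback (most recent call last):
  ...
KeyError: '[4] not in index'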

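(2) Series operations resemble those of ndarray:

Indexing uses [ ], as with ndarray

Slices can be taken with custom labels

Slices can be taken with automatic positions; if a custom index exists, it is sliced along with the values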

# Slicing with []: an integer slice is positional, and the labels come along
>>> a=pd.Series({'a':1,'v':2,'c':3,'d':4,'e':5})
>>> a[:3]
a 1
v 2
c 3
dtype: int64

# Compute a Boolean condition first, then index with it
>>> a[a>a.median()]
d 4
e 5
dtype: int64

# Exponential function with base e: NumPy ufuncs work element-wise and keep the index
>>> np.exp(a)
a 2.718282
v 7.389056
c 20.085537
d 54.598150
e 148.413159
dtype: float64
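The rules above mention slicing by custom labels, which the session does not show. A short sketch, assuming pandas 1.x/2.x: a label slice through .loc includes both endpoints, while a positional slice through .iloc excludes the stop, as lists do. Variable names are illustrative.

```python
import pandas as pd

a = pd.Series({"a": 1, "v": 2, "c": 3, "d": 4, "e": 5})

# Label-based slice: both endpoints are INCLUDED
by_label = a.loc["v":"d"]         # v, c, d

# Position-based slice: the stop position is EXCLUDED
by_position = a.iloc[1:3]         # v, c

# Boolean masks combine with & and | (each condition in parentheses)
in_range = a[(a > 1) & (a < 5)]   # v, c, d

print(by_label.index.tolist())     # ['v', 'c', 'd']
print(by_position.index.tolist())  # ['v', 'c']
print(in_range.tolist())           # [2, 3, 4]
```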

(3) Series operations resemble those of Python dicts:

Access by custom label

The in keyword

The .get() method

# The in keyword
>>> a=pd.Series({'a':1,'v':2,'c':3,'d':4,'e':5})
>>> 'a' in a
True
>>> 'v' in a
True
# in checks the index labels only, not the values
>>> 1 in a
False
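The .get() method listed above is not demonstrated in the session. A minimal sketch of the dict-like access patterns, assuming a current pandas release (the default value 100 is arbitrary):

```python
import pandas as pd

a = pd.Series({"a": 1, "v": 2, "c": 3, "d": 4, "e": 5})

# Access by custom label, like dict[key]
by_key = a["c"]

# .get() returns the value for a label, or the default when the label is absent
found = a.get("v")
missing = a.get("f", 100)

# `in` checks labels only; to test values, check .values explicitly
label_check = 1 in a
value_check = 1 in a.values

print(by_key, found, missing)     # 3 2 100
print(label_check, value_check)   # False True
```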

2.4 Series alignment

# In arithmetic, Series automatically aligns data by index label. A label present in only one operand is treated as missing, so the result is NaN there
>>> a=pd.Series({'a':1,'v':2,'c':3,'d':4,'e':5})
>>> b=pd.Series({'a':1,'b':2,'c':3,'d':4,'e':5})
>>> a+b
a 2.0
b NaN
c 6.0
d 8.0
e 10.0
v NaN
dtype: float64
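If the NaN from non-matching labels is not wanted, the method form of the operator can stand in a value for the missing side. A sketch using Series.add with its fill_value parameter; the fill value 0 is one reasonable choice for addition:

```python
import pandas as pd

a = pd.Series({"a": 1, "v": 2, "c": 3, "d": 4, "e": 5})
b = pd.Series({"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})

# Aligns exactly like a + b, but a label missing on ONE side is treated as 0
total = a.add(b, fill_value=0)
print(total.sort_index())
```

Labels missing from both operands would still come out as NaN; fill_value only covers one-sided gaps.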

2.5 The name attribute of Series

# A Series and its index can each have a name, stored in the .name attribute
>>> a=pd.Series({'a':1,'v':2,'c':3,'d':4,'e':5})
>>> a.name
>>> a.name="精忠跳水队"
>>> a.name
'精忠跳水队'
>>> a
a 1
v 2
c 3
d 4
e 5
Name: 精忠跳水队, dtype: int64
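The comment above says the index can carry a name too, but only the Series name is shown. A minimal sketch; the names "score" and "player" are hypothetical:

```python
import pandas as pd

a = pd.Series({"a": 1, "v": 2, "c": 3, "d": 4, "e": 5})
a.name = "score"          # name of the Series (its values)
a.index.name = "player"   # name of the index

print(a.name, a.index.name)   # score player

# When the Series becomes a DataFrame, both names turn into column labels
frame = a.reset_index()
print(frame.columns.tolist())  # ['player', 'score']
```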

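2.6 Series summary

A Series is a one-dimensional labeled array: each index label maps to a value

index_0 → data_a

Basic Series operations resemble those of ndarray and dict, and operations align on the index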

3. The DataFrame type in Pandas

3.1 DataFrame structure

# A DataFrame is a tabular data type; each column can hold a different value type
# A DataFrame has both a row index and a column index
>>> df = pd.DataFrame(np.random.randint(1,10,(4,5)))   # values are random, so yours will differ
>>> df
0 1 2 3 4
0 8 5 4 1 1
1 3 4 2 7 3
2 4 3 8 9 9
3 7 8 9 1 7

3.2 Creating a DataFrame

A DataFrame can be created from:

A 2-D ndarray

A dict of 1-D ndarrays, lists, dicts, tuples, or Series

A Series

Another DataFrame
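Only the dict-based constructors are demonstrated in this section. A sketch of the other three sources in the list (2-D ndarray, Series, another DataFrame), assuming NumPy and a recent pandas; all labels are made up for illustration:

```python
import numpy as np
import pandas as pd

# From a 2-D ndarray, with explicit row and column labels
arr = np.arange(6).reshape(2, 3)
df_arr = pd.DataFrame(arr, index=["r1", "r2"], columns=["x", "y", "z"])

# From a single Series: it becomes one column named after the Series
s = pd.Series([1, 2], index=["r1", "r2"], name="w")
df_series = pd.DataFrame(s)

# From another DataFrame: passing columns selects (and reorders) a subset
df_copy = pd.DataFrame(df_arr, columns=["z", "x"])

print(df_arr.shape)                 # (2, 3)
print(df_series.columns.tolist())   # ['w']
print(df_copy.columns.tolist())     # ['z', 'x']
```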

# From a dict of Series: rows are aligned on the union of all labels, and missing values are filled with NaN
>>> df=pd.DataFrame({'one':pd.Series([1,2,3],index=['a','v','c']),
... 'two':pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])})
>>> df
one two
a 1.0 1.0
b NaN 2.0
c 3.0 3.0
d NaN 4.0
e NaN 5.0
v 2.0 NaN

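# From a dict of lists with one shared index: every list must have the same length as index. Columns follow the dict's key order (pandas 0.23+ on Python 3.6+; older versions sorted them alphabetically)
>>> df=pd.DataFrame({'one':[1,2,3],'three':[3,2,3],'two':[2,2,3]},index=['a','b','c'])
>>> df
one three two
a 1 3 2
b 2 2 2
c 3 3 3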

# Indexing works like Series: df['column']['row']
>>> df['one']['a']      # value: 1
>>> df['three']['c']    # value: 3
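Chained lookups such as df['one']['a'] pick the column first, then the row. pandas also offers .loc (labels) and .iloc (integer positions), which take [row, column] in a single step. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "three": [3, 2, 3], "two": [2, 2, 3]},
                  index=["a", "b", "c"])

# .loc takes labels in [row, column] order: one lookup, no chaining
v1 = df.loc["a", "one"]       # 1
# .iloc takes integer positions: row 2 is 'c', column 1 is 'three'
v2 = df.iloc[2, 1]            # 3

# Whole rows, and label slices (both endpoints included)
row_b = df.loc["b"]                       # Series indexed by the column labels
block = df.loc["a":"b", ["one", "two"]]   # 2x2 sub-frame

print(v1, v2)
print(block)
```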

3.3 Operations on pandas data types: reindexing

# Change the row order from a, b, c to c, a, b
>>> df.reindex(['c','a','b'])
one three two
c 3 3 3
a 1 3 2
b 2 2 2

# Reorder the columns; listing 'two' twice repeats that column
>>> df.reindex(columns=['three','two','one','two'])
three two one two
a 3 2 1 2
b 2 2 2 2
c 3 3 3 3

# reindex returns a new object; the original data is unchanged
>>> df
one three two
a 1 3 2
b 2 2 2
c 3 3 3

# Insert a new label into the column Index (insert returns a new Index)
>>> newc=df.columns.insert(3,'新增')
>>> newc
Index(['one', 'three', 'two', '新增'], dtype='object')

# Reindex with the new columns; fill_value supplies the data for the new column
>>> newd=df.reindex(columns=newc,fill_value=99)
>>> newd
one three two 新增
a 1 3 2 99
b 2 2 2 99
c 3 3 3 99

3.4 Operations on pandas data types: Index objects

>>> df
one three two
a 1 3 2
b 2 2 2
c 3 3 3

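# Index objects support delete and insert; both return a new Index
>>> nc=df.columns.delete(1)            # drops position 1 ('three'): Index(['one', 'two'])
>>> ni=df.index.insert(3,'new_index')  # Index(['a', 'b', 'c', 'new_index'])
# Without filling: the new row gets NaN, so the columns become float
>>> df.reindex(columns=nc,index=ni)
one two
a 1.0 2.0
b 2.0 2.0
c 3.0 3.0
new_index NaN NaN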

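# With forward filling: new_index copies the previous row (c). method= requires the labels involved to be monotonic (sorted)
>>> df.reindex(columns=nc,index=ni,method='ffill')
one two
a 1 2
b 2 2
c 3 3
new_index 3 3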

# Dropping rows and columns
# Rows are dropped by default (axis=0)
>>> df.drop('b')
one three two
a 1 3 2
c 3 3 3
# axis=1 means columns
>>> df.drop('three',axis=1)
one two
a 1 2
b 2 2
c 3 3
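Since pandas 0.21, drop also accepts index= and columns= keywords, which name the axis explicitly instead of relying on axis=1. A small sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "three": [3, 2, 3], "two": [2, 2, 3]},
                  index=["a", "b", "c"])

# index= / columns= say which axis is meant
no_b = df.drop(index="b")
no_three = df.drop(columns="three")
# Several labels on both axes in one call
small = df.drop(index=["a", "c"], columns=["one", "two"])

# drop returns a new object; df itself keeps its shape
print(df.shape, no_b.shape, no_three.shape, small.shape)
```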

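3.5 Operations on pandas data types: arithmetic

Rules for arithmetic:

Arithmetic first aligns the operands on their row and column labels, then computes; where alignment introduces missing values, integer results are upcast to floating point

Entries missing after alignment are filled with NaN (a missing value)

Operations between 2-D and 1-D, and between 1-D and 0-D (a scalar), are broadcast

Binary operations written with + - * / produce a new object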

(1) Binary operations with + - * /:

# With operators there is no way to fill in missing values
>>> df1=pd.DataFrame({'one':[1,2,3],'two':[4,5,6]},index=['a','b','c'])
>>> df2=pd.DataFrame({'one':[1,2,3]},index=['a','b','c'])
>>> df1+df2
one two
a 2 NaN
b 4 NaN
c 6 NaN

(2) Binary operations with methods:

>>> df1=pd.DataFrame({'one':[1,2,3],'two':[4,5,6]},index=['a','b','c'])
>>> df2=pd.DataFrame({'one':[1,2,3]},index=['a','b','c'])
# Methods take an optional fill_value for missing entries: here the 'two' column missing from df2 counts as 0
>>> df1.add(df2,fill_value=0)
one two
a 2 4.0
b 4 5.0
c 6 6.0
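Each operator has a corresponding method (add, sub, mul, div) that takes the same fill_value argument. A sketch on df1 and df2 from above; the fill values chosen (0 for subtraction, 1 for multiplication) are one reasonable convention, not a rule:

```python
import pandas as pd

df1 = pd.DataFrame({"one": [1, 2, 3], "two": [4, 5, 6]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"one": [1, 2, 3]}, index=["a", "b", "c"])

# sub: the 'two' column missing from df2 is treated as 0
diff = df1.sub(df2, fill_value=0)
# mul: 1 is the neutral value for multiplication
prod = df1.mul(df2, fill_value=1)

print(diff)
print(prod)
```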

# How the operation works
# Only matching labels on matching axes are computed; a scalar is broadcast to every element. Positions with no match become NaN
>>> df=pd.DataFrame({'one':[1,2,3],'three':[3,2,3],'two':[2,2,3]},index=['a','b','c'])
>>> df3=pd.DataFrame({'one':[1,2,3]},index=['a','b','c'])
>>> df4=pd.DataFrame({'two':[2,2,3]},index=['a','b','c'])

# Scalar
>>> df3-1
one
a 0
b 1
c 2
>>> df-1
one three two
a 0 2 1
b 1 1 1
c 2 2 2

# Matching columns only
>>> df3-df
one three two
a 0 NaN NaN
b 0 NaN NaN
c 0 NaN NaN
>>> df-df4
one three two
a NaN NaN 0
b NaN NaN 0
c NaN NaN 0
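The rules at the start of this section also mention 2-D/1-D broadcasting, which the examples above do not show. A sketch, assuming recent pandas: with an operator, a Series is matched against the column labels; the method form with axis=0 matches it against the row labels instead. The Series values are arbitrary:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "three": [3, 2, 3], "two": [2, 2, 3]},
                  index=["a", "b", "c"])

# 2-D minus 1-D: the Series labels are matched to the COLUMNS
# and the values are broadcast down every row
row = pd.Series({"one": 1, "three": 1, "two": 1})
by_columns = df - row

# Match the Series against the ROW labels instead: method form with axis=0
col = pd.Series({"a": 10, "b": 20, "c": 30})
by_rows = df.sub(col, axis=0)

print(by_columns)
print(by_rows)
```

Without axis=0, df - col would try to match a, b, c against the column labels one, three, two, find no overlap, and return a frame full of NaN.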

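3.6 Operations on pandas data types: comparison

(1) Rules

Comparison operators only compare elements with identical labels and do no alignment; comparing two DataFrames whose labels differ raises ValueError

Operations between 2-D and 1-D, and between 1-D and 0-D, are broadcast

Binary operations with >, <, >=, <=, ==, != produce a Boolean object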

>>> dfx=pd.DataFrame({'one':[1,1,1],'three':[3,3,3],'two':[2,2,2]},index=['a','b','c'])
>>> dfx
one three two
a 1 3 2
b 1 3 2
c 1 3 2
>>> df
one three two
a 1 3 2
b 2 2 2
c 3 3 3
>>> df>dfx
one three two
a False False False
b True False False
c True False True
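A sketch of the two remaining rules above: a scalar is broadcast in a comparison, and DataFrames with different labels are rejected with ValueError instead of being aligned (behaviour of current pandas; the frame named other is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "three": [3, 2, 3], "two": [2, 2, 3]},
                  index=["a", "b", "c"])

# 2-D vs 0-D: the scalar is broadcast to every element
gt2 = df > 2

# Frames with different labels are not aligned for comparison
other = pd.DataFrame({"one": [1, 2]}, index=["a", "b"])
mismatch_error = None
try:
    df > other
except ValueError as err:
    mismatch_error = str(err)

print(gt2)
print("ValueError:", mismatch_error)
```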

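4. Summary of Pandas data types

1. Data are tied to their index: operating on the index means operating on the data

2. Series = index + one-dimensional data

3. DataFrame = row and column indexes + two-dimensional data

4. Operations covered: reindexing, dropping data, arithmetic, and comparison

5. Treat Series and DataFrame objects as single units, the way you would treat a single value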
