The Python Data Analysis Trio: Pandas (Part 2): The Index Object and Indexing Operations

Published: 2023/12/10

Articles in the Pandas series:

The Python Data Analysis Trio: Pandas (Part 1): Getting to Know Pandas and Its Series and DataFrame Objects
The Python Data Analysis Trio: Pandas (Part 2): The Index Object and Indexing Operations
The Python Data Analysis Trio: Pandas (Part 3): Arithmetic Operations and Handling Missing Values
The Python Data Analysis Trio: Pandas (Part 4): Function Application, Mapping, Sorting, and Hierarchical Indexing
The Python Data Analysis Trio: Pandas (Part 5): Statistical Computation and Descriptive Statistics
The Python Data Analysis Trio: Pandas (Part 6): GroupBy: Split, Apply, Combine
The Python Data Analysis Trio: Pandas (Part 7): Merging Datasets
The Python Data Analysis Trio: Pandas (Part 8): Reshaping Data, Handling Duplicates, and Replacing Values
The Python Data Analysis Trio: Pandas (Part 9): Time Series
The Python Data Analysis Trio: Pandas (Part 10): Reading and Writing Data

The NumPy and Matplotlib series are also complete:

NumPy series: https://itrhx.blog.csdn.net/category_9780393.html
Matplotlib series: https://itrhx.blog.csdn.net/category_9780418.html

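Recommended learning resources and sites (the author contributed to some of the documentation translations):

NumPy official Chinese site: https://www.numpy.org.cn/
Pandas official Chinese site: https://www.pypandas.cn/
Matplotlib official Chinese site: https://www.matplotlib.org.cn/
NumPy, Matplotlib, and Pandas quick-reference tables: https://github.com/TRHX/Python-quick-reference-table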

Contents

【1】The Index Object
【2】Basic Indexing in Pandas
- 【2.1】Series Indexing
  - 【2.1.1】head() / tail()
  - 【2.1.2】Row Indexing
  - 【2.1.3】Slice Indexing
  - 【2.1.4】Fancy Indexing
  - 【2.1.5】Boolean Indexing
- 【2.2】DataFrame Indexing
  - 【2.2.1】head() / tail()
  - 【2.2.2】Column Indexing
  - 【2.2.3】Slice Indexing
  - 【2.2.4】Fancy Indexing
  - 【2.2.5】Boolean Indexing
【3】Indexers: loc and iloc
- 【3.1】loc: Label-Based Indexing
  - 【3.1.1】Series.loc
  - 【3.1.2】DataFrame.loc
- 【3.2】iloc: Position-Based Indexing
  - 【3.2.1】Series.iloc
  - 【3.2.2】DataFrame.iloc
【4】Reindexing in Pandas

This article was first published on CSDN by TRHX. Blog home: https://itrhx.blog.csdn.net/ Article link: https://itrhx.blog.csdn.net/article/details/106698307 Reproduction without permission is prohibited.

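【1】The Index Object

The index of both Series and DataFrame is an Index object. To keep the data safe, an Index is immutable: assigning to one of its elements raises a TypeError. Common Index types include the generic index (Index), the integer index (Int64Index), the hierarchical index (MultiIndex), and the timestamp index (DatetimeIndex). Note that Int64Index was removed in pandas 2.0, where integer labels are held by a plain Index with an integer dtype; the outputs in this article come from an earlier release.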

The following code shows an Index object and its immutability:

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj.index
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> type(obj.index)
<class 'pandas.core.indexes.base.Index'>
>>> obj.index[0] = 'e'
Traceback (most recent call last):
  File "<pyshell#28>", line 1, in <module>
    obj.index[0] = 'e'
  File "C:\Users\...\base.py", line 3909, in __setitem__
    raise TypeError("Index does not support mutable operations")
TypeError: Index does not support mutable operations
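Since single labels cannot be assigned, relabelling is usually done by building a new object or by replacing the whole index. The following is a minimal sketch of both options, assuming only that pandas is installed; the names `err`, `renamed`, and `relabelled` are illustrative and do not come from the original article.

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

# Item assignment on an Index raises TypeError because Index is immutable.
try:
    obj.index[0] = 'e'
    err = None
except TypeError as exc:
    err = exc

# Option 1: rename() returns a new Series with the mapped labels replaced.
renamed = obj.rename({'a': 'e'})

# Option 2: assign a complete new sequence of labels to the .index attribute.
relabelled = obj.copy()
relabelled.index = ['w', 'x', 'y', 'z']

print(type(err).__name__)
print(list(renamed.index))
print(list(relabelled.index))
```

In both cases the original `obj` keeps its labels; only the new objects carry the changed index.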

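Common Index attributes

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.Index.html

Attribute	Description

T	Transpose
array	The index as an array; see the official documentation for details
dtype	The dtype object of the underlying data
hasnans	Whether the index contains NaN (missing values)
inferred_type	A string describing the type inferred from the index values
is_monotonic	Whether the index is monotonically increasing (alias of is_monotonic_increasing; deprecated in pandas 1.5 and removed in 2.0)
is_monotonic_decreasing	Whether the index is monotonically decreasing
is_monotonic_increasing	Whether the index is monotonically increasing
is_unique	Whether the index has no duplicate values
nbytes	Number of bytes in the index
ndim	Number of dimensions of the index
nlevels	Number of levels
shape	A tuple giving the shape of the index
size	Number of elements in the index
values	The values of the index as an array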

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj.index
Index(['a', 'b', 'c', 'd'], dtype='object')
>>>
>>> obj.index.array
<PandasArray>
['a', 'b', 'c', 'd']
Length: 4, dtype: object
>>>
>>> obj.index.dtype
dtype('O')
>>>
>>> obj.index.hasnans
False
>>>
>>> obj.index.inferred_type
'string'
>>>
>>> obj.index.is_monotonic
True
>>>
>>> obj.index.is_monotonic_decreasing
False
>>>
>>> obj.index.is_monotonic_increasing
True
>>>
>>> obj.index.is_unique
True
>>>
>>> obj.index.nbytes
16
>>>
>>> obj.index.ndim
1
>>>
>>> obj.index.nlevels
1
>>>
>>> obj.index.shape
(4,)
>>>
>>> obj.index.size
4
>>>
>>> obj.index.values
array(['a', 'b', 'c', 'd'], dtype=object)
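The attributes are more informative on an index that is not clean. As a small sketch, assuming a made-up float index `idx` that contains a missing value, a duplicate, and unsorted labels, the checks below use only attributes that exist in both older and current pandas releases:

```python
import numpy as np
import pandas as pd

# An index with a missing value, a duplicated label, and unsorted values.
idx = pd.Index([3.0, 1.0, np.nan, 1.0])

summary = {
    'hasnans': idx.hasnans,                                # NaN is present
    'is_unique': idx.is_unique,                            # 1.0 appears twice
    'is_monotonic_increasing': idx.is_monotonic_increasing,
    'shape': idx.shape,
    'size': idx.size,
    'ndim': idx.ndim,
}
for name, value in summary.items():
    print(name, value)
```

Checks like these are a cheap way to validate an index before relying on label-based slicing or alignment.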

Common Index methods

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.Index.html

Method	Description

all(self, *args, **kwargs)	Whether all elements are truthy; any 0 makes the result False
any(self, *args, **kwargs)	Whether at least one element is truthy; the result is False only if every element is 0
append(self, other)	Concatenate another index and return a new index
argmax(self[, axis, skipna])	Position of the largest value in the index
argmin(self[, axis, skipna])	Position of the smallest value in the index
argsort(self, *args, **kwargs)	Positions in the original index that would sort it in ascending order
delete(self, loc)	Remove the element at the given position and return a new index
difference(self, other[, sort])	Elements of the first index that are not in the second, i.e. the set difference
drop(self, labels[, errors])	Return a new index with the given labels removed
drop_duplicates(self[, keep])	Remove duplicates. keep accepts: 'first', keep the first occurrence; 'last', keep the last occurrence; False, drop every duplicated value
duplicated(self[, keep])	Flag duplicates. keep accepts: 'first', the first occurrence is False and later ones are True; 'last', the last occurrence is False and earlier ones are True; False, every duplicated value is True
dropna(self[, how])	Remove missing values (NaN)
fillna(self[, value, downcast])	Fill missing values (NaN) with the given value
equals(self, other)	Whether two indexes are equal
insert(self, loc, item)	Insert an element at the given position and return a new index
intersection(self, other[, sort])	Intersection of two indexes
isna(self)	Detect which elements are missing (NaN)
isnull(self)	Alias of isna
max(self[, axis, skipna])	Largest value in the index
min(self[, axis, skipna])	Smallest value in the index
union(self, other[, sort])	Union of two indexes
unique(self[, level])	Unique values of the index, i.e. with duplicates removed

all(self, *args, **kwargs)

>>> import pandas as pd
>>> pd.Index([1, 2, 3]).all()
True
>>>
>>> pd.Index([0, 1, 2]).all()
False

any(self, *args, **kwargs)

>>> import pandas as pd
>>> pd.Index([0, 0, 1]).any()
True
>>>
>>> pd.Index([0, 0, 0]).any()
False

append(self, other)

>>> import pandas as pd
>>> pd.Index(['a', 'b', 'c']).append(pd.Index([1, 2, 3]))
Index(['a', 'b', 'c', 1, 2, 3], dtype='object')

argmax(self[, axis, skipna])

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).argmax()
3

argmin(self[, axis, skipna])

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).argmin()
4

argsort(self, *args, **kwargs)

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).argsort()
array([4, 1, 2, 0, 3], dtype=int32)

delete(self, loc)

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).delete(0)
Int64Index([2, 3, 9, 1], dtype='int64')

difference(self, other[, sort])

>>> import pandas as pd
>>> idx1 = pd.Index([2, 1, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.difference(idx2)
Int64Index([1, 2], dtype='int64')
>>> idx1.difference(idx2, sort=False)
Int64Index([2, 1], dtype='int64')

drop(self, labels[, errors])

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).drop([2, 1])
Int64Index([5, 3, 9], dtype='int64')

drop_duplicates(self[, keep])

>>> import pandas as pd
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'])
>>> idx.drop_duplicates(keep='first')
Index(['lama', 'cow', 'beetle', 'hippo'], dtype='object')
>>> idx.drop_duplicates(keep='last')
Index(['cow', 'beetle', 'lama', 'hippo'], dtype='object')
>>> idx.drop_duplicates(keep=False)
Index(['cow', 'beetle', 'hippo'], dtype='object')

duplicated(self[, keep])

>>> import pandas as pd
>>> idx = pd.Index(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> idx.duplicated()
array([False, False, True, False, True])
>>> idx.duplicated(keep='first')
array([False, False, True, False, True])
>>> idx.duplicated(keep='last')
array([ True, False, True, False, False])
>>> idx.duplicated(keep=False)
array([ True, False, True, False, True])

dropna(self[, how])

>>> import numpy as np
>>> import pandas as pd
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).dropna()
Float64Index([2.0, 5.0, 6.0], dtype='float64')

fillna(self[, value, downcast])

>>> import numpy as np
>>> import pandas as pd
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).fillna(5)
Float64Index([2.0, 5.0, 5.0, 6.0, 5.0, 5.0], dtype='float64')

equals(self, other)

>>> import pandas as pd
>>> idx1 = pd.Index([5, 2, 3, 9, 1])
>>> idx2 = pd.Index([5, 2, 3, 9, 1])
>>> idx1.equals(idx2)
True
>>>
>>> idx1 = pd.Index([5, 2, 3, 9, 1])
>>> idx2 = pd.Index([5, 2, 4, 9, 1])
>>> idx1.equals(idx2)
False

intersection(self, other[, sort])

>>> import pandas as pd
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.intersection(idx2)
Int64Index([3, 4], dtype='int64')

insert(self, loc, item)

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).insert(2, 'A')
Index([5, 2, 'A', 3, 9, 1], dtype='object')

isna(self), isnull(self)

>>> import numpy as np
>>> import pandas as pd
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).isna()
array([False, False, True, False, True, True])
>>> pd.Index([2, 5, np.nan, 6, np.nan, np.nan]).isnull()
array([False, False, True, False, True, True])

max(self[, axis, skipna]), min(self[, axis, skipna])

>>> import pandas as pd
>>> pd.Index([5, 2, 3, 9, 1]).max()
9
>>> pd.Index([5, 2, 3, 9, 1]).min()
1

union(self, other[, sort])

>>> import pandas as pd
>>> idx1 = pd.Index([1, 2, 3, 4])
>>> idx2 = pd.Index([3, 4, 5, 6])
>>> idx1.union(idx2)
Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')

unique(self[, level])

>>> import pandas as pd
>>> pd.Index([5, 1, 3, 5, 1]).unique()
Int64Index([5, 1, 3], dtype='int64')
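One practical use of the set-style methods is aligning two objects on their shared labels. The sketch below is illustrative only: the Series `s1` and `s2` and their values are made up for the example, and every method call returns a new Index rather than modifying `left` or `right`.

```python
import pandas as pd

left = pd.Index(['a', 'b', 'c', 'd'])
right = pd.Index(['c', 'd', 'e'])

common = left.intersection(right)    # labels present in both
only_left = left.difference(right)   # labels present only in left
combined = left.union(right)         # every label from either index

# Restrict two Series to the shared labels before adding them.
s1 = pd.Series([1, 2, 3, 4], index=left)
s2 = pd.Series([10, 20, 30], index=right)
total = s1.loc[common] + s2.loc[common]

print(list(common), list(only_left), list(combined))
print(total.to_dict())
```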

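【2】Basic Indexing in Pandas

Pandas also offers more advanced indexing operations such as reindexing and hierarchical indexing, so ordinary slicing, fancy indexing, and boolean indexing are grouped here as basic indexing.

【2.1】Series Indexing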

【2.1.1】head() / tail()

Series.head() and Series.tail() return the first five and last five rows of a Series. Passing an integer n to head() / tail() returns the first or last n rows instead:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.Series(np.random.randn(8))
>>> obj
0   -0.643437
1   -0.365652
2   -0.966554
3   -0.036127
4    1.046095
5   -2.048362
6   -1.865551
7    1.344728
dtype: float64
>>>
>>> obj.head()
0   -0.643437
1   -0.365652
2   -0.966554
3   -0.036127
4    1.046095
dtype: float64
>>>
>>> obj.head(3)
0   -0.643437
1   -0.365652
2   -0.966554
dtype: float64
>>>
>>> obj.tail()
3   -0.036127
4    1.046095
5   -2.048362
6   -1.865551
7    1.344728
dtype: float64
>>>
>>> obj.tail(3)
5   -2.048362
6   -1.865551
7    1.344728
dtype: float64

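【2.1.2】Row Indexing

A Series can be indexed by position or by index label, and values can also be retrieved with Python dict-style expressions and methods. Note that integer positional access through [] on a label-indexed Series, as in obj[2] below, is deprecated in recent pandas releases; obj.iloc[2] is the explicit equivalent.

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>> obj['c']
-8
>>> obj[2]
-8
>>> 'b' in obj
True
>>> obj.keys()
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> list(obj.items())
[('a', 1), ('b', 5), ('c', -8), ('d', 2)]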
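To keep label access and positional access unambiguous, the same lookups can be written with the explicit indexers. This is a minimal sketch rather than the article's own code; the variable names are illustrative, and `Series.get` is used here as the dict-style lookup with a default.

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

by_label = obj.loc['c']        # label-based lookup
by_position = obj.iloc[2]      # position-based lookup
has_b = 'b' in obj             # membership tests the labels ...
has_5 = 5 in obj               # ... not the values, so this is False
fallback = obj.get('z', 0)     # dict-style get with a default for a missing label

print(by_label, by_position, has_b, has_5, fallback)
```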

【2.1.3】Slice Indexing

There are two ways to slice: by position and by index label. Note that a positional slice excludes the stop position, whereas a label slice includes the stop label.

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>>
>>> obj[1:3]
b    5
c   -8
dtype: int64
>>>
>>> obj[0:3:2]
a    1
c   -8
dtype: int64
>>>
>>> obj['b':'d']
b    5
c   -8
d    2
dtype: int64
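The difference in endpoint handling is easiest to see when both slices are chosen to return the same rows. A short sketch with the explicit indexers, using illustrative variable names:

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])

pos = obj.iloc[1:3]       # positions 1 and 2; the stop position 3 is excluded
lab = obj.loc['b':'c']    # labels 'b' through 'c'; the stop label is included

print(list(pos.index), list(lab.index))
print(pos.equals(lab))
```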

【2.1.4】Fancy Indexing

Fancy indexing selects elements that are spaced apart or non-contiguous: pass a list of index labels or positions to retrieve several elements at once.

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>>
>>> obj[[0, 2]]
a    1
c   -8
dtype: int64
>>>
>>> obj[['a', 'c', 'd']]
a    1
c   -8
d    2
dtype: int64

【2.1.5】Boolean Indexing

A boolean array can be used to index the target array: a boolean expression (for example, a comparison) selects the elements that satisfy the condition.

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2, -3], index=['a', 'b', 'c', 'd', 'e'])
>>> obj
a    1
b    5
c   -8
d    2
e   -3
dtype: int64
>>>
>>> obj[obj > 0]
a    1
b    5
d    2
dtype: int64
>>>
>>> obj > 0
a     True
b     True
c    False
d     True
e    False
dtype: bool
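Conditions can also be combined. As a hedged sketch with illustrative names: the element-wise operators `&`, `|`, and `~` are used instead of Python's `and`, `or`, and `not`, and each comparison needs its own parentheses because of operator precedence.

```python
import pandas as pd

obj = pd.Series([1, 5, -8, 2, -3], index=['a', 'b', 'c', 'd', 'e'])

between = obj[(obj > 0) & (obj < 5)]     # positive and below 5
outside = obj[(obj < 0) | (obj >= 5)]    # negative, or at least 5
not_positive = obj[~(obj > 0)]           # negation of a mask

print(between.to_dict())
print(outside.to_dict())
print(not_positive.to_dict())
```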

【2.2】DataFrame Indexing

【2.2.1】head() / tail()

As with Series, DataFrame.head() and DataFrame.tail() return the first five and last five rows of a DataFrame. Passing an integer n to head() / tail() returns the first or last n rows instead:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.random.randn(8,4), columns = ['a', 'b', 'c', 'd'])
>>> obj
          a         b         c         d
0 -1.399390  0.521596 -0.869613  0.506621
1 -0.748562 -0.364952  0.188399 -1.402566
2  1.378776 -1.476480  0.361635  0.451134
3 -0.206405 -1.188609  3.002599  0.563650
4  0.993289  1.133748  1.177549 -2.562286
5 -0.482157  1.069293  1.143983 -1.303079
6 -1.199154  0.220360  0.801838 -0.104533
7 -1.359816 -2.092035  2.003530 -0.151812
>>>
>>> obj.head()
          a         b         c         d
0 -1.399390  0.521596 -0.869613  0.506621
1 -0.748562 -0.364952  0.188399 -1.402566
2  1.378776 -1.476480  0.361635  0.451134
3 -0.206405 -1.188609  3.002599  0.563650
4  0.993289  1.133748  1.177549 -2.562286
>>>
>>> obj.head(3)
          a         b         c         d
0 -1.399390  0.521596 -0.869613  0.506621
1 -0.748562 -0.364952  0.188399 -1.402566
2  1.378776 -1.476480  0.361635  0.451134
>>>
>>> obj.tail()
          a         b         c         d
3 -0.206405 -1.188609  3.002599  0.563650
4  0.993289  1.133748  1.177549 -2.562286
5 -0.482157  1.069293  1.143983 -1.303079
6 -1.199154  0.220360  0.801838 -0.104533
7 -1.359816 -2.092035  2.003530 -0.151812
>>>
>>> obj.tail(3)
          a         b         c         d
5 -0.482157  1.069293  1.143983 -1.303079
6 -1.199154  0.220360  0.801838 -0.104533
7 -1.359816 -2.092035  2.003530 -0.151812

【2.2.2】Column Indexing

A DataFrame can be indexed by column label (columns). A single label returns a Series, while a one-element list returns a DataFrame:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.random.randn(7,2), columns = ['a', 'b'])
>>> obj
          a         b
0 -1.198795  0.928378
1 -2.878230  0.014650
2  2.267475  0.370952
3  0.639340 -1.301041
4 -1.953444  0.148934
5 -0.445225  0.459632
6  0.097109 -2.592833
>>>
>>> obj['a']
0   -1.198795
1   -2.878230
2    2.267475
3    0.639340
4   -1.953444
5   -0.445225
6    0.097109
Name: a, dtype: float64
>>>
>>> obj[['a']]
          a
0 -1.198795
1 -2.878230
2  2.267475
3  0.639340
4 -1.953444
5 -0.445225
6  0.097109
>>>
>>> type(obj['a'])
<class 'pandas.core.series.Series'>
>>> type(obj[['a']])
<class 'pandas.core.frame.DataFrame'>

【2.2.3】Slice Indexing

Slicing a DataFrame operates on rows. There are two ways to slice: by position and by index label. Note that a positional slice excludes the stop position, whereas a label slice includes the stop label.

>>> import pandas as pd
>>> import numpy as np
>>> data = np.random.randn(5,4)
>>> index = ['I1', 'I2', 'I3', 'I4', 'I5']
>>> columns = ['a', 'b', 'c', 'd']
>>> obj = pd.DataFrame(data, index, columns)
>>> obj
           a         b         c         d
I1  0.828676 -1.663337  1.753632  1.432487
I2  0.368138  0.222166  0.902764 -1.436186
I3  2.285615 -2.415175 -1.344456 -0.502214
I4  3.224288 -0.500268  1.293596 -1.235549
I5 -0.938833 -0.804433 -0.170047 -0.566766
>>>
>>> obj[0:3]
           a         b         c         d
I1  0.828676 -1.663337  1.753632  1.432487
I2  0.368138  0.222166  0.902764 -1.436186
I3  2.285615 -2.415175 -1.344456 -0.502214
>>>
>>> obj[0:4:2]
           a         b         c         d
I1  0.828676 -1.663337  1.753632  1.432487
I3  2.285615 -2.415175 -1.344456 -0.502214
>>>
>>> obj['I2':'I4']
           a         b         c         d
I2  0.368138  0.222166  0.902764 -1.436186
I3  2.285615 -2.415175 -1.344456 -0.502214
I4  3.224288 -0.500268  1.293596 -1.235549

【2.2.4】Fancy Indexing

As with Series, fancy indexing selects items that are spaced apart or non-contiguous: pass a list of column labels (columns) to retrieve several columns at once.

>>> import pandas as pd
>>> import numpy as np
>>> data = np.random.randn(5,4)
>>> index = ['I1', 'I2', 'I3', 'I4', 'I5']
>>> columns = ['a', 'b', 'c', 'd']
>>> obj = pd.DataFrame(data, index, columns)
>>> obj
           a         b         c         d
I1 -1.083223 -0.182874 -0.348460 -1.572120
I2 -0.205206 -0.251931  1.180131  0.847720
I3 -0.980379  0.325553 -0.847566 -0.882343
I4 -0.638228 -0.282882 -0.624997 -0.245980
I5 -0.229769  1.002930 -0.226715 -0.916591
>>>
>>> obj[['a', 'd']]
           a         d
I1 -1.083223 -1.572120
I2 -0.205206  0.847720
I3 -0.980379 -0.882343
I4 -0.638228 -0.245980
I5 -0.229769 -0.916591

【2.2.5】Boolean Indexing

Indexing a DataFrame with a boolean DataFrame of the same shape keeps the values where the condition is True and replaces the others with NaN:

>>> import pandas as pd
>>> import numpy as np
>>> data = np.random.randn(5,4)
>>> index = ['I1', 'I2', 'I3', 'I4', 'I5']
>>> columns = ['a', 'b', 'c', 'd']
>>> obj = pd.DataFrame(data, index, columns)
>>> obj
           a         b         c         d
I1 -0.602984 -0.135716  0.999689 -0.339786
I2  0.911130 -0.092485 -0.914074 -0.279588
I3  0.849606 -0.420055 -1.240389 -0.179297
I4  0.249986 -1.250668  0.329416 -1.105774
I5 -0.743816  0.430647 -0.058126 -0.337319
>>>
>>> obj[obj > 0]
           a         b         c   d
I1       NaN       NaN  0.999689 NaN
I2  0.911130       NaN       NaN NaN
I3  0.849606       NaN       NaN NaN
I4  0.249986       NaN  0.329416 NaN
I5       NaN  0.430647       NaN NaN
>>>
>>> obj > 0
        a      b      c      d
I1  False  False   True  False
I2   True  False  False  False
I3   True  False  False  False
I4   True  False   True  False
I5  False   True  False  False
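A frame-shaped mask and a column-based mask behave differently, which the random data above does not make obvious. The sketch below uses a small made-up DataFrame with fixed values so the results are reproducible; the names `masked`, `rows`, and `both` are illustrative.

```python
import pandas as pd

obj = pd.DataFrame(
    {'a': [1, -2, 3], 'b': [-4, 5, 6]},
    index=['I1', 'I2', 'I3'],
)

# A frame-shaped mask keeps the shape and puts NaN where the condition fails.
masked = obj[obj > 0]

# A boolean Series built from one column selects whole rows instead.
rows = obj[obj['a'] > 0]

# Row conditions combine with & and |, each wrapped in parentheses.
both = obj[(obj['a'] > 0) & (obj['b'] > 0)]

print(masked)
print(rows)
print(both)
```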

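【3】Indexers: loc and iloc

loc is label-based indexing and iloc is position-based indexing. Note: older pandas versions also had an ix indexer, which accepted both labels and positions; it was deprecated in pandas 0.20.0 and removed in pandas 1.0.0.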
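The distinction matters most when the row labels are themselves integers. The following sketch uses a made-up DataFrame `df` whose integer labels are deliberately out of order; it also shows one common way to mix a row label with a column position now that ix is gone, using `Index.get_loc` to translate a label into a position.

```python
import pandas as pd

df = pd.DataFrame({'v': [10, 20, 30], 'w': [1, 2, 3]}, index=[2, 0, 1])

by_label = df.loc[0, 'v']        # row whose label is 0
by_position = df.iloc[0, 0]      # first row, first column

# Mixed access: row by label, column by position.
row_pos = df.index.get_loc(0)            # position of the row labelled 0
mixed_a = df.iloc[row_pos, 1]            # translate the label to a position
mixed_b = df.loc[0, df.columns[1]]       # or translate the position to a label

print(by_label, by_position, row_pos, mixed_a, mixed_b)
```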

【3.1】loc: Label-Based Indexing

loc selects data by label, i.e. by the index and columns labels.

【3.1.1】Series.loc

For a Series, the allowed inputs are:

A single label, e.g. 5 or 'a' (note that 5 is interpreted as an index label, not as a position);
A list or array of labels, e.g. ['a', 'b', 'c'];
A slice object with labels, e.g. 'a':'f'.

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.Series.loc.html

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>>
>>> obj.loc['a']
1
>>>
>>> obj.loc['a':'c']
a    1
b    5
c   -8
dtype: int64
>>>
>>> obj.loc[['a', 'd']]
a    1
d    2
dtype: int64

【3.1.2】DataFrame.loc

For a DataFrame, the first argument selects rows and the second selects columns; the allowed input formats are much the same as for a Series.

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html

>>> import pandas as pd
>>> obj = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=['a', 'b', 'c'], columns=['A', 'B', 'C'])
>>> obj
   A  B  C
a  1  2  3
b  4  5  6
c  7  8  9
>>>
>>> obj.loc['a']
A    1
B    2
C    3
Name: a, dtype: int64
>>>
>>> obj.loc['a':'c']
   A  B  C
a  1  2  3
b  4  5  6
c  7  8  9
>>>
>>> obj.loc[['a', 'c']]
   A  B  C
a  1  2  3
c  7  8  9
>>>
>>> obj.loc['b', 'B']
5
>>> obj.loc['b', 'A':'C']
A    4
B    5
C    6
Name: b, dtype: int64
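loc also accepts a boolean mask as the row selector and can be used on the left-hand side of an assignment. The sketch below reuses the same small DataFrame; the names `subset` and `updated` are illustrative, and the assignments are made on a copy so the original stays unchanged.

```python
import pandas as pd

obj = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                   index=['a', 'b', 'c'], columns=['A', 'B', 'C'])

# Rows where column A exceeds 1, restricted to columns A and C.
subset = obj.loc[obj['A'] > 1, ['A', 'C']]

# Assignment through loc, performed on a copy.
updated = obj.copy()
updated.loc['b', 'B'] = 50                  # a single cell
updated.loc[updated['C'] > 5, 'A'] = 0      # every row where C > 5

print(subset)
print(updated)
```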

【3.2】iloc: Position-Based Indexing

iloc works like loc but selects by integer position, i.e. by the positional numbers of the index and columns rather than by their labels.

【3.2.1】Series.iloc

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.Series.iloc.html

For a Series, the allowed inputs are:

An integer, e.g. 5;
A list or array of integers, e.g. [4, 3, 0];
A slice object with integers, e.g. 1:7.

>>> import pandas as pd
>>> obj = pd.Series([1, 5, -8, 2], index=['a', 'b', 'c', 'd'])
>>> obj
a    1
b    5
c   -8
d    2
dtype: int64
>>>
>>> obj.iloc[1]
5
>>>
>>> obj.iloc[0:2]
a    1
b    5
dtype: int64
>>>
>>> obj.iloc[[0, 1, 3]]
a    1
b    5
d    2
dtype: int64

【3.2.2】DataFrame.iloc

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html

For a DataFrame, the first argument selects rows and the second selects columns; the allowed input formats are much the same as for a Series:

>>> import pandas as pd
>>> obj = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=['a', 'b', 'c'], columns=['A', 'B', 'C'])
>>> obj
   A  B  C
a  1  2  3
b  4  5  6
c  7  8  9
>>>
>>> obj.iloc[1]
A    4
B    5
C    6
Name: b, dtype: int64
>>>
>>> obj.iloc[0:2]
   A  B  C
a  1  2  3
b  4  5  6
>>>
>>> obj.iloc[[0, 2]]
   A  B  C
a  1  2  3
c  7  8  9
>>>
>>> obj.iloc[1, 2]
6
>>>
>>> obj.iloc[1, 0:2]
A    4
B    5
Name: b, dtype: int64
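A few further positional patterns may be useful alongside the ones above: negative positions, column-only slices, and lists on both axes. This is a brief sketch on the same DataFrame with illustrative variable names.

```python
import pandas as pd

obj = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                   index=['a', 'b', 'c'], columns=['A', 'B', 'C'])

last_row = obj.iloc[-1]               # negative positions count from the end
right_cols = obj.iloc[:, 1:]          # all rows, columns from position 1 onward
corners = obj.iloc[[0, 2], [0, 2]]    # rows 0 and 2 crossed with columns 0 and 2

print(last_row.tolist())
print(list(right_cols.columns))
print(corners)
```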

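【4】Reindexing in Pandas

An important method on Pandas objects is reindex, which creates a new object whose data conforms to a new index. Taking DataFrame.reindex as the example (Series is similar), the basic syntax is:

DataFrame.reindex(self, labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)

Some of the parameters are described below (see the official documentation for the full list):

Parameter	Description

index	New sequence to use as the index; can be an Index instance or any other sequence-like Python data structure
method	Interpolation (fill) method: None, do not fill gaps; pad / ffill, propagate the last valid observation forward into the gap; backfill / bfill, use the next valid observation to fill the gap; nearest, use the nearest valid observation to fill the gap
fill_value	Substitute value to use where reindexing introduces missing data
limit	Maximum number of consecutive elements to fill when forward or backward filling
tolerance	Maximum distance (in absolute value) between the original and new labels for inexact matches when filling
level	Match a simple Index on the given level of a MultiIndex; otherwise select a subset
copy	Defaults to True, which always copies the data; if False, the data is not copied when the new index is equivalent to the old one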

reindex rearranges the data according to the new index. If an index value is not already present, a missing value is introduced:

>>> import pandas as pd
>>> obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
>>> obj
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
>>>
>>> obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
>>> obj2
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64

For ordered data such as time series, some interpolation may be needed when reindexing. The method option does this; for example, ffill forward-fills the values:

>>> import pandas as pd
>>> obj = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
>>> obj
0      blue
2    purple
4    yellow
dtype: object
>>>
>>> obj2 = obj.reindex(range(6), method='ffill')
>>> obj2
0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object
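The fill_value, limit, and tolerance parameters from the table can be compared side by side on a series with a wide gap. The sketch below uses a made-up two-element Series; the variable names are illustrative, and the comments give the result expected for this particular input.

```python
import pandas as pd

obj = pd.Series([10.0, 20.0], index=[0, 5])
new_index = range(6)

# fill_value: new labels get a constant instead of NaN.
filled = obj.reindex(new_index, fill_value=0.0)

# ffill with limit=2: at most two consecutive labels are forward-filled,
# so labels 1 and 2 are filled while 3 and 4 stay NaN.
ffill_limited = obj.reindex(new_index, method='ffill', limit=2)

# nearest with tolerance=1: a new label is filled only if an original label
# lies within distance 1, so labels 2 and 3 stay NaN.
nearest_tol = obj.reindex(new_index, method='nearest', tolerance=1)

print(filled.tolist())
print(ffill_limited.tolist())
print(nearest_tol.tolist())
```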

With a DataFrame, reindex can alter the (row) index, the columns, or both. When passed only a sequence, it reindexes the rows of the result:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])
>>> obj
   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8
>>>
>>> obj2 = obj.reindex(['a', 'b', 'c', 'd'])
>>> obj2
   Ohio  Texas  California
a   0.0    1.0         2.0
b   NaN    NaN         NaN
c   3.0    4.0         5.0
d   6.0    7.0         8.0

The columns can be reindexed with the columns keyword:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])
>>> obj
   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8
>>>
>>> states = ['Texas', 'Utah', 'California']
>>> obj.reindex(columns=states)
   Texas  Utah  California
a      1   NaN           2
c      4   NaN           5
d      7   NaN           8
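Rows and columns can also be reindexed in one call by passing both keywords. The sketch below combines the two examples above and adds fill_value=0 so that new cells hold 0 rather than NaN; the variable name `both` is illustrative.

```python
import numpy as np
import pandas as pd

obj = pd.DataFrame(np.arange(9).reshape((3, 3)),
                   index=['a', 'c', 'd'],
                   columns=['Ohio', 'Texas', 'California'])

# Reindex both axes at once; the new row 'b' and new column 'Utah' are
# filled with 0 instead of NaN.
both = obj.reindex(index=['a', 'b', 'c', 'd'],
                   columns=['Texas', 'Utah', 'California'],
                   fill_value=0)

print(both)
```

The 'Ohio' column is dropped because it is not part of the requested column list.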
