
Python: Removing Duplicates (list, DataFrame)

Published: 2025/3/20 python 19 豆豆


1. Deduplicating a list
1.1. With a for or while loop
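
The loop approach is named here without an example. A minimal sketch, assuming the order of first appearance should be kept; the function name `dedupe_with_loop` is hypothetical and not from the original article:

```python
def dedupe_with_loop(items):
    """Return a new list without duplicates, keeping first occurrences."""
    result = []
    for item in items:
        # `in` on a list is a linear scan, so this is O(n^2) overall;
        # acceptable for short lists, slow for large ones.
        if item not in result:
            result.append(item)
    return result


l = [1, 4, 3, 3, 4, 2, 3, 4, 5, 6, 1]
deduped = dedupe_with_loop(l)
print(deduped)  # [1, 4, 3, 2, 5, 6]
```

Because it relies only on equality comparison, this version also works for unhashable elements such as nested lists.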
1.2. With set()

>>> l = [1,4,3,3,4,2,3,4,5,6,1]
>>> type(l)
<class 'list'>
>>> set(l)
{1, 2, 3, 4, 5, 6}
>>> res = list(set(l))
>>> res
[1, 2, 3, 4, 5, 6]
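
A set has no defined element order, so the result of `list(set(l))` should not be relied on to match the order of the input. One possible way to restore the order of first appearance, sketched here as an illustration rather than as the article's method, is to sort the set by each value's first index in the original list:

```python
l = [1, 4, 3, 3, 4, 2, 3, 4, 5, 6, 1]

# l.index(x) returns the position of the first occurrence of x,
# so sorting by it puts the unique values back in first-seen order.
ordered = sorted(set(l), key=l.index)
print(ordered)  # [1, 4, 3, 2, 5, 6]
```

Each `l.index` call scans the list, so this is best kept to small inputs.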

1.3. With groupby from the itertools module

>>> import itertools
>>> li2 = [1,4,3,3,4,2,3,4,5,6,1]
>>> li2.sort()  # sort first: groupby only merges adjacent equal items
>>> it = itertools.groupby(li2)
>>> for k, g in it:
...     print(k)
...
1
2
3
4
5
6
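
The sort step matters because `itertools.groupby` only groups consecutive equal items. The sketch below, using the same sample values, contrasts the unsorted and sorted cases and collects the keys into a list instead of printing them:

```python
import itertools

li2 = [1, 4, 3, 3, 4, 2, 3, 4, 5, 6, 1]

# Without sorting, only adjacent repeats (the two 3s) are collapsed.
adjacent_only = [k for k, _ in itertools.groupby(li2)]
print(adjacent_only)  # [1, 4, 3, 4, 2, 3, 4, 5, 6, 1]

# sorted() works on a copy, so li2 itself is left unchanged.
unique_sorted = [k for k, _ in itertools.groupby(sorted(li2))]
print(unique_sorted)  # [1, 2, 3, 4, 5, 6]
```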

1.4. With dictionary keys()

>>> li4 = [1,0,3,7,7,5]
>>> {}.fromkeys(li4)
{1: None, 0: None, 3: None, 7: None, 5: None}
>>> {}.fromkeys(li4).keys()
dict_keys([1, 0, 3, 7, 5])
>>> list({}.fromkeys(li4).keys())
[1, 0, 3, 7, 5]
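
Since Python 3.7, dictionaries keep insertion order, so the dictionary-key approach preserves the order of first appearance. A shorter equivalent, shown as a sketch, calls `dict.fromkeys` on the class and skips `.keys()`, because iterating a dict already yields its keys. The `names` list is made-up sample data:

```python
li4 = [1, 0, 3, 7, 7, 5]

# Iterating a dict yields its keys, so .keys() is optional here.
deduped = list(dict.fromkeys(li4))
print(deduped)  # [1, 0, 3, 7, 5]

# Works for any hashable element type, for example strings.
names = ["b", "a", "b", "c", "a"]
print(list(dict.fromkeys(names)))  # ['b', 'a', 'c']
```

Elements must be hashable; a list of lists would raise `TypeError`.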

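1.5. With unique
For a one-dimensional array or list, NumPy's unique function removes the duplicate elements and returns a new array of the unique elements, sorted in ascending order.

return_index=True: also returns, for each element of the new array a = [1 2 3 4 5], the index of its first occurrence in the original list A = [1 2 5 3 4 3]
return_inverse=True: also returns, for each element of the original list A = [1 2 5 3 4 3], its index in the new array a = [1 2 3 4 5]

>>> import numpy as np
>>> A = [1, 2, 5, 3, 4, 3]
>>> a, s, p = np.unique(A, return_index=True, return_inverse=True)
>>> print("new array:", a)
new array: [1 2 3 4 5]
>>> print("return_index", s)
return_index [0 1 3 4 2]
>>> print("return_inverse", p)
return_inverse [0 1 4 2 3 2]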
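
The two index arrays can be checked against their definitions: indexing the original with the `return_index` result gives the unique values, and indexing the unique values with the `return_inverse` result rebuilds the original. The sketch below also shows one possible use of `return_index`, which is to recover the unique values in order of first appearance; that last step is an illustration added here, not part of the original article:

```python
import numpy as np

A = np.array([1, 2, 5, 3, 4, 3])
a, s, p = np.unique(A, return_index=True, return_inverse=True)

# s selects the first occurrence of each unique value from A.
print(np.array_equal(A[s], a))  # True
# p maps every position of A to its value's position in a.
print(np.array_equal(a[p], A))  # True

# Sorting the first-occurrence indices keeps the original order.
in_order = A[np.sort(s)]
print(in_order)  # [1 2 5 3 4]
```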

2. Deduplicating a DataFrame
2.1. With unique() on a single column

>>> import numpy as np
>>> import pandas as pd
>>> data = {'id':['A','B','C','C','C','A','B','C','A'],'age':[18,20,14,10,50,14,65,14,98]}
>>> data = pd.DataFrame(data)
>>> data.id.unique()
array(['A', 'B', 'C'], dtype=object)
>>> # or
>>> np.unique(data.id)
array(['A', 'B', 'C'], dtype=object)
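
The two calls agree on this data only because the ids first appear in alphabetical order. `Series.unique()` returns values in order of first appearance, while `np.unique` returns them sorted. A small sketch with made-up ids (the `ids` series is an assumption for illustration) makes the difference visible:

```python
import numpy as np
import pandas as pd

ids = pd.Series(["C", "A", "B", "A", "C"])

by_appearance = list(ids.unique())   # order of first appearance
sorted_unique = list(np.unique(ids))  # sorted order

print(by_appearance)  # ['C', 'A', 'B']
print(sorted_unique)  # ['A', 'B', 'C']
```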

2.2. With frame.drop_duplicates() on a single column

>>> data.drop_duplicates(['id'])
  id  age
0  A   18
1  B   20
2  C   14
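
By default `drop_duplicates` keeps the first row of each group (`keep='first'`). As a sketch of the alternative, passing `keep='last'` keeps the last row per id instead. The frame is rebuilt here so the snippet runs on its own:

```python
import pandas as pd

data = pd.DataFrame({
    "id": ["A", "B", "C", "C", "C", "A", "B", "C", "A"],
    "age": [18, 20, 14, 10, 50, 14, 65, 14, 98],
})

# Keep the last occurrence of each id rather than the first.
last_per_id = data.drop_duplicates(["id"], keep="last")
print(last_per_id)
# Remaining rows: index 6 (B, 65), 7 (C, 14), 8 (A, 98)
```

The surviving rows keep their original index labels; `reset_index(drop=True)` can renumber them if needed.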

2.3. With frame.drop_duplicates() on multiple columns

>>> data.drop_duplicates(['id','age'])
  id  age
0  A   18
1  B   20
2  C   14
3  C   10
4  C   50
5  A   14
6  B   65
8  A   98

2.4. With frame.duplicated() on multiple columns

>>> isduplicated = data.duplicated(['id','age'],keep='first')
>>> data.loc[~isduplicated,:]
  id  age
0  A   18
1  B   20
2  C   14
3  C   10
4  C   50
5  A   14
6  B   65
8  A   98
>>> data.loc[isduplicated,:]
  id  age
7  C   14
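
With `keep='first'`, only the later copies are flagged, so row 2 (the first `C, 14`) is not marked. To inspect every row that belongs to a duplicated group, `duplicated` also accepts `keep=False`. A self-contained sketch on the same data:

```python
import pandas as pd

data = pd.DataFrame({
    "id": ["A", "B", "C", "C", "C", "A", "B", "C", "A"],
    "age": [18, 20, 14, 10, 50, 14, 65, 14, 98],
})

# keep=False flags every member of a duplicated (id, age) group.
all_copies = data.duplicated(["id", "age"], keep=False)
print(data.loc[all_copies, :])
# Rows 2 and 7, both (C, 14)

# Negating the mask leaves only rows that were never duplicated.
never_duplicated = data.loc[~all_copies, :]
```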
