Python Pandas – Operations

Published: 2025/3/11, by 豆豆

Pandas supports a number of useful operations, illustrated below.

Consider the following DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})
print(df.head())
'''
Output:
   col1  col2 col3
0     1   444  abc
1     2   555  def
2     3   666  ghi
3     4   444  xyz
'''
```

Finding unique values in a DataFrame

To find the unique values in a column:

```python
# returns a NumPy array of all unique values
print(df['col2'].unique())
# Output: [444 555 666]

# returns the number of unique values
print(df['col2'].nunique())
# Output: 3

# a table of the unique values and how many times each one shows up
print(df['col2'].value_counts())
'''
Output (pandas < 2.0; newer versions also print a "col2" header
and name the result "count"):
444    2
555    1
666    1
Name: col2, dtype: int64
'''
```
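As a quick consistency check, here is a small sketch (reusing the same `col2` data; the variable names are only illustrative) showing how the three calls relate: `nunique()` equals the length of `unique()`, and the counts from `value_counts()` add up to the number of rows. Passing `normalize=True` to `value_counts()` reports proportions instead of counts.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

uniques = df['col2'].unique()        # distinct values, in order of first appearance
n_unique = df['col2'].nunique()      # how many distinct values there are
counts = df['col2'].value_counts()   # occurrences of each distinct value

# nunique() is the number of elements returned by unique()
assert n_unique == len(uniques)
# every row is counted exactly once by value_counts()
assert counts.sum() == len(df)

# normalize=True gives each value's share of the rows instead of a count
shares = df['col2'].value_counts(normalize=True)
print(shares.loc[444])  # 0.5
```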

Selecting data from a DataFrame

Using the same DataFrame, we can select data with conditional selection as follows:

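```python
# a boolean Series: True where the condition holds
print(df['col1'] > 2)
'''
Output:
0    False
1    False
2     True
3     True
Name: col1, dtype: bool
'''

# use the boolean Series to keep only the matching rows
print(df[df['col1'] > 2])
'''
Output:
   col1  col2 col3
2     3   666  ghi
3     4   444  xyz
'''

# combine conditions with &; wrap each condition in parentheses,
# because & binds more tightly than > and ==
print(df[(df['col1'] > 2) & (df['col2'] == 444)])
'''
Output:
   col1  col2 col3
3     4   444  xyz
'''
```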
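Besides `&`, boolean masks can be combined with `|` (or) and negated with `~` (not). The sketch below reuses the same DataFrame; as with `&`, each condition gets its own parentheses. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# | keeps rows where at least one of the conditions holds
either = df[(df['col1'] > 3) | (df['col2'] == 555)]

# ~ inverts a boolean mask: rows where col2 is NOT 444
not_444 = df[~(df['col2'] == 444)]

print(either.index.tolist())   # [1, 3]
print(not_444.index.tolist())  # [1, 2]
```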

Applying methods

Consider a simple function:

```python
def times2(x):
    return x * 2
```

We already know that we can select a column and call a built-in method on it, such as sum():

```python
print(df['col1'].sum())  # Output: 10
```

To apply a custom function, such as times2 defined above, pandas provides the apply() method:

```python
print(df['col2'].apply(times2))
'''
Output:
0     888
1    1110
2    1332
3     888
Name: col2, dtype: int64
'''
```
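`apply()` calls the function once per element. For simple arithmetic like `times2`, the same result can be obtained by operating on the whole column at once (vectorized), which is usually faster for large data. A minimal sketch comparing the two:

```python
import pandas as pd

def times2(x):
    return x * 2

df = pd.DataFrame({'col2': [444, 555, 666, 444]})

via_apply = df['col2'].apply(times2)  # times2 is called on each element in turn
vectorized = df['col2'] * 2           # the same arithmetic on the whole column

# both approaches produce the same values
assert via_apply.tolist() == vectorized.tolist()
print(vectorized.tolist())  # [888, 1110, 1332, 888]
```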

apply() works with built-in functions too:

```python
print(df['col3'].apply(len))
'''
Output:
0    3
1    3
2    3
3    3
Name: col3, dtype: int64
'''
```
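For string columns, pandas also offers the `.str` accessor, whose `len()` method computes the same lengths without calling Python's `len` on each element. A short sketch comparing it with `apply(len)`:

```python
import pandas as pd

df = pd.DataFrame({'col3': ['abc', 'def', 'ghi', 'xyz']})

lengths_apply = df['col3'].apply(len)  # built-in len() called on each string
lengths_str = df['col3'].str.len()     # vectorized string accessor

assert lengths_apply.tolist() == lengths_str.tolist()
print(lengths_str.tolist())  # [3, 3, 3, 3]
```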

The apply() method becomes even more powerful when combined with lambda expressions. For instance:

```python
print(df['col2'].apply(lambda x: x * 2))
'''
Output:
0     888
1    1110
2    1332
3     888
Name: col2, dtype: int64
'''
```
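A lambda can also work across several columns at once: calling `apply()` on the DataFrame itself with `axis=1` passes each row (as a Series) to the function. The label format built below is purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# axis=1: the lambda receives one row at a time, so it can combine columns
labels = df.apply(lambda row: f"{row['col3']}-{row['col1']}", axis=1)
print(labels.tolist())  # ['abc-1', 'def-2', 'ghi-3', 'xyz-4']
```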

Some more operations

```python
# returns the column names
print(df.columns)
# Output: Index(['col1', 'col2', 'col3'], dtype='object')

# since this is a RangeIndex, it also reports
# the start, stop and step values
print(df.index)
# Output: RangeIndex(start=0, stop=4, step=1)

# sort by a column
print(df.sort_values('col2'))
'''
Output:
   col1  col2 col3
0     1   444  abc
3     4   444  xyz
1     2   555  def
2     3   666  ghi
'''
```

In the result above, note that the index labels are not renumbered: each row keeps its original index label after sorting.
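If a fresh 0..n-1 index is wanted after sorting, `reset_index(drop=True)` renumbers the rows and discards the old labels; `ascending=False` sorts in descending order. A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [444, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

sorted_df = df.sort_values('col2')
print(sorted_df.index.tolist())   # original labels travel with their rows

# drop=True discards the old labels instead of keeping them as a column
renumbered = sorted_df.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# descending order instead of the default ascending order
print(df.sort_values('col2', ascending=False)['col2'].tolist())  # [666, 555, 444, 444]
```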

Checking for null values

```python
# isnull
print(df.isnull())
'''
Output:
    col1   col2   col3
0  False  False  False
1  False  False  False
2  False  False  False
3  False  False  False
'''
```

isnull() returns a DataFrame of booleans indicating whether each value is null. In the output above, every entry is False because our DataFrame contains no null values.
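On a DataFrame that does contain missing values, the boolean result of `isnull()` can be summed to count them, since `True` counts as 1. This sketch borrows the NaN-containing DataFrame used in the fillna example of this article:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, np.nan],
                   'col2': [np.nan, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

mask = df.isnull()             # True wherever a value is missing
print(mask.sum().tolist())     # [1, 1, 0]  missing values per column
print(int(mask.values.sum()))  # 2          missing values in the whole frame
```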

Drop NaN values

```python
# this DataFrame has no NaN values, so no rows are dropped
print(df.dropna())
'''
Output:
   col1  col2 col3
0     1   444  abc
1     2   555  def
2     3   666  ghi
3     4   444  xyz
'''
```
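Because the DataFrame above has no missing values, `dropna()` leaves it unchanged. The sketch below applies it to the NaN-containing DataFrame from the fillna example instead, and shows two of its options: `axis=1` drops columns rather than rows, and `subset` restricts which columns are checked.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, np.nan],
                   'col2': [np.nan, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

rows_kept = df.dropna()                   # drop every row containing a NaN
cols_kept = df.dropna(axis=1)             # drop every column containing a NaN
subset_kept = df.dropna(subset=['col1'])  # only col1 decides which rows go

print(rows_kept.index.tolist())    # [1, 2]
print(cols_kept.columns.tolist())  # ['col3']
print(subset_kept.index.tolist())  # [0, 1, 2]
```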

Fill NaN values with a custom value

```python
df = pd.DataFrame({'col1': [1, 2, 3, np.nan],
                   'col2': [np.nan, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})
print(df)
'''
Output:
   col1   col2 col3
0   1.0    NaN  abc
1   2.0  555.0  def
2   3.0  666.0  ghi
3   NaN  444.0  xyz
'''

# filling with a string turns col1 and col2 into object columns;
# the remaining numbers keep their float values
print(df.fillna('FILL'))
'''
Output:
   col1   col2 col3
0   1.0   FILL  abc
1   2.0  555.0  def
2   3.0  666.0  ghi
3  FILL  444.0  xyz
'''
```
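`fillna()` also accepts a dict mapping column names to fill values, so each column can get its own replacement. Using numeric fill values keeps the columns numeric (float64) instead of converting them to object columns. In this sketch the choice of 0 for `col1` and the column mean for `col2` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, np.nan],
                   'col2': [np.nan, 555, 666, 444],
                   'col3': ['abc', 'def', 'ghi', 'xyz']})

# one fill value per column; mean() skips NaN, so it is (555 + 666 + 444) / 3
filled = df.fillna({'col1': 0, 'col2': df['col2'].mean()})

print(filled['col1'].tolist())  # [1.0, 2.0, 3.0, 0.0]
print(filled['col2'].tolist())  # [555.0, 555.0, 666.0, 444.0]
```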

Using pivot tables

This technique will be familiar to advanced Excel users. Consider a new DataFrame:

```python
data = {'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
        'B': ['one', 'one', 'two', 'two', 'one', 'one'],
        'C': ['x', 'y', 'x', 'y', 'x', 'y'],
        'D': [1, 3, 2, 5, 4, 1]}
df = pd.DataFrame(data)
print(df)
'''
Output:
     A    B  C  D
0  foo  one  x  1
1  foo  one  y  3
2  foo  two  x  2
3  bar  two  y  5
4  bar  one  x  4
5  bar  one  y  1
'''
```

pivot_table() creates a DataFrame with a MultiIndex. It takes three main arguments: values, index, and columns.

```python
print(df.pivot_table(values='D', index=['A', 'B'], columns=['C']))
'''
Output:
C          x    y
A   B
bar one  4.0  1.0
    two  NaN  5.0
foo one  1.0  3.0
    two  2.0  NaN
'''
```
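By default `pivot_table()` aggregates with the mean, which matters when several rows fall into the same cell. The sketch below, on the same data, uses `aggfunc='sum'` with only `A` as the index (so the `one`/`two` rows are added together), and `fill_value=0` to replace the NaN cells for combinations that never occur.

```python
import pandas as pd

data = {'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
        'B': ['one', 'one', 'two', 'two', 'one', 'one'],
        'C': ['x', 'y', 'x', 'y', 'x', 'y'],
        'D': [1, 3, 2, 5, 4, 1]}
df = pd.DataFrame(data)

# sum of D for each A/C pair: bar/x = 4, bar/y = 5 + 1 = 6, foo/x = 1 + 2 = 3, foo/y = 3
by_a = df.pivot_table(values='D', index='A', columns='C', aggfunc='sum')
print(by_a)

# same layout as above, but missing A/B/C combinations show 0 instead of NaN
by_ab = df.pivot_table(values='D', index=['A', 'B'], columns='C', fill_value=0)
print(by_ab)
```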

Source: https://www.includehelp.com/python/python-pandas-operations.aspx
