5 Essential Pandas Tricks You Didn't Know About
I've been using pandas for years, and each time I feel I am typing too much, I google it and usually find a new pandas trick! I learned about these functions recently, and I consider them essential because they are so easy to use.
1. between function
I've been using the "between" function in SQL for years, but I only recently discovered it in pandas.
Let's say we have a DataFrame with prices and we would like to keep only the prices between 2 and 4.
import pandas as pd
df = pd.DataFrame({'price': [1.99, 3, 5, 0.5, 3.5, 5.5, 3.9]})
With the between function, you can reduce this filter:
df[(df.price >= 2) & (df.price <= 4)]
To this:
df[df.price.between(2, 4)]
It might not seem like much, but those parentheses are annoying when writing many filters. The filter with the between function is also more readable.
By default, the between function is inclusive on both ends: left <= series <= right.
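To make the boundary behavior concrete, here is a minimal sketch using the same prices as above. The string values for the `inclusive` keyword are an assumption about your pandas version (they are accepted from pandas 1.3 onward; older releases took a boolean instead).

```python
import pandas as pd

df = pd.DataFrame({'price': [1.99, 3, 5, 0.5, 3.5, 5.5, 3.9]})

# Default: both bounds are included (2 <= price <= 4).
in_range = df[df.price.between(2, 4)]

# The explicit comparison selects exactly the same rows.
explicit = df[(df.price >= 2) & (df.price <= 4)]

# Exclude both bounds (3 < price < 3.9); assumes pandas >= 1.3.
strict = df[df.price.between(3, 3.9, inclusive='neither')]

print(in_range.price.tolist())
print(strict.price.tolist())
```

If you rely on a boundary value being kept or dropped, passing `inclusive` explicitly makes the intent visible in the filter itself.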
2. Fix the order of the rows with reindex function
The reindex function conforms a Series or a DataFrame to a new index. I resort to the reindex function when making reports whose rows or columns have a predefined order.
Let's add T-shirt sizes to our DataFrame. The goal of the analysis is to calculate the mean price for each size:
df = pd.DataFrame({'price': [1.99, 3, 5], 'size': ['medium', 'large', 'small']})
df_avg = df.groupby('size').price.mean()
df_avg
The sizes in the table above are not in a meaningful order: groupby sorts the group keys, so they come out alphabetically as large, medium, small. They should be ordered small, medium, large. Because the sizes are strings, sorting can only arrange them alphabetically, so the sort_values function does not help. Here comes the reindex function to the rescue:
df_avg.reindex(['small', 'medium', 'large'])
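The following sketch runs the reindex step end to end on the same three T-shirts. The extra label 'x-large' is made up for illustration; it shows what reindex does with a label that is not present in the index.

```python
import pandas as pd

df = pd.DataFrame({'price': [1.99, 3, 5],
                   'size': ['medium', 'large', 'small']})
df_avg = df.groupby('size').price.mean()

# groupby sorts the group keys, so the index is large, medium, small.
print(df_avg.index.tolist())

# reindex rearranges the rows to follow the labels we pass in.
ordered = df_avg.reindex(['small', 'medium', 'large'])
print(ordered)

# A label missing from the index ('x-large' is hypothetical) is not an
# error: reindex adds it with a NaN value.
with_missing = df_avg.reindex(['small', 'medium', 'large', 'x-large'])
print(with_missing)
```

The NaN row can be handy in reports, because every expected category shows up even when the data has no rows for it.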
3. Describe on steroids
The describe function is an essential tool for Exploratory Data Analysis. It shows basic summary statistics for the columns in a DataFrame (by default, only the numeric ones).
df.price.describe()
What if we would like to calculate 10 quantiles instead of 3?
import numpy as np
df.price.describe(percentiles=np.arange(0, 1, 0.1))
The describe function takes a percentiles argument. We can generate the percentiles with NumPy's arange function to avoid typing each one by hand.
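As a rough sketch of what changes in the output, the snippet below compares the default summary with the ten-percentile one. It reuses the prices from the first section, which is a choice made for this example only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'price': [1.99, 3, 5, 0.5, 3.5, 5.5, 3.9]})

# np.arange(0, 1, 0.1) yields 0.0, 0.1, ..., 0.9 (the stop value 1 is excluded).
percentiles = np.arange(0, 1, 0.1)

default_summary = df.price.describe()
decile_summary = df.price.describe(percentiles=percentiles)

# Default rows: count, mean, std, min, 25%, 50%, 75%, max.
print(len(default_summary))
# With ten percentiles: count, mean, std, min, ten percentile rows, max.
print(len(decile_summary))
print(decile_summary)
```

Note that describe always reports the median (50%), whether or not it is in the list you pass.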
This feature becomes really useful when combined with the groupby function:
df.groupby('size').describe(percentiles=np.arange(0, 1, 0.1))
4. Text search with regex
Our T-shirt dataset has 3 sizes. Let's say we would like to keep only the small and medium sizes. A cumbersome way of filtering is:
df[(df['size'] == 'small') | (df['size'] == 'medium')]
This is bad because we usually combine it with other filters, which makes the expression unreadable. Is there a better way?
pandas string columns have a "str" accessor, which implements many functions that simplify string manipulation. One of them is the "contains" function, which supports searching with regular expressions.
df[df['size'].str.contains('small|medium')]
The filter with the "contains" function is more readable and easier to extend and combine with other filters.
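One thing to keep in mind is that "contains" searches anywhere inside the string. The sketch below illustrates this with a hypothetical 'x-small' size and a missing value, neither of which is in the original dataset, and shows `isin` as an alternative when you want exact matches only.

```python
import pandas as pd

# Hypothetical data: 'x-small' and the missing value are added for illustration.
df = pd.DataFrame({'size': ['medium', 'large', 'small', 'x-small', None]})

# contains does a regex search anywhere in the string, so 'x-small' matches too.
# na=False treats missing values as "no match" instead of propagating them.
substring_mask = df['size'].str.contains('small|medium', na=False)

# isin compares whole values, so only exact 'small' or 'medium' rows match.
exact_mask = df['size'].isin(['small', 'medium'])

print(df[substring_mask]['size'].tolist())
print(df[exact_mask]['size'].tolist())
```

Which of the two is right depends on the data: the regex version is more flexible, while `isin` is stricter and avoids accidental substring hits.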
5. Bigger than memory datasets with pandas
pandas cannot even read datasets that are bigger than main memory. It throws a MemoryError or the Jupyter kernel crashes. But to process a big dataset you don't need Dask or Vaex. You just need some ingenuity. Sounds too good to be true?
In case you've missed it, I also wrote an article about using Dask and Vaex with bigger than main memory datasets.
When doing an analysis, you usually don't need all rows or all columns in the dataset.
If you don't need all rows, you can read the dataset in chunks and filter out the unnecessary rows to reduce memory usage:
# 'field' and constant are placeholders for a column name and a threshold
iter_csv = pd.read_csv('dataset.csv', iterator=True, chunksize=1000)
df = pd.concat([chunk[chunk['field'] > constant] for chunk in iter_csv])
Reading a dataset in chunks is slower than reading it all at once. I would recommend this approach only for bigger than memory datasets.
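Here is a small runnable sketch of the chunked approach. It writes a tiny stand-in file first so the example is self-contained; the column names, the threshold and the chunk size of 3 are placeholders chosen for this illustration, not values from the article.

```python
import os
import tempfile

import pandas as pd

# Build a small stand-in for a large file; 'field', 'other' and the
# threshold below are placeholders chosen for this example.
path = os.path.join(tempfile.mkdtemp(), 'dataset.csv')
pd.DataFrame({'field': range(10),
              'other': list('abcdefghij')}).to_csv(path, index=False)

threshold = 6

# chunksize makes read_csv return an iterator of DataFrames (3 rows each
# here), so the whole file is never parsed into a single DataFrame.
chunks = pd.read_csv(path, chunksize=3)

# Filter each chunk as it arrives and stitch the surviving rows together.
filtered = pd.concat([chunk[chunk['field'] > threshold] for chunk in chunks])

print(filtered)
```

The memory saving only materializes if the filter discards most rows; if nearly everything passes the filter, the concatenated result is about as large as the original file.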
If you don't need all columns, you can specify the required columns with the "usecols" argument when reading a dataset:
df = pd.read_csv('dataset.csv', usecols=['col1', 'col2'])
The great thing about these two approaches is that you can combine them.
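A minimal sketch of the combination might look like this. The CSV content, the column names and the threshold of 3 are all hypothetical, and an in-memory buffer stands in for what would normally be a large file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a large file on disk.
csv_text = "col1,col2,col3\n1,a,x\n5,b,y\n9,c,z\n"

# usecols drops col3 while parsing; chunksize streams the rows in pieces.
reader = pd.read_csv(io.StringIO(csv_text),
                     usecols=['col1', 'col2'],
                     chunksize=2)

# Keep only rows where col1 is above an illustrative threshold of 3.
df = pd.concat([chunk[chunk['col1'] > 3] for chunk in reader])

print(df)
```

Dropping columns at read time and dropping rows per chunk reduce memory independently, so the savings add up.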
Before you go
These are a few links that might interest you:
- Your First Machine Learning Model in the Cloud
- AI for Healthcare
- Parallels Desktop 50% off
- School of Autonomous Systems
- Data Science Nanodegree Program
- 5 lesser-known pandas tricks
- How NOT to write pandas code
Translated from: https://towardsdatascience.com/5-essential-pandas-tricks-you-didnt-know-about-2d1a5b6f2e7