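Pandas vs. SQL Syntax: A Thorough Summary
For data analysts, Pandas and SQL are probably the two most frequently used tools. Both can analyze a dataset in depth and extract valuable information, but their syntax differs in many ways. This article summarizes how the same operations are written in each.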
Importing data
With Pandas, the datasets have to be loaded first, before any further analysis:
import pandas as pd

airports = pd.read_csv('data/airports.csv')
airport_freq = pd.read_csv('data/airport-frequencies.csv')
runways = pd.read_csv('data/runways.csv')

Basic syntax
In SQL, SELECT queries data, WHERE filters it, DISTINCT removes duplicates, and LIMIT caps the number of rows returned.
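To make the four keywords concrete, here is a minimal sketch of their Pandas counterparts side by side. The small `airports` frame is made-up illustration data standing in for airports.csv, not the real file.

```python
import pandas as pd

# Hypothetical sample rows standing in for data/airports.csv
airports = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'ident': ['KLAX', 'KLGB', 'KSFO', 'KSAN'],
    'type': ['large_airport', 'medium_airport', 'large_airport', 'large_airport'],
})

# SELECT ident, type FROM airports  -> select columns with a list of names
selected = airports[['ident', 'type']]

# WHERE type = 'large_airport'  -> index with a boolean mask
large = airports[airports['type'] == 'large_airport']

# SELECT DISTINCT type  -> unique() on the column
distinct_types = airports['type'].unique()

# LIMIT 2  -> head(2)
first_two = airports.head(2)

print(distinct_types)
print(first_two)
```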
Displaying the dataset
## SQL
select * from airports
## Pandas
airports

To display only the first three rows of the dataset:
## SQL
select * from airports limit 3
## Pandas
airports.head(3)

Filtering the dataset
## SQL
select id from airports where ident = 'KLAX'
## Pandas
airports[airports.ident == 'KLAX'].id

Removing duplicates from the selected data
## SQL
select distinct type from airports
## Pandas
airports.type.unique()

Filtering on multiple conditions
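To keep only the rows that satisfy several conditions at once, join the conditions with AND in SQL and with & in Pandas, wrapping each Pandas condition in parentheses.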
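The parentheses matter because `&` binds more tightly than `==` in Python. A minimal sketch on made-up rows (the `ident` values are hypothetical) shows the pattern, plus `.loc` as one way to filter rows and pick columns in a single step:

```python
import pandas as pd

# Hypothetical sample rows, not taken from the real airports.csv
airports = pd.DataFrame({
    'ident': ['A1', 'A2', 'A3', 'A4'],
    'iso_region': ['US-CA', 'US-CA', 'US-NY', 'US-CA'],
    'type': ['seaplane_base', 'large_airport', 'seaplane_base', 'seaplane_base'],
})

# Each comparison is parenthesized, then combined with & (logical AND)
mask = (airports['iso_region'] == 'US-CA') & (airports['type'] == 'seaplane_base')
both = airports[mask]

# .loc takes the row mask and the column list together
idents = airports.loc[mask, ['ident']]

print(both)
```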
## SQL
select * from airports
where iso_region = 'US-CA' and
type = 'seaplane_base'
## Pandas
airports[(airports.iso_region == 'US-CA') &
         (airports.type == 'seaplane_base')]

Or, selecting specific columns as well:
## SQL
select ident, name, municipality from airports
where iso_region = 'US-CA' and
type = 'large_airport'
## Pandas
airports[(airports.iso_region == 'US-CA') &
         (airports.type == 'large_airport')][['ident', 'name', 'municipality']]

Sorting
Pandas sorts in ascending order by default. To sort in descending order, set the ascending parameter to False.
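As a quick illustration of the default and the `ascending` parameter, here is a sketch on made-up frequency rows (the values are invented for the example). It also shows that `ascending` accepts a list when sorting by several columns, which corresponds to mixing ASC and DESC in one ORDER BY.

```python
import pandas as pd

# Hypothetical sample rows standing in for airport-frequencies.csv
freq = pd.DataFrame({
    'airport_ident': ['KLAX', 'KLAX', 'KLAX'],
    'type': ['TWR', 'ATIS', 'GND'],
    'frequency_mhz': [133.9, 133.8, 121.65],
})

# ORDER BY type  (ascending is the default)
asc = freq.sort_values('type')

# ORDER BY type DESC
desc = freq.sort_values('type', ascending=False)

# ORDER BY airport_ident ASC, frequency_mhz DESC
mixed = freq.sort_values(['airport_ident', 'frequency_mhz'],
                         ascending=[True, False])

print(asc['type'].tolist())
print(desc['type'].tolist())
```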
## SQL
select * from airport_freq
where airport_ident = 'KLAX'
order by type
## Pandas
airport_freq[airport_freq.airport_ident == 'KLAX'].sort_values('type')

Or, in descending order:
## SQL
select * from airport_freq
where airport_ident = 'KLAX'
order by type desc
## Pandas
airport_freq[airport_freq.airport_ident == 'KLAX'].sort_values('type', ascending=False)

Selecting rows whose values are in a list
When the values to match are given as a list, use the isin() method. Prefixing the resulting mask with ~ negates it, which corresponds to NOT IN.
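A minimal sketch on made-up rows shows both directions. Building the mask once and reusing it is just one convenient way to write it:

```python
import pandas as pd

# Hypothetical sample rows
airports = pd.DataFrame({
    'ident': ['H1', 'B1', 'S1', 'H2'],
    'type': ['heliport', 'balloonport', 'small_airport', 'heliport'],
})

wanted = ['heliport', 'balloonport']

# True where the row's type is one of the wanted values
mask = airports['type'].isin(wanted)

inside = airports[mask]     # WHERE type IN (...)
outside = airports[~mask]   # WHERE type NOT IN (...)

print(inside)
print(outside)
```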
## SQL
select * from airports
where type in ('heliport', 'balloonport')
## Pandas
airports[airports.type.isin(['heliport', 'balloonport'])]

Or, for the opposite selection:
## SQL
select * from airports
where type not in ('heliport', 'balloonport')
## Pandas
airports[~airports.type.isin(['heliport', 'balloonport'])]

Deleting data
In Pandas, rows can be deleted either with the drop() method or by keeping only the rows that do not match the condition.
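One detail worth seeing in action: unlike SQL's DELETE, `drop()` returns a new DataFrame by default and leaves the original untouched, so the result has to be assigned. A short sketch on made-up data (the `value` column is hypothetical):

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    'type': ['MISC', 'TWR', 'MISC', 'GND'],
    'value': [1, 2, 3, 4],
})

# Option 1: keep the rows that do not match
kept_by_mask = df[df['type'] != 'MISC']

# Option 2: drop the matching rows by their index labels
kept_by_drop = df.drop(df[df['type'] == 'MISC'].index)

# Both give the same rows; df itself still has all four
print(kept_by_drop)
print(len(df))
```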
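## SQL
delete from dataframe where type = 'MISC'
## Pandas
# Option 1: keep only the rows that do not match
df = df[df.type != 'MISC']
# Option 2: drop the matching rows by index (drop() returns a new DataFrame)
df = df.drop(df[df.type == 'MISC'].index)

Updating data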
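In SQL, data is updated with the UPDATE statement and its SET clause. In Pandas, the same thing is done by assigning through .loc with a row condition and a column name.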
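A small sketch of the `.loc` assignment on made-up rows; the URL is a placeholder invented for the example, not a value from the dataset:

```python
import pandas as pd

# Hypothetical sample rows with an empty home_link column
airports = pd.DataFrame({
    'ident': ['KLAX', 'KLGB'],
    'home_link': ['', ''],
})

# UPDATE airports SET home_link = ... WHERE ident = 'KLAX'
# First argument selects rows, second selects the column to write to
airports.loc[airports['ident'] == 'KLAX', 'home_link'] = 'https://example.com/klax'

print(airports)
```

Only the matching row changes; rows that fail the condition keep their previous value.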
## SQL
update airports set home_link = '......'
where ident = 'KLAX'
## Pandas
airports.loc[airports['ident'] == 'KLAX', 'home_link'] = '......'

Calling statistical functions
Take the runways dataset, whose first rows can be previewed with:

runways.head()

[Output: the first five rows of runways]
We apply min(), max(), mean(), and median() to the length_ft column.
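One way to compute all four statistics in a single call is `agg()` with a dictionary that maps the column to a list of function names. The sketch below uses made-up runway lengths; the SQL in the comment is an assumed equivalent, and a median function is not available in every SQL dialect.

```python
import pandas as pd

# Hypothetical runway lengths standing in for data/runways.csv
runways = pd.DataFrame({'length_ft': [1000, 2000, 3000, 10000]})

# Roughly: select min(length_ft), max(length_ft), avg(length_ft) from runways
stats = runways.agg({'length_ft': ['min', 'max', 'mean', 'median']})

# The result is a DataFrame indexed by the function names
print(stats)
```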
Combining two tables
Pandas stacks tables with pd.concat(); the SQL counterpart is UNION ALL.
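Two details are easy to miss here: `pd.concat()` keeps the original index labels unless `ignore_index=True` is passed, and, like UNION ALL, it keeps duplicate rows. A sketch on made-up rows, with `drop_duplicates()` shown as a rough stand-in for plain UNION:

```python
import pandas as pd

# Hypothetical sample rows
airports = pd.DataFrame({
    'ident': ['KLAX', 'KLGB', 'KSFO'],
    'name': ['Los Angeles Intl', 'Long Beach', 'San Francisco Intl'],
    'municipality': ['Los Angeles', 'Long Beach', 'San Francisco'],
})

klax = airports[airports['ident'] == 'KLAX'][['name', 'municipality']]
klgb = airports[airports['ident'] == 'KLGB'][['name', 'municipality']]

# UNION ALL: rows are stacked, original index labels are kept
combined = pd.concat([klgb, klax])

# Same rows with a fresh 0..n-1 index
renumbered = pd.concat([klgb, klax], ignore_index=True)

# Rough equivalent of UNION: remove duplicate rows after stacking
deduped = pd.concat([klax, klax]).drop_duplicates()

print(combined)
```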
## SQL
select name, municipality from airports
where ident = 'KLAX'
union all
select name, municipality from airports
where ident = 'KLGB'
## Pandas
pd.concat([airports[airports.ident == 'KLAX'][['name', 'municipality']],
           airports[airports.ident == 'KLGB'][['name', 'municipality']]])

Grouping
As the name suggests, grouping in Pandas is done with the groupby() method.
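The result of `groupby(...).size()` is a Series indexed by the group keys rather than a flat table. A sketch on made-up rows shows that shape, and `reset_index(name=...)` as one way to get back to something that looks like the SQL result set (the column name `count` is chosen for the example):

```python
import pandas as pd

# Hypothetical sample rows
airports = pd.DataFrame({
    'iso_country': ['US', 'US', 'US', 'CA'],
    'type': ['heliport', 'heliport', 'small_airport', 'heliport'],
})

# count(*) per (iso_country, type); group keys are sorted by default
counts = airports.groupby(['iso_country', 'type']).size()

# Flatten the keys back into ordinary columns
table = counts.reset_index(name='count')

print(counts)
print(table)
```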
## SQL
select iso_country, type, count(*) from airports
group by iso_country, type
order by iso_country, type
## Pandas
airports.groupby(['iso_country', 'type']).size()

Filtering after grouping
In Pandas, groups are filtered by calling filter() after groupby(); in SQL, the same thing is done with the HAVING clause.
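Because `filter()` returns the original rows of the surviving groups, a second `groupby()` is needed to count them. An alternative, sketched below on made-up rows, is to compute the group sizes once and apply a boolean mask to that Series. The threshold of 2 is a stand-in chosen to suit the tiny sample.

```python
import pandas as pd

# Hypothetical sample rows
airports = pd.DataFrame({
    'iso_country': ['US'] * 5 + ['CA'],
    'type': ['heliport'] * 3 + ['small_airport'] * 2 + ['heliport'],
})

us = airports[airports['iso_country'] == 'US']
threshold = 2  # assumed value for illustration

# filter() keeps whole groups, so the sizes are computed afterwards
via_filter = (us.groupby('type')
                .filter(lambda g: len(g) > threshold)
                .groupby('type')
                .size()
                .sort_values(ascending=False))

# Alternative: size first, then mask the resulting Series (HAVING count(*) > threshold)
sizes = us.groupby('type').size()
via_mask = sizes[sizes > threshold].sort_values(ascending=False)

print(via_filter)
print(via_mask)
```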
## SQL
select type, count(*) from airports
where iso_country = 'US'
group by type
having count(*) > 1000
order by count(*) desc
## Pandas
(airports[airports.iso_country == 'US']
    .groupby('type')
    .filter(lambda g: len(g) > 1000)
    .groupby('type')
    .size()
    .sort_values(ascending=False))

TOP N records
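In SQL, the top N rows come from ORDER BY ... DESC combined with LIMIT; in Pandas, nlargest() does both in one call.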
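A minimal sketch on a made-up summary table (the `by_country` frame and its `airport_count` column are hypothetical names) compares `nlargest()` with the more literal sort-then-head translation of the SQL:

```python
import pandas as pd

# Hypothetical per-country counts
by_country = pd.DataFrame({
    'iso_country': ['US', 'BR', 'CA', 'AU'],
    'airport_count': [5, 3, 4, 1],
})

# ORDER BY airport_count DESC LIMIT 2
top2 = by_country.nlargest(2, columns='airport_count')

# Literal translation: sort descending, then take the first rows
same = by_country.sort_values('airport_count', ascending=False).head(2)

print(top2)
```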
## SQL
select * from table_name
order by column_name desc limit 10
## Pandas
table_name.nlargest(10, columns='column_name')