Pandas vs. SQL Syntax: A Comprehensive Summary

Published: 2024/9/16

For data analysts, Pandas and SQL are probably the two most commonly used tools. Both can analyze a dataset in depth and extract valuable information, but their syntax differs in many ways. This article summarizes where the syntax of Pandas and SQL differs.

Importing data

With Pandas, the datasets have to be loaded before any further analysis:

import pandas as pd

airports = pd.read_csv('data/airports.csv')
airport_freq = pd.read_csv('data/airport-frequencies.csv')
runways = pd.read_csv('data/runways.csv')

Basic syntax

In SQL, SELECT retrieves data, WHERE filters it, DISTINCT removes duplicates, and LIMIT restricts the number of rows returned.

Output the dataset

## SQL
select * from airports

## Pandas
airports

Output the first three rows of the dataset:

## SQL
select * from airports limit 3

## Pandas
airports.head(3)

Filter the dataset

## SQL
select id from airports where ident = 'KLAX'

## Pandas
airports[airports.ident == 'KLAX'].id
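One way to check that a SQL statement and a Pandas expression really return the same rows is to run both against the same data. The sketch below assumes a tiny made-up airports table (the id values are hypothetical) and uses an in-memory SQLite database from the Python standard library as the SQL engine:

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows standing in for data/airports.csv
airports = pd.DataFrame({
    'id': [1, 2, 3],
    'ident': ['KLAX', 'KLGB', 'KSFO'],
})

# Copy the DataFrame into an in-memory SQLite table
conn = sqlite3.connect(':memory:')
airports.to_sql('airports', conn, index=False)

# SQL side: SELECT ... WHERE
sql_result = pd.read_sql_query(
    "select id from airports where ident = 'KLAX'", conn
)

# Pandas side: boolean mask, then column selection
pandas_result = airports[airports['ident'] == 'KLAX']['id']

conn.close()

print(sql_result['id'].tolist() == pandas_result.tolist())
```

The same pattern can be reused for the other comparisons in this article, as long as the SQL stays within what SQLite supports.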

Remove duplicates from the selected data

## SQL
select distinct type from airports

## Pandas
airports.type.unique()
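Note that unique() returns an array of values rather than a table. When a result shaped like a SQL result set is preferred, drop_duplicates() on the selected columns is a possible alternative. A minimal sketch on made-up data:

```python
import pandas as pd

# Hypothetical sample rows; only the 'type' column matters here
airports = pd.DataFrame({
    'type': ['heliport', 'small_airport', 'heliport', 'closed'],
})

# An array of distinct values, in order of first appearance
unique_values = airports['type'].unique()

# A one-column DataFrame of distinct rows, closer to a SQL result set
distinct_rows = airports[['type']].drop_duplicates()

print(list(unique_values))
print(distinct_rows)
```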

Filtering on multiple conditions

To select the rows that satisfy several conditions at once:

## SQL
select * from airports
where iso_region = 'US-CA' and type = 'seaplane_base'

## Pandas
airports[(airports.iso_region == 'US-CA') & (airports.type == 'seaplane_base')]

Or, selecting only some columns:

## SQL
select ident, name, municipality from airports
where iso_region = 'US-CA' and type = 'large_airport'

## Pandas
airports[(airports.iso_region == 'US-CA') &
         (airports.type == 'large_airport')][['ident', 'name', 'municipality']]
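The parentheses around each comparison are required, because in Python & binds more tightly than ==. The sketch below, on made-up rows, also shows .loc as an alternative that applies the row filter and the column list in a single step:

```python
import pandas as pd

# Hypothetical sample rows standing in for data/airports.csv
airports = pd.DataFrame({
    'ident': ['KLAX', 'KSFO', 'KJFK', '00CA'],
    'type': ['large_airport', 'large_airport', 'large_airport', 'small_airport'],
    'iso_region': ['US-CA', 'US-CA', 'US-NY', 'US-CA'],
    'municipality': ['Los Angeles', 'San Francisco', 'New York', 'Barstow'],
})

# Each comparison is wrapped in parentheses before combining with &
mask = (airports['iso_region'] == 'US-CA') & (airports['type'] == 'large_airport')

# .loc[rows, columns] plays the role of WHERE plus the SELECT column list
result = airports.loc[mask, ['ident', 'municipality']]
print(result)
```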

Sorting

Pandas sorts in ascending order by default. To sort in descending order, set the ascending parameter to False.

## SQL
select * from airport_freq
where airport_ident = 'KLAX'
order by type

## Pandas
airport_freq[airport_freq.airport_ident == 'KLAX'].sort_values('type')

Or, in descending order:

## SQL
select * from airport_freq
where airport_ident = 'KLAX'
order by type desc

## Pandas
airport_freq[airport_freq.airport_ident == 'KLAX'].sort_values('type', ascending=False)

Filtering by a list of values

To select the rows whose value appears in a given list, use the isin() method:

## SQL
select * from airports
where type in ('heliport', 'balloonport')

## Pandas
airports[airports.type.isin(['heliport', 'balloonport'])]

Or, to exclude those values, negate the mask with ~:

## SQL
select * from airports
where type not in ('heliport', 'balloonport')

## Pandas
airports[~airports.type.isin(['heliport', 'balloonport'])]

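Deleting data

In Pandas, rows can be deleted either by keeping only the rows that do not match or with the drop() method. Both return a new DataFrame, so the result has to be assigned back:

## SQL
delete from df where type = 'MISC'

## Pandas
# Option 1: keep the rows that do not match
df = df[df.type != 'MISC']
# Option 2: drop the matching rows by index label
df = df.drop(df[df.type == 'MISC'].index)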
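A point that is easy to miss: drop() does not modify the DataFrame in place by default, so calling it without using the return value deletes nothing. The sketch below uses a small made-up frequency table to show that both approaches give the same result and leave the original untouched:

```python
import pandas as pd

# Hypothetical sample rows; column names other than 'type' are illustrative
df = pd.DataFrame({
    'type': ['MISC', 'TWR', 'MISC', 'GND'],
    'frequency_mhz': [122.95, 120.95, 128.5, 121.65],
})

# Approach 1: boolean mask that keeps the non-matching rows
by_mask = df[df['type'] != 'MISC']

# Approach 2: drop the index labels of the matching rows
by_drop = df.drop(df[df['type'] == 'MISC'].index)

print(by_mask.equals(by_drop))  # both approaches agree
print(len(df))                  # the original still has every row
```

Because drop() works on index labels, the two approaches can diverge when the index contains duplicate labels; the boolean mask is the safer choice in that case.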

Updating data

In SQL, data is updated with UPDATE and SET. In Pandas, the matching cells are assigned through .loc:

## SQL
update airports set home_link = '......'
where ident = 'KLAX'

## Pandas
airports.loc[airports['ident'] == 'KLAX', 'home_link'] = '......'
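The .loc[row_mask, column] form matters here: it assigns directly into the original DataFrame, whereas chaining two indexing steps may assign into a temporary copy. A minimal sketch with made-up link values:

```python
import pandas as pd

# Hypothetical sample rows; the home_link values are placeholders
airports = pd.DataFrame({
    'ident': ['KLAX', 'KLGB'],
    'home_link': ['old-lax-link', 'old-lgb-link'],
})

# UPDATE ... SET home_link = ... WHERE ident = 'KLAX'
airports.loc[airports['ident'] == 'KLAX', 'home_link'] = 'new-lax-link'

print(airports)
```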

Calling aggregate functions

Take the runways dataset, previewed with:

runways.head()

[Output: first rows of the runways table]

We apply min(), max(), mean(), and median() to the length_ft column. Note that median() is not available in every SQL dialect.

## SQL
select max(length_ft), min(length_ft),
       avg(length_ft), median(length_ft)
from runways

## Pandas
runways.agg({'length_ft': ['min', 'max', 'mean', 'median']})
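The shape of the result differs between the two: SQL returns one row with one column per aggregate, while agg() with a list of functions returns one row per aggregate. A small sketch on made-up runway lengths, with a transpose to get the SQL-like layout:

```python
import pandas as pd

# Hypothetical runway lengths in feet
runways = pd.DataFrame({'length_ft': [1000, 2000, 6000]})

# One row per aggregate function, one column per aggregated column
stats = runways.agg({'length_ft': ['min', 'max', 'mean', 'median']})

# Transposing gives a single row with one column per aggregate, as in SQL
stats_row = stats.T

print(stats)
print(stats_row)
```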

Combining two tables

Pandas stacks tables with pd.concat(); the SQL equivalent is UNION ALL:

## SQL
select name, municipality from airports
where ident = 'KLAX'
union all
select name, municipality from airports
where ident = 'KLGB'

## Pandas
pd.concat([airports[airports.ident == 'KLAX'][['name', 'municipality']],
           airports[airports.ident == 'KLGB'][['name', 'municipality']]])
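Two details are worth a sketch. pd.concat() keeps the original index labels unless ignore_index=True is passed, and it keeps duplicate rows like UNION ALL does; for plain UNION semantics, drop_duplicates() can be applied afterwards. The two small frames below are made up for illustration:

```python
import pandas as pd

# Hypothetical query results that share one identical row
first = pd.DataFrame({
    'name': ['Los Angeles International Airport'],
    'municipality': ['Los Angeles'],
})
second = pd.DataFrame({
    'name': ['Los Angeles International Airport', 'Long Beach Airport'],
    'municipality': ['Los Angeles', 'Long Beach'],
})

# UNION ALL: stack the rows and renumber the index from 0
union_all = pd.concat([first, second], ignore_index=True)

# UNION: same, then remove duplicate rows
union = union_all.drop_duplicates().reset_index(drop=True)

print(len(union_all), len(union))
```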

Grouping

As the name suggests, grouping is done with the groupby() method:

## SQL
select iso_country, type, count(*) from airports
group by iso_country, type
order by iso_country, type

## Pandas
airports.groupby(['iso_country', 'type']).size()
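groupby(...).size() returns a Series indexed by the group keys, sorted by those keys by default. To get a flat table with the keys as ordinary columns, closer to the SQL output, reset_index() can be chained on. A sketch with made-up rows; the column name 'count' is an arbitrary choice:

```python
import pandas as pd

# Hypothetical sample rows standing in for data/airports.csv
airports = pd.DataFrame({
    'iso_country': ['US', 'US', 'US', 'CA'],
    'type': ['heliport', 'heliport', 'small_airport', 'heliport'],
})

# Count rows per (iso_country, type) and turn the keys back into columns
counts = (
    airports.groupby(['iso_country', 'type'])
    .size()
    .reset_index(name='count')
)

print(counts)
```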

Filtering after grouping

In Pandas, call filter() after groupby(); in SQL, use the HAVING clause:

## SQL
select type, count(*) from airports
where iso_country = 'US'
group by type
having count(*) > 1000
order by count(*) desc

## Pandas
(airports[airports.iso_country == 'US']
    .groupby('type')
    .filter(lambda g: len(g) > 1000)
    .groupby('type')
    .size()
    .sort_values(ascending=False))
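When only the counts are needed, a shorter alternative is to compute the group sizes once and then filter that Series, which avoids grouping twice. The sketch below uses made-up rows and a threshold of 1 instead of 1000 so the small sample produces a result; it also runs the filter() version to show that the two agree:

```python
import pandas as pd

# Hypothetical sample rows standing in for data/airports.csv
airports = pd.DataFrame({
    'iso_country': ['US', 'US', 'US', 'US', 'US', 'US', 'CA'],
    'type': ['heliport', 'heliport', 'heliport',
             'small_airport', 'small_airport', 'closed', 'heliport'],
})
us = airports[airports['iso_country'] == 'US']

# Alternative: count once, then keep the groups above the threshold
counts = us.groupby('type').size()
frequent = counts[counts > 1].sort_values(ascending=False)

# The filter()-based version from the article, with the same threshold
via_filter = (
    us.groupby('type')
    .filter(lambda g: len(g) > 1)
    .groupby('type')
    .size()
    .sort_values(ascending=False)
)

print(frequent.to_dict() == via_filter.to_dict())
```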

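TOP N records

The code is as follows, where table_name and column_name are placeholders:

## SQL
select * from table_name
order by column_name desc
limit 10

## Pandas
table_name.nlargest(10, columns='column_name')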
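nlargest() is shorthand for sorting in descending order and taking the first rows. The sketch below, on a made-up runways table, runs both forms side by side with N = 2:

```python
import pandas as pd

# Hypothetical runway lengths in feet
runways = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'length_ft': [5000, 12000, 800, 9000],
})

# Top 2 rows by length_ft
top_n = runways.nlargest(2, columns='length_ft')

# Equivalent spelling: ORDER BY length_ft DESC LIMIT 2
sorted_head = runways.sort_values('length_ft', ascending=False).head(2)

print(top_n)
```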
