當前位置：首頁 > 运维知识 > 数据库 >内容正文

数据库

SQLite | SQLite 与 Pandas 比较篇之一

發布時間：2025/3/15 数据库 30 豆豆

生活随笔收集整理的這篇文章主要介紹了 SQLite | SQLite 与 Pandas 比较篇之一小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

文章目錄

1. SQLite 與 Pandas 異同點比較
- 1.1 數據導入
- - 1.1.1 SQLIte
  - 1.1.2 Pandas
- 1.2 選取數據
- - 1.2.1 SQLite
  - 1.2.2 Pandas
- 1.3 篩選數據
- - 1.3.1 SQLite
  - 1.3.2 Pandas
- 1.4 數據聚合與分組
- - 1.4.1 SQLite
  - 1.4.2 Pandas
2. 總結

1. SQLite 與 Pandas 異同點比較

1.1 數據導入

1.1.1 SQLIte

SQLite 需要首先導入數據庫文件并使用 select 語句選取記錄：

使用Jupyter Notebook 運行 SQL 語句需安裝 ipython-sql
%sql 以及 %%sql 為在 Notebook 中運行 SQL 語句，在 SQLite 命令行或 SQLite Stiduo 中不需要 %sql 或 %%sql

%load_ext sql %sql sqlite:///DataBase/weather_stations.db 'Connected: @DataBase/weather_stations.db' %%sql select * from station_data limit 0,3 * sqlite:///DataBase/weather_stations.db Done. station_numberreport_codeyearmonthdaydew_pointstation_pressurevisibilitywind_speedtemperatureprecipitationsnow_depthfograinhailthundertornado

143080	34DDA7	2002	12	21	33.8	987.4	3.4	0.2	36	None	1	1	1	1	1
766440	39537B	1998	10	1	72.7	1014.6	5.9	6.7	83.3	None	0	0	0	0	0
176010	C3C6D5	2001	5	18	55.7	None	7.3	4.3	69.1	None	0	0	0	0	0

1.1.2 Pandas

在 Python 中我們可以利用 sqlite3 模塊來讀取數據庫并轉換為 Pandas 的 DataFrame 格式：

import sqlite3 import pandas as pd con = sqlite3.connect('./DataBase/weather_stations.db') cursor = con.execute('select * from station_data') rows = cursor.fetchall() df = pd.DataFrame(rows, columns=[x[0] for x in cursor.description]) df.head(3) station_numberreport_codeyearmonthdaydew_pointstation_pressurevisibilitywind_speedtemperatureprecipitationsnow_depthfograinhailthundertornado012

143080	34DDA7	2002	12	21	33.8	987.4	3.4	0.2	36.0	NaN	1	1	1	1	1
766440	39537B	1998	10	1	72.7	1014.6	5.9	6.7	83.3	NaN	0	0	0	0	0
176010	C3C6D5	2001	5	18	55.7	NaN	7.3	4.3	69.1	NaN	0	0	0	0	0

1.2 選取數據

1.2.1 SQLite

在 SQLite 中，我們需要通過 select 語句來篩選數據，并可以使用 alias、內置函數和文字拼接等對數據進行處理：

%%sql select station_number ||'_'|| report_code as number, -- 將兩個拼接 round(temperature*9/5+32, 2) as Fahrenheit -- 將攝氏度轉換為華氏度 from station_data limit 0,3 * sqlite:///DataBase/weather_stations.db Done. numberFahrenheit

143080_34DDA7	96.0
766440_39537B	181.94
176010_C3C6D5	156.38

1.2.2 Pandas

Pandas 中似乎沒有直接作用于字符的拼接函數，因此需要多一步轉換的過程才能得到與 SQL 中 concat 相同的效果：

pd.concat( [pd.Series([str(num)+'_'+code for (num,code) in zip(list(df['station_number']), list(df['report_code']))], name='number'), pd.Series(round(df['temperature']*9/5+32, 2), name='Fahrenheit')],axis=1).head(3) numberFahrenheit012

143080_34DDA7	96.80
766440_39537B	181.94
176010_C3C6D5	156.38

1.3 篩選數據

1.3.1 SQLite

SQLite 中使用 where 對數據進行篩選，如篩選 2005 年至 2010 年的數據：

%%sql select * from station_data where year>=2005 and year<=2010 limit 0,3 * sqlite:///DataBase/weather_stations.db Done. station_numberreport_codeyearmonthdaydew_pointstation_pressurevisibilitywind_speedtemperatureprecipitationsnow_depthfograinhailthundertornado

125600	145150	2007	10	14	33	None	6.9	2.5	39.7	None
598550	C5C66E	2006	10	15	72.9	None	14.2	1.7	82	None
941830	229317	2007	4	19	66.5	994.9	None	4	76.3	None

篩選月份為 3、6、9、12 的數據：

%%sql select * from station_data where Month in (3,6,9,12) limit 0,3; * sqlite:///DataBase/weather_stations.db Done. station_numberreport_codeyearmonthdaydew_pointstation_pressurevisibilitywind_speedtemperatureprecipitationsnow_depthfograinhailthundertornado

143080	34DDA7	2002	12	21	33.8	987.4	3.4	0.2	36	0	None	1	1	1	1	1
821930	1F8A7B	1953	6	18	72.8	1007.1	12.4	3.6	81.3	0	None	0	0	0	0	0
478070	D028D8	1981	6	27	73.4	None	7.9	3	77	1.93	None	0	0	0	0	0

1.3.2 Pandas

篩選 2005 年至 2010 年的數據：

df[(df['year']>=2005) & (df['year']<=2010)].head(3) station_numberreport_codeyearmonthdaydew_pointstation_pressurevisibilitywind_speedtemperatureprecipitationsnow_depthfograinhailthundertornado3918

125600	145150	2007	10	14	33.0	NaN	6.9	2.5	39.7	NaN
598550	C5C66E	2006	10	15	72.9	NaN	14.2	1.7	82.0	NaN
941830	229317	2007	4	19	66.5	994.9	NaN	4.0	76.3	NaN

篩選月份為 3、6、9、12 的數據：

df[ [ [data in [3,6,9,12]] for data in df['month'].values ] ].head(3) station_numberreport_codeyearmonthdaydew_pointstation_pressurevisibilitywind_speedtemperatureprecipitationsnow_depthfograinhailthundertornado056

143080	34DDA7	2002	12	21	33.8	987.4	3.4	0.2	36.0	0.00	NaN	1	1	1	1	1
821930	1F8A7B	1953	6	18	72.8	1007.1	12.4	3.6	81.3	0.00	NaN	0	0	0	0	0
478070	D028D8	1981	6	27	73.4	NaN	7.9	3.0	77.0	1.93	NaN	0	0	0	0	0

1.4 數據聚合與分組

1.4.1 SQLite

在 SQLite 中，我們使用 group by 和內置聚合函數實現聚合分組操作，如統計每個月龍卷風的記錄次數：

%%sql select year, month, count(*) as record_count from station_data where tornado == 1 group by year, month order by year, month limit 0,5; * sqlite:///DataBase/weather_stations.db Done. yearmonthrecord_count

1937	7	3
1941	8	3
1942	10	3
1943	1	3
1943	4	3

1.4.2 Pandas

同樣在 Pandas 中，也有 groupby 函數實現分組操作：

df[df['tornado']==1].groupby(['year','month'])['tornado'].count().head(5) year month 1937 7 3 1941 8 3 1942 10 3 1943 1 34 3 Name: tornado, dtype: int64

2. 總結

從數據導入、選取、篩選、聚合與分組可以看出， SQLite 具有結構化的特點，
容易操作且易上手，代碼一目了然。反觀 Pandas ，則繼承了 Pythonic 的特點。
雖然部分功能需要自己寫循環，但由于列表推倒式的存在，使得這代碼依然可以保持短小精悍。
總體而言，SQLite 的功能實現更加方便快捷，而 Pandas 則擁有更高的自由度，
但需要對 Python 比較熟悉，兩者各有千秋！

相關文章：

創作挑戰賽新人創作獎勵來咯，堅持創作打卡瓜分現金大獎

總結

以上是生活随笔為你收集整理的SQLite | SQLite 与 Pandas 比较篇之一的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇：这21个不太好搜索其含义的特殊符号你都知
下一篇： linux cmake编译源码,linu