Python data analysis: using Python & SQL for data analysis in Jupyter Notebook
Published: 2019-01-14 21:14
Tags: python, jupyter, notebook, SQL
Just as you can use R inside Jupyter, you can also run SQL statements directly in a notebook.
For details, see the GitHub project: https://github.com/catherinedevlin/ipython-sql
Install ipython-sql
pip install ipython-sql
Load the extension
%load_ext sql
Connect to a database (connection strings use the same format as SQLAlchemy)
* postgresql://will:<password>@<host>/shakes
* mysql+pymysql://scott:<password>@<host>/foo
* oracle://scott:<password>@<host>:1521/sidname
* sqlite://
* sqlite:///foo.db
* mssql+pyodbc://username:<password>@<host>/database?driver=SQL+Server+Native+Client+11.0
I am using MySQL over a local connection, with username ffzs, password 666666, and the test database:
%sql mysql+pymysql://ffzs:666666@localhost/test
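One detail worth keeping in mind with SQLAlchemy-style URLs: a password containing characters such as `@` or `/` has to be percent-encoded before it goes into the connection string. The following is a minimal sketch using only the standard library. The helper name `build_url` and the password `p@ss/word` are made up for illustration; they are not part of ipython-sql or SQLAlchemy.

```python
from urllib.parse import quote_plus, unquote_plus, urlsplit


def build_url(dialect, driver, user, password, host, database):
    """Assemble dialect+driver://user:password@host/database (hypothetical helper)."""
    scheme = f"{dialect}+{driver}" if driver else dialect
    # Percent-encode the credentials so '@' and '/' are not read as URL delimiters.
    return f"{scheme}://{quote_plus(user)}:{quote_plus(password)}@{host}/{database}"


# Hypothetical credentials; only the overall URL shape comes from the article.
url = build_url("mysql", "pymysql", "ffzs", "p@ss/word", "localhost", "test")
parts = urlsplit(url)

print(url)
print(parts.scheme, parts.username, unquote_plus(parts.password), parts.hostname, parts.path)
```

The resulting string can be passed to `%sql` in the same way as the connection string above.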
Basic usage
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('bmh')
1. Show the tables
%%sql
show tables;
2. Select the first 5 rows of the steam_users table
df = %sql select * from steam_users limit 5
df.DataFrame()
3. Count the distinct games and players in the table
%%sql
select count(distinct Game) gameCount, count(distinct UserID) userCount
from steam_users
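To see what `count(distinct ...)` does compared with a plain row count, here is a small sketch that runs the same query outside the notebook. It uses the standard-library `sqlite3` module instead of MySQL, and the sample rows (including the `purchase` action value) are invented for illustration; only the table and column names come from the queries above.

```python
import sqlite3

# Hypothetical sample rows: (UserID, Game, Action, Hours)
rows = [
    (1, "Dota 2", "purchase", 1.0),
    (1, "Dota 2", "play", 120.0),
    (2, "Dota 2", "purchase", 1.0),
    (2, "Portal", "purchase", 1.0),
    (2, "Portal", "play", 8.5),
    (3, "Terraria", "purchase", 1.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table steam_users (UserID integer, Game text, Action text, Hours real)")
conn.executemany("insert into steam_users values (?, ?, ?, ?)", rows)

# Same query as in the notebook cell: each game and each user is counted once.
game_count, user_count = conn.execute(
    "select count(distinct Game), count(distinct UserID) from steam_users"
).fetchone()

# A plain count(*) counts every row, including repeated games and users.
total_rows = conn.execute("select count(*) from steam_users").fetchone()[0]

print(game_count, user_count, total_rows)
```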
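4. Top ten games by number of players
%%sql data << select Game, count(1) as count
from steam_users
where Action = 'play'
group by Game
order by count desc
limit 10
In the next cell, plot the result as a horizontal bar chart (the rows are reversed so the largest bar ends up on top):
data.DataFrame()[::-1].plot.barh("Game", "count")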
5. Top ten games by total hours played
%%sql playHour << select Game, sum(Hours) as playHour
from steam_users
where Action = 'play'
group by Game
order by playHour desc
limit 10
In the next cell:
playHour.DataFrame()[::-1].plot.barh('Game', 'playHour')
6. Top ten games by average hours played
%%sql avgHour << select Game, avg(Hours) as avgHour
from steam_users
where Action = 'play'
group by Game
order by avgHour desc
limit 10
In the next cell:
avgHour.DataFrame()[::-1].plot.barh('Game', 'avgHour')
7. Player counts for the top ten games by average hours played
%%sql
select Game, avg(Hours) as avgHour, count(1) as count
from steam_users
where Action = 'play'
group by Game
order by avgHour desc
limit 10
The same result using join ... on:
%%sql
select a.Game, avgHour, count
from (select Game, avg(Hours) as avgHour
      from steam_users
      where Action = 'play'
      group by Game
      order by avgHour desc
      limit 10) a
left join (select Game, count(1) as count
           from steam_users
           where Action = 'play'
           group by Game) b
on a.Game = b.Game
order by avgHour desc
As the result shows, the games with the longest average play time are mostly niche titles with few players.
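The single grouped query and the join version in step 7 should return the same rows. The sketch below checks this on a tiny in-memory SQLite table. The rows are invented, the limit is reduced to 2, and the alias `players` replaces `count`; these are illustration choices, not the article's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table steam_users (UserID integer, Game text, Action text, Hours real)")
conn.executemany(
    "insert into steam_users values (?, ?, ?, ?)",
    [  # hypothetical sample rows
        (1, "Dota 2", "play", 100.0),
        (2, "Dota 2", "play", 50.0),
        (3, "Dota 2", "play", 30.0),
        (1, "Niche Sim", "play", 400.0),
        (2, "Portal", "play", 10.0),
        (3, "Portal", "play", 6.0),
    ],
)

# Version 1: average and player count computed in one grouped query.
single = conn.execute("""
    select Game, avg(Hours) as avgHour, count(1) as players
    from steam_users where Action = 'play'
    group by Game order by avgHour desc limit 2
""").fetchall()

# Version 2: rank by average first, then join the per-game player counts.
joined = conn.execute("""
    select a.Game, a.avgHour, b.players
    from (select Game, avg(Hours) as avgHour from steam_users
          where Action = 'play' group by Game
          order by avgHour desc limit 2) a
    left join (select Game, count(1) as players from steam_users
               where Action = 'play' group by Game) b
      on a.Game = b.Game
    order by a.avgHour desc
""").fetchall()

print(single)
print(joined)
```

In this toy data the game with the highest average has a single player, which mirrors the observation about niche games.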
8. Number of games with more than 500 players (using having)
%%sql
select count(1) as count
from (select Game, count(1) as count
      from steam_users
      where Action = 'play'
      group by Game
      having count > 500) a
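The difference between `where` and `having` is that `where` filters rows before grouping, while `having` filters whole groups after the aggregate is computed. A minimal sketch of the same pattern on an in-memory SQLite table follows. The rows and the helper name `games_with_more_players_than` are hypothetical, and `having count(1) > ?` is written with the aggregate itself rather than the alias so it does not depend on MySQL's alias handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table steam_users (UserID integer, Game text, Action text, Hours real)")
conn.executemany(
    "insert into steam_users values (?, ?, ?, ?)",
    [  # hypothetical sample rows
        (1, "Dota 2", "play", 100.0),
        (2, "Dota 2", "play", 50.0),
        (3, "Dota 2", "play", 30.0),
        (1, "Portal", "play", 10.0),
        (2, "Portal", "play", 6.0),
        (3, "Niche Sim", "play", 400.0),
    ],
)


def games_with_more_players_than(threshold):
    """Count the games whose number of 'play' rows exceeds the threshold."""
    return conn.execute(
        """
        select count(1) from (
            select Game from steam_users
            where Action = 'play'      -- row filter, applied before grouping
            group by Game
            having count(1) > ?        -- group filter, applied after grouping
        ) a
        """,
        (threshold,),
    ).fetchone()[0]


print(games_with_more_players_than(1))
print(games_with_more_players_than(500))
```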
9. Top ten users by number of games (counted from their 'play' records)
%%sql games << select UserID, count(1) count
from steam_users
where Action = 'play'
group by UserID
order by count desc
limit 10
In the next cell:
games.DataFrame()[::-1].plot.barh('UserID', 'count')
10. The 5 users with the most and the 5 users with the fewest total hours played (using union)
%%sql
(select UserID, sum(Hours) as allHour
 from steam_users
 where Action = 'play'
 group by UserID
 order by allHour desc
 limit 5)
union
(select UserID, sum(Hours) as allHour
 from steam_users
 where Action = 'play'
 group by UserID
 order by allHour
 limit 5)
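In the MySQL query, the parentheses make each `order by ... limit` apply to its own branch of the `union`. Note also that `union` removes duplicate rows, so a user who appears in both branches is returned once. The sketch below reproduces this on an in-memory SQLite table. SQLite's grammar does not take parenthesized branches, so each branch is wrapped in a subquery instead; the rows and the helper name `top_and_bottom` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table steam_users (UserID integer, Game text, Action text, Hours real)")
conn.executemany(
    "insert into steam_users values (?, ?, ?, ?)",
    [  # hypothetical sample rows
        (1, "Dota 2", "play", 100.0),
        (1, "Niche Sim", "play", 400.0),
        (2, "Dota 2", "play", 50.0),
        (2, "Portal", "play", 10.0),
        (3, "Dota 2", "play", 30.0),
        (3, "Portal", "play", 6.0),
        (4, "Portal", "play", 2.0),
        (5, "Portal", "play", 1.0),
    ],
)

# One ranked branch; the sort direction is filled in per branch.
RANKED = """
    select UserID, sum(Hours) as allHour from steam_users
    where Action = 'play' group by UserID
    order by allHour {direction} limit ?
"""


def top_and_bottom(n):
    """Return the n users with the most and the n with the fewest total hours."""
    sql = (
        "select UserID, allHour from (" + RANKED.format(direction="desc") + ") "
        "union "
        "select UserID, allHour from (" + RANKED.format(direction="asc") + ") "
        "order by allHour desc"
    )
    return conn.execute(sql, (n, n)).fetchall()


print(top_and_bottom(2))
print(top_and_bottom(3))  # user 3 is in both branches but is returned once
```

If duplicates should be kept, `union all` can be used in place of `union`.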