
Python Basics [Reading Notes on Python Crash Course (Part 6)]: Data Visualization Project, Chapter 17

Published: 2025/4/5 python 17 豆豆


Contents

- Using an API
- end

Project result:
The most popular Python repositories on GitHub, visualized with Plotly:

The chart after restyling:

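Using an API

We will write a standalone program that visualizes the data it retrieves. It uses a Web API to request specific information from a website automatically, rather than whole pages. Because the data is requested each time the program runs, the visualization always uses up-to-date data.

Using a Web API
A Web API is the part of a website designed to interact with programs that use specific URLs to request specific information. Such a request is called an API call. The requested data is returned in an easily processed format such as JSON or CSV.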
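To make "an easily processed format" concrete, here is a small sketch that parses the same record from JSON and from CSV using only the standard library. The sample record and the helper names are made up for illustration; only the field names name and stargazers_count come from the GitHub response discussed later.

```python
import csv
import io
import json

# Hypothetical sample payloads: the same record in two formats.
JSON_TEXT = '[{"name": "example-repo", "stargazers_count": 42}]'
CSV_TEXT = "name,stargazers_count\nexample-repo,42\n"


def parse_json(text):
    """Parse a JSON array of objects into a list of dictionaries."""
    return json.loads(text)


def parse_csv(text):
    """Parse CSV text into a list of dictionaries, converting the count to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # CSV has no types: every value arrives as a string.
        row["stargazers_count"] = int(row["stargazers_count"])
        rows.append(row)
    return rows


print(parse_json(JSON_TEXT))
print(parse_csv(CSV_TEXT))
```

JSON carries its own types (numbers, booleans, nested objects), while CSV yields only strings, which is why the CSV helper converts the count explicitly.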

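Most apps that rely on external data sources rely on API calls.

We will use GitHub's API for this project.

The task: write a program that automatically downloads information about the most-starred Python projects on GitHub and then visualizes it.

GitHub's API lets you request a wide range of information through API calls.

Enter the following address in a browser: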

https://api.github.com/search/repositories?q=language:python&sort=stars

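The browser then shows the following response:

This call returns the number of Python projects currently hosted on GitHub, along with information about the most popular Python repositories.

Let's take the call apart.

The leading https://api.github.com/ directs the request to the part of GitHub that responds to API calls. The next part, search/repositories, tells the API to search all repositories on GitHub.

The question mark after repositories signals that an argument follows. q stands for query, and the equals sign (=) lets us begin specifying the query.

language:python restricts the results to repositories whose primary language is Python.

The final part, &sort=stars, sorts the projects by the number of stars.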
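The pieces described above can also be assembled and taken apart in code. This is a minimal sketch using the standard library's urllib.parse; build_search_url and split_search_url are hypothetical helper names, not part of the book's program.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://api.github.com/search/repositories"


def build_search_url(language, sort="stars"):
    """Assemble the search URL from its parts."""
    # safe=":" keeps the colon in "language:python" readable instead of %3A.
    query = urlencode({"q": f"language:{language}", "sort": sort}, safe=":")
    return f"{BASE_URL}?{query}"


def split_search_url(url):
    """Return the path and the decoded query parameters of a URL."""
    parts = urlsplit(url)
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.path, params


url = build_search_url("python")
print(url)
print(split_search_url(url))
```

Building the query from a dictionary makes it easy to swap in another language or sort order without editing the URL string by hand.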

Now look at the response itself. It is structured data, well suited to processing by a program.

The first few lines look like this:

{
  "total_count": 7065328,
  "incomplete_results": false,
  "items": [
    {
      "id": 83222441,
      "node_id": "MDEwOlJlcG9zaXRvcnk4MzIyMjQ0MQ==",
      "name": "system-design-primer",
      "full_name": "donnemartin/system-design-primer",
      "private": false,

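The first lines show that, as of this writing (16:03 on April 18, 2021), GitHub hosted 7,065,328 Python projects.

The value of "incomplete_results" is false, which tells us the request was successful (the results are not incomplete). Had GitHub been unable to fully process the API request, it would have returned true here.

Next comes the list stored under "items", which contains details about the most popular Python projects on GitHub.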
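The three top-level fields can be read in a few lines once the text is parsed. The sketch below works on a trimmed copy of the excerpt above, so it runs without network access; the real response has many more fields per item.

```python
import json

# A trimmed copy of the response excerpt shown above; the closing brackets
# are added here so that the sample is valid JSON on its own.
SAMPLE = """
{
  "total_count": 7065328,
  "incomplete_results": false,
  "items": [
    {
      "id": 83222441,
      "name": "system-design-primer",
      "full_name": "donnemartin/system-design-primer",
      "private": false
    }
  ]
}
"""

response_dict = json.loads(SAMPLE)

# JSON false becomes Python False; the JSON array becomes a list of dicts.
print(f"Total repositories: {response_dict['total_count']:,}")
print(f"Complete results: {not response_dict['incomplete_results']}")
print(f"First item: {response_dict['items'][0]['full_name']}")
```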

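Installing Requests
The Requests package lets a Python program easily request information from a website and examine the response.

Installing it the way the book suggests failed on my machine: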

python -m pip install --user requests

The error message:

PS D:\user\文檔\python\python_work\data_visualization\download_data> python -m pip install --user requests
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1125)'))': /simple/requests/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1125)'))': /simple/requests/
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1125)'))': /simple/requests/
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1125)'))': /simple/requests/
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1125)'))': /simple/requests/
Could not fetch URL https://pypi.org/simple/requests/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/requests/ (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1125)'))) - skipping
ERROR: Could not find a version that satisfies the requirement requests
ERROR: No matching distribution found for requests
PS D:\user\文檔\python\python_work\data_visualization\download_data>

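Errors like this are a familiar sight: pip cannot complete the SSL connection to pypi.org. Switching to a mirror inside China fixes it.

For example, the following command installs requests successfully, using the Douban mirror: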

pip install requests -i http://pypi.douban.com/simple --trusted-host pypi.douban.com

The output of the successful installation:

PS D:\user\文檔\python\python_work\data_visualization\download_data> pip install requests -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
>>
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: http://pypi.douban.com/simple
Collecting requests
  Downloading http://pypi.doubanio.com/packages/29/c1/24814557f1d22c56d50280771a17307e6bf87b70727d975fd6b2ce6b014a/requests-2.25.1-py2.py3-none-any.whl (61 kB)
     |████████████████████████████████| 61 kB 3.8 MB/s
Collecting certifi>=2017.4.17
  Downloading http://pypi.doubanio.com/packages/5e/a0/5f06e1e1d463903cf0c0eebeb751791119ed7a4b3737fdc9a77f1cdfb51f/certifi-2020.12.5-py2.py3-none-any.whl (147 kB)
     |████████████████████████████████| 147 kB 2.2 MB/s
Collecting idna<3,>=2.5
  Downloading http://pypi.doubanio.com/packages/a2/38/928ddce2273eaa564f6f50de919327bf3a00f091b5baba8dfa9460f3a8a8/idna-2.10-py2.py3-none-any.whl (58 kB)
     |████████████████████████████████| 58 kB 3.8 MB/s
Collecting chardet<5,>=3.0.2
  Downloading http://pypi.doubanio.com/packages/19/c7/fa589626997dd07bd87d9269342ccb74b1720384a4d739a1872bd84fbe68/chardet-4.0.0-py2.py3-none-any.whl (178 kB)
     |████████████████████████████████| 178 kB 3.3 MB/s
Collecting urllib3<1.27,>=1.21.1
  Downloading http://pypi.doubanio.com/packages/09/c6/d3e3abe5b4f4f16cf0dfc9240ab7ce10c2baa0e268989a4e3ec19e90c84e/urllib3-1.26.4-py2.py3-none-any.whl (153 kB)
     |████████████████████████████████| 153 kB 3.2 MB/s
Installing collected packages: urllib3, idna, chardet, certifi, requests
  WARNING: The script chardetect.exe is installed in 'C:\Users\m1521\AppData\Roaming\Python\Python38\Scripts' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed certifi-2020.12.5 chardet-4.0.0 idna-2.10 requests-2.25.1 urllib3-1.26.4
PS D:\user\文檔\python\python_work\data_visualization\download_data>

So if you are just starting out and cannot install requests (the same applies to any other Python package), try a pip mirror inside China.
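One quick way to confirm that an installation worked is to ask Python whether it can locate the package. The helper below is a hypothetical convenience, not something from the book; it relies only on importlib.util.find_spec from the standard library.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# "json" ships with Python, so this prints True on any installation.
print(is_installed("json"))
# Prints True once the installation of requests has succeeded.
print(is_installed("requests"))
```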

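Processing an API Response

Next we write a program that issues the API call automatically and finds the most-starred Python projects on GitHub:

pythonrepos.py

Note: repo is short for repository.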
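The listing for this first program does not appear in this post. The sketch below is an assumption modeled on the later listings: it makes the call and reports the status code and the top-level keys of the response. The function names are hypothetical, and requests is imported inside fetch_top_python_repos() only so that the formatting helper can be tried without network access.

```python
URL = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
HEADERS = {'Accept': 'application/vnd.github.v3+json'}


def describe_response(status_code, response_dict):
    """Return the two report lines for a status code and a parsed response."""
    return [
        f"Status code: {status_code}",
        str(response_dict.keys()),
    ]


def fetch_top_python_repos():
    """Make the API call and return (status_code, parsed JSON)."""
    import requests  # third-party package installed earlier

    r = requests.get(URL, headers=HEADERS, timeout=10)
    return r.status_code, r.json()


def main():
    """Fetch the live response and print the report lines."""
    status_code, response_dict = fetch_top_python_repos()
    for line in describe_response(status_code, response_dict):
        print(line)
```

Calling main() performs the live request; describe_response() can be exercised on its own with any dictionary.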

A problem I ran into

  File "D:\Program Files\Python38\lib\ssl.py", line 997, in _create
    raise ValueError("check_hostname requires server_hostname")
ValueError: check_hostname requires server_hostname

For the fix, see my other post: "Fixing ValueError: check_hostname requires server_hostname in Python".

After fixing the bug as described there, the program produces this output:

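Status code: 200
dict_keys(['total_count', 'incomplete_results', 'items'])

A status code of 200 tells us the request succeeded. The response dictionary contains only three keys: 'total_count', 'incomplete_results', and 'items'.

Working with the Response Dictionary

With the information from the API call stored in a dictionary, we can work with the data it contains. We start by printing some of it to check that it is what we expect, so that mistakes do not carry over into the visualization later.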

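# Import the requests module
import requests

# Make an API call and store the response
# URL of the API call
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
# At the time of writing the latest GitHub API version is 3,
# so the headers explicitly ask for that version of the API
headers = {'Accept': 'application/vnd.github.v3+json'}
r = requests.get(url, headers=headers)
print(f"Status code: {r.status_code}")

# Assign the API response to a variable
# and print the total number of repositories
response_dict = r.json()
print(f"Total repositories: {response_dict['total_count']}")

# Explore information about the repositories
# 'items' is a list of dictionaries; each dictionary describes one Python repository.
# Store the whole list of dictionaries in repo_dicts
repo_dicts = response_dict['items']
print(f"Repositories returned: {len(repo_dicts)}")

# Examine the first repository
repo_dict = repo_dicts[0]  # the first dictionary in the list
print(f"\nKeys: {len(repo_dict)}")  # number of keys in this dictionary
# Print all the keys
for key in sorted(repo_dict.keys()):
    print(key)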

Output:

Status code: 200
Total repositories: 7066491
Repositories returned: 30

Keys: 74
archive_url
archived
assignees_url
blobs_url
branches_url
clone_url
collaborators_url
comments_url
commits_url
compare_url
contents_url
contributors_url
created_at
default_branch
deployments_url
description
disabled
downloads_url
events_url
fork
forks
forks_count
forks_url
full_name
git_commits_url
git_refs_url
git_tags_url
git_url
has_downloads
has_issues
has_pages
has_projects
has_wiki
homepage
hooks_url
html_url
id
issue_comment_url
issue_events_url
issues_url
keys_url
labels_url
language
languages_url
license
merges_url
milestones_url
mirror_url
name
node_id
notifications_url
open_issues
open_issues_count
owner
private
pulls_url
pushed_at
releases_url
score
size
ssh_url
stargazers_count
stargazers_url
statuses_url
subscribers_url
subscription_url
svn_url
tags_url
teams_url
trees_url
updated_at
url
watchers
watchers_count

These keys give a rough idea of what information can be extracted about a project.

Next we pull out specific details for one project, the most popular repository:

# Import the requests module
import requests

# Make an API call and store the response
# URL of the API call
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
# At the time of writing the latest GitHub API version is 3,
# so the headers explicitly ask for that version of the API
headers = {'Accept': 'application/vnd.github.v3+json'}
r = requests.get(url, headers=headers)
print(f"Status code: {r.status_code}")

# Assign the API response to a variable
# and print the total number of repositories
response_dict = r.json()
print(f"Total repositories: {response_dict['total_count']}")

# Explore information about the repositories
# 'items' is a list of dictionaries; each dictionary describes one Python repository.
# Store the whole list of dictionaries in repo_dicts
repo_dicts = response_dict['items']
print(f"Repositories returned: {len(repo_dicts)}")

# Examine the first repository
repo_dict = repo_dicts[0]  # the first dictionary in the list

print("\nSelected information about the first repository:")
print(f"Name: {repo_dict['name']}")
print(f"Owner: {repo_dict['owner']['login']}")
print(f"Star: {repo_dict['stargazers_count']}")
print(f"Repository: {repo_dict['html_url']}")
print(f"Created: {repo_dict['created_at']}")
print(f"Updated: {repo_dict['updated_at']}")
print(f"Description: {repo_dict['description']}")

Running the program prints information about the first project:

Status code: 200
Total repositories: 7070180
Repositories returned: 30

Selected information about the first repository:
Name: system-design-primer
Owner: donnemartin
Star: 126971
Repository: https://github.com/donnemartin/system-design-primer
Created: 2017-02-26T16:15:28Z
Updated: 2021-04-19T02:28:44Z
Description: Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

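The output shows that the repository is named system-design-primer, its owner is donnemartin, 126971 users have starred it, and so on.

Following the repository URL to GitHub shows that it is about designing large-scale systems. It covers a lot of ground and is clearly organized, which explains why it was the most-starred Python project on GitHub at the time.

Project link: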

https://github.com/donnemartin/system-design-primer/blob/master/README-zh-Hans.md

Summarizing the Top Repositories

To get the same information for the other repositories, we process them in a loop.

# Import the requests module
import requests

# Make an API call and store the response
# URL of the API call
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
# At the time of writing the latest GitHub API version is 3,
# so the headers explicitly ask for that version of the API
headers = {'Accept': 'application/vnd.github.v3+json'}
r = requests.get(url, headers=headers)
print(f"Status code: {r.status_code}")

# Assign the API response to a variable
# and print the total number of repositories
response_dict = r.json()
print(f"Total repositories: {response_dict['total_count']}")

# Explore information about the repositories
# 'items' is a list of dictionaries; each dictionary describes one Python repository.
# Store the whole list of dictionaries in repo_dicts
repo_dicts = response_dict['items']
print(f"Repositories returned: {len(repo_dicts)}")

# Print selected information for every repository in the list
print("\nSelected information about each repository:")
for repo_dict in repo_dicts:
    print(f"\nName: {repo_dict['name']}")
    print(f"Owner: {repo_dict['owner']['login']}")
    print(f"Star: {repo_dict['stargazers_count']}")
    print(f"Repository: {repo_dict['html_url']}")
    print(f"Description: {repo_dict['description']}")

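Looping over the whole list of dictionaries prints the information for every project. The output is long, so it was captured as a screenshot rather than pasted here:

If any project in the output looks interesting, feel free to open it, but there is no rush: we visualize these projects next.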
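The loop reads each key directly. A small sketch of the same extraction as a reusable function is shown below; summarize_repo and the sample dictionary are hypothetical. It assumes that description can be null for a repository without a description, so it substitutes a placeholder in that case.

```python
def summarize_repo(repo_dict):
    """Pick out the fields printed in the loop from one repository dictionary."""
    return {
        'name': repo_dict['name'],
        'owner': repo_dict['owner']['login'],  # nested dictionary lookup
        'stars': repo_dict['stargazers_count'],
        'url': repo_dict['html_url'],
        # Fall back to a placeholder when the description is None or empty.
        'description': repo_dict.get('description') or '(no description)',
    }


# Sample values based on the first repository's output; the description
# is set to None here on purpose to exercise the fallback.
sample = {
    'name': 'system-design-primer',
    'owner': {'login': 'donnemartin'},
    'stargazers_count': 126971,
    'html_url': 'https://github.com/donnemartin/system-design-primer',
    'description': None,
}

for field, value in summarize_repo(sample).items():
    print(f"{field}: {value}")
```

Returning a flat dictionary keeps the printing code and the later charting code independent of the nested layout of the API response.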

Monitoring API Rate Limits

Enter https://api.github.com/rate_limit in a browser.

The response looks like this:

{
  "resources": {
    "core": {"limit": 60, "remaining": 59, "reset": 1618820589, "used": 1},
    "graphql": {"limit": 0, "remaining": 0, "reset": 1618820666, "used": 0},
    "integration_manifest": {"limit": 5000, "remaining": 5000, "reset": 1618820666, "used": 0},
    "search": {"limit": 10, "remaining": 10, "reset": 1618817126, "used": 0}
  },
  "rate": {"limit": 60, "remaining": 59, "reset": 1618820589, "used": 1}
}

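The part we care about is the rate limit for the search API, found under "search". limit tells us the limit is 10 requests per minute, and remaining tells us that all 10 are still available for the current minute. The reset value is the time at which the quota will reset, given as Unix or epoch time (the number of seconds since midnight on January 1, 1970). Once the quota is used up, you get a short response saying you have reached the API limit, and you must wait for the quota to reset.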
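The reset value is easier to read once it is converted to a date. The sketch below uses the "search" entry from the response above; reset_time_utc and seconds_to_wait are hypothetical helpers, and the optional now parameter exists only so the calculation can be tried with a fixed time.

```python
import json
import time
from datetime import datetime, timezone

# The "search" entry from the rate-limit response above.
SEARCH_LIMIT = json.loads(
    '{"limit": 10, "remaining": 10, "reset": 1618817126, "used": 0}'
)


def reset_time_utc(limit_info):
    """Convert the Unix timestamp in 'reset' to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(limit_info['reset'], tz=timezone.utc)


def seconds_to_wait(limit_info, now=None):
    """Return 0 if requests remain, else the seconds left until the reset."""
    if limit_info['remaining'] > 0:
        return 0
    now = time.time() if now is None else now
    return max(0, limit_info['reset'] - now)


print(reset_time_utc(SEARCH_LIMIT).isoformat())
print(seconds_to_wait(SEARCH_LIMIT))
```

A program that makes many search calls could sleep for seconds_to_wait(...) before retrying instead of failing when the quota runs out.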

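Visualizing Repositories Using Plotly

With the information about each repository in hand, we can build a bar chart in which the height of each bar shows how many stars a project has.

Code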

python_repos_visual.py

import requests
# Import the Bar class and the offline module
from plotly.graph_objs import Bar
from plotly import offline

# Make an API call and store the response
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
headers = {'Accept': 'application/vnd.github.v3+json'}
r = requests.get(url, headers=headers)
print(f"Status code: {r.status_code}")

# Process the results
response_dict = r.json()
repo_dicts = response_dict['items']
repo_names, stars = [], []  # two empty lists for the data shown in the chart
for repo_dict in repo_dicts:
    repo_names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

# Make the visualization
# data holds one dictionary that sets the chart type; x is the project names, y the star counts
data = [{
    'type': 'bar',
    'x': repo_names,
    'y': stars,
}]

# Define the layout as a dictionary: the chart title and the axis labels
my_layout = {
    'title': 'Most Popular Python Projects on GitHub',
    'xaxis': {'title': 'Repository'},
    'yaxis': {'title': 'Stars'},
}

fig = {'data': data, 'layout': my_layout}
offline.plot(fig, filename='python_repos.html')

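Refining Plotly Charts

In dictionaries such as data and my_layout, all kinds of styling can be specified as key-value pairs.

Below we customize the bars by modifying the data dictionary. The marker settings control the bars' appearance: we give the bars a custom blue color with a dark gray outline 1.5 pixels wide, and set the opacity of the bars to 0.6 so the chart is less overpowering.

The changes to data: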

# Make the visualization
data = [{
    'type': 'bar',
    'x': repo_names,
    'y': stars,
    'marker': {
        'color': 'rgb(60, 100, 150)',
        'line': {'width': 1.5, 'color': 'rgb(25, 25, 25)'},
    },
    'opacity': 0.6,
}]

The full code:

import requests
from plotly.graph_objs import Bar
from plotly import offline

# Make an API call and store the response
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
headers = {'Accept': 'application/vnd.github.v3+json'}
r = requests.get(url, headers=headers)
print(f"Status code: {r.status_code}")

# Process the results
response_dict = r.json()
repo_dicts = response_dict['items']
repo_names, stars = [], []
for repo_dict in repo_dicts:
    repo_names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

# Make the visualization
data = [{
    'type': 'bar',
    'x': repo_names,
    'y': stars,
    'marker': {
        'color': 'rgb(60, 100, 150)',
        'line': {'width': 1.5, 'color': 'rgb(25, 25, 25)'},
    },
    'opacity': 0.6,
}]

my_layout = {
    'title': 'Most Popular Python Projects on GitHub',
    'xaxis': {'title': 'Repository'},
    'yaxis': {'title': 'Stars'},
}

fig = {'data': data, 'layout': my_layout}
offline.plot(fig, filename='python_repos.html')

Next we modify my_layout:

my_layout = {
    'title': 'Most Popular Python Projects on GitHub',
    'titlefont': {'size': 28},  # font size of the chart title
    'xaxis': {
        'title': 'Repository',
        'titlefont': {'size': 24},  # font size of the axis label
        'tickfont': {'size': 14},   # font size of the tick labels
    },
    'yaxis': {
        'title': 'Stars',
        'titlefont': {'size': 24},
        'tickfont': {'size': 14},
    },
}

With the font sizes of the title and the axes set, the chart looks better:

Runnable code

import requests
from plotly.graph_objs import Bar
from plotly import offline

# Make an API call and store the response
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
headers = {'Accept': 'application/vnd.github.v3+json'}
r = requests.get(url, headers=headers)
print(f"Status code: {r.status_code}")

# Process the results
response_dict = r.json()
repo_dicts = response_dict['items']
repo_names, stars = [], []
for repo_dict in repo_dicts:
    repo_names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

# Make the visualization
data = [{
    'type': 'bar',
    'x': repo_names,
    'y': stars,
    'marker': {
        'color': 'rgb(60, 100, 150)',
        'line': {'width': 1.5, 'color': 'rgb(25, 25, 25)'},
    },
    'opacity': 0.6,
}]

my_layout = {
    'title': 'Most Popular Python Projects on GitHub',
    'titlefont': {'size': 28},  # font size of the chart title
    'xaxis': {
        'title': 'Repository',
        'titlefont': {'size': 24},  # font size of the axis label
        'tickfont': {'size': 14},   # font size of the tick labels
    },
    'yaxis': {
        'title': 'Stars',
        'titlefont': {'size': 24},
        'tickfont': {'size': 14},
    },
}

fig = {'data': data, 'layout': my_layout}
offline.plot(fig, filename='python_repos.html')
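Later Plotly releases mark the titlefont key as deprecated in favor of a nested title dictionary with text and font entries. The sketch below builds an equivalent layout in that form; make_layout is a hypothetical helper, and it only constructs a plain dictionary, so it runs without Plotly installed. Whether a given Plotly version still accepts titlefont is worth checking against its documentation.

```python
def make_layout(title, x_label, y_label):
    """Build a layout dict using the nested title form (text plus font)."""
    def axis(label):
        return {
            'title': {'text': label, 'font': {'size': 24}},  # axis label size
            'tickfont': {'size': 14},                         # tick label size
        }

    return {
        'title': {'text': title, 'font': {'size': 28}},  # chart title size
        'xaxis': axis(x_label),
        'yaxis': axis(y_label),
    }


my_layout = make_layout('Most Popular Python Projects on GitHub',
                        'Repository', 'Stars')
print(my_layout)
```

The resulting dictionary can be used in place of my_layout in fig = {'data': data, 'layout': my_layout}.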

end
