
32. Reading HTML page tables with Pandas and a Python crawler, and saving them to an Excel file

Published: 2023/12/8 · HTML · 豆豆

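Goal:

  • NetEase Youdao Dictionary (網易有道詞典) can look up English words and add them to a word book (單詞本);
  • It currently offers no way to export the whole word list. To make review easier, we crawl the entire list and save it to Excel.

Technologies used:

  • Pandas: a widely used Python library for data processing and analysis; here pd.read_html parses the page tables and to_excel writes the result
  • Python crawler: downloads the web pages so they can be parsed, implemented with the requests library; the word book requires a login, which is handled by reusing the browser's login cookies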
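import json
import time

import pandas as pd
import requests
import requests.cookies

# pd.read_html needs lxml (or beautifulsoup4 + html5lib); writing .xlsx with to_excel needs openpyxl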

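0. Processing flow

Input page: the Youdao Dictionary word book (有道詞典-單詞本)

Processing flow: log in and export the cookies, download every word-list page with requests, parse the tables with Pandas, and write the merged table to Excel.

Output: the data ends up in an Excel file that is easy to print for review.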

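1. Log in to the PC (web) version of NetEase Youdao Dictionary by scanning the WeChat QR code, then copy the cookies into a file

  • PC version: http://dict.youdao.com/
  • The Chrome extension EditThisCookie can copy the cookies in JSON format: http://www.editthiscookie.com/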
cookie_jar = requests.cookies.RequestsCookieJar()
# cookie.txt holds the JSON exported by EditThisCookie: a list of cookie objects
with open("./course_datas/c32_read_html/cookie.txt", encoding="utf-8") as fin:
    cookiejson = json.loads(fin.read())
for cookie in cookiejson:
    cookie_jar.set(
        name=cookie["name"],
        value=cookie["value"],
        domain=cookie["domain"],
        path=cookie["path"],
    )
cookie_jar

Displaying cookie_jar shows a RequestsCookieJar holding the login cookies: DICT_LOGIN, DICT_PERS, DICT_SESS, DICT_UGC, JSESSIONID, OUTFOX_SEARCH_USER_ID and OUTFOX_SEARCH_USER_ID_NCOO for the domain .youdao.com, plus ACCSESSIONID and ___rl__test__cookies for dict.youdao.com.
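
When you send many requests, a requests.Session can carry the cookies for you instead of passing cookies=cookie_jar on every call. The sketch below wraps the loading code in two helpers; the names load_cookie_jar and make_session are hypothetical, and the file is assumed to be the EditThisCookie JSON list with name, value, domain and path keys.

```python
import json

import requests
from requests.cookies import RequestsCookieJar


def load_cookie_jar(path):
    """Build a RequestsCookieJar from a JSON cookie export (hypothetical helper).

    Assumes the file holds a JSON list of objects with at least the keys
    "name", "value", "domain" and "path".
    """
    with open(path, encoding="utf-8") as fin:
        cookies = json.load(fin)
    jar = RequestsCookieJar()
    for cookie in cookies:
        # Keep domain and path so each cookie is only sent where it belongs
        jar.set(
            cookie["name"],
            cookie["value"],
            domain=cookie["domain"],
            path=cookie["path"],
        )
    return jar


def make_session(path):
    """Return a Session that sends the exported cookies with every request."""
    session = requests.Session()
    session.cookies.update(load_cookie_jar(path))
    return session
```

Usage, under the same assumptions: session = make_session("./course_datas/c32_read_html/cookie.txt"), then session.get(url) in place of requests.get(url, cookies=cookie_jar).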

2. Download all the HTML pages and store them in a list

htmls = []
url = "http://dict.youdao.com/wordbook/wordlist?p={idx}&tags="
# The word book holds 86 words at 15 per page, so pages p=0 to p=5 cover everything
for idx in range(6):
    time.sleep(1)  # pause between requests so as not to hammer the server
    print("**Crawling page %d" % idx)
    r = requests.get(url.format(idx=idx), cookies=cookie_jar)
    r.raise_for_status()  # stop on an HTTP error instead of storing an error page
    htmls.append(r.text)

**Crawling page 0
**Crawling page 1
**Crawling page 2
**Crawling page 3
**Crawling page 4
**Crawling page 5

htmls[0] now holds the full source of the first page (title 有道單詞本). Apart from navigation links, scripts and the card-mode view, it contains two tables: one with only the header row, and one with the word rows themselves (words 1 to 15 on this page). The page footer reports 86 words in total.
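
Hard-coding range(6) breaks as soon as the word book grows. A minimal sketch, assuming the footer markup 共計 <strong>86</strong> 個單詞 seen in the downloaded page and 15 words per page: read the total from the first page and derive the page count. count_pages is a hypothetical helper; the copy of the page above is shown in traditional characters while the live page may use simplified ones, so the pattern accepts both.

```python
import math
import re

# Assumed footer markup, e.g. "共計 <strong>86</strong> 個單詞";
# each character class accepts both the traditional and simplified form.
TOTAL_RE = re.compile(r"共[計计]\s*<strong>\s*(\d+)\s*</strong>\s*[個个][單单][詞词]")


def count_pages(first_page_html, per_page=15):
    """Return how many p=... pages the word book has (hypothetical helper).

    per_page=15 is an assumption based on page 1 listing words 1 to 15.
    """
    match = TOTAL_RE.search(first_page_html)
    if match is None:
        raise ValueError("word count not found; is the login cookie still valid?")
    total = int(match.group(1))
    return math.ceil(total / per_page)
```

With this in place you could fetch p=0 first and then loop over range(1, count_pages(first_html)) for the remaining pages.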

3. Parse the tables in the pages with Pandas

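df = pd.read_html(htmls[0])  # a list of DataFrames, one per <table> in the page
# (pandas >= 2.1 deprecates passing literal HTML here; wrap it as io.StringIO(htmls[0]))
print(len(df))
print(type(df))

2
<class 'list'>

df[0].head(3)

df[0] is the header table: it has the columns 序號 (No.), 單詞 (word), 音標 (phonetic), 解釋 (definition), 時間 (date added), 分類 (category) and 操作 (actions), but no rows.

df[1].head(3)

   0  1              2            3                                 4          5    6
0  1  agglomerative  NaN          adj. 會凝聚的;[冶] 燒結的,凝結的  2020-1-13  NaN  NaN
1  2  anatomy        [ə'nætəmɪ]   n. 解剖;解剖學;剖析;骨骼          2017-7-17  NaN  NaN
2  3  backbone       ['bækbəʊn]   n. 支柱;主干網;決心,毅力;脊椎    2017-7-13  NaN  NaN

df[1] holds the word rows, but its columns are only numbered 0 to 6.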
df_cont = df[1]
df_cont.columns = df[0].columns  # take the column names from the header table
df_cont.head(3)

   序號  單詞           音標         解釋                              時間       分類  操作
0  1     agglomerative  NaN          adj. 會凝聚的;[冶] 燒結的,凝結的  2020-1-13  NaN   NaN
1  2     anatomy        [ə'nætəmɪ]   n. 解剖;解剖學;剖析;骨骼          2017-7-17  NaN   NaN
2  3     backbone       ['bækbəʊn]   n. 支柱;主干網;決心,毅力;脊椎    2017-7-13  NaN   NaN
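
The column trick above relies on the page splitting its word list into a header table and a body table. The sketch below isolates that step in a hypothetical helper attach_header, checks that both tables have the same width, and works on a copy so the list returned by read_html is left untouched. The two-table layout is an assumption taken from this particular page.

```python
import pandas as pd


def attach_header(tables):
    """Give the word-list table the column names of the separate header table.

    Assumes the layout seen above: tables[0] holds only the header row,
    tables[1] holds the data rows with default integer column labels.
    """
    header, body = tables[0], tables[1]
    if len(header.columns) != len(body.columns):
        raise ValueError("header and body tables have different widths")
    body = body.copy()  # avoid mutating the caller's DataFrame
    body.columns = header.columns
    return body
```

In the loop over pages, df_cont = attach_header(pd.read_html(html)) would replace the two assignment lines.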
# Collect the word table from each of the 6 pages
df_list = []
for html in htmls:
    df = pd.read_html(html)
    df_cont = df[1]
    df_cont.columns = df[0].columns
    df_list.append(df_cont)

# Merge the tables; ignore_index=True numbers the rows continuously instead of restarting at 0 for every page
df_all = pd.concat(df_list, ignore_index=True)
df_all.head(3)

The first three rows are the same as in df_cont.head(3) above.

df_all.shape

(86, 7)

That is 86 rows, one per word in the word book, and 7 columns.
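
Some columns carry nothing useful: in the rows shown above, 分類 and 操作 are always NaN. The sketch below is a hypothetical combine_pages helper that merges the page tables with a continuous index and then drops any column that is empty on every page. It assumes no word was ever tagged with a category; a column that has at least one value is kept.

```python
import pandas as pd


def combine_pages(page_tables):
    """Stack per-page tables into one DataFrame (a sketch).

    ignore_index=True renumbers rows 0..n-1; dropna(axis=1, how="all")
    then removes columns whose every cell is missing.
    """
    df_all = pd.concat(page_tables, ignore_index=True)
    return df_all.dropna(axis=1, how="all")
```

Because how="all" only drops fully empty columns, 音標 survives even though some words have no phonetic entry.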

4. Write the result data to an Excel file

df_all[["單詞", "音標", "解釋"]].to_excel("./course_datas/c32_read_html/網易有道單詞本列表.xlsx", index=False)
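
If you later merge this list with other exports, repeated words can creep in. The sketch below is a hypothetical review_sheet helper (not part of the original) that keeps the three study columns 單詞, 音標 and 解釋, drops repeated words and sorts case-insensitively; it returns a DataFrame that you would then pass to to_excel as above.

```python
import pandas as pd


def review_sheet(df_all, columns=("單詞", "音標", "解釋")):
    """Select the printable columns, drop repeated words, sort A-Z (sketch).

    The column names come from the word-book header; de-duplication and
    sorting are illustrative additions.
    """
    sheet = df_all[list(columns)].drop_duplicates(subset=columns[0])
    # key= lowercases the words only for comparison; the stored values are unchanged
    sheet = sheet.sort_values(columns[0], key=lambda s: s.str.lower())
    return sheet.reset_index(drop=True)
```

Usage, assuming openpyxl is installed: review_sheet(df_all).to_excel("./course_datas/c32_read_html/網易有道單詞本列表.xlsx", index=False).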
