32. Using Pandas with a Python crawler to read HTML tables from a web page and save them to an Excel file
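Goal:
- NetEase Youdao Dictionary can be used to look up English words, and the words you look up can be added to a wordbook;
- There is currently no feature for exporting the whole word list. To make review easier, we can crawl every page of the word list and save the words to Excel.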
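Technologies involved:
- Pandas: one of the most powerful Python libraries for data processing and data analysis
- Python crawler: downloads the web pages and then parses them; implemented with the requests library, and it has to get past the login check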
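0. Workflow
Input page: Youdao Dictionary, wordbook
Processing flow: download the wordbook pages, parse their tables with Pandas, write the result to Excel
Result: the data in an Excel file (convenient for printing and review)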
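1. Log in to the PC version of NetEase Youdao Dictionary by scanning the QR code with WeChat, then copy the cookies to a file
- PC version: http://dict.youdao.com/
- A Chrome extension can copy the cookies in JSON format: http://www.editthiscookie.com/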
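The download code in step 2 passes a variable named `cookie_jar` to `requests.get`, but how it is created is not shown. The following is a minimal sketch of one way to build it, assuming the extension's export is a JSON list of objects that each carry a `name` and a `value` key. The function name `load_cookies` is hypothetical and not from the original article.

```python
import json


def load_cookies(path):
    """Read cookies exported as JSON and return a {name: value} dict."""
    with open(path, encoding="utf-8") as f:
        exported = json.load(f)
    # Assumption: the export is a JSON list of objects, each with
    # "name" and "value" keys; other keys (domain, path, ...) are ignored.
    return {item["name"]: item["value"] for item in exported}
```

For example, `cookie_jar = load_cookies("cookies.json")` (the file name is a placeholder) gives a plain dict. The `cookies=` argument of `requests.get` accepts a dict as well as a CookieJar object, so the dict can be passed as is.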
2. Download all the HTML pages and store them in a list
    import time
    import requests

    htmls = []
    url = "http://dict.youdao.com/wordbook/wordlist?p={idx}&tags="
    # cookie_jar holds the login cookies copied in step 1
    for idx in range(6):
        time.sleep(1)  # wait one second between requests
        print("**Fetching data: page %d" % idx)
        r = requests.get(url.format(idx=idx), cookies=cookie_jar)
        htmls.append(r.text)

Output:

    **Fetching data: page 0
    **Fetching data: page 1
    **Fetching data: page 2
    **Fetching data: page 3
    **Fetching data: page 4
    **Fetching data: page 5

    htmls[0]

htmls[0] is the complete HTML of the first page. It contains two tables: one inside div#wordhead with only the column headers, and one inside div#wordlist with the 15 word rows of that page. The footer reports 86 words in total. An abridged view, with navigation, styles, scripts, card mode and the edit form omitted ([...] marks the omissions):

    <!doctype html>
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <title>有道單詞本</title>
    [...]
    </head>
    <body>
    [...]
    <div id="listmode">
      <div id="wordhead">
        <table width="100%" style="table-layout:fixed;background:#fff;">
          <tr>
            <th width="50px">序號</th>
            <th width="80px">單詞</th>
            <th width="80px">音標</th>
            <th width="320px">解釋</th>
            <!-- <th width="50px">難度</th> -->
            <th width="85px">時間</th>
            <th>分類</th>
            <th width="65px">操作</th>
          </tr>
        </table>
      </div>
      <div id="wordlist" >
        <table width="100%" style="table-layout:fixed">
          <tbody>
            <tr>
              <td width="50px"> 1</td>
              <td width="80px"><div class="word" title="agglomerative"><a href="/search?keyfrom=webwordbook&q=agglomerative" target="_blank"><strong>agglomerative</strong></a></div></td>
              <td width="80px"><div class="phonetic" title=""></div></td>
              <td width="320px">
                <div class="desc" title="adj. 會凝聚的;[冶] 燒結的,凝結的">adj. 會凝聚的;[冶] 燒結的,凝結的</div>
              </td>
              <!-- <td width="50px">[...]</td> -->
              <td width="85px">2020-1-13</td>
              <td ><div class="tags" title=""></div></td>
              <td width="65px" style="vertical-align:middle;">[... edit and delete links ...]</td>
            </tr>
            [... rows 2 to 15 follow the same pattern ...]
          </tbody>
        </table>
      </div>
      <div id="wordfoot" >
        [... pagination links ...]
        <div class="right" >當前分類:<strong> 全部分類 </strong> 共計 <strong>86</strong> 個單詞 </div>
        <div class="clear"></div>
      </div>
    </div>
    [...]
    </body>
    </html>

3. Parse the tables in the page with Pandas
    import pandas as pd

    df = pd.read_html(htmls[0])
    print(len(df))
    print(type(df))

Output:

    2
    <class 'list'>

read_html returns a list with one DataFrame per table on the page. df[0] is parsed from the header table, so it carries the column names but no word rows. The word rows come from the second table:

    df[1].head(3)

Each row shows, from left to right, the DataFrame index followed by the page's seven columns: 序號 (number), 單詞 (word), 音標 (phonetic), 解釋 (definition), 時間 (date added), 分類 (category) and 操作 (actions).

| 0 | 1 | agglomerative | NaN | adj. 會凝聚的;[冶] 燒結的,凝結的 | 2020-1-13 | NaN | NaN |
| 1 | 2 | anatomy | [ə'nætəmɪ] | n. 解剖;解剖學;剖析;骨骼 | 2017-7-17 | NaN | NaN |
| 2 | 3 | backbone | ['bækbəʊn] | n. 支柱;主干網;決心,毅力;脊椎 | 2017-7-13 | NaN | NaN |
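The export in step 4 uses a DataFrame named `df_all`, but the code that builds it does not appear in this text. The following is a minimal sketch of one way to build it, assuming every downloaded page has the same layout as the first one: table 0 holds only the column names and table 1 holds the word rows. The function name `wordbook_pages_to_frame` is hypothetical, and wrapping the HTML in `StringIO` is a choice made here because recent pandas versions deprecate passing a literal HTML string to `read_html`.

```python
from io import StringIO

import pandas as pd


def wordbook_pages_to_frame(htmls):
    """Parse each downloaded page and stack the word rows into one DataFrame."""
    frames = []
    for html in htmls:
        # read_html accepts a file-like object, so wrap the page text.
        tables = pd.read_html(StringIO(html))
        # Assumption: table 0 carries only the headers, table 1 the rows.
        header_table, word_table = tables[0], tables[1]
        word_table.columns = header_table.columns
        frames.append(word_table)
    # ignore_index gives the merged table one continuous 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```

With the list from step 2, `df_all = wordbook_pages_to_frame(htmls)` would give one table with a row per word and the page's own column names, which is the shape the export step expects.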
4. Write the result data to an Excel file
    # df_all is the combined word table of all downloaded pages
    df_all[["單詞", "音標", "解釋"]].to_excel("./course_datas/c32_read_html/網易有道單詞本列表.xlsx", index=False)
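`to_excel` does not create missing folders, so the call fails if `./course_datas/c32_read_html/` does not exist yet. A small sketch of creating the folder first; the helper name `prepare_output_path` is hypothetical and not part of the original article.

```python
from pathlib import Path


def prepare_output_path(path):
    """Create the parent folder of `path` if needed and return it as a Path."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    return out


# Usage with the DataFrame from this step (writing .xlsx also needs an
# Excel writer package such as openpyxl to be installed):
# out = prepare_output_path("./course_datas/c32_read_html/網易有道單詞本列表.xlsx")
# df_all[["單詞", "音標", "解釋"]].to_excel(out, index=False)
```

Returning a `Path` keeps the call a one-liner, since `to_excel` accepts path objects as well as strings.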