Two Crawler-Based Projects: Kiwix and ArchiveBox
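In an earlier post, "On the Past, Present, and Future of Web Crawlers", I mentioned that crawler technology was once a cornerstone of the internet and is still an important part of internet technology today. In the future, PC services and mobile services will diverge in function: mobile will focus on features tied to the everyday life of the general public (shopping, social networking, entertainment, and so on), while the PC will return to its essence, which is being a tool.
While reading forum posts recently, I came across two open-source projects built on crawlers. They still carry traces of the PC era, but I think both are meaningful because, to some degree, they focus on the PC as a tool. There is also very little written about these two projects in China, so I would like to share them here.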
1. Kiwix
Kiwix's official tagline is: "Store Wikipedia or any website on your mobile phone or computer, easily."
The project started out as a way to read Wikipedia offline and was later extended to offline access for other mainstream sites, such as Project Gutenberg, Stack Exchange, YouTube, and TED Talks.
The core technical idea of the project is direct and simple:
Server periodically crawls the latest site content and converts it to a specific format (zimfile)
→ User downloads the offline content while a network is available
→ User can view the offline content even without a network
The zimfile-related code is written in C++, while the crawler part is implemented in Python.
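The three steps above can be sketched in a few lines of Python. This is only an illustration of the workflow, not Kiwix's code: a zip file stands in for the real ZIM format, and the function names (`fetch_page`, `pack_snapshot`, `read_offline`) and the example URL are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlopen


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Server side, step 1: download one page (needs network access)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def entry_name(url: str) -> str:
    """Map a URL to a safe entry name inside the archive."""
    return quote(url, safe="") + ".html"


def pack_snapshot(pages: dict[str, str], archive_path: Path) -> None:
    """Server side, step 2: bundle crawled pages into one compressed file.

    Assumption: a zip archive is used here as a stand-in for a zimfile.
    """
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for url, html in pages.items():
            zf.writestr(entry_name(url), html)


def read_offline(archive_path: Path, url: str) -> str:
    """Client side: read a page from the downloaded file, no network needed."""
    with zipfile.ZipFile(archive_path) as zf:
        return zf.read(entry_name(url)).decode("utf-8")


if __name__ == "__main__":
    # Pages are given inline so the demo runs without a network;
    # a real crawler would fill this dict by calling fetch_page().
    pages = {"https://example.org/wiki/Kiwi": "<h1>Kiwi</h1>"}
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "snapshot.zip"
        pack_snapshot(pages, archive)
        print(read_offline(archive, "https://example.org/wiki/Kiwi"))
```

The point of the single-file package is that it can be downloaded once, copied around freely, and read later without any server.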
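It looks simple, but there are many technical challenges. For example, a search on Wikipedia returns quickly thanks to tools such as ES (Elasticsearch), but we can hardly install ES on every user's computer: doing so would cost far too much in both speed and data size. If you look through Kiwix's repositories on GitHub, you will find a large number of related projects. It is a large software project through and through!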
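To make the search problem concrete, here is a minimal sketch of an inverted index in pure Python. It is not how Kiwix implements search; it only illustrates the idea of an index that can be built once on the server and shipped with the content instead of running a search server on the user's machine. The sample documents are made up.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document titles that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for title, body in docs.items():
        for token in tokenize(title + " " + body):
            index[token].add(title)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the titles that contain every query token, sorted by name."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


if __name__ == "__main__":
    docs = {
        "Kiwi": "A flightless bird from New Zealand",
        "Kiwifruit": "An edible berry",
        "Emu": "A large flightless bird from Australia",
    }
    index = build_index(docs)
    print(search(index, "flightless bird"))  # ['Emu', 'Kiwi']
    print(search(index, "kiwi"))             # ['Kiwi']
```

A real offline reader also needs ranking, stemming, and a compact on-disk layout, which is where much of the engineering effort goes.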
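As a programmer, my first reaction to this project was that it is a godsend for the many fellow programmers who can only work offline. (Some projects must be developed in a closed environment for security reasons and cannot connect to the internet!)
In fact, the project's goal is far more ambitious: to make it easy to spread knowledge and culture to the parts of the world that have no access to the internet!
No access to the internet? That is no joke. The figure below shows the share of the population with internet access in different parts of the world in 2017:
In Africa, many places have no internet because the infrastructure is missing; in some other regions the internet is restricted for political reasons; and in others the cost of access is so high that it keeps the public from getting to knowledge.
Kiwix can spread ideas with nothing more than a USB drive, and I think that deserves a thumbs-up.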
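2. ArchiveBox
ArchiveBox is a tool for taking point-in-time snapshots of web pages (or whole sites), which makes it similar in spirit to Kiwix. ArchiveBox is more general-purpose and more compact, though: it can produce a static copy of any site you choose, including text, images, PDFs, and even video.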
Technically, ArchiveBox draws on a wider range of tools than Kiwix, including wget, headless Chrome, youtube-dl, pywb, and readability, but these are all common in crawling work, so it does not feel complicated.
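Because these tools are ordinary command-line programs, an archiver built on them is largely an orchestration layer. The sketch below builds the command lines for three kinds of snapshot of one URL. The wget and Chrome flags are standard ones, but the combination, the `chromium` binary name, the output layout, and the function names are assumptions for illustration, not ArchiveBox's actual invocation.

```python
import shlex
from pathlib import Path


def wget_command(url: str, out_dir: Path) -> list[str]:
    """Static copy of one page plus the assets needed to render it."""
    return [
        "wget",
        "--page-requisites",   # also fetch images, CSS and scripts
        "--convert-links",     # rewrite links so they work locally
        "--adjust-extension",  # add .html where the server omitted it
        f"--directory-prefix={out_dir / 'wget'}",
        url,
    ]


def chrome_pdf_command(url: str, out_dir: Path, binary: str = "chromium") -> list[str]:
    """Render the page in headless Chrome and print it to a PDF file."""
    return [binary, "--headless", f"--print-to-pdf={out_dir / 'page.pdf'}", url]


def chrome_dom_command(url: str, binary: str = "chromium") -> list[str]:
    """Dump the DOM after JavaScript has run (written to stdout)."""
    return [binary, "--headless", "--dump-dom", url]


def plan_snapshot(url: str, out_dir: Path) -> dict[str, list[str]]:
    """Collect every extractor command for one URL, keyed by output type."""
    return {
        "wget": wget_command(url, out_dir),
        "pdf": chrome_pdf_command(url, out_dir),
        "dom": chrome_dom_command(url),
    }


if __name__ == "__main__":
    # Only print the commands; to execute one, pass the list to
    # subprocess.run(cmd, check=True) on a machine with the tools installed.
    for name, cmd in plan_snapshot("https://example.com", Path("archive")).items():
        print(f"{name}: {shlex.join(cmd)}")
```

Keeping each extractor as a separate command means one failing tool (a video download, for instance) does not have to block the others.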
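After trying the official docker image, I found that its crawling features are fairly complete, and I may dig deeper when I have time (I am not sure why it takes so long to process a simple web page...).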
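A screenshot of the software:
The entries are:
- Example Domain
  - A sample page; it displays well
- Bilibili: "Liu Bei Dance-Battles Xie Guangkun"
  - Neither the images nor the video on the page are handled correctly
  - The title is detected incorrectly (because the software reads the title from the meta information), and there is no way to edit it
- CSDN: "How to Embed a PDF Document That 'Cannot Be Downloaded' in a Page"
  - The page content is preserved fairly well
  - Same title problem as the Bilibili page
  - A few seconds after opening, the page redirects to the CSDN home page, which suggests the JS was not handled properly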
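The title problem noted above comes down to which field of the page is trusted. The following is a minimal sketch, using only Python's standard `html.parser`, of collecting several title candidates and letting a user-supplied title take precedence. It is not ArchiveBox's code; the class and function names, the candidate keys, and the default priority order are all assumptions.

```python
from html.parser import HTMLParser


class TitleCandidates(HTMLParser):
    """Collect possible titles from <title>, <meta> tags and the first <h1>."""

    def __init__(self) -> None:
        super().__init__()
        self.found = {}       # candidate key -> title text
        self._capture = None  # element whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta":
            key = (attr.get("property") or attr.get("name") or "").lower()
            if key in ("og:title", "title") and attr.get("content"):
                self.found.setdefault("meta:" + key, attr["content"].strip())
        elif tag in ("title", "h1") and tag not in self.found:
            self._capture = tag
            self.found[tag] = ""

    def handle_data(self, data):
        if self._capture:
            self.found[self._capture] += data

    def handle_endtag(self, tag):
        if tag == self._capture:
            self.found[tag] = self.found[tag].strip()
            self._capture = None


def choose_title(html, override="", order=("meta:og:title", "title", "h1")):
    """Prefer a user-supplied title, then the first non-empty candidate."""
    if override:
        return override
    parser = TitleCandidates()
    parser.feed(html)
    parser.close()
    for key in order:
        if parser.found.get(key):
            return parser.found[key]
    return ""


if __name__ == "__main__":
    page = (
        "<html><head><title>Some Site - Home</title>"
        '<meta property="og:title" content="The Actual Post Title">'
        "</head><body><h1>Heading</h1></body></html>"
    )
    print(choose_title(page))                           # The Actual Post Title
    print(choose_title(page, order=("title", "h1")))    # Some Site - Home
    print(choose_title(page, override="My own title"))  # My own title
```

No single field is right for every site, which is why a manual override is the cheapest fix for a mislabeled snapshot.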
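Compared with an ordinary browser bookmark, saving a site's state at the moment you viewed it copes well with special cases such as a post being deleted or even the whole site shutting down. Overall, though, the software has quite a few bugs and is in a half-finished state.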
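3. Summary
Both projects introduced in this post are meaningful applications of crawler technology. If I come across other eye-catching projects in the future, I will keep sharing them.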
Further reading
- How to Fit All Human Knowledge in a Box
- Offline Wiki Reading Tool: An Introduction to Kiwix (Reader)