A Summary of Parsing HTML with jsoup

Published: 2023/12/3 HTML 26 豆豆

Reposted from: jsoup解析HTML用法小結

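I have been building scrapers with HttpClient and jsoup for a while now. jsoup has proven very handy, yet it has many convenient features I have barely used. So, following the cookbook on the official site, I want to summarize how jsoup is used, grouped by function, as a list that is quick to consult while writing code.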

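Parsing from a string:

    String html = "<html><head><title>First parse</title></head>"
            + "<body><p>Parse HTML into a doc.</p></body></html>";
    Document doc = Jsoup.parse(html);

Fetching and parsing from a URL (get() and post() both throw IOException):

    // Plain GET request
    Document doc = Jsoup.connect("http://example.com/").get();
    String title = doc.title();

    // POST request with form data, a user agent, a cookie, and a 3000 ms timeout
    Document doc = Jsoup.connect("http://example.com")
            .data("query", "Java")
            .userAgent("Mozilla")
            .cookie("auth", "token")
            .timeout(3000)
            .post();

Parsing from a file (the third argument is the base URI used to resolve relative links):

    File input = new File("/tmp/input.html");
    Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");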

getElementById(String id)
getElementsByTag(String tag)
getElementsByClass(String className)
getElementsByAttribute(String key)
siblingElements(), firstElementSibling(), lastElementSibling(), nextElementSibling(), previousElementSibling()
parent(), children(), child(int index)
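
The DOM-style methods listed above can be tried out with a small sketch like the following. The class name, the helper methods, and the sample markup are made up for illustration; only the jsoup calls themselves come from the library.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class DomNavigationSketch {
    // Hypothetical sample markup, used only for illustration.
    static final String HTML =
            "<div id=\"menu\">"
            + "<a class=\"item\" href=\"/a\">A</a>"
            + "<a class=\"item\" href=\"/b\">B</a>"
            + "<span>end</span>"
            + "</div>";

    /** Number of <a> elements found under the element with id "menu". */
    static int linkCount() {
        Document doc = Jsoup.parse(HTML);
        Element menu = doc.getElementById("menu");    // a single element, or null
        Elements links = menu.getElementsByTag("a");  // all matching descendants
        return links.size();
    }

    /** Text of the element that directly follows the first child of #menu. */
    static String secondChildText() {
        Element first = Jsoup.parse(HTML).getElementById("menu").child(0);
        return first.nextElementSibling().text();
    }

    /** Tag name of the last sibling of the first child of #menu. */
    static String lastSiblingTag() {
        Element first = Jsoup.parse(HTML).getElementById("menu").child(0);
        return first.lastElementSibling().tagName();
    }

    /** id of the parent of the first element that has class "item". */
    static String parentIdOfFirstItem() {
        Elements items = Jsoup.parse(HTML).getElementsByClass("item");
        return items.first().parent().id();
    }

    public static void main(String[] args) {
        System.out.println(linkCount());
        System.out.println(secondChildText());
        System.out.println(lastSiblingTag());
        System.out.println(parentIdOfFirstItem());
    }
}
```

Note that getElementById returns a single Element, while the getElementsBy... methods return an Elements collection, which is why the plural form matters in the method names.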

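tagname: finds elements by tag name
ns|tag: finds elements by tag name within a namespace, e.g. fb|name finds <fb:name>
#id: finds the element with the given id
.class: finds elements with the given class
[attribute]: finds elements that have the given attribute
[^attri]: finds elements that have an attribute whose name starts with attri
[attr=value]: finds elements that have the given attribute with the given value
[attr^=value], [attr$=value], [attr*=value]: finds elements whose attr attribute value starts with, ends with, or contains value, e.g. [href*=/path/]
[attr~=regex]: finds elements whose attr attribute value matches the regular expression regex
*: finds all elements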
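
As a rough illustration of the basic selectors above, the sketch below counts how many elements each query matches in a small sample document. The class name, the count helper, and the markup are assumptions made for this example; select(String) is the jsoup call that evaluates a selector.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorSketch {
    // Hypothetical sample markup, used only for illustration.
    static final String HTML =
            "<div id=\"main\" class=\"content\">"
            + "<a href=\"/path/one.html\" data-id=\"1\">one</a>"
            + "<a href=\"http://example.com/two.png\">two</a>"
            + "<img src=\"logo.PNG\">"
            + "</div>";

    /** Parses the sample markup and returns how many elements match the query. */
    static int count(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        String[] queries = {
            "a",                       // by tag name
            "#main",                   // by id
            ".content",                // by class
            "[href]",                  // has the attribute
            "[^data-]",                // has an attribute whose name starts with data-
            "[href*=/path/]",          // attribute value contains /path/
            "[href$=.png]",            // attribute value ends with .png
            "img[src~=(?i)\\.png]"     // attribute value matches a regex
        };
        for (String query : queries) {
            System.out.println(query + " -> " + count(query));
        }
    }
}
```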

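el#id: a tag name combined with an id
el.class: a tag name combined with a class
el[attr]: a tag name combined with the name of an attribute the element must have
Any combination of the three forms above, e.g. a[href].highlight
ancestor child: descendant, e.g. div.content p finds every <p> element nested at any depth inside <div class="content">
parent > child: direct child, e.g. div.content > p finds the <p> elements that are direct children of <div class="content">; div.content > * finds all direct children of <div class="content">
siblingA + siblingB: immediately following sibling, e.g. div.head + div finds a <div> that comes directly after <div class="head"> under the same parent
siblingA ~ siblingX: following sibling, e.g. h1 ~ p finds every <p> that comes after an <h1> under the same parent, adjacent or not
el, el, el: groups several selectors and finds elements that match any one of them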
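
The difference between the combinators is easiest to see side by side. The following is a minimal sketch with invented markup and a hypothetical texts helper; eachText() is a jsoup method on Elements that returns the text of each matched element.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class CombinatorSketch {
    // Hypothetical sample markup, used only for illustration.
    static final String HTML =
            "<div class=\"content\"><p>direct</p><section><p>nested</p></section></div>"
            + "<div class=\"head\">head</div><div>after head</div><div>later</div>"
            + "<h1>Title</h1><span>x</span><p>sibling para</p>";

    /** Returns the text of every element matched by the query, in document order. */
    static List<String> texts(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).eachText();
    }

    public static void main(String[] args) {
        System.out.println(texts("div.content p"));    // descendants at any depth
        System.out.println(texts("div.content > p"));  // direct children only
        System.out.println(texts("div.head + div"));   // the immediately following sibling
        System.out.println(texts("div.head ~ div"));   // all following siblings
        System.out.println(texts("h1 ~ p"));           // a following sibling, not necessarily adjacent
        System.out.println(texts("h1, span"));         // matches either selector
    }
}
```

In this sample, h1 ~ p does not pick up the paragraphs inside div.content, because they are not siblings of the <h1>.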

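Element.attr("href"): returns the attribute value exactly as written in the HTML, which may be a relative URL
Element.attr("abs:href") or Element.absUrl("href"): returns the absolute URL. If the HTML was parsed from a file or a string, a base URI has to be supplied, either as the baseUri argument of Jsoup.parse(...) or afterwards by calling setBaseUri(String baseUri) on the document or element (there is no static Jsoup.setBaseUri); without one, a relative link resolves to an empty string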
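
The effect of the base URI on absolute URLs can be checked with a sketch along these lines. The class, the helper methods, and the sample link are hypothetical; the jsoup calls used are Jsoup.parse(String), Jsoup.parse(String, String), attr, absUrl, and setBaseUri.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsoluteUrlSketch {
    // Hypothetical sample markup with a relative link, used only for illustration.
    static final String HTML = "<a href=\"/docs/index.html\">Docs</a>";
    static final String BASE = "http://example.com/";

    /** The href exactly as written in the markup. */
    static String rawHref() {
        Element link = Jsoup.parse(HTML).select("a").first();
        return link.attr("href");
    }

    /** Absolute URL when the document was parsed without a base URI. */
    static String absWithoutBase() {
        Element link = Jsoup.parse(HTML).select("a").first();
        return link.absUrl("href");
    }

    /** Absolute URL when the base URI is passed at parse time. */
    static String absWithBase() {
        Document doc = Jsoup.parse(HTML, BASE);
        return doc.select("a").first().attr("abs:href");
    }

    /** Absolute URL when the base URI is set on the document after parsing. */
    static String absAfterSetBaseUri() {
        Document doc = Jsoup.parse(HTML);
        doc.setBaseUri(BASE);
        return doc.select("a").first().absUrl("href");
    }

    public static void main(String[] args) {
        System.out.println("raw:            " + rawHref());
        System.out.println("no base URI:    " + absWithoutBase());
        System.out.println("base at parse:  " + absWithBase());
        System.out.println("base set later: " + absAfterSetBaseUri());
    }
}
```

Documents fetched with Jsoup.connect(...) already carry the request URL as their base URI, so this extra step only matters for strings and files.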
