A First Look at JSOUP

Published: 2025/3/16  javascript  23  豆豆
This article on JSOUP was collected and compiled by 生活随笔 and is shared here for reference.

JSOUP is a Java library for processing HTML that I came across by chance. Its official website is http://jsoup.org/

1. Write a small test program (the only requirement is to add jsoup-1.3.3.jar to the project):

```java
import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class Test {
    public static void main(String[] args) {
        Test t = new Test();
        // Swap in parseString() or parseUrl() to run the other examples
        t.parseFile();
    }

    // Parse an HTML string and query the resulting document
    public void parseString() {
        String html = "<html><head><title>blog</title></head><body onload='test()'><p>Parsed HTML into a doc.</p></body></html>";
        Document doc = Jsoup.parse(html);
        System.out.println(doc);
        Elements es = doc.body().getAllElements();
        System.out.println(es.attr("onload"));
        System.out.println(es.select("p"));
    }

    // Fetch a live page and select its links
    public void parseUrl() {
        try {
            Document doc = Jsoup.connect("http://www.baidu.com/").get();
            Elements hrefs = doc.select("a[href]");
            System.out.println(hrefs);
            System.out.println("------------------");
            // Keep only the links whose href starts with "http"
            System.out.println(hrefs.select("[href^=http]"));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Parse a local HTML file
    public void parseFile() {
        try {
            File input = new File("input.html");
            Document doc = Jsoup.parse(input, "UTF-8");
            // Extract all the codes
            Elements codes = doc.body().select("td[title^=IA] > a[href^=javascript:view]");
            System.out.println(codes);
            System.out.println("------------------");
            System.out.println(codes.html());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```



2. Output of parseString:

```
<html>
 <head>
  <title>blog</title>
 </head>
 <body onload="test()">
  <p>Parsed HTML into a doc.</p>
 </body>
</html>
test()

<p>Parsed HTML into a doc.</p>
```
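The second line of that output is test() because Elements.attr(key) returns the value from the first element in the collection that has the attribute, and getAllElements() includes the body element itself. The minimal sketch below checks this against the same markup. It assumes a current jsoup release on the classpath rather than the 1.3.3 jar used in the post, and the names AttrDemo, onloadViaElements and onloadViaBody are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class AttrDemo {
    // Same markup as parseString()
    static final String HTML =
            "<html><head><title>blog</title></head>"
            + "<body onload='test()'><p>Parsed HTML into a doc.</p></body></html>";

    // Elements.attr(key) walks the matched elements in order and returns the value
    // from the first one that has the attribute (here <body>), or "" if none has it.
    static String onloadViaElements(Document doc) {
        Elements all = doc.body().getAllElements(); // <body> itself, then <p>
        return all.attr("onload");
    }

    // Reading the attribute directly from <body> gives the same value
    static String onloadViaBody(Document doc) {
        return doc.body().attr("onload");
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(doc.title());            // blog
        System.out.println(onloadViaElements(doc)); // test()
        System.out.println(onloadViaBody(doc));     // test()
        System.out.println(doc.select("p").text()); // Parsed HTML into a doc.
    }
}
```

Because Elements.attr stops at the first match, newer jsoup versions also provide Elements.eachAttr(key) when every value is needed.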



3. Output of parseUrl:

```
<a href="/gaoji/preferences.html">設置</a>
<a href="http://passport.baidu.com/?login&tpl=mn">登錄</a>
<a href="http://news.baidu.com">新 聞</a>
<a href="http://tieba.baidu.com">貼 吧</a>
<a href="http://zhidao.baidu.com">知 道</a>
<a href="http://mp3.baidu.com">MP3</a>
<a href="http://image.baidu.com">圖 片</a>
<a href="http://video.baidu.com">視 頻</a>
<a href="http://map.baidu.com">地 圖</a>

<a href="#" name="ime_hw">手寫</a>

<a href="#" name="ime_py">拼音</a>

<a href="#" name="ime_cl">關閉</a>
<a href="http://hi.baidu.com">空間</a>
<a href="http://baike.baidu.com">百科</a>
<a href="http://www.hao123.com">hao123</a>
<a href="/more/">更多>></a>
<a id="st" onclick="this.style.behavior='url(#default#homepage)';this.setHomePage('http://www.baidu.com')" href="http://utility.baidu.com/traf/click.php?id=215&url=http://www.baidu.com">把百度設為主頁</a>
<a href="http://e.baidu.com/?refer=888">加入百度推廣</a>
<a href="http://top.baidu.com">搜索風云榜</a>
<a href="http://home.baidu.com">關于百度</a>
<a href="http://ir.baidu.com">About Baidu</a>
<a href="/duty/">使用百度前必讀</a>
<a href="http://www.miibeian.gov.cn" target="_blank">京ICP證030173號</a>
------------------
<a href="http://passport.baidu.com/?login&tpl=mn">登錄</a>
<a href="http://news.baidu.com">新 聞</a>
<a href="http://tieba.baidu.com">貼 吧</a>
<a href="http://zhidao.baidu.com">知 道</a>
<a href="http://mp3.baidu.com">MP3</a>
<a href="http://image.baidu.com">圖 片</a>
<a href="http://video.baidu.com">視 頻</a>
<a href="http://map.baidu.com">地 圖</a>
<a href="http://hi.baidu.com">空間</a>
<a href="http://baike.baidu.com">百科</a>
<a href="http://www.hao123.com">hao123</a>
<a id="st" onclick="this.style.behavior='url(#default#homepage)';this.setHomePage('http://www.baidu.com')" href="http://utility.baidu.com/traf/click.php?id=215&url=http://www.baidu.com">把百度設為主頁</a>
<a href="http://e.baidu.com/?refer=888">加入百度推廣</a>
<a href="http://top.baidu.com">搜索風云榜</a>
<a href="http://home.baidu.com">關于百度</a>
<a href="http://ir.baidu.com">About Baidu</a>
<a href="http://www.miibeian.gov.cn" target="_blank">京ICP證030173號</a>
```
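The second list is shorter because [href^=http] keeps only links whose href begins with "http": relative links such as /gaoji/preferences.html, /more/ and /duty/, plus the # anchors, drop out. The offline sketch below reproduces that filter on a small hypothetical fragment (not the real Baidu page) and also shows absUrl("href"), which resolves relative links against a base URI instead of discarding them. It assumes a current jsoup release on the classpath; LinkDemo and its method names are illustrative only.

```java
import java.util.List;
import java.util.stream.Collectors;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class LinkDemo {
    // Hypothetical fragment: one relative link, one absolute link, one <a> without href
    static final String HTML =
            "<a href='/duty/'>duty</a>"
            + "<a href='http://news.baidu.com'>news</a>"
            + "<a name='no-href'>anchor</a>";

    static Document parse() {
        // The base URI lets jsoup resolve relative hrefs such as "/duty/"
        return Jsoup.parse(HTML, "http://www.baidu.com/");
    }

    // a[href]: only <a> elements that carry an href attribute at all
    static List<String> allHrefs(Document doc) {
        return doc.select("a[href]").stream()
                .map(a -> a.attr("href"))
                .collect(Collectors.toList());
    }

    // [href^=http]: the href value must start with "http", so relative links drop out
    static List<String> absoluteOnly(Document doc) {
        return doc.select("a[href^=http]").stream()
                .map(a -> a.attr("href"))
                .collect(Collectors.toList());
    }

    // absUrl("href") resolves each href against the base URI instead of filtering it out
    static List<String> resolved(Document doc) {
        return doc.select("a[href]").stream()
                .map(a -> a.absUrl("href"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Document doc = parse();
        System.out.println(allHrefs(doc));     // [/duty/, http://news.baidu.com]
        System.out.println(absoluteOnly(doc)); // [http://news.baidu.com]
        System.out.println(resolved(doc));
    }
}
```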



4. Output of parseFile:

```
<a href="javascript:view('67530','67530','0');">IA100908-002</a>

<a href="javascript:view('67529','67529','0');">IA100908-001</a>

<a href="javascript:view('67544','67544','0');">IA100908-016</a>

<a href="javascript:view('67364','67364','0');">IA100903-008</a>

<a href="javascript:view('67363','67363','0');">IA100903-007</a>

<a href="javascript:view('66104','66104','0');">IA100710-013</a>

<a href="javascript:view('57916','57916','0');">IA100515-013</a>

<a href="javascript:view('56962','56962','0');">IA100430-022</a>

<a href="javascript:view('66958','66958','0');">IA100830-001</a>

<a href="javascript:view('66319','66319','0');">IA100713-003</a>

<a href="javascript:view('66317','66317','0');">IA100713-001</a>

<a href="javascript:view('66321','66321','0');">IA100713-005</a>

<a href="javascript:view('66967','66967','0');">IA100830-010</a>

<a href="javascript:view('66999','66999','0');">IA100831-001</a>

<a href="javascript:view('67377','67377','0');">IA100904-004</a>

<a href="javascript:view('67378','67378','0');">IA100904-005</a>

<a href="javascript:view('3271','3271','0');">IA080115-031</a>
------------------
IA100908-002
IA100908-001
IA100908-016
IA100903-008
IA100903-007
IA100710-013
IA100515-013
IA100430-022
IA100830-001
IA100713-003
IA100713-001
IA100713-005
IA100830-010
IA100831-001
IA100904-004
IA100904-005
IA080115-031
```
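Each matched href embeds numeric arguments, for example javascript:view('67530','67530','0'). If you need that number rather than the link text, one option is a plain java.util.regex match over the href values. This is a hypothetical sketch: it assumes the first quoted argument is the value of interest (the post does not say what the three arguments mean), the names ViewIdDemo, viewId and viewIds are illustrative, and in practice the hrefs would come from the codes selection by calling attr("href") on each element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ViewIdDemo {
    // Captures the digits of the first quoted argument in javascript:view('67530',...)
    private static final Pattern VIEW_ID =
            Pattern.compile("^javascript:view\\('(\\d+)'");

    // Returns the first argument, or null when the href does not have the expected shape
    static String viewId(String href) {
        Matcher m = VIEW_ID.matcher(href);
        return m.find() ? m.group(1) : null;
    }

    // Applies viewId to every href and skips the ones that do not match
    static List<String> viewIds(List<String> hrefs) {
        List<String> ids = new ArrayList<>();
        for (String href : hrefs) {
            String id = viewId(href);
            if (id != null) {
                ids.add(id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> hrefs = List.of(
                "javascript:view('67530','67530','0');",
                "javascript:view('3271','3271','0');",
                "/not/a/view/link");
        System.out.println(viewIds(hrefs)); // [67530, 3271]
    }
}
```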


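For reference, the original post showed the basic structure of input.html in a figure. As the selector in parseFile() implies, the file contains td cells whose title attribute starts with "IA", each with a direct child a element whose href starts with "javascript:view".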
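Since input.html itself is not included, the sketch below runs the same selector against a small hypothetical stand-in table. The rows are invented for illustration: the first matches, the second fails the title^=IA test, and the third fails the href^=javascript:view test. It assumes a current jsoup release on the classpath (eachText() comes from newer jsoup versions, not 1.3.3); SelectorDemo and codes are illustrative names.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class SelectorDemo {
    // Hypothetical stand-in for input.html; the real file is not part of the post
    static final String HTML =
            "<table>"
            + "<tr><td title='IA100908-002'><a href=\"javascript:view('67530','67530','0');\">IA100908-002</a></td></tr>"
            + "<tr><td title='IB000000-001'><a href=\"javascript:view('1','1','0');\">IB000000-001</a></td></tr>"
            + "<tr><td title='IA080115-031'><a href='/detail/3271'>IA080115-031</a></td></tr>"
            + "</table>";

    // td[title^=IA]            : <td> whose title attribute starts with "IA"
    // >                        : the <a> must be a direct child of that <td>
    // a[href^=javascript:view] : <a> whose href starts with "javascript:view"
    static Elements codes(Document doc) {
        return doc.body().select("td[title^=IA] > a[href^=javascript:view]");
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        List<String> texts = codes(doc).eachText();
        System.out.println(texts); // [IA100908-002]
    }
}
```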
