Python web scraping: testing XPath with lxml
This post, collected and organized by 生活随笔, covers XPath testing with lxml for Python web scraping. It is shared here for reference.
XPath test 1:
main.py
XPath test 2:
test.html
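The contents of test.html did not survive in this copy. This minimal sketch makes test 2 runnable again. It writes a hypothetical test.html and parses it the same way main.py does. Several things here are our own assumptions and not the author's original file: the markup, every link text and href except 'dapao', and the helper names write_sample and load_tree. They were chosen only so that each XPath in main.py has something to match.

```python
import os
import tempfile

from lxml import etree

# Hypothetical stand-in for the lost test.html. The structure is an assumption,
# chosen only so that every XPath used in main.py matches something.
SAMPLE_HTML = """<html>
<head><meta charset="utf-8"><title>xpath test</title></head>
<body>
  <ul>
    <li><a href="https://example.com/one">one</a></li>
    <li><a href="https://example.com/two">two</a></li>
  </ul>
  <ol>
    <li><a href="first">First</a></li>
    <li><a href="dapao">Dapao</a></li>
  </ol>
  <div>first div</div>
  <div>second div</div>
</body>
</html>
"""


def write_sample(directory):
    """Write the hypothetical test.html into `directory` and return its path."""
    path = os.path.join(directory, "test.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(SAMPLE_HTML)
    return path


def load_tree(path):
    """Parse an HTML file with an explicit UTF-8 HTML parser, as main.py does."""
    parser = etree.HTMLParser(encoding="utf-8")
    return etree.parse(path, parser=parser)


if __name__ == "__main__":
    # Use a temporary directory so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        tree = load_tree(write_sample(tmp))
        print(tree.xpath("/html/body/ul/li/a/text()"))
        print(tree.xpath("/html/body/ol/li/a[@href='dapao']/text()"))
```

To run main.py unchanged, call write_sample(".") once from its directory. That puts the hypothetical test.html next to it.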
main.py

```python
# -*- coding: utf-8 -*-
from lxml import etree

# ========================================
# Main: feature tests
# ========================================
if __name__ == '__main__':
    parser = etree.HTMLParser(encoding='utf-8')
    tree = etree.parse("test.html", parser=parser)

    # result = tree.xpath("/html")  # "/" expresses hierarchy; the first "/" is the root node
    # result = tree.xpath("/html/body/ul/li/a/text()")  # text() gets the text
    # result = tree.xpath("/html/body/ul/li[1]/a/text()")  # [] is an index; XPath counts from 1
    # result = tree.xpath("/html/body/ol/li/a[@href='dapao']/text()")  # [@attr='value'] filters by attribute
    # print(result)

    ol_li_list = tree.xpath("/html/body/ol/li")
    for li in ol_li_list:
        # Extract the text from each <li>
        result = li.xpath("./a/text()")  # keep searching inside this li: a relative lookup
        print(result)
        result = li.xpath("./a/@href")  # get an attribute value: @attribute
        print(result)

    print(tree.xpath("/html/body/ul/li/a/@href"))
    print(tree.xpath("/html/body/div[1]/text()"))
    print(tree.xpath("/html/body/ol/li/a/text()"))
```
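The loop in main.py prints each <li>'s text and href as separate lists. That is because xpath() always returns a list, even when there is one match or none. This is a minimal sketch, assuming you want one (text, href) pair per <li> instead. The helper first_or_none is hypothetical: it takes the first match or falls back to None, so an <li> without a link does not raise IndexError. The markup is illustrative only.

```python
from lxml import etree

# Hypothetical <ol> for illustration; the second <li> has no link.
HTML = """<html><body><ol>
<li><a href="dapao">Dapao</a></li>
<li>no link here</li>
</ol></body></html>"""


def first_or_none(node, path):
    """Run an XPath query relative to `node`; return the first match or None."""
    matches = node.xpath(path)
    return matches[0] if matches else None


def collect_links(root):
    """Pair each <li>'s link text with its href, looking only inside that <li>."""
    return [
        (first_or_none(li, "./a/text()"), first_or_none(li, "./a/@href"))
        for li in root.xpath("/html/body/ol/li")
    ]


root = etree.HTML(HTML)
print(collect_links(root))
```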
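The "./" prefix in li.xpath("./a/text()") is what makes the lookup relative to the current <li>. A common pitfall is writing "//a" inside the loop. A path starting with "//" ignores the context node and searches the whole document again. This small sketch contrasts the two on hypothetical markup. Use ".//a" when you want descendants of the current <li> at any depth.

```python
from lxml import etree

# Hypothetical two-item list, used only to contrast the two path styles.
HTML = """<html><body><ol>
<li><a href="x">X</a></li>
<li><a href="y">Y</a></li>
</ol></body></html>"""


def compare_lookups(root):
    """For each <li>, return (relative result, absolute result)."""
    pairs = []
    for li in root.xpath("/html/body/ol/li"):
        relative = li.xpath("./a/text()")  # searches only inside this <li>
        absolute = li.xpath("//a/text()")  # '//' starts again from the document root
        pairs.append((relative, absolute))
    return pairs


root = etree.HTML(HTML)
for relative, absolute in compare_lookups(root):
    print(relative, absolute)
```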
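XPath positions start at 1, as the comment on li[1] in main.py notes. A predicate like [1] also applies per parent, not to the whole result set. This sketch uses hypothetical markup with two lists to show four cases:

- //li[1] returns the first item of each list.
- Wrapping the path in parentheses, (//li)[1], picks only the first <li> in the whole document.
- //li[last()] returns the last item of each list.
- [0] never matches anything.

```python
from lxml import etree

# Hypothetical page with two lists, used only to show how [n] is scoped.
HTML = """<html><body>
<ul><li>u1</li><li>u2</li></ul>
<ol><li>o1</li><li>o2</li></ol>
</body></html>"""

root = etree.HTML(HTML)

per_parent = root.xpath("//li[1]/text()")       # first <li> of each parent list
whole_doc = root.xpath("(//li)[1]/text()")      # first <li> in the whole document
last_items = root.xpath("//li[last()]/text()")  # last <li> of each parent list
zero_based = root.xpath("//li[0]/text()")       # empty: positions start at 1

print(per_parent, whole_doc, last_items, zero_based)
```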
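In /html/body/div[1]/text(), text() returns only the text nodes that are direct children of the <div>. Text inside nested tags is skipped, and the result may come back split into pieces. When the full visible text is wanted, use XPath's string() function or lxml's itertext() instead. A minimal sketch on hypothetical markup:

```python
from lxml import etree

# Hypothetical <div> with a nested tag, for illustration only.
HTML = "<html><body><div>Hello <b>big</b> world</div></body></html>"
root = etree.HTML(HTML)

# Direct child text nodes only: the text inside <b> is not included.
direct = root.xpath("/html/body/div[1]/text()")

# string() concatenates all descendant text of the first matching node.
full = root.xpath("string(/html/body/div[1])")

# itertext() walks the same descendant text nodes from Python.
joined = "".join(root.xpath("/html/body/div[1]")[0].itertext())

print(direct, full, joined)
```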