Extracting Neihan Duanzi Jokes with Regular Expressions

Published: 2024/3/24

Scraping Neihan Duanzi with regular expressions:

"""Neihan Duanzi crawler: https://www.neihan8.com/article/index.html"""
import csv
import os
import re
from urllib import error, request

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
}
FLAGS = re.I | re.S | re.M

# Compile each pattern once instead of on every page.
title_pattern = re.compile(r'<h3><a .*?>(.*?)</a></h3>', FLAGS)
content_pattern = re.compile(r'<div class="desc">(.*?)</div>', FLAGS)
url_pattern = re.compile(r'<h3><a href="(.*?)" .*?>.*?</a></h3>', FLAGS)
detail_title_pattern = re.compile(r'<h1 class="title">(.*?)</h1>', FLAGS)
paragraph_pattern = re.compile(r'<p>(.*?)</p>', FLAGS)


def fetch(url):
    """Download a page and decode it as UTF-8, ignoring undecodable bytes."""
    req = request.Request(url, headers=HEADERS)
    with request.urlopen(req) as response:
        return response.read().decode("utf-8", "ignore")


def neihanba(url, beginPage, endPage):
    os.makedirs("./data1", exist_ok=True)  # the output directory must exist
    for page in range(beginPage, endPage + 1):  # include the end page
        if page <= 1:
            fullurl = url + "index.html"
        else:
            fullurl = url + "index_%s.html" % page
        try:
            resHtml = fetch(fullurl)
            # Joke titles and summaries on the list page
            joketitle = title_pattern.findall(resHtml)
            jokecontent = content_pattern.findall(resHtml)
            filename = "./data1/neihanba.csv"
            with open(filename, "a", encoding="utf-8", newline="") as file:
                wr = csv.writer(file)
                # One row per joke; the original wrote both whole lists
                # on every iteration.
                for title, content in zip(joketitle, jokecontent):
                    wr.writerow([title.strip(), content.strip()])
            # Joke detail-page URLs
            jurl = url_pattern.findall(resHtml)
            for i in jurl:
                jokefullurl = "https://www.neihan8.com" + i
                detailHtml = fetch(jokefullurl)
                # Joke title on the detail page (extracted but never
                # written out in the original)
                titles = detail_title_pattern.findall(detailHtml)
                joke_content_title = titles[0].strip() if titles else ""
                # Joke body: every <p> except the last two, as in the original
                joke_content = paragraph_pattern.findall(detailHtml)
                filename = "./data1/neihanba1.csv"
                with open(filename, "a", encoding="utf-8", newline="") as file:
                    wr = csv.writer(file)
                    for openjoke_content in joke_content[:-2]:
                        wr.writerow([joke_content_title, openjoke_content.strip()])
        except error.URLError as e:
            print(e)


if __name__ == "__main__":
    # Only the "http" scheme is mapped to the proxy, so the https:// URLs
    # used below are requested directly.
    proxy = {"http": "118.31.220.3:8080"}
    proxy_support = request.ProxyHandler(proxy)
    opener = request.build_opener(proxy_support)
    request.install_opener(opener)
    beginPage = int(input("Enter the start page: "))
    endPage = int(input("Enter the end page: "))
    url = "https://www.neihan8.com/article/"
    neihanba(url, beginPage, endPage)
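The extraction step can be tried offline, without touching the site. The sketch below is a variation on the crawler, not the original method: it folds the URL, title, and summary patterns into a single expression so that each joke's three fields stay aligned. `SAMPLE_HTML`, `ENTRY_PATTERN`, and `extract_jokes` are hypothetical names, and the markup is invented for illustration; the real page structure may differ.

```python
import re

# re.S lets "." match newlines, so a capture can span several lines.
FLAGS = re.I | re.S

# Hypothetical list-page markup, invented for illustration only.
SAMPLE_HTML = """
<div class="item">
  <h3><a href="/article/1001.html" class="title">First joke</a></h3>
  <div class="desc">
    Line one of the summary.
  </div>
</div>
<div class="item">
  <h3><a href="/article/1002.html" class="title">Second joke</a></h3>
  <div class="desc">Another summary.</div>
</div>
"""

# Non-greedy groups (.*?) stop at the first closing tag, so each match
# covers exactly one entry instead of running on to the last </div>.
ENTRY_PATTERN = re.compile(
    r'<h3><a href="(.*?)" .*?>(.*?)</a></h3>\s*<div class="desc">(.*?)</div>',
    FLAGS,
)


def extract_jokes(html):
    """Return a list of (url, title, summary) tuples found in html."""
    return [
        (url, title.strip(), summary.strip())
        for url, title, summary in ENTRY_PATTERN.findall(html)
    ]


if __name__ == "__main__":
    for row in extract_jokes(SAMPLE_HTML):
        print(row)
```

Matching all three fields in one pattern avoids the misalignment that can occur when separate `findall` calls return lists of different lengths. It does assume that the summary block immediately follows the heading.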
