Scraping Chinese city information from a travel website with Python

Published: 2024/2/28 | python | 28 | 豆豆


Analysis

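The target is the China city list on place.qyer.com.
The list is paged by clicking "next page", so the link for each page very likely follows a pattern. Here are the links for the first two pages: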

https://place.qyer.com/china/citylist-0-0-1/
https://place.qyer.com/china/citylist-0-0-2/

There is indeed a pattern. To double-check, here is a page further along:

https://place.qyer.com/china/citylist-0-0-169/

That confirms it. The list has 171 pages, and the number in the link runs from 1 to 171, so a for loop can visit every page and extract its content.
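As a minimal sketch of that loop, the page URLs can be generated up front. The names BASE_URL, LAST_PAGE and build_page_urls are illustrative only, and the page count of 171 is the figure quoted above, which may change as the site changes.

```python
# URL pattern observed in the article; the page number is the only part that varies.
BASE_URL = "https://place.qyer.com/china/citylist-0-0-{}/"
# Page count reported in the article (an assumption about the site at the time of writing).
LAST_PAGE = 171


def build_page_urls(last_page=LAST_PAGE):
    """Return the list-page URLs for pages 1..last_page (inclusive)."""
    return [BASE_URL.format(page) for page in range(1, last_page + 1)]


page_urls = build_page_urls()
print(len(page_urls))
print(page_urls[0])
print(page_urls[-1])
```

Note that range(1, last_page + 1) is needed because the upper bound of range is exclusive.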
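The next step is working out how to extract the content of a single page. I am most comfortable with XPath, but BeautifulSoup works just as well if you prefer it.
In Chrome's developer tools you can see that each city corresponds to one li tag, so I first extract all of the li tags. The result is a list, and each item in it is itself a Selector object, which means the xpath method can be called again on each li to extract the content inside that node.
After that, write the XPath expression for each field you want to extract. You can use "Copy full XPath" in the developer tools or write the expressions yourself in the XPath Helper extension.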
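The two-step pattern described above (select every li first, then query each li with relative paths) can be tried offline. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in for parsel so that it runs without extra installs; ElementTree supports only a subset of XPath, so text and attributes are read with .text and .get() instead of text() and @href. The sample markup is hypothetical and only modelled on the paths used in the article's code; the real page may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup (not copied from the real site).
SAMPLE_MARKUP = """
<div>
  <ul class="plcCitylist">
    <li>
      <h3><a href="https://example.com/city-a/">City A </a></h3>
      <p>Alternate name A</p>
      <p>12345 visitors</p>
      <p class="pois"><a> Spot 1 </a><a> Spot 2 </a></p>
      <p class="pics"><a href="https://example.com/city-a/"><img src="https://example.com/city-a.jpg" /></a></p>
    </li>
    <li>
      <h3><a href="https://example.com/city-b/">City B </a></h3>
      <p>Alternate name B</p>
      <p>678 visitors</p>
      <p class="pois"><a> Spot 3 </a></p>
      <p class="pics"><a href="https://example.com/city-b/"><img src="https://example.com/city-b.jpg" /></a></p>
    </li>
  </ul>
</div>
"""


def parse_cities(markup):
    """Return one row per li: [name, visitors, link, image, spots]."""
    root = ET.fromstring(markup.strip())
    rows = []
    # Step 1: select every li under the city list.
    for li in root.findall(".//ul[@class='plcCitylist']/li"):
        # Step 2: query each li with paths relative to that node.
        name = li.find("./h3/a").text.strip()
        visitors = li.find("./p[2]").text
        spots = "、".join(a.text.strip() for a in li.findall("./p[@class='pois']/a"))
        link = li.find("./p[@class='pics']/a").get("href")
        image = li.find("./p[@class='pics']/a/img").get("src")
        rows.append([name, visitors, link, image, spots])
    return rows


rows = parse_cities(SAMPLE_MARKUP)
for row in rows:
    print(row)
```

Real-world HTML is often not well-formed XML, which is one reason a tolerant parser such as parsel or BeautifulSoup is the usual choice for live pages.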

Writing the code

Here is the complete code for the program:

import requests  # library for sending HTTP requests
from fake_useragent import UserAgent  # library for generating the User-Agent request header
import parsel  # library for parsing HTML
import csv  # library for writing CSV files


def getdata(url):
    headers = {"user-Agent": UserAgent().chrome}
    response = requests.get(url=url, headers=headers)
    response.encoding = response.apparent_encoding
    selector = parsel.Selector(response.text)
    # extract all li tags; each one describes a single city
    lis = selector.xpath('//ul[@class="plcCitylist"]/li')
    for li in lis:
        city_names = li.xpath('./h3/a/text()').get()
        city_names = city_names.rstrip()
        number_people = li.xpath('./p[2]/text()').get()
        place_hot = li.xpath('./p[@class="pois"]/a/text()').getall()
        place_hot = [place.strip() for place in place_hot]
        place_hot = '、'.join(place_hot)
        place_url = li.xpath('./p[@class="pics"]/a/@href').get()
        img_url = li.xpath('./p[@class="pics"]/a/img/@src').get()
        print(city_names, number_people, place_url, img_url, place_hot, sep='|')
        # append one row per city to the CSV file
        with open('qiongyouDate.csv', mode='a', encoding='utf-8', newline='') as file_object:
            csv_write = csv.writer(file_object)
            csv_write.writerow([city_names, number_people, place_url, img_url, place_hot])


def main():
    # pages are numbered 1 to 171
    for i in range(1, 172):
        url = "https://place.qyer.com/china/citylist-0-0-{}/".format(str(i))
        getdata(url)


if __name__ == '__main__':
    main()

Results

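Running the code above prints the scraped data to the console. When it finishes, a CSV file named qiongyouDate.csv (the name used in the code) appears in the program's directory; you can open it with WPS or Excel.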
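If the Chinese text looks garbled when the file is opened in Excel, one option worth trying is writing with the 'utf-8-sig' encoding, which adds a byte-order mark that Excel uses to detect UTF-8. The sketch below is an illustration under that assumption, not the article's code: it writes a few hypothetical rows to a temporary file in the same column order as the scraper and reads them back with the csv module to check the round trip. The helper names write_rows and read_rows are made up for this example.

```python
import csv
import os
import tempfile

# Hypothetical sample rows: name, visitors, city link, image link, popular spots.
sample_rows = [
    ["City A", "12345 visitors", "https://example.com/city-a/", "https://example.com/city-a.jpg", "Spot 1、Spot 2"],
    ["City B", "678 visitors", "https://example.com/city-b/", "https://example.com/city-b.jpg", "Spot 3"],
]


def write_rows(path, rows, encoding="utf-8-sig"):
    """Write all rows in one go; newline='' lets the csv module control line endings."""
    with open(path, mode="w", encoding=encoding, newline="") as f:
        csv.writer(f).writerows(rows)


def read_rows(path, encoding="utf-8-sig"):
    """Read the file back as a list of rows; 'utf-8-sig' strips the BOM if present."""
    with open(path, mode="r", encoding=encoding, newline="") as f:
        return list(csv.reader(f))


with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "demo.csv")
    write_rows(csv_path, sample_rows)
    loaded_rows = read_rows(csv_path)

print(loaded_rows == sample_rows)
```

Opening the file once and writing many rows also avoids reopening it for every city, as the append-mode version does.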

[Screenshot: console output while the program runs]

[Screenshot: contents of the generated CSV file]

The crawl is a little slow, so please be patient while it runs.
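Since the pages are fetched one after another, most of the time is probably spent waiting on the network. One possible way to shorten the wait, which is not part of the original program, is to fetch several pages at once with a thread pool. In the sketch below, fake_fetch is a hypothetical stand-in that sleeps instead of making a real request; in practice it would be replaced by a function that downloads and parses one page and returns its rows.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url):
    """Hypothetical stand-in for downloading and parsing one page."""
    time.sleep(0.05)  # simulate network latency
    # Return the page number taken from the end of the URL, as a string.
    return url.rstrip("/").rsplit("-", 1)[-1]


def crawl_concurrently(urls, max_workers=4):
    """Run fake_fetch over the URLs in a small thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_fetch, urls))


demo_urls = ["https://place.qyer.com/china/citylist-0-0-{}/".format(i) for i in range(1, 9)]
page_numbers = crawl_concurrently(demo_urls)
print(page_numbers)
```

Keep max_workers small so the site is not flooded with requests, and collect the results first and write the CSV file from the main thread so rows from different pages do not interleave.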
