Web Scraping Series: Learning the Scrapy Framework

Published: 2024/4/17
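My project required scraping product information from a certain website, so I wrote a spider myself with Requests, BeautifulSoup, and similar libraries, and stored the scraped data in a database.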

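It felt rather slow when running, especially when it followed links into detail pages to scrape information. I'm a beginner and didn't really know how to fix that, so I'm simply learning as I go.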

There are quite a few scraping frameworks available; I now plan to learn Scrapy and rewrite the spider with it.

Below are some study notes from working through the official documentation.

I set up the Scrapy environment inside Anaconda, so in PyCharm's Project Interpreter setting I select the python.exe under the Anaconda installation.

I often forget to set this, and then many packages fail to import, because I installed most of my packages through the Anaconda environment.
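A quick way to confirm which interpreter a script is actually running under is sketched below. The helper name describe_environment is made up for this illustration; it relies only on the standard library and simply reports the interpreter path and whether a package can be found.

```python
import importlib.util
import sys


def describe_environment(package: str = "scrapy") -> dict:
    """Report the running interpreter and whether `package` can be imported.

    `describe_environment` is a hypothetical helper, not part of Scrapy or PyCharm.
    """
    spec = importlib.util.find_spec(package)
    return {
        "interpreter": sys.executable,  # path of the python executable in use
        "package": package,
        "importable": spec is not None,  # None means the package was not found
    }


if __name__ == "__main__":
    info = describe_environment()
    print(f"Interpreter: {info['interpreter']}")
    print(f"{info['package']} importable: {info['importable']}")
```

If the printed interpreter path does not point into the Anaconda installation, the Project Interpreter setting is the likely culprit.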

Below is the first test example given in the documentation:

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/tag/humor/',
    ]

    def parse(self, response):
        # Each quote on the page sits in its own div.quote element.
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').extract_first(),
                'author': quote.xpath('span/small/text()').extract_first(),
            }

        # Follow the "next page" link, if any, and parse that page the same way.
        next_page = response.css('li.next a::attr("href")').extract_first()
        if next_page is not None:
            yield response.follow(next_page, self.parse)

In the Anaconda Prompt, enter the command:

scrapy runspider quote_spider.py -o quote.json

Note that you must run it from the directory that contains the file.

After a successful run, a file named quote.json is generated:

[
{"text": "\u201cThe person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.\u201d", "author": "Jane Austen"},
{"text": "\u201cA day without sunshine is like, you know, night.\u201d", "author": "Steve Martin"},
{"text": "\u201cAnyone who thinks sitting in church can make you a Christian must also think that sitting in a garage can make you a car.\u201d", "author": "Garrison Keillor"},
{"text": "\u201cBeauty is in the eye of the beholder and it may be necessary from time to time to give a stupid or misinformed beholder a black eye.\u201d", "author": "Jim Henson"},
{"text": "\u201cAll you need is love. But a little chocolate now and then doesn't hurt.\u201d", "author": "Charles M. Schulz"},
{"text": "\u201cRemember, we're madly in love, so it's all right to kiss me anytime you feel like it.\u201d", "author": "Suzanne Collins"},
{"text": "\u201cSome people never go crazy. What truly horrible lives they must lead.\u201d", "author": "Charles Bukowski"},
{"text": "\u201cThe trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it.\u201d", "author": "Terry Pratchett"},
{"text": "\u201cThink left and think right and think low and think high. Oh, the thinks you can think up if only you try!\u201d", "author": "Dr. Seuss"},
{"text": "\u201cThe reason I talk to myself is because I\u2019m the only one whose answers I accept.\u201d", "author": "George Carlin"},
{"text": "\u201cI am free of all prejudice. I hate everyone equally. \u201d", "author": "W.C. Fields"},
{"text": "\u201cA lady's imagination is very rapid; it jumps from admiration to love, from love to matrimony in a moment.\u201d", "author": "Jane Austen"}
]
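The \u201c and \u201d sequences in the output are JSON escapes for curly quotation marks, not corrupted data. A minimal sketch using only the standard json module shows that they decode back to the original characters; the SAMPLE string is one entry copied from the output above, and load_quotes is a helper name invented for this example.

```python
import json

# One entry copied from the quote.json output, kept as a raw string so that
# the \u201c / \u201d escapes reach the JSON parser unchanged.
SAMPLE = r'[{"text": "\u201cA day without sunshine is like, you know, night.\u201d", "author": "Steve Martin"}]'


def load_quotes(raw: str) -> list:
    """Parse the JSON text produced by the spider into a list of dicts."""
    return json.loads(raw)


quotes = load_quotes(SAMPLE)
for quote in quotes:
    # After parsing, the escapes have become real curly quotation marks.
    print(f"{quote['author']}: {quote['text']}")
```

To read the real file instead of the sample string, json.load on a file opened with encoding="utf-8" should work the same way. Scrapy also has a FEED_EXPORT_ENCODING setting; setting it to 'utf-8' should make the feed contain the characters themselves rather than the escapes.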

When you run scrapy runspider quote_spider.py -o quote.json, Scrapy looks for a Spider definition inside that file and, once it finds one, runs it through its crawler engine.

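The crawl starts by sending requests to the URLs defined in the start_urls attribute and calling the default callback method parse, passing the response object as an argument. In the parse callback, we loop through the quote elements using a CSS selector, yield a Python dict with the extracted quote text and author, look for a link to the next page, and schedule another request that uses the same parse method as its callback.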
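The flow described above can be imitated with a toy model in plain Python. This is only a sketch of the idea, not Scrapy's actual engine, which is asynchronous and also filters duplicate requests. FAKE_SITE, crawl, and the example.test URLs are all made up for the illustration; only urljoin is a real standard-library function, used here to mimic how a relative "next" link is resolved against the current page.

```python
from collections import deque
from urllib.parse import urljoin

# Hypothetical in-memory "site": URL -> (quotes on the page, relative next link or None).
FAKE_SITE = {
    "http://example.test/tag/humor/": ([("Quote A", "Author A")], "/tag/humor/page/2/"),
    "http://example.test/tag/humor/page/2/": ([("Quote B", "Author B")], None),
}


def parse(url):
    """Callback: yield one dict per quote, then a follow-up request if a next link exists."""
    quotes, next_page = FAKE_SITE[url]
    for text, author in quotes:
        yield {"text": text, "author": author}
    if next_page is not None:
        # The relative link is resolved against the URL of the current page.
        yield ("follow", urljoin(url, next_page), parse)


def crawl(start_urls):
    """Toy scheduler: run the callback for every queued URL and sort its output."""
    queue = deque((url, parse) for url in start_urls)
    items = []
    while queue:
        url, callback = queue.popleft()
        for result in callback(url):
            if isinstance(result, dict):
                items.append(result)  # scraped item
            else:
                _, next_url, next_callback = result
                queue.append((next_url, next_callback))  # scheduled request
    return items


items = crawl(["http://example.test/tag/humor/"])
print(items)
```

The point of the model is that parse yields two kinds of things, items and further requests, and the scheduler keeps calling the callback until no requests remain.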

Reposted from: https://www.cnblogs.com/taoHongFei/p/8694647.html
