
[DingTalk Robot + Scraper + Celery] Scheduled Weibo Hot Search Push + Scheduled Financial News Push

Published: 2023/12/10


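This post walks through how to use a DingTalk robot to push scraped content to a group chat on a schedule.

Scheduled scraper results as shown in DingTalk

Main tools

python3
Scraper: the BeautifulSoup parsing library
Scheduled tasks: the Celery framework
DingTalk robot: the documented API data formats and the conventions for POST requests
Server: an Alibaba Cloud student server with 2 cores and 2 GB of memory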

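Prerequisites

Basic Python scraping: request headers, using bs4, locating the elements you need
Reading and writing files in Python
How the Celery framework runs
The DingTalk robot API; you first need to register a DingTalk organization and enable a robot
See my earlier post: DingTalk Robot 1.0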

Reference links

Celery development reference
DingTalk robot documentation
A reference for creating a DingTalk robot

Project directory

The outer __init__

Handles initialization: it defines the Celery instance and loads the configuration from the config module.

# celery_app will load this info automatically
from celery import Celery

# strict name format
app = Celery('dingding')  # create an instance
app.config_from_object('celery_app.celeryconfig')  # load the config

The outer celeryconfig

Configures the Redis database as both the broker and the result backend.
It also imports the task modules so that beat can schedule them.

from datetime import timedelta
from celery.schedules import crontab  # not used below; only needed for cron-style schedules

BROKER_URL = 'redis://127.0.0.1:6379'
CELERY_RESULT_BACKEND = 'redis://127.0.0.1:6379/0'
CELERY_TIMEZONE = 'Asia/Shanghai'

# task modules to import
# mind the trailing 'S': the setting is CELERY_IMPORTS, not CELERY_IMPORT
CELERY_IMPORTS = (
    'celery_app.crontabTasks.task_weiboTop10',
    'celery_app.crontabTasks.task_jinseNews',
)

# schedules (beat); each 'task' must match the name passed to @app.task
CELERYBEAT_SCHEDULE = {
    'weiboTop10-every-1-hour': {
        'task': 'task_weiboTop10.get_news',
        'schedule': timedelta(seconds=3600),
    },
    'jinseNew-every-1-hour': {
        'task': 'task_jinseNews.get_news',
        'schedule': timedelta(seconds=3600),
    },
}
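The uppercase setting names above are the old-style ones. As a rough sketch, assuming Celery 4 or later, the same configuration can be written with the lowercase names; the values are copied from the config above, and the two naming styles should not be mixed in one config module.

```python
from datetime import timedelta

# New-style (lowercase) setting names; assumes Celery 4 or later.
broker_url = 'redis://127.0.0.1:6379'
result_backend = 'redis://127.0.0.1:6379/0'
timezone = 'Asia/Shanghai'

# Modules the worker imports at start-up so the tasks get registered.
imports = (
    'celery_app.crontabTasks.task_weiboTop10',
    'celery_app.crontabTasks.task_jinseNews',
)

# Periodic tasks for celery beat; 'task' must match the name given to @app.task.
beat_schedule = {
    'weiboTop10-every-1-hour': {
        'task': 'task_weiboTop10.get_news',
        'schedule': timedelta(hours=1),
    },
    'jinseNews-every-1-hour': {
        'task': 'task_jinseNews.get_news',
        'schedule': timedelta(hours=1),
    },
}
```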

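The outer common

Defines the information the DingTalk robot needs for a POST request: a timestamp and a signature.

import time
import hmac
import hashlib
import base64


# "auto-reply" robot
def get_sign_1():
    # current timestamp in milliseconds
    timestamp = int(round(time.time() * 1000))
    # the robot's secret
    app_secret = 'your secret'
    # encode the secret; it is used as the HMAC key
    app_secret_enc = app_secret.encode('utf-8')
    # timestamp + newline + secret
    string_to_sign = '{}\n{}'.format(timestamp, app_secret)
    # encode the string to sign
    string_to_sign_enc = string_to_sign.encode('utf-8')
    # HMAC-SHA256 digest
    hmac_code = hmac.new(app_secret_enc, string_to_sign_enc, digestmod=hashlib.sha256).digest()
    # base64-encoded signature
    sign = base64.b64encode(hmac_code).decode('utf-8')
    # return the timestamp together with the signature computed from it
    return timestamp, sign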
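For comparison, a custom group robot with the "additional signature" security option is, as far as the DingTalk custom-robot documentation describes it, signed by appending the timestamp and the URL-encoded signature to the webhook URL as query parameters. The sketch below shows that variant under stated assumptions: build_signed_url is a hypothetical helper name, and the webhook is assumed to already contain its access_token query parameter.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def build_signed_url(webhook, secret, timestamp_ms=None):
    """Return the webhook URL with timestamp and sign appended (hypothetical helper)."""
    if timestamp_ms is None:
        timestamp_ms = int(round(time.time() * 1000))
    # Same string-to-sign as get_sign_1: timestamp, newline, secret.
    string_to_sign = '{}\n{}'.format(timestamp_ms, secret)
    digest = hmac.new(
        secret.encode('utf-8'),
        string_to_sign.encode('utf-8'),
        digestmod=hashlib.sha256,
    ).digest()
    # Base64 output may contain '+', '/' and '=', so it must be URL-encoded.
    sign = urllib.parse.quote_plus(base64.b64encode(digest).decode('utf-8'))
    # Assumes the webhook already has a query string (?access_token=...).
    return '{}&timestamp={}&sign={}'.format(webhook, timestamp_ms, sign)
```

Passing a fixed timestamp_ms makes the result reproducible, which helps when comparing against a signature computed elsewhere.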

The inner weiboTop10

Scrapes the hot-search list and sends it through the server to the robot, which displays it in the group chat.

import requests
import json
import time

from celery_app import app
from celery_app.common import get_sign_1


def ding_mesage(news):
    # DingTalk header with the timestamp and signature;
    # sign once so the two values belong to the same timestamp
    timestamp, sign = get_sign_1()
    ding_header = {
        "Content-Type": "application/json; charset=utf-8",
        'timestamp': str(timestamp),
        'sign': str(sign),
    }
    # your own oapi robot url, which carries that robot's token
    ding_url = 'your url'
    mes = {
        "msgtype": "markdown",
        "markdown": {
            'title': "#### Weibo Top 10 Hot Topics \n\n",
            "text": "## Weibo Top 10 Hot Topics \n\n",
        },
    }
    # add the current time as a hint
    mes['markdown']['text'] += time.ctime() + " \n\n"
    url_specific = '(https://s.weibo.com/weibo?q=%23{}%23)'
    # one markdown link per topic
    for new in news:
        new = ''.join(new.split())
        mes['markdown']['text'] += '[' + new + ']' + url_specific.format(new) + " \n\n"
    res = requests.post(ding_url, data=json.dumps(mes), headers=ding_header)


# trigger the function below on a schedule, once an hour
@app.task(name='task_weiboTop10.get_news')
def get_news():
    spider_header = {
        'referer': 'https://weibo.com/',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36',
    }
    url = 'https://weibo.com/ajax/side/hotSearch'
    res = requests.get(url=url, headers=spider_header)
    # parse the response once and take the first ten entries
    realtime = res.json()['data']['realtime']
    news = []
    for i in range(10):
        news.append(realtime[i]['note'])
    ding_mesage(news)
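The message-building step inserts each topic into the search URL as-is, so a topic containing characters such as parentheses could break the markdown link. A minimal sketch of one way to guard against that, percent-encoding the query with the standard library, might look like this; build_markdown_text and SEARCH_URL are hypothetical names introduced only for illustration.

```python
import urllib.parse

# Search URL pattern taken from the code above; the query is filled in encoded.
SEARCH_URL = 'https://s.weibo.com/weibo?q={}'


def build_markdown_text(topics, heading='## Weibo Top 10 Hot Topics'):
    """Build the markdown body: a heading followed by one link per topic."""
    lines = [heading]
    for topic in topics:
        # Remove all whitespace, as the original loop does.
        topic = ''.join(topic.split())
        # Percent-encode '#topic#'; '#' becomes %23 and parentheses are escaped.
        query = urllib.parse.quote('#{}#'.format(topic))
        lines.append('[{}]({})'.format(topic, SEARCH_URL.format(query)))
    return ' \n\n'.join(lines) + ' \n\n'
```

The returned string could be assigned to mes['markdown']['text'] before posting.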

The inner jinseNews

Fetches news from Jinse Finance and records the latest id in a txt file so that no item is sent twice.
The scraped items are posted to the robot and appear in the group chat automatically.

import requests
import json

from celery_app import app
from celery_app.common import get_sign_1


def ding_mesage(news):
    # sign once so the timestamp and signature belong together
    timestamp, sign = get_sign_1()
    ding_header = {
        "Content-Type": "application/json; charset=utf-8",
        'timestamp': str(timestamp),
        'sign': str(sign),
    }
    ding_url = 'your url'
    mes = {
        "msgtype": "actionCard",
        "actionCard": {
            "title": news['title'],
            'text': "#### {} \n\n {} \n\nSource: Jinse Finance".format(news['title'], news['content']),
            "singleTitle": "Read more",
            "singleURL": news['url'],
        },
    }
    res = requests.post(ding_url, data=json.dumps(mes), headers=ding_header)


# trigger the function below on a schedule, once an hour
@app.task(name='task_jinseNews.get_news')
def get_news():
    # the path is relative, so it depends on the directory celery is started from
    with open('celery_app/crontabTasks/resource_jinse.txt', 'r') as file:
        latest_news = int(file.read())
    spider_header = {
        'referer': 'https://www.jinse.com/',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36',
    }
    # the id parameter is left empty; the original .format() call had no placeholder to fill
    res = requests.get(url='https://api.jinse.cn/noah/v2/lives?limit=20&reading=false&source=web&flag=up&id=&category=1', headers=spider_header)
    lives = res.json()['list'][0]['lives']
    # items come newest first; stop at the one that was already sent
    for news in lives:
        if news['id'] == latest_news:
            break
        news_data = {}
        news_data['title'] = news['content_prefix']
        news_data['content'] = news['content'].split('】')[-1]
        news_data['url'] = 'https://www.jinse.com/lives/{}.html'.format(news['id'])
        ding_mesage(news_data)
    # remember the newest id for the next run
    latest_news = lives[0]['id']
    with open('celery_app/crontabTasks/resource_jinse.txt', 'w') as file_w:
        file_w.write(str(latest_news))
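The deduplication idea (remember the newest id, stop at it on the next run) can be isolated from the network code. The sketch below is one possible arrangement under stated assumptions: load_last_id, save_last_id and select_unseen are hypothetical helpers, a missing or empty state file falls back to a default instead of raising, and unseen items are returned oldest first so they would be posted in chronological order.

```python
import os


def load_last_id(path, default=0):
    """Read the last seen id; fall back to default if the file is missing or empty."""
    if not os.path.exists(path):
        return default
    with open(path, 'r') as f:
        content = f.read().strip()
    return int(content) if content else default


def save_last_id(path, last_id):
    """Persist the newest id for the next run."""
    with open(path, 'w') as f:
        f.write(str(last_id))


def select_unseen(items, last_id):
    """Given a newest-first list of dicts with an 'id' key, return the items
    that come before the one matching last_id, ordered oldest first."""
    unseen = []
    for item in items:
        if item['id'] == last_id:
            break
        unseen.append(item)
    unseen.reverse()
    return unseen
```

If last_id does not appear in the list at all, every item is treated as unseen, which matches the behavior of the loop in the task above.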

Finally

Start the Celery service (the worker together with the beat scheduler) on the server and the deployment is complete.

Summary

A small experiment with a server
A small demo combining scheduled tasks, a scraper, and DingTalk robot API calls
