A step-by-step example: using regular expressions in Python to extract specific content and write it to CSV

Published: 2024/4/11 · python · 豆豆
Background and goal

This time I want to process a txt file that contains the output of one machine pinging another at regular intervals. I want to extract the timestamp and the rtt values from it, and finally write the results to a CSV file.

1. Identify the content to extract and write the regular expression

The text to extract is as follows (a sample is embedded in the test code in this step):

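The first step is to write the regular expression; there is no need to read the data file yet. Copy part of the data into a string first so that it is easy to test against.
Writing the regular expression uses the re module. Everyone's text is different, so you will need to learn the basics of the module yourself. For details on using re, see this article:
https://zhuanlan.zhihu.com/p/139596371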

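The key is to understand what .*? and {} do, and that re.S lets . match newline characters. With those three pieces it is fairly easy to write a correct expression.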
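To make those three pieces concrete, here is a minimal sketch. The two-line sample string and the variable names are my own, chosen only for illustration; the real input is the full ping log.

```python
import re

# Hypothetical two-line sample, shaped like the first and last line of one ping block
sample = "2022-03-11 15:21:48 1\nrtt min/avg/max/mdev = 250.203/250.563/253.202/0.961 ms\n"

# {n} repeats the preceding token exactly n times: \d{4} matches four digits
dates = re.findall(r"\d{4}-\d{2}-\d{2}", sample)

# .*? is lazy: it stops at the first place where the rest of the pattern can match
lazy = re.findall(r"= (.*?)/", sample)
# .* is greedy: it runs on to the last "/" on the line
greedy = re.findall(r"= (.*)/", sample)

# Without re.S, "." does not match "\n", so a pattern spanning two lines finds nothing
pattern = r"(\d{2}:\d{2}).*?mdev = (.*?) ms"
no_flag = re.findall(pattern, sample)
with_flag = re.findall(pattern, sample, re.S)

print(dates)      # ['2022-03-11']
print(lazy)       # ['250.203']
print(greedy)     # ['250.203/250.563/253.202']
print(no_flag)    # []
print(with_flag)  # [('15:21', '250.203/250.563/253.202/0.961')]
```

The last two lines are the reason re.S matters here: the timestamp and the rtt summary sit on different lines of the log.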

import re

# For easier testing, part of the text is placed in a string first
data = '''
2022-03-11 15:21:48 1
PING 81.71.51.181 (81.71.51.181) 56(84) bytes of data.
64 bytes from 81.71.51.181: icmp_seq=1 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=2 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=3 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=4 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=5 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=6 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=7 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=8 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=9 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=10 ttl=45 time=253 ms

--- 81.71.51.181 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9000ms
rtt min/avg/max/mdev = 250.203/250.563/253.202/0.961 ms
2022-03-11 15:22:40 2
PING 81.71.51.181 (81.71.51.181) 56(84) bytes of data.
64 bytes from 81.71.51.181: icmp_seq=1 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=2 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=3 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=4 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=5 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=6 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=7 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=8 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=9 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=10 ttl=45 time=250 ms

--- 81.71.51.181 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9009ms
rtt min/avg/max/mdev = 250.181/250.256/250.434/0.636 ms
2022-03-11 15:23:44 3
PING 81.71.51.181 (81.71.51.181) 56(84) bytes of data.
64 bytes from 81.71.51.181: icmp_seq=1 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=2 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=3 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=4 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=5 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=6 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=7 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=8 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=9 ttl=45 time=250 ms
64 bytes from 81.71.51.181: icmp_seq=10 ttl=45 time=250 ms

--- 81.71.51.181 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9009ms
rtt min/avg/max/mdev = 250.209/250.320/250.658/0.563 ms
'''

print(re.findall(r'(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2})', data))  # extract the time
print(re.findall(r'mdev = (.*?) ms', data))  # extract the rtt
print(re.findall(r'(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}).*?mdev = (.*?) ms', data, re.S))  # extract time and rtt together, matching across newlines

Output:

D:\python37\python.exe D:/test/data_process.py
['2022-03-11 15:21', '2022-03-11 15:22', '2022-03-11 15:23']
['250.203/250.563/253.202/0.961', '250.181/250.256/250.434/0.636', '250.209/250.320/250.658/0.563']
[('2022-03-11 15:21', '250.203/250.563/253.202/0.961'), ('2022-03-11 15:22', '250.181/250.256/250.434/0.636'), ('2022-03-11 15:23', '250.209/250.320/250.658/0.563')]

Process finished with exit code 0

2. Read the data from a file

Once the regular expression is correct, the data can be read from the file.

import re

# Read the file; the with block closes it automatically, so no explicit close() is needed
with open("ping/ping_flkf_gz.txt", "r") as input_file:
    data = input_file.read()

# Extract time and latency, matching across newlines
print(re.findall(r'(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}).*?mdev = (.*?) ms', data, re.S))

The output is long, so only part of it is shown here:

D:\python37\python.exe D:/test/data_process.py
[('2022-03-11 15:21', '250.203/250.563/253.202/0.961'),
 ('2022-03-11 15:22', '250.181/250.256/250.434/0.636'),
 ('2022-03-11 15:23', '250.209/250.320/250.658/0.563'),
 ('2022-03-11 15:25', '250.183/250.240/250.275/0.225'),
 ('2022-03-11 15:26', '250.217/250.240/250.300/0.592'),
 ('2022-03-11 15:27', '250.166/250.362/250.956/0.683'),
 ('2022-03-11 15:28', '250.186/250.256/250.343/0.319'),
 ('2022-03-11 15:29', '250.181/250.435/252.077/0.776'),
 ('2022-03-11 15:30', '250.177/250.249/250.401/0.673'),
 ('2022-03-11 15:31', '250.210/250.436/251.498/0.376'),
 ('2022-03-11 15:32', '250.207/250.280/250.588/0.401'),
 ('2022-03-11 15:33', '250.237/250.336/250.747/0.568'),
 ('2022-03-11 15:34', '250.217/250.283/250.437/0.675'),
 ('2022-03-11 15:35', '250.254/250.456/251.092/0.623'),
 ('2022-03-11 15:36', '250.167/250.236/250.308/0.226'),
 ('2022-03-11 15:37', '250.162/250.399/251.032/0.667'),
 ('2022-03-11 15:38', '250.207/250.261/250.406/0.053'),
 ('2022-03-11 15:39', '250.219/250.657/252.056/0.878')]

The result is a list in which each element is a tuple holding the time and the rtt that I extracted.
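A small sketch of why the result has that shape: when the pattern contains two capture groups, re.findall returns one 2-tuple per match, which can be unpacked directly in a loop. The shortened input string and the names matches, timestamp and rtt are assumptions made for this illustration.

```python
import re

# Hypothetical shortened input: two ping blocks reduced to the lines that matter
data = (
    "2022-03-11 15:21:48 1\n"
    "rtt min/avg/max/mdev = 250.203/250.563/253.202/0.961 ms\n"
    "2022-03-11 15:22:40 2\n"
    "rtt min/avg/max/mdev = 250.181/250.256/250.434/0.636 ms\n"
)

# Two capture groups in the pattern, so findall returns a list of 2-tuples
matches = re.findall(r'(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}).*?mdev = (.*?) ms', data, re.S)

# Each tuple unpacks into the time and the rtt summary
for timestamp, rtt in matches:
    print(timestamp, "->", rtt)
```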

3. Write to CSV

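Now that the input file is read correctly and the data is extracted, the next step is to write the results to a CSV file, which is where the csv module comes in.
A for loop iterates over the list and csv_writer.writerow writes to the CSV file one row at a time.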

import re
import csv

# Read the file; the with block closes it automatically
with open("ping/ping_flkf_gz.txt", "r") as input_file:
    data = input_file.read()

# Collect the extracted data in a list
rows = re.findall(r'(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}).*?mdev = (.*?) ms', data, re.S)

# 1. Create the file object (closed automatically when the with block ends)
with open('res/ping/ping_flkf_gz.csv', 'w', encoding='utf-8', newline='') as output_file:
    # 2. Build the csv writer on top of the file object
    csv_writer = csv.writer(output_file)
    # 3. Write the header row
    csv_writer.writerow(["time", "latency"])
    # 4. Iterate over the list and write each row to the CSV file
    for row in rows:
        csv_writer.writerow([row[0], row[1]])

The results are now written to the CSV file:

time,latency
2022-03-11 15:21,250.203/250.563/253.202/0.961
2022-03-11 15:22,250.181/250.256/250.434/0.636
2022-03-11 15:23,250.209/250.320/250.658/0.563
2022-03-11 15:25,250.183/250.240/250.275/0.225
2022-03-11 15:26,250.217/250.240/250.300/0.592
2022-03-11 15:27,250.166/250.362/250.956/0.683
2022-03-11 15:28,250.186/250.256/250.343/0.319
2022-03-11 15:29,250.181/250.435/252.077/0.776
2022-03-11 15:30,250.177/250.249/250.401/0.673
2022-03-11 15:31,250.210/250.436/251.498/0.376
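As a possible variant of the loop, csv.writer also offers writerows, which writes every tuple in one call. The sketch below is only an illustration: it uses io.StringIO in place of the real output file so that it runs without any files on disk, and the names rows, buffer and csv_text are my own.

```python
import csv
import io

# Two (time, latency) tuples taken from the extracted results
rows = [
    ("2022-03-11 15:21", "250.203/250.563/253.202/0.961"),
    ("2022-03-11 15:22", "250.181/250.256/250.434/0.636"),
]

# io.StringIO stands in for the output file so the sketch touches no disk paths
buffer = io.StringIO(newline="")
csv_writer = csv.writer(buffer)
csv_writer.writerow(["time", "latency"])  # header row
csv_writer.writerows(rows)                # one CSV row per tuple

csv_text = buffer.getvalue()
print(csv_text)
```

Whether to use the loop or writerows is mostly a matter of taste; the loop is handier when each row needs changing before it is written.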

4. The values can also be stored separately

At this point the latency column looks like this: 250.203/250.563/253.202/0.961
To make later processing easier, each value should go into its own column, so the regular expression has to be modified.

import re
import csv

# Read the file; the with block closes it automatically
with open("ping/ping_flkf_gz.txt", "r") as input_file:
    data = input_file.read()

# Collect the extracted data in a list.
# The regular expression now has one group per value, so each value gets its own column.
rows = re.findall(r'(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}).*?mdev = (.*?)/(.*?)/(.*?)/(.*?) ms', data, re.S)

# 1. Create the file object (closed automatically when the with block ends)
with open('res/ping/ping_flkf_gz.csv', 'w', encoding='utf-8', newline='') as output_file:
    # 2. Build the csv writer on top of the file object
    csv_writer = csv.writer(output_file)
    # 3. Write the header row
    csv_writer.writerow(["time", "min", "avg", "max", "mdev"])
    # 4. Iterate over the list and write each row to the CSV file
    for row in rows:
        csv_writer.writerow([row[0], row[1], row[2], row[3], row[4]])

The result in the CSV file:
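As a quick way to check the column layout without the real log, the sketch below applies the five-group pattern to a shortened input and writes to an in-memory buffer. The shortened input string and the names buffer and csv_lines are assumptions for this illustration; the values come from the first two ping blocks of the sample.

```python
import csv
import io
import re

# Hypothetical shortened input: two ping blocks reduced to the lines that matter
data = (
    "2022-03-11 15:21:48 1\n"
    "rtt min/avg/max/mdev = 250.203/250.563/253.202/0.961 ms\n"
    "2022-03-11 15:22:40 2\n"
    "rtt min/avg/max/mdev = 250.181/250.256/250.434/0.636 ms\n"
)

# One group for the time, then one group per value between the slashes
pattern = r'(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}).*?mdev = (.*?)/(.*?)/(.*?)/(.*?) ms'
rows = re.findall(pattern, data, re.S)

# io.StringIO stands in for the output file
buffer = io.StringIO(newline="")
csv_writer = csv.writer(buffer)
csv_writer.writerow(["time", "min", "avg", "max", "mdev"])
csv_writer.writerows(rows)

csv_lines = buffer.getvalue().splitlines()
for line in csv_lines:
    print(line)
# time,min,avg,max,mdev
# 2022-03-11 15:21,250.203,250.563,253.202,0.961
# 2022-03-11 15:22,250.181,250.256,250.434,0.636
```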

And with that, the task is done~
