Python String and Text Processing Tips (4): Formatted Output, Token Parsing, and String Operations on Byte Strings

Published: 2025/3/15 python 15 豆豆

1. Formatting strings to a specified column width

We often have long strings that we want to reformat to a given column width.

textwrap.fill()

    import os
    import textwrap

    s = ("Look into my eyes, look into my eyes, "
         "the eyes, the eyes, the eyes, not around the eyes, don't look "
         "around the eyes, look into my eyes, you're under.")

    length = 50  # or: os.get_terminal_size().columns

    # Plain wrapping at 50 columns
    print(textwrap.fill(s, length))
    # >>> Look into my eyes, look into my eyes, the eyes,
    #     the eyes, the eyes, not around the eyes, don't
    #     look around the eyes, look into my eyes, you're
    #     under.

    # Indent only the first line (four spaces)
    print(textwrap.fill(s, length, initial_indent='    '))
    # >>>     Look into my eyes, look into my eyes, the
    #     eyes, the eyes, the eyes, not around the eyes,
    #     don't look around the eyes, look into my eyes,
    #     you're under.

    # Indent every line except the first (four spaces)
    print(textwrap.fill(s, length, subsequent_indent='    '))
    # >>> Look into my eyes, look into my eyes, the eyes,
    #         the eyes, the eyes, not around the eyes, don't
    #         look around the eyes, look into my eyes,
    #         you're under.

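The textwrap module is very useful for printing strings, especially when the output should automatically fit the size of the terminal. The terminal size can be obtained with os.get_terminal_size().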
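
As a rough sketch of fitting wrapped text to the terminal: os.get_terminal_size() raises OSError when output is not attached to a terminal (for example when piped to a file), so this example swaps in shutil.get_terminal_size(), which accepts a fallback size. The helper name wrap_to_terminal and the 80-column fallback are assumptions made for illustration, not part of the original article.

```python
import shutil
import textwrap


def wrap_to_terminal(text, fallback_columns=80):
    """Wrap text to the current terminal width (hypothetical helper)."""
    # shutil.get_terminal_size() uses the fallback when no terminal is attached;
    # "or fallback_columns" guards against a reported width of 0.
    size = shutil.get_terminal_size(fallback=(fallback_columns, 24))
    columns = size.columns or fallback_columns
    return textwrap.fill(text, width=columns)


s = ("Look into my eyes, look into my eyes, "
     "the eyes, the eyes, the eyes, not around the eyes, don't look "
     "around the eyes, look into my eyes, you're under.")

print(wrap_to_terminal(s))
```

The printed layout depends on the width of the terminal the script runs in, so no fixed output is shown here.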

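2. Tokenizing a string

Suppose we have a string that needs to be parsed from left to right into a stream of tokens.

To tokenize a string, matching patterns is not enough; we also need to identify which kind of pattern matched. For example, we may want to turn the string into a sequence of (token type, token text) pairs.

To split the string into such pairs, the first step is to define every possible token, including whitespace, as a regular expression with a named capture group.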

    import re

    text = 'foo = 23 + 42 * 10'

    # One named capture group per token type
    NAME = r'(?P<NAME>[a-zA-Z_][a-zA-Z_0-9]*)'
    NUM = r'(?P<NUM>\d+)'
    PLUS = r'(?P<PLUS>\+)'
    TIMES = r'(?P<TIMES>\*)'
    EQ = r'(?P<EQ>=)'
    WS = r'(?P<WS>\s+)'

    # Combine all token patterns into a single alternation
    master_pat = re.compile('|'.join([NAME, NUM, PLUS, TIMES, EQ, WS]))

    scanner = master_pat.scanner(text)
    res = scanner.match()
    print(res.lastgroup, res.group())  # >>> NAME foo

In the patterns above, ?P<TOKENNAME> assigns a name to a group so that it can be referred to later.
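
To illustrate how the group name is used afterwards, here is a minimal sketch with a two-alternative pattern. The pattern and the sample strings are made up for this example; the point is that the match object's lastgroup attribute reports which named alternative matched.

```python
import re

# Two alternatives, each wrapped in a named group
pat = re.compile(r'(?P<NUM>\d+)|(?P<NAME>[a-zA-Z_]\w*)')

m = pat.match('42 apples')
print(m.lastgroup, m.group('NUM'))  # NUM 42

m2 = pat.match('apples')
print(m2.lastgroup, m2.group())     # NAME apples
print(m2.groupdict())               # {'NUM': None, 'NAME': 'apples'}
```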

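To tokenize, use the scanner() method of the pattern object. It creates a scanner object, and calling match() on that object repeatedly steps through the target text, producing one match per call.

In practice, this technique is easy to package into a generator.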
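
One possible way to package the scanner in a generator is sketched below. The names Token and generate_tokens are chosen for this illustration. Note that scanner() is not described in the official re documentation, so treat its use here as an assumption about current CPython behavior.

```python
import re
from collections import namedtuple

NAME = r'(?P<NAME>[a-zA-Z_][a-zA-Z_0-9]*)'
NUM = r'(?P<NUM>\d+)'
PLUS = r'(?P<PLUS>\+)'
TIMES = r'(?P<TIMES>\*)'
EQ = r'(?P<EQ>=)'
WS = r'(?P<WS>\s+)'
master_pat = re.compile('|'.join([NAME, NUM, PLUS, TIMES, EQ, WS]))

# (type, value) pair for each token; the name Token is illustrative
Token = namedtuple('Token', ['type', 'value'])


def generate_tokens(pat, text):
    """Yield a Token for each successive match of pat in text."""
    scanner = pat.scanner(text)
    # scanner.match() returns None once nothing matches at the current
    # position, which ends the iteration
    for m in iter(scanner.match, None):
        yield Token(m.lastgroup, m.group())


# Drop whitespace tokens while consuming the stream
tokens = [tok for tok in generate_tokens(master_pat, 'foo = 23 + 42 * 10')
          if tok.type != 'WS']
for tok in tokens:
    print(tok)
# Token(type='NAME', value='foo')
# Token(type='EQ', value='=')
# Token(type='NUM', value='23')
# Token(type='PLUS', value='+')
# Token(type='NUM', value='42')
# Token(type='TIMES', value='*')
# Token(type='NUM', value='10')
```

Because the generator stops at the first position where no pattern matches, text containing an unexpected character is silently cut short; a real tokenizer would probably want to detect that case.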

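3. String operations on byte strings

What if we want to perform text operations (such as stripping, searching, and replacing) on byte strings? Byte strings support most of the same built-in operations as text strings.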

    data = b'Hello World'

    print(data[0:5])                            # >>> b'Hello'
    print(data.startswith(b'Hello'))            # >>> True
    print(data.split())                         # >>> [b'Hello', b'World']
    print(data.replace(b'Hello', b'Hello Cruel'))  # >>> b'Hello Cruel World'
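
The text also mentions stripping and searching. A small sketch of those operations on a byte string follows; the sample data with padding spaces is made up for this example.

```python
data = b'  Hello World  '

stripped = data.strip()                 # remove leading/trailing whitespace
print(stripped)                         # b'Hello World'
print(stripped.find(b'World'))          # 6
print(b'World' in stripped)             # True
print(stripped.replace(b'World', b''))  # b'Hello '
```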

These operations also work on byte arrays. For example:

    data = bytearray(b'Hello World')

    print(data[0:5])                  # >>> bytearray(b'Hello')
    print(data.startswith(b'Hello'))  # >>> True
    print(data.split())               # >>> [bytearray(b'Hello'), bytearray(b'World')]
    print(data.replace(b'Hello', b'Hello Cruel'))
    # >>> bytearray(b'Hello Cruel World')

Regular expressions can also be matched against byte strings, but the pattern itself must then be a byte string as well. For example:

    import re

    data = b'FOO:BAR,SPAM'

    # A str pattern fails on bytes input:
    # print(re.split('[:,]', data))
    # >>> Traceback (most recent call last):
    #       File "<stdin>", line 1, in <module>
    #       File "ByteString.py", line 3, in split
    #         return _compile(pattern, flags).split(string, maxsplit)
    #     TypeError: can't use a string pattern on a bytes-like object

    print(re.split(b'[:,]', data))  # Notice: pattern as bytes
    # >>> [b'FOO', b'BAR', b'SPAM']

Reference: 《python3-cookbook》
