當(dāng)前位置：首頁 > 人文社科 > 生活经验 >内容正文

生活经验

Linux shell 学习笔记（15）— shell 正则表达式

發(fā)布時間：2023/11/28 生活经验 34 豆豆

生活随笔收集整理的這篇文章主要介紹了 Linux shell 学习笔记（15）— shell 正则表达式小編覺得挺不錯的,現(xiàn)在分享給大家,幫大家做個參考.

1. 定義 BRE 模式

1.1 純文本

第一條原則就是：正則表達式模式都區(qū)分大小寫。這意味著它們只會匹配大小寫也相符的模式。

$ echo "This is a test" | sed -n '/this/p'
$
$ echo "This is a test" | sed -n '/This/p'
This is a test
$

在正則表達式中，你不用寫出整個單詞。只要定義的文本出現(xiàn)在數(shù)據(jù)流中，正則表達式就能夠匹配。

$ echo "The books are expensive" | sed -n '/book/p'
The books are expensive
$

1.2 特殊字符

正則表達式識別的特殊字符包括：.*[]^${}+?|()

如果要用某個特殊字符作為文本字符，就必須轉(zhuǎn)義。轉(zhuǎn)義字符就是反斜線（\）。

由于反斜線是特殊字符，如果要在正則表達式模式中使用它，你必須對其轉(zhuǎn)義，這樣就產(chǎn)生了兩個反斜線。

$ echo "\ is a special character" | sed -n '/\\/p'
\ is a special character
$

要使用正斜線，也需要進行轉(zhuǎn)義。

$ echo "3 / 2" | sed -n '/\//p'
3 / 2
$

1.3 錨字符

鎖定在行首

脫字符（^）定義從數(shù)據(jù)流中文本行的行首開始的模式。如果模式出現(xiàn)在行首之外的位置，正則表達式模式則無法匹配。

要用脫字符，就必須將它放在正則表達式中指定的模式前面。
```
$ echo "The book store" | sed -n '/^book/p'
$
$ echo "Books are great" | sed -n '/^Book/p'
Books are great
$
```
如果你將脫字符放到模式開頭之外的其他位置，那么它就跟普通字符一樣，不再是特殊字符了：
```
$ echo "This ^ is a test" | sed -n '/s ^/p'
This ^ is a test
$
```
由于脫字符出現(xiàn)在正則表達式模式的尾部，sed 編輯器會將它當(dāng)作普通字符來匹配。
鎖定在行尾

特殊字符美元符（$）定義了行尾錨點。將這個特殊字符放在文本模式之后來指明數(shù)據(jù)行必須以該文本模式結(jié)尾。
```
$ echo "This is a good book" | sed -n '/book$/p'
This is a good book
$ echo "This book is good" | sed -n '/book$/p'
$
```

組合錨點

可以在同一行中將行首錨點和行尾錨點組合在一起使用。

$ cat data4
this is a test of using both anchors
I said this is a test
this is a test
I'm sure this is a test.
$ sed -n '/^this is a test$/p' data4
this is a test
$

將兩個錨點直接組合在一起，之間不加任何文本，這樣過濾出數(shù)據(jù)流中的空白行。

$ cat data5
This is one test line.
This is another test line.
$ sed '/^$/d' data5
This is one test line.
This is another test line.
$

定義的正則表達式模式會查找行首和行尾之間什么都沒有的那些行。

1.4 點號字符

特殊字符點號用來匹配除換行符之外的任意單個字符。它必須匹配一個字符，如果在點號字符的位置沒有字符，那么模式就不成立。

$ cat data6
This is a test of a line.
The cat is sleeping.
That is a very nice hat.
This test is at line four.
at ten o'clock we'll go home.
$ sed -n '/.at/p' data6
The cat is sleeping.
That is a very nice hat.
This test is at line four.
$

1.5 字符組

可以定義用來匹配文本模式中某個位置的一組字符。使用方括號來定義一個字符組。方括號中包含所有你希望出現(xiàn)在該字符組中的字符。

$ sed -n '/[ch]at/p' data6
The cat is sleeping.
That is a very nice hat.
$

字符組不必只含有字母，也可以在其中使用數(shù)字。

$ cat data7
This line doesn't contain a number.
This line has 1 number on it.
This line a number 2 on it.
This line has a number 4 on it.
$ sed -n '/[0123]/p' data7
This line has 1 number on it.
This line a number 2 on it.
$

1.6 排除型字符組

在正則表達式模式中，也可以反轉(zhuǎn)字符組的作用。可以尋找組中沒有的字符，而不是去尋找組中含有的字符。只要在字符組的開頭加個脫字符。

$ sed -n '/[^ch]at/p' data6
This test is at line four.
$

通過排除型字符組，正則表達式模式會匹配c或h之外的任何字符以及文本模式。由于空格字符屬于這個范圍，它通過了模式匹配。但即使是排除，字符組仍然必須匹配一個字符，所以以 at 開頭的行仍然未能匹配模式。

1.7 區(qū)間

可以用單破折線符號在字符組中表示字符區(qū)間。只需要指定區(qū)間的第一個字符、單破折線以及區(qū)間的最后一個字符就行了。

$ sed -n '/^[0-9][0-9][0-9][0-9][0-9]$/p' data8
60633
46201
45902
$

同樣的方法也適用于字母。

$ sed -n '/[c-h]at/p' data6
The cat is sleeping.
That is a very nice hat.
$

1.8 特殊字符數(shù)組

組	描述
[[:alpha:]]	匹配任意字母字符，不管是大寫還是小寫
[[:alnum:]]	匹配任意字母數(shù)字字符0_9、AZ或a~z
[[:blank:]]	匹配空格或制表符
[[:digit:]]	匹配0~9之間的數(shù)字
[[:lower:]]	匹配小寫字母字符a~z
[[:print:]]	匹配任意可打印字符
[[:punct:]]	匹配標(biāo)點符號
[[:space:]]	匹配任意空白字符：空格、制表符、NL、FF、VT和CR
[[:upper:]]	匹配任意大寫字母字符A~Z

$ echo "abc" | sed -n '/[[:digit:]]/p'
$
$ echo "abc" | sed -n '/[[:alpha:]]/p'
abc
$ echo "abc123" | sed -n '/[[:digit:]]/p'
abc123
$ echo "This is, a test" | sed -n '/[[:punct:]]/p'
This is, a test
$ echo "This is a test" | sed -n '/[[:punct:]]/p'
$

1.9 星號

在字符后面放置星號表明該字符必須在匹配模式的文本中出現(xiàn) 0 次或多次。

$ echo "ik" | sed -n '/ie*k/p'
ik
$ echo "iek" | sed -n '/ie*k/p'
iek
$ echo "ieek" | sed -n '/ie*k/p'
ieek
$ echo "ieeek" | sed -n '/ie*k/p'
ieeek
$ echo "ieeeek" | sed -n '/ie*k/p'
ieeeek
$

另一個方便的特性是將點號特殊字符和星號特殊字符組合起來。這個組合能夠匹配任意數(shù)量的任意字符。

$ echo "this is a regular pattern expression" | sed -n '
> /regular.*expression/p'
this is a regular pattern expression
$

2. 擴展正則表達式

2.1 問號

問號表明前面的字符可以出現(xiàn) 0 次或 1 次，但只限于此。它不會匹配多次出現(xiàn)的字符。

$ echo "bt" | gawk '/be?t/{print $0}'
bt
$ echo "bet" | gawk '/be?t/{print $0}'
bet
$ echo "beet" | gawk '/be?t/{print $0}'
$
$ echo "beeet" | gawk '/be?t/{print $0}'
$

2.2 加號

加號表明前面的字符可以出現(xiàn) 1 次或多次，但必須至少出現(xiàn) 1 次。

$ echo "beeet" | gawk '/be+t/{print $0}'
beeet
$ echo "beet" | gawk '/be+t/{print $0}'
beet
$ echo "bet" | gawk '/be+t/{print $0}'
bet
$ echo "bt" | gawk '/be+t/{print $0}'
$

2.3 花括號

m：正則表達式準(zhǔn)確出現(xiàn) m 次。
m, n：正則表達式至少出現(xiàn) m 次，至多 n 次。

$ echo "bt" | gawk --re-interval '/be{1,2}t/{print $0}'
$
$ echo "bet" | gawk --re-interval '/be{1,2}t/{print $0}'
bet
$ echo "beet" | gawk --re-interval '/be{1,2}t/{print $0}'
beet
$ echo "beeet" | gawk --re-interval '/be{1,2}t/{print $0}'
$

2.4 管道符號

管道符號允許你在檢查數(shù)據(jù)流時，用邏輯 OR 方式指定正則表達式引擎要用的兩個或多個模式。

使用管道符號的格式如下：expr1|expr2|…

正則表達式和管道符號之間不能有空格，否則它們也會被認(rèn)為是正則表達式模式的一部分。

$ echo "The cat is asleep" | gawk '/cat|dog/{print $0}'
The cat is asleep
$ echo "The dog is asleep" | gawk '/cat|dog/{print $0}'
The dog is asleep
$ echo "The sheep is asleep" | gawk '/cat|dog/{print $0}'
$

管道符號兩側(cè)的正則表達式可以采用任何正則表達式模式（包括字符組）來定義文本。

$ echo "He has a hat." | gawk '/[ch]at|dog/{print $0}'
He has a hat.
$

2.5 表達式分組

正則表達式模式也可以用圓括號進行分組。當(dāng)你將正則表達式模式分組時，該組會被視為一個標(biāo)準(zhǔn)字符?？梢韵駥ζ胀ㄗ址粯咏o該組使用特殊字符。

$ echo "Sat" | gawk '/Sat(urday)?/{print $0}'
Sat
$ echo "Saturday" | gawk '/Sat(urday)?/{print $0}'
Saturday
$

總結(jié)

以上是生活随笔為你收集整理的Linux shell 学习笔记（15）— shell 正则表达式的全部內(nèi)容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網(wǎng)站內(nèi)容還不錯，歡迎將生活随笔推薦給好友。

上一篇： Linux shell 学习笔记（12）
下一篇： pip install nmslib 失