Linux: Shell Programming with Regular Expressions and Text Processors (Notes)

Basic Regular Expressions

1: Basic regular expression examples

The operations below need a test file named test.txt prepared in advance. Its contents are as follows (line 10 is an empty line):

[root@localhost ~]# cat test.txt
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words

#woood #
#woooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
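
The sample file can be recreated with a here-document. This is only a sketch: the scratch directory and the variable names are assumptions (the notes keep test.txt in root's home directory), and the one-sentence-per-line layout is inferred from the line numbers that grep prints in the examples.

```shell
# Scratch directory is an assumption; the notes keep test.txt in root's home.
workdir=$(mktemp -d)
cat > "$workdir/test.txt" <<'EOF'
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words

#woood #
#woooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
EOF
# An empty pattern matches every line, so -c gives the total line count.
line_count=$(grep -c '' "$workdir/test.txt")
# Locate the single empty line.
empty_line=$(grep -n '^$' "$workdir/test.txt")
echo "lines: $line_count, empty line: $empty_line"
```

Quoting the here-document delimiter ('EOF') keeps the shell from expanding anything inside the file body.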

(1) Finding specific characters

The -n option prints line numbers and -i ignores case.

[root@localhost ~]# grep -n 'the' test.txt
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.
[root@localhost ~]# grep -in 'the' test.txt
3:The home of Football on BBC Sport online.
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.
To invert the selection, for example to find the lines that do not contain "the", use grep's -v option, together with -n to show line numbers.
[root@localhost ~]# grep -vn 'the' test.txt
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
7:PI=3.141592653589793238462643383249901429
8:a wood cross!
9:Actions speak louder than words
10:
11:#woood #
12:#woooooood #
13:AxyzxyzxyzxyzC
14:I bet this place is really spooky late at night!
15:Misfortunes never come alone/single.
16:I shouldn't have lett so tast.
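
The three options can be compared side by side on a tiny input. This is a minimal sketch; the three-line sample and the variable names are made up for illustration and stand in for test.txt.

```shell
# Three-line sample standing in for test.txt (illustrative data).
sample=$(printf '%s\n' 'The home' 'the tongue' 'a wood cross')

plain=$(printf '%s\n' "$sample" | grep -n 'the')      # case-sensitive match
nocase=$(printf '%s\n' "$sample" | grep -in 'the')    # -i ignores case
inverted=$(printf '%s\n' "$sample" | grep -vn 'the')  # -v keeps non-matching lines

printf 'plain:\n%s\nnocase:\n%s\ninverted:\n%s\n' "$plain" "$nocase" "$inverted"
```

Note that -v inverts the case-sensitive match here, so "The home" is reported as a non-matching line.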

(2) Using brackets "[ ]" to match a set of characters

"shirt" and "short" both start with "sh" and end with "rt", so a bracket set containing "i" and "o" finds both at once.
[root@localhost ~]# grep -n 'sh[io]rt' test.txt
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants.
To find lines containing the repeated character "oo", run the following command.
[root@localhost ~]# grep -n 'oo' test.txt
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
11:#woood #
12:#woooooood #
14:I bet this place is really spooky late at night!
To find "oo" that is not preceded by "w", use the negated set "[^]". For example, grep -n '[^w]oo' test.txt searches test.txt for strings in which the character before "oo" is not "w".
[root@localhost ~]# grep -n '[^w]oo' test.txt
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
11:#woood #
12:#woooooood #
14:I bet this place is really spooky late at night!
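In the output above, "woood" and "woooooood" also match, even though both contain "w". The output itself shows why: the matched characters are highlighted, and in "#woood #" the highlighted part is "ooo", so the character in front of the matched "oo" is an "o", which satisfies the rule. "#woooooood #" matches for the same reason. To exclude any lowercase letter in front of "oo", use grep -n '[^a-z]oo' test.txt, where "a-z" stands for the lowercase letters; uppercase letters are written "A-Z".
[root@localhost ~]# grep -n '[^a-z]oo' test.txt
3:The home of Football on BBC Sport online.
Lines that contain digits can be found with grep -n '[0-9]' test.txt.
[root@localhost ~]# grep -n '[0-9]' test.txt
4:the tongue is boneless but it breaks bones.12!
7:PI=3.141592653589793238462643383249901429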
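
Why "#woood #" still matches '[^w]oo' can be checked by printing only the matched text. The sketch below relies on grep's -o option, which GNU and BSD grep provide but POSIX does not require; the variable names are illustrative.

```shell
# Force the C locale so that [a-z] means ASCII lowercase only.
export LC_ALL=C

# -o prints only the part of the line that matched.
matched=$(printf '%s\n' '#woood #' | grep -o '[^w]oo')
echo "matched in '#woood #': $matched"

# "wood" has only one "oo", and it is preceded by "w": no match.
# grep exits non-zero when nothing matches, hence the "|| true".
wood_hits=$(printf '%s\n' 'a wood cross!' | grep -c '[^w]oo' || true)
echo "matching lines for 'a wood cross!': $wood_hits"

# With [^a-z] only a non-lowercase character may precede "oo".
upper=$(printf '%s\n' 'Football' 'google' | grep -o '[^a-z]oo')
echo "matched with [^a-z]oo: $upper"
```

The first result is "ooo": the bracket expression consumed the first "o", which is exactly the behavior described for the highlighted text.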

(3) Matching the start of a line "^" and the end of a line "$"

To find the lines that begin with "the":
[root@localhost ~]# grep -n '^the' test.txt
4:the tongue is boneless but it breaks bones.12!
Lines that begin with a lowercase letter can be filtered with the rule "^[a-z]", lines that begin with an uppercase letter with "^[A-Z]", and lines that do not begin with a letter with "^[^a-zA-Z]".
[root@localhost ~]# grep -n '^[a-z]' test.txt
1:he was short and fat.
4:the tongue is boneless but it breaks bones.12!
5:google is the best tools for search keyword.
8:a wood cross!
[root@localhost ~]# grep -n '^[A-Z]' test.txt
2:He was wearing a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
6:The year ahead will test our political establishment to the limit.
7:PI=3.141592653589793238462643383249901429
9:Actions speak louder than words
13:AxyzxyzxyzxyzC
14:I bet this place is really spooky late at night!
15:Misfortunes never come alone/single.
16:I shouldn't have lett so tast.
[root@localhost ~]# grep -n '^[^a-zA-Z]' test.txt
11:#woood #
12:#woooooood #
To find the lines that end with a period, escape the dot with a backslash, because an unescaped "." is itself a metacharacter.
[root@localhost ~]# grep -n '\.$' test.txt
1:he was short and fat.
2:He was wearing a blue polo shirt with black pants.
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
6:The year ahead will test our political establishment to the limit.
15:Misfortunes never come alone/single.
16:I shouldn't have lett so tast.
To find empty lines, run grep -n '^$' test.txt.
[root@localhost ~]# grep -n '^$' test.txt
10:
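
The anchor rules can be tried on a four-line input. This is a small sketch with made-up sample lines and variable names; note that "^" means "start of line" outside brackets but "not" as the first character inside them.

```shell
# Force the C locale so that letter ranges are plain ASCII.
export LC_ALL=C

# Illustrative sample: the third line is empty.
sample=$(printf '%s\n' 'the end.' 'The end' '' '#tag')

starts_lower=$(printf '%s\n' "$sample" | grep -n '^[a-z]')       # starts with a-z
starts_other=$(printf '%s\n' "$sample" | grep -n '^[^a-zA-Z]')   # starts with a non-letter
ends_dot=$(printf '%s\n' "$sample" | grep -n '\.$')              # ends with a literal dot
empty=$(printf '%s\n' "$sample" | grep -n '^$')                  # empty line

printf '%s\n' "$starts_lower" "$starts_other" "$ends_dot" "$empty"
```

The empty line is not reported by '^[^a-zA-Z]' because a bracket expression always has to consume one character.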

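(4) Matching any single character "." and the repetition character "*"

A "." stands for exactly one arbitrary character, so the following command finds four-character strings that start with "w" and end with "d".
[root@localhost ~]# grep -n 'w..d' test.txt
5:google is the best tools for search keyword.
8:a wood cross!
9:Actions speak louder than words

A "*" repeats the preceding character zero or more times, so "ooo*" means two "o" characters followed by any number of further "o" characters, that is, at least two.
[root@localhost ~]# grep -n 'ooo*' test.txt
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
11:#woood #
12:#woooooood #
14:I bet this place is really spooky late at night!
To find strings that start with "w", end with "d", and contain at least one "o" in between, run the following command.
[root@localhost ~]# grep -n 'woo*d' test.txt
8:a wood cross!
11:#woood #
12:#woooooood #
The following command finds strings that start with "w" and end with "d" with any characters, or none, in between.
[root@localhost ~]# grep -n 'w.*d' test.txt
1:he was short and fat.
5:google is the best tools for search keyword.
8:a wood cross!
9:Actions speak louder than words
11:#woood #
12:#woooooood #
The following command finds the lines that contain a run of digits.
[root@localhost ~]# grep -n '[0-9][0-9]*' test.txt
4:the tongue is boneless but it breaks bones.12!
7:PI=3.141592653589793238462643383249901429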
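
The difference between ".", "*", and their combinations is easiest to see on a handful of short words. The word list and variable names below are illustrative assumptions, not part of the notes.

```shell
# Illustrative word list.
words=$(printf '%s\n' 'wd' 'wod' 'wood' 'word' 'xyz')

zero_or_more=$(printf '%s\n' "$words" | grep 'wo*d')   # "o" may be absent
one_or_more=$(printf '%s\n' "$words" | grep 'woo*d')   # one "o", then zero or more
two_any=$(printf '%s\n' "$words" | grep 'w..d')        # exactly two arbitrary characters
# "o*" alone also matches the empty string, so every line counts as a match.
all_lines=$(printf '%s\n' "$words" | grep -c 'o*')

printf 'wo*d:\n%s\nwoo*d:\n%s\nw..d:\n%s\no* matches %s lines\n' \
  "$zero_or_more" "$one_or_more" "$two_any" "$all_lines"
```

The last result is the reason "at least one o" is written "oo*" rather than "o*".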

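(5) Matching a repetition range with "{ }"

In basic regular expressions the braces must be escaped as "\{" and "\}" to act as a repetition count. To find two consecutive "o" characters:

[root@localhost ~]# grep -n 'o\{2\}' test.txt
3:The home of Football on BBC Sport online.
5:google is the best tools for search keyword.
8:a wood cross!
11:#woood #
12:#woooooood #
14:I bet this place is really spooky late at night!

To find strings that start with "w", end with "d", and contain 2 to 5 "o" characters in between:

[root@localhost ~]# grep -n 'wo\{2,5\}d' test.txt
8:a wood cross!
11:#woood #

To find strings that start with "w", end with "d", and contain 2 or more "o" characters in between:

[root@localhost ~]# grep -n 'wo\{2,\}d' test.txt
8:a wood cross!
11:#woood #
12:#woooooood #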
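
The interval syntax can be checked against words with a known number of "o" characters. In this sketch the word list and variable names are made up; the last command shows the same rule in extended syntax (grep -E), where the braces are written without backslashes.

```shell
# Words with 1, 2, 3 and 7 "o" characters (illustrative data).
words=$(printf '%s\n' 'wod' 'wood' 'woood' 'woooooood')

two_to_five=$(printf '%s\n' "$words" | grep 'wo\{2,5\}d')    # basic syntax, bounded
two_or_more=$(printf '%s\n' "$words" | grep 'wo\{2,\}d')     # basic syntax, open-ended
extended=$(printf '%s\n' "$words" | grep -E 'wo{2,5}d')      # extended syntax, same rule

printf '2-5:\n%s\n2+:\n%s\n-E:\n%s\n' "$two_to_five" "$two_or_more" "$extended"
```

The seven-"o" word is rejected by the bounded form because the "d" must follow immediately after at most five "o" characters.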

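Metacharacter summary
Character	Description
\	Marks the next character as a special character, a literal character, a back-reference, or an octal escape.
^	Matches the start of the input string.
$	Matches the end of the input string.
*	Matches the preceding subexpression zero or more times.
+	Matches the preceding subexpression one or more times.
?	Matches the preceding subexpression zero or one time.
.	Matches any single character except a line break (\n, \r).
[a-z]	Character range. Matches any character within the given range.
{n}	n is a non-negative integer; matches exactly n times.
{n,}	n is a non-negative integer; matches at least n times.
{n,m}	m and n are non-negative integers with n<=m; matches at least n and at most m times.
\d	Matches a digit. Equivalent to [0-9].
\D	Matches a non-digit. Equivalent to [^0-9].
\s	Matches any whitespace character, including space, tab, and form feed. Equivalent to [ \f\n\r\t\v].
\S	Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\w	Matches a letter, digit, or underscore. Equivalent to [A-Za-z0-9_].
\W	Matches anything other than a letter, digit, or underscore. Equivalent to [^A-Za-z0-9_].
\n	Matches a newline.
\f	Matches a form feed.
\r	Matches a carriage return.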
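
Not every row of the table is available in grep's default (basic) syntax. In the sketch below, "+" and "?" are used with grep -E, because they are extended-syntax operators, and a digit is matched with the portable bracket form [0-9], because \d is a Perl-style escape that basic and extended grep patterns do not define. The sample words and variable names are assumptions for illustration.

```shell
# Illustrative word list.
words=$(printf '%s\n' 'wd' 'wod' 'wood' 'color' 'colour')

one_or_more=$(printf '%s\n' "$words" | grep -E 'wo+d')     # "+" : one or more "o"
optional=$(printf '%s\n' "$words" | grep -E 'colou?r')     # "?" : the "u" is optional
digits=$(printf '%s\n' 'abc' 'a1c' | grep '[0-9]')         # portable stand-in for \d

printf '%s\n' "$one_or_more" "$optional" "$digits"
```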

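Text Processors

I. The sed tool

sed is a powerful yet simple tool for parsing and transforming text. It reads text, edits it according to the given conditions (delete, replace, add, move, and so on), and finally outputs either all lines or only the lines it processed.

It can also carry out fairly complex text processing without any interaction, and is widely used in shell scripts to automate all kinds of tasks. Its workflow consists of three steps: read, execute, and display. In detail:

Read: sed reads one line from the input stream (file, pipe, or standard input) and stores it in a temporary buffer, the pattern space.
Execute: by default, all sed commands are executed in order in the pattern space. Unless a line address is given, the commands are applied to every line in turn.
Display: the modified content is sent to the output stream. After the data is sent, the pattern space is cleared.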
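
The "display" step can be made visible by comparing sed with and without -n. This is a minimal sketch with made-up input and variable names: without -n, every line is printed once by the p command and once more when the cycle ends.

```shell
# Without -n: "p" prints the pattern space, then the end of the cycle prints it again.
doubled=$(printf '%s\n' a b | sed 'p')
# With -n: the automatic print is suppressed, only "p" produces output.
quiet=$(printf '%s\n' a b | sed -n 'p')
# With an address, the command runs only on the selected line.
second=$(printf '%s\n' a b c | sed -n '2p')

printf 'sed p:\n%s\nsed -n p:\n%s\nsed -n 2p:\n%s\n' "$doubled" "$quiet" "$second"
```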

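The command format is:

sed [options] 'operation' arguments
sed [options] -f scriptfile arguments

Common sed options include the following:

-e or --expression=: process the input text with the specified command or script.
-f or --file=: process the input text with the specified script file.
-h or --help: show help.
-n, --quiet, or --silent: show only the processed results.
-i: edit the text file directly (in place).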

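Common operations include the following:

a: append; add a line with the given content below the current line.
c: change; replace the selected lines with the given content.
d: delete; delete the selected lines.
i: insert; insert a line with the given content above the selected line.
p: print; with an address it prints the selected lines, without one it prints everything. It is usually combined with the -n option. (Showing non-printing characters in escaped form is the job of the separate l command.)
s: substitute; replace the specified characters.
y: transliterate characters.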
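
The a, i, c, and y operations are not exercised elsewhere in these notes, so here is a small sketch of each. It uses the portable form in which a, i, and c are followed by a backslash and the text on the next line; the input lines and variable names are illustrative.

```shell
# a: append a line after line 1.
appended=$(printf '%s\n' one two | sed '1a\
added')
# i: insert a line before line 1.
inserted=$(printf '%s\n' one two | sed '1i\
added')
# c: replace line 2 with new content.
changed=$(printf '%s\n' one two | sed '2c\
changed')
# y: map each of a, b, c to A, B, C (character by character).
translated=$(printf '%s\n' 'aabbcc cab' | sed 'y/abc/ABC/')

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$appended" "$inserted" "$changed" "$translated"
```

GNU sed also accepts the one-line shorthand such as '1aadded', which is the form used in the later examples.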

(1) Printing text that matches a condition (p means normal output)

[root@localhost ~]# sed -n 'p' test.txt    //print everything, same as cat test.txt
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
......    //remaining output omitted
[root@localhost ~]# sed -n '3p' test.txt    //print line 3
The home of Football on BBC Sport online.
[root@localhost ~]# sed -n '3,5p' test.txt    //print lines 3 to 5
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
[root@localhost ~]# sed -n 'p;n' test.txt    //print all odd lines; n reads in the next line
he was short and fat.
The home of Football on BBC Sport online.
google is the best tools for search keyword.
......    //remaining output omitted
[root@localhost ~]# sed -n 'n;p' test.txt    //print all even lines; n reads in the next line
He was wearing a blue polo shirt with black pants.
the tongue is boneless but it breaks bones.12!
The year ahead will test our political establishment to the limit.
......    //remaining output omitted
[root@localhost ~]# sed -n '1,5{p;n}' test.txt    //print the odd lines between lines 1 and 5 (lines 1, 3, 5)
he was short and fat.
The home of Football on BBC Sport online.
google is the best tools for search keyword.
[root@localhost ~]# sed -n '10,${n;p}' test.txt    //print every second line from line 10 to the end of the file, counted from line 10 (file lines 11, 13, 15)
#woood #
AxyzxyzxyzxyzC
Misfortunes never come alone/single.
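
How p and n interact is easier to follow on numbered input. The sketch below uses the digits 1 to 7 as made-up input lines; the extra ";" before "}" is only there for portability to sed implementations that require it, GNU sed also accepts the shorter {n;p} form.

```shell
# Seven input lines whose content is their own line number.
numbers=$(printf '%s\n' 1 2 3 4 5 6 7)

# p prints the current line, n then loads the next one, which is never printed.
odd=$(printf '%s\n' "$numbers" | sed -n 'p;n')
# n loads the next line first, p prints that one.
even=$(printf '%s\n' "$numbers" | sed -n 'n;p')
# Inside a range the counting starts at the first line of the range (line 3 here).
from_third=$(printf '%s\n' "$numbers" | sed -n '3,${n;p;}')

printf 'p;n -> %s\n' "$(echo $odd)"
printf 'n;p -> %s\n' "$(echo $even)"
printf '3,${n;p;} -> %s\n' "$(echo $from_third)"
```

The third result explains the '10,${n;p}' example: the lines printed are the second, fourth, and so on counted from the start of the range, not the even line numbers of the file.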

(2) Deleting text that matches a condition

[root@localhost ~]# nl test.txt | sed '3d'    //delete line 3
     1  he was short and fat.
     2  He was wearing a blue polo shirt with black pants.
     4  the tongue is boneless but it breaks bones.12!
     5  google is the best tools for search keyword.
     6  The year ahead will test our political establishment to the limit.
......    //remaining output omitted
[root@localhost ~]# nl test.txt | sed '3,5d'    //delete lines 3 to 5
     1  he was short and fat.
     2  He was wearing a blue polo shirt with black pants.
     6  The year ahead will test our political establishment to the limit.
     7  PI=3.141592653589793238462643383249901429
     8  a wood cross!
......    //remaining output omitted
[root@localhost ~]# nl test.txt | sed '/cross/d'    //delete the lines containing cross; the original line 8 is removed. To delete the lines that do not contain cross, negate with "!", as in '/cross/!d'
......    //part of the output omitted
     7  PI=3.141592653589793238462643383249901429
     9  Actions speak louder than words
......    //remaining output omitted
[root@localhost ~]# sed '/^[a-z]/d' test.txt    //delete the lines that start with a lowercase letter
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
The year ahead will test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
Actions speak louder than words

#woood #
#woooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
[root@localhost ~]# sed '/\.$/d' test.txt    //delete the lines that end with "."
the tongue is boneless but it breaks bones.12!
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words

#woood #
#woooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
[root@localhost ~]# sed '/^$/d' test.txt    //delete all empty lines
he was short and fat.
He was wearing a blue polo shirt with black pants.
The home of Football on BBC Sport online.
the tongue is boneless but it breaks bones.12!
google is the best tools for search keyword.
The year ahead will test our political establishment to the limit.
PI=3.141592653589793238462643383249901429
a wood cross!
Actions speak louder than words
#woood #
#woooooood #
AxyzxyzxyzxyzC
I bet this place is really spooky late at night!
Misfortunes never come alone/single.
I shouldn't have lett so tast.
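
The delete patterns, including the "!" negation, can be verified on a short input. The four sample lines and the variable names in this sketch are made up for illustration.

```shell
# Illustrative sample: the second line is empty.
sample=$(printf '%s\n' 'a wood cross!' '' 'he was fat.' 'The end')

only_cross=$(printf '%s\n' "$sample" | sed '/cross/!d')   # keep only lines with "cross"
no_blank=$(printf '%s\n' "$sample" | sed '/^$/d')         # drop empty lines
no_dot_end=$(printf '%s\n' "$sample" | sed '/\.$/d')      # drop lines ending in "."

printf '%s\n---\n%s\n---\n%s\n' "$only_cross" "$no_blank" "$no_dot_end"
```

None of these commands changes the input itself; sed writes the result to standard output unless -i is given.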

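(3) Replacing text that matches a condition

Replacement with sed uses the commands s (string substitution), c (whole-line or whole-block replacement), and y (character transliteration). Common usage is shown below.
sed 's/the/THE/' test.txt    //replace the first "the" in each line with THE
sed 's/l/L/2' test.txt    //replace the second "l" in each line with L
sed 's/the/THE/g' test.txt    //replace every "the" in the file with THE
sed 's/o//g' test.txt    //delete every "o" in the file (replace it with the empty string)
sed 's/^/#/' test.txt    //insert "#" at the start of each line
sed '/the/s/^/#/' test.txt    //insert "#" at the start of each line that contains "the"
sed 's/$/EOF/' test.txt    //append the string EOF to the end of each line
sed '3,5s/the/THE/g' test.txt    //replace every "the" in lines 3 to 5 with THE
sed '/the/s/o/O/g' test.txt    //replace every "o" with "O" in all lines that contain "the"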
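
The effect of the flags on the s command can be compared on a single line. This is a sketch on a made-up sentence; the variable names are illustrative.

```shell
line='the hello the world'

first=$(printf '%s\n' "$line" | sed 's/the/THE/')     # first match only
all=$(printf '%s\n' "$line" | sed 's/the/THE/g')      # g: every match
second_l=$(printf '%s\n' "$line" | sed 's/l/L/2')     # 2: only the second match
deleted=$(printf '%s\n' "$line" | sed 's/o//g')       # empty replacement deletes
prefixed=$(printf '%s\n' "$line" | sed 's/^/#/')      # ^ : insert at line start
suffixed=$(printf '%s\n' "$line" | sed 's/$/EOF/')    # $ : append at line end

printf '%s\n' "$first" "$all" "$second_l" "$deleted" "$prefixed" "$suffixed"
```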

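(4) Moving text that matches a condition

The following commands are commonly used when moving text with sed:
H: copy to the clipboard (the hold space);
g, G: overwrite the specified line with the clipboard data, or append the data after it;
w: save to a file;
r: read the specified file;
a: append the specified content.
The operations are shown below.
sed '/the/{H;d};$G' test.txt    //move the lines containing "the" to the end of the file; {;} groups several operations
sed '1,5{H;d};17G' test.txt    //move lines 1 to 5 to after line 17 (the file must have a line 17; for the 16-line test.txt shown earlier use 16G or $G instead)
sed '/the/w out.file' test.txt    //save the lines containing "the" to the file out.file
sed '/the/r /etc/hostname' test.txt    //add the contents of /etc/hostname after each line containing "the"
sed '3aNew' test.txt    //insert a new line with the content New after line 3
sed '/the/aNew' test.txt    //insert a new line with the content New after each line containing "the"
sed '3aNew1\nNew2' test.txt    //insert several lines after line 3; the \n in the middle is a line break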
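
The move and save operations can be traced on three lines. In this sketch the input, the scratch directory, and the variable names are assumptions; the script is split over two -e options and uses ";}" so that it also runs on sed implementations stricter than GNU sed.

```shell
# Illustrative sample: only the middle line contains "the".
sample=$(printf '%s\n' 'a' 'the x' 'b')

# H appends the line to the hold space, d removes it from the output,
# and $G appends the hold space after the last line.
moved=$(printf '%s\n' "$sample" | sed -e '/the/{H;d;}' -e '$G')
printf 'moved:\n%s\n' "$moved"

# w writes the matching lines to a file (scratch directory is an assumption).
workdir=$(mktemp -d)
printf '%s\n' "$sample" | sed -n "/the/w $workdir/out.file"
saved=$(cat "$workdir/out.file")
printf 'saved: %s\n' "$saved"
```

The moved text is preceded by an empty line, because H always adds a newline before the line it appends and the hold space starts out empty. If the last line of the input itself matched the pattern, d would end the cycle before $G could run.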

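II. The awk tool

awk is a powerful text-processing tool that is particularly suited to structured data such as logs and CSV files. It is both a command-line tool and a programming language, and it handles data extraction, statistics, and formatted output efficiently. The usual command format is:

awk options 'pattern or condition {editing commands}' file1 file2 ...    //filter the files and print the content that matches
awk -f scriptfile file1 file2 ...    //read the editing commands from a script, then filter and print

The single quotes together with the braces "{ }" define the action applied to the data. awk can process the target files directly, or read a script with "-f" and apply it to the target files.

Special built-in variables (usable directly):

FS: the field separator for each line of text; the default is spaces or tabs.
NF: the number of fields in the line being processed.
NR: the number (ordinal) of the line being processed.
$0: the whole content of the line being processed.
$n: the nth field (nth column) of the line being processed.
FILENAME: the name of the file being processed.
RS: the record separator; the default is \n, so each line is one record.

Compared with sed, awk tends to split a line into several "fields" before processing it, and by default the fields are separated by spaces or tabs. The results can be displayed with print. awk commands can also use the logical operators "&&" for "and", "||" for "or", and "!" for "not". For example:

[root@localhost ~]# awk -F ':' '{print $1,$3,$4}' /etc/passwd
root 0 0
bin 1 1
daemon 2 2
......    //remaining output omitted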
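
The built-in variables can be printed directly to see what awk tracks for each record. The two passwd-style records below are made up for illustration, as are the variable names; $NF (the field whose index is NF) is standard awk for "the last field".

```shell
# Made-up records with 4 and 5 colon-separated fields.
records=$(printf '%s\n' 'root:x:0:0' 'bin:x:1:1:extra')

# Print the line number, the field count, the first field and the last field.
report=$(printf '%s\n' "$records" | awk -F ':' '{print NR, NF, $1, $NF}')
printf '%s\n' "$report"

# Conditions can be combined with && before the action.
picked=$(printf '%s\n' "$records" | awk -F ':' '(NR>=1)&&($3==1){print $1}')
printf 'picked: %s\n' "$picked"
```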

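The processing logic is as follows: awk reads the file one line at a time, splits each line into fields on the ":" separator given with -F, prints fields 1, 3, and 4, and then moves on to the next line until the end of the file.

Usage examples: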

(1) Printing text by line

awk '{print}' test.txt    //print everything, same as cat test.txt
awk '{print $0}' test.txt    //print everything, same as cat test.txt
awk 'NR==1,NR==3{print}' test.txt    //print lines 1 to 3
awk '(NR>=1)&&(NR<=3){print}' test.txt    //print lines 1 to 3
awk 'NR==1||NR==3{print}' test.txt    //print line 1 and line 3
awk '(NR%2)==1{print}' test.txt    //print all odd lines
awk '(NR%2)==0{print}' test.txt    //print all even lines
awk '/^root/{print}' /etc/passwd    //print the lines that start with root

awk 'BEGIN {x=0};/\/bin\/bash$/{x++};END {print x}' /etc/passwd

//count the lines that end with /bin/bash, same as grep -c "/bin/bash$" /etc/passwd
awk 'BEGIN{RS=""};END{print NR}' /etc/squid/squid.conf

//count the paragraphs of text separated by empty lines
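
The line-selection patterns and the paragraph count can be checked on small inputs. Everything in this sketch (the four words, the paragraph text, the variable names) is illustrative data rather than content from the notes.

```shell
lines=$(printf '%s\n' one two three four)

range=$(printf '%s\n' "$lines" | awk 'NR==1,NR==3{print}')     # range pattern: lines 1..3
either=$(printf '%s\n' "$lines" | awk 'NR==1||NR==3{print}')   # line 1 or line 3
even=$(printf '%s\n' "$lines" | awk '(NR%2)==0{print}')        # even line numbers

# With RS="" a record is a paragraph: runs of empty lines separate records.
paragraphs=$(printf 'a\nb\n\nc\n\n\nd\n' | awk 'BEGIN{RS=""} END{print NR}')

printf '%s\n---\n%s\n---\n%s\n---\nparagraphs: %s\n' "$range" "$either" "$even" "$paragraphs"
```

In the END block NR holds the number of records read, which is why it doubles as a paragraph counter once RS is empty.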

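(2) Printing text by field

awk '{print $3}' test.txt    //print the third field of each line (fields separated by spaces or tabs)
awk '{print $1,$3}' test.txt    //print the first and third fields of each line
awk -F ":" '$2==""{print}' /etc/shadow    //print the shadow records of users with an empty password
awk 'BEGIN {FS=":"};$2==""{print}' /etc/shadow    //print the shadow records of users with an empty password
awk -F ":" '$7~"/bash"{print $1}' /etc/passwd    //split on colons and print the first field of the lines whose seventh field contains /bash
awk '($1~"nfs")&&(NF==8){print $1,$2}' /etc/services    //print the first and second fields of the lines that have 8 fields and whose first field contains nfs
awk -F ":" '($7!="/bin/bash")&&($7!="/sbin/nologin"){print}' /etc/passwd    //print all lines whose seventh field is neither /bin/bash nor /sbin/nologin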
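
The field tests can be run without touching the real system files. In this sketch the three passwd-style records are made-up stand-ins for /etc/passwd, and the variable names are illustrative.

```shell
# Made-up records in /etc/passwd format (7 colon-separated fields).
passwd_like=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'bin:x:1:1:bin:/bin:/sbin/nologin' \
  'sync:x:5:0:sync:/sbin:/bin/sync')

# "~" tests whether the field matches the pattern.
bash_users=$(printf '%s\n' "$passwd_like" | awk -F ':' '$7~"/bash"{print $1}')
# "!=" compares the whole field; both conditions must hold.
other_shells=$(printf '%s\n' "$passwd_like" |
  awk -F ':' '($7!="/bin/bash")&&($7!="/sbin/nologin"){print $1}')

printf 'bash: %s\nother: %s\n' "$bash_users" "$other_shells"
```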

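(3) Calling shell commands through pipes and double quotes

awk -F: '/bash$/{print | "wc -l"}' /etc/passwd    //call wc -l to count the users whose shell is bash, same as grep -c "bash$" /etc/passwd
awk 'BEGIN {while ("w" | getline) n++ ; {print n-2}}'    //call the w command and count the logged-in users (the two header lines of w are subtracted)
awk 'BEGIN {"hostname" | getline ; print $0}'    //call hostname and print the current host name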
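
Both directions of the pipe can be tried with deterministic commands. In this sketch echo stands in for w and hostname so that the result does not depend on the machine; the records, the variable names, and the explicit "> 0" test are assumptions added for illustration. Comparing the getline result with 0 ends the loop on errors as well, since getline returns -1 when the command cannot be read.

```shell
# print | "command": awk sends its output into a shell command.
bash_count=$(printf '%s\n' 'root:/bin/bash' 'bin:/sbin/nologin' 'dev:/bin/bash' |
  awk -F: '/bash$/{print | "wc -l"}')
echo "bash users: $bash_count"

# "command" | getline var: awk reads one line of the command output into var.
greeting=$(awk 'BEGIN {"echo hello" | getline line; print line}')
echo "greeting: $greeting"

# Looping over getline counts the lines a command produces.
produced=$(awk 'BEGIN {
  cmd = "echo a; echo b; echo c"
  while ((cmd | getline line) > 0) n++
  close(cmd)
  print n
}')
echo "lines produced: $produced"
```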