Linux Regular Expressions

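Contents

I. Linux Regular Expressions and the Three Musketeers
- 1. What is a regular expression?
- 2. Why learn regular expressions?
- 3. Points about regular expressions that are easily confused
- 4. Notes before learning regular expressions
- 5. Classification of regular expressions
- - 5.1 The basic regular expression (BRE) set
- 6. Regular expression exercises
- 7. Extended regular expressions
- 8. Extended regular expression exercises
- 9. Metacharacters
- 10. Special predefined expressions
II. Sed Command Syntax and Options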

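I. Linux Regular Expressions and the Three Musketeers

1. What is a regular expression?

Put simply, a regular expression is a set of rules and methods defined for processing large amounts of strings and text. As an analogy, suppose "@" stood for "I am " and "!" stood for "buffes"; then echo "@!" would print "I am buffes". With the help of such special symbols, an administrator can quickly filter, replace, or output the strings that are needed, which makes Linux operations work more efficient.

The regular expressions used by the Linux Three Musketeers (grep/egrep, sed, and awk) have the following characteristics:
- They are a set of rules and methods defined for processing large amounts of text and strings.
- They work line by line, that is, one line is processed at a time.
- They reduce complex processing tasks to simple ones and make working on Linux more efficient.
- In everyday operations work they are used through the Three Musketeers commands; most other ordinary commands (ls, cp, and so on) do not interpret them.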

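2. Why learn regular expressions?

In real companies, operations engineers doing Linux work constantly deal with large amounts of string content, such as configuration files, programs, command output, and log files. There is often an urgent need to find specific strings in all that content, and that is the job of regular expressions. One could say regular expressions were born to filter out exactly such strings.
For example, to extract the IP address from the output of ifconfig, the following commands combine sed with a regular expression. Both work on line 2 of the output and replace the whole line with the text captured by the parentheses.

[root@buffes ~]# ifconfig eth0 | sed -rn '2s#^.*addr:(.*)  Bc.*$#\1#gp'    #<== command for CentOS 6.
10.0.0.7
[root@buffes ~]# ifconfig eth0 | sed -rn '2s#^.*inet (.*)  net.*$#\1#gp'    #<== command for CentOS 7.
10.0.0.7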
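
To try the substitution without a live network interface, the sketch below feeds sed a hard-coded sample that imitates CentOS 7 ifconfig output. The sample text and the variable names are assumptions made for illustration, and -E is used as the portable spelling of GNU sed's -r.

```shell
# Sample text imitating CentOS 7 `ifconfig eth0` output (made up for this demo).
sample='eth0: flags=4163  mtu 1500
        inet 10.0.0.7  netmask 255.255.255.0  broadcast 10.0.0.255'

# On line 2 only, replace the whole line with the text captured between
# "inet " and the two spaces before "net". -n plus the p flag prints only
# the line on which the substitution succeeded.
ip=$(printf '%s\n' "$sample" | sed -En '2s#^.*inet (.*)  net.*$#\1#p')
echo "$ip"
```

The greedy (.*) stops where it does only because "  net" occurs once on the line; on differently formatted output the pattern would need adjusting.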

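3. Points about regular expressions that are easily confused

Regular expressions are used very widely and exist in many languages, for example Python, Java, and Perl (PCRE). This article, however, covers the regular expressions used in Linux system operations work, that is, Linux regular expressions, and the commands that apply them are grep (egrep), sed, and awk. In other words, the Three Musketeers cannot work efficiently without regular expressions. Note that other ordinary commands normally cannot use regular expressions.
Regular expressions are fundamentally different from the wildcards and shell special characters covered previously. In Linux, regular expressions are used through the Three Musketeers commands to filter content in files (or data streams). Wildcards, by contrast, are supported by most ordinary commands and are mainly used to match file or directory names; for example, to find files ending in txt you use a pattern such as "*.txt". Keep this distinction in mind.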
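
The difference between wildcards and regular expressions can be seen in a throwaway directory. This is a minimal sketch; the file names and variable names are made up for the demonstration.

```shell
# Work in a temporary directory so the demo touches nothing else.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.txt b.txt notes.log

# Wildcard: the shell expands *.txt to matching file names before echo runs.
globbed=$(echo *.txt)
echo "$globbed"

# Regex: grep filters the content of a stream. Here "\.txt$" means
# "a literal dot followed by txt at the end of the line".
count=$(ls | grep -c '\.txt$')
echo "$count"

# The same characters mean different things in a regex: a leading * is
# literal in a basic regex and the dot matches any character, so "*.txt"
# matches a line containing "*", any one character, then "txt".
odd=$(printf '%s\n' 'a.txt' '*xtxt' | grep '*.txt')
echo "$odd"
```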

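4. Notes before learning regular expressions

Before studying regular expressions, note the following points:
(1) Linux regular expressions are processed line by line.
(2) Regular expressions here apply to the Three Musketeers commands. For convenience, this chapter mostly uses grep and egrep (equivalent to grep -E) for demonstration, and it is recommended to give them an alias, for example:

alias grep='grep --color=auto'
alias egrep='egrep --color=auto' #<== after this, matched text is shown in red; only CentOS 6 needs this configured.

(3) Pay attention to the LC_ALL environment variable. The recommended setting is:

export LC_ALL=C    #<== with this set, unexpected matches caused by locale rules do not occur.

The complete commands to configure this and make it take effect are:

#<== Note: the closing EOF must be on a line of its own, with no spaces or other characters before or after it.
cat >>/etc/profile<<EOF
alias grep='grep --color=auto'
alias egrep='egrep --color=auto'
export LC_ALL=C
EOF
source /etc/profile    #<== make the changes take effect.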
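
If changing /etc/profile for every user is not desirable, the same locale setting can be applied to a single command instead. A minimal sketch with made-up input:

```shell
# LC_ALL=C applies only to this one grep invocation. In the C locale a
# range such as [a-z] covers exactly the 26 lowercase ASCII letters.
lower=$(printf '%s\n' 'A' 'a' 'B' 'b' | LC_ALL=C grep '[a-z]' | tr -d '\n')
echo "$lower"
```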

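5. Classification of regular expressions

The regular expressions of the Linux Three Musketeers fall into two classes:
1) Basic regular expressions (BRE, basic regular expression)
    The metacharacters of BRE are "^$.[]*".
2) Extended regular expressions (ERE, extended regular expression)
    ERE adds the characters "(){}?+|" on top of BRE.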

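5.1 The basic regular expression (BRE) set

Character	Meaning
^	Caret. Usage: ^koboid matches lines that begin with koboid
$	Dollar sign. Usage: koboid$ matches lines that end with koboid
^$	Combination: matches empty lines, that is, lines where the start is immediately followed by the end
.	Dot: matches exactly one arbitrary character (so it cannot match an empty line)
\	Escape character: strips the special meaning from a metacharacter so it stands for itself, e.g. \. means a literal dot only
*	Repeats the previous character (consecutively) 0 or more times
.*	Combination: matches anything, including nothing
^.*	Combination: matches content starting with any number of characters
.*$	Combination: matches content ending with any number of characters
[abc]	Matches any one character in the [] set: a, b, or c; [abc] can also be written as [a-c]
[^abc]	Matches any one character other than a, b, or c; the ^ here negates [abc], and ^ cannot be replaced by !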

Test experiments:

^ Caret. Usage: ^koboid matches lines that begin with koboid

[root@buffes test]# grep ^I buffes.txt    # print lines that begin with I
I am buffes teacher!
I teach linux.
I like badminton ball ,billiard ball and chinese chess!
[root@buffes test]# ls -l ~|grep ^d    # print lines that begin with d
drwxr-xr-x. 2 1000 root       33 Jul  1  2030 abc
drwxr-xr-x. 2 root root       45 Jun  6  2019 girlLove
drwxr-xr-x. 3 root root       17 May  8  2021 buffes_dir
drwxr-xr-x. 2 root root       24 May 25 11:24 test

$ Dollar sign. Usage: buffes$ matches lines that end with buffes

[root@buffes test]# grep n$ buffes.txt    # print lines that end with n
our site is http://www.buffes.cn
[root@buffes test]# ls -lF ~|grep /$
drwxr-xr-x. 2 1000 root       33 Jul  1  2030 abc/
drwxr-xr-x. 2 root root       45 Jun  6  2019 girlLove/
drwxr-xr-x. 3 root root       17 May  8  2021 buffes_dir/
drwxr-xr-x. 2 root root       24 May 25 11:24 test/

^$ Combination: matches empty lines, that is, lines where the start is immediately followed by the end

[root@buffes test]# cat buffes.txt -n
     1	I am buffes teacher!
     2	I teach linux.
     3	
     4	I like badminton ball ,billiard ball and chinese chess!
     5	our site is http://www.buffes.cn
     6	my qq num is 1234567.
     7	
     8	not 1234567.
     9	my god ,i am not buffes,but buffes!
[root@buffes test]# grep ^$ buffes.txt 


[root@buffes test]# grep -n ^$ buffes.txt    # print empty lines with their line numbers
3:
7:
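
A common practical use of ^$ is counting or dropping empty lines. The sketch below does both on made-up input; the variable names are illustrative only.

```shell
# Five lines, two of them empty.
data=$(printf '%s\n' 'first' '' 'second' '' 'third')

# -c counts the lines that match ^$, i.e. the empty ones.
blank=$(printf '%s\n' "$data" | grep -c '^$')

# -v inverts the match, which removes the empty lines.
kept=$(printf '%s\n' "$data" | grep -v '^$' | tr '\n' ',')
echo "$blank $kept"
```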

. Dot: matches exactly one arbitrary character (so it cannot match an empty line)

[root@buffes test]# grep . buffes.txt 
I am buffes teacher!
I teach linux.
I like badminton ball ,billiard ball and chinese chess!
our site is http://www.buffes.cn
my qq num is 1234567.
not 1234567.
my god ,i am not buffes,but buffes!
[root@buffes test]# grep -n . buffes.txt 
1:I am buffes teacher!
2:I teach linux.
4:I like badminton ball ,billiard ball and chinese chess!
5:our site is http://www.buffes.cn
6:my qq num is 1234567.
8:not 1234567.
9:my god ,i am not buffes,but buffes!

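\ Escape character: strips the special meaning from a metacharacter so it stands for itself, e.g. \. means a literal dot only

[root@buffes test]# grep "\." buffes.txt    # match lines that contain a dot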
I teach linux.
our site is http://www.buffes.cn
my qq num is 1234567.
not 1234567.
[root@buffes test]# grep "\.$" buffes.txt    # match lines that end with a dot
I teach linux.
my qq num is 1234567.
not 1234567.
[root@buffes test]# grep ".$" buffes.txt    # match lines that end with any single character
I am buffes teacher!
I teach linux.
I like badminton ball ,billiard ball and chinese chess!
our site is http://www.buffes.cn
my qq num is 1234567.
not 1234567.
my god ,i am not buffes,but buffes!

* Repeats the previous character (consecutively) 0 or more times

0* matches:
(empty)
0
00
00000

[root@buffes test]# grep "0*" buffes.txt 
I am buffes teacher!
I teach linux.

I like badminton ball ,billiard ball and chinese chess!
our site is http://www.buffes.cn
my qq num is 1234567.

not 1234567.
my god ,i am not buffes,but buffes!
[root@buffes test]# grep "00*" buffes.txt 
my qq num is 1234567.
not 1234567.

Note: when the character is repeated 0 times, the pattern matches nothing at all (the empty string), so it matches every line.
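
The difference between "0*" and "00*" is easiest to see on a tiny input. This is a sketch on made-up data, with illustrative variable names.

```shell
# Three made-up lines: no zero, one zero, several zeros.
data=$(printf '%s\n' 'abc' 'a0c' 'a000c')

# "0*" also matches the empty string, so every line matches.
all=$(printf '%s\n' "$data" | grep -c '0*')

# "00*" needs one literal 0 followed by zero or more 0s.
some=$(printf '%s\n' "$data" | grep -c '00*')

# -o prints only the matched text, which shows how far * extends.
runs=$(printf '%s\n' "$data" | grep -o '00*' | tr '\n' ',')
echo "$all $some $runs"
```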

.* Combination: matches anything

[root@buffes test]# grep ".*" buffes.txt 
I am buffes teacher!
I teach linux.

I like badminton ball ,billiard ball and chinese chess!
our site is http://www.buffes.cn
my qq num is 1234567.

not 1234567.
my god ,i am not buffes,but buffes!

^.*	Combination: matches content starting with any number of characters
.*$	Combination: matches content ending with any number of characters

[abc] Matches any one character in the [] set: a, b, or c; [abc] can also be written as [a-c]

[root@buffes test]# grep "[a-z0-9A-Z\.\!:,/]" buffes.txt 
I am buffes teacher!
I teach linux.
I like badminton ball ,billiard ball and chinese chess!
our site is http://www.buffes.cn
my qq num is 1234567.
not 1234567.
my god ,i am not buffes,but buffes!
[root@buffes test]# grep "." buffes.txt 
I am buffes teacher!
I teach linux.
I like badminton ball ,billiard ball and chinese chess!
our site is http://www.buffes.cn
my qq num is 1234567.
not 1234567.
my god ,i am not buffes,but buffes!

[^abc] Matches any one character other than a, b, or c; the ^ here negates [abc], and ^ cannot be replaced by !

[root@buffes test]# grep "[a-z0-9A-Z]" buffes.txt 
I am buffes teacher!
I teach linux.
I like badminton ball ,billiard ball and chinese chess!
our site is http://www.buffes.cn
my qq num is 1234567.
not 1234567.
my god ,i am not buffes,but buffes!
[root@buffes test]# grep "[^a-z0-9A-Z]" buffes.txt 
I am buffes teacher!
I teach linux.
I like badminton ball ,billiard ball and chinese chess!
our site is http://www.buffes.cn
my qq num is 1234567.
not 1234567.
my god ,i am not buffes,but buffes!
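
Both commands above print every non-empty line because a negated set matches a single character, not a whole line: a line is printed as soon as it contains one character outside the set, such as a space. The sketch below, on made-up data, contrasts this with selecting lines made up only of letters and digits.

```shell
data=$(printf '%s\n' 'abc123' 'hello world' '42')

# [^a-z0-9A-Z] matches one character that is NOT a letter or digit, so only
# the line containing a space is printed.
has_other=$(printf '%s\n' "$data" | grep '[^a-z0-9A-Z]')

# To keep lines that consist ONLY of letters and digits, invert the match.
only_alnum=$(printf '%s\n' "$data" | grep -v '[^a-z0-9A-Z]' | tr '\n' ',')
echo "$has_other"
echo "$only_alnum"
```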

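6. Regular expression exercises

1. Filter the lines in /etc/passwd that end with nologin.
grep "nologin$" /etc/passwd
2. Filter the lines in /etc/passwd that begin with o.
grep "^o" /etc/passwd
3. Filter the lines in /etc/passwd that contain at least one 0.
grep "00*" /etc/passwd
4. Filter the empty lines in /etc/passwd.
grep "^$" /etc/passwd
5. Filter all regular files directly under /etc/ (not including subdirectories).
ls -l /etc|grep "^-"
6. Filter the lines in /etc/services that contain a dot.
grep "\." /etc/services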

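7. Extended regular expressions

Character	Meaning
+	Matches the previous character 1 or more times
?	Matches the previous character 0 or 1 times
|	Alternation ("or"), that is, filters for several strings at once
()	Grouping: the enclosed content is treated as a single unit; the content of () can also be referenced later with \n, where n is a number indicating which pair of parentheses to reference
\n	References the content of an earlier () group, e.g. (aa)\1 matches aaaa
a{n,m}	Matches the previous character at least n and at most m times
a{n,}	Matches the previous character at least n times
a{n}	Matches the previous character exactly n times
a{,m}	Matches the previous character at most m times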

+ Matches the previous character 1 or more times

The difference from * is that * can also match 0 times.
grep "0*" buffes.txt     # matches zero 0s, one 0, or several 0s
egrep "0+" buffes.txt    # matches one 0 or several 0s

[root@buffes test]# grep "0*" buffes.txt 
I am buffes teacher!
I teach linux.

I like badminton ball ,billiard ball and chinese chess!
our site is http://www.buffes.cn
my qq num is 1234567.

not 1234567.
my god ,i am not buffes,but buffes!
[root@buffes test]# egrep "0+" buffes.txt
my qq num is 1234567.
not 1234567.

[:/]+ Matches the characters : or / inside the brackets 1 or more times

[root@buffes test]# cat a.txt 
buffes
[root@buffes test]# egrep -o "." a.txt 
b
u
f
f
e
s
[root@buffes test]# egrep "[:/]+" buffes.txt 
our site is http://www.buffes.cn
:::
///
:d::f
/etc/buffes//
[root@buffes test]# egrep "[:/]" buffes.txt 
our site is http://www.buffes.cn
:::
///
:d::f
/etc/buffes//

? Matches the previous character 0 or 1 times

[root@buffes test]# egrep "0?" buffes.txt 
I am buffes teacher!
I teach linux.

I like badminton ball ,billiard ball and chinese chess!
our site is http://www.buffes.cn
my qq num is 1234567.

not 1234567.
my god ,i am not buffes,but buffes!
:::
///
:d::f
/etc/buffes//

| Alternation ("or"), that is, filters for several strings at once

[root@buffes test]# egrep "000|buffes" buffes.txt 
I am buffes teacher!
our site is http://www.buffes.cn
my qq num is 1234567.
not 1234567.
/etc/buffes//
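
The alternation operator only works in extended mode. A minimal sketch on made-up passwd-style lines shows what happens when -E is left out:

```shell
data=$(printf '%s\n' 'root:x:0:0' 'daemon:x:2:2' 'buffes:x:1000:1000')

# ERE (grep -E or egrep): | is the alternation operator.
ere=$(printf '%s\n' "$data" | grep -E -c 'root|buffes')

# BRE (plain grep): an unescaped | is an ordinary character, so this looks
# for the literal text "root|buffes" and finds nothing. grep exits non-zero
# when nothing matches, hence the "|| true".
bre=$(printf '%s\n' "$data" | grep -c 'root|buffes' || true)
echo "$ere $bre"
```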

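() Grouping: the enclosed content is treated as a single unit.
The content of () can also be referenced later with \n, where n is a number indicating which pair of parentheses to reference.

\n	References the content of an earlier () group, e.g. (aa)\1 matches aaaa

[root@buffes test]# egrep "(0)(0)\1\2" buffes.txt 
not 1234567.

\1 retrieves the content of the first pair of parentheses.
\2 retrieves the content of the second pair of parentheses.
And so on for \3 and above.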
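
A small sketch of back-references in both grep and sed, using made-up input. The field-swapping substitution is just one illustrative use, not the only one.

```shell
# (ab)\1 means "ab, then the same text again": it matches abab but not abcd.
rep=$(printf '%s\n' 'abab' 'abcd' | grep -E '(ab)\1')

# In sed, \1 and \2 can be reused in the replacement; here two
# colon-separated fields are swapped.
swapped=$(echo 'first:second' | sed -E 's#^([^:]*):([^:]*)$#\2:\1#')
echo "$rep $swapped"
```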
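Back-references are mainly used with the sed command (the following few are rarely needed):
a*
a+
a{n,m}	Matches the previous character at least n and at most m times
a{n,}	Matches the previous character at least n times
a{n}	Matches the previous character exactly n times
a{,m}	Matches the previous character at most m times

Practice: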
[root@buffes test]# egrep "0{3,4}" buffes.txt 
my qq num is 1234567.
not 1234567.
[root@buffes test]# egrep "0{3,}" buffes.txt 
my qq num is 1234567.
not 1234567.
[root@buffes test]# egrep "0{3}" buffes.txt 
my qq num is 1234567.
not 1234567.
[root@buffes test]# egrep "0{,3}" buffes.txt 
I am buffes teacher!
I teach linux.

I like badminton ball ,billiard ball and chinese chess!
our site is http://www.buffes.cn
my qq num is 1234567.

not 1234567.
my god ,i am not buffes,but buffes!
:::
///
:d::f
/etc/buffes//
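
Interval expressions are easier to follow when the repeated character is anchored between two other characters. The following sketch uses made-up data to show exact counts, ranges, and why an unanchored 0{3} also matches longer runs.

```shell
data=$(printf '%s\n' 'a0b' 'a00b' 'a000b' 'a00000b')

# Exactly three zeros between a and b.
exact=$(printf '%s\n' "$data" | grep -E 'a0{3}b')

# Two to three zeros between a and b.
range=$(printf '%s\n' "$data" | grep -E -c 'a0{2,3}b')

# Without the surrounding a...b, 0{3} also matches inside a longer run.
loose=$(printf '%s\n' "$data" | grep -E -c '0{3}')
echo "$exact $range $loose"
```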

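8. Extended regular expression exercises

Exercises:
1. Filter the lines in /etc/passwd that contain root or buffes.
grep -E "root|buffes" /etc/passwd
2. Filter the lines in /etc/passwd that contain at least one 0.
egrep "0+" /etc/passwd
3. Filter the lines in /etc/passwd that match the character o 0 or 1 times.
egrep "o?" /etc/passwd
4. Filter the lines in /etc/passwd that match the character 0 one to three times.
egrep "0{1,3}" /etc/passwd
5. Filter the lines in /etc/shadow that contain one or more consecutive colons or slashes.
egrep "[:/]+" /etc/shadow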

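9. Metacharacters

Expression	Description
\b	Matches a word boundary, e.g. \bbuffes\b matches only the word buffes, not buffes followed by other word characters
\B	Matches a non-word boundary, e.g. buffes\B matches the buffes in buffes123 but not the stand-alone word buffes
\w	Matches a letter, digit, or underscore; equivalent to [_[:alnum:]]
\W	Matches any character other than a letter, digit, or underscore; equivalent to [^_[:alnum:]]
\d	Matches a single digit; note that this expression is only recognized with grep -P
\D	Matches a single non-digit; note that this expression is only recognized with grep -P
\s	Matches one whitespace character; works with grep -P, and GNU grep also accepts it without -P as a synonym for [[:space:]]
\S	Matches one non-whitespace character; works with grep -P, and GNU grep also accepts it without -P as a synonym for [^[:space:]]

Tests: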

[root@buffes test]# egrep "buffes\b" /etc/passwd
buffes:x:5023:5023::/home/buffes:/bin/bash
[root@buffes test]# 
[root@buffes test]# egrep "\bbuffes\b" /etc/passwd
buffes:x:5023:5023::/home/buffes:/bin/bash
[root@buffes test]# egrep -w "buffes" /etc/passwd
buffes:x:5023:5023::/home/buffes:/bin/bash
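
On a real /etc/passwd all three commands may print the same line, which hides what \b contributes. The sketch below uses made-up account names in which buffes also appears inside longer words.

```shell
data=$(printf '%s\n' 'buffes:x:1000' 'buffes123:x:1001' 'mybuffes:x:1002')

# Plain substring search: all three lines contain "buffes".
plain=$(printf '%s\n' "$data" | grep -c 'buffes')

# \b on both sides (a GNU extension) requires a word boundary, so only the
# stand-alone word matches; -w is the option form of the same idea.
bounded=$(printf '%s\n' "$data" | grep -E '\bbuffes\b')
whole=$(printf '%s\n' "$data" | grep -w 'buffes')
echo "$plain"
echo "$bounded"
echo "$whole"
```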

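10. Special predefined expressions

Regular expression	Description	Example
[:alnum:]	Matches any letter or digit; equivalent to [a-zA-Z0-9]	[[:alnum:]]
[:alpha:]	Matches any upper- or lower-case letter; equivalent to [a-zA-Z]	[[:alpha:]]
[:blank:]	Space and horizontal tab	[[:blank:]]
[:digit:]	Matches any digit; equivalent to [0-9]	[[:digit:]]
[:lower:]	Matches lower-case letters; equivalent to [a-z]	[[:lower:]]
[:upper:]	Matches upper-case letters; equivalent to [A-Z]	[[:upper:]]
[:punct:]	Matches punctuation characters	[[:punct:]]
[:space:]	Matches any whitespace character, including newline, carriage return, and so on	[[:space:]]
[:graph:]	Matches any visible printable character (space excluded)	[[:graph:]]
[:xdigit:]	Any hexadecimal digit (0-9, a-f, A-F)	[[:xdigit:]]
[:cntrl:]	Any control character (the first 32 characters of the ASCII set, plus DEL)	[[:cntrl:]]
[:print:]	Any printable character (space included)	[[:print:]]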

[root@buffes test]# egrep "[[:alnum:]]" buffes.txt 
I am buffes teacher!
I teach linux.
I like badminton ball ,billiard ball and chinese chess!
our site is http://www.buffes.cn
my qq num is 1234567.
not 1234567.
my god ,i am not buffes,but buffes!
:d::f
/etc/buffes//
[root@buffes test]# egrep "[[:alpha:]]" buffes.txt 
I am buffes teacher!
I teach linux.
I like badminton ball ,billiard ball and chinese chess!
our site is http://www.buffes.cn
my qq num is 1234567.
not 1234567.
my god ,i am not buffes,but buffes!
:d::f
/etc/buffes//
[root@buffes test]# egrep "[:digit:]" buffes.txt 
grep: character class syntax is [[:space:]], not [:space:]
[root@buffes test]# egrep "[[:digit:]]" buffes.txt 
my qq num is 1234567.
not 1234567.
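
Combined with -o, character classes can pull specific pieces out of a line rather than just selecting the line. A minimal sketch on one line from the test file:

```shell
line='my qq num is 1234567.'

# [[:digit:]]+ : one or more digits; -o prints just the matched part.
digits=$(echo "$line" | grep -oE '[[:digit:]]+')

# [[:punct:]] : punctuation characters; here only the trailing dot.
punct=$(echo "$line" | grep -oE '[[:punct:]]')
echo "$digits $punct"
```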
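grep can also print the lines around a match:
-A after: show the matching line and the given number of lines after it
-B before: show the matching line and the given number of lines before it
-C context: show the matching line and the given number of lines both before and after it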
[root@buffes test]# grep -A 3 5 b.txt 
5
6
7
8
[root@buffes test]# grep -B 3 5 b.txt 
2
3
4
5
[root@buffes test]# grep -C 3 5 b.txt 
2
3
4
5
6
7
8
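
The same behaviour can be reproduced without creating b.txt. In this sketch, seq stands in for the file, which is assumed to hold one number per line.

```shell
# seq 1 9 prints the numbers 1 to 9, one per line (stopping at 9 so that
# only one line contains a 5).
after=$(seq 1 9 | grep -A 2 '5' | tr '\n' ',')
before=$(seq 1 9 | grep -B 2 '5' | tr '\n' ',')
around=$(seq 1 9 | grep -C 1 '5' | tr '\n' ',')
echo "$after $before $around"
```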

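II. Sed Command Syntax and Options

[Description]
Sed is short for Stream Editor.
Sed is a powerful tool for manipulating, filtering, and transforming text. Its common uses are quickly adding, deleting, modifying, and querying file content; among the query functions, the two most used are filtering (selecting lines that contain a given string) and line extraction (printing specified lines).

Extracting lines and substitution

[Syntax]
sed [options] [sed built-in command] [input file]

Options (those marked ※ are important)
-n Suppress sed's default output; often combined with the built-in command p to print only the desired content ※
-i Modify the file directly instead of printing to the terminal.
-e Allow multiple edits

p Short for print: prints the matching lines; p is usually used together with the -n option ※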
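
Why p is normally paired with -n becomes clear when the two forms are run side by side. A minimal sketch using seq as made-up input:

```shell
# Without -n, sed prints every line once by default and p prints the
# selected line a second time.
dup=$(seq 1 3 | sed '2p' | tr '\n' ',')

# With -n the default output is suppressed, so only the p output remains.
one=$(seq 1 3 | sed -n '2p' | tr '\n' ',')
echo "$dup $one"
```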

[root@buffes test]# cat -n buffes.txt 
     1	I am buffes teacher!
     2	I teach linux.
     3	
     4	I like badminton ball ,billiard ball and chinese chess!
     5	our site is http://www.buffes.cn
     6	my qq num is 1234567.
     7	
     8	not 1234567.
     9	my god ,i am not buffes,but buffes!
    10	:::
    11	///
    12	:d::f
    13	/etc/buffes//

sed exercises

Question 1: print lines 2 to 4 of buffes.txt.
sed -n '2,4p' buffes.txt
[root@buffes test]# sed -n '2,4p' buffes.txt
I teach linux.

I like badminton ball ,billiard ball and chinese chess!
Print line 4:
[root@buffes test]# sed -n '4p' buffes.txt
I like badminton ball ,billiard ball and chinese chess!

Question 2: filter out the lines that contain the string buffes ※ (in a written test, aim for at least 5 answers).
sed -n '//p'    # skeleton: put the pattern between the slashes
Method 1:
[root@buffes test]# grep buffes buffes.txt 
I am buffes teacher!
our site is http://www.buffes.cn
/etc/buffes//
Method 2:
[root@buffes test]# sed -n '/buffes/p' buffes.txt 
I am buffes teacher!
our site is http://www.buffes.cn
/etc/buffes//
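
The two addressing styles used in the exercises, by line number and by pattern, can be tried on any input. A minimal sketch with made-up lines:

```shell
data=$(printf '%s\n' 'alpha' 'beta' 'gamma' 'delta')

# Address by line range: print lines 2 through 3.
range=$(printf '%s\n' "$data" | sed -n '2,3p' | tr '\n' ',')

# Address by pattern: print lines containing "lt", as grep would.
by_sed=$(printf '%s\n' "$data" | sed -n '/lt/p')
by_grep=$(printf '%s\n' "$data" | grep 'lt')
echo "$range $by_sed $by_grep"
```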