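Regular expressions
A special sequence of characters used to match, search, and replace text
Ordinary characters + special characters + quantifiers; the ordinary characters mark the boundaries of the match
Approaches to modifying a string
String functions > regular expressions > for loop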
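The ordering above can be sketched with one small replacement done three ways. This assumes the `>` signs mean order of preference (try the simplest tool first); the sample text and variable names are illustrative only.

```python
import re

text = "Beautiful is better than ugly"

# 1. String method: simplest when the target is a fixed substring.
by_method = text.replace("ugly", "pretty")

# 2. Regular expression: worth it once the target is a pattern, not a literal.
by_regex = re.sub(r"ugly", "pretty", text)

# 3. Manual loop: the most code, kept as a last resort.
words = []
for word in text.split(" "):
    words.append("pretty" if word == "ugly" else word)
by_loop = " ".join(words)

print(by_method)
print(by_regex)
print(by_loop)
```

All three print the same sentence; the difference is only in how much code each approach needs.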
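Metacharacters: each matches a single character
# An uppercase metacharacter is generally the negation of its lowercase form
1. Digits 0-9: \d  negation: \D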
import re
example_str = "Beautiful is better than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r"\d", example_str))
print(re.findall(r"\D", example_str))
2. Letters, digits, underscore: \w  negation: \W
import re
example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r'\w', example_str))
print(re.findall(r'\W', example_str))
3. Whitespace characters (space, \t, \r, \n): \s  negation: \S
import re
example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r'\s', example_str))
print(re.findall(r'\S', example_str))
4. Any one character from a set: [], e.g. [0-9], [a-z], [A-Z]  negation: [^...]
import re
example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r'[0-9]', example_str))
print(re.findall(r'[^0-9]', example_str))
5. Any character except \n: .
import re
example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r".", example_str))
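A shorter string makes the "except \n" part easier to see. The sketch below also uses the `re.DOTALL` flag, which is not covered in these notes, to show the contrast; the sample text is illustrative.

```python
import re

text = "a\nb"

# By default, "." skips the newline.
without_newline = re.findall(r".", text)

# With re.DOTALL, "." matches the newline as well.
with_newline = re.findall(r".", text, re.DOTALL)

print(without_newline)  # ['a', 'b']
print(with_newline)     # ['a', '\n', 'b']
```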
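Quantifiers: specify how many times the preceding character occurs
1. Greedy and non-greedy
a. Matching is greedy by default: it consumes as much as possible and stops only when a character no longer satisfies the pattern (the longest possible match)
b. Non-greedy matching: add ? after the quantifier to get the shortest possible match
c. Mixing up greedy and non-greedy matching is a major source of bugs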
import re
example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r'.*u', example_str))
print(re.findall(r'.*?u', example_str))
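The difference is easier to follow on a short string. A minimal sketch, with an illustrative sample text:

```python
import re

text = "ugly but useful"

# Greedy: ".*" runs to the end, then backs up to the last "u".
greedy = re.findall(r".*u", text)

# Non-greedy: ".*?" stops at the first "u" it can reach, so there are several short matches.
lazy = re.findall(r".*?u", text)

print(greedy)  # ['ugly but usefu']
print(lazy)    # ['u', 'gly bu', 't u', 'sefu']
```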
2. Repeat a specified number of times: {n}  {n,m} (no space after the comma)
import re
example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r'\d{3}', example_str))
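A small sketch of both forms on the digits from the example string. It also shows what appears to happen in Python's `re` when a space is written inside the braces: the braces stop being a quantifier and are matched as literal text.

```python
import re

text = "ugly 78966828 $"

# Exactly three digits per match; the leftover "28" is too short.
exactly_three = re.findall(r"\d{3}", text)

# Two to three digits per match; greedy, so it takes three when it can.
two_to_three = re.findall(r"\d{2,3}", text)

# With a space after the comma, "{2, 3}" is treated as literal characters.
with_space = re.findall(r"\d{2, 3}", text)

print(exactly_three)  # ['789', '668']
print(two_to_three)   # ['789', '668', '28']
print(with_space)     # []
```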
3. Zero or more times: *
import re
example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r'.*', example_str))
4. One or more times: +
import re
example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r'\d+', example_str))
5. Zero or one time: ?  Typical use: deduplication
import re
example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r'7896?', example_str))
Boundary matching
1. Match from the start of the string: ^
2. Match at the end of the string: $
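A minimal sketch of both anchors, including the common trick of combining them to validate a whole string. The sample strings and the 4-to-8-digit rule are made up for illustration.

```python
import re

text = "Beautiful is better than ugly"

# ^ only matches at the start of the string.
starts_ok = re.findall(r"^Beautiful", text)
starts_fail = re.findall(r"^ugly", text)

# $ only matches at the end of the string.
ends_ok = re.findall(r"ugly$", text)

# ^ and $ together: the entire string must be 4 to 8 digits.
valid = re.findall(r"^\d{4,8}$", "78966828")
too_long = re.findall(r"^\d{4,8}$", "789668280")

print(starts_ok, starts_fail, ends_ok)  # ['Beautiful'] [] ['ugly']
print(valid, too_long)                  # ['78966828'] []
```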
Alternation (OR) in regular expressions: |
Matches the regular expression on either the left or the right side of |
import re
example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r'\d+|\w+', example_str))
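Groups
(): the regular expression inside the parentheses is treated as a single unit, and findall() returns the text matched inside the (). There can be several groups; they combine as AND, so every group must match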
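A short sketch of the three points about groups: a group as one unit under a quantifier, `findall()` returning only the grouped text, and several groups all having to match. The sample strings are illustrative.

```python
import re

text = "Beautiful is better than ugly"

# One group: findall() returns only what the group matched.
between = re.findall(r"Beautiful(.*)ugly", text)

# Two groups: findall() returns one tuple per match.
pairs = re.findall(r"(\w+) is (\w+)", text)

# AND relationship: if any group cannot match, nothing is returned.
no_match = re.findall(r"(\d+) is (\w+)", text)

# The group is a single unit, so {2} repeats "ab" as a whole.
repeated = re.search(r"(ab){2}", "ababab").group()

print(between)   # [' is better than ']
print(pairs)     # [('Beautiful', 'better')]
print(no_match)  # []
print(repeated)  # abab
```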
Python regular expression module: re
1. Find the substrings that match a regular expression: findall()
import re
name = "Hello Python 3.7, 123456789"
total = re.findall(r"\d+", name)
print(total)
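2. Replace the substrings matched by a regular expression: sub()
import re
name = "Hello Python 3.7, 123456789"
def replace(value):
    # value is a match object; return the matched digit plus one, as a string
    return str(int(value.group()) + 1)
# count=0 means every match is replaced
result_str = re.sub(r"\d", replace, name, count=0)
print(result_str)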
Match a single Chinese character: [\u4E00-\u9FA5]
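A minimal sketch of the range in use, with an illustrative sample string. Adding `+` collects consecutive Chinese characters into one match.

```python
import re

text = "Hello 中文 123"

# One match per Chinese character in the range.
single_chars = re.findall(r"[\u4E00-\u9FA5]", text)

# With +, adjacent Chinese characters are returned as one run.
runs = re.findall(r"[\u4E00-\u9FA5]+", text)

print(single_chars)  # ['中', '文']
print(runs)          # ['中文']
```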