A quick and simple guide to JavaScript Regular Expressions

Interested in learning JavaScript? Get my ebook at jshandbook.com

Introduction to Regular Expressions

A regular expression (also called regex for short) is a fast way to work with strings of text.

By formulating a regular expression with a special syntax, you can:

search for text in a string
replace substrings in a string
extract information from a string

Almost every programming language features some implementation of regular expressions. There are small differences between implementations, but the general concepts apply almost everywhere.

Regular expressions date back to the 1950s, when they were formalized as a conceptual search pattern for string processing algorithms.

Implemented in UNIX tools like grep and sed, and in popular text editors, regexes grew in popularity. They were introduced into the Perl programming language, and later into many others as well.

JavaScript, along with Perl, is one of the programming languages with support for regular expressions built directly into the language.

Hard but useful

Regular expressions can seem like absolute nonsense to the beginner, and often to the professional developer as well, if you don’t invest the time necessary to understand them.

Cryptic regular expressions are hard to write, hard to read, and hard to maintain or modify.

But sometimes a regular expression is the only sane way to perform some string manipulation, so it’s a very valuable tool to have in your pocket.

This tutorial aims to introduce you to JavaScript regular expressions in a simple way, and to give you the information you need to read and create them.

The rule of thumb is that simple regular expressions are simple to read and write, while complex regular expressions can quickly turn into a mess if you don’t have a firm grasp of the basics.

What does a Regular Expression look like?

In JavaScript, a regular expression is an object, which can be defined in two ways.

The first is by instantiating a new RegExp object using the constructor:

const re1 = new RegExp('hey')

The second is using the regular expression literal form:

const re1 = /hey/

You know that JavaScript has object literals and array literals? It also has regex literals.

In the example above, hey is called the pattern. In the literal form it’s delimited by forward slashes, while with the object constructor it’s passed as a plain string, with no delimiters.

This is the first important difference between the two forms, but we’ll see others later.
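One practical consequence of the constructor taking a string is that backslashes have to be doubled. The sketch below compares the two forms; the variable names (`literal`, `constructed`, `word`, `dynamic`) are made up for illustration.

```javascript
// Literal form: backslashes are written once.
const literal = /\d+/

// Constructor form: the pattern is a string, so each backslash
// must be doubled to survive string escaping.
const constructed = new RegExp('\\d+')

// Both objects describe the same pattern.
const sameSource = literal.source === constructed.source
console.log(sameSource) // true

// The constructor is handy when part of the pattern is only known at runtime.
// `word` is a hypothetical input, assumed to contain no special characters.
const word = 'hey'
const dynamic = new RegExp('^' + word + '$')
console.log(dynamic.test('hey')) // true
console.log(dynamic.test('hey you')) // false
```

For a fixed pattern the literal form is usually easier to read; the constructor earns its keep when the pattern is assembled from variables.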

How does it work?

The regular expression we defined as re1 above is a very simple one. It searches for the string hey, without any limitation. The string can contain lots of text with hey somewhere in the middle, and the regex is satisfied. It could also contain just hey, and the regex would be satisfied as well.

That’s pretty simple.

You can test the regex using RegExp.test(String), which returns a boolean:

re1.test('hey') // true
re1.test('blablabla hey blablabla') // true
re1.test('he') // false
re1.test('blablabla') // false

In the above example, we just checked whether each string satisfies the regular expression pattern stored in re1.

This is the simplest it can be, but you already know a lot of the concepts behind regexes.

Anchoring

/hey/

matches hey wherever it appears inside the string.

If you want to match strings that start with hey, use the ^ operator:

/^hey/.test('hey') // true
/^hey/.test('bla hey') // false

If you want to match strings that end with hey, use the $ operator:

/hey$/.test('hey') // true
/hey$/.test('bla hey') // true
/hey$/.test('hey you') // false

Combine those to match strings that are exactly hey, and nothing else:

/^hey$/.test('hey') // true

To match a string that starts with one substring and ends with another, you can use .*, which matches any character repeated 0 or more times:

/^hey.*joe$/.test('hey joe') // true
/^hey.*joe$/.test('heyjoe') // true
/^hey.*joe$/.test('hey how are you joe') // true
/^hey.*joe$/.test('hey joe!') // false

Match items in ranges

Instead of matching a particular string, you can choose to match any character in a range, like:

/[a-z]/ // a, b, c, ... , x, y, z
/[A-Z]/ // A, B, C, ... , X, Y, Z
/[a-c]/ // a, b, c
/[0-9]/ // 0, 1, 2, 3, ... , 8, 9

These regexes match strings that contain at least one of the characters in those ranges:

/[a-z]/.test('a') // true
/[a-z]/.test('1') // false
/[a-z]/.test('A') // false
/[a-c]/.test('d') // false
/[a-c]/.test('dc') // true

Ranges can be combined:

/[A-Za-z0-9]/

/[A-Za-z0-9]/.test('a') // true
/[A-Za-z0-9]/.test('1') // true
/[A-Za-z0-9]/.test('A') // true

Matching a range item multiple times

You can check whether a string contains one and only one character in a range by anchoring the range with ^ and $:

/^[A-Za-z0-9]$/

/^[A-Za-z0-9]$/.test('A') // true
/^[A-Za-z0-9]$/.test('Ab') // false

Negating a pattern

The ^ character at the beginning of a pattern anchors it to the beginning of a string.

Used as the first character inside a range, it negates it, so:

/[^A-Za-z0-9]/.test('a') // false
/[^A-Za-z0-9]/.test('1') // false
/[^A-Za-z0-9]/.test('A') // false
/[^A-Za-z0-9]/.test('@') // true

Regular expressions also offer shorthand metacharacters for common character classes:

\d matches any digit, equivalent to [0-9]
\D matches any character that’s not a digit, equivalent to [^0-9]
\w matches any alphanumeric character or underscore, equivalent to [A-Za-z0-9_]
\W matches any character that \w does not, equivalent to [^A-Za-z0-9_]
\s matches any whitespace character: spaces, tabs, newlines and Unicode spaces
\S matches any character that’s not whitespace
\0 matches the NUL character (U+0000)
\n matches a newline character
\t matches a tab character
\uXXXX matches the Unicode character with the four-digit hexadecimal code XXXX (the \u{XXXXX} form, needed for code points above FFFF, requires the u flag)
. matches any character that is not a newline char (e.g. \n) (unless you use the s flag, explained later on)
[^] matches any character, including newline characters. It’s useful on multiline strings.
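A few of these shorthands have edges that are easy to miss, so here is a small sketch that probes them. The variable names are illustrative only.

```javascript
// \w includes the underscore, so it is slightly wider than [A-Za-z0-9].
const underscoreIsWord = /^\w$/.test('_')
const underscoreInRange = /^[A-Za-z0-9]$/.test('_')
console.log(underscoreIsWord) // true
console.log(underscoreInRange) // false

// \s covers several kinds of whitespace, including tabs and newlines.
const allWhitespace = /^\s+$/.test(' \t\n')
console.log(allWhitespace) // true

// . stops at a newline by default, while [^] does not.
const twoLines = 'first\nsecond'
const dotSpansLines = /^.+$/.test(twoLines)
const anySpansLines = /^[^]+$/.test(twoLines)
console.log(dotSpansLines) // false
console.log(anySpansLines) // true
```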

Regular expression choices

If you want to search for one string or another, use the | operator.

/hey|ho/.test('hey') // true
/hey|ho/.test('ho') // true

Quantifiers

Say you have this regex that checks if a string has one digit in it, and nothing else:

/^\d$/

You can use the ? quantifier to make it optional, thus requiring zero or one:

/^\d?$/

But what if you want to match multiple digits?

You can do it in 4 ways, using +, *, {n} and {n,m}. Let’s look at these one by one.

+

Match one or more (>=1) items

/^\d+$/

/^\d+$/.test('12') // true
/^\d+$/.test('14') // true
/^\d+$/.test('144343') // true
/^\d+$/.test('') // false
/^\d+$/.test('1a') // false

*

Match 0 or more (>=0) items

/^\d*$/

/^\d*$/.test('12') // true
/^\d*$/.test('14') // true
/^\d*$/.test('144343') // true
/^\d*$/.test('') // true
/^\d*$/.test('1a') // false

{n}

Match exactly n items

/^\d{3}$/

/^\d{3}$/.test('123') // true
/^\d{3}$/.test('12') // false
/^\d{3}$/.test('1234') // false
/^[A-Za-z0-9]{3}$/.test('Abc') // true

{n,m}

Match between n and m times:

/^\d{3,5}$/

/^\d{3,5}$/.test('123') // true
/^\d{3,5}$/.test('1234') // true
/^\d{3,5}$/.test('12345') // true
/^\d{3,5}$/.test('123456') // false

m can be omitted to leave the upper bound open, so you match at least n items:

/^\d{3,}$/

/^\d{3,}$/.test('12') // false
/^\d{3,}$/.test('123') // true
/^\d{3,}$/.test('12345') // true
/^\d{3,}$/.test('123456789') // true
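Quantifiers become more useful once you combine them in one pattern. As a rough sketch, here is a check for a made-up product code format (two uppercase letters, three to five digits, and an optional trailing lowercase letter); the format and the name `productCode` are assumptions for the sake of the example.

```javascript
// Hypothetical format: 2 uppercase letters, 3 to 5 digits, optional lowercase letter.
const productCode = /^[A-Z]{2}\d{3,5}[a-z]?$/

console.log(productCode.test('AB123')) // true
console.log(productCode.test('XY12345z')) // true
console.log(productCode.test('AB12')) // false: only 2 digits
console.log(productCode.test('ABC123')) // false: 3 letters before the digits
```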

Optional items

Following an item with ? makes it optional:

/^\d{3}\w?$/

/^\d{3}\w?$/.test('123') // true
/^\d{3}\w?$/.test('123a') // true
/^\d{3}\w?$/.test('123ab') // false

Groups

Using parentheses, you can create groups of characters: (...)

This example matches exactly 3 digits followed by one or more alphanumeric characters:

/^(\d{3})(\w+)$/

/^(\d{3})(\w+)$/.test('123') // false
/^(\d{3})(\w+)$/.test('123s') // true
/^(\d{3})(\w+)$/.test('123something') // true
/^(\d{3})(\w+)$/.test('1234') // true

Repetition characters placed after a group’s closing parenthesis apply to the whole group:

/^(\d{2})+$/

/^(\d{2})+$/.test('12') // true
/^(\d{2})+$/.test('123') // false
/^(\d{2})+$/.test('1234') // true

Capturing groups

So far, we’ve seen how to test strings and check if they contain a certain pattern.

A very cool feature of regular expressions is the ability to capture parts of a string, and put them into an array.

You can do so using Groups, and in particular Capturing Groups.

By default, a Group is a Capturing Group. Now, instead of using RegExp.test(String), which just returns a boolean telling whether the pattern is satisfied, we use either String.match(RegExp) or RegExp.exec(String).

For a regex without the g flag they behave the same way, returning an Array with the whole matched string as the first item, followed by the content of each matched group.

If there is no match, they return null:

'123s'.match(/^(\d{3})(\w+)$/) // Array [ "123s", "123", "s" ]

/^(\d{3})(\w+)$/.exec('123s') // Array [ "123s", "123", "s" ]

'hey'.match(/(hey|ho)/) // Array [ "hey", "hey" ]

/(hey|ho)/.exec('hey') // Array [ "hey", "hey" ]

/(hey|ho)/.exec('ha!') // null

When a group is matched multiple times, only the last match is put in the result array:

'123456789'.match(/(\d)+/) // Array [ "123456789", "9" ]
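Because the result is either an array or null, it usually pays to guard against null before reading the groups. A minimal sketch, where `splitCode` is a hypothetical helper written just for this example:

```javascript
// Hypothetical helper: splits a code such as '123abc' into its numeric
// prefix and the rest, or returns null when the input does not match.
function splitCode(input) {
  const result = /^(\d{3})(\w+)$/.exec(input)
  if (result === null) return null

  // Index 0 is the whole match; the captured groups follow in order.
  const [whole, digits, rest] = result
  return { whole, digits, rest }
}

console.log(splitCode('123abc')) // { whole: '123abc', digits: '123', rest: 'abc' }
console.log(splitCode('12abc')) // null
```

Array destructuring keeps the group order visible at the call site, which helps when a pattern has more than a couple of groups.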

Optional groups

A capturing group can be made optional by using (...)?. If it’s not found, the resulting array slot will contain undefined:

/^(\d{3})(\s)?(\w+)$/.exec('123 s') // Array [ "123 s", "123", " ", "s" ]

/^(\d{3})(\s)?(\w+)$/.exec('123s') // Array [ "123s", "123", undefined, "s" ]

Reference matched groups

Every capturing group is assigned a number, counting opening parentheses from left to right. In a replacement string, $1 refers to the first group, $2 to the second, and so on. This will be useful when we talk later on about replacing parts of a string.

Named capturing groups

This is an ES2018 feature.

A group can be given a name, rather than just being assigned a slot in the resulting array:

const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
const result = re.exec('2015-01-02')

// result.groups.year === '2015'
// result.groups.month === '01'
// result.groups.day === '02'
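Named groups also combine nicely with destructuring and with replacement strings, where `$<name>` refers to a group by name. A short sketch (the `isoDate` and `reordered` names are just for illustration):

```javascript
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

// Destructure the named groups straight from the match result.
const { year, month, day } = isoDate.exec('2015-01-02').groups
console.log(year, month, day) // 2015 01 02

// In a replacement string, $<name> refers to a named group.
const reordered = '2015-01-02'.replace(isoDate, '$<day>/$<month>/$<year>')
console.log(reordered) // 02/01/2015
```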

Using match and exec without groups

When a pattern has no groups, the array returned by match and exec contains just one item, the matched string itself. With groups, that whole match is followed by the content of each group:

/hey|ho/.exec('hey') // [ "hey" ]

/(hey).(ho)/.exec('hey ho') // [ "hey ho", "hey", "ho" ]

Noncapturing groups

Since by default groups are Capturing Groups, you need a way to leave some groups out of the resulting array. This is possible using Noncapturing Groups, which start with ?: as in (?:...)

'123s'.match(/^(\d{3})(?:\s)(\w+)$/) // null

'123 s'.match(/^(\d{3})(?:\s)(\w+)$/) // Array [ "123 s", "123", "s" ]

Flags

You can use the following flags on any regular expression:

g: matches the pattern multiple times
i: makes the regex case insensitive
m: enables multiline mode. In this mode, ^ and $ match the start and end of each line. Without it, they match only the start and end of the whole string, even when it spans multiple lines.
u: enables support for Unicode (introduced in ES6/ES2015)
s: (new in ES2018) short for single line, it causes the . to match newline characters as well.

Flags can be combined, and they are added after the closing slash in regex literals:

/hey/ig.test('HEy') // true

or passed as the second parameter to the RegExp constructor:

new RegExp('hey', 'ig').test('HEy') // true
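To make the m and g flags more concrete, here is a small sketch. The variable names are illustrative; the last part shows a common surprise, namely that a regex object with the g flag keeps state in lastIndex between calls.

```javascript
const text = 'first line\nsecond line'

// Without m, ^ only matches at the very start of the string.
const withoutM = /^second/.test(text)
// With m, ^ also matches right after each newline.
const withM = /^second/m.test(text)
console.log(withoutM, withM) // false true

// With g, match() returns every occurrence instead of just the first.
const allDogs = 'My dog is a good dog'.match(/dog/g)
console.log(allDogs) // [ 'dog', 'dog' ]

// A g regex remembers where the last match ended, so repeated
// test() calls on the same object can give different answers.
const globalHey = /hey/g
const firstCall = globalHey.test('hey')
const indexAfterFirst = globalHey.lastIndex
const secondCall = globalHey.test('hey')
console.log(firstCall, indexAfterFirst, secondCall) // true 3 false
```

If you only need a yes/no answer from test(), leaving out the g flag avoids that statefulness.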

Inspecting a regex

Given a regex, you can inspect its properties:

source: the pattern string
multiline: true with the m flag
global: true with the g flag
ignoreCase: true with the i flag
lastIndex: the index at which the next match starts (used with the g flag)

/^(\w{3})$/i.source // "^(\\w{3})$"
/^(\w{3})$/i.multiline // false
/^(\w{3})$/i.lastIndex // 0
/^(\w{3})$/i.ignoreCase // true
/^(\w{3})$/i.global // false

Escaping

These characters are special:

\
/
[ ]
( )
{ }
?
+
*
|
.
^
$

They are special because they are control characters that have a meaning in the regular expression pattern. If you want to use them inside the pattern as matching characters, you need to escape them by prepending a backslash:

/^\\$/.test('\\') // true
/^\^$/.test('^') // true
/^\$$/.test('$') // true
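Escaping matters most when the pattern is built from text you don’t control. The sketch below defines `escapeRegExp`, a hypothetical helper (not a built-in) that backslash-escapes the special characters listed above before the text is handed to the RegExp constructor.

```javascript
// Hypothetical helper: prefixes every special character with a backslash
// so the text is matched literally. $& in the replacement is the matched character.
function escapeRegExp(text) {
  return text.replace(/[\\\/\[\](){}?+*|.^$]/g, '\\$&')
}

const userInput = '1+1=2?'

// Unescaped, + and ? act as quantifiers, so the literal text is not found.
const unescaped = new RegExp(userInput).test('1+1=2?')
console.log(unescaped) // false

// Escaped, every character is matched as-is.
const escaped = new RegExp(escapeRegExp(userInput)).test('1+1=2?')
console.log(escaped) // true
console.log(escapeRegExp(userInput)) // 1\+1=2\?
```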

String boundaries

\b and \B let you check whether a match sits at the edge of a word:

\b matches the position at the beginning or end of a word (a word boundary)
\B matches any position that is not a word boundary

Example:

'I saw a bear'.match(/\bbear/) // Array ["bear"]
'I saw a beard'.match(/\bbear/) // Array ["bear"]
'I saw a beard'.match(/\bbear\b/) // null
'cool_bear'.match(/\bbear\b/) // null

Replace, using Regular Expressions

We already saw how to check if a string contains a pattern.

We also saw how to extract parts of a string to an array, matching a pattern.

Let’s see how to replace parts of a string based on a pattern.

The String object in JavaScript has a replace() method, which can be used without regular expressions to perform a single replacement on a string:

"Hello world!".replace('world', 'dog') // Hello dog!

"My dog is a good dog!".replace('dog', 'cat') // My cat is a good dog!

This method also accepts a regular expression as argument:

"Hello world!".replace(/world/, 'dog') // Hello dog!

With replace(), using the g flag is the way to replace multiple occurrences in a string (newer engines also provide String.replaceAll(), added in ES2021):

"My dog is a good dog!".replace(/dog/g, 'cat') // My cat is a good cat!

Groups let us do fancier things, like moving parts of a string around:

"Hello, world!".replace(/(\w+), (\w+)!/, '$2: $1!!!') // "world: Hello!!!"

Instead of a replacement string you can pass a function, to do even fancier things. It receives the whole matched string first, then the content of each group, so the number of arguments depends on the number of groups (the match position and the original string come after them):

"Hello, world!".replace(/(\w+), (\w+)!/, (matchedString, first, second) => {
  console.log(first)
  console.log(second)

  return `${second.toUpperCase()}: ${first}!!!`
})

// "WORLD: Hello!!!"
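A replacer function combined with the g flag is a compact way to transform every match. As a small sketch, `toCamelCase` below is a hypothetical helper that rewrites snake_case names:

```javascript
// Hypothetical helper: turns snake_case into camelCase.
function toCamelCase(name) {
  // The replacer receives the whole match (e.g. '_i') and then the captured letter.
  return name.replace(/_([a-z])/g, (match, letter) => letter.toUpperCase())
}

console.log(toCamelCase('last_index_of')) // lastIndexOf
console.log(toCamelCase('source')) // source
```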

Greediness

Regular expressions are said to be greedy by default.

What does it mean?

Take this regex:

/\$(.+)\s?/

It is supposed to extract a dollar amount from a string:

/\$(.+)\s?/.exec('This costs $100')[1] // 100

but if we have more words after the number, it goes too far:

/\$(.+)\s?/.exec('This costs $100 and it is less than $200')[1] // 100 and it is less than $200

Why? Because the .+ after the $ sign matches any character, and it won’t stop until it reaches the end of the string. The match then finishes because \s? makes the trailing space optional.

To fix this, we need to tell the regex to be lazy, and perform the least amount of matching possible. We can do so by adding the ? symbol after the quantifier:

/\$(.+?)\s/.exec('This costs $100 and it is less than $200')[1] // 100

I removed the ? after \s. Otherwise the lazy group would capture only the first digit, since the space was optional.

So, ? means different things based on its position, because it can be both a quantifier and a lazy mode indicator.
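Putting the variants side by side makes the difference easier to see. This sketch reuses the sentence from above; the last pattern is an alternative I’m adding for comparison (matching only digits), not something the text above relies on.

```javascript
const sentence = 'This costs $100 and it is less than $200'

// Greedy: .+ runs to the end of the string.
const greedy = /\$(.+)\s?/.exec(sentence)[1]
// Lazy, with a required space: stops at the first space.
const lazy = /\$(.+?)\s/.exec(sentence)[1]
// Lazy, with an optional space: stops after a single character.
const lazyOptionalSpace = /\$(.+?)\s?/.exec(sentence)[1]

console.log(greedy) // 100 and it is less than $200
console.log(lazy) // 100
console.log(lazyOptionalSpace) // 1

// Alternative: be specific about what may follow the $ sign.
const digitsOnly = /\$(\d+)/.exec(sentence)[1]
console.log(digitsOnly) // 100
```

Being specific about the allowed characters often sidesteps the greedy versus lazy question altogether.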

Lookaheads: match a string depending on what follows it

Use ?= to match a string that’s followed by a specific substring:

/Roger(?= Waters)/

/Roger(?= Waters)/.test('Roger is my dog') // false
/Roger(?= Waters)/.test('Roger is my dog and Roger Waters is a famous musician') // true

?! performs the inverse operation, matching if a string is not followed by a specific substring:

/Roger(?! Waters)/

/Roger(?! Waters)/.test('Roger is my dog') // true
/Roger(?! Waters)/.test('Roger Waters is a famous musician') // false

Lookbehinds: match a string depending on what precedes it

This is an ES2018 feature.

Lookaheads use the ?= symbol. Lookbehinds use ?<=.

/(?<=Roger) Waters/

/(?<=Roger) Waters/.test('Pink Waters is my dog') // false

/(?<=Roger) Waters/.test('Roger is my dog and Roger Waters is a famous musician') // true

A lookbehind is negated using ?<!:

/(?<!Roger) Waters/

/(?<!Roger) Waters/.test('Pink Waters is my dog') // true

/(?<!Roger) Waters/.test('Roger is my dog and Roger Waters is a famous musician') // false
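Lookaheads and lookbehinds check their surroundings without including them in the match, which is handy for extraction. A small sketch, using an invented price list:

```javascript
const priceList = 'Coffee $3, tea $2, cake 4 euros'

// Lookbehind: digits preceded by a dollar sign. The sign itself is not captured.
const dollarAmounts = priceList.match(/(?<=\$)\d+/g)
console.log(dollarAmounts) // [ '3', '2' ]

// Lookahead: digits followed by ' euros'. The word itself is not captured.
const euroAmounts = priceList.match(/\d+(?= euros)/g)
console.log(euroAmounts) // [ '4' ]
```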

Regular expressions and Unicode

The u flag is mandatory when working with Unicode strings. In particular, this applies when you might need to handle characters in astral planes (the ones with code points above U+FFFF, outside the Basic Multilingual Plane).

Emojis are a good example, but they’re not the only one.

If you don’t add that flag, this simple regex that should match one character will not work, because for JavaScript that emoji is represented internally by 2 characters (see Unicode in JavaScript):

/^.$/.test('a') // true
/^.$/.test('🐶') // false
/^.$/u.test('🐶') // true

So, always use the u flag.

Unicode characters, just like normal characters, can be used in ranges:

/[a-z]/.test('a') // true
/[1-9]/.test('1') // true
/[🐶-🦊]/u.test('🐺') // true
/[🐶-🦊]/u.test('🐛') // false

JavaScript checks the internal code representation, so 🐶 < 🐺 < 🦊 because U+1F436 < U+1F43A < U+1F98A. Check the full Emoji list to get those codes, and to find out the order (tip: the macOS Emoji picker has some emojis in a mixed order, so don’t count on it).
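The “2 characters” remark can be checked directly. This sketch writes the dog emoji as a code point escape (U+1F436) so it doesn’t depend on how the file is encoded; `dog` is just an illustrative name.

```javascript
const dog = '\u{1F436}' // the dog face emoji, U+1F436

// The string holds one code point, stored as two UTF-16 code units.
console.log(dog.length) // 2
console.log([...dog].length) // 1

// Without u, . sees one code unit at a time; with u, it sees the code point.
const withoutU = /^.$/.test(dog)
const withU = /^.$/u.test(dog)
console.log(withoutU, withU) // false true

// With u, \u{...} can be used inside the pattern too.
const byCodePoint = /^\u{1F436}$/u.test(dog)
console.log(byCodePoint) // true
```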

Unicode property escapes

As we saw above, in a regular expression pattern you can use \d to match any digit, \s to match any whitespace character, \w to match any alphanumeric character, and so on.

Unicode property escapes are an ES2018 feature that extends this concept to all Unicode characters, introducing \p{} and its negation \P{}.

Any Unicode character has a set of properties. For example Script determines the language family, ASCII is a boolean that’s true for ASCII characters, and so on. You can put this property inside the curly braces, and the regex will check for that to be true:

/^\p{ASCII}+$/u.test('abc') // true
/^\p{ASCII}+$/u.test('ABC@') // true
/^\p{ASCII}+$/u.test('ABC🙃') // false

ASCII_Hex_Digit is another boolean property, which checks if the string only contains valid hexadecimal digits:

/^\p{ASCII_Hex_Digit}+$/u.test('0123456789ABCDEF') // true
/^\p{ASCII_Hex_Digit}+$/u.test('h') // false

There are many other boolean properties, which you check by adding their name inside the curly braces, including Uppercase, Lowercase, White_Space, Alphabetic, Emoji and more:

/^\p{Lowercase}$/u.test('h') // true
/^\p{Uppercase}$/u.test('H') // true

/^\p{Emoji}+$/u.test('H') // false
/^\p{Emoji}+$/u.test('🙃🙃') // true

In addition to those binary properties, you can check any of the Unicode character properties to match a specific value. In this example, I check if the string is written in the Greek or Latin alphabet:

/^\p{Script=Greek}+$/u.test('ελληνικά') // true
/^\p{Script=Latin}+$/u.test('hey') // true

Read more about all the properties you can use directly on the proposal.
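One place where property escapes help right away is matching letters beyond ASCII. The sketch below uses \p{Letter}, the General_Category value for letters, which is my choice of example rather than one from the text above; accented characters are written as escapes to keep the strings unambiguous.

```javascript
const naive = 'na\u00EFve' // "naive" with a diaeresis on the i
const cafe = 'caf\u00E9' // "cafe" with an acute accent on the e

// \w only covers ASCII letters, digits and the underscore.
const wordClass = /^\w+$/.test(naive)
// \p{Letter} covers letters from any script.
const letterClass = /^\p{Letter}+$/u.test(naive)
console.log(wordClass, letterClass) // false true

// \P{...} is the negation: here, "contains at least one non-ASCII character".
const asciiOnly = /\P{ASCII}/u.test('abc')
const hasNonAscii = /\P{ASCII}/u.test(cafe)
console.log(asciiOnly, hasNonAscii) // false true
```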

Examples

Supposing a string has only one number you need to extract, /\d+/ should do it:

'Test 123123329'.match(/\d+/) // Array [ "123123329" ]

Match an email address

A simplistic approach is to check for non-space characters before and after the @ sign, using \S:

/(\S+)@(\S+)\.(\S+)/

/(\S+)@(\S+)\.(\S+)/.exec('copesc@gmail.com') // ["copesc@gmail.com", "copesc", "gmail", "com"]

This is a simplistic example, however, as many invalid email addresses still satisfy this regex.

Capture text between double quotes

Suppose you have a string that contains something in double quotes, and you want to extract that content.

The best way to do so is by using a capturing group, because we know the match starts and ends with ", and we can easily target it, but we also want to remove those quotes from our result.

We’ll find what we need in result[1]:

const hello = 'Hello "nice flower"'
const result = /"([^"]*)"/.exec(hello) // Array [ "\"nice flower\"", "nice flower" ]

Get the content inside an HTML tag

For example, get the content inside a span tag, allowing any number of attributes inside the tag:

/<span\b[^>]*>(.*?)<\/span>/

/<span\b[^>]*>(.*?)<\/span>/.exec('test') // null

/<span\b[^>]*>(.*?)<\/span>/.exec('<span>test</span>') // ["<span>test</span>", "test"]

/<span\b[^>]*>(.*?)<\/span>/.exec('<span class="x">test</span>') // ['<span class="x">test</span>', "test"]
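When a string holds more than one span, exec only returns the first. One way to collect them all, sketched here under the assumption that String.matchAll (added in ES2020) is available, is to add the g flag and map over the matches. The `html` sample and variable names are made up for the example.

```javascript
const html = '<span>one</span> and <span class="x">two</span>'

// matchAll requires the g flag and yields one match array per occurrence.
const spanPattern = /<span\b[^>]*>(.*?)<\/span>/g
const contents = Array.from(html.matchAll(spanPattern), (match) => match[1])

console.log(contents) // [ 'one', 'two' ]
```

This is fine for simple, flat markup like the sample; nested or malformed HTML is better handled by a real parser.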


Translated from: https://www.freecodecamp.org/news/a-quick-and-simple-guide-to-javascript-regular-expressions-48b46a68df29/