A quick and simple guide to JavaScript Regular Expressions

Interested in learning JavaScript? Get my ebook at jshandbook.com


Introduction to Regular Expressions

A regular expression (also called regex for short) is a fast way to work with strings of text.

By formulating a regular expression with a special syntax, you can:

  • search for text in a string
  • replace substrings in a string
  • and extract information from a string

Almost every programming language features some implementation of regular expressions. There are small differences between implementations, but the general concepts apply almost everywhere.

Regular expressions date back to the 1950s, when they were formalized as a conceptual search pattern for string processing algorithms.

Implemented in UNIX tools like grep and sed, and in popular text editors, regexes grew in popularity. They were introduced into the Perl programming language, and later into many others as well.

JavaScript, along with Perl, is one of the programming languages that have support for regular expressions built directly into the language.

Hard but useful

Regular expressions can seem like absolute nonsense to the beginner, and often to the professional developer as well, if you don't invest the time necessary to understand them.

Cryptic regular expressions are hard to write, hard to read, and hard to maintain or modify.

But sometimes a regular expression is the only sane way to perform some string manipulation, so it's a very valuable tool in your pocket.

This tutorial aims to introduce you to JavaScript regular expressions in a simple way, and to give you the information you need to read and create them.

The rule of thumb is that simple regular expressions are simple to read and write, while complex regular expressions can quickly turn into a mess if you don't deeply grasp the basics.

What does a Regular Expression look like?

In JavaScript, a regular expression is an object, which can be defined in two ways.

The first is by instantiating a new RegExp object using the constructor:

const re1 = new RegExp('hey')

The second is using the regular expression literal form:

const re1 = /hey/

You know that JavaScript has object literals and array literals? It also has regex literals.

In the example above, hey is called the pattern. In the literal form it's delimited by forward slashes, while with the object constructor it's passed as a plain string.

This is the first important difference between the two forms, but we'll see others later.
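One practical consequence of that difference, sketched here as an illustration: because the constructor takes a string, backslashes have to be doubled, and in exchange the pattern can be assembled at runtime. The variable names (`literal`, `constructed`, `word`, `dynamic`) are made up for this example.

```javascript
// Literal form: the pattern sits between slashes, backslashes are written as-is.
const literal = /\d+/

// Constructor form: the pattern is a string, so each backslash must be doubled.
const constructed = new RegExp('\\d+')

// The constructor is handy when part of the pattern is only known at runtime.
// `word` is a hypothetical value, e.g. something read from a config file.
const word = 'hey'
const dynamic = new RegExp(word)

console.log(literal.source === constructed.source) // true, both are \d+
console.log(dynamic.test('well hey there'))        // true
```

When the pattern is fixed, the literal form is usually easier to read; the constructor earns its place when the pattern has to be computed.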

How does it work?

The regular expression we defined as re1 above is a very simple one. It searches for the string hey, without any limitation. The string can contain lots of text with hey in the middle, and the regex is satisfied. It could also contain just hey, and the regex would be satisfied as well.

That's pretty simple.

You can test the regex using RegExp.test(String), which returns a boolean:

re1.test('hey') //true
re1.test('blablabla hey blablabla') //true
re1.test('he') //false
re1.test('blablabla') //false

In the above example, we just checked whether each string satisfies the regular expression pattern stored in re1.

This is the simplest it can be, but you already know several concepts about regexes.

Anchoring

/hey/

matches hey wherever it appears inside the string.

If you want to match strings that start with hey, use the ^ operator:

/^hey/.test('hey') //true
/^hey/.test('bla hey') //false

If you want to match strings that end with hey, use the $ operator:

/hey$/.test('hey') //true
/hey$/.test('bla hey') //true
/hey$/.test('hey you') //false

Combine those to match strings that are exactly hey, and nothing else:

/^hey$/.test('hey') //true

To match a string that starts with one substring and ends with another, you can use .*, which matches any character repeated 0 or more times:

/^hey.*joe$/.test('hey joe') //true
/^hey.*joe$/.test('heyjoe') //true
/^hey.*joe$/.test('hey how are you joe') //true
/^hey.*joe$/.test('hey joe!') //false

Match items in ranges

Instead of matching a particular string, you can choose to match any character in a range, like:

/[a-z]/ //a, b, c, ... , x, y, z
/[A-Z]/ //A, B, C, ... , X, Y, Z
/[a-c]/ //a, b, c
/[0-9]/ //0, 1, 2, 3, ... , 8, 9

These regexes match strings that contain at least one of the characters in those ranges:

/[a-z]/.test('a') //true
/[a-z]/.test('1') //false
/[a-z]/.test('A') //false
/[a-c]/.test('d') //false
/[a-c]/.test('dc') //true

Ranges can be combined:

/[A-Za-z0-9]/
/[A-Za-z0-9]/.test('a') //true
/[A-Za-z0-9]/.test('1') //true
/[A-Za-z0-9]/.test('A') //true

Matching a range item multiple times

You can check if a string contains one and only one character in a range by anchoring the range with ^ and $:

/^[A-Za-z0-9]$/
/^[A-Za-z0-9]$/.test('A') //true
/^[A-Za-z0-9]$/.test('Ab') //false

Negating a pattern

The ^ character at the beginning of a pattern anchors it to the beginning of a string.

Used as the first character inside a range, it negates the range, so:

/[^A-Za-z0-9]/.test('a') //false
/[^A-Za-z0-9]/.test('1') //false
/[^A-Za-z0-9]/.test('A') //false
/[^A-Za-z0-9]/.test('@') //true

Meta characters

  • \d matches any digit, equivalent to [0-9]
  • \D matches any character that's not a digit, equivalent to [^0-9]
  • \w matches any alphanumeric character or underscore, equivalent to [A-Za-z0-9_]
  • \W matches any character that is not alphanumeric or underscore, equivalent to [^A-Za-z0-9_]
  • \s matches any whitespace character: spaces, tabs, newlines and Unicode spaces
  • \S matches any character that's not whitespace
  • \0 matches the NUL character (U+0000)
  • \n matches a newline character
  • \t matches a tab character
  • \uXXXX matches the character with the 4-digit hexadecimal code XXXX (the longer \u{XXXXX} form requires the u flag)
  • . matches any character that is not a newline char (e.g. \n) (unless you use the s flag, explained later on)
  • [^] matches any character, including newline characters. It's useful on multiline strings.
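A few of the meta characters above in action. This is only a small sketch with made-up sample strings; the name `twoNumbers` is an assumption for the example.

```javascript
// \w covers letters, digits and the underscore, but not the hyphen.
console.log(/^\w+$/.test('snake_case_42')) // true
console.log(/^\w+$/.test('kebab-case'))    // false

// \d and \s together: digits, one whitespace character, digits.
const twoNumbers = /^\d+\s\d+$/
console.log(twoNumbers.test('12 34'))  // true
console.log(twoNumbers.test('12\t34')) // true, a tab counts as whitespace

// . stops at a newline, [^] does not.
console.log(/^.+$/.test('a\nb'))   // false
console.log(/^[^]+$/.test('a\nb')) // true
```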

Regular expression choices

If you want to search for one string or another, use the | operator.

/hey|ho/.test('hey') //true
/hey|ho/.test('ho') //true

Quantifiers

Say you have this regex that checks if a string has one digit in it, and nothing else:

/^\d$/

You can use the ? quantifier to make it optional, thus requiring zero or one:

/^\d?$/

But what if you want to match multiple digits?

You can do it in 4 ways, using +, *, {n} and {n,m}. Let's look at these one by one.

+

Match one or more (>=1) items

/^\d+$/
/^\d+$/.test('12') //true
/^\d+$/.test('14') //true
/^\d+$/.test('144343') //true
/^\d+$/.test('') //false
/^\d+$/.test('1a') //false

*

Match 0 or more (>= 0) items

/^\d*$/
/^\d*$/.test('12') //true
/^\d*$/.test('14') //true
/^\d*$/.test('144343') //true
/^\d*$/.test('') //true
/^\d*$/.test('1a') //false

{n}

Match exactly n items

/^\d{3}$/
/^\d{3}$/.test('123') //true
/^\d{3}$/.test('12') //false
/^\d{3}$/.test('1234') //false
/^[A-Za-z0-9]{3}$/.test('Abc') //true

{n,m}

Match between n and m times:

/^\d{3,5}$/
/^\d{3,5}$/.test('123') //true
/^\d{3,5}$/.test('1234') //true
/^\d{3,5}$/.test('12345') //true
/^\d{3,5}$/.test('123456') //false

m can be omitted to leave the upper end open, so you need at least n items:

/^\d{3,}$/
/^\d{3,}$/.test('12') //false
/^\d{3,}$/.test('123') //true
/^\d{3,}$/.test('12345') //true
/^\d{3,}$/.test('123456789') //true
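As a possible use of the quantifiers above, here is a minimal validation sketch. The rule itself (a PIN made of 4 to 6 digits) and the names `pin` and `isValidPin` are assumptions chosen for illustration.

```javascript
// Hypothetical rule: a PIN is 4 to 6 digits and nothing else.
const pin = /^\d{4,6}$/

// Small wrapper so the rule reads like a predicate.
const isValidPin = (value) => pin.test(value)

console.log(isValidPin('1234'))    // true
console.log(isValidPin('123'))     // false, too short
console.log(isValidPin('1234567')) // false, too long
console.log(isValidPin('12a4'))    // false, not only digits
```

The anchors matter here: without ^ and $ the pattern would accept any string that merely contains 4 digits in a row.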

Optional items

Following an item with ? makes it optional:

/^\d{3}\w?$/
/^\d{3}\w?$/.test('123') //true
/^\d{3}\w?$/.test('123a') //true
/^\d{3}\w?$/.test('123ab') //false

Groups

Using parentheses, you can create groups of characters: (...)

This example matches exactly 3 digits followed by one or more alphanumeric characters:

/^(\d{3})(\w+)$/
/^(\d{3})(\w+)$/.test('123') //false
/^(\d{3})(\w+)$/.test('123s') //true
/^(\d{3})(\w+)$/.test('123something') //true
/^(\d{3})(\w+)$/.test('1234') //true

Repetition characters put after a group's closing parenthesis refer to the whole group:

/^(\d{2})+$/
/^(\d{2})+$/.test('12') //true
/^(\d{2})+$/.test('123') //false
/^(\d{2})+$/.test('1234') //true

Capturing groups

So far, we've seen how to test strings and check if they contain a certain pattern.

A very cool feature of regular expressions is the ability to capture parts of a string, and put them into an array.

You can do so using Groups, and in particular Capturing Groups.

By default, a Group is a Capturing Group. Now, instead of using RegExp.test(String), which just returns a boolean telling whether the pattern is satisfied, we use either String.match(RegExp) or RegExp.exec(String).

Without the g flag, the two behave the same way: they return an Array with the whole matched string in the first item, followed by the content of each matched group.

If there is no match, they return null:

'123s'.match(/^(\d{3})(\w+)$/) //Array [ "123s", "123", "s" ]
/^(\d{3})(\w+)$/.exec('123s') //Array [ "123s", "123", "s" ]
'hey'.match(/(hey|ho)/) //Array [ "hey", "hey" ]
/(hey|ho)/.exec('hey') //Array [ "hey", "hey" ]
/(hey|ho)/.exec('ha!') //null

When a group is matched multiple times, only the last match is put in the result array:

'123456789'.match(/(\d)+/) //Array [ "123456789", "9" ]
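If every repetition is needed rather than only the last one, one option (not covered by the article itself) is to drop the quantifier and iterate over all matches with the g flag and String.prototype.matchAll, which was added in ES2020. A minimal sketch, with `lastOnly` and `digits` as made-up names:

```javascript
// A repeated group keeps only its last repetition.
const lastOnly = '123456789'.match(/(\d)+/)
console.log(lastOnly[1]) // "9"

// matchAll yields one match object per occurrence; the pattern needs the g flag.
const digits = Array.from('123456789'.matchAll(/(\d)/g), (m) => m[1])
console.log(digits.length)   // 9
console.log(digits.join('')) // "123456789"
```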

Optional groups

A capturing group can be made optional by using (...)?. If it's not found, the resulting array slot will contain undefined:

/^(\d{3})(\s)?(\w+)$/.exec('123 s') //Array [ "123 s", "123", " ", "s" ]
/^(\d{3})(\s)?(\w+)$/.exec('123s') //Array [ "123s", "123", undefined, "s" ]

Reference matched groups

Every group that's matched is assigned a number. $1 refers to the first, $2 to the second, and so on. This will be useful when we talk later on about replacing parts of a string.

Named capturing groups

This is an ES2018 feature.

A group can be given a name, rather than just being assigned a slot in the resulting array:

const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
const result = re.exec('2015-01-02')
// result.groups.year === '2015';
// result.groups.month === '01';
// result.groups.day === '02';
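Named groups combine nicely with destructuring, and they can also be referenced in a replacement string as $<name>. The following sketch reuses the date pattern from the example; `dateRe` and `european` are names picked for illustration.

```javascript
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

// Pull the named groups straight into variables.
const { year, month, day } = dateRe.exec('2015-01-02').groups
console.log(year, month, day) // 2015 01 02

// In a replacement string, a named group is written as $<name>.
const european = '2015-01-02'.replace(dateRe, '$<day>/$<month>/$<year>')
console.log(european) // 02/01/2015
```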

Using match and exec without groups

When the pattern has no groups, the array returned by match and exec holds a single item, the matched text. With groups, the first item is the whole match and the groups follow:

/hey|ho/.exec('hey') // [ "hey" ]
/(hey).(ho)/.exec('hey ho') // [ "hey ho", "hey", "ho" ]

Noncapturing groups

Since by default groups are Capturing Groups, you need a way to leave some groups out of the resulting array. This is possible using Noncapturing Groups, which start with (?:

'123s'.match(/^(\d{3})(?:\s)(\w+)$/) //null
'123 s'.match(/^(\d{3})(?:\s)(\w+)$/) //Array [ "123 s", "123", "s" ]

Flags

You can use the following flags on any regular expression:

  • g: global search, finds every match instead of stopping after the first one
  • i: makes the regex case insensitive
  • m: enables multiline mode. In this mode, ^ and $ match the start and end of each line. Without it, they match only the start and end of the whole string, even when the string spans several lines.
  • u: enables support for Unicode (introduced in ES6/ES2015)
  • s: (new in ES2018) short for single line, also known as dotAll; it causes . to match newline characters as well.

Flags can be combined. In regex literals they are added after the closing slash:

/hey/ig.test('HEy') //true

With the RegExp constructor they are passed as the second parameter:

new RegExp('hey', 'ig').test('HEy') //true
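The effect of the m and s flags is easiest to see on a two-line string. The sample text and the `count` variable below are assumptions made up for this sketch.

```javascript
const text = 'first line\nsecond line'

// Without m, ^ only matches at the very start of the string.
console.log(/^second/.test(text))  // false
// With m, ^ also matches right after each line break.
console.log(/^second/m.test(text)) // true

// Without s, . does not cross the line break.
console.log(/first.+second/.test(text))  // false
// With s, . matches the newline too.
console.log(/first.+second/s.test(text)) // true

// g and i combined: collect every occurrence, ignoring case.
const count = 'Hey hey HEY'.match(/hey/gi).length
console.log(count) // 3
```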

Inspecting a regex

Given a regex, you can inspect its properties:

  • source: the pattern string
  • multiline: true with the m flag
  • global: true with the g flag
  • ignoreCase: true with the i flag
  • lastIndex: the index at which the next search starts (relevant with the g flag)

/^(\w{3})$/i.source //"^(\\w{3})$"
/^(\w{3})$/i.multiline //false
/^(\w{3})$/i.lastIndex //0
/^(\w{3})$/i.ignoreCase //true
/^(\w{3})$/i.global //false
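lastIndex is worth a closer look, because a regex with the g flag remembers where the previous search ended. A small sketch (the names `re` and `input` are made up) shows how repeated calls to test() on the same regex object walk through the string:

```javascript
const re = /hey/g
const input = 'hey hey'

console.log(re.lastIndex)   // 0
console.log(re.test(input)) // true, first "hey"
console.log(re.lastIndex)   // 3
console.log(re.test(input)) // true, second "hey"
console.log(re.lastIndex)   // 7
console.log(re.test(input)) // false, nothing left to match
console.log(re.lastIndex)   // 0, reset after the failed search
```

This statefulness is a common source of surprises when a global regex stored in a variable is reused across unrelated strings.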

Escaping

These characters are special:

  • \
  • /
  • [ ]
  • ( )
  • { }
  • ?
  • +
  • *
  • |
  • .
  • ^
  • $

They are special because they are metacharacters that have a meaning in the regular expression pattern. If you want to use them inside the pattern as matching characters, you need to escape them by prepending a backslash:

/^\\$/.test('\\') //true
/^\^$/.test('^') //true
/^\$$/.test('$') //true
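Escaping becomes important when a pattern is built from text you don't control. JavaScript has long lacked a widely available built-in for this, so a small helper is commonly written by hand. The function below is a hypothetical sketch (the name `escapeRegExp` and the sample input are assumptions); it puts a backslash in front of every special character.

```javascript
// Hypothetical helper: escape every character that is special in a pattern.
// In the replacement string, $& stands for the matched character.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

const userInput = '1+1=2?'
const re = new RegExp('^' + escapeRegExp(userInput) + '$')

console.log(re.source)         // ^1\+1=2\?$
console.log(re.test('1+1=2?')) // true
console.log(re.test('11=2'))   // false

// Without escaping, + and ? act as quantifiers and the meaning changes.
console.log(new RegExp('^' + userInput + '$').test('11=2')) // true
```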

String boundaries

\b and \B let you check whether a match sits at the beginning or at the end of a word:

  • \b matches the position at the beginning or end of a word, that is, between a \w character and a non-\w character (or the edge of the string)
  • \B matches any position that is not at the beginning or end of a word

Example:

'I saw a bear'.match(/\bbear/) //Array ["bear"]
'I saw a beard'.match(/\bbear/) //Array ["bear"]
'I saw a beard'.match(/\bbear\b/) //null
'cool_bear'.match(/\bbear\b/) //null
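A typical use of \b is replacing a whole word without touching longer words that happen to contain it. The sentence below is made up for this sketch.

```javascript
const sentence = 'The cat scattered the catalog'

// Without \b, "cat" inside other words is replaced too.
console.log(sentence.replace(/cat/g, 'dog'))
// The dog sdogtered the dogalog

// With \b on both sides, only the standalone word matches.
console.log(sentence.replace(/\bcat\b/g, 'dog'))
// The dog scattered the catalog
```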

Replace, using Regular Expressions

We already saw how to check if a string contains a pattern.

We also saw how to extract parts of a string to an array, matching a pattern.

Let's see how to replace parts of a string based on a pattern.

The String object in JavaScript has a replace() method, which can be used without regular expressions to perform a single replacement on a string:

"Hello world!".replace('world', 'dog') //Hello dog!
"My dog is a good dog!".replace('dog', 'cat') //My cat is a good dog!

This method also accepts a regular expression as argument:

"Hello world!".replace(/world/, 'dog') //Hello dog!

With replace(), the g flag is the way to replace multiple occurrences in a string (String.prototype.replaceAll(), added in ES2021, is a more recent alternative):

"My dog is a good dog!".replace(/dog/g, 'cat') //My cat is a good cat!

Groups let us do fancier things, like moving parts of a string around:

"Hello, world!".replace(/(\w+), (\w+)!/, '$2: $1!!!') // "world: Hello!!!"

Instead of a replacement string you can pass a function, to do even fancier things. It receives the whole match first, then one argument per capturing group, much like the array returned by String.match(RegExp) or RegExp.exec(String), so the number of arguments depends on the number of groups:

"Hello, world!".replace(/(\w+), (\w+)!/, (matchedString, first, second) => {
  console.log(first);
  console.log(second);
  return `${second.toUpperCase()}: ${first}!!!`
})
//"WORLD: Hello!!!"
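As another illustration of a replacer function, here is a sketch that converts snake_case identifiers to camelCase. The helper name `toCamelCase` and the restriction to lowercase ASCII letters are assumptions made for the example.

```javascript
// Hypothetical helper: an underscore followed by a lowercase letter
// is replaced by the uppercase version of that letter.
const toCamelCase = (name) =>
  name.replace(/_([a-z])/g, (match, letter) => letter.toUpperCase())

console.log(toCamelCase('first_name'))        // firstName
console.log(toCamelCase('a_b_c'))             // aBC
console.log(toCamelCase('alreadyCamelCase'))  // alreadyCamelCase
```

The function is called once per match, so the g flag decides whether only the first underscore or all of them are processed.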

Greediness

Regular expressions are said to be greedy by default.

What does it mean?

Take this regex:

/\$(.+)\s?/

It is supposed to extract a dollar amount from a string:

/\$(.+)\s?/.exec('This costs $100')[1] //100

But if we have more words after the number, it goes too far:

/\$(.+)\s?/.exec('This costs $100 and it is less than $200')[1] //100 and it is less than $200

Why? Because the part of the regex after the $ sign matches any character with .+, and it won't stop until it reaches the end of the string. Then it finishes, because \s? makes the ending space optional.

To fix this, we need to tell the regex to be lazy, and perform the least amount of matching possible. We can do so using the ? symbol after the quantifier:

/\$(.+?)\s/.exec('This costs $100 and it is less than $200')[1] //100

Note that I removed the ? after \s. Had it stayed, the lazy .+? would have matched only the first digit, since the space was optional.

So, ? means different things based on its position, because it can be both a quantifier and a lazy mode indicator.
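The same greedy versus lazy contrast shows up whenever a pattern has a closing delimiter. The sketch below uses a made-up snippet of markup, and also shows that a more specific pattern such as \d+ avoids the question entirely for the dollar example.

```javascript
const html = '<b>one</b> and <b>two</b>'

// Greedy: .+ runs to the last closing tag.
console.log(html.match(/<b>(.+)<\/b>/)[1])  // one</b> and <b>two

// Lazy: .+? stops at the first closing tag.
console.log(html.match(/<b>(.+?)<\/b>/)[1]) // one

// Being specific about what may be matched is often simpler than going lazy.
const amount = /\$(\d+)/.exec('This costs $100 and it is less than $200')[1]
console.log(amount) // 100
```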

Lookaheads: match a string depending on what follows it

Use ?= to match a string that's followed by a specific substring:

/Roger(?= Waters)/
/Roger(?= Waters)/.test('Roger is my dog') //false
/Roger(?= Waters)/.test('Roger is my dog and Roger Waters is a famous musician') //true

?! performs the inverse operation, matching if a string is not followed by a specific substring:

/Roger(?! Waters)/
/Roger(?! Waters)/.test('Roger is my dog') //true
/Roger(?! Waters)/.test('Roger Waters is a famous musician') //false
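A detail that test() hides: the text inside a lookahead is checked but is not part of the match. The following sketch makes that visible; the currency strings and variable names are invented for the example.

```javascript
// The lookahead is only a condition, " Waters" is not included in the match.
const m = /Roger(?= Waters)/.exec('Roger Waters')
console.log(m[0]) // Roger

const prices = '10 USD, 20 EUR, 30 USD'

// Numbers that are followed by " USD".
const usd = prices.match(/\d+(?= USD)/g)
console.log(usd) // [ "10", "30" ]

// Numbers that are NOT followed by " USD". The \b keeps the regex
// from backtracking into a shorter number such as "1" out of "10".
const notUsd = prices.match(/\d+\b(?! USD)/g)
console.log(notUsd) // [ "20" ]
```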

Lookbehinds: match a string depending on what precedes it

This is an ES2018 feature.

Lookaheads use the ?= symbol. Lookbehinds use ?<=.

/(?<=Roger) Waters/
/(?<=Roger) Waters/.test('Pink Waters is my dog') //false
/(?<=Roger) Waters/.test('Roger is my dog and Roger Waters is a famous musician') //true

A lookbehind is negated using ?<!:

/(?<!Roger) Waters/
/(?<!Roger) Waters/.test('Pink Waters is my dog') //true
/(?<!Roger) Waters/.test('Roger is my dog and Roger Waters is a famous musician') //false
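Lookbehinds offer another way to handle the dollar amounts from the greediness section: match digits only when a $ comes right before them, without including the $ in the result. The sample strings are assumptions for this sketch.

```javascript
// Digits preceded by a dollar sign; the $ itself is not part of the match.
const amounts = 'This costs $100 and it is less than $200'.match(/(?<=\$)\d+/g)
console.log(amounts) // [ "100", "200" ]

// Whole numbers NOT preceded by a dollar sign. The \b makes sure the
// match starts at the beginning of a number, not in the middle of one.
const others = 'Order 66 costs $100'.match(/(?<!\$)\b\d+/g)
console.log(others) // [ "66" ]
```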

Regular expressions and Unicode

The u flag matters when working with Unicode strings. In particular, it is needed when you have to handle characters in the astral planes (code points above U+FFFF, outside the Basic Multilingual Plane).

Emojis are a good example, but they're not the only one.

If you don't add that flag, this simple regex that should match one character will not work, because JavaScript represents that emoji internally with 2 characters (see Unicode in JavaScript):

/^.$/.test('a') //true
/^.$/.test('🐶') //false
/^.$/u.test('🐶') //true

So, always use the u flag.

Unicode characters, just like ordinary ones, can be used in ranges:

/[a-z]/.test('a') //true
/[1-9]/.test('1') //true
/[🐶-🦊]/u.test('🐺') //true
/[🐶-🦊]/u.test('🐛') //false

JavaScript checks the internal code representation, so 🐶 < 🐺 < 🦊 because \u{1F436} < \u{1F43A} < \u{1F98A}. Check the full Emoji list to get those codes, and to find out the order (tip: the macOS Emoji picker has some emojis in a mixed order, so don't count on it).
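The "2 characters" mentioned above can be observed directly. This sketch uses the dog emoji (U+1F436) from the example; the variable name `dog` is made up.

```javascript
const dog = '🐶'

// length counts UTF-16 code units, and an astral character needs two.
console.log(dog.length) // 2

// codePointAt reads the full code point.
console.log(dog.codePointAt(0).toString(16)) // 1f436

// The string iterator walks code points, so spreading gives one item.
console.log([...dog].length) // 1

// With the u flag, \u{...} escapes can be used inside a pattern.
console.log(/^\u{1F436}$/u.test(dog)) // true
```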

Unicode property escapes

As we saw above, in a regular expression pattern you can use \d to match any digit, \s to match any whitespace character, \w to match any alphanumeric character, and so on.

Unicode property escapes are an ES2018 feature that extends this concept to all Unicode characters by introducing \p{} and its negation \P{}.

Any Unicode character has a set of properties. For example Script identifies the writing system, ASCII is a boolean that's true for ASCII characters, and so on. You can put a property name inside the curly braces, and the regex will check for that to be true:

/^\p{ASCII}+$/u.test('abc') //true
/^\p{ASCII}+$/u.test('ABC@') //true
/^\p{ASCII}+$/u.test('ABC🙃') //false

ASCII_Hex_Digit is another boolean property, true for characters that are valid hexadecimal digits:

/^\p{ASCII_Hex_Digit}+$/u.test('0123456789ABCDEF') //true
/^\p{ASCII_Hex_Digit}+$/u.test('h') //false

There are many other boolean properties, which you check by putting their name inside the curly braces, including Uppercase, Lowercase, White_Space, Alphabetic, Emoji and more:

/^\p{Lowercase}$/u.test('h') //true
/^\p{Uppercase}$/u.test('H') //true
/^\p{Emoji}+$/u.test('H') //false
/^\p{Emoji}+$/u.test('🙃🙃') //true

In addition to those binary properties, you can check any of the Unicode character properties against a specific value. In this example, I check if the string is written in the Greek or Latin alphabet:

/^\p{Script=Greek}+$/u.test('ελληνικά') //true
/^\p{Script=Latin}+$/u.test('hey') //true

Read more about all the properties you can use directly on the proposal.
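Beyond the properties listed above, general categories such as L (Letter) and N (Number) are also available through \p{}. The sketch below uses them for a script-independent check; the helper names `isWord` and `clean` and the sample strings are assumptions for illustration.

```javascript
// \p{L} matches a letter in any script.
const isWord = (s) => /^\p{L}+$/u.test(s)

console.log(isWord('hello'))  // true
console.log(isWord('мир'))    // true, Cyrillic letters
console.log(isWord('hello1')) // false, the digit is not a letter

// Hypothetical cleanup: drop everything that is not a letter,
// a number or whitespace.
const clean = (s) => s.replace(/[^\p{L}\p{N}\s]/gu, '')

console.log(clean('hello, мир! 42')) // hello мир 42
```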

Examples

Supposing a string has only one number you need to extract, /\d+/ should do it:

'Test 123123329'.match(/\d+/) // Array [ "123123329" ]

Match an email address

A simplistic approach is to check for non-space characters before and after the @ sign, using \S:

/(\S+)@(\S+)\.(\S+)/
/(\S+)@(\S+)\.(\S+)/.exec('copesc@gmail.com') //["copesc@gmail.com", "copesc", "gmail", "com"]

This is only a rough check, however: many invalid email addresses still satisfy this regex.

Capture text between double quotes

Suppose you have a string that contains something in double quotes, and you want to extract that content.

The best way to do so is by using a capturing group, because we know the match starts and ends with ", and we can easily target it, but we also want to remove those quotes from our result.

We'll find what we need in result[1]:

const hello = 'Hello "nice flower"'
const result = /"([^"]*)"/.exec(hello) //Array [ "\"nice flower\"", "nice flower" ]

Get the content inside an HTML tag

For example, get the content inside a span tag, allowing any number of attributes inside the tag:

/<span\b[^>]*>(.*?)<\/span>/
/<span\b[^>]*>(.*?)<\/span>/.exec('test') // null
/<span\b[^>]*>(.*?)<\/span>/.exec('<span>test</span>') // ["<span>test</span>", "test"]
/<span\b[^>]*>(.*?)<\/span>/.exec('<span class="x">test</span>') // ["<span class=\"x\">test</span>", "test"]
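To collect the content of every span in a longer string, the same pattern can be combined with the g flag and matchAll (ES2020). This is a sketch on a made-up snippet; a regex like this is fine for simple, flat markup, but it is not a substitute for an HTML parser when tags can be nested.

```javascript
const spanRe = /<span\b[^>]*>(.*?)<\/span>/g
const html = '<span>one</span><p>skip</p><span class="x">two</span>'

// One match object per span; index 1 holds the captured content.
const contents = Array.from(html.matchAll(spanRe), (m) => m[1])
console.log(contents) // [ "one", "two" ]
```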


Translated from: https://www.freecodecamp.org/news/a-quick-and-simple-guide-to-javascript-regular-expressions-48b46a68df29/
