Unicode character class escape: \p{...}, \P{...}

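Baseline: Widely available

This feature is well established and works across many devices and browser versions. It has been available across major browsers since July 2020.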

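A Unicode character class escape is a kind of character class escape that matches a set of characters specified by a Unicode property. It is only supported in Unicode-aware mode. When the v flag is enabled, it can also be used to match finite-length strings.

Try it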

const sentence = "A ticket to 大阪 costs ¥2000 👌.";

const regexpEmojiPresentation = /\p{Emoji_Presentation}/gu;
console.log(sentence.match(regexpEmojiPresentation));
// Expected output: Array ["👌"]

const regexpNonLatin = /\P{Script_Extensions=Latin}+/gu;
console.log(sentence.match(regexpNonLatin));
// Expected output: Array [" ", " ", " 大阪 ", " ¥2000 👌."]

const regexpCurrencyOrPunctuation = /\p{Sc}|\p{P}/gu;
console.log(sentence.match(regexpCurrencyOrPunctuation));
// Expected output: Array ["¥", "."]

Syntax

\p{loneProperty}
\P{loneProperty}

\p{property=value}
\P{property=value}

Parameters

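loneProperty: A lone Unicode property name or value, following the same syntax as value. It specifies either a value of the General_Category property or the name of a binary property. In v mode, it can also be a binary Unicode property of strings.

Note: ICU syntax also allows the Script property name to be omitted, but JavaScript does not support this, because most of the time Script_Extensions is more useful than Script.
property: A Unicode property name. Must be made of ASCII letters (A–Z, a–z) and underscores (_), and must be one of the non-binary property names.
value: A Unicode property value. Must be made of ASCII letters (A–Z, a–z), underscores (_), and digits (0–9), and must be one of the supported values listed in PropertyValueAliases.txt.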
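
The parameter rules above can be checked by trying to compile a few patterns. This is a minimal sketch; isValidPattern is a hypothetical helper introduced only for illustration, and it relies on the RegExp constructor throwing a SyntaxError for an unsupported property name or value.

```js
// Hypothetical helper: returns true if the pattern compiles with the given flags.
function isValidPattern(source, flags = "u") {
  try {
    new RegExp(source, flags);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error; // anything else is unexpected
  }
}

// Lone General_Category values, short or long form
console.log(isValidPattern("\\p{Lu}")); // true
console.log(isValidPattern("\\p{Uppercase_Letter}")); // true

// Lone binary property name
console.log(isValidPattern("\\p{Alphabetic}")); // true

// property=value form with a non-binary property
console.log(isValidPattern("\\p{gc=Lu}")); // true
console.log(isValidPattern("\\p{Script=Latin}")); // true

// Omitting the Script property name is not supported in JavaScript
console.log(isValidPattern("\\p{Latin}")); // false

// Names and values are matched exactly, including letter case
console.log(isValidPattern("\\p{lu}")); // false
console.log(isValidPattern("\\p{Script=NotAScript}")); // false
```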

Description

\p and \P are only supported in Unicode-aware mode. In Unicode-unaware mode, they are identity escapes for the p or P character.
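
As a small illustration of the difference between the two modes, the sketch below runs the same pattern source with and without the u flag. The variable names are illustrative only.

```js
const withU = /\p{L}/u; // Unicode-aware: \p{L} is a property escape
const withoutU = /\p{L}/; // Unicode-unaware: \p is just the letter "p"

console.log(withU.test("é")); // true, "é" is a letter
console.log(withoutU.test("é")); // false, nothing here matches a letter

// Without the u flag, the pattern matches the literal text "p{L}"
console.log(withoutU.test("p{L}")); // true
```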

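Every Unicode character has a set of properties that describe it. For example, the character a has the General_Category property with value Lowercase_Letter, and the Script property with value Latn. The \p and \P escape sequences let you match a character based on its properties. For example, a can be matched by \p{Lowercase_Letter} (the General_Category property name is optional) as well as by \p{Script=Latn}. \P creates a complement class, which consists of the code points that do not have the specified property.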
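
The claims about the character a can be verified directly. This is a minimal sketch; the anchors ^ and $ are added here only so that each test applies to the whole one-character string.

```js
const lowercaseLetter = /^\p{Lowercase_Letter}$/u;
const latinScript = /^\p{Script=Latn}$/u;
const notLowercaseLetter = /^\P{Lowercase_Letter}$/u;

console.log(lowercaseLetter.test("a")); // true
console.log(latinScript.test("a")); // true

// \P is the complement: it matches exactly what \p does not
console.log(notLowercaseLetter.test("a")); // false
console.log(notLowercaseLetter.test("A")); // true
```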

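When the i flag is set, \P character classes are handled slightly differently in u mode and v mode. In u mode, case folding happens after subtraction; in v mode, case folding happens before subtraction. More concretely, in u mode, \P{property} matches caseFold(allCharacters - charactersWithProperty). This means /\P{Lowercase_Letter}/iu still matches "a", because A is not a Lowercase_Letter. In v mode, \P{property} matches caseFold(allCharacters) - caseFold(charactersWithProperty). This means /\P{Lowercase_Letter}/iv does not match "a", because A is not even in the set of all case-folded Unicode characters. See also complement classes and case-insensitive matching.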

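To combine multiple properties, use the character set intersection syntax enabled by the v flag, or see pattern subtraction and intersection.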

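In v mode, \p can match a sequence of code points, which Unicode defines as "properties of strings". This is most useful for emojis, which are often composed of multiple code points. However, \P can only complement character properties.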

Note: There are plans to port the properties of strings feature to u mode as well.

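Examples

General categories

General categories are used to classify Unicode characters, and subcategories are available to define a more precise classification. Unicode property escapes accept either the short or the long form.

They can be used to match letters, numbers, symbols, punctuation, spaces, and so on. For a more exhaustive list of general categories, see the Unicode specification.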

// finding all the letters of a text
const story = "It's the Cheshire Cat: now I shall have somebody to talk to.";

// Most explicit form
story.match(/\p{General_Category=Letter}/gu);

// It is not mandatory to use the property name for General categories
story.match(/\p{Letter}/gu);

// This is equivalent (short alias):
story.match(/\p{L}/gu);

// This is also equivalent (alternation over all the subcategories using short aliases)
story.match(/\p{Lu}|\p{Ll}|\p{Lt}|\p{Lm}|\p{Lo}/gu);
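
To show the other top-level categories mentioned above (numbers, symbols, punctuation, separators), here is a small sketch that counts the characters of a string per category. countByCategory is a hypothetical helper, not part of any API.

```js
// Hypothetical helper: counts characters per top-level general category.
function countByCategory(text) {
  const categories = {
    Letter: /\p{L}/gu,
    Number: /\p{N}/gu,
    Punctuation: /\p{P}/gu,
    Symbol: /\p{S}/gu,
    Separator: /\p{Z}/gu,
  };
  const counts = {};
  for (const [name, pattern] of Object.entries(categories)) {
    // match() returns null when nothing is found
    counts[name] = (text.match(pattern) ?? []).length;
  }
  return counts;
}

console.log(countByCategory("It's 5 o'clock: €3!"));
// { Letter: 9, Number: 2, Punctuation: 4, Symbol: 1, Separator: 3 }
```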

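Scripts and script extensions

Some languages use different scripts for their writing systems. For instance, English and Spanish are written in the Latin script, while Arabic and Russian are written in other scripts (Arabic and Cyrillic, respectively). The Script and Script_Extensions Unicode properties allow regular expressions to match characters according to the script they are mainly used with (Script) or according to the set of scripts they belong to (Script_Extensions).

For example, A belongs to the Latin script and ε to the Greek script.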

const mixedCharacters = "aεЛ";

// Using the canonical "long" name of the script
mixedCharacters.match(/\p{Script=Latin}/u); // a

// Using a short alias (ISO 15924 code) for the script
mixedCharacters.match(/\p{Script=Grek}/u); // ε

// Using the short name sc for the Script property
mixedCharacters.match(/\p{sc=Cyrillic}/u); // Л

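For more details, see the Unicode specification, the scripts table in the ECMAScript specification, and the ISO 15924 list of script codes.

If a character is used in a limited set of scripts, the Script property only matches the "predominant" script. To match a character based on a "non-predominant" script, use the Script_Extensions property (scx for short).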

// ٢ is the digit 2 in Arabic-Indic notation
// while it is predominantly written within the Arabic script
// it can also be written in the Thaana script

"٢".match(/\p{Script=Thaana}/u);
// null as Thaana is not the predominant script

"٢".match(/\p{Script_Extensions=Thaana}/u);
// ["٢", index: 0, input: "٢", groups: undefined]

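Unicode property escapes vs. character classes

JavaScript regular expressions also let you use character class escapes, notably \w or \d, to match letters or digits. However, these forms only match ASCII characters (\w matches a to z, A to Z, 0 to 9, and the underscore; \d matches 0 to 9). As the example below shows, handling non-Latin text this way can be clumsy.

Unicode property escapes cover many more characters: \p{Letter} or \p{Number} work for any script.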

// Trying to use ranges to avoid \w limitations:

const nonEnglishText = "Приключения Алисы в Стране чудес";
const regexpBMPWord = /([\u0000-\u001F\u0021-\uFFFF])+/gu;
// BMP goes through U+0000 to U+FFFF but space is U+0020, so the range skips it

console.table(nonEnglishText.match(regexpBMPWord));

// Using Unicode property escapes instead
const regexpUPE = /\p{L}+/gu;
console.table(nonEnglishText.match(regexpUPE));
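
For a direct comparison, the sketch below applies \w and \d to non-Latin input next to property-based equivalents. The class [\p{L}\p{N}_] is only a rough Unicode-aware stand-in for \w chosen for this illustration, not an exact definition of a "word character".

```js
const cyrillicText = "Приключения Алисы";

// \w only covers ASCII letters, digits, and the underscore
console.log(cyrillicText.match(/\w+/gu)); // null

// Letters and numbers of any script, plus the underscore
console.log(cyrillicText.match(/[\p{L}\p{N}_]+/gu)); // ["Приключения", "Алисы"]

// "\u0662" is ARABIC-INDIC DIGIT TWO
console.log(/^\d$/u.test("\u0662")); // false
console.log(/^\p{Nd}$/u.test("\u0662")); // true
```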

Matching prices

The following example matches prices in a string:

function getPrices(str) {
  // Sc stands for "currency symbol"
  return [...str.matchAll(/\p{Sc}\s*[\d.,]+/gu)].map((match) => match[0]);
}

const str = `California rolls $6.99
Crunchy rolls $8.49
Shrimp tempura $10.99`;
console.log(getPrices(str)); // ["$6.99", "$8.49", "$10.99"]

const str2 = `US store $19.99
Europe store €18.99
Japan store ¥2000`;
console.log(getPrices(str2)); // ["$19.99", "€18.99", "¥2000"]

Matching strings

With the v flag, \p{...} can use a property of strings to match strings that may be longer than a single character:

const flag = "🇺🇳";
console.log(flag.length); // 4 (two code points, each encoded as a surrogate pair)
console.log(/\p{RGI_Emoji_Flag_Sequence}/v.exec(flag)); // [ '🇺🇳' ]

However, you cannot use \P to match "a string that does not have a property", because it is unclear how many characters should be consumed.

/\P{RGI_Emoji_Flag_Sequence}/v; // SyntaxError: Invalid regular expression: Invalid property name

Specifications

Specification
ECMAScript® 2026 Language Specification # prod-CharacterClassEscape

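See also

Character classes guide
Regular expressions
Character class: [...], [^...]
Character class escape: \d, \D, \w, \W, \s, \S
Character escape: \n, \u{...}
Disjunction: |
Unicode character property on Wikipedia
ES2018: RegExp Unicode property escapes by Dr. Axel Rauschmayer (2017)
Unicode regular expressions § Properties
Unicode Utilities: UnicodeSet
RegExp v flag with set notation and properties of strings on v8.dev (2022)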

This page was last modified on Aug 3, 2025 by MDN contributors.