The following definitions use the .Net Implementation
Quick Reference
(?i) IgnoreCase; (?n) ExplicitCapture; (?x) IgnorePatternWhitespace; (?m) Multiline; (?s) Singleline;
^ Start of line; $ End of line; .* ≥0 character(s); .+ ≥0 character(s);
\w* Any word; \S* Any non-whitespace; \s* Any whitespace;
(.*) Unnamed group; (?<user>.*) Named group called user
Miscellaneous Constructs
| Construct | Definition | Example |
|---|---|---|
(?imnsx-imnsx) | Sets or disables options such as case insensitivity in the middle of a pattern. For more information, see Regular Expression Options. | \bA(?i)b\w+\b matches "ABA", "Able" in "ABA Able Act" |
(?# comment ) | Inline comment. The comment ends at the first closing parenthesis. | \bA(?#Matches words starting with A)\w+\b |
# to end of line | X-mode comment. The comment starts at an unescaped # and continues to the end of the line. | (?x)\bA\w+\b#Matches words starting with A |
Engine Interpretation Options
Start pattern with (?enabled options-disabled options) e.g: (?imnsx-imnsx)
| Inline character | RegexOptions member | Effect |
|---|---|---|
| Not available | None | Use default behavior. For more information, see Default Options. |
i | IgnoreCase | Use case-insensitive matching. For more information, see Case-Insensitive Matching. |
m | Multiline | Use multiline mode, where ^ and $ match the beginning and end of each line (instead of the beginning and end of the input string). For more information, see Multiline Mode. |
s | Singleline | Use single-line mode, where the period (.) matches every character (instead of every character except \n). For more information, see Single-line Mode. |
n | ExplicitCapture | Do not capture unnamed groups. The only valid captures are explicitly named or numbered groups of the form (?<name> subexpression). For more information, see Explicit Captures Only. |
| Not available | Compiled | Compile the regular expression to an assembly. For more information, see Compiled Regular Expressions. |
x | IgnorePatternWhitespace | Exclude unescaped white space from the pattern, and enable comments after a number sign (#). For more information, see Ignore White Space. |
| Not available | RightToLeft | Change the search direction. Search moves from right to left instead of from left to right. For more information, see Right-to-Left Mode. |
| Not available | ECMAScript | Enable ECMAScript-compliant behavior for the expression. For more information, see ECMAScript Matching Behavior. |
| Not available | CultureInvariant | Ignore cultural differences in language. For more information, see Comparison Using the Invariant Culture. |
Single Characters
| Use | To match any character |
|---|---|
[set] | In that set |
[^set] | Not in that set |
\p{Ll} | Letter in lowercase |
\p{Lu} | Letter in uppercase |
[a–m] | In the a-m range |
[^0–5] | Not in the 0-5 range |
. | Any except \n (new line) |
\char | Escaped special character |
Control Characters
| Use | To match | Unicode |
|---|---|---|
\t | Horizontal tab | \u0009 |
\v | Vertical tab | \u000B |
\b | Backspace | \u0008 |
\e | Escape | \u001B |
\r | Carriage return | \u000D |
\f | Form feed | \u000C |
\n | New line | \u000A |
\a | Bell (alarm) | \u0007 |
\c char | ASCII control character | - |
Non-ASCII Codes
| Use | To match character with |
|---|---|
\octal | 2-3 digit octal character code |
\x hex | 2-digit hex character code |
\u hex | 4-digit hex character code |
Character Classes
| Use | To match character |
|---|---|
\p{ctgry} | In that Unicode Category or block |
\P{ctgry} | Not in that Unicode Category or block |
\w | Word character |
\W | Non-word character |
\d | Decimal digit |
\D | Not a decimal digit |
\s | White-space character |
\S | Non-white-space char |
Quantifiers
| Greedy as many as possible | Lazy as few as possible | Matches |
|---|---|---|
* | *? | 0 or more times |
+ | +? | 1 or more times |
? | ?? | 0 or 1 time |
{n} | {n}? | Exactly n times |
{n,} | {n,}? | At least n times |
{n,m} | {n,m}? | From n to m times |
Anchors
| Use | To specify position |
|---|---|
^ | At start of string or line |
\A | At start of string |
\z | At end of string |
\Z | At end (or before \n at end) of string |
$ | At end (or before \n at end) of string or line |
\G | Where previous match ended |
\b | On word boundary |
\B | Not on word boundary |
Groups
| Use | To define |
|---|---|
(exp) | Indexed group |
(?<name>exp) | Named group |
(?<name1-name2>exp) | Balancing group |
(?:exp) | Noncapturing group |
(?=exp) | Zero-width positive lookahead the following character must match exp |
(?!exp) | Zero-width negative lookahead the following character cannot match exp |
(?<=exp) | Zero-width positive lookbehind the previous characters must match exp |
(?<!exp) | Zero-width negative lookbehind the previous characters cannot match exp |
(?>exp) | Non-backtracking (greedy) |
Example:
The price of SCHLÜMPFE ice cream is 20 €.
-
capture an indexed group accessible via
$1,$2,$3etc.(exp)e.g.
(\p{Lu}{2,})captures the uppercase wordSCHLÜMPFEin$1 -
capture a named group accessible via
$name(?<name>exp)e.g.
(?<price>\d+)captures the number20in$price -
the following characters must match subpattern exp using
Zero-width positive lookahead(?=exp) -
the following characters cannot match subpattern exp using
Zero-width negative lookahead(?!exp) -
the previous characters must match subpattern exp using
Zero-width positive lookbehind(?<=exp) -
the previous characters must match subpattern exp using
Zero-width negative lookbehind(?<!exp) -
Non-backtracking (greedy)
(?>exp)
.Net regex flavor exclusive
-
capture multiple expressions into the same repeated group
(?<repeated>exp1).*(?<repeated>exp2)e.g.
(?<words>\w+) is (?<words>\d+)capturescreamand20in$words -
pop last capture from repeated group if expression matches
Only matches if last capture exists.(?<repeated>exp).*(?<-repeated>exp)e.g.
(?<words>\w+\s)+(?<-words>cream)captures all words until 2nd-last word before cream in$words -
capture expression and pop repeated group
(?<repeated>exp).*(?<name-repeated>exp) -
capture nested content in delimiters using balanced group
Fails if delimiters are misbalanced.[^<>]ignores no delimiters(?<Open>[<])opening delimiter<pushes onto balanced group(?<Content-Open>[>])closing delimiter>pops balanced group are captures the content in between(?(Open)(?!))if opening delimiter left over, fail
^(?:[^<>]|(?<Open>[<])|(?<Content-Open>[>]))*(?(Open)(?!))$e.g. text
<<a>-<b>>+<c>is balanced and capturesa,b,<a>-<b>andc
Inline Options
| Option | Effect on match |
|---|---|
i | Case-insensitive |
m | Multiline mode |
n | Explicit (named) |
s | Single-line mode |
x | Ignore white space |
| Use | To |
|---|---|
(?imnsx-imnsx) | Set or disable the specified options |
(?imnsx-imnsx:exp) | Set or disable the specified options within the expression |
Backreferences
| Use | To match |
|---|---|
\n | Indexed group |
\k<name> | Named group |
Alternation
| Use | To match |
|---|---|
a|b | Either a or b |
(?(exp)yes|no) | yes if exp is matched no if exp isn’t matched |
(?(name)yes|no) | yes if name is matched no if name isn’t matched |
Substitution
| Use | To substitute |
|---|---|
$n | Substring matched by group number n |
${name} | Substring matched by group name |
$$ | Literal $ character |
$& | Copy of whole match |
$` | Text before the match |
$' | Text after the match |
$+ | Last captured group |
$_ | Entire input string |
Comments
| Use | To |
|---|---|
(?# comment) | Add inline comment |
# | Add x-mode comment |
Supported Unicode Categories
| Category | Description |
|---|---|
Lu | Letter, uppercase |
LI | Letter, lowercase |
Lt | Letter, title case |
Lm | Letter, modifier |
Lo | Letter, other |
L | Letter, all |
Mn | Mark, nonspacing combining |
Mc | Mark, spacing combining |
Me | Mark, enclosing combining |
M | Mark, all diacritic |
Nd | Number, decimal digit |
Nl | Number, letterlike |
No | Number, other |
N | Number, all |
Pc | Punctuation, connector |
Pd | Punctuation, dash |
Ps | Punctuation, opening mark |
Pe | Punctuation, closing mark |
Pi | Punctuation, initial quote mark |
Pf | Puntuation, final quote mark |
Po | Punctuation, other |
P | Punctuation, all |
Sm | Symbol, math |
Sc | Symbol, currency |
Sk | Symbol, modifier |
So | Symbol, other |
S | Symbol, all |
Zs | Separator, space |
Zl | Separator, line |
Zp | Separator, paragraph |
Z | Separator, all |
Cc | Control code |
Cf | Format control character |
Cs | Surrogate code point |
Co | Private-use character |
Cn | Unassigned |
C | Control characters, all |
Sources:
- 2022-01-26: Quick Reference (MS Docs)
- 2022-04-04: Regular Expression Language - Quick Reference - Microsoft Docs
- 2022-01-26: Visual Studio (MS Docs)
- 2022-01-26: .NET Framework Regular Expressions
- 2023-01-10: Regex Tutorial - Matching Nested Constructs with Balancing Groups
- 2023-01-10: c# - What are regular expression Balancing Groups? - Stack Overflow
Related: RegEx implementation in .Net
Tags: Computer Language Document conversion