The following definitions use the .NET implementation.

Quick Reference

(?i) IgnoreCase; (?n) ExplicitCapture; (?x) IgnorePatternWhitespace; (?m) Multiline; (?s) Singleline;
^ Start of line; $ End of line; .* ≥0 character(s); .+ ≥1 character(s);
\w* Any word; \S* Any non-whitespace; \s* Any whitespace;
(.*) Unnamed group; (?<user>.*) Named group called user

Miscellaneous Constructs

Construct: (?imnsx-imnsx)
Definition: Sets or disables options such as case insensitivity in the middle of a pattern. For more information, see Regular Expression Options.
Example: \bA(?i)b\w+\b matches "ABA", "Able" in "ABA Able Act"

Construct: (?# comment )
Definition: Inline comment. The comment ends at the first closing parenthesis.
Example: \bA(?#Matches words starting with A)\w+\b

Construct: # to end of line
Definition: X-mode comment. The comment starts at an unescaped # and continues to the end of the line.
Example: (?x)\bA\w+\b#Matches words starting with A

Engine Interpretation Options

Start the pattern with (?enabled options-disabled options), e.g. (?imnsx-imnsx)

Inline character  RegexOptions member      Effect
Not available     None                     Use default behavior. For more information, see Default Options.
i                 IgnoreCase               Use case-insensitive matching. For more information, see Case-Insensitive Matching.
m                 Multiline                Use multiline mode, where ^ and $ match the beginning and end of each line (instead of the beginning and end of the input string). For more information, see Multiline Mode.
s                 Singleline               Use single-line mode, where the period (.) matches every character (instead of every character except \n). For more information, see Single-line Mode.
n                 ExplicitCapture          Do not capture unnamed groups. The only valid captures are explicitly named or numbered groups of the form (?<name>subexpression). For more information, see Explicit Captures Only.
Not available     Compiled                 Compile the regular expression to an assembly. For more information, see Compiled Regular Expressions.
x                 IgnorePatternWhitespace  Exclude unescaped white space from the pattern, and enable comments after a number sign (#). For more information, see Ignore White Space.
Not available     RightToLeft              Change the search direction. Search moves from right to left instead of from left to right. For more information, see Right-to-Left Mode.
Not available     ECMAScript               Enable ECMAScript-compliant behavior for the expression. For more information, see ECMAScript Matching Behavior.
Not available     CultureInvariant         Ignore cultural differences in language. For more information, see Comparison Using the Invariant Culture.
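
The two ways of setting an option can be compared side by side. The following is a minimal sketch, assuming a C# 9 (or later) console project that allows top-level statements; the variable names are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "ABA Able Act";

// Inline construct: (?i) turns on case-insensitivity from that point onward,
// so the leading A is still case-sensitive.
MatchCollection inlineMatches = Regex.Matches(input, @"\bA(?i)b\w+\b");

// RegexOptions member: IgnoreCase applies to the whole pattern.
MatchCollection optionMatches = Regex.Matches(input, @"\bab\w+\b", RegexOptions.IgnoreCase);

foreach (Match m in inlineMatches) Console.WriteLine(m.Value);   // ABA, then Able
foreach (Match m in optionMatches) Console.WriteLine(m.Value);   // ABA, then Able
```

Options marked "Not available" in the table, such as Compiled or RightToLeft, can only be passed through the RegexOptions argument.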

Single Characters

Use       To match any character
[set]     In that set
[^set]    Not in that set
\p{Ll}    Lowercase letter
\p{Lu}    Uppercase letter
[a-m]     In the a-m range
[^0-5]    Not in the 0-5 range
.         Any except \n (new line)
\char     Escaped special character

Control Characters

Use      To match                     Unicode
\t       Horizontal tab               \u0009
\v       Vertical tab                 \u000B
\b       Backspace (inside [ ] only)  \u0008
\e       Escape                       \u001B
\r       Carriage return              \u000D
\f       Form feed                    \u000C
\n       New line                     \u000A
\a       Bell (alarm)                 \u0007
\c char  ASCII control character      -

Non-ASCII Codes

Use      To match character with
\octal   2-3 digit octal character code
\x hex   2-digit hex character code
\u hex   4-digit hex character code

Character Classes

Use         To match character
\p{ctgry}   In that Unicode category or block
\P{ctgry}   Not in that Unicode category or block
\w          Word character
\W          Non-word character
\d          Decimal digit
\D          Not a decimal digit
\s          White-space character
\S          Non-white-space character

Quantifiers

Greedy (as many as possible)  Lazy (as few as possible)  Matches
*                             *?                         0 or more times
+                             +?                         1 or more times
?                             ??                         0 or 1 time
{n}                           {n}?                       Exactly n times
{n,}                          {n,}?                      At least n times
{n,m}                         {n,m}?                     From n to m times
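
The difference between the greedy and lazy columns is easiest to see on the same input. The following is a minimal sketch, assuming C# 9 top-level statements; the sample string and variable names are invented for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string html = "<b>bold</b> and <i>italic</i>";

// Greedy: .+ grabs as much as possible, so the match runs to the last '>'.
string greedy = Regex.Match(html, "<.+>").Value;

// Lazy: .+? grabs as little as possible, so the match stops at the first '>'.
string lazy = Regex.Match(html, "<.+?>").Value;

Console.WriteLine(greedy);   // <b>bold</b> and <i>italic</i>
Console.WriteLine(lazy);     // <b>
```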

Anchors

Use   To specify position
^     At start of string or line
\A    At start of string
\z    At end of string
\Z    At end (or before \n at end) of string
$     At end (or before \n at end) of string or line
\G    Where previous match ended
\b    On word boundary
\B    Not on word boundary
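
The "string or line" wording depends on the Multiline option. The following is a minimal sketch of how the anchors behave, assuming C# 9 top-level statements; all names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string lines = "one\ntwo\nthree";

// Without Multiline, ^ only matches at the start of the string.
int defaultStarts = Regex.Matches(lines, @"^\w+").Count;
// With Multiline, ^ also matches right after every \n.
int multilineStarts = Regex.Matches(lines, @"^\w+", RegexOptions.Multiline).Count;
// \A ignores Multiline and always means the start of the string.
int absoluteStarts = Regex.Matches(lines, @"\A\w+", RegexOptions.Multiline).Count;

// \Z tolerates one trailing \n, \z does not.
bool upperZ = Regex.IsMatch("abc\n", @"abc\Z");
bool lowerZ = Regex.IsMatch("abc\n", @"abc\z");

Console.WriteLine($"{defaultStarts} {multilineStarts} {absoluteStarts}");   // 1 3 1
Console.WriteLine($"{upperZ} {lowerZ}");                                    // True False
```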

Groups

Use                  To define
(exp)                Indexed group
(?<name>exp)         Named group
(?<name1-name2>exp)  Balancing group
(?:exp)              Noncapturing group
(?=exp)              Zero-width positive lookahead: the following characters must match exp
(?!exp)              Zero-width negative lookahead: the following characters cannot match exp
(?<=exp)             Zero-width positive lookbehind: the previous characters must match exp
(?<!exp)             Zero-width negative lookbehind: the previous characters cannot match exp
(?>exp)              Non-backtracking (greedy) group

Example:

The price of SCHLÜMPFE ice cream is 20 €.
  • capture an indexed group accessible via $1, $2, $3 etc.

    (exp)
    

    e.g. (\p{Lu}{2,}) captures the uppercase word SCHLÜMPFE in $1

  • capture a named group accessible via ${name}

    (?<name>exp)

    e.g. (?<price>\d+) captures the number 20 in ${price}

  • the following characters must match subpattern exp using
    Zero-width positive lookahead

    (?=exp)
    
  • the following characters cannot match subpattern exp using
    Zero-width negative lookahead

    (?!exp)
    
  • the previous characters must match subpattern exp using
    Zero-width positive lookbehind

    (?<=exp)
    
  • the previous characters cannot match subpattern exp using
    Zero-width negative lookbehind

    (?<!exp)
    
  • match exp without backtracking into it afterwards, using a Non-backtracking (greedy) group

    (?>exp)
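
The group constructs above can be tried against the sample sentence. The following is a minimal sketch, assuming C# 9 top-level statements; the lookaround and atomic-group patterns are invented for this illustration and are not part of the original examples.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "The price of SCHLÜMPFE ice cream is 20 €.";

// Indexed group: read back through Groups[1].
string upperWord = Regex.Match(input, @"(\p{Lu}{2,})").Groups[1].Value;

// Named group: read back through Groups["price"].
string price = Regex.Match(input, @"(?<price>\d+)").Groups["price"].Value;

// Positive lookahead: digits only when " €" follows (the " €" is not consumed).
string beforeEuro = Regex.Match(input, @"\d+(?= €)").Value;

// Positive lookbehind: a word only when the whole word "ice " precedes it.
// The \b keeps the tail of "price " from satisfying the lookbehind.
string afterIce = Regex.Match(input, @"(?<=\bice )\w+").Value;

// Non-backtracking group: (?>a+) keeps every 'a' it took, so the trailing "ab" can never match.
bool atomic = Regex.IsMatch("aaab", @"(?>a+)ab");
bool normal = Regex.IsMatch("aaab", @"a+ab");

Console.WriteLine($"{upperWord} {price} {beforeEuro} {afterIce}");   // SCHLÜMPFE 20 20 cream
Console.WriteLine($"{atomic} {normal}");                             // False True
```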
    

.NET regex flavor only

  • capture multiple expressions into the same repeated group

    (?<repeated>exp1).*(?<repeated>exp2)
    

    e.g. (?<words>\w+) is (?<words>\d+) captures cream and 20 in the words group (${words} yields the last capture, 20; both are kept in the group's Captures collection)

  • pop last capture from repeated group if expression matches
    Only matches if last capture exists.

    (?<repeated>exp).*(?<-repeated>exp)
    

    e.g. (?<words>\w+\s)+(?<-words>cream) leaves every word up to and including the 2nd-last word before cream (SCHLÜMPFE) in the words group; the last capture (ice) is popped

  • pop the last capture from the repeated group and capture the text between that capture and the current match into name

    (?<repeated>exp).*(?<name-repeated>exp)
    
  • capture nested content in delimiters using a balancing group
    Fails if the delimiters are unbalanced.

    • [^<>] matches any character that is not a delimiter
    • (?<Open>[<]) opening delimiter < pushes onto the Open group
    • (?<Content-Open>[>]) closing delimiter > pops the Open group and captures the content in between into Content
    • (?(Open)(?!)) fails if an opening delimiter is left over

    ^(?:[^<>]|(?<Open>[<])|(?<Content-Open>[>]))*(?(Open)(?!))$
    

    e.g. text <<a>-<b>>+<c> is balanced and captures a, b, <a>-<b> and c
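
The repeated-group and balancing-group patterns from this list can be checked with a short program. The following is a minimal sketch, assuming C# 9 top-level statements; it reuses the patterns given above, while the variable names and the unbalanced test input <<a> are made up for the illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string sentence = "The price of SCHLÜMPFE ice cream is 20 €.";

// Same group name twice: both captures are kept, and Value is the last one.
Group words = Regex.Match(sentence, @"(?<words>\w+) is (?<words>\d+)").Groups["words"];
string[] repeated = words.Captures.Cast<Capture>().Select(c => c.Value).ToArray();

// (?<-words>cream) pops the most recent capture ("ice ") once "cream" matches.
Group kept = Regex.Match(sentence, @"(?<words>\w+\s)+(?<-words>cream)").Groups["words"];
string[] afterPop = kept.Captures.Cast<Capture>().Select(c => c.Value).ToArray();

// Balanced angle brackets: each '>' pops one '<' and stores the text in between.
string balanced = @"^(?:[^<>]|(?<Open>[<])|(?<Content-Open>[>]))*(?(Open)(?!))$";
string[] contents = Regex.Match("<<a>-<b>>+<c>", balanced)
    .Groups["Content"].Captures.Cast<Capture>().Select(c => c.Value).ToArray();

// One '<' is left over here, so (?(Open)(?!)) forces the match to fail.
bool unbalancedMatches = Regex.IsMatch("<<a>", balanced);

Console.WriteLine(string.Join(" | ", repeated));   // cream | 20
Console.WriteLine(afterPop.Length);                // 4
Console.WriteLine(string.Join(" | ", contents));   // a | b | <a>-<b> | c
Console.WriteLine(unbalancedMatches);              // False
```

Cast<Capture>() is used so the sketch also compiles on older frameworks where CaptureCollection only implements the non-generic IEnumerable.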

Inline Options

Option  Effect on match
i       Case-insensitive
m       Multiline mode
n       Explicit capture (named groups only)
s       Single-line mode
x       Ignore white space

Use                 To
(?imnsx-imnsx)      Set or disable the specified options
(?imnsx-imnsx:exp)  Set or disable the specified options within the expression

Backreferences

Use       To match
\n        Indexed group (n is the group number, e.g. \1)
\k<name>  Named group
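
A common use of backreferences is finding a doubled word. The following is a minimal sketch, assuming C# 9 top-level statements; the sample text and the group name w are invented for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "this is is a test";

// \1 must match exactly the text that group 1 captured.
string byNumber = Regex.Match(text, @"\b(\w+)\s+\1\b").Value;

// \k<w> does the same for the named group w.
string byName = Regex.Match(text, @"\b(?<w>\w+)\s+\k<w>\b").Value;

Console.WriteLine(byNumber);   // is is
Console.WriteLine(byName);     // is is
```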

Alternation

Use              To match
a|b              Either a or b
(?(exp)yes|no)   yes if exp is matched, no if exp isn't matched
(?(name)yes|no)  yes if name is matched, no if name isn't matched
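
The conditional form can, for instance, require a closing quote only when an opening quote was captured. The following is a minimal sketch, assuming C# 9 top-level statements; the pattern and the group name q are made up for this illustration, and the no branch is simply left out.

```csharp
using System;
using System.Text.RegularExpressions;

// In a verbatim string, "" stands for a single double-quote character.
// (?(q)"") demands a closing quote only if group q captured an opening one.
string pattern = @"^(?<q>"")?\w+(?(q)"")$";

bool quoted = Regex.IsMatch("\"abc\"", pattern);    // both quotes present
bool bare = Regex.IsMatch("abc", pattern);          // no quotes at all
bool unclosed = Regex.IsMatch("\"abc", pattern);    // opening quote without a closing one

// Plain alternation: either branch may match.
bool either = Regex.IsMatch("dog", @"^(cat|dog)$");

Console.WriteLine($"{quoted} {bare} {unclosed} {either}");   // True True False True
```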

Substitution

Use      To substitute
$n       Substring matched by group number n
${name}  Substring matched by group name
$$       Literal $ character
$&       Copy of whole match
$`       Text before the match
$'       Text after the match
$+       Last captured group
$_       Entire input string
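
These substitutions are used in the replacement argument of Regex.Replace. The following is a minimal sketch, assuming C# 9 top-level statements; the input string and the group names amount and cur are invented for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "20 EUR";

// $n: swap the two numbered groups.
string byNumber = Regex.Replace(input, @"(\d+) (\w+)", "$2 $1");

// ${name}: the same swap with named groups.
string byName = Regex.Replace(input, @"(?<amount>\d+) (?<cur>\w+)", "${cur} ${amount}");

// $&: wrap the whole match in brackets.
string wrapped = Regex.Replace(input, @"\d+", "[$&]");

// $$: insert a literal dollar sign.
string dollar = Regex.Replace(input, "EUR", "$$");

Console.WriteLine(byNumber);   // EUR 20
Console.WriteLine(byName);     // EUR 20
Console.WriteLine(wrapped);    // [20] EUR
Console.WriteLine(dollar);     // 20 $
```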

Comments

Use           To
(?# comment)  Add inline comment
#             Add x-mode comment

Supported Unicode Categories

Category  Description
Lu        Letter, uppercase
Ll        Letter, lowercase
Lt        Letter, title case
Lm        Letter, modifier
Lo        Letter, other
L         Letter, all
Mn        Mark, nonspacing combining
Mc        Mark, spacing combining
Me        Mark, enclosing combining
M         Mark, all diacritic
Nd        Number, decimal digit
Nl        Number, letterlike
No        Number, other
N         Number, all
Pc        Punctuation, connector
Pd        Punctuation, dash
Ps        Punctuation, opening mark
Pe        Punctuation, closing mark
Pi        Punctuation, initial quote mark
Pf        Punctuation, final quote mark
Po        Punctuation, other
P         Punctuation, all
Sm        Symbol, math
Sc        Symbol, currency
Sk        Symbol, modifier
So        Symbol, other
S         Symbol, all
Zs        Separator, space
Zl        Separator, line
Zp        Separator, paragraph
Z         Separator, all
Cc        Control code
Cf        Format control character
Cs        Surrogate code point
Co        Private-use character
Cn        Unassigned
C         Control characters, all
