Regular Expression

1. Syntax

1.1. Character Classes

  • [e-s0-5."] one of the characters inside the brackets. [^e-s0-5."] any one character not inside the brackets.
  • . any character except newline.
  • \w any word character [A-Za-z0-9_], \d any digit, \s any whitespace: space, tab, newline.
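
The classes above can be tried out with Python's `re` module. This is a small sketch; the sample strings are made up for illustration.

```python
import re

sample = "a1z9."
text = 'User_42 said "hi".\tBye'

# [e-s0-5."] matches one character from the set: e..s, 0..5, '.', or '"'.
in_set = re.findall(r'[e-s0-5."]', sample)
# A leading ^ negates the set: any one character NOT listed.
not_in_set = re.findall(r'[^e-s0-5."]', sample)

# . matches any character except newline, so the "\n" is skipped.
dots = re.findall(r".", "a\nb")

words = re.findall(r"\w+", text)   # runs of word characters
digits = re.findall(r"\d+", text)  # runs of digits
spaces = re.findall(r"\s", text)   # single whitespace characters

print(in_set, not_in_set, dots, words, digits, spaces)
```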

1.2. Quantifiers

  • Greedy: + one or more, * zero or more, ? zero or one
    • It starts with the largest possible number of repetitions, and if the rest of the pattern does not match, the number of repetitions is decreased by one.
  • Lazy: +? one or more, *? zero or more, ?? zero or one
    • It starts with the smallest possible number of repetitions, and if the rest of the pattern does not match, the number of repetitions is increased by one.
  • Possessive (Java and Python 3.11+): ++, *+, ?+
    • It disables backtracking: it matches as many repetitions as it can and never gives any back. If the rest of the pattern then does not match, the match fails.
  • {n} n times, {n,} n times or more, {n,m} between n and m times
    • Append ? for a lazy search, or + for a possessive search.
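
The difference between the quantifier modes is easiest to see on the same input. The sketch below uses Python's `re` module with made-up sample strings; the possessive case only runs on Python 3.11 or later, so it is guarded by a version check.

```python
import re
import sys

html = "<b>one</b><b>two</b>"

# Greedy: .+ grabs as much as possible, then backs off until </b> matches.
greedy = re.findall(r"<b>.+</b>", html)
# Lazy: .+? grabs as little as possible, then grows until </b> matches.
lazy = re.findall(r"<b>.+?</b>", html)

# {n,m} is greedy by default; appending ? makes it lazy.
bounded = re.findall(r"\d{2,3}", "1 22 333 4444")
bounded_lazy = re.findall(r"\d{2,3}?", "1 22 333 4444")

# Greedy a+ gives one 'a' back so the final 'a' can match.
backs_off = re.search(r"a+a", "aaa")

# Possessive a++ keeps every 'a', so the final 'a' can never match.
possessive_supported = sys.version_info >= (3, 11)
possessive = re.search(r"a++a", "aaa") if possessive_supported else None

print(greedy, lazy, bounded, bounded_lazy, backs_off, possessive)
```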

1.3. Groups

  • (X) create group.
  • \1, \2, … backreferences, match the content of the nth group.
  • (?>X) atomic grouping
    • Backtracking is disabled within the group.
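
A short sketch of groups and backreferences in Python's `re` module, with sample strings chosen only for illustration. Atomic groups need Python 3.11 or later, so that part is guarded by a version check.

```python
import re
import sys

# (X) creates numbered groups that can be read back from the match.
m = re.search(r"(\d{4})-(\d{2})", "Created: 2025-05-06")
year, month = m.group(1), m.group(2)

# \1 must match the same text that group 1 captured: a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine")

# (?>a+) is atomic: once a+ has taken all the a's it cannot give one back,
# so the trailing 'a' has nothing left to match.
atomic_supported = sys.version_info >= (3, 11)
atomic = re.search(r"(?>a+)a", "aaa") if atomic_supported else None

print(year, month, doubled.group(), atomic)
```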

1.4. Lookaround

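  • X(?=Y) positive lookahead: match X only if it is followed by Y
  • X(?!Y) negative lookahead: match X only if it is not followed by Y
  • (?<=Y)X positive lookbehind: match X only if it is preceded by Y
  • (?<!Y)X negative lookbehind: match X only if it is not preceded by Y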
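
The four lookarounds can be sketched with Python's `re` module as follows. The sample strings are invented for illustration. In every case Y is only checked, never included in the match. The negative forms need some care: the extra `\d` and `\b` below are there so the engine cannot satisfy the assertion by matching just part of a number.

```python
import re

weights = "10kg 20cm 30kg"
prices = "$30 and 40 and $5"

# X(?=Y): digits followed by "kg"; "kg" itself is not part of the match.
followed = re.findall(r"\d+(?=kg)", weights)

# X(?!Y): digits not followed by "kg". The \d alternative stops the
# engine from backing off to "1" (which is followed by "0", not "kg").
not_followed = re.findall(r"\d+(?!kg|\d)", weights)

# (?<=Y)X: digits preceded by "$".
preceded = re.findall(r"(?<=\$)\d+", prices)

# (?<!Y)X: digits not preceded by "$". The \b prevents a match that
# starts in the middle of "30" (where the previous character is "3").
not_preceded = re.findall(r"(?<!\$)\b\d+", prices)

print(followed, not_followed, preceded, not_preceded)
```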

1.5. Flags

  • g global search
    • Look for multiple matches
  • i ignore case
  • m multiline
    • ^ and $ match the start and end of each line, instead of the start and end of the entire string.
  • s single line (dotall)
    • . matches newline
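
As a rough illustration, here is how these flags look in Python's `re` module. Python has no g flag: `re.search` returns the first match, while `re.findall` plays the role of a global search. The other flags map to `re.I`, `re.M`, and `re.S`. The sample text is made up.

```python
import re

text = "Alpha\nbeta\nALPHA"

# i: ignore case. search() stops at the first match; findall() is "global".
first = re.search(r"alpha", text, re.I).group()
every = re.findall(r"alpha", text, re.I)

# m: without it, ^ and $ anchor to the whole string only.
whole_string = re.findall(r"^\w+$", text)
per_line = re.findall(r"^\w+$", text, re.M)

# s: without it, . refuses to match the newline between the lines.
no_dotall = re.search(r"Alpha.beta", text)
dotall = re.search(r"Alpha.beta", text, re.S)

print(first, every, whole_string, per_line, no_dotall, dotall)
```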

2. Flavors

2.1. POSIX

2.1.1. BRE

Basic Regular Expression

2.1.2. ERE

Extended Regular Expression

2.2. PCRE

Perl Compatible Regular Expression

3. Implementation

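A regular expression is typically implemented as a finite state machine: the pattern is compiled into an automaton (an NFA or a DFA), which then consumes the input one character at a time.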
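
To make the state machine idea concrete, here is a minimal sketch of a hand-built deterministic automaton for the single pattern ab*c. The state names, the `TRANSITIONS` table, and the `dfa_fullmatch` function are all hypothetical names chosen for this illustration; this is not how any particular engine is implemented, and real engines build the automaton from the pattern automatically.

```python
import re

# One entry per (current state, input character) pair for the pattern ab*c.
TRANSITIONS = {
    ("start", "a"): "seen_a",   # the leading a
    ("seen_a", "b"): "seen_a",  # b* loops on the same state
    ("seen_a", "c"): "done",    # the closing c
}
ACCEPTING = {"done"}


def dfa_fullmatch(text: str) -> bool:
    """Return True if the whole of text is accepted by the automaton."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in ACCEPTING


# Compare against the re module on a few inputs.
for candidate in ["ac", "abbbc", "abca", "ab", ""]:
    expected = re.fullmatch(r"ab*c", candidate) is not None
    print(repr(candidate), dfa_fullmatch(candidate), expected)
```

Each input character causes exactly one table lookup, so the running time of this sketch grows linearly with the length of the input.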

4. External Links

Created: 2025-05-06 Tue 23:25