Regular Expression
1. Syntax
1.1. Character Classes
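- [e-s0-5."]: any one of the characters listed inside the brackets (here e to s, 0 to 5, a dot, or a double quote)
- [^e-s0-5."]: any one character not listed inside the brackets; a leading ^ negates the class
- .: any character except newline
- \w: any word character, [A-Za-z0-9_]
- \d: a digit
- \s: any whitespace character: space, tab, newline, and so on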
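The classes above can be tried out with a short sketch. It assumes Python's standard `re` module; the sample string and the variable names are made up for illustration.

```python
import re

sample = 'he said "7.5"'

# [e-s0-5."] matches one character from the set: e to s, 0 to 5, a dot, or a double quote.
in_class = re.findall(r'[e-s0-5."]', sample)
# A leading ^ negates the set: every character that is NOT listed.
not_in_class = re.findall(r'[^e-s0-5."]', sample)

word_chars = re.findall(r"\w", sample)   # letters, digits, underscore
digits = re.findall(r"\d", sample)       # digits only
spaces = re.findall(r"\s", sample)       # whitespace only

# By default the dot skips the newline character.
dot = re.findall(r".", "a\nb")

print(in_class, not_in_class, word_chars, digits, spaces, dot)
```

Note that the unanchored dot inside a bracket expression is a literal dot, not the "any character" wildcard.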
1.2. Quantifiers
- Greedy: + one or more, * zero or more, ? zero or one
  - The engine first tries the largest possible number of repetitions. If the rest of the pattern does not match, it reduces the number of repetitions by one and tries again.
- Lazy: +? one or more, *? zero or more, ?? zero or one
  - The engine first tries the smallest possible number of repetitions. If the rest of the pattern does not match, it increases the number of repetitions by one and tries again.
- Possessive (Java and Python 3.11+): ++, *+, ?+
  - It disables backtracking: the quantifier matches as much as it can and never gives characters back. If the rest of the pattern then does not match, the match fails.
- Counted repetition: {n} exactly n times, {n,} n times or more, {n,m} between n and m times
  - A trailing ? makes the repetition lazy and a trailing + makes it possessive, for example {n,m}? and {n,m}+.
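The difference between the quantifier modes is easiest to see side by side. The following is a minimal sketch using Python's standard `re` module; the sample strings and variable names are illustrative only. The possessive case is guarded because Python accepts that syntax only from version 3.11.

```python
import re
import sys

html = "<b>one</b><b>two</b>"

# Greedy: .+ grabs as much as possible, then backs off until ">" matches.
greedy = re.search(r"<.+>", html).group()
# Lazy: .+? grabs as little as possible, then grows until ">" matches.
lazy = re.search(r"<.+?>", html).group()

numbers = "1 22 333 4444"
counted = re.findall(r"\d{2,3}", numbers)        # prefers three digits
counted_lazy = re.findall(r"\d{2,3}?", numbers)  # prefers two digits

# Greedy a+ gives one letter back so that the final "a" can match.
backtracking = re.search(r"a+a", "aaa").group()

# Possessive a++ keeps every letter it took, so the final "a" can never match.
if sys.version_info >= (3, 11):
    possessive = re.search(r"a++a", "aaa")
else:
    possessive = None  # syntax not available before Python 3.11

print(greedy, lazy, counted, counted_lazy, backtracking, possessive)
```

Here the greedy pattern spans the whole string, while the lazy one stops at the first closing bracket.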
1.3. Groups
- (X): creates a capturing group
- \1, \2, …: backreferences; \n matches the same text that the n-th group matched
- (?>X): atomic group
  - Once the group has matched, the engine does not backtrack into it.
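A short sketch of the three group constructs, assuming Python's standard `re` module; the sample strings and variable names are invented for the example. Atomic groups are guarded because Python supports them only from version 3.11.

```python
import re
import sys

sentence = "this is is a test"

# (\w+) captures a word; \1 must then match that same text again.
doubled = re.search(r"\b(\w+) \1\b", sentence)
repeated_word = doubled.group(1)

# In a replacement string, \1 inserts whatever group 1 captured.
fixed = re.sub(r"\b(\w+) \1\b", r"\1", sentence)

# Ordinary group: after "foo" fails to be followed by "s",
# the engine backtracks into the group and tries "foot".
plain = re.search(r"(?:foo|foot)s", "foots")

# Atomic group: once "foo" has matched, the alternative "foot" is never tried.
if sys.version_info >= (3, 11):
    atomic = re.search(r"(?>foo|foot)s", "foots")
else:
    atomic = None  # syntax not available before Python 3.11

print(repeated_word, fixed, plain.group(), atomic)
```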
1.4. Lookaround
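- X(?=Y): matches X only if it is followed by Y (positive lookahead)
- X(?!Y): matches X only if it is not followed by Y (negative lookahead)
- (?<=Y)X: matches X only if it is preceded by Y (positive lookbehind)
- (?<!Y)X: matches X only if it is not preceded by Y (negative lookbehind)
- In all four cases Y is only checked; it is not part of the match.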
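The four lookaround forms can be compared on one string. This is a sketch with Python's standard `re` module; the price string and variable names are hypothetical. The `\b` anchors in the negative cases are an addition of this example: without them the engine could satisfy the negative condition by matching only part of a number.

```python
import re

prices = "cost: $30, 40 USD, 50 EUR"

# Positive lookahead: digits that are followed by " USD".
before_usd = re.findall(r"\d+(?= USD)", prices)

# Negative lookahead: whole numbers that are not followed by " USD".
not_before_usd = re.findall(r"\b\d+\b(?! USD)", prices)

# Positive lookbehind: digits that come right after a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", prices)

# Negative lookbehind: whole numbers that do not come right after a dollar sign.
not_after_dollar = re.findall(r"(?<!\$)\b\d+", prices)

print(before_usd, not_before_usd, after_dollar, not_after_dollar)
```

In every result only the digits are returned; the " USD" suffix and the dollar sign are checked but not consumed.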
1.5. Flags
- g: global search
  - Looks for all matches instead of stopping at the first one.
- i: ignore case
- m: multiline
  - ^ and $ match the start and end of each line, instead of only the start and end of the entire string.
- s: single line (dotall)
  - . also matches newline.
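The effect of each flag can be sketched in Python, with the caveat that Python's `re` module spells the flags differently: `re.IGNORECASE`, `re.MULTILINE` and `re.DOTALL` play the roles of i, m and s, and there is no g flag, since `re.findall` returns all matches while `re.search` returns only the first. The sample text and variable names are illustrative.

```python
import re

text = "Foo\nfoo\nbar"

# g: search stops at the first match, findall collects every match.
first_only = re.search(r"o+", text).group()
all_matches = re.findall(r"o+", text)

# i: letters match regardless of case.
ignore_case = re.findall(r"foo", text, re.IGNORECASE)

# m: without it, ^ and $ anchor to the whole string; with it, to each line.
whole_string = re.findall(r"^\w+$", text)
per_line = re.findall(r"^\w+$", text, re.MULTILINE)

# s: without it, the dot refuses to match the newline between the lines.
dot_default = re.search(r"Foo.foo", text)
dot_all = re.search(r"Foo.foo", text, re.DOTALL).group()

print(first_only, all_matches, ignore_case, whole_string, per_line, dot_default, dot_all)
```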
2. Flavors
2.1. POSIX
2.1.1. BRE
Basic Regular Expression
2.1.2. ERE
Extended Regular Expression
2.2. PCRE
Perl Compatible Regular Expressions
3. Implementation
A regular expression is typically compiled into a finite-state machine (an NFA or a DFA), which is then run over the input.
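To make the state-machine idea concrete, here is a hand-built DFA for the pattern ab*c. It is only a sketch of the principle, not how any particular engine is written: the state names, the `TRANSITIONS` table and the `dfa_fullmatch` helper are all invented for this example, and real engines build such tables automatically or use backtracking instead.

```python
import re

# Transition table of a DFA for ab*c: (current state, input character) -> next state.
TRANSITIONS = {
    ("start", "a"): "seen_a",
    ("seen_a", "b"): "seen_a",   # loop on b: zero or more repetitions
    ("seen_a", "c"): "accept",
}
ACCEPTING = {"accept"}


def dfa_fullmatch(text: str) -> bool:
    """Return True if the whole text is accepted by the DFA for ab*c."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in ACCEPTING


samples = ["ac", "abbbc", "abca", "bc", ""]
results = {s: dfa_fullmatch(s) for s in samples}

# Cross-check against the re module on the same inputs.
agrees = all(results[s] == bool(re.fullmatch(r"ab*c", s)) for s in samples)
print(results, agrees)
```

The machine reads each character exactly once and never backtracks, which is why this style of matching runs in time linear in the input length.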