Regular Expression
1. Syntax
1.1. Character Classes
- [e-s0-5."] one of the characters inside the brackets; with a leading ^, as in [^e-s0-5."], any one character not listed
- . any character except newline
- \w any character in a word, [A-Za-z0-9_]
- \d digits
- \s any whitespace: space, tab, newline, etc.
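The classes above can be tried out with Python's re module. This is a minimal sketch; the sample strings are made up for illustration. Note that in Python 3, \w, \d and \s also match non-ASCII characters on str patterns unless re.ASCII is passed.

```python
import re

# Sample inputs are hypothetical, chosen only to exercise each class.
sample = "a.e9"
text = 'user_42 said: "go."'

# One character from the set: e..s, 0..5, '.', or '"'
in_set = re.findall(r'[e-s0-5."]', sample)
# A leading ^ negates the set
not_in_set = re.findall(r'[^e-s0-5."]', sample)

words = re.findall(r"\w+", text)        # runs of word characters
digits = re.findall(r"\d", text)        # single digits
spaces = re.findall(r"\s", "a b\tc")    # space and tab
dots = re.findall(r".", "a\nb")         # '.' skips the newline by default

print(in_set, not_in_set, words, digits, spaces, dots)
```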
1.2. Quantifiers
- Greedy:
  - + one or more, * zero or more, ? zero or one
  - It matches greedily: it first tries the largest possible number of repetitions, and if the rest of the pattern does not match, the number of repetitions is decreased by one.
- Lazy:
  - +? one or more, *? zero or more, ?? zero or one
  - It first tries the smallest possible number of repetitions, and if the rest of the pattern does not match, the number of repetitions is increased by one.
- Possessive (e.g. Java and Python 3.11+):
  - ++, *+, ?+
  - It disables backing off and matches as far as it can. If the rest of the pattern then does not match, the whole match fails.
- {n} n times, {n,} n times or more, {n,m} between n and m times
  - A trailing ? makes these lazy and a trailing + makes them possessive, e.g. {n,m}? and {n,m}+.
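The difference between the quantifier modes is easiest to see on one input. The sketch below uses Python's re module with made-up sample strings; the possessive case is guarded by a version check because it needs Python 3.11 or later.

```python
import re
import sys

html = "<b>one</b><b>two</b>"  # hypothetical sample input

# Greedy: .+ grabs as much as possible, then backs off to the last </b>
greedy = re.findall(r"<b>.+</b>", html)
# Lazy: .+? grows one character at a time, stopping at the first </b>
lazy = re.findall(r"<b>.+?</b>", html)

# Counted repetition, greedy and lazy
counted = re.findall(r"\d{2,3}", "1 22 333 4444")
counted_lazy = re.findall(r"\d{2,3}?", "1 22 333 4444")

# Greedy a+ gives back one 'a' so the final 'a' can match
greedy_backoff = re.search(r"a+a", "aaa").group(0)

# Possessive a++ keeps every 'a', so the final 'a' can never match
possessive = None
if sys.version_info >= (3, 11):
    possessive = re.search(r"a++a", "aaa")

print(greedy, lazy, counted, counted_lazy, greedy_backoff, possessive)
```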
1.3. Groups
- (X) creates a group.
- \1, \2, … backreferences: match the content of the n-th group.
- (?>X) atomic grouping
  - Backtracking is disabled within the group.
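As an illustration of groups and backreferences, here is a small sketch in Python's re module. The inputs are invented for the example. The atomic group is only compiled on Python 3.11 or later, which is where re gained that syntax.

```python
import re
import sys

# \1 must repeat exactly what group 1 captured: finds a doubled word
m = re.search(r"(\w+) \1", "it was was fine")
doubled = m.group(0)
captured = m.group(1)

# Backreferences also work in replacement strings: swap the two numbers
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "10-20")

# Atomic group: (?>a+) never gives back an 'a', so the trailing 'a' fails
atomic = None
if sys.version_info >= (3, 11):
    atomic = re.search(r"(?>a+)a", "aaa")

print(doubled, captured, swapped, atomic)
```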
1.4. Lookaround
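- X(?=Y) positive lookahead: match X only if it is followed by Y
- X(?!Y) negative lookahead: match X only if it is not followed by Y
- (?<=Y)X positive lookbehind: match X only if it is preceded by Y
- (?<!Y)X negative lookbehind: match X only if it is not preceded by Y

In all four forms Y is only checked, not consumed, so it is not part of the match.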
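The four lookaround forms can be compared on a single string. This is a sketch in Python's re module with a hypothetical input. The negative forms also exclude digits in the assertion; otherwise the engine would simply match a shorter or later part of the same number.

```python
import re

s = "30kg $40 50kg $60"  # hypothetical sample input

# Positive lookahead: numbers followed by "kg"
before_kg = re.findall(r"\d+(?=kg)", s)

# Negative lookahead: numbers not followed by "kg".
# \d is excluded too, so "30kg" cannot match as just "3".
not_before_kg = re.findall(r"\d+(?!\d|kg)", s)

# Positive lookbehind: numbers preceded by "$"
after_dollar = re.findall(r"(?<=\$)\d+", s)

# Negative lookbehind: numbers not preceded by "$".
# \d is excluded too, so "$40" cannot match as just "0".
not_after_dollar = re.findall(r"(?<![$\d])\d+", s)

print(before_kg, not_before_kg, after_dollar, not_after_dollar)
```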
1.5. Flags
- g global search
  - Look for multiple matches
- i ignore case
- m multiline
  - ^ and $ match the starts and ends of lines, instead of the start and end of the entire string.
- s single line (dotall)
  - . matches newline
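The single-letter flags above follow the JavaScript/Perl style. As a rough mapping, assuming Python's re module: there is no g flag (re.search returns the first match, re.findall and re.finditer return all of them), i is re.IGNORECASE, m is re.MULTILINE, and s is re.DOTALL. The sample text is made up.

```python
import re

text = "Cat\ncat\nDog"  # hypothetical sample input

# "g": first match only vs. all matches
first = re.search(r"cat", text, re.IGNORECASE).group(0)
all_matches = re.findall(r"cat", text, re.IGNORECASE)

# "i": without the flag only the lowercase occurrence matches
case_sensitive = re.findall(r"cat", text)

# "m": ^ and $ anchor at every line instead of the whole string
per_line = re.findall(r"^\w+$", text, re.MULTILINE)
whole_string = re.findall(r"^\w+$", text)

# "s": '.' matches a newline only with DOTALL
without_dotall = re.search(r"Cat.cat", text)
with_dotall = re.search(r"Cat.cat", text, re.DOTALL).group(0)

print(first, all_matches, case_sensitive, per_line, whole_string)
```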
2. Flavors
2.1. POSIX
2.1.1. BRE
Basic Regular Expression
2.1.2. ERE
Extended Regular Expression
2.2. PCRE
Perl Compatible Regular Expressions
3. Implementation
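A regular expression is typically implemented as a finite-state machine: the pattern is compiled into an automaton (an NFA or a DFA) that reads the input one character at a time and reports a match when it ends in an accepting state.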
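To make the state-machine idea concrete, here is a minimal sketch of a hand-built DFA for the pattern ab*c, matched against the whole input. The state names, the TRANSITIONS table and the dfa_match function are hypothetical and written only for this example. Real engines build the automaton from the pattern automatically, and many of them (Python's re, Java, PCRE) use backtracking instead so they can support features such as backreferences.

```python
# Transition table for ab*c: (current state, input character) -> next state.
# Any pair missing from the table means the input is rejected.
TRANSITIONS = {
    ("start", "a"): "seen_a",   # the leading 'a'
    ("seen_a", "b"): "seen_a",  # zero or more 'b' loop on the same state
    ("seen_a", "c"): "accept",  # the closing 'c'
}


def dfa_match(s: str) -> bool:
    """Return True if the whole string s matches ab*c."""
    state = "start"
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:       # no transition: dead state
            return False
    return state == "accept"


for candidate in ["ac", "abbbc", "abca", "bc", ""]:
    print(repr(candidate), dfa_match(candidate))
```

Each character is examined exactly once, so matching time grows linearly with the input length and no backtracking is involved.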