Difference between revisions of "Regular Expressions"

Revision as of 18:56, 22 May 2017

A regular expression is a notation for defining all the valid strings of a formal language.

Examples of Regular Expression Notation

Regular Expression	Meaning
a	Matches a string consisting of just the symbol a
b	Matches a string consisting of just the symbol b
ab	Matches a string consisting of the symbol a followed by the symbol b
a*	Matches a string consisting of zero or more a’s
a+	Matches a string consisting of one or more a’s
abb?	Matches the string ab or the string abb. The ? symbol indicates zero or one of the preceding element
a|b	Matches a string consisting of the symbol a or the symbol b
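
The rows of this table can be checked against a real regex engine. The following is a minimal sketch using Python's standard re module; the choice of Python and the helper name matches are assumptions made for illustration and are not part of the original page. re.fullmatch is used so that the whole string has to match, which corresponds to asking whether the string belongs to the language.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# a* : zero or more a's
print(matches("a*", ""))        # True
print(matches("a*", "aaa"))     # True

# a+ : one or more a's, so the empty string is rejected
print(matches("a+", ""))        # False

# abb? : the final b is optional, so only ab and abb are accepted
print(matches("abb?", "ab"))    # True
print(matches("abb?", "abb"))   # True
print(matches("abb?", "abbb"))  # False

# a|b : either the single symbol a or the single symbol b
print(matches("a|b", "b"))      # True
print(matches("a|b", "ab"))     # False
```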

Precedence Rules

When using regular expressions, the rules of operator precedence are as follows:

*, + and ? are applied first

Concatenation (i.e. joining elements together) is applied next

| is applied last
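
The effect of these rules can be seen by comparing a pattern with and without brackets. This is a minimal sketch in Python's re module (an assumption; any engine with the same operators should behave the same way), and the helper name matches is hypothetical.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# * binds tighter than concatenation: ab* means a(b*), not (ab)*
assert matches("ab*", "abbb")
assert not matches("ab*", "abab")
assert matches("(ab)*", "abab")

# Concatenation binds tighter than |: ab|cd means (ab)|(cd), not a(b|c)d
assert matches("ab|cd", "ab")
assert matches("ab|cd", "cd")
assert not matches("ab|cd", "acd")
assert matches("a(b|c)d", "acd")

print("precedence checks passed")
```

Brackets are the way to override the default order, as the last assertion in each group shows.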

More Examples

Examples of regular expressions using the alphabet {a, b, c}

abc defines the language with only the string ‘abc’
abc | cba defines the language with two strings: ‘abc’ and ‘cba’
(a | b) c (a | b) gives four strings: ‘aca’, ‘acb’, ‘bca’, ‘bcb’
a+ gives an infinite number of strings: ‘a’, ‘aa’, ‘aaa’, etc
ab* gives an infinite number of strings: ‘a’, ‘ab’, ‘abb’, ‘abbb’, etc
(ab)* gives an infinite number of strings: ‘’, ‘ab’, ‘abab’, ‘ababab’, etc
(a | c)+ gives all possible strings of a and c (not including the empty string)
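
The languages above can be listed by brute force for short strings. The sketch below, written in Python as an assumption, tries every string over the alphabet {a, b, c} up to a chosen length and keeps those that the expression matches in full. The function name language and its parameters are hypothetical. The spaces written around | and the brackets above are left out of the patterns, because Python treats a space in a pattern as a literal character.

```python
import re
from itertools import product

def language(pattern: str, alphabet: str = "abc", max_len: int = 3) -> list[str]:
    """List the strings over alphabet, up to max_len symbols, that pattern fully matches."""
    compiled = re.compile(pattern)
    found = []
    for length in range(max_len + 1):
        # product yields every combination of symbols of the given length
        for symbols in product(alphabet, repeat=length):
            candidate = "".join(symbols)
            if compiled.fullmatch(candidate):
                found.append(candidate)
    return found

print(language("abc|cba"))             # ['abc', 'cba']
print(language("(a|b)c(a|b)"))         # ['aca', 'acb', 'bca', 'bcb']
print(language("ab*"))                 # ['a', 'ab', 'abb']
print(language("(ab)*", max_len=4))    # ['', 'ab', 'abab']
print(language("(a|c)+", max_len=2))   # ['a', 'c', 'aa', 'ac', 'ca', 'cc']
```

The infinite languages are only shown up to max_len symbols; raising max_len lists more of their strings, at the cost of checking many more candidates.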

Regular expression meta-characters


Symbol	Meaning	Example
|	Used to separate alternatives	a|b (means a or b)
?	Used to denote zero or one of the preceding element	a? (0 or 1 a’s; matches ‘’ and ‘a’)
*	Used to denote zero or more of the preceding element	a* (0 or more a’s; matches ‘’, ‘a’, ‘aa’, etc.)
+	Used to denote one or more of the preceding element	a+ (1 or more a’s; matches ‘a’, ‘aa’, etc.)
( )	Used to group characters together, to indicate the scope of another operator	(ab)* (0 or more ab’s; matches ‘’, ‘ab’, ‘abab’, etc.)
[ ]	Defines a character class. Another way of denoting alternatives between single characters (instead of the vertical bar)	[ab] (means a or b)
\	The escape character (this turns the metacharacter that follows it into an ordinary character)	a\* (the a character followed by the * character. Note: \ is needed as a* would mean zero or more a’s.)
^	Used to indicate the negation of a character class when it is the first character inside [ ]. Also used to match the position before the first character in a string	a[^bc] (a followed by a character that is not b or c); ^abc (matches abc only if it is at the beginning of a string)
$	Used to match the position after the last character in a string	abc$ (matches abc only if it is at the end of a string)
.	Matches any single character	a.a (matches an a followed by any character followed by an a, e.g. ‘aca’, ‘aba’)
-	Used to specify a range of values in a character class	[A-Z] (a character in the range A to Z)
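
The metacharacters that were not covered by the earlier examples can be tried out in the same way. This is a minimal sketch using Python's re module, which is an assumption; other engines may differ in small details, for example in whether . matches a newline. re.search is used for the anchor examples because it looks for a match anywhere in the string, which is where ^ and $ make a difference.

```python
import re

# \ escapes a metacharacter: a\* is a literal a followed by a literal *
assert re.fullmatch(r"a\*", "a*")
assert not re.fullmatch(r"a\*", "aa")

# [ ] defines a character class; ^ first inside it negates; - gives a range
assert re.fullmatch(r"[ab]", "b")
assert re.fullmatch(r"a[^bc]", "ad")
assert not re.fullmatch(r"a[^bc]", "ab")
assert re.fullmatch(r"[A-Z]", "Q")
assert not re.fullmatch(r"[A-Z]", "q")

# ^ anchors to the start of the string, $ to the end
assert re.search(r"^abc", "abcdef")
assert not re.search(r"^abc", "xabc")
assert re.search(r"abc$", "xyzabc")
assert not re.search(r"abc$", "abcx")

# . matches any single character (in Python, any character except a newline by default)
found = re.search(r"a.a", "banana")
print(found.group())  # ana
assert not re.fullmatch(r"a.a", "aa")
```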
