
Contents
What is a Regular Expression?
Usage of Regular Expression (RE)
Regex in Python
Various Methods of Regular Expressions
- Compile Method
- Search Method
- Match Method
- Findall Method
- Split Method
- Sub Method
1) What is a Regular Expression?
A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings, i.e. “find and replace”-like operations. (Source: Wikipedia)
Regular expressions are a generalized way to match patterns in sequences of characters. They are supported in most programming languages, including C++, Java and Python.
Note: We can use regular expressions in Python. The re module provides an interface to the regular expression engine, allowing you to compile regular expressions into objects and then perform matches with them.
Regular expressions use two types of characters:
Literal characters, such as a, b, 1, 2, which match themselves.
Metacharacters, which have a special meaning, such as square brackets ( [ and ] ), backslash ( \ ), caret ( ^ ), dollar sign ( $ ), etc.
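A minimal sketch of the difference, using an illustrative price string that is not from the original: literal characters match themselves, while a metacharacter such as $ has to be escaped with a backslash (or with re.escape()) before it matches literally.

```python
import re

text = "Price: $5 or $10"

# Literal characters match themselves.
literal = re.findall("Price", text)

# Unescaped, '$' is the end-of-string anchor, so "$5" can never match here.
unescaped = re.findall("$5", text)

# A backslash turns the metacharacter into a literal dollar sign;
# \d+ matches one or more digits.
escaped = re.findall(r"\$\d+", text)

# re.escape() inserts the needed backslashes for you.
auto_escaped = re.findall(re.escape("$5"), text)

print(literal)       # ['Price']
print(unescaped)     # []
print(escaped)       # ['$5', '$10']
print(auto_escaped)  # ['$5']
```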
2) Usage of Regular Expression (RE)
3) Regex in Python
Python's standard library includes a module called re, which can be used to work with regular expressions.
Import the re module with:
import re
Note: Once the re module is imported, you can start using regular expressions.
4) Various Methods of Regular Expressions
The re module provides several functions for querying an input string. We will discuss the most commonly used ones:
- compile()
- search()
- match()
- findall()
- split()
- sub()
The search() and match() functions return a match object (or None if there is no match). Match objects have several methods and attributes; the most important ones are:
Method/Attribute | Purpose |
---|---|
group() | Return the string matched by the RE |
start() | Return the starting position of the match |
end() | Return the ending position of the match |
span() | Return a tuple containing the (start, end) positions of the match |
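For example, re.search() returns a match object on which all four methods in the table can be called. The sample string below is illustrative only.

```python
import re

# r"\d+" matches one or more consecutive digits.
match = re.search(r"\d+", "Order 66 shipped")

if match:  # search() returns None when nothing matches
    print(match.group())  # 66
    print(match.start())  # 6
    print(match.end())    # 8
    print(match.span())   # (6, 8)
```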
4.1 Compile Method
Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions.
re.compile(pattern)
Note: Using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.
import re
sent='iNeuron provides affordable AI courses and AI internship program'
pattern=re.compile('AI')
result=pattern.findall(sent)
result
Out: ['AI', 'AI']
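A small sketch of the reuse case described in the note above: the pattern is compiled once and the same pattern object is applied to several strings. The list of sentences is made up for illustration.

```python
import re

# Compile once, then reuse the same pattern object for every string.
ai_pattern = re.compile("AI")

sentences = [
    "AI courses and AI internships",
    "Data science bootcamp",
    "Intro to AI",
]

# Count the occurrences of "AI" in each sentence.
counts = [len(ai_pattern.findall(s)) for s in sentences]
print(counts)  # [2, 0, 1]

# Pattern objects expose the same operations as the module-level functions.
first = ai_pattern.search("Intro to AI")
print(first.span())                        # (9, 11)
print(ai_pattern.sub("ML", "AI courses"))  # ML courses
```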
4.2 Search Method
Scan through the string looking for the first location where the pattern matches, returning a match object, or None if no match is found.
re.search(pattern, string)
import re
sent='iNeuron provides courses of data science, data analytics and etc.'
result = re.search('data',sent)
print(result.group())
Out: data
result.group()
Out: 'data'
result.start(), result.end()
Out: (28, 32)
result.span()
Out: (28, 32)
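Because re.search() returns None when there is no match, calling .group() directly on the result raises AttributeError when the pattern is absent. A common guard is to check the result first; first_match() below is a hypothetical helper written only for this illustration.

```python
import re

sent = "iNeuron provides courses of data science, data analytics and etc."


def first_match(pattern, text):
    """Return the first substring matching pattern, or None if there is none."""
    result = re.search(pattern, text)
    # Only call .group() when a match object was actually returned.
    return result.group() if result else None


print(first_match("data", sent))      # data
print(first_match("robotics", sent))  # None
```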
4.3 Match Method
Determine whether the regular expression matches at the beginning of the string, returning a match object, or None if it does not.
re.match(pattern, string)
import re
string='iNeuron'
## ^ and $ match the start or end of the string respectively
## . matches any single character (except a newline)
pattern = '^i.....n$'
result = re.match(pattern, string)
print(result)
if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")
Out: <re.Match object; span=(0, 7), match='iNeuron'>
Search successful.
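The practical difference between match() and search() shows up when the pattern is not at the start of the string. A minimal comparison, using a shortened version of the sentence from the search example:

```python
import re

sent = "iNeuron provides courses of data science"

# match() only tries position 0, so "data" is not found.
at_start = re.match("data", sent)

# search() scans the whole string and finds the first "data".
anywhere = re.search("data", sent)

print(at_start)         # None
print(anywhere.span())  # (28, 32)

# Because match() is already anchored at the start, "^" is optional here.
print(re.match("iNeuron", sent).group())  # iNeuron
```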
4.4 Findall Method
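Find all non-overlapping substrings where the RE matches and return them as a list of strings. Unlike re.match(), it is not restricted to the start of the string, and unlike re.search(), it does not stop after the first match. Note that it returns plain strings rather than match objects, so it cannot replace re.search() or re.match() when you need group(), start() or span().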
re.findall(pattern, string)
import re
sent='iNeuron provides courses of data science, data analytics and etc.'
result = re.findall('data',sent)
print(result)
Out: ['data', 'data']
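When positions are needed, re.finditer(), which yields match objects, is the usual companion to findall(). Also, if the pattern contains a capturing group, findall() returns the group's text rather than the whole match. A short sketch:

```python
import re

sent = "iNeuron provides courses of data science, data analytics and etc."

# finditer yields one match object per non-overlapping match.
spans = [m.span() for m in re.finditer("data", sent)]
print(spans)  # [(28, 32), (42, 46)]

# With a capturing group, findall returns only the captured part:
# the word that precedes " science".
before_science = re.findall(r"(\w+) science", sent)
print(before_science)  # ['data']
```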
4.5 Split Method
Split the source string by the occurrences of the pattern, returning a list containing the resulting substrings.
re.split(pattern, string, maxsplit=0)
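By default maxsplit=0, which means the string is split at every occurrence of the pattern. A positive maxsplit performs at most that many splits, and the rest of the string is returned as the final list element.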
import re
result=re.split('e','iNeuron')
result
Out: ['iN', 'uron']
sent='iNeuron provides courses of data science, data analytics and etc.'
# Split the sentence at every occurrence of the pattern "e".
result=re.split('e',sent)
result
Out: ['iN', 'uron provid', 's cours', 's of data sci', 'nc', ', data analytics and ', 'tc.']
sent='iNeuron provides courses of data science, data analytics and etc.'
# Split at the pattern "e", but perform at most 2 splits (maxsplit=2).
result=re.split('e',sent,maxsplit=2)
result
Out: ['iN', 'uron provid', 's courses of data science, data analytics and etc.']
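The examples above split on a single literal character, but the pattern can be any regular expression. A sketch that splits on runs of commas and whitespace, plus the effect of a capturing group, which keeps the separators in the result:

```python
import re

sent = "iNeuron provides courses of data science, data analytics and etc."

# [,\s]+ matches one or more commas and/or whitespace characters.
words = re.split(r"[,\s]+", sent)
print(words)
# ['iNeuron', 'provides', 'courses', 'of', 'data', 'science',
#  'data', 'analytics', 'and', 'etc.']

# Parentheses capture the separator, so it is kept in the output list.
with_separators = re.split(r"(,)", "a,b,c")
print(with_separators)  # ['a', ',', 'b', ',', 'c']
```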
4.6 Sub Method
Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string with repl. If the pattern is not found, the string is returned unchanged.
re.sub(pattern, repl, string)
import re
sent='iNeuron provides the best affordable data science courses in India'
# Replace every occurrence of the pattern "India" with "World".
result=re.sub('India','World',sent)
result
Out: 'iNeuron provides the best affordable data science courses in World'
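re.sub() also accepts an optional count argument (0, the default, replaces every occurrence), and repl can refer to captured groups or be a function that receives each match object. A sketch with illustrative strings:

```python
import re

sent = "AI courses, AI internships and AI mentoring"

# count=1 replaces only the first occurrence.
first_only = re.sub("AI", "ML", sent, count=1)
print(first_only)  # ML courses, AI internships and AI mentoring

# \1 and \2 in repl refer to the groups captured by the pattern.
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@example")
print(swapped)  # example at user

# A function repl is called with each match object; here it doubles numbers.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears
```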