Regular expressions are a tool for searching for, finding, and replacing text that matches a given pattern.
Python supports regular expressions through the re module (import re).
Text can be replaced using the sub function, which searches for a pattern and replaces it.
Main job of regular expressions is
Python's re module has 2 ways to execute regular expressions:
The re.compile method returns a Pattern object. It is used for repeated execution of a regular expression. The Pattern object has the same methods as the module, but the pattern is implicitly passed to each method.
The module-level methods do the same for one-time use. Every time, the user has to pass the pattern and the string, with optional flags.
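The two styles can be compared side by side. This is a minimal sketch; the variable names are illustrative and the sample sentence is the one used in the examples further down.

```python
import re

source = "don't worry jim, i'll get it done by 2pm today"

# One-off use: pass the pattern and the string on every call.
module_result = re.findall("do", source)

# Repeated use: compile once, then call methods on the Pattern object.
# The pattern is no longer passed; it is bound to the object.
compiled = re.compile("do")
compiled_result = compiled.findall(source)

print(module_result)    # ['do', 'do']
print(compiled_result)  # ['do', 'do']

# Flags are supplied at compile time instead of on every call.
ignore_case = re.compile("JIM", re.IGNORECASE)
print(ignore_case.search(source).group())  # jim
```

Both styles give the same result; compiling mainly pays off when the same pattern is applied many times.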
In general, a regular expression operation has 2 parts: the pattern and the source text.
The 'pattern' is the text that is searched for in the 'source text'. It can contain letters, digits, special characters, or any combination of them.
Method | Description |
re.compile | Compiles a pattern into a Pattern object for repeated execution of the regular expression |
re.match | Matches the pattern at the start of the string; returns a Match object if found, otherwise None |
re.search | Searches the string for the first location where the pattern matches; returns a Match object if found, otherwise None |
re.findall | Returns a list of all non-overlapping matches of the pattern in the string |
re.split | Splits the source string at each occurrence of the pattern and returns a list of the substrings |
re.sub | Replaces each occurrence of the pattern in the source string with the replacement text and returns the resulting string |
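re.split is the only method in the table that has no example in this article, so here is a small sketch. The sample text and the separator pattern are made up for illustration.

```python
import re

# Hypothetical sample text with inconsistent separators.
source = "apples, pears;plums , cherries"

# Split on a comma or a semicolon, with optional whitespace around it.
parts = re.split(r"\s*[,;]\s*", source)
print(parts)  # ['apples', 'pears', 'plums', 'cherries']
```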
re.match is used to find a pattern at the start of the string; it returns a Match object if the pattern is found there, otherwise None.
>>> import re
>>> pattern = 'do'
>>> source = "don't worry jim, i'll get it done by 2pm today"
>>> re.match(pattern, source)
<re.Match object; span=(0, 2), match='do'>
The match is found at the start of the string; it spans indices 0 to 2.
# In the pattern, \ escapes the following character, i.e. the single quote.
>>> pattern = 'don\'t'
>>> re.match(pattern, source)
<re.Match object; span=(0, 5), match="don't">
The match is found at the start of the string; it spans indices 0 to 5.
>>> pattern = '2pm'
>>> re.match(pattern, source)
# No match: re.match returns None, so the interpreter prints nothing.
Note: match looks for the pattern only at the start of the string.
The pattern '2pm' occurs later in the string, so the 'search' or 'findall' methods can solve this problem.
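Because a failed match returns None, calling a method such as group() directly on the result would raise an error. A small sketch of checking the result first; describe is a hypothetical helper written for this example, not part of the re module.

```python
import re

source = "don't worry jim, i'll get it done by 2pm today"

def describe(result):
    """Return a short description of a match result (hypothetical helper)."""
    if result is None:
        return "no match"
    return f"{result.group()!r} at {result.span()}"

# match only looks at the start of the string; search scans all of it.
print(describe(re.match("2pm", source)))   # no match
print(describe(re.search("2pm", source)))  # '2pm' at (37, 40)
```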
re.search is used to find a pattern anywhere in the given string; it returns a Match object for the first match if it finds one, otherwise None.
>>> import re
>>> pattern = 'do'
>>> source = "don't worry jim, i'll get it done by 2pm today"
>>> re.search(pattern, source)
<re.Match object; span=(0, 2), match='do'>
>>> pattern = '2pm'
>>> re.search(pattern, source)
<re.Match object; span=(37, 40), match='2pm'>
In the first case the pattern 'do' occurs in more than one place, but the search method returns only the first match.
To get every match, we need to use the findall method.
re.findall is used to find every occurrence of a pattern in the given string; it returns a list of the matches.
>>> import re
>>> pattern = 'do'
>>> source = "don't worry jim, i'll get it done by 2pm today"
>>> re.findall(pattern, source)
['do', 'do']
# The pattern 'do' is found in 2 places.
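findall returns only the matched text. If the positions are needed as well, one option is re.finditer, which yields a Match object for each occurrence. It is not covered elsewhere in this article, so treat this as a side note.

```python
import re

source = "don't worry jim, i'll get it done by 2pm today"

# Collect the matched text together with its (start, end) span.
positions = [(m.group(), m.span()) for m in re.finditer("do", source)]
print(positions)  # [('do', (0, 2)), ('do', (29, 31))]
```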
The examples above searched only for literal text in the source text, and we also discussed the limitations of each method.
A pattern can contain letters, digits, and special characters; the special characters are known as metacharacters in regular expressions.
Metacharacters are discussed in the section below.
MetaCharacter | Meaning |
R. | . matches any single character (except a newline, by default) following regular expression R; the match can also be a space. |
R+ | + matches one or more occurrences of the preceding regular expression R |
R? | ? matches zero or one occurrence of the preceding regular expression R |
R* | * matches zero or more occurrences of the preceding regular expression R |
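The four metacharacters in the table can be compared on one small string. The sample text is made up for this sketch.

```python
import re

# Hypothetical sample: "c", then zero to three "a" characters, then "t".
text = "ct cat caat caaat"

print(re.findall("ca+t", text))  # one or more 'a':  ['cat', 'caat', 'caaat']
print(re.findall("ca?t", text))  # zero or one 'a':  ['ct', 'cat']
print(re.findall("ca*t", text))  # zero or more 'a': ['ct', 'cat', 'caat', 'caaat']
print(re.findall("c.t", text))   # exactly one character between c and t: ['cat']
```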
>>> price = ("The C++ Programming by bjarne stroustrup 900.99 in India, "
...          "59.99 dollars in USA, 40.99 pounds in UK, 100.99 dollars in Singapore")
Using the dot (.) metacharacter,
get all prices from the string 'price' above.
>>> re.findall('....99', price)
['900.99', ' 59.99', ' 40.99', '100.99']
Each dot matches exactly one character, so the two-digit prices are returned with a leading space.
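The dot-based pattern only works because every price here ends in 99. A stricter alternative, offered as a sketch, uses the digit class \d, which this article has not introduced yet.

```python
import re

price = ("The C++ Programming by bjarne stroustrup 900.99 in India, "
         "59.99 dollars in USA, 40.99 pounds in UK, 100.99 dollars in Singapore")

# One or more digits, a literal dot, then one or more digits.
prices = re.findall(r"\d+\.\d+", price)
print(prices)  # ['900.99', '59.99', '40.99', '100.99']
```

The dot is escaped as \. so that it matches only a literal decimal point.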
The sub method searches the source string for the pattern; if it finds it, it replaces each match with the replacement text and returns the modified string. If the pattern does not exist in the source text, the original string is returned unchanged.
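What sub returns when nothing matches is easy to check. The sketch below also uses re.subn, a sibling of sub that returns the number of replacements made.

```python
import re

source = "Optical lens cost is 59.99 in UK"

# The pattern "price is" does not occur, so the text comes back as it was.
unchanged = re.sub("price is", "price is £", source)
print(unchanged == source)  # True

# subn returns a (new_string, number_of_replacements) tuple.
new_text, count = re.subn("cost is", "cost is £", source)
print(new_text, count)  # Optical lens cost is £ 59.99 in UK 1
```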
The following example inserts a pound sign (£) before the price.
import re
source = "Optical lens cost is 59.99 in UK"
pattern = "cost is"
replacementText = "cost is £"
new_text = re.sub(pattern, replacementText, source)
print(new_text)
# Output: Optical lens cost is £ 59.99 in UK
The above example can be rewritten using regular expression grouping: look for a group matching "cost is" and, when it is found in the source, replace it with "group text + additional text". In this case the group text is "cost is" and the additional text is the pound sign £. \1 refers to the first group, and we have only one group. Internally the replacement text becomes "cost is £".
pattern = "(cost is)"
replacementText = r"\1 £"
new_text = re.sub(pattern, replacementText, source)
print(new_text)
# Output: Optical lens cost is £ 59.99 in UK
Camel case function name: insert an underscore (_) before each word
For example, I have a function named "EmailNotificationDetails", which uses camel case. Instead, insert an underscore before each capital letter that follows a lowercase letter, so the function name becomes "Email_Notification_Details".
This can be solved using the regular expression findall() method together with string methods; a second approach is the regular expression sub (substitute) method, as shown below.
import re
source = "EmailNotificationDetails"
pattern = r"([a-z])([A-Z])"
replacementText = r"\1_\2"
new_function_name = re.sub(pattern, replacementText, source)
print(new_function_name)
# Output: Email_Notification_Details
In the above example the pattern has 2 groups: the first group matches a lowercase letter ([a-z]) and the second group matches an uppercase letter ([A-Z]).
The first group must be immediately followed by the second group, so the pattern matches the substrings lN and nD; the replacement then inserts an underscore between the first group and the second group.
A deterministic finite automaton (DFA) is a finite state machine that doesn't use backtracking.
Perl-style regular expressions, by contrast, are based on a nondeterministic finite automaton (NFA).
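Returning to the camel-case example: the first approach mentioned there, findall() plus string methods, is not shown in the article. A minimal sketch, assuming the name starts with a capital letter and contains only letters:

```python
import re

source = "EmailNotificationDetails"

# Each word is one uppercase letter followed by zero or more lowercase letters.
words = re.findall(r"[A-Z][a-z]*", source)
print(words)            # ['Email', 'Notification', 'Details']

# Join the words with underscores using a plain string method.
print("_".join(words))  # Email_Notification_Details
```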