
Python Regular Expressions


    Regular expressions are a tool for searching, finding, and replacing text that matches a given pattern.

Python supports regular expressions through the re module, which is loaded with import re.

Text can be searched and replaced using the sub function.

The main jobs of regular expressions are:

  • List all matches
  • Replace
  • Split

The Python re module offers 2 ways to execute regular expressions:

  • re.compile
    The re.compile method returns a Pattern object. Use it when the same regular expression is executed repeatedly. The Pattern object has the same methods as the module, but the pattern is implicitly passed to each method.

  • Module-level re methods
    The module methods do the same work, but for one-time use. Every time, the user has to pass the pattern and the string, with optional flags.
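
The two ways described above can be compared side by side. This is a minimal sketch; the variable names (do_pattern, module_result, compiled_result) are chosen here for illustration only.

```python
import re

source = "don't worry jim, i'll get it done by 2pm today"

# One-time use: the pattern and the string are passed on every call.
module_result = re.findall("do", source)

# Repeated use: compile once, then call methods on the Pattern object.
# The pattern is no longer passed as an argument.
do_pattern = re.compile("do")
compiled_result = do_pattern.findall(source)

print(module_result)    # ['do', 'do']
print(compiled_result)  # ['do', 'do']
```

Both calls return the same list; compiling mainly pays off when the same pattern is applied many times.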

In general, a regular expression operation has 2 parts:

  • pattern
    This 'pattern' text is what gets searched for in the 'source text'. It can contain letters, numbers, special characters, or a combination of all of these.

  • source text
    The string in which the pattern is searched.

Python Regular Expression Functions

Method        Description
re.compile    Compiles a pattern for repeated execution of a regular expression
re.match      Looks for the pattern at the start of the string; returns a Match object if found, otherwise None
re.search     Searches the whole string for the pattern; returns a Match object if found, otherwise None
re.findall    Returns a list of all non-overlapping matches of the pattern in the string
re.split      Splits the source string at each occurrence of the pattern; returns a list of substrings
re.sub        Replaces each match of the pattern with the replacement text; returns the final text after replacement
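
As a quick illustration of re.split from the table above, here is a minimal sketch. The sample string and the pattern " +" (one or more spaces) are assumptions made for this example.

```python
import re

# Hypothetical sample text: words separated by a varying number of spaces.
source = "red  green   blue"

# Split wherever one or more spaces occur.
parts = re.split(" +", source)
print(parts)  # ['red', 'green', 'blue']
```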

re.match method

      re.match is used to find a pattern at the start of the string. It returns a Match object if the pattern is found there, otherwise None.

			>>> import re
			>>> pattern = 'do'
			>>> source = "don't worry jim, i'll get it done by 2pm today"

			>>> re.match(pattern, source)
			<re.Match object; span=(0, 2), match='do'>

			A match is found at the start of the string; it spans from index 0 to 2.

			# In the pattern string, \ escapes the following single quote,
			# so that the quote does not end the Python string literal.

			>>> pattern = 'don\'t'
			>>> re.match(pattern, source)
			<re.Match object; span=(0, 5), match="don't">

			A match is found at the start of the string; it spans from index 0 to 5.

			>>> pattern = '2pm'
			>>> re.match(pattern, source)

			No match: re.match returns None, so the interpreter prints nothing.

			Note: match looks for the pattern only at the start of the string.
			The pattern '2pm' exists elsewhere in the string, so the 'search' or 'findall' methods can solve this problem.
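
Because re.match returns None when nothing is found, scripts usually check the result before using it. A minimal sketch of that check, using the Match object's group() and span() methods; the loop and the printed messages are illustrative choices.

```python
import re

source = "don't worry jim, i'll get it done by 2pm today"

for pattern in ("do", "2pm"):
    result = re.match(pattern, source)
    if result is None:
        # '2pm' is not at the start of the string, so match finds nothing.
        print(pattern, "-> no match at the start of the string")
    else:
        # group() is the matched text, span() is its (start, end) position.
        print(pattern, "->", result.group(), result.span())
```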
		

re.search method

      re.search is used to find a pattern anywhere in the given string. It returns a Match object if the pattern is found, otherwise None.

			>>> import re
			>>> pattern = 'do'
			>>> source = "don't worry jim, i'll get it done by 2pm today"

			>>> re.search(pattern, source)
			<re.Match object; span=(0, 2), match='do'>

			>>> pattern = '2pm'
			>>> re.search(pattern, source)
			<re.Match object; span=(37, 40), match='2pm'>

In the first case the pattern 'do' occurs in multiple places, but the search method returns only the first match.
To get every match, use the findall method.
			
			
		

re.findall method

      re.findall is used to find all occurrences of a pattern in the given string. It returns a list of the matched substrings.

			>>> import re
			>>> pattern = 'do'
			>>> source = "don't worry jim, i'll get it done by 2pm today"

			>>> re.findall(pattern, source)
			['do', 'do']

			# The pattern 'do' is found in 2 places.
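
findall returns only the matched text. If the position of every match is also needed, one option is re.finditer, which yields a Match object per match. This is a small sketch of that alternative, not something the examples above rely on.

```python
import re

source = "don't worry jim, i'll get it done by 2pm today"

# Collect the (start, end) position of every occurrence of 'do'.
spans = [m.span() for m in re.finditer("do", source)]
print(spans)  # [(0, 2), (29, 31)]
```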

		

The examples above simply searched for literal text in the source text, and we also discussed the limitations of each method.

A pattern can contain letters, numbers, and special characters, which are known as metacharacters in regular expressions.
Metacharacters are discussed in the section below.

Regular Expressions and MetaCharacters

MetaCharacter   Meaning
R.              . matches any single character following regular expression R; the match can include a space (but not a newline, by default)
R+              + matches one or more occurrences of the preceding regular expression R
R?              ? matches zero or one occurrence of the preceding regular expression R
R*              * matches zero or more occurrences of the preceding regular expression R
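
A minimal sketch of the +, ? and * metacharacters from the table above. The sample strings are made up for illustration.

```python
import re

# + : one or more occurrences of the preceding expression
plus_result = re.findall("o+", "go goo gooo")
print(plus_result)      # ['o', 'oo', 'ooo']

# ? : zero or one occurrence of the preceding expression
question_result = re.findall("colou?r", "color colour")
print(question_result)  # ['color', 'colour']

# * : zero or more occurrences of the preceding expression
star_result = re.findall("ab*", "a ab abb")
print(star_result)      # ['a', 'ab', 'abb']
```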
Example: regular expression using metacharacters

>>> import re
>>> price = ("The C++ Programming by bjarne stroustrup 900.99 in India, "
...          "59.99 dollars in USA, 40.99 pounds in UK, 100.99 dollars in Singapore")

Using the dot (.) metacharacter, get all prices from the string 'price' above.
Each dot matches exactly one character, so the two-digit prices come back with a leading space.

>>> re.findall('....99', price)
['900.99', ' 59.99', ' 40.99', '100.99']
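
The leading spaces in the result come from the fixed count of four dots. One possible refinement, shown here as a sketch, combines the character class [0-9] with + and an escaped dot (\.) so that only digits are captured. It assumes every price in the text ends in .99, as in the sample string.

```python
import re

price = ("The C++ Programming by bjarne stroustrup 900.99 in India, "
         "59.99 dollars in USA, 40.99 pounds in UK, 100.99 dollars in Singapore")

# [0-9]+ : one or more digits, \. : a literal dot, 99 : the literal digits 99
prices = re.findall(r"[0-9]+\.99", price)
print(prices)  # ['900.99', '59.99', '40.99', '100.99']
```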

			
		

Replace subtext using the regular expression sub method

      The sub method searches the source string for the pattern. If it finds the pattern, it replaces each match with the replacement text and returns the modified string. If the pattern does not exist in the source text, the source string is returned unchanged.

The following example inserts a pound sign (£) before the price.
			import re
			
			source = "Optical lens cost is 59.99 in UK"
			pattern = "cost is"
			replacementText = "cost is £" 
			
			new_text = re.sub(pattern, replacementText, source)
			print(new_text)

			#Output
			Optical lens cost is £ 59.99 in UK
		
The above example can be rewritten using regular expression grouping: look for a group "(cost is)" and, when it is found in the source, replace it with the group text plus additional text. In this case the group text is "cost is" and the additional text is a space followed by the pound sign £. \1 refers to the first group, and we have only one group. Internally the replacement text becomes "cost is £".
			pattern = "(cost is)"
			replacementText = r"\1 £"
			new_text = re.sub(pattern, replacementText, source)
			print(new_text)

			#Output
			Optical lens cost is £ 59.99 in UK
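
Two behaviours of re.sub are worth checking directly: what it returns when the pattern is absent, and the optional count argument. A minimal sketch; the pattern "price is" and the string "foo boo" are hypothetical values used only for this illustration.

```python
import re

source = "Optical lens cost is 59.99 in UK"

# "price is" does not occur in source, so nothing is replaced
# and the original string comes back unchanged.
unchanged = re.sub("price is", "price is £", source)
print(unchanged == source)  # True

# count limits how many matches are replaced (here: only the first one).
limited = re.sub("o", "0", "foo boo", count=1)
print(limited)  # f0o boo
```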
		
		
Camel case function name: insert an underscore (_) between the words

     For example, I have a function name "EmailNotificationDetails" that uses camel case. Instead, insert an underscore before each capital letter that follows a lowercase letter, so that the function name becomes "Email_Notification_Details".

This can be solved in two ways: with the regular expression findall() method plus string methods, or with the regular expression sub (substitute) method, as shown below.

		import re
		
		source = "EmailNotificationDetails"
		pattern = r"([a-z])([A-Z])"
		replacementText = r"\1_\2"
		
		new_function_name =  re.sub (pattern, replacementText, source  )
		
		print(new_function_name)
		
		
		#Output
		Email_Notification_Details
		

In the above example we have 2 groups: the first group is a lowercase letter ([a-z]) and the second group is an uppercase letter ([A-Z]).

The second group must immediately follow the first group, i.e. the pattern matches the substrings lN and nD. An underscore is then inserted between the first group and the second group.
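
The other approach mentioned earlier, findall() plus string methods, could look like the sketch below. The pattern "[A-Z][a-z]+" is an assumption: it expects every word to start with one capital letter followed by lowercase letters, which holds for this function name but not for names containing digits or runs of capitals.

```python
import re

source = "EmailNotificationDetails"

# Each word is one uppercase letter followed by one or more lowercase letters.
words = re.findall("[A-Z][a-z]+", source)
print(words)  # ['Email', 'Notification', 'Details']

# Join the words with underscores using the str.join string method.
new_function_name = "_".join(words)
print(new_function_name)  # Email_Notification_Details
```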



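A deterministic finite automaton (DFA) is a finite state machine that does not use backtracking.

Perl-style regular expressions, the style that Python's re module follows, are instead matched by a backtracking engine based on a nondeterministic finite automaton (NFA).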
