Python Regular Expression ( With Examples )

In this article, you will learn about Python regular expressions using the Python re module, which provides regular expression matching operations. If you want to search for a specific pattern across many strings, the re module is the tool to use.

The re module is built in, which means you don't need to download or install it with pip as you would the requests module. It is installed along with Python itself.

In this guide, we will learn what regular expressions (RegEx) are and how to use them with the built-in re module.

Before starting with the re module, let's explore a little about regular expressions.

What are Regular Expressions ( RegEx )?

A regular expression, or RegEx, is a sequence of characters that forms a search pattern.
RegEx is used to check whether a string contains a specific search pattern.

Advantages of using RegEx

Using regular expressions in Python has several advantages.

  • Wide range of uses. You can write one regular expression pattern and reuse it in many places to validate different kinds of data.
  • You can avoid long chains of if and else statements when validating a pattern.
  • High productivity with less effort.
  • You can search for any kind of string within a collection of strings.

Import re module

To use the re module, first import it using the import keyword.
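
The import itself is a single line. A minimal sketch follows; the pattern and the sample string are made up purely to confirm that the module is usable.

```python
import re  # ships with Python, so no pip install is needed

# Hypothetical sample data, only to confirm the module works.
match = re.search("py", "python")
print(match is not None)
```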

Regular Expressions ( RegEx ) in Python

Once you have imported the re module, you can start working with regular expressions.

Example

Search for a string that starts with 'The' and ends with 'programming'.


import re
txt = 'The Programming Funda is a best platform to learn programming'
result = re.search("^The.*programming$", txt)
print(result.string)

Output

The Programming Funda is a best platform to learn programming
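
Note that re.search() returns None when the pattern does not match, so calling result.string directly would raise an AttributeError in that case. Below is a minimal sketch of a safer check. The helper name starts_and_ends and the second sample sentence are assumptions made for illustration.

```python
import re

def starts_and_ends(text):
    """Return True if text starts with 'The' and ends with 'programming'."""
    # re.search() returns a match object on success and None otherwise.
    return re.search(r"^The.*programming$", text) is not None

first = starts_and_ends("The Programming Funda is a best platform to learn programming")
second = starts_and_ends("Learn programming with The Programming Funda")
print(first)
print(second)
```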

Python RegEx Functions

The Python re module provides a set of functions for searching a string for a match.

Function     Description
findall()    Returns a list containing all the matches.
search()     Returns a match object if there is a match anywhere in the string.
split()      Returns a list where the string has been split at each match.
sub()        Replaces one or more matches with a string.
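
As a quick preview, the four functions can be tried side by side on one string. This is only a sketch; the sample text and variable names are assumptions chosen for illustration.

```python
import re

txt = "one 1 two 22 three 333"

all_numbers = re.findall(r"\d+", txt)   # every run of digits
first_number = re.search(r"\d+", txt)   # match object for the first run only
pieces = re.split(r"\s+", txt)          # split on whitespace
masked = re.sub(r"\d+", "#", txt)       # replace each run of digits with '#'

print(all_numbers)
print(first_number.group())
print(pieces)
print(masked)
```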

Metacharacters

Metacharacters are symbols or characters that have special meaning in Python regular expressions.

[ ] Square Brackets

Square brackets specify a set of characters you want to match.

Here [ams] will match if the string you try to match contains any one of the characters a, m or s.

Expression    String    Match
[abc]         abcd      3 matches (a, b and c)
[hello]       Coding    1 match (o)

Example


import re
txt = 'Programmig Funda'
result1 = re.findall("[ams]", txt)
result2 = re.findall("[a-s]", txt)
print(result1)
print(result2)

Output


['a', 'm', 'm', 'a']
['r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'g', 'n', 'd', 'a']

You can also specify a range of characters using a hyphen ( - ) inside the square brackets.

  • [a-e] is the same as [abcde]
  • [1-5] is the same as [12345]
  • [0-45] is the same as [012345]
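
These equivalences can be checked directly. The sketch below uses a made-up sample string and compares the results of the range form and the spelled-out form.

```python
import re

txt = "a1b2c3d4e5f6"

# Each range should find exactly what the spelled-out set finds.
letters_same = re.findall("[a-e]", txt) == re.findall("[abcde]", txt)
digits_same = re.findall("[1-5]", txt) == re.findall("[12345]", txt)

# [0-45] means the range 0-4 plus the single character 5.
mixed = re.findall("[0-45]", txt)

print(letters_same, digits_same)
print(mixed)
```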

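\ Backslash

The backslash is used to escape characters, including all the metacharacters, so that they are matched literally. For example, \. matches a literal period.
It also introduces special sequences. For example, \d matches only digits.

. Period

The period ( . ) matches any single character except a newline. Matches do not overlap.

Expression    String    Match
.             a         1 match
..            abc       1 match (ab)
..            abcd      2 matches (ab and cd)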

Example


import re
txt = 'Programmig Funda'
result = re.findall("Fu..a", txt)
print(result)

The output will be:- ['Funda']
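
Because the period is a metacharacter that matches any character, matching a literal period requires escaping it with a backslash. A small sketch of the difference, using a made-up sample string:

```python
import re

txt = "file.txt and file2txt"

# Unescaped: the period matches any character, so both words are found.
any_char = re.findall(r"file.txt", txt)

# Escaped: \. matches only a real period.
literal = re.findall(r"file\.txt", txt)

print(any_char)
print(literal)
```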

^ Caret

The caret symbol ( ^ ) is used to check whether the string starts with the specified characters.

Expression    String         Match
^a            a              1 match
^abc          abcd           1 match
^c            Programming    No match

Example


import re
txt = 'Programmig Funda'
result = re.findall("^Programmig", txt)
print(result)

The output will be:- ['Programmig']

$ Dollar

The dollar symbol ( $ ) is used to check whether the string ends with the specified characters.

Expression    String    Match
a$            a         1 match
n$            Python    1 match
a$            Coding    No match

Example


import re
txt = 'Programmig Funda'
result = re.findall("Funda$", txt)
print(result)

Output

['Funda']
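
The two anchors can also be combined so that the whole string has to match. The sketch below reuses the sample text from the example above; the variable names are assumptions.

```python
import re

txt = 'Programmig Funda'

starts = re.findall("^Funda", txt)      # 'Funda' is not at the start
ends = re.findall("Funda$", txt)        # 'Funda' is at the end
whole = re.findall("^Funda$", txt)      # txt is more than just 'Funda'
exact = re.findall("^Funda$", "Funda")  # the whole string is 'Funda'

print(starts, ends, whole, exact)
```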

Star ( * )

The star symbol ( * ) matches zero or more occurrences of the pattern to its left.

Expression    String    Match
Ra*m          Rm        1 match
Ra*m          Ram       1 match
Ra*m          Raim      No match because a is not followed by m

Example


import re
txt = 'Programming'
result = re.search("Program*ing", txt)
print(result.string)

The Output will be:- Programming

Plus ( + )

The plus symbol ( + ) matches one or more occurrences of the pattern to its left.

Expression    String    Match
Ra+im         Rm        No match
Ra+im         Ram       No match because there is no i
Ra+im         Raaim     1 match

Example


import re
txt = 'Raim'
result = re.findall("Ra+im", txt)
print(result)

The output will be:- ['Raim']

? Question Mark

The question mark ( ? ) matches zero or one occurrence of the pattern to its left.

Expression    String    Match
Ra?m          Rm        1 match
Ra?m          Ram       1 match
Ra?m          Raam      No match because there is more than one a.

Example


import re
txt = 'Programming'
result = re.findall("Programmin?g", txt)
print(result)

Output

['Programming']
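
To compare the three quantifiers, the same words can be tested against each of them. This sketch uses re.fullmatch(), which is not covered in this article, so that the entire word has to match. The word list is taken from the tables above.

```python
import re

words = ["Rm", "Ram", "Raam"]
results = {}
for pattern in ("Ra*m", "Ra+m", "Ra?m"):
    # re.fullmatch() succeeds only if the entire word matches the pattern.
    results[pattern] = [w for w in words if re.fullmatch(pattern, w)]
    print(pattern, results[pattern])
```

The star accepts all three words, the plus rejects Rm, and the question mark rejects Raam.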

{ } Braces

The braces { } match a specified number of occurrences of the pattern to their left. {n} means exactly n repetitions, and {n,m} means at least n and at most m repetitions. Write the quantifier without spaces, because Python's re module does not treat {n, m} (with a space) as a quantifier.

Expression    String       Match
a{2,3}        abc, bcd     No match
a{2,3}        aabc baad    2 matches (in aabc and baad)
a{2,3}        abc baad     1 match (in baad)

Let's look at a slightly more complex expression. The pattern [0-9]{2,5} matches at least 2 digits but not more than 5 digits.

Expression    String         Match
[0-9]{2,5}    abcd123efgh    1 match (123)

Example


import re
txt = 'Hello Helllo'
result = re.findall("l{2}", txt)
result2 = re.findall("l{2,3}", txt)
print(result)
print(result2)

Output


['ll', 'll']
['ll', 'lll']
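
The [0-9]{2,5} pattern can be tried in the same way. In this sketch the sample string is an assumption, chosen so that it contains digit runs that are too short, within range, and too long.

```python
import re

txt = "a1 b22 c333 d7777777"

# A single digit is too short. A run of seven digits is consumed as a
# chunk of five followed by a chunk of two.
numbers = re.findall(r"[0-9]{2,5}", txt)
print(numbers)
```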

| Alternation

The vertical bar ( | ) is used for alternation: it matches either the expression before it or the expression after it.

Expression    String    Match
a|c           abd       1 match
a|c           abc       2 matches
a|c           afg       1 match (a)

Example


import re
txt = 'hello'
result = re.findall("h|s", txt)
print(result)

The output will be:- ['h']

() Group

Parentheses ( ) are used to group sub-patterns and capture what they match. For example, the pattern (x|y|z)abc matches any string that contains x, y or z followed by abc.

Expression    String       Match
(x|y|z)abc    xy abc       No match
(x|y|z)abc    xyabc        1 match (yabc)
(x|y|z)abc    xabc yabc    2 matches (xabc and yabc)

Example


import re
txt = 'hello'
result = re.findall("(h|s)ello", txt) # Match h
print(result)

The output will be:- ['h']
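
The result is 'h' rather than 'hello' because findall() returns the text captured by the group when the pattern contains exactly one group. A short sketch that shows the whole match next to the captured group, using re.search() on the same sample text:

```python
import re

txt = 'hello'
match = re.search(r"(h|s)ello", txt)

whole = match.group(0)      # the entire matched text
captured = match.group(1)   # only what the parentheses captured
found = re.findall(r"(h|s)ello", txt)

print(whole)
print(captured)
print(found)
```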

Special Sequences

Special sequences are predefined character classes and positions that have a special meaning. Each special sequence stands for a common pattern, which makes it easy to use.

Special Sequence    Description
\A    Matches if the specified characters are at the start of the string.
\b    Matches if the specified characters are at the beginning or end of a word.
\B    The opposite of \b. Matches if the specified characters are not at the beginning or end of a word.
\d    Matches any decimal digit. For ASCII text it is equivalent to [0-9].
\D    Matches any character that is not a decimal digit.
\s    Matches any whitespace character.
\S    Matches any character that is not whitespace.
\w    Matches any word character ( a-z, A-Z, 0-9, _ ).
\W    Matches any character that is not a word character.
\Z    Matches if the specified characters are at the end of the string.
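
A few of these sequences can be tried on a single string. This is a sketch only; the sample text and variable names are assumptions made for illustration.

```python
import re

txt = "Room 42 is_open"

digits = re.findall(r"\d", txt)         # each decimal digit
chunks = re.findall(r"\S+", txt)        # runs of non-whitespace characters
words = re.findall(r"\w+", txt)         # runs of word characters (includes _)
at_start = re.findall(r"\ARoom", txt)   # 'Room' at the start of the string
at_end = re.findall(r"open\Z", txt)     # 'open' at the end of the string
word_start = re.findall(r"\bis", txt)   # 'is' at the beginning of a word

print(digits, chunks, words)
print(at_start, at_end, word_start)
```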

The findall() Function

The findall() function returns a list containing all the matches.

Example: Extract 'mm' from a string.


import re
txt = 'Programming Funda'
result = re.findall('mm', txt)
print(result)

The output will be:- ['mm']

If there are no matches found, an empty list will be returned.

Example


import re
txt = 'Programming Funda'
result = re.findall('hello', txt)
print(result)

Example: Extract digits from a string.


import re
txt = 'Programming Funda 12 is the best tutorial site.1230'
result = re.findall(r"\d+", txt)
print(result)

The output will be:- ['12', '1230']

split() Function

The re.split() function splits the string at each match and returns a list of strings.

Example:


import re
txt = 'Programming Funda 12 is the best 23 tutorial site.'
result = re.split(r"\d+", txt)
print(result)

Output

['Programming Funda ', ' is the best ', ' tutorial site.']

If the pattern is not found, re.split() returns a list containing the original string.

You can pass the maxsplit parameter to re.split(). It is the maximum number of splits that will occur.

Example


import re
txt = 'Programming Funda 12 is the best 23 tutorial site.1230'
result = re.split(r"\d+", txt, maxsplit=2)
print(result)

Output

['Programming Funda ', ' is the best ', ' tutorial site.1230']

The sub() Function

The re.sub() function is used to replace each match with the text of your choice.

Syntax

re.sub(pattern, replace, string)

Example


import re
txt = 'Programming Funda 12 is the best 23 tutorial site.1230'

# pattern to search
pattern = r"\s+"

# replace variable
replace = ''

result = re.sub(pattern, replace, txt)
print(result)

Output

ProgrammingFunda12isthebest23tutorialsite.1230

Note:- If the pattern is not found, re.sub() returns the original string.

You can control the number of replacements with the count parameter of re.sub().

Example


import re
txt = 'Programming Funda 12 is the best 23 tutorial site.1230'

# pattern to search
pattern = r"\d+"

# replacement text
replace = ''


# replace only the first two matches
result = re.sub(pattern, replace, txt, count=2)
print(result)

Output

Programming Funda  is the best  tutorial site.1230

The subn() Function

The re.subn() function is similar to re.sub(), but it returns a tuple of two items: the new string and the number of substitutions made.

Example


import re
txt = 'Programming Funda 12 is the best 23 tutorial site.1230'

# pattern to search
pattern = r"\d+"

# variable
replace = ''


result = re.subn(pattern, replace, txt)
print(result)

Output

('Programming Funda  is the best  tutorial site.', 3)
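
Since the result is a tuple, it can be unpacked into two variables straight away. A minimal sketch with the same sample text; the names new_text and count are assumptions.

```python
import re

txt = 'Programming Funda 12 is the best 23 tutorial site.1230'

# Unpack the (new string, number of substitutions) tuple.
new_text, count = re.subn(r"\d+", '', txt)

print(new_text)
print(count)
```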

The search() Function

The re.search() function searches a string for a match and returns a match object.
If there is more than one match, only the first occurrence is returned. If there is no match, it returns None.

Example


import re
txt = 'Programming Funda 12 is the best 23 tutorial site.1230'

# pattern to search
pattern = r"\d+"

result = re.search(pattern,txt)
print(result)

Output

<re.Match object; span=(18, 20), match='12'>

Match Object

You can list the methods and attributes of a match object using the dir() function.

Example


import re
txt = 'Programming Funda 12 is the best 23 tutorial site.1230'
result = re.search('mm',txt)
print(dir(result))

Output


['__class__', '__copy__', '__deepcopy__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'end', 'endpos', 'expand', 'group', 'groupdict', 'groups', 'lastgroup', 'lastindex', 'pos', 're', 'regs', 'span', 'start', 'string']

match.group()

The group() method returns the part of the string where there is a match.

Example


import re
txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2}) ', txt)
print(result.group())

The Output will be:- 400 20

You can also get the part of the string matched by each parenthesized group by passing the group number to group(). The groups() method returns all the groups as a tuple.

Example


import re
txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2}) ', txt)
print(result.group(1))
print(result.group(2))
print(result.group(1,2))
print(result.groups())

Output


400
20
('400', '20')
('400', '20')

match.start(), match.end() and match.span()

The match.start() method returns the index where the matched substring starts. Similarly, match.end() returns the index where the matched substring ends.

Example


import re
txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2})', txt)
print(result.start(), end=' ')
print(result.end())

The Output will be:- 2 8

The span() method returns a tuple containing the start and end index of the matched part.

Example


import re
txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2})', txt)
print(result.span())

The output will be:- (2, 8)
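
The end index is exclusive, so the two values from span() can be used directly as slice bounds on the original string. A short sketch with the same sample text; the variable names are assumptions.

```python
import re

txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2})', txt)

start, end = result.span()
matched_text = txt[start:end]   # slicing with the span recovers the match

print(matched_text)
print(matched_text == result.group())
```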

match.re and match.string

The match.re attribute returns the regular expression object that produced the match. Similarly, match.string returns the string that was passed to the function.

Example


import re
txt = '39801 400 20 1111'
result = re.search(r'(\d{3}) (\d{2})', txt)
print(result.re)
print(result.string)

Output


re.compile('(\\d{3}) (\\d{2})')
39801 400 20 1111

Conclusion

In this article, you have learned about Python regular expressions with the help of examples. The re module in Python is very helpful when you want to search for more complex string patterns.

If you are a Python developer, I highly recommend learning regular expressions, because they will be useful both in your day-to-day Python journey and in your real-world Python projects.

I hope this article has helped you. If you liked it, please share it with someone who wants to learn regular expressions in Python, and keep visiting for further Python tutorials.

~ Thanks for reading
