Python re Module (Regular Expressions)
What Is the Python re Module?
The re module in Python's standard library provides support for regular expressions (RegEx). A regular expression is a sequence of characters that defines a search pattern. It is used for string matching, pattern searching, data validation, and string manipulation.
Why Use Regular Expressions in Python?
- To search and extract patterns (e.g., emails, phone numbers)
- To validate inputs (e.g., password format, URL structure)
- To split, replace, or clean text using patterns
Importing the re Module
import re
Commonly Used re Functions in Python
Function | Description |
---|---|
re.match() | Checks for a match only at the beginning of the string |
re.search() | Searches the entire string for the first match |
re.findall() | Returns a list of all non-overlapping matches |
re.finditer() | Returns an iterator yielding match objects |
re.sub() | Replaces every match of the pattern with a string (or with the result of a function) |
re.split() | Splits a string by the matched pattern |
re.compile() | Compiles a pattern into a regex object |
1. re.match() – Match at the Beginning
import re
result = re.match(r'Hello', 'Hello World')
print(result.group())
Output:
Hello
If the pattern is not at the beginning of the string, match() returns None, and calling result.group() on None raises an AttributeError.
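Because match() can return None, it is usually safer to test the result before calling group(). The following is a minimal sketch of that check; the helper name starts_with_hello is hypothetical and only serves the illustration.

```python
import re

def starts_with_hello(text):
    """Return the matched text if `text` begins with 'Hello', else None."""
    result = re.match(r'Hello', text)
    # re.match() returns None on failure, so test the result before using it
    if result is None:
        return None
    return result.group()

print(starts_with_hello('Hello World'))   # Hello
print(starts_with_hello('Say Hello'))     # None
```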
2. re.search() – Search for a Pattern Anywhere
import re
result = re.search(r'World', 'Hello World')
print(result.group())
Output:
World
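To make the difference from match() concrete, the sketch below runs both functions on the same string and also reads the position of the match from the match object. The variable names are illustrative only.

```python
import re

text = 'Hello World'

# match() only tries position 0, so 'World' is not found
at_start = re.match(r'World', text)
print(at_start)                                     # None

# search() scans forward and reports where the first match sits
found = re.search(r'World', text)
print(found.group(), found.start(), found.end())    # World 6 11
```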
3. re.findall() – Find All Matches
import re
text = 'Email: test1@gmail.com and test2@yahoo.com'
# \S+@\S+ matches non-whitespace characters on both sides of an "@"
emails = re.findall(r'\S+@\S+', text)
print(emails)
Output:
['test1@gmail.com', 'test2@yahoo.com']
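The function table also lists re.finditer(), which returns match objects instead of plain strings. That is useful when the position of each match matters. A small sketch, reusing the same text and pattern as the findall() example:

```python
import re

text = 'Email: test1@gmail.com and test2@yahoo.com'

# finditer() yields one match object per match, in order of appearance
positions = [(m.group(), m.start()) for m in re.finditer(r'\S+@\S+', text)]
print(positions)
# [('test1@gmail.com', 7), ('test2@yahoo.com', 27)]
```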
4. re.sub() – Replace Pattern in String
import re
text = "Hello 123, this is 456"
# Replace every run of one or more digits with "#"
new_text = re.sub(r'\d+', '#', text)
print(new_text)
Output:
Hello #, this is #
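re.sub() also accepts a count argument and a function as the replacement. The sketch below shows both, using the same input string; doubling the numbers is just an arbitrary illustration of a computed replacement.

```python
import re

text = "Hello 123, this is 456"

# count=1 limits the replacement to the first match only
first_only = re.sub(r'\d+', '#', text, count=1)
print(first_only)    # Hello #, this is 456

# A function receives each match object and returns the replacement text
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), text)
print(doubled)       # Hello 246, this is 912
```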
5. re.split() – Split String Using Pattern
import re
text = "one,two;three four"
# Split on any single comma, semicolon, or space
parts = re.split(r'[;, ]', text)
print(parts)
Output:
['one', 'two', 'three', 'four']
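The character class above splits at every single separator, so input with consecutive separators produces empty strings. A sketch of the difference, using a made-up input with doubled separators:

```python
import re

text = "one, two;  three four"

# A single-character class splits at every separator, leaving empty strings
raw_parts = re.split(r'[;, ]', text)
print(raw_parts)
# ['one', '', 'two', '', '', 'three', 'four']

# Adding + treats each run of separators as one delimiter
parts = re.split(r'[;, ]+', text)
print(parts)
# ['one', 'two', 'three', 'four']
```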
6. re.compile() – Compile and Reuse Pattern
import re
# Compile once; the pattern object exposes the same methods as the re module
pattern = re.compile(r'\d+')
matches = pattern.findall('Item1 = 10, Item2 = 20')
print(matches)
Output:
['1', '10', '2', '20']
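Note that the output above includes the digits inside "Item1" and "Item2". If only the values after the equals sign are wanted, one option is a capture group, and a compiled pattern is convenient when the same expression is applied to several strings. This is a sketch with a hypothetical second input line:

```python
import re

# Compile once, then reuse the same pattern object on several strings
value_pattern = re.compile(r'=\s*(\d+)')

lines = ['Item1 = 10, Item2 = 20', 'Item3 = 30']
# With one capture group, findall() returns the captured text only
values = [value_pattern.findall(line) for line in lines]
print(values)   # [['10', '20'], ['30']]
```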
Python Regular Expression Patterns (RegEx Syntax)
Pattern | Description |
---|---|
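. | Any character except newline (any character at all with re.DOTALL) |
^ | Beginning of string (also of each line with re.MULTILINE) |
$ | End of string, or just before a trailing newline (also end of each line with re.MULTILINE) |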
* | 0 or more repetitions |
+ | 1 or more repetitions |
? | 0 or 1 repetition |
{n} | Exactly n repetitions |
{n,} | n or more repetitions |
{n,m} | Between n and m repetitions |
[] | Any one of the characters in brackets |
\d | Digit (0-9; also other Unicode decimal digits unless re.ASCII is set) |
\D | Non-digit |
\w | Word character (letters, digits, underscore) |
\W | Non-word character |
\s | Whitespace |
\S | Non-whitespace |
\| | OR operator |
() | Capture group |
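The sketch below combines several entries from the table: anchors, exact repetition counts, capture groups, alternation, the optional quantifier, and \w. The sample strings are made up for illustration.

```python
import re

# ^ and $ anchor the pattern; {4} and {2} are exact repetition counts;
# each () captures one part of the date
date = re.match(r'^(\d{4})-(\d{2})-(\d{2})$', '2024-01-15')
print(date.groups())
# ('2024', '01', '15')

# | chooses between alternatives, ? makes the preceding item optional
spellings = re.findall(r'colou?r|gr[ae]y', 'color colour gray grey')
print(spellings)
# ['color', 'colour', 'gray', 'grey']

# \w includes the underscore but not the hyphen
words = re.findall(r'\w+', 'snake_case, kebab-case')
print(words)
# ['snake_case', 'kebab', 'case']
```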
Real-World Example: Extract Phone Numbers
import re
text = "Call me at 9876543210 or 1234567890"
# \d{10} matches any run of exactly ten consecutive digits
phones = re.findall(r'\d{10}', text)
print("Phone numbers:", phones)
Output:
Phone numbers: ['9876543210', '1234567890']
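One caveat: \d{10} also matches the first ten digits of a longer number. A possible refinement is the word-boundary assertion \b, which is not listed in the table above but is part of the re syntax. The sketch uses a made-up text containing a 12-digit order id:

```python
import re

text = "Call 9876543210, order id 123456789012"

# Without boundaries, the first ten digits of the longer id also match
loose = re.findall(r'\d{10}', text)
print(loose)     # ['9876543210', '1234567890']

# \b requires a word boundary on both sides, so only standalone
# ten-digit numbers are returned
phones = re.findall(r'\b\d{10}\b', text)
print(phones)    # ['9876543210']
```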
Real-World Example: Validate Email Address
import re
email = "user@example.com"
# Simplified check: word characters, "@", word characters, ".", word characters
is_valid = re.match(r'^\w+@\w+\.\w+$', email)
print("Valid Email?", bool(is_valid))
Output:
Valid Email? True
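The pattern above is intentionally simple: it rejects addresses with dots in the local part or with more than one dot in the domain, and because $ also matches just before a trailing newline, it accepts "user@example.com\n". The following is a slightly looser sketch built on re.fullmatch(). The pattern and the name looks_like_email are assumptions made for illustration, not a complete email validator.

```python
import re

# Hypothetical, deliberately loose pattern: allows dots, plus signs and
# hyphens in the local part and multiple dot-separated domain labels
EMAIL_PATTERN = re.compile(r'[\w.+-]+@[\w-]+(\.[\w-]+)+')

def looks_like_email(candidate):
    # fullmatch() requires the whole string to match, so no ^ or $ is needed
    return EMAIL_PATTERN.fullmatch(candidate) is not None

for candidate in ['user@example.com', 'first.last@mail.example.co.uk', 'user@example']:
    print(candidate, looks_like_email(candidate))
```

In this sketch the first two addresses are accepted and the third is rejected because its domain has no dot.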