Hello everyone,
In this chapter, we are going to learn what a regular expression is in Python.
Course content
- What is RegEx?
- How is it used?
- Metacharacters
- The search() function
- The findall() function
- The split() function
- The sub() function
How is it used?
- A regular expression is a sequence of characters mainly used to find and replace patterns in a string or file.
- Regular expressions use two types of characters:
  - Metacharacters: characters with a special meaning
  - Literals (like 1, 2, 3, a, b, ...)
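To make the difference between a metacharacter and a literal concrete, here is a minimal sketch. The sample text and variable names are made up for illustration; the dot is used because it is one of the simplest metacharacters.

```python
import re

text = "a.b axb"  # hypothetical sample text

# The dot is a metacharacter: it matches any single character except a newline.
any_char = re.findall(r"a.b", text)

# Escaped with a backslash, the dot becomes a literal and matches only ".".
literal_dot = re.findall(r"a\.b", text)

print(any_char)     # ['a.b', 'axb']
print(literal_dot)  # ['a.b']
```

The letters a and b are literals in both patterns: they only ever match themselves.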
What is RegEx?
- Regular expressions are used to identify whether a pattern exists in a given sequence of characters (string) or not.
- Using these regular expressions, you specify the rules for the set of possible strings that you want to match; this set might contain English sentences, or e-mail addresses, or anything you like.
- They help in manipulating textual data, which is often a prerequisite for data science projects that involve text mining.
RegEx
- The most common uses of regular expressions are:
  - Searching a string (search and match)
  - Finding all occurrences of a pattern (findall)
  - Breaking a string into substrings (split)
  - Replacing part of a string (sub)
import re
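As a quick tour of the uses listed above, the sketch below calls each function once on the same made-up string. It is only an illustration; each function is covered in more detail in its own section.

```python
import re

text = "cats and dogs"  # hypothetical sample text

# match() only looks at the beginning of the string, so this finds nothing.
at_start = re.match(r"dogs", text)

# search() scans the whole string and stops at the first occurrence.
anywhere = re.search(r"dogs", text)

# findall() collects every match, split() cuts at each match,
# and sub() replaces each match.
words_ending_in_s = re.findall(r"\w+s", text)
parts = re.split(r" and ", text)
replaced = re.sub(r"and", "or", text)

print(at_start)           # None
print(anywhere.group())   # dogs
print(words_ending_in_s)  # ['cats', 'dogs']
print(parts)              # ['cats', 'dogs']
print(replaced)           # cats or dogs
```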
Metacharacters
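A few commonly used metacharacters can be tried out with the sketch below. This is not a complete list, and the patterns and sample strings are hypothetical examples chosen for illustration.

```python
import re

# Each pattern is paired with a sample string it is searched in.
samples = {
    r"^Hello":  "Hello world",  # ^ anchors the match at the start of the string
    r"world$":  "Hello world",  # $ anchors the match at the end of the string
    r"colou?r": "color",        # ? means zero or one of the preceding item
    r"ab*c":    "ac",           # * means zero or more of the preceding item
    r"ab+c":    "abbc",         # + means one or more of the preceding item
    r"[aeiou]": "sky tree",     # [] matches any one character from the set
    r"\d{4}":   "year 2024",    # {4} means exactly four of the preceding item
    r"cat|dog": "hot dog",      # | means either the left or the right alternative
}

results = {pat: re.search(pat, s).group() for pat, s in samples.items()}

for pat, matched in results.items():
    print(pat, "=>", matched)
```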
The search() function
- The re.search() method takes a regular expression pattern and a string and searches for that pattern within the string.
- If the search is successful, search() returns a match object; otherwise it returns None.
- The search proceeds through the string from start to end, stopping at the first match found.
- All of the pattern must be matched, but not all of the string.
- If m_obj = re.search(pat, str) is successful, m_obj is not None, and in particular m_obj.group() is the matching text.
- Group extraction: the "group" feature of a regular expression allows you to pick out parts of the matching text.
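Because search() returns None when nothing matches, it is safer to check the result before calling group(). Here is a minimal sketch of that check; the helper name find_first_number and the sample strings are made up for illustration.

```python
import re

def find_first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    m_obj = re.search(r"\d+", text)
    if m_obj is None:
        # Calling m_obj.group() here would raise AttributeError.
        return None
    return m_obj.group()

print(find_first_number("Order 66 shipped in 3 days"))  # 66
print(find_first_number("no digits here"))              # None
```

Only the first match is returned ("66", not "3"), since the search stops at the first match found.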
import re
string="We are using Regex for finding something in string"
pattern='ing'
m_obj=re.search(pattern,string)
print(m_obj)
Output=<re.Match object; span=(9, 12), match='ing'>
print(m_obj.group())
Output=ing
print(m_obj.start(),m_obj.end())
Output=9 12
d=re.search(r'\d',"I was born on March 3")
if d:
    print("Pattern found=>",d.group())
else:
    print("Pattern is not found")
Output=Pattern found=> 3
line="We use regex for finding emails example123.py Jungareh3@gamil.com"
m=re.search(r'([\w\.]+)@([\w\.]+)',line)
m.group()
Output='Jungareh3@gamil.com'
if m:
    print(m.groups())
    print(m.group())
    print(m.group(1))
    print(m.group(2))
Output=('Jungareh3', 'gamil.com')
Jungareh3@gamil.com
Jungareh3
gamil.com
The findall() function
- findall() is probably the single most powerful function in the re module. It finds all the matches and returns them as a list of strings, with each string representing one match.
- Group extraction
- The parenthesis () group mechanism can be combined with findall().
- If the pattern includes 2 or more parenthesis groups, then instead of returning a list of strings, findall() returns a list of *tuples*. Each tuple represents one match of the pattern.
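The effect of groups on the return value of findall() can be seen side by side in the sketch below. The sample text and variable names are hypothetical.

```python
import re

text = "width=80 height=24"  # hypothetical sample text

# No groups: each list item is the whole match.
no_groups = re.findall(r"\w+=\d+", text)

# One group: each list item is just the text of that group.
one_group = re.findall(r"\w+=(\d+)", text)

# Two groups: each list item is a tuple with one entry per group.
two_groups = re.findall(r"(\w+)=(\d+)", text)

print(no_groups)   # ['width=80', 'height=24']
print(one_group)   # ['80', '24']
print(two_groups)  # [('width', '80'), ('height', '24')]
```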
line="jungareh3@gmail.com himasnhujungare0304@gmail.com"
m=re.findall(r'[\w\.]+@[\w\.]+',line)
print(m)
Output=['jungareh3@gmail.com', 'himasnhujungare0304@gmail.com']
emails=re.findall(r'([\w\.]+)@([\w\.]+)',line)
print(emails)
Output=[('jungareh3', 'gmail.com'), ('himasnhujungare0304', 'gmail.com')]
for email in emails:
    print(email[0],email[1])
Output=jungareh3 gmail.com
himasnhujungare0304 gmail.com
The split() function
- The split() function returns a list where the string has been split at each match.
- If capturing parentheses are used in the pattern, then the text of all groups in the pattern is also returned as part of the resulting list.
- If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list.
- \W - Matches any non-word character (anything other than a letter, digit, or underscore).
- \D - Matches any character that is not a decimal digit.
- \s - Matches any whitespace character.
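The points about capturing parentheses and maxsplit can be sketched as follows. The sample text and variable names are made up for illustration.

```python
import re

text = "a, b; c"  # hypothetical sample text

# Plain split: the separators are discarded.
plain = re.split(r"[,;]\s*", text)

# Capturing parentheses: the text of the group is kept in the result.
with_separators = re.split(r"([,;])\s*", text)

# maxsplit=1: only one split is made, the rest stays in the last element.
limited = re.split(r"[,;]\s*", text, maxsplit=1)

print(plain)            # ['a', 'b', 'c']
print(with_separators)  # ['a', ',', 'b', ';', 'c']
print(limited)          # ['a', 'b; c']
```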
string="we are splitting the string"
m=re.split(r'\s',string)
print(m)
Output=['we', 'are', 'splitting', 'the', 'string']
string="we are splitting the string"
m=re.split(r'\W',string,maxsplit=2)
print(m)
Output=['we', 'are', 'splitting the string']
re.split(r'[A-Za-z]+','03ba3b2004')
Output=['03', '3', '2004']
re.split(r'\D','03ba3b2004')
Output=['03', '', '3', '2004']
The sub() function
- The re.sub(pat, replacement, str) function searches for all the instances of the pattern in the given string and replaces them.
- If the pattern is not found, re.sub() returns the original string unchanged.
- You can limit the number of replacements by specifying the count parameter.
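The three behaviours above (replace everything, leave the string alone when nothing matches, and stop after count replacements) can be sketched on one made-up string:

```python
import re

text = "2024-01-15"  # hypothetical sample text

# Every "-" is replaced.
all_replaced = re.sub(r"-", "/", text)

# There is no "/" in the text, so the original string comes back unchanged.
unchanged = re.sub(r"/", "-", text)

# count=1 stops after the first replacement.
first_only = re.sub(r"-", "/", text, count=1)

print(all_replaced)  # 2024/01/15
print(unchanged)     # 2024-01-15
print(first_only)    # 2024/01-15
```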
string="This is an example of sub()"
n=re.sub(r'\s','--',string)
print(n)
Output=This--is--an--example--of--sub()
string="This is an example of sub()"
n=re.sub(r'\s','--',string,count=2)
print(n)
Output=This--is--an example of sub()
Best regards from,
msbtenotes:)