Goal
Today’s goal is to check whether a string follows a given format and, if it does, to extract the values from it. The following is an example.
# The format is "My name is <NAME>, the phone number is <NUMBER>" (<NUMBER> is three groups of digits separated by two '-')
str1 = "My name is Nako, the phone number is 123-456-7890"
# str1 follows the format; <NAME> is "Nako" and <NUMBER> is "1234567890" in this case
str2 = "I am Nako, the phone number is 1234567890"
# str2 doesn't follow the format
Environment
Python 3.8.7
Method
Use a regular expression
Create a regular expression object with re.compile(pattern, flags=0). Its Pattern.match(string) method checks whether the string matches the pattern, starting from the beginning of the string. It returns a Match object on success and None otherwise.
# coding: utf-8
import re
str1 = "My name is Nako, the phone number is 123-456-7890"
str2 = "I am Nako, the phone number is 1234567890"
prog = re.compile("My name is [A-Za-z]+, the phone number is [0-9]+-[0-9]+-[0-9]+$")
print(prog.match(str1))
# output => <re.Match object; span=(0, 49), match='My name is Nako, the phone number is 123-456-7890'>
print(prog.match(str2))
# output => None
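Because Pattern.match returns either a Match object or None, the result can be turned into a simple yes/no check. The following is a minimal sketch; the helper name follows_format is hypothetical and not part of the re module.

```python
import re

# Same pattern as above, written as a raw string.
PROG = re.compile(r"My name is [A-Za-z]+, the phone number is [0-9]+-[0-9]+-[0-9]+$")


def follows_format(text):
    """Hypothetical helper: True if text follows the format, else False."""
    # match() returns a Match object on success and None on failure.
    return PROG.match(text) is not None


print(follows_format("My name is Nako, the phone number is 123-456-7890"))
# output => True
print(follows_format("I am Nako, the phone number is 1234567890"))
# output => False
# match() only tries at the beginning of the string, so extra leading text also fails.
print(follows_format("Hello. My name is Nako, the phone number is 123-456-7890"))
# output => False
```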
You can extract values by grouping parts of the regular expression with (). The value of each group can be retrieved with Match.group([group1, …]).
group(0) is the entire match and group(n) is the match for the n-th group.
# coding: utf-8
import re
str1 = "My name is Nako, the phone number is 123-456-7890"
prog = re.compile("My name is ([A-Za-z]+), the phone number is ([0-9]+-[0-9]+-[0-9]+)$")
match = prog.match(str1)
if match:
    print(match.group(0)) # output => My name is Nako, the phone number is 123-456-7890
    print(match.group(1)) # output => Nako
    print(match.group(2)) # output => 123-456-7890
    print(match.group(2).replace('-', '')) # output => 1234567890
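The check and the extraction can be combined into one function. This is only a sketch: parse_contact is a hypothetical name, and returning None for a non-matching string is an assumption about how the caller wants failures reported.

```python
import re

PROG = re.compile(r"My name is ([A-Za-z]+), the phone number is ([0-9]+-[0-9]+-[0-9]+)$")


def parse_contact(text):
    """Hypothetical helper: return (name, number), or None if the format is not followed."""
    match = PROG.match(text)
    if match is None:
        return None
    # groups() returns all groups as a tuple, starting from group 1.
    name, number = match.groups()
    return name, number.replace("-", "")


print(parse_contact("My name is Nako, the phone number is 123-456-7890"))
# output => ('Nako', '1234567890')
print(parse_contact("I am Nako, the phone number is 1234567890"))
# output => None
```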
Postscript
$ as the end of the string
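Pattern.match(string) only checks for a match at the beginning of the string, so any characters after the matched part are ignored. Put $, which matches at the end of the string (or just before a newline at the very end), at the end of the pattern so that trailing characters are rejected.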
fullmatch method
If you don’t use $, use Pattern.fullmatch(string) instead of Pattern.match(string). It succeeds only when the whole string matches the pattern.
import re
str1 = "My name is Nako, the phone number is 123-456-7890ABCDE"
prog = re.compile("My name is ([A-Za-z]+), the phone number is ([0-9]+-[0-9]+-[0-9]+)")
print(prog.match(str1))
# output => <re.Match object; span=(0, 49), match='My name is Nako, the phone number is 123-456-7890'>
print(prog.fullmatch(str1))
# output => None
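match with $ and fullmatch without $ behave the same in most cases, but not when the string ends with a newline: $ also matches just before a final newline, while fullmatch requires the pattern to cover every character. The following sketch compares the two; the variable names are chosen only for illustration.

```python
import re

body = "My name is ([A-Za-z]+), the phone number is ([0-9]+-[0-9]+-[0-9]+)"
with_dollar = re.compile(body + "$")  # used with match()
without_dollar = re.compile(body)     # used with fullmatch()

exact = "My name is Nako, the phone number is 123-456-7890"
cases = [
    ("exact", exact),
    ("trailing text", exact + "ABCDE"),
    ("trailing newline", exact + "\n"),
]

for label, s in cases:
    by_match = with_dollar.match(s) is not None
    by_fullmatch = without_dollar.fullmatch(s) is not None
    print(label, by_match, by_fullmatch)
# output => exact True True
# output => trailing text False False
# output => trailing newline True False
```

If a trailing newline must be rejected, fullmatch is the stricter choice.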
pattern with length limitation
The length of each part of the input can be fixed with {} in the regular expression. For example, [0-9]{3} matches exactly three digits.
import re
str1 = "My name is Nako, the phone number is 123-456-78900000"
prog = re.compile("My name is ([A-Za-z]+), the phone number is ([0-9]{3}-[0-9]{3}-[0-9]{4})$")
print(prog.match(str1))
# output => None
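To see the length limitation from both sides, the same pattern can be tried against a number with the expected 3-3-4 digits, one with a group that is too long, and one with a group that is too short. This is a small sketch; the candidate strings are made up for illustration.

```python
import re

# {3} means exactly three repetitions and {4} exactly four.
prog = re.compile(r"My name is ([A-Za-z]+), the phone number is ([0-9]{3}-[0-9]{3}-[0-9]{4})$")

candidates = [
    "My name is Nako, the phone number is 123-456-7890",      # 3-3-4 digits
    "My name is Nako, the phone number is 123-456-78900000",  # last group too long
    "My name is Nako, the phone number is 12-456-7890",       # first group too short
]

results = [prog.match(s) is not None for s in candidates]
print(results)
# output => [True, False, False]
```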