How to Get Values by Format in Python
Goal
Today’s goal is to check whether a string follows a given format and to extract values from it according to that format. The following is an example.
# the format is "My name is <NAME>, the phone number is <NUMBER>"
# (<NUMBER> is separated by 2 '-')
str1 = "My name is Nako, the phone number is 123-456-7890"
# str1 follows the format, <NAME> is "Nako" and <NUMBER> is "1234567890" in this case
str2 = "I am Nako, the phone number is 1234567890"
# str2 doesn't follow the format
Environment
Python 3.8.7
Method
Use a regular expression
Create a regular expression object with re.compile(pattern, flags=0). Its Pattern.match(string) method checks whether the string matches the pattern: it returns a Match object on success and None otherwise.
# coding: utf-8
import re

str1 = "My name is Nako, the phone number is 123-456-7890"
str2 = "I am Nako, the phone number is 1234567890"
prog = re.compile("My name is [A-Za-z]+, the phone number is [0-9]+-[0-9]+-[0-9]+$")
print(prog.match(str1))
# output => <re.Match object; span=(0, 49), match='My name is Nako, the phone number is 123-456-7890>
print(prog.match(str2))
# output => None
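Because Pattern.match returns either a Match object or None, the check can be wrapped in a small function that returns a boolean. The following is a minimal sketch; the names FORMAT and follows_format are made up for this illustration and are not part of the re module.

```python
import re

# Hypothetical names (FORMAT, follows_format) chosen for this sketch.
# The pattern itself is the one used in the example above.
FORMAT = re.compile("My name is [A-Za-z]+, the phone number is [0-9]+-[0-9]+-[0-9]+$")


def follows_format(text):
    """Return True if text matches FORMAT starting at its first character."""
    # Pattern.match returns a Match object on success and None on failure.
    return FORMAT.match(text) is not None


print(follows_format("My name is Nako, the phone number is 123-456-7890"))  # True
print(follows_format("I am Nako, the phone number is 1234567890"))  # False
```

Compiling the pattern once at module level lets the same Pattern object be reused for every call.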
You can get the values by grouping parts of the regular expression with (). The value of each group can be retrieved with Match.group([group1, …]).
group(0) is the entire match and group(n) is the match for the n-th group.
# coding: utf-8
import re

str1 = "My name is Nako, the phone number is 123-456-7890"
prog = re.compile("My name is ([A-Za-z]+), the phone number is ([0-9]+-[0-9]+-[0-9]+)$")
match = prog.match(str1)
if match:
    print(match.group(0))
    # output => My name is Nako, the phone number is 123-456-7890
    print(match.group(1))
    # output => Nako
    print(match.group(2))
    # output => 123-456-7890
    print(match.group(2).replace('-', ''))
    # output => 1234567890
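Groups can also be given names with the (?P<name>...) syntax, so that the values are looked up by name instead of by position. The following is a minimal sketch of that variant; the group names name and number and the helper parse_profile are assumptions made for this illustration.

```python
import re

# Same format as above, but each group is named with (?P<...>...).
# The group names and the helper name are hypothetical.
PATTERN = re.compile(
    "My name is (?P<name>[A-Za-z]+), "
    "the phone number is (?P<number>[0-9]+-[0-9]+-[0-9]+)$"
)


def parse_profile(text):
    """Return the extracted values as a dict, or None if the format differs."""
    match = PATTERN.match(text)
    if match is None:
        return None
    return {
        "name": match.group("name"),
        # Remove the '-' separators to get the bare digits.
        "number": match.group("number").replace("-", ""),
    }


print(parse_profile("My name is Nako, the phone number is 123-456-7890"))
# output => {'name': 'Nako', 'number': '1234567890'}
print(parse_profile("I am Nako, the phone number is 1234567890"))
# output => None
```

Named groups keep working when another group is later inserted into the pattern, whereas numeric indexes would shift.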
Postscript
$ as the end of the string
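Pattern.match(string) only requires the pattern to match at the beginning of the string, so any characters after the matched part are ignored. End the pattern with $, which matches the end of the string, to reject input that has extra trailing characters.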
fullmatch method
If you don’t use $, use Pattern.fullmatch(string) instead of Pattern.match(string). It succeeds only when the whole string matches the pattern.
import re

str1 = "My name is Nako, the phone number is 123-456-7890ABCDE"
prog = re.compile("My name is ([A-Za-z]+), the phone number is ([0-9]+-[0-9]+-[0-9]+)")
print(prog.match(str1))
# output => <re.Match object; span=(0, 49), match='My name is Nako, the phone number is 123-456-7890>
print(prog.fullmatch(str1))
# output => None
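The two approaches are not completely interchangeable: $ also matches just before a newline at the very end of the string, while fullmatch requires every character to be consumed. The following sketch shows the difference on a shortened pattern that only covers the phone number; the variable names are chosen for this illustration.

```python
import re

# Shortened patterns covering only the phone number part.
with_dollar = re.compile("[0-9]+-[0-9]+-[0-9]+$")
without_dollar = re.compile("[0-9]+-[0-9]+-[0-9]+")

# Input with a trailing newline, e.g. a line read from a file.
text = "123-456-7890\n"

# $ matches just before the final newline, so match() succeeds.
print(with_dollar.match(text) is not None)
# output => True

# fullmatch() must consume the whole string, including the newline.
print(without_dollar.fullmatch(text) is not None)
# output => False
```

If a trailing newline should be rejected, fullmatch is the stricter choice.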
pattern with length limitation
The number of characters can be limited with {} in a regular expression. For example, [0-9]{4} matches exactly four digits.
import re

str1 = "My name is Nako, the phone number is 123-456-78900000"
prog = re.compile("My name is ([A-Za-z]+), the phone number is ([0-9]{3}-[0-9]{3}-[0-9]{4})$")
print(prog.match(str1))
# output => None
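Besides an exact count, {} also accepts a range in the form {m,n}, which matches between m and n repetitions. The following is a minimal sketch that assumes, purely for illustration, a number format whose middle part may have 3 or 4 digits.

```python
import re

# Assumed format for this sketch: 3 digits, then 3 or 4 digits, then 4 digits.
number = re.compile("[0-9]{3}-[0-9]{3,4}-[0-9]{4}")

candidates = ["123-456-7890", "123-4567-7890", "123-45-7890", "123-45678-7890"]
for candidate in candidates:
    # fullmatch() succeeds only if the whole string fits the pattern.
    print(candidate, number.fullmatch(candidate) is not None)
# output => 123-456-7890 True
# output => 123-4567-7890 True
# output => 123-45-7890 False
# output => 123-45678-7890 False
```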