How to Get Values With a Format in Python

Goal

Today’s goal is to check whether a string follows a given format and, if it does, to extract the values from it. The following is an example.

# the format is "My name is <NAME>, the phone number is <NUMBER>" (<NUMBER> is three groups of digits separated by two '-')

str1 = "My name is Nako, the phone number is 123-456-7890"
# str1 follows the format; <NAME> is "Nako" and <NUMBER> is "1234567890" (hyphens removed) in this case

str2 = "I am Nako, the phone number is 1234567890"
# str2 doesn't follow the format

Environment

Python 3.8.7

Method

Use regular expressions

Create a regular expression object with re.compile(pattern, flags=0). Its Pattern.match(string) method returns a Match object if the beginning of the string matches the pattern, and None otherwise.

# coding: utf-8
import re

str1 = "My name is Nako, the phone number is 123-456-7890"
str2 = "I am Nako, the phone number is 1234567890"

prog = re.compile("My name is [A-Za-z]+, the phone number is [0-9]+-[0-9]+-[0-9]+$")

print(prog.match(str1))
# output => <re.Match object; span=(0, 49), match='My name is Nako, the phone number is 123-456-7890>
print(prog.match(str2))
# output => None

You can extract values by grouping parts of the regular expression with (). The value of each group can be retrieved with Match.group([group1, …]).

group(0) is the entire match and group(n) is the match for the n-th group.

# coding: utf-8
import re

str1 = "My name is Nako, the phone number is 123-456-7890"

prog = re.compile("My name is ([A-Za-z]+), the phone number is ([0-9]+-[0-9]+-[0-9]+)$")

match = prog.match(str1)
if match:
    print(match.group(0)) # output => My name is Nako, the phone number is 123-456-7890
    print(match.group(1)) # output => Nako
    print(match.group(2)) # output => 123-456-7890
    print(match.group(2).replace('-', '')) # output => 1234567890
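
The grouping steps above can be wrapped into a small helper. The following is only a minimal sketch: the names PROFILE_PATTERN and parse_profile are hypothetical, chosen for illustration, and are not part of the original example.

```python
import re

# Hypothetical names for illustration: PROFILE_PATTERN and parse_profile.
PROFILE_PATTERN = re.compile(
    r"My name is ([A-Za-z]+), the phone number is ([0-9]+-[0-9]+-[0-9]+)$"
)


def parse_profile(text):
    """Return (name, number_without_hyphens), or None if text doesn't follow the format."""
    match = PROFILE_PATTERN.match(text)
    if match is None:
        return None
    name, number = match.groups()  # groups() returns all captured groups as a tuple
    return name, number.replace("-", "")


print(parse_profile("My name is Nako, the phone number is 123-456-7890"))
# output => ('Nako', '1234567890')
print(parse_profile("I am Nako, the phone number is 1234567890"))
# output => None
```

Returning None for a string that doesn't follow the format mirrors what Pattern.match does, so the caller can use a plain if check.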

Postscript

$ as the end of the string

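Use $, which matches the end of the string, because Pattern.match(string) only requires the pattern to match at the beginning of the string. Without $, any characters that follow the matched part are ignored and the string is still treated as a match.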

fullmatch method

If you don’t use $, use Pattern.fullmatch(string) instead of Pattern.match(string). fullmatch returns a Match object only when the whole string matches the pattern.

import re

str1 = "My name is Nako, the phone number is 123-456-7890ABCDE"
prog = re.compile("My name is ([A-Za-z]+), the phone number is ([0-9]+-[0-9]+-[0-9]+)")

print(prog.match(str1))
# output => <re.Match object; span=(0, 49), match='My name is Nako, the phone number is 123-456-7890>

print(prog.fullmatch(str1))
# output => None
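
One difference between the two approaches may be worth a short sketch: $ also matches just before a newline at the very end of the string, so match with $ accepts an input that ends with a newline, while fullmatch without $ rejects it. The variable names below are only for illustration.

```python
import re

with_dollar = re.compile(
    r"My name is ([A-Za-z]+), the phone number is ([0-9]+-[0-9]+-[0-9]+)$"
)
without_dollar = re.compile(
    r"My name is ([A-Za-z]+), the phone number is ([0-9]+-[0-9]+-[0-9]+)"
)

# Same input as before, but with a trailing newline (e.g. a line read from a file).
line = "My name is Nako, the phone number is 123-456-7890\n"

print(with_dollar.match(line) is not None)
# output => True
print(without_dollar.fullmatch(line) is not None)
# output => False
```

If a trailing newline should not be accepted, fullmatch is the stricter choice of the two.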

pattern with length limitation

The length of each part of the input can be limited with {n} in the regular expression, which matches exactly n repetitions of the preceding element.

import re

str1 = "My name is Nako, the phone number is 123-456-78900000"
prog = re.compile("My name is ([A-Za-z]+), the phone number is ([0-9]{3}-[0-9]{3}-[0-9]{4})$")

print(prog.match(str1))
# output => None
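
For comparison, here is a minimal sketch of the same pattern applied to one input that fits the length limits and one that doesn't. The variable names valid and too_short are only for illustration.

```python
import re

# {3} and {4} require exactly that many digits in each part of the number.
prog = re.compile(
    r"My name is ([A-Za-z]+), the phone number is ([0-9]{3}-[0-9]{3}-[0-9]{4})$"
)

valid = "My name is Nako, the phone number is 123-456-7890"
too_short = "My name is Nako, the phone number is 12-456-7890"

match = prog.match(valid)
print(match.group(2))
# output => 123-456-7890
print(prog.match(too_short))
# output => None
```

A range can be given as {m,n} when a part may vary in length, for example [0-9]{2,4} for two to four digits.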