3.3 Performing Matches

Once you have an object representing a compiled regular expression, what do you do with it? RegexObject instances have several methods and attributes. Only the most significant ones will be covered here; consult the Library Reference for a complete listing.

Method/Attribute   Purpose
match()            Determine if the RE matches at the beginning of the string.
search()           Scan through a string, looking for any location where this RE matches.
split()            Split the string into a list, splitting it wherever the RE matches.
sub()              Find all substrings where the RE matches, and replace them with a different string.
subn()             Does the same thing as sub(), but returns the new string and the number of replacements.

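match() and search() return None if no match can be found. If they're successful, a MatchObject instance is returned, containing information about the match: where it starts and ends, the substring it matched, and more. (split(), sub(), and subn() return ordinary lists, strings, and tuples rather than MatchObject instances.)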
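As a rough illustration of the return values described above, the following sketch calls each of the five methods on the pattern [a-z]+ used later in this section. The sample strings and variable names are made up for the example, and the code is written so that it should also run under current Python versions.

```python
import re

# Same pattern as used in this section: one or more lowercase letters.
p = re.compile('[a-z]+')

# match() only succeeds if the RE matches at the very start of the string.
at_start = p.match('tempo 120')      # a match object
not_at_start = p.match('120 tempo')  # None: the string starts with a digit

# search() scans forward until it finds a match anywhere in the string.
found = p.search('120 tempo')        # a match object for 'tempo'

# split(), sub() and subn() return ordinary lists, strings and tuples,
# not match objects.
pieces = p.split('1abc2def3')          # ['1', '2', '3']
replaced = p.sub('-', '1abc2def3')     # '1-2-3'
replaced_n = p.subn('-', '1abc2def3')  # ('1-2-3', 2)
```

Note that subn() gives back a two-element tuple: the new string, followed by the number of replacements that were made.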

You can learn about this by interactively experimenting with the re module. (If you have Tkinter available, you may also want to look at regexdemo.py, a demonstration program included with the Python distribution. It allows you to enter REs and strings, and displays whether the RE matches or fails. regexdemo.py can be quite useful when trying to debug a complicated RE.)

First, run the Python interpreter, import the re module, and compile a RE:

Python 1.5.1 (#6, Jul 17 1998, 20:38:08)  [GCC] on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import re
>>> p = re.compile('[a-z]+')
>>> p
<re.RegexObject instance at 80c3c28>

Now, you can try matching various strings against the RE [a-z]+. An empty string shouldn't match at all, since + means 'one or more repetitions'. match() should return None in this case, which will cause the interpreter to print no output. You can explicitly print the result of match() to make this clear.

>>> p.match( "" )
>>> print p.match( "" )
None

Now, let's try it on a string that it should match, such as "tempo". In this case, match() will return a MatchObject, so you should store the result in a variable for later use.

>>> m = p.match( 'tempo')
>>> print m
<re.MatchObject instance at 80c4f68>

Now you can query the MatchObject for information about the matching string. MatchObject instances also have several methods and attributes; the most important ones are:

Method/Attribute   Purpose
group()            Return the string matched by the RE.
start()            Return the starting position of the match.
end()              Return the ending position of the match.
span()             Return a tuple containing the (start, end) positions of the match.

Trying these methods will soon clarify their meaning:

>>> m.group()
'tempo'
>>> m.start(), m.end()
(0, 5)
>>> m.span()
(0, 5)

group() returns the substring that was matched by the RE. start() and end() return the starting and ending indexes of the match. span() returns both start and end indexes in a single tuple. Since the match method only checks if the RE matches at the start of a string, start() will always be zero. However, the search method of RegexObject instances scans through the string, so the match may not start at zero in that case.

>>> print p.match('::: message')
None
>>> m = p.search('::: message') ; print m
<re.MatchObject instance at 80c9650>
>>> m.group()
'message'
>>> m.span()
(4, 11)
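The relationship between group(), start(), end() and span() in the session above can be checked with an ordinary string slice. This is only a small sketch; the variable names text, start, end and matched are introduced here for illustration.

```python
import re

p = re.compile('[a-z]+')
text = '::: message'

# search() skips the leading punctuation and space and finds 'message'.
m = p.search(text)

# span() is just (start(), end()) packaged as a tuple.
start, end = m.span()

# end() is the index one past the last matched character, so an ordinary
# slice of the original string reproduces what group() returns.
matched = text[start:end]
```

Here start is 4 and end is 11, so text[4:11] is 'message', the same value returned by m.group().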

In actual programs, the most common style is to store the MatchObject in a variable, and then check whether it is None. This usually looks like:

p = re.compile( ... )
m = p.match( 'string goes here' )
if m:
    print 'Match found: ', m.group()
else:
    print 'No match'
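The same pattern can be wrapped in a small function. The sketch below is one possible way to do it, not the only one: describe_match is a hypothetical helper name, and the pattern [a-z]+ is simply borrowed from earlier in this section to stand in for the one left unspecified above. It is written so that it should also run under current Python versions.

```python
import re

# Pattern chosen for illustration only; the snippet above leaves it unspecified.
p = re.compile('[a-z]+')

def describe_match(pattern, text):
    """Return a short message saying whether pattern matches at the start of text."""
    m = pattern.match(text)
    # match() returns None on failure, so test for that explicitly.
    if m is None:
        return 'No match'
    return 'Match found: ' + m.group()

print(describe_match(p, 'tempo'))
print(describe_match(p, '::: message'))
```

The first call reports 'Match found: tempo'. The second reports 'No match', because match() only looks at the beginning of the string. Testing "m is None" and testing "if m:" behave the same way here, since a successful match object is always treated as true.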