Another common task is to find all the matches for a pattern, and
replace them with a different string. The sub() method
takes a replacement value, which can be either a string or a function,
and the string to be processed. Python strings are immutable, so this
function will return a new string.
sub(replacement, string[, count = 0])
Returns the string obtained by replacing the leftmost non-overlapping
occurrences of the RE in string with replacement. If the pattern
isn't found, string is returned unchanged.
The optional argument count is the maximum number of pattern
occurrences to be replaced; count must be a non-negative
integer. The default value of 0 means to replace all occurrences.
Here's a simple example of using the sub() method.
>>> p = re.compile( '(blue|white|red)')
>>> p.sub( 'colour', 'blue socks and red shoes')
'colour socks and colour shoes'
>>> p.sub( 'colour', 'blue socks and red shoes', 1)
'colour socks and red shoes'
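Empty matches are replaced only when they're not adjacent to a
previous match. (Python 3.7 and later also replace an empty match
that directly follows a non-empty match, so on those versions the
example below returns '-a-b--d-'.)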
>>> p = re.compile('x*')
>>> p.sub('-', 'abxd')
'-a-b-d-'
If replacement is a string, any backslash escapes in it are
processed. That is, "\n" is converted to a single newline
character, "\r" is converted to a carriage return, and so forth.
Unknown escapes such as "\j" are left alone. Backreferences,
such as "\6", are replaced with the substring matched by the
corresponding group in the RE. This lets you incorporate
portions of the original text in the resulting
replacement string.
>>> p = re.compile('section{ ( [^}]* ) }', re.VERBOSE)
>>> p.sub(r'subsection{\1}','section{First} section{second}')
'subsection{First} subsection{second}'
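As a further illustration of backreferences and escape processing, here is a minimal sketch. The pattern, the variable names and the sample text are made up for this example and are not part of the original document.

```python
import re

# Hypothetical pattern: a "Last, First" pair captured as two groups.
name_pattern = re.compile(r'(\w+), (\w+)')

# \2 and \1 reinsert the captured groups in swapped order.
swapped = name_pattern.sub(r'\2 \1', 'Lovelace, Ada and Hopper, Grace')
print(swapped)

# The two-character escape \n in the replacement is turned into a
# real newline by sub(), even though a raw string is passed in.
one_per_line = re.sub(r';\s*', r'\n', 'alpha; beta; gamma')
print(one_per_line)
```

The first call should print 'Ada Lovelace and Grace Hopper'; the second prints the three words on separate lines.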
In addition to character escapes and backreferences as described
above, "\g<name>" will use the substring matched by the group
named "name", as defined by the (?P<name>...) syntax.
"\g<number>" uses the corresponding group number.
"\g<2>" is therefore equivalent to "\2", but isn't
ambiguous in a replacement string such as "\g<2>0". ("\20"
would be interpreted as a reference to group 20, not a reference
to group 2 followed by the literal character "0".) The
following substitutions are all equivalent, but use all three
variations of the replacement string.
>>> p = re.compile('section{ (?P<name> [^}]* ) }', re.VERBOSE)
>>> p.sub(r'subsection{\1}','section{First}')
'subsection{First}'
>>> p.sub(r'subsection{\g<1>}','section{First}')
'subsection{First}'
>>> p.sub(r'subsection{\g<name>}','section{First}')
'subsection{First}'
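The ambiguity that "\g<2>0" avoids can be seen in a small sketch. The two-group pattern and the sample text below are assumptions chosen only for illustration.

```python
import re

# Hypothetical pattern with exactly two groups: a letter and a digit.
p = re.compile(r'([a-z])(\d)')

# \g<2>0 means "group 2, then a literal 0".
result = p.sub(r'\g<2>0', 'a1 b2')
print(result)

# \20 is read as a reference to group 20, which this pattern does not
# have, so the re module reports an invalid group reference.
ambiguous_error = None
try:
    p.sub(r'\20', 'a1 b2')
except re.error as exc:
    ambiguous_error = str(exc)
print(ambiguous_error)
```

Here result should be '10 20', while the second call fails instead of silently producing the same text.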
replacement can also be a function, which gives you even more
powerful control. If replacement is a function, the function is
called for every non-overlapping occurrence of pattern, and is
passed a MatchObject argument. The function can use that
information to compute the desired replacement string and return it.
For example:
>>> def hexrepl( match ):
...     "Return the hex string for a decimal number"
...     value = int( match.group() )
...     return hex(value)
...
>>> p = re.compile(r'\d+')
>>> p.sub(hexrepl, 'Call 65490 for printing, 49152 for user code.')
'Call 0xffd2 for printing, 0xc000 for user code.'
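A replacement function can also read individual groups from the match it receives. The following is a sketch under stated assumptions: the lookup table, the function name to_metres and the sample sentence are hypothetical and serve only to show the mechanism.

```python
import re

# Hypothetical lookup table: how many metres one unit represents.
METRES_PER_UNIT = {'km': 1000, 'm': 1}

def to_metres(match):
    "Return the matched distance rewritten in metres"
    value = int(match.group(1))   # the digits
    unit = match.group(2)         # 'km' or 'm'
    return '%d m' % (value * METRES_PER_UNIT[unit])

p = re.compile(r'(\d+) (km|m)\b')
converted = p.sub(to_metres, 'Run 5 km, then walk 300 m.')
print(converted)
```

This should print 'Run 5000 m, then walk 300 m.'; each match is handed to to_metres separately, so every occurrence can get a different replacement.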
When using the module-level re.sub() function, the pattern
is passed as the first argument. The pattern may be a string or a
RegexObject; if you need to specify regular expression flags,
you must either use a RegexObject as the first parameter, or use
embedded modifiers in the pattern, e.g.
sub("(?i)b+", "x", "bbbb BBBB") returns 'x x'.
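The two options just described can be compared side by side. This is a small sketch; the variable names are chosen for illustration only.

```python
import re

text = 'bbbb BBBB'

# Option 1: an embedded modifier inside the pattern string.
with_inline_flag = re.sub('(?i)b+', 'x', text)

# Option 2: a compiled pattern carrying the flag, passed as the
# first argument to the module-level function.
compiled = re.compile('b+', re.IGNORECASE)
with_compiled = re.sub(compiled, 'x', text)

print(with_inline_flag, '|', with_compiled)
```

Both calls should give 'x x', since each matches the lowercase and the uppercase run without regard to case.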