1. Introduction

The re module was added in Python 1.5 and provides Perl-style regular expressions. Earlier versions of Python came with the regex module, which provided Emacs-style expressions. Emacs-style expressions are slightly less readable and offer fewer features, so there's little reason to use the regex module when writing new code, though you should be aware of it in order to read older code.

Regular expressions (or REs) are essentially a tiny, highly specialized programming language embedded inside Python and made available through the re module. Using this little language, you specify the rules for the set of possible strings that you want to match; this set might contain English sentences, or e-mail addresses, or TeX commands, or anything you like. You can then ask questions such as "Does this string match the pattern?" or "Is there a match for the pattern anywhere in this string?" You can also use REs to modify a string, or to split it apart in various ways.
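As a rough illustration of those four kinds of questions, here is a minimal sketch using the re module. The pattern (one or more digits) and the sample string are assumptions made up for this example, not something taken from the text above.

```python
import re

# Hypothetical pattern and sample text, chosen only for illustration.
pattern = re.compile(r"\d+")
text = "room 101, floor 3"

# "Does this string match the pattern?" match() only looks at the start.
matched_at_start = pattern.match(text)
print(matched_at_start)            # None, because the string starts with "room"

# "Is there a match for the pattern anywhere in this string?"
found = pattern.search(text).group()
print(found)                       # 101

# Modify the string: replace every match with "#".
replaced = pattern.sub("#", text)
print(replaced)                    # room #, floor #

# Split the string apart wherever the pattern matches.
pieces = pattern.split(text)
print(pieces)                      # ['room ', ', floor ', '']
```

Note that match() and search() answer different questions: the first checks only at the beginning of the string, while the second scans the whole string.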

Regular expression patterns are compiled into a series of bytecodes, which are then executed by a matching engine written in C. For advanced use, it may be necessary to pay careful attention to how the engine will execute a given RE, and to write the RE so that it produces bytecode that runs faster. Optimization isn't covered in this document, because it requires a good understanding of the matching engine's internals.
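From the Python side, the compilation step is visible as a call to re.compile(), which returns a pattern object that can be reused. The sketch below only shows that step; the pattern and the input lines are hypothetical, and nothing here inspects or tunes the compiled form itself.

```python
import re

# Compile once; the resulting pattern object can be reused many times.
# The pattern (runs of lowercase letters) is an assumption for illustration.
word = re.compile(r"[a-z]+")

lines = ["alpha beta", "42", "gamma"]

# Reuse the same compiled pattern on every line.
counts = [len(word.findall(line)) for line in lines]
print(counts)  # [2, 0, 1]
```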

The regular expression language is relatively small and restricted, so not all possible string processing tasks can be done using regular expressions. There are also tasks that can be done with regular expressions, but the expressions turn out to be very complicated. In these cases, you may be better off writing Python code to do the processing; while Python code will be slower than an elaborate regular expression, it will also probably be more understandable.
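One illustration of such a task, chosen here as an assumption rather than taken from the text above, is checking that parentheses are balanced to an arbitrary nesting depth. The re module's pattern language has no way to count nesting, but a few lines of ordinary Python handle it. The function name is_balanced is hypothetical.

```python
def is_balanced(s):
    """Return True if every "(" in s has a matching ")" in the right order."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ")" appeared before its matching "(".
                return False
    # Balanced only if every "(" was closed.
    return depth == 0


print(is_balanced("f(a, g(b))"))  # True
print(is_balanced("f(a))("))      # False
```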