Content Control
Mercury's Content Control Filtering Language
Making the most of regular expressions
The CONTAINS test does a simple string search, looking for the exact text you provide anywhere in the message. Often, however, you may want to look for patterns of text rather than exact strings. You can do this by using a MATCHES test instead of a CONTAINS test: MATCHES tests use a pattern-matching mechanism called a regular expression to describe the general form of the text you want to find.
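As a rough illustration of the difference between the two tests, here is a Python sketch. It is not Mercury code: `contains_test` and `matches_test` are hypothetical names, the MATCHES side takes a pattern already written in Python's `re` syntax rather than Mercury's metasyntax, and both functions assume Mercury's default case-insensitive comparison. Treating MATCHES as a whole-text match is an inference from the subject-line example later in this section.

```python
import re


def contains_test(text: str, needle: str) -> bool:
    """Approximate CONTAINS: exact substring search anywhere in the text.

    Assumption: comparison is case-insensitive, Mercury's default.
    """
    return needle.casefold() in text.casefold()


def matches_test(text: str, python_pattern: str) -> bool:
    """Approximate MATCHES: the pattern must describe the whole text.

    Assumption: the pattern is already written in Python re syntax.
    """
    return re.fullmatch(python_pattern, text, re.IGNORECASE) is not None


subject = "Your order 12345"
print(contains_test(subject, "ORDER"))        # exact text found
print(matches_test(subject, r".*[0-9]{3,}"))  # subject ends in 3+ digits
print(contains_test(subject, "[0-9]"))        # no literal "[0-9]" in the text
```

The three calls print `True`, `True` and `False`: CONTAINS only ever looks for the literal characters it is given, so a pattern is meaningful only to a MATCHES-style test.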
Using regular expressions, you can detect extremely complex patterns of text within the messages you filter. Mercury's regular expressions use what is known as a metasyntax to describe the pattern you want to match: in the metasyntax, certain characters have special meanings that Mercury applies to the text it is testing.

Mercury's regular expression engine predates POSIX, Perl and other regex implementations, so if you are used to those formats, you may find it a little idiosyncratic.

The following special characters are recognized in your expressions:
*       Match any number of any characters
?       Match any single character (must match at least one character)
+       Match one or more occurrences of the last character or pattern
[...]   Set matching: the test will succeed if the next character in the input matches any character in the set. Ranges can be specified in the set using '-' (for example, [a-k] specifies every character from a to k inclusive).
[^...]  Set negation: the test will succeed if the next character in the input does not match any character in the set.
/w      Match zero or more whitespace characters
/W      Match one or more whitespace characters
/c      Toggle case sensitivity (case-insensitive by default)
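To make the metacharacters concrete, the sketch below translates a Mercury-style expression into an equivalent Python `re` pattern. It is an approximation built only from the table above, not Mercury's own engine: `mercury_to_python` and `mercury_matches` are hypothetical names, and the handling of edge cases (a leading `+`, an unrecognized `/` sequence, what counts as whitespace) is our assumption. Each Mercury atom becomes one Python atom; `+` wraps the previous atom in `(?:...)+`, and anything that follows a `/c` toggle is wrapped in Python's scoped `(?-i:...)` flag, which needs Python 3.6 or later.

```python
import re


def mercury_to_python(pattern: str) -> str:
    """Translate a Mercury-style MATCHES expression into a Python regex string.

    Approximation only. Assumptions: '+' repeats the single preceding atom,
    /c affects everything after it until the next /c, unrecognized '/x'
    sequences are literal text, and whitespace means Python's whitespace class.
    """
    atoms = []
    case_sensitive = False  # Mercury's default is case-insensitive
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "/" and i + 1 < len(pattern) and pattern[i + 1] in "wWc":
            code = pattern[i + 1]
            i += 2
            if code == "c":
                case_sensitive = not case_sensitive  # toggle; emits no atom
                continue
            atom = r"\s*" if code == "w" else r"\s+"
        elif ch == "+" and atoms:
            # One or more occurrences of the previous atom.
            atoms[-1] = f"(?:{atoms[-1]})+"
            i += 1
            continue
        elif ch == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                raise ValueError(f"unterminated set at position {i}")
            body = pattern[i + 1:end]
            negate = body.startswith("^")
            if negate:
                body = body[1:]
            if not body:
                raise ValueError(f"empty set at position {i}")
            parts = []
            for j, c in enumerate(body):
                # '-' between two characters is a range; anything else is literal.
                if c == "-" and 0 < j < len(body) - 1:
                    parts.append("-")
                else:
                    parts.append(re.escape(c))
            atom = "[" + ("^" if negate else "") + "".join(parts) + "]"
            i = end + 1
        elif ch == "*":
            atom = ".*"  # any number of any characters
            i += 1
        elif ch == "?":
            atom = "."  # exactly one character
            i += 1
        else:
            atom = re.escape(ch)  # ordinary character, or a leading '+'
            i += 1
        if case_sensitive:
            atom = f"(?-i:{atom})"  # switch off IGNORECASE for this atom only
        atoms.append(atom)
    return "".join(atoms)


def mercury_matches(text: str, pattern: str) -> bool:
    """Whole-text match; case-insensitive except where /c switched it off."""
    regex = mercury_to_python(pattern)
    return re.fullmatch(regex, text, re.IGNORECASE | re.DOTALL) is not None
```

For example, `mercury_to_python("*[0-9][0-9][0-9]+")` returns `.*[0-9][0-9](?:[0-9])+`. Using `re.fullmatch` reflects the point made below that an expression without a trailing `*` must match right up to the end of the text.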
You can use any number of metacharacters in an expression. For example, to detect all users at any system within the domain "", you could use the regular expression:
The set operation is especially powerful when combined with the repeat-occurrence operator. For example, to detect a message whose subject line ends in a group of three or more digits (a common indicator of a spam message), you would use this expression:

    *[0-9][0-9][0-9]+
In this expression, we use the "*" operator to match the general text within the subject line, then the set "[0-9]" three times to force a minimum of three digits, and a "+" operator to detect any extra digits following the third one. Because there is no "*" at the end of the expression, the digits must be the last characters on the line; if any text follows them, the expression will fail.
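The same subject-line test can be approximated in Python. This sketch hand-translates the expression described above, which we read as `*[0-9][0-9][0-9]+` (reconstructed from the walkthrough), into Python syntax. `subject_ends_in_digits` is a hypothetical helper, and `re.fullmatch` stands in for Mercury's requirement that the pattern account for the whole line.

```python
import re

# Hand translation of the Mercury expression *[0-9][0-9][0-9]+ :
#   *       -> .*     any leading text in the subject
#   [0-9]   -> [0-9]  one digit, written three times
#   +       -> +      repeats the third digit set, allowing extra digits
# fullmatch() anchors both ends, so the digits must finish the subject.
SPAM_SUBJECT = re.compile(r".*[0-9][0-9][0-9]+", re.IGNORECASE)


def subject_ends_in_digits(subject: str) -> bool:
    return SPAM_SUBJECT.fullmatch(subject) is not None


for subject in ["Cheap watches 48213", "Lunch at 12", "Invoice 2024 attached"]:
    print(f"{subject!r}: {subject_ends_in_digits(subject)}")
```

Only the first subject matches: "Lunch at 12" has just two digits, and "Invoice 2024 attached" has text after its digits, exactly the failure case described above.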
Case sensitivity
Normally, Mercury compares all text on a case-insensitive basis, which means that it regards "hello" and "HELLO" as the same. In some cases, though, the case of the text you're matching can be important, so the /c operator lets you toggle Mercury between case-insensitive and case-sensitive comparisons. So, to detect the string "FREE!" anywhere within the subject line of a message, you would use this expression:
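Python's `re` module has no mid-pattern toggle like `/c`; the nearest equivalent we know of is a scoped inline flag, `(?-i:...)`, available since Python 3.6. The sketch below is an analogue of the "FREE!" test, not Mercury syntax: the whole pattern is compiled case-insensitively, mirroring Mercury's default, and only the enclosed text must match case exactly. `shouts_free` is a hypothetical name.

```python
import re

# Case-insensitive overall, like Mercury's default; (?-i:...) makes only
# "FREE!" case-sensitive. The leading and trailing .* let it appear anywhere.
FREE_IN_SUBJECT = re.compile(r".*(?-i:FREE!).*", re.IGNORECASE | re.DOTALL)


def shouts_free(subject: str) -> bool:
    return FREE_IN_SUBJECT.fullmatch(subject) is not None


print(shouts_free("Get it FREE! today"))  # True
print(shouts_free("get it free! today"))  # False
```

Scoping the flag to the literal, rather than dropping `re.IGNORECASE` altogether, keeps the rest of the pattern case-insensitive, which is the effect of switching case sensitivity on only where it matters.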