Content Control
Mercury's Content Control Filtering Language
Making the most of regular expressions
The CONTAINS test does a simple string search, looking for the exact text you provide
anywhere in the message. Often, however, you may want to look for patterns of text rather than
exact strings: you can do this by using a MATCHES test instead of a CONTAINS test, because
MATCHES tests use a special pattern-matching mechanism called a regular expression to
describe the general form of text you want to find.
Using regular expressions, you can detect extremely complex patterns of text within the
messages you filter. Mercury's regular expression engine uses what is known as a metasyntax to
describe the pattern you want to match: in the metasyntax, certain characters have special
meanings that Mercury applies to the text it is testing.
Note: Mercury's regular expression engine predates Posix, Perl and other regex implementations,
so if you are used to those formats, you may find it a little idiosyncratic.
The following special characters are recognized in your expressions:
*        Match any number of any characters
?        Match any single character (must match at least one character)
+        Match one or more occurrences of the last character or pattern
[...]    Set matching: the test will succeed if the next character in the input
         matches any character in the set. Ranges can be specified in the set
         using '-' (for example, [a-k] specifies every character from a to k
         inclusive)
[^...]   Set negation: the test will succeed if the next character in the input
         does not match any character in the set
/w       Match zero or more whitespace characters
/W       Match one or more whitespace characters
/c       Toggle case sensitivity (case-insensitive by default)
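As a rough illustration of how the operators listed above behave, the following Python sketch translates a Mercury-style pattern into an expression for Python's re module. It is not Mercury's own engine: the function name mercury_to_regex is hypothetical, and the sketch assumes that any character not listed above is matched literally and that the caller uses fullmatch() so the pattern must account for the whole line. Mercury's actual behaviour may differ in edge cases.

```python
import re


def mercury_to_regex(pattern):
    """Approximate a Mercury MATCHES pattern as a compiled Python regex."""
    tokens = []             # one regex fragment per Mercury token
    case_sensitive = False  # matching starts out case-insensitive
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        nxt = pattern[i + 1:i + 2]
        if ch == "/" and nxt in ("w", "W", "c"):
            if nxt == "c":
                case_sensitive = not case_sensitive  # /c toggles the mode
            else:
                tokens.append(r"(?:\s*)" if nxt == "w" else r"(?:\s+)")
            i += 2
            continue
        if ch == "+" and tokens:
            tokens[-1] += "+"  # repeat the previous token one or more times
            i += 1
            continue
        if ch == "*":
            frag = ".*"
        elif ch == "?":
            frag = "."
        elif ch == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                raise ValueError("unterminated set in pattern")
            # Sets use the same [..] and [^..] notation in Python.
            frag = "[" + pattern[i + 1:end].replace("\\", "\\\\") + "]"
            i = end
        else:
            frag = re.escape(ch)  # everything else is assumed to be literal
        # Scoped inline flags keep each token's case mode independent.
        tokens.append("(?:" + frag + ")" if case_sensitive else "(?i:" + frag + ")")
        i += 1
    return re.compile("".join(tokens))
```

For example, mercury_to_regex("a/Wb").fullmatch("a   b") returns a match object, while the same pattern applied to "ab" returns None, because /W requires at least one whitespace character.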
You can use any number of metacharacters in an expression - so, for example, to detect all
users at any system within the domain "spam.com", you could use the regular expression
*@*.spam.com
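The "*" operator behaves much like a shell-style wildcard, so this particular expression can be tried out with Python's standard fnmatch module. This is only an analogy, not Mercury's engine: the sketch assumes the pattern must cover the whole address, and it lowercases both sides to imitate Mercury's default case-insensitive comparison. The helper name and the sample addresses are invented for illustration.

```python
from fnmatch import fnmatchcase

PATTERN = "*@*.spam.com"


def matches(address, pattern=PATTERN):
    # Lowercase both sides to imitate a case-insensitive comparison.
    return fnmatchcase(address.lower(), pattern.lower())


for address in ["bob@mail.spam.com", "Alice@RELAY.Spam.COM", "carol@example.org"]:
    print(address, matches(address))
```

Read literally in this way, the pattern requires a "." before "spam.com", so an address such as "dave@spam.com" (with no host name in front of the domain) would not match in the sketch.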
The set operation is especially powerful, particularly when combined with the repeat
occurrence operator: so, to detect a message where the subject line ends in a group of three
or more digits (a common indicator of a spam message) you would use this expression:
Subject:*[0-9][0-9][0-9]+
In this expression, we use the "*" operator to match the general text within the subject line,
then we use the set "[0-9]" three times to force a minimum of three digits, and a "+" operator
to detect any extra digits following the third one. Because there is no "*" at the end of the
expression, the digits must therefore be the last characters on the line - if there is any text
following them, the expression will fail.
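For readers more familiar with Perl-style syntax, the same test can be approximated with Python's re module. This is a sketch for comparison only, not Mercury's implementation; it assumes the expression must match the entire header line, and the helper name and sample subject lines are invented.

```python
import re

# Rough Perl-style equivalent of  Subject:*[0-9][0-9][0-9]+
# "[0-9]{3,}" means three or more digits; IGNORECASE mirrors Mercury's default.
SUBJECT_ENDS_IN_DIGITS = re.compile(r"Subject:.*[0-9]{3,}", re.IGNORECASE)


def ends_in_digits(header_line):
    # fullmatch() makes the pattern cover the whole line, so nothing may follow the digits.
    return SUBJECT_ENDS_IN_DIGITS.fullmatch(header_line) is not None


print(ends_in_digits("Subject: Lowest prices ever 48213"))  # True
print(ends_in_digits("Subject: Meeting at 1030 tomorrow"))  # False: text follows the digits
print(ends_in_digits("Subject: Version 42"))                # False: only two digits
```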
Case sensitivity
Normally, Mercury compares all text on a case-insensitive basis - that means that it will
regard "hello" and "HELLO" as being the same. In some cases, though, the case of the text
you're matching can be important, so the /c operator allows you to toggle Mercury between
case-insensitive and case-sensitive comparisons. So, to detect the string "FREE!" anywhere
within the subject line of a message, you would use this expression:
Subject:/c*FREE!*
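In Perl-style syntax, a comparable effect can be had with a scoped inline flag. The following Python sketch is an approximation for comparison, not Mercury's engine; it assumes that the text before /c ("Subject:") is still compared case-insensitively and that the expression must cover the whole line. The helper name and sample lines are invented.

```python
import re

# Rough equivalent of  Subject:/c*FREE!*
# (?i:...) limits case-insensitive matching to the header name only;
# everything after it is compared case-sensitively.
SHOUTED_FREE = re.compile(r"(?i:Subject:).*FREE!.*")


def has_shouted_free(header_line):
    return SHOUTED_FREE.fullmatch(header_line) is not None


print(has_shouted_free("Subject: Get your FREE! sample"))  # True
print(has_shouted_free("subject: FREE! trial inside"))     # True: header name is case-insensitive
print(has_shouted_free("Subject: Feel free! to reply"))    # False: lowercase "free!"
```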