Page 62 - Mercury Manual.book

57     Content Control
                Mercury's Content Control Filtering Language



               Making the most of regular expressions
               The CONTAINS test does a simple string search, looking for the exact text you provide
               anywhere in the message. Often, however, you may want to look for patterns of text rather than
               exact strings: you can do this by using a MATCHES test instead of a CONTAINS test, because
               MATCHES tests use a special pattern-matching mechanism called a regular expression to
               describe the general form of text you want to find.

               Using regular expressions, you can detect extremely complex patterns of text within the
               messages you filter. Mercury's regular expression engine uses what is known as a metasyntax
               to describe the pattern you want to match: in the metasyntax, certain characters have special
               meanings that Mercury applies to the text it is testing.

               Note: Mercury's regular expression engine predates POSIX, Perl and other regex
               implementations, so if you are used to those formats, you may find it a little idiosyncratic.

               The following special characters are recognized in your expressions:

                     *         Match any number of any characters
                     ?         Match any single character (must match at least one character)
                     +         Match one or more occurrences of the last character or pattern
                     [...]     Set matching: the test will succeed if the next character in the
                               input matches any character in the set. Ranges can be specified in
                               the set using '-' (for example, [a-k] specifies every character
                               from a to k inclusive)
                     [^...]    Set negation: the test will succeed if the next character in the
                               input does not match any character in the set.
                     /w        Match zero or more whitespace characters
                     /W        Match one or more whitespace characters
                     /c        Toggle case sensitivity (case-insensitive by default)


               You can use any number of metacharacters in an expression - so, for example, to detect all
               users at any system within the domain "spam.com", you could use the regular expression

                  *@*.spam.com
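
               To experiment with these metacharacters outside Mercury, the following Python sketch
               translates an expression into an approximately equivalent Python regular expression.
               It is not Mercury's own engine. The names mercury_to_regex and matches are hypothetical,
               and the sketch assumes that "*" may match zero characters, that an expression must
               match the whole line, and that "+" applies only to the single preceding item.

```python
import re


def mercury_to_regex(expr):
    """Approximate a Mercury MATCHES expression as a compiled Python regex.

    Hypothetical helper for illustration only; this is not Mercury's engine.
    """
    parts = []          # one Python regex fragment per Mercury token
    ignore_case = True  # the manual states matching is case-insensitive by default
    i = 0

    def cased(fragment):
        # Scope case-insensitivity to this fragment so /c can switch it mid-pattern.
        return f"(?i:{fragment})" if ignore_case else f"(?:{fragment})"

    while i < len(expr):
        ch = expr[i]
        if ch == "/" and expr[i + 1:i + 2] in ("w", "W", "c"):
            op = expr[i + 1]
            if op == "c":
                ignore_case = not ignore_case                     # /c toggles case sensitivity
            else:
                parts.append(r"\s*" if op == "w" else r"\s+")     # /w and /W
            i += 2
            continue
        if ch == "*":
            parts.append(".*")                  # any number of any characters (assumed: zero or more)
        elif ch == "?":
            parts.append(".")                   # exactly one character
        elif ch == "+":
            if not parts:
                raise ValueError("'+' has nothing to repeat")
            parts[-1] = f"(?:{parts[-1]})+"     # one or more of the previous item
        elif ch == "[":
            end = expr.find("]", i + 1)
            if end == -1:
                raise ValueError("unterminated set")
            body = expr[i + 1:end]
            if body in ("", "^"):
                raise ValueError("empty set")
            # '^' negation and '-' ranges are written the same way in Python sets.
            body = body.replace("\\", "\\\\").replace("[", "\\[")
            parts.append(cased(f"[{body}]"))
            i = end
        else:
            parts.append(cased(re.escape(ch)))  # everything else is literal, including '.'
        i += 1
    return re.compile("".join(parts))


def matches(expr, text):
    """Return True if the whole of text matches the Mercury-style expression."""
    return mercury_to_regex(expr).fullmatch(text) is not None
```

               Under these assumptions, matches("*@*.spam.com", "bob@mail.spam.com") returns True, while
               matches("*@*.spam.com", "bob@spam.com") returns False, because the expression requires a
               "." between the "@" and "spam.com".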

               The set operation is especially powerful, particularly when combined with the repeat
               occurrence operator: so, to detect a message where the subject line ends in a group of three or
               more digits (a common indicator of a spam message) you would use this expression:

                  Subject:*[0-9][0-9][0-9]+

               In this expression, we use the "*" operator to match the general text within the subject line,
               then we use the set "[0-9]" three times to force a minimum of three digits, and a "+" operator
               to detect any extra digits following the third one. Because there is no "*" at the end of the
               expression, the digits must be the last characters on the line - if any text follows them,
               the expression will fail.
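
               For readers more familiar with Perl-style syntax, the same test can be approximated in
               Python as shown below. This is a hand translation rather than anything Mercury produces:
               it assumes that "*" corresponds to ".*" and that the expression must match the whole line.
               The names SUBJECT_ENDS_IN_DIGITS and subject_ends_in_digits are hypothetical.

```python
import re

# Hand-translated approximation of the Mercury expression
#     Subject:*[0-9][0-9][0-9]+
# re.IGNORECASE mirrors Mercury's case-insensitive default.
SUBJECT_ENDS_IN_DIGITS = re.compile(r"Subject:.*[0-9][0-9][0-9]+", re.IGNORECASE)


def subject_ends_in_digits(line):
    """Return True if the header line ends in three or more digits."""
    # fullmatch anchors the pattern at both ends, so nothing may follow the digits.
    return SUBJECT_ENDS_IN_DIGITS.fullmatch(line) is not None
```

               With this sketch, "Subject: Special offer 48213" is accepted, while "Subject: Special
               offer 48" (too few digits) and "Subject: 48213 reasons to buy" (text after the digits)
               are both rejected.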

               Case sensitivity  Normally, Mercury compares all text on a case-insensitive basis - that means
               that it will regard "hello" and "HELLO" as being the same. In some cases, though, the case
               of the text you're matching can be important, so the /c operator allows you to toggle Mercury
               between case-insensitive and case-sensitive comparisons. So, to detect the uppercase string
               "FREE!" anywhere within the subject line of a message, you would use this expression:

                  Subject:/c*FREE!*
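
               The effect of placing /c after "Subject:" can be approximated in Python with a scoped
               inline flag, which keeps the header name case-insensitive while the rest of the pattern
               is case-sensitive. This is only an analogy under the assumption that the expression must
               match the whole line; the names UPPERCASE_FREE and subject_has_uppercase_free are
               hypothetical.

```python
import re

# Hand-translated approximation of the Mercury expression
#     Subject:/c*FREE!*
# (?i:...) limits case-insensitive matching to the header name;
# everything after it is compared case-sensitively, as after /c.
UPPERCASE_FREE = re.compile(r"(?i:Subject:).*FREE!.*")


def subject_has_uppercase_free(line):
    """Return True if the subject line contains the uppercase text FREE!"""
    return UPPERCASE_FREE.fullmatch(line) is not None
```

               Here "subject: Get it FREE! today" is accepted, because the header name is still
               compared case-insensitively, while "Subject: Get it free! today" is rejected.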