wiki:PatternQuoting

The super short story

For each parse-level that you use, you must quote (escape) all the special chars of that level, including the chars that quote the special chars of a lower level if they are themselves special at the current level. This applies to every level, and you often have at least 2 of them: a regexp plus a mutt-related one, or 2 mutt-related ones. Each level has its own set of special chars. The sets sometimes overlap but are never identical, because each level has different syntactic demands.


The short story

  • regexp arguments need their special chars "\"-quoted to use them literally (see 4.1 Regular Expressions).
  • mutt pattern arguments (see 4.2 Patterns; they consist of a "~"-key + regexp) need their own quoting for "(|) " -> double-escape common special chars.
  • muttrc command arguments are parsed with respect to MuttGuide/Syntax special chars -> double-escape common special (mostly space) chars.
    • many muttrc commands use "pattern" rather than "regexp" as arguments -> triple-escape.
  • nesting of commands (full commands as arguments to commands) adds an extra parse-level per MuttGuide/Syntax: double becomes triple, triple becomes quadruple.

Except for the regexp level, you can quote a whole level with '' instead of using "\" for each special char on that level. Use '' rather than "" whenever possible to save some "\": single quotes keep backslashes literal, while double quotes don't.


The long story

So you've decided to delve into the realm of writing your own complex search pattern, but you've run into problems. You're sure that your regular expression containing backslashes and parentheses is correct because you've tested it with egrep and it works. So why isn't Mutt accepting it as valid?

What you are overlooking is that Mutt's config language has its own special chars that need to be quoted independently for each level shown above. Mutt patterns have their own grouping operators (parentheses, alternation, space separator), so care must be taken to properly quote the regular expression you are trying to use. This means that when you put a pattern in a *-hook in your muttrc, or into the "limit" or "tag-pattern" function at run-time, you will end up having to quote it twice (or three times or more).

For example, suppose I (incorrectly) define a complex pattern in a hook,

save-hook "~f (john|bob)@mutt\.org" +those-mutt-guys

The intention of this is to use the default save folder +those-mutt-guys whenever the sender of the message is bob@… or john@…. However, you quickly find that Mutt displays an error when parsing this line.

Perhaps the easiest way to explain what is wrong is to describe exactly what the muttrc parser does when attempting to grok the command... First off, the line is tokenized, which is to say that each line is split into the individual words or strings. So that leaves us with

save-hook "~f (john|bob)@mutt\.org" +those-mutt-guys

So far so good. The next step is to remove a layer of quotes (if present) from each token, just like your favorite UNIX command line shell does. When dequoting double-quotes, the backslash character is removed from the string, leaving the following character as a literal. So now the list of tokens becomes

save-hook ~f (john|bob)@mutt.org +those-mutt-guys

Here we start to see the first problem with our original guess as to how to write the pattern. You will notice that our backslash followed by a period is now just a single period by itself. Instead of matching a literal period, it will now match any character! Fortunately, in this particular instance it doesn't matter much since it isn't very likely to cause a false match. But if you are using a backslash that really matters to the regular expression, you either need to use two backslashes

"~f @mutt\\.org"

or, optionally, you can use single quotes instead of double quotes, because no dequoting is done inside of single quotes:

'~f @mutt\.org'
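
The difference between the two quote styles can be sketched with a toy model in Python. This is only an illustration of the dequoting step described above, not Mutt's actual parser: the function name dequote is made up for this sketch, and it deliberately ignores everything else the real muttrc parser does with a token (for instance variable expansion and backticks).

```python
def dequote(token: str) -> str:
    """Remove one layer of quoting from a token (toy model, not Mutt's code)."""
    out = []
    quote = None  # the quote character that is currently open, if any
    i = 0
    while i < len(token):
        ch = token[i]
        if quote == "'":
            # Inside single quotes everything is literal up to the closing quote.
            if ch == "'":
                quote = None
            else:
                out.append(ch)
        elif ch == "\\" and i + 1 < len(token):
            # Elsewhere a backslash is dropped and the next char is kept as-is.
            i += 1
            out.append(token[i])
        elif ch == quote:
            # Closing double quote.
            quote = None
        elif quote is None and ch in "'\"":
            # Opening quote of either kind.
            quote = ch
        else:
            out.append(ch)
        i += 1
    return "".join(out)


# Raw strings are used so that Python itself does not touch the backslashes.
for token in (r'"~f @mutt\.org"', r'"~f @mutt\\.org"', r"'~f @mutt\.org'"):
    print(token, "->", dequote(token))
```

In this model the first token loses its backslash, while the doubled backslash and the single-quoted form both hand "~f @mutt\.org" on to the next level.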

Continuing the example above that we started with, our next (incorrect) guess is to write

save-hook "~f (john|bob)@mutt\\.org" +those-mutt-guys

But we still get an error! Why? Remember that the parenthesis characters are also used by Mutt's pattern language to indicate logical grouping. For instance, if I wanted to match email to or from a particular address, I would write

save-hook "(~f me@mutt\\.org | ~t me@mutt\\.org)" +me

So what is happening with our example is that mutt is interpreting the parentheses that are part of the regular expression as logical grouping operators of the pattern language rather than as part of the regular expression. In this particular case, the result is an error because it is not a valid pattern expression. So how do we fix the problem? Quoting of course! We have to quote the regular expression inside of the quotes that protect the entire search expression:

save-hook "~f '(john|bob)@mutt\\.org'" +those-mutt-guys

Now when Mutt parses the search pattern, it first sees

~f '(john|bob)@mutt\.org'

(note the only change here is that the \\. has now become \.) and then knows that the entire second token is a regular expression and doesn't attempt to parse it. Mutt will then dequote the string again, and the resulting regular expression is

(john|bob)@mutt\.org

which is what we really meant to match in the first place.
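
The two dequoting passes of this walk-through can be replayed with a small Python sketch. It is a simplified model under stated assumptions, not Mutt's implementation: dequote is a hypothetical helper that strips one quoting layer, the "~f" key is split off at the first space, and Python's re module stands in for the regexp library Mutt uses (the two agree on this particular expression).

```python
import re


def dequote(token: str) -> str:
    """Strip one quoting layer (toy model): '' is literal, \\x becomes x."""
    out, quote, i = [], None, 0
    while i < len(token):
        ch = token[i]
        if quote == "'":
            if ch == "'":
                quote = None          # closing single quote
            else:
                out.append(ch)        # literal, backslashes included
        elif ch == "\\" and i + 1 < len(token):
            i += 1
            out.append(token[i])      # drop the backslash, keep the next char
        elif ch == quote:
            quote = None              # closing double quote
        elif quote is None and ch in "'\"":
            quote = ch                # opening quote
        else:
            out.append(ch)
        i += 1
    return "".join(out)


# The second token of the save-hook line, exactly as written in the muttrc.
muttrc_token = r'''"~f '(john|bob)@mutt\\.org'"'''

# Pass 1: the muttrc level removes the double quotes and one backslash.
pattern_text = dequote(muttrc_token)

# Pass 2: the pattern level removes the single quotes around the argument.
key, _, argument = pattern_text.partition(" ")
regexp = dequote(argument)

print(pattern_text)
print(regexp)
print(bool(re.search(regexp, "bob@mutt.org")))   # literal period matches
print(bool(re.search(regexp, "bob@muttXorg")))   # any other char does not
```

The escaped period survives both passes in this model, so the address with a literal "." matches and the one with "X" in its place does not.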


In the discussed example you've already encountered quad-quoting, since "\" quotes ".", then itself on the next level, and there are both '' and "" surrounding it.

Now that you understand why you need multiple quoting levels, be aware that you may have to add yet another one: double-quoting becomes triple-quoting (or worse) when you combine/nest commands that need this special quoting treatment (a cmd as arg to a cmd), like macros, hooks, score, ... The reason is that a cmd given as arg to a cmd is dequoted as an argument when it is defined and then parsed again as a command when it is invoked!
Take care to quote at the proper place/level. Horrible example:

folder-hook . 'macro index <tab> "<collapse-all><enter-command>macro index \\\<tab\\\> \"<collapse-thread><buffy-list>\"<enter><buffy-list>"'

This defines a hook to define a macro which once invoked (by another folder-hook . "push <tab>") redefines itself.

However, not all levels (regexp, pattern, muttrc) have the same special chars. It won't hurt to quote chars unnecessarily, but it's easier on the eyes if you limit it only to those requiring it on a given level.

literal   regexp   pattern   cmd-arg(muttrc)
|         \|       \\\|      \\\\\\|
$         \$       \\$       \\\\\$

(The "|" is not special for muttrc, therefore only the "\" are quoted, no extra "\" for the "|". There are more "gaps". "$" is special for regexp and muttrc, but not for patterns. "space" is special for muttrc and pattern, but not for regexp. So watch out.)
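
How the backslashes pile up per level for "|" and "$" can be derived mechanically. The Python sketch below is only an illustration: the helper quote_for_level is made up, and the three sets of special chars are deliberately partial assumptions that contain just the backslash plus the chars named in the note above ("|", "$", space). The real sets are larger, see MuttGuide/Syntax.

```python
def quote_for_level(text: str, specials: str) -> str:
    """Put a backslash in front of every char that is special at this level."""
    return "".join("\\" + ch if ch in specials else ch for ch in text)


# Partial, assumed sets: backslash plus the chars discussed in the text.
REGEXP_SPECIALS = "\\|$"    # "|" and "$" are special, space is not
PATTERN_SPECIALS = "\\| "   # "|" and space are special, "$" is not
MUTTRC_SPECIALS = "\\$ "    # "$" and space are special, "|" is not


def quote_all_levels(literal: str) -> list[str]:
    """Return the literal quoted for the regexp, pattern and muttrc levels."""
    regexp = quote_for_level(literal, REGEXP_SPECIALS)
    pattern = quote_for_level(regexp, PATTERN_SPECIALS)
    muttrc = quote_for_level(pattern, MUTTRC_SPECIALS)
    return [regexp, pattern, muttrc]


for literal in ("|", "$"):
    print(literal, *quote_all_levels(literal))
```

Each level only adds a backslash in front of the chars it treats as special itself, which is where the "gaps" come from: the muttrc level doubles the backslashes in front of "|" but adds none for the "|" itself.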

You can use '...' for a whole pattern or cmd-arg to save a level of "\" on that '-quoted range (drop "\" per special char of that level). "..." works for pattern+muttrc level, too, but it doesn't quote all chars, see MuttGuide/Syntax. You can nest '-quoting within "-quoting (and vice versa) only once, another level of a repeated quote-char would need the " or ' to be quoted itself.

\\\\\\| -> '\\\|' -> "'\\|'"

You can make your life of nested quoting easier if you use the "source" command instead of trying to install a highly complicated multi-level quoted command.


There is another pretty pathetic case when you

bind editor '\' quote-char

This affects all special chars for any input in dialogs/prompts, and that includes the arguments a "macro" feeds to the "limit" pattern prompt, too! When you have this (very confusing) binding and a "macro" like this:

macro index Sm '<limit>~b "l.ttery|pr.ce|d.ll.r|\\$[[:digit:]]+"<enter>'

then you will have to quote all occurrences of "\" again -> triple altogether, because the binding will "eat up" every other "\" as a quote char at input level.

Last modified on Feb 18, 2012 4:08:13 AM