Changes between Initial Version and Version 1 of PatternQuoting


Timestamp:
Feb 18, 2012 4:08:13 AM (6 years ago)
Author:
rado
Comment:

migrate wiki.mutt.org

=== The super short story ===
For each parse level that you use, you must quote (escape) all the `special` chars of the current parse level,
including the chars that quote the special chars of a lower level, if they are special to the current level.
Do this for '''each''' level: often you have at least 2 levels, either a regexp plus a mutt-related level or 2 mutt-related levels.
'''But''' each level has its own set of special chars, sometimes overlapping with, but never identical to, that of another level,
because each level has different syntactic demands.
----
== The short story ==
 * regexp arguments need their special chars "`\`"-quoted to use them literally (see 4.1 Regular Expressions).
 * mutt pattern arguments (see 4.2 Patterns; they consist of a "`~`"-key + regexp) need their own quoting for "`(|) `" -> double-escape common special chars.
 * muttrc command arguments are parsed with respect to MuttGuide/Syntax special chars -> double-escape common special (mostly space) chars.
  * many muttrc commands take a "pattern" rather than a "regexp" as argument -> triple-escape.
 * nesting of commands (full commands as arguments to commands) adds an '''extra''' parse level per MuttGuide/Syntax: double becomes triple, triple becomes quadruple.
Except for the regexp level, you can quote a whole level with `''` instead of using "`\`" for each special char on that level.
Use `''` whenever possible rather than `""`, so you can save some "`\`", because a backslash is literal inside single quotes but still acts as an escape inside double quotes.
----
= The long story =

So you've decided to delve into the realm of writing your own
complex search pattern, but you've run into problems. You're sure
that your regular expression containing backslashes and
parentheses is correct because you've tested it with egrep
and it works. So why isn't Mutt accepting it as valid?

What you are overlooking is that Mutt's config language has its own
special chars that need to be quoted independently for each level shown above.
'''Mutt patterns''' have their own grouping operators
(''parentheses, alternation'', '''space separator'''),
so care must be taken to properly quote the regular expression you are trying to use.
This means that when you put a '''pattern'''
in a *-hook in your muttrc, or in the "limit" or "tag-pattern" function at run-time,
you will end up having to ''double''-quote it (or triple or more).

For example, suppose I ('''incorrectly''') define a complex pattern in a hook,
 `save-hook "~f (john|bob)@mutt\.org" +those-mutt-guys`
The intention of this is to use the default save folder
''+those-mutt-guys'' whenever the sender of the message is
''bob@mutt.org'' or ''john@mutt.org''. However, you quickly find
that Mutt displays an error when parsing this line.

Perhaps the easiest way to explain what is wrong is to describe
exactly what the muttrc parser does when attempting to grok the
command. First, the line is tokenized, which is to say that
it is split into individual words or strings. That
leaves us with
 ` save-hook "~f (john|bob)@mutt\.org" +those-mutt-guys`
So far so good. The next step is to remove a layer of quotes (if
present) from each token, just like your favorite UNIX command
line shell does. When dequoting double quotes, each backslash
is removed from the string, leaving the following
character as a literal. So now the list of tokens becomes
 ` save-hook ~f (john|bob)@mutt.org +those-mutt-guys `
Here we start to see the ''first'' problem with our original guess at how to write the pattern.
Notice that our backslash followed by a period is now just a period by itself.
Instead of matching a literal period, it will now match any character!
Fortunately, in this particular instance it doesn't matter much since
it isn't very likely to cause a false match, but if you are using a backslash
that really matters to the regular expression, you either need to use two backslashes
 `"~f @mutt\\.org"`
or you can use single quotes instead of double
quotes, because backslashes are left untouched inside single quotes:
 `'~f @mutt\.org'`
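
The difference between the two quote styles can be sketched in Python. This is only a simplified model of one dequoting pass, not mutt's actual parser: the `dequote` helper is hypothetical, and it deliberately ignores things like `\n`-style escapes, `$` variable expansion and backticks.

```python
def dequote(text):
    """Remove one layer of quoting from a token (simplified model)."""
    out = []
    quote = None  # the currently open quote char, or None
    i = 0
    while i < len(text):
        ch = text[i]
        if quote is None and ch in "'\"":
            quote = ch             # opening quote: drop it
        elif ch == quote:
            quote = None           # closing quote: drop it
        elif ch == "\\" and quote != "'" and i + 1 < len(text):
            i += 1
            out.append(text[i])    # backslash removed, next char kept literally
        else:
            out.append(ch)         # inside '...' a backslash lands here untouched
        i += 1
    return "".join(out)


print(dequote(r'"~f @mutt\.org"'))   # ~f @mutt.org   (backslash lost)
print(dequote(r'"~f @mutt\\.org"'))  # ~f @mutt\.org
print(dequote(r"'~f @mutt\.org'"))   # ~f @mutt\.org
```

The last two lines give the same result, which is why single quotes save you a backslash.
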
Continuing the example that we started with, our next (incorrect) guess is to write
 ` save-hook "~f (john|bob)@mutt\\.org" +those-mutt-guys`
But we still get an error! Why? Remember that the parenthesis
characters are also used by Mutt's pattern language to
indicate logical grouping. For instance, if I wanted to match
email ''to'' or ''from'' a particular address, I would write
 ` save-hook "(~f me@mutt\\.org | ~t me@mutt\\.org)" +me`
So what is happening with our example is that mutt is
interpreting the parentheses that are part of the regular
expression as logical grouping operators rather than as part of
the regular expression. In this particular case, the result is an
error because it is not a valid pattern expression. So how
do we fix the problem? '''Quoting''', of course! We have to quote
the regular expression inside of the quotes that protect the
entire search expression:
 ` save-hook "~f '(john|bob)@mutt\\.org'" +those-mutt-guys`
Now when Mutt parses the search pattern, it first sees
 ` ~f '(john|bob)@mutt\.org'`
(note that the only change here is that the '''\\.''' has now become '''\.''')
and then knows that the entire second token is a regular
expression and doesn't attempt to parse it. Mutt will then
''dequote'' the string again, and the resulting regular expression
is
 ` (john|bob)@mutt\.org`
which is what we really meant to match in the first place.
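
The two dequoting passes described above can be replayed with a small Python sketch. It is a simplified model under stated assumptions: `dequote` is a hypothetical helper (not mutt code) that ignores special escapes and variable expansion, and Python's `re` module stands in for the regexp library mutt uses, which behaves the same for this particular expression.

```python
import re


def dequote(text):
    """Remove one layer of quoting from a token (simplified model)."""
    out = []
    quote = None
    i = 0
    while i < len(text):
        ch = text[i]
        if quote is None and ch in "'\"":
            quote = ch
        elif ch == quote:
            quote = None
        elif ch == "\\" and quote != "'" and i + 1 < len(text):
            i += 1
            out.append(text[i])
        else:
            out.append(ch)
        i += 1
    return "".join(out)


muttrc_arg = r'''"~f '(john|bob)@mutt\\.org'"'''  # as written in the muttrc
pattern = dequote(muttrc_arg)                     # pass 1: muttrc level
key, regexp_arg = pattern.split(" ", 1)           # "~f" and its argument
regexp = dequote(regexp_arg)                      # pass 2: pattern level

print(pattern)                                    # ~f '(john|bob)@mutt\.org'
print(regexp)                                     # (john|bob)@mutt\.org
print(bool(re.search(regexp, "bob@mutt.org")))    # True
print(bool(re.search(regexp, "bob@muttXorg")))    # False
```
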

----
In the example just discussed you've already encountered quad-quoting:
"\" quotes ".", then is itself quoted on the next level, and both '' and "" surround the whole.

Now that you understand why you need multiple quoting levels, you can run
into the trouble of needing to '''add a quoting level''': double-quoting
becomes triple-quoting (or worse) when you '''combine/nest'''
commands which need this special quoting treatment (cmd as arg to cmd),
like macros, hooks, score, ...:
'''a cmd given as arg to a cmd is parsed (again) when invoked, not only when defined'''!
[[BR]]
Take care to quote at the proper place/level. Horrible example:
 ` folder-hook . 'macro index <tab> "<collapse-all><enter-command>macro index \\\<tab\\\> \"<collapse-thread><buffy-list>\"<enter><buffy-list>"'`
This defines a hook to define a macro which, once invoked (by another `folder-hook . "push <tab>"`), redefines itself.

However, not all levels (regexp, pattern, muttrc) have the same special chars.
It won't hurt to quote chars unnecessarily, but it's easier on the eyes
if you limit quoting to the chars that require it on a given level.
|| literal ||  regexp ||  pattern ||  cmd-arg(muttrc) ||
||    |    ||  \|     || \\\|     || \\\\\\| ||
||    $    ||  \$     || \\$      || \\\\\$ ||
(The "|" is not special for muttrc, so only the "\" are quoted there, with no extra "\" for the "|".
There are more such "gaps": "$" is special for regexp and muttrc, but not for patterns;
"space" is special for muttrc and pattern, but not for regexp. So watch out.)
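
The table rows can be derived mechanically by escaping once per level, each time with that level's own set of special chars. The Python sketch below does this; the three sets are illustrative subsets assumed from the notes above (not mutt's complete lists), and `escape` and `quote_levels` are hypothetical helper names.

```python
# Illustrative subsets of each level's special chars (assumptions, see above).
REGEXP_SPECIALS = set("\\|$.()")
PATTERN_SPECIALS = set("\\|() ")   # "|" is special here, "$" is not
MUTTRC_SPECIALS = set("\\$ '\"")   # "$" is special here, "|" is not


def escape(text, specials):
    """Put a backslash in front of every char that is special on this level."""
    return "".join("\\" + ch if ch in specials else ch for ch in text)


def quote_levels(literal):
    """Return the regexp, pattern and cmd-arg forms of a literal string."""
    regexp = escape(literal, REGEXP_SPECIALS)
    pattern = escape(regexp, PATTERN_SPECIALS)
    cmd_arg = escape(pattern, MUTTRC_SPECIALS)
    return regexp, pattern, cmd_arg


for literal in "|$":
    print(literal, *quote_levels(literal))
# | \| \\\| \\\\\\|
# $ \$ \\$ \\\\\$
```
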

You can use `'...'` for a whole pattern or cmd-arg to save a level
of "\" on that `'-quoted` range (drop the "\" for each special char of that level).
`"..."` works for the pattern and muttrc levels, too, but it doesn't quote all chars, see MuttGuide/Syntax.
You can nest `'-quoting` within `"-quoting` (and vice versa) only once;
another level using the same quote char would need the " or ' to be quoted itself.
 `\\\\\\| -> '\\\|' -> "'\\|'"`
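
That the three spellings above are equivalent can be checked by dequoting each of them twice (once for the muttrc level, once for the pattern level). This is a sketch with a hypothetical `dequote` helper that models only quotes and backslashes, not the rest of mutt's parser.

```python
def dequote(text):
    """Remove one layer of quoting from a token (simplified model)."""
    out = []
    quote = None
    i = 0
    while i < len(text):
        ch = text[i]
        if quote is None and ch in "'\"":
            quote = ch
        elif ch == quote:
            quote = None
        elif ch == "\\" and quote != "'" and i + 1 < len(text):
            i += 1
            out.append(text[i])
        else:
            out.append(ch)
        i += 1
    return "".join(out)


forms = [r"\\\\\\|", r"'\\\|'", r'''"'\\|'"''']
for form in forms:
    # muttrc level first, then pattern level
    print(form, "->", dequote(dequote(form)))
# each line ends in the regexp \| (a literal "|")
```
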

You can make your life of nested quoting easier if you use the "source"
command instead of trying to install a highly complicated multi-level quoted command.

----
There is another rather unfortunate case, when you
 ` bind editor '\' quote-char`
This affects all special chars for '''any''' input in dialogs/prompts, which includes arguments
to the "limit-pattern" prompt supplied by "macros", too!
When you have this (very confusing) binding and a "macro" like this:
 ` macro index Sm '<limit>~b "l.ttery|pr.ce|d.ll.r|\\$[[:digit:]]+"<enter>'`
then you will have to quote all occurrences of "\" ''again'' -> triple altogether,
because the binding will "eat up" every other "\" since it acts as the quote char at input level.
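
The "eating up" can be pictured with a small Python model. It assumes, as described above, that a key bound to quote-char is consumed and makes the next key go in literally; `type_at_prompt` is a hypothetical name for illustration, not a mutt function.

```python
def type_at_prompt(keys, quote_char="\\"):
    """Model typing keys at a prompt where quote_char is bound to quote-char."""
    out = []
    literal_next = False
    for ch in keys:
        if literal_next:
            out.append(ch)          # quoted key: inserted as-is
            literal_next = False
        elif ch == quote_char:
            literal_next = True     # consumed; the next key is taken literally
        else:
            out.append(ch)
    return "".join(out)


print(type_at_prompt(r"\\$[[:digit:]]+"))    # \$[[:digit:]]+
print(type_at_prompt(r"\\\\$[[:digit:]]+"))  # \\$[[:digit:]]+
```

So every pair of "\" in the macro arrives at the prompt as a single "\", and you have to double them once more to get the original text through.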