Ticket #3090 (closed defect: fixed)

Opened 8 weeks ago

Last modified 2 weeks ago

Mutt removes In-Reply-To header field

Reported by: azschach Owned by: mutt-dev
Priority: minor Milestone: 1.6
Component: mutt Version: 1.5.18
Keywords: Cc:

Description

On replying to a message, which Message-ID contains >1 "@" signs, the In-Reply-To field is first set correct (when editing the message) but after saving the message, the In-Reply-To field is set empty.

Attachments

mutt-3090.patch (0.9 kB) - added by Aron Griffis 8 weeks ago.
Added by email2trac
mutt-3090-references.patch (23.6 kB) - added by Aron Griffis 7 weeks ago.
Added by email2trac
mutt-3090-references-v2.patch (23.9 kB) - added by agriffis 7 weeks ago.
3090-what-at-signs.patch (0.9 kB) - added by agriffis 7 weeks ago.
3090-mutt_extract_message_id.patch (3.6 kB) - added by agriffis 7 weeks ago.
3090-mutt_extract_message_id-no-static.patch (4.9 kB) - added by agriffis 7 weeks ago.

Change History

  Changed 8 weeks ago by Aron Griffis

Mutt wrote:  [Thu Jul 03 2008, 12:13:42PM EDT]
>  On replying to a message, which Message-ID contains >1 "@" signs, the In-
>  Reply-To field is first set correct (when editing the message) but after
>  saving the message, the In-Reply-To field is set empty.

RFC 822 specifies the msg-id format as <addr-spec>, which is
required to have only one "@" sign.  Do you know what mailer is
generating more than one "@" in a message-id?

  Changed 8 weeks ago by Aron Griffis

The problem here is that mutt is more liberal parsing Message-Id
than it is parsing In-Reply-To; see mutt_extract_message_id() vs.
mutt_parse_references() in parse.c

So when mutt initially builds the message, it copies the
Message-Id to the In-Reply-To without additional checking.  But
when edit_headers is set, mutt parses the In-Reply-To line
according to the stricter rules.  Technically there shouldn't be
two "@" signs in the In-Reply-To, so it drops the header
entirely.

The solution is either to tighten mutt_extract_message_id() or
loosen mutt_parse_references().  Personally I'd go with the
latter; probably no harm done to accept multiple "@" signs, and
mutt will still adhere to RFC 822 in its own generation of the
Message-Id.  Postel's law.

Changed 8 weeks ago by Aron Griffis

Added by email2trac

  Changed 8 weeks ago by Aron Griffis

Here's a tested patch which removes the single-@ check.  I guess
the question is just whether mutt wants to enforce the RFC on
incoming message-ids, or if it's content that the message-ids it
generates are compliant.

follow-up: ↓ 5   Changed 8 weeks ago by pdmef

In another case reported by exg on #mutt the problem is not >1 @ signs but something other than exactly 1 including 0.

IIRC In-Reply-To isn't supposed to contain only exactly some message-id but may also contain things like "message <message-id> sent by <user> on ...". That's why mutt needs a heuristic of exactly one @ symbol to locate the message-id. However, I don't know if clients really/still generate something like that and how common it is in mail archives.

But I think I prefer a solution not touching that heuristic. A fix could be to avoid writing out In-Reply-To and use References instead which still gives the user the chance to manually adjust threads in the editor.

So this could be the same issue as #1935.

in reply to: ↑ 4 ; follow-up: ↓ 7   Changed 8 weeks ago by vinc17

Replying to pdmef:

IIRC In-Reply-To isn't supposed to contain only exactly some message-id but may also contain things like "message <message-id> sent by <user> on ...". That's why mutt needs a heuristic of exactly one @ symbol to locate the message-id. However, I don't know if clients really/still generate something like that and how common it is in mail archives.

There are MUA's that still do such kind of things, and I still receive messages with: In-reply-to: <message-id> (message from <user> on <date>)

AFAIK, that's Emacs that does this. Gnus also does this, with a slightly different format.

Also, Mutt sometimes puts several message-id's in In-Reply-To (I think, when one replies to several messages at once).

A fix could be to avoid writing out In-Reply-To and use References instead which still gives the user the chance to manually adjust threads in the editor.

That's a bad idea: some MUA's only understand the In-Reply-To header (but this is quite old, and I don't remember which ones -- perhaps Emacs, at least I can see that it doesn't generate a References header).

follow-up: ↓ 8   Changed 8 weeks ago by Aron Griffis

I've been thinking about another fix for this problem.  Presently
references and in-reply-to are stored in struct envelope as LIST.
This is based on the assumption that the headers can be reliably
parsed and the original text regenerated, but that assumption is
false because the original headers aren't strictly bound to
a particular format.

So rather than try to settle on a more correct parser, I think
both forms should be stored.  The original in-reply-to and
references should be stored in struct envelope as a char * just
like subject (and for that matter, just like message_id).  The
parsed lines can be stored alongside the strings in a LIST for
the sake of threading.

There are situations where it would still be appropriate to throw
away the string and rely entirely on the list, for example when
linking or breaking threads.  In that case, the string can be
thrown away and regenerated as necessary.  Possibly there should
be a couple functions to abstract this for convenience... (keep
in mind this is just brainstorming)

references_list(env) -- returns env->references_l if non-null,
otherwise parses env->references_s and caches the result in
env->references_l

references_string(env) -- returns env->references_s if non-null,
otherwise generates from env->references_l and caches the result
in env->references_s

and similar for in_reply_to.

in reply to: ↑ 5 ; follow-up: ↓ 9   Changed 8 weeks ago by pdmef

Replying to vinc17:

A fix could be to avoid writing out In-Reply-To and use References instead which still gives the user the chance to manually adjust threads in the editor.

That's a bad idea: some MUA's only understand the In-Reply-To header (but this is quite old, and I don't remember which ones -- perhaps Emacs, at least I can see that it doesn't generate a References header).

I didn't mean to eventually sent this out, I meant to use References instead of In-Reply-To when creating and parsing the message during editing to avoid changing message parsers this late before 1.6 and still fix the issue before it gets released.

in reply to: ↑ 6   Changed 8 weeks ago by pdmef

  • milestone set to 1.6

Replying to Aron Griffis:

So rather than try to settle on a more correct parser, I think both forms should be stored. The original in-reply-to and references should be stored in struct envelope as a char * just like subject (and for that matter, just like message_id). The parsed lines can be stored alongside the strings in a LIST for the sake of threading.

That would be a bad idea (I guess) because it significantly increases memory use to eventually "only" fix these corner cases of totally broken message-ids. Since the headers of the message are part of the header cache that approach would also significantly increase cache file size on disk.

in reply to: ↑ 7 ; follow-up: ↓ 12   Changed 8 weeks ago by vinc17

Replying to pdmef:

I didn't mean to eventually sent this out, I meant to use References instead of In-Reply-To when creating and parsing the message during editing to avoid changing message parsers this late before 1.6 and still fix the issue before it gets released.

I'm not sure I understand. Do you mean that when generating a message to be edited (compose), you generate a References header instead of the current In-Reply-To, and when the user quits the editor, the References contents are used to generate the References and In-Reply-To headers?

  Changed 7 weeks ago by Aron Griffis

Mutt wrote:  [Sun Jul 06 2008, 08:31:50AM EDT]
> That would be a bad idea (I guess) because it significantly increases
> memory use to eventually "only" fix these corner cases of totally broken
> message-ids. Since the headers of the message are part of the header cache
> that approach would also significantly increase cache file size on disk.

pdmef, you might be right, but I had already prototyped it so
here is my patch for reference.  Note that the header cache
doesn't grow with my patch because only the strings are stored.

Changed 7 weeks ago by Aron Griffis

Added by email2trac

Changed 7 weeks ago by agriffis

  Changed 7 weeks ago by agriffis

v2 is slightly better, eradicated all direct access to references_l, references_s, in_reply_to_l, in_reply_to_s

in reply to: ↑ 9 ; follow-up: ↓ 13   Changed 7 weeks ago by pdmef

Replying to vinc17:

Replying to pdmef:

I didn't mean to eventually sent this out, I meant to use References instead of In-Reply-To when creating and parsing the message during editing to avoid changing message parsers this late before 1.6 and still fix the issue before it gets released.

I'm not sure I understand. Do you mean that when generating a message to be edited (compose), you generate a References header instead of the current In-Reply-To, and when the user quits the editor, the References contents are used to generate the References and In-Reply-To headers?

Exactly. I didn't look into it yet, but if that worked, I'd be my preferred solution since it wouldn't mean to touch any parsers.

in reply to: ↑ 12 ; follow-up: ↓ 14   Changed 7 weeks ago by agriffis

Replying to pdmef:

Replying to vinc17:

Replying to pdmef:

I didn't mean to eventually sent this out, I meant to use References instead of In-Reply-To when creating and parsing the message during editing to avoid changing message parsers this late before 1.6 and still fix the issue before it gets released.

I'm not sure I understand. Do you mean that when generating a message to be edited (compose), you generate a References header instead of the current In-Reply-To, and when the user quits the editor, the References contents are used to generate the References and In-Reply-To headers?

Exactly. I didn't look into it yet, but if that worked, I'd be my preferred solution since it wouldn't mean to touch any parsers.

pdmef: If you could provide more explanation or even a (possibly incomplete) patch, I'd be interested. So far, I don't understand how your suggested approach does anything other than move the problem around, and possibly creates other problems...

in reply to: ↑ 13 ; follow-up: ↓ 15   Changed 7 weeks ago by pdmef

Replying to agriffis:

Replying to pdmef:

Replying to vinc17:

I'm not sure I understand. Do you mean that when generating a message to be edited (compose), you generate a References header instead of the current In-Reply-To, and when the user quits the editor, the References contents are used to generate the References and In-Reply-To headers?

Exactly. I didn't look into it yet, but if that worked, I'd be my preferred solution since it wouldn't mean to touch any parsers.

pdmef: If you could provide more explanation or even a (possibly incomplete) patch, I'd be interested. So far, I don't understand how your suggested approach does anything other than move the problem around, and possibly creates other problems...

Well, from my point of view the problem is that when mutt generates a partial message with edit_headers set and the message-id not containing exactly 1 @ (or not being "valid" by mutt's In-Reply-To expectations), mutt later can't parse that and breaks threading as it simply drops that message-id.

I don't think the parser for In-Reply-To should be touched to solve the problem because it implements a heuristic and changing a heuristic is bad when you can't test that change against sample data.

That said, the only way out I currently see is to find a way to get the broken message-id accross the editor. It's a feature that the parent's message-ids are available with edit_headers set so that users being sure what they're doing can adjust threading (i.e. not writing it out and reading it back isn't a solution).

So, my first thought was to not add a In-Reply-To header with the parent message-ids but a References header to the message being edited by the user. If we also change mutt_parse_references() to only require exactly one @ for In-Reply-To but accept everything else otherwise (kind of what you suggested), I think this could work and be a simple patch.

But I didn't look further into code yet.

Can you explain why that moves the problem around?

in reply to: ↑ 14   Changed 7 weeks ago by agriffis

Replying to pdmef:

Replying to agriffis:

If you could provide more explanation or even a (possibly incomplete) patch, I'd be interested. So far, I don't understand how your suggested approach does anything other than move the problem around, and possibly creates other problems...

Well, from my point of view the problem is that when mutt generates a partial message with edit_headers set and the message-id not containing exactly 1 @ (or not being "valid" by mutt's In-Reply-To expectations), mutt later can't parse that and breaks threading as it simply drops that message-id. I don't think the parser for In-Reply-To should be touched to solve the problem because it implements a heuristic and changing a heuristic is bad when you can't test that change against sample data. That said, the only way out I currently see is to find a way to get the broken message-id accross the editor. It's a feature that the parent's message-ids are available with edit_headers set so that users being sure what they're doing can adjust threading (i.e. not writing it out and reading it back isn't a solution). So, my first thought was to not add a In-Reply-To header with the parent message-ids but a References header to the message being edited by the user. If we also change mutt_parse_references() to only require exactly one @ for In-Reply-To but accept everything else otherwise (kind of what you suggested), I think this could work and be a simple patch. But I didn't look further into code yet. Can you explain why that moves the problem around?

Well, by "moves the problem around" I mean that you're changing the parser for References and assuming that's better than changing the parser for In-Reply-To. I think your assumption is based on the idea that References is more consistent than In-Reply-To, which might be right, but in fact the RFC gives them both the same treatement.

Regarding putting References in lieu of In-Reply-To in the message text when edit_headers is set, I have a few objections:

1. the References string is often long (up to 10 entries, hard limit in the mutt source). Most of the time users simply want to change the In-Reply-To string, so providing References in the text is overkill 2. this change implies that adding In-Reply-To manually to a message while editing will not be effective 3. too risky at this point in 1.6 bugfixing phase. This is the kind of change that needs some time to soak because it's very visible to users.

So getting back to the problem, let's keep in mind that the issue here is pretty simple: References and In-Reply-To are strings in a message header, but for the sake of threading, mutt stores them as LIST. In order to convert them to LIST, the strings are parsed to fish out the message-ids. If the message-ids aren't in strict format, the parser drops them.

Normally, dropping bogus message-ids only results in broken threads on display, for example bug #1935. All by itself, that's a good reason to loosen the parser. However in this case, the message-id is silently dropped in sending, so not only is it broken on display, but mutt is also breaking threads for the recipients...

So for these reasons, I think the parser needs to be loosened to allow for some common kinds of brokenness. I appreciate theoretically your objection to changing the heuristic, but I think that pragmatically the parser needs to accept these bogus message-ids. Note that already the parser's heuristic is clearly broken: *most* email addresses these days have more than 8 characters before the @-sign; that's based entirely on antiquated UNIX login limits!

So how to loosen the parser? It seems the most common kinds we know of are: missing @-sign or multiple @-signs. Your patch on #1935 treats the former (though inexplicably it only loosens for References) and my first patch here treats the latter. Ultimately I think the best thing would be to loosen both, assume that any non-whitespace between <..> is a message-id. Patch attached, though frankly this should coordinated with some changes to mutt_extract_message_id()

Changed 7 weeks ago by agriffis

  Changed 7 weeks ago by agriffis

...and as I said in my last sentence, this should really be merged with mutt_extract_message_id() so there's only one way that message-ids are recognized. Here's the patch. Bonus: no longer use strtok() so the input string isn't modified. This replaces all my previous patches in this bug. I've tested it for a while and it seems to fix the various problems cited.

Changed 7 weeks ago by agriffis

Changed 7 weeks ago by agriffis

  Changed 7 weeks ago by agriffis

I posted 3090-mutt_extract_message_id-no-static.patch (unmodified) to mutt-dev with appropriate mercurial headers.

  Changed 2 weeks ago by brendan

  • status changed from new to closed
  • resolution set to fixed

Fixed in [00ce81d778bf].

Note: See TracTickets for help on using tickets.