Ticket #2995 (new defect)

Opened 9 months ago

Last modified 7 weeks ago

Mutt is folding subject line folds using a tab, which appears to be against RFC 2822

Reported by: frnkblk
Owned by: mutt-dev
Priority: minor
Milestone: 1.6
Component: mutt
Version: 1.5.17
Keywords: header folding white-space
Cc:

Description

Mutt is folding long subject lines by using a CRLF and then a tab.

For example, what was:

Subject: A DSL modem belonging to username 'username' is constantly reconnecting (112 times)

in the headers is now:

Subject: A DSL modem belonging to username 'username' is constantly
<Tab>reconnecting (112 times)

That doesn't appear to be compliant with RFC 2822. According to section 2.2.3:

Unfolding is accomplished by simply removing any CRLF that is immediately followed by WSP.

Outlook does not remove the tab when unfolding, so a huge gap appears in the subject line. Now I know why some people's NANOG and other techie-type listservs are showing the occasional gap in the subject line -- they must be using mutt as their email client.

This was not the behavior in 1.4.1i.

I am sending messages from the command-line, if that makes a difference in the subject line processing.

Attachments

part0001.pgp (196 bytes) - added by Kyle Wheeler 9 months ago.
Added by email2trac
part0001.2.pgp (196 bytes) - added by Kyle Wheeler 9 months ago.
Added by email2trac
part0001.3.pgp (196 bytes) - added by Kyle Wheeler 9 months ago.
Added by email2trac

Change History

  Changed 9 months ago by pdmef

Please look up what WSP is in RfC2234 (it's the same as LWSP-char in the older RfC822):

HTAB           =  %x09
                               ; horizontal tab
SP             =  %x20
                               ; space
WSP            =  SP / HTAB
                               ; white space

i.e. WSP is either a space or a tab character. Mutt does everything correctly.

Though I think a tab is traditionally used, I didn't find evidence that any folding WSP is to be treated as SP. RfC822 even says:

Unfolding is accomplished by regarding CRLF immediately followed by a LWSP-char as equivalent to the LWSP-char

which I interpret to mean that a tab used for folding really is a tab in the header.

  Changed 9 months ago by Kyle Wheeler

On Wednesday, November 28 at 09:21 AM, quoth Mutt:
> Please look up what WSP is in RfC2234 (it's the same as LWSP-char in the 
> older RfC822):

Just because a WSP can be either a tab or a space does not mean that 
they are interchangeable and that they have no semantic meaning. The 
RFC says that you may only insert CRLF's to wrap headers, not swap out 
some characters for other characters of the same class.

If we allow any character of the same class (e.g. WSP or US-ASCII) to 
be swapped out for any other character when wrapping, the example

     Subject: This is a test

Could instead be "wrapped" to:

     Subject: This
      xx x xxxx

Obviously these are not the same thing.

It's true that there's some ambiguity in the first part of the 
standard, where it says

     The general rule is that wherever this standard allows for folding
     white space (not simply WSP characters), a CRLF may be inserted
     before any WSP.

But it seems to me that what is intended here is that it could be 
rewritten as:

     ...a CRLF may be inserted before any EXISTING WSP.

This is upheld by the later description of how to unfold a header:

     Unfolding is accomplished by simply removing any CRLF that is
     immediately followed by WSP.

In other words, by merely removing the CRLF, we should have the 
original, pre-folding version of the header. Thus, when folding, we 
may only ADD CRLFs (in specific places), rather than give ourselves 
the freedom to delete and replace some of the characters of the 
original header.

> i.e. WSP is either a space or tab character. Mutt does everything 
> correct.

The idea is that folding is done by inserting CRLF's in strategic 
places, namely, just before WSPs. That doesn't mean we get to swap out 
one WSP character for another WSP character. It does NOT say that all  
WSPs are semantically equivalent, and can be swapped around according 
to the whims of the mail client. The WSP (in folding) is ONLY an 
indicator of where you can add a CRLF. The ORIGINAL WSP must stay 
intact.

The header:

     Subject: This is a test

May be folded like this:

     Subject: This<CRLF> is a test

But NOT like this:

     Subject: This<CRLF><TAB>is a test

> Though I think a tab is traditionally used, I didn't find evidence 
> that any folding WSP is to be treated as SP. RfC822 even says:
>
>   Unfolding is accomplished by regarding CRLF immediately followed by a 
> LWSP-char as equivalent to the LWSP-char
>
> which I interpret as tab for folding indeed means tab in the header.

This was clarified in RFC 2822 to be more obvious that when unfolding 
a header you may ONLY remove CRLFs (and only those CRLFs that are 
followed by a WSP character), and that everything else about the 
header must remain as-is, including that WSP character. Note that RFC 
822 doesn't say that a CRLF-LWSP sequence is equivalent to *any* LWSP 
character; it says that a CRLF-LWSP sequence is equivalent to *that* 
LWSP character.

Thus, if mutt is replacing spaces with tabs (which it is), a CORRECT 
unfolding of those folded headers MUST preserve those tabs.

If mutt is transforming:

     Subject: This is a test

...into:

     Subject: This<CRLF><TAB>is a test

...(which is exactly what it is currently doing) then the only correct 
unfolding of this header is:

     Subject: This<TAB>is a test

Which is obviously not desirable.
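
To make the unfolding rule concrete, here is a minimal sketch in C. It is not taken from mutt; the name unfold_header and its signature are made up for this illustration. It does only what section 2.2.3 describes: it drops a CRLF that is immediately followed by WSP and copies everything else, including that WSP, unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* WSP = SP / HTAB, as in the ABNF quoted earlier in this ticket. */
static int is_wsp(char c)
{
    return c == ' ' || c == '\t';
}

/*
 * Hypothetical helper (not mutt code): unfold a header by removing every
 * CRLF that is immediately followed by WSP. The WSP itself is kept as-is.
 * Writes at most outlen - 1 characters plus a terminating NUL and returns
 * the number of characters written.
 */
size_t unfold_header(const char *in, char *out, size_t outlen)
{
    size_t n = 0;

    assert(in != NULL && out != NULL && outlen > 0);

    while (*in != '\0' && n + 1 < outlen) {
        if (in[0] == '\r' && in[1] == '\n' && is_wsp(in[2])) {
            in += 2;        /* skip only the CRLF; the WSP is copied below */
            continue;
        }
        out[n++] = *in++;
    }
    out[n] = '\0';
    return n;
}
```

Fed "Subject: This<CRLF><TAB>is a test", this produces "Subject: This<TAB>is a test", tab included. Fed "Subject: This<CRLF> is a test", it gives back the original header with its space.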

~Kyle

  Changed 9 months ago by Rocco Rutte

Hi,

* Mutt wrote:
> On Wednesday, November 28 at 09:21 AM, quoth Mutt:

>> Please look up what WSP is in RfC2234 (it's the same as LWSP-char in the
>> older RfC822):

> Just because a WSP can be either a tab or a space does not mean that
> they are interchangeable and that they have no semantic meaning.

If it read like I was saying this, I didn't mean it. I just wanted to 
point out that "white space" does not mean "space" and that a tab for 
folding is not illegal per se.

> In other words, by merely removing the CRLF, we should have the
> original, pre-folding version of the header. Thus, when folding, we
> may only ADD CRLFs (in specific places), rather than give ourselves
> the freedom to delete and replace some of the characters of the
> original header.

I read the standard the same way, too (except RfC2047 where space or 
even white space between encoded words is to be removed).

On the other hand, there's a compatibility issue. Some quick tests show 
that some clients properly implement it the standard way, others don't.

So maybe mutt should just keep interpreting it in the broken way when 
reading, but send it in the correct way. We could then leave it that way 
until somebody else complains.

I'm still wondering though why this hasn't come up earlier...

Rocco

  Changed 9 months ago by Frank Bulk

I think Kyle has made a pretty strong case that mutt is folding it
incorrectly, and the reason it hasn't shown up before is that some clients
either don't display a TAB or compensate for special characters in the
subject line.  If we don't follow Kyle's interpretation, an e-mail client
could add or remove an arbitrary number of WSP characters over the life of
the message.  That should be obviously undesirable.

As for how mutt unfolds it, perhaps we ought to revert to the correct
behavior but add a compile-time or run-time option to use the incorrect method.

Frank

  Changed 9 months ago by pdmef

  • keywords header white-space added; fold tab removed
  • milestone set to 1.6

Unfolding it correctly would mean that mutt inserts tabs into messages it created itself... which I'm not sure it should do.

  Changed 9 months ago by Frank Bulk

The Mutt client viewing the message wouldn't insert tabs in the subject line
-- that would have been done by the mutt client that sent the message.

I'm no day-time programmer, but if someone can point me to the right source
file I'm sure I can hack a fix.  I'm sending hundreds of automated messages
per day from cron to my techs and this tab is super-annoying.

Frank

  Changed 9 months ago by Kyle Wheeler

On Wednesday, November 28 at 11:04 AM, quoth Mutt:
>> In other words, by merely removing the CRLF, we should have the 
>> original, pre-folding version of the header. Thus, when folding, we 
>> may only ADD CRLFs (in specific places), rather than give ourselves 
>> the freedom to delete and replace some of the characters of the 
>> original header.
>
> I read the standard the same way, too (except RfC2047 where space or 
> even white space between encoded words is to be removed).

Ahh, an interesting point. Hmm... Though, if I'm understanding that 
RFC correctly (and I may not be), the space is only to be removed 
between *encoded* words, and should be left in place between encoded 
words and non-encoded words or between multiple non-encoded words.

> On the other hand, there's a compatibility issue. Some quick tests 
> show that some clients properly implement it the standard way, 
> others don't.

Eh? I lost track of your pronoun there. By "it" you mean folding? or 
unfolding?

I did notice that Apple's Mail.app implements unfolding improperly (it 
swaps out tabs for spaces).

> So maybe mutt should just keep interpreting it in the broken way when 
> reading, but send it in the correct way. We could then leave it that way 
> until somebody else complains.

If you mean that mutt should behave like Mail.app for rendering, then 
I think you're right, that probably makes sense.

> I'm still wondering though why this hasn't come up earlier...

My best guess would be that most folks don't write long subject lines, 
and that's probably the only place where it is really visible.

~Kyle

  Changed 9 months ago by Kyle Wheeler

On Wednesday, November 28 at 06:21 PM, quoth Mutt:
> I'm no day-time programmer, but if someone can point me to the right
> source file I'm sure I can hack a fix.  I'm sending hundreds of 
> automated messages per day from cron to my techs and this tab is 
> super-annoying.

I think the right file is sendlib.c... somewhere near the words "Dirty 
hack" maybe?

~Kyle

  Changed 9 months ago by Frank Bulk

I looked at the subroutine and realized that I'm not going to figure it out
in ten minutes, so I cheated -- I changed the wrap length from 76 to 200 and
I'm calling it good for now.

But don't close the bug, this issue still needs to be addressed.

Frank

  Changed 2 months ago by brendan

  • priority changed from major to minor

That subroutine is horrifying, but I believe the line in question is near the bottom of mutt_write_one_header:

        cp = &cp[n];

This advances the parser to the character _after_ the whitespace break character, after flushing the first line. So the logic at the top of the loop to pad only when necessary is never triggered.

I get the willies every time I think about editing this function though. To make matters worse, I think it's shared between the display code and the sending code. It would be good to tab-indent on the display side when the header wraps past the window, and to simply insert CRs on the sending side.

  Changed 7 weeks ago by pdmef

Replying to brendan:

> This advances the parser to the character _after_ the whitespace break character, after flushing the first line. So the logic at the top of the loop to pad only when necessary is never triggered.

I think the padding logic does trigger, as it's the only place in that function that inserts a tab. It triggers because first=0 for the wrapped header and *cp is neither ' ' nor '\t', since cp = &cp[n] makes it point at non-whitespace.

The fix for this ticket IMHO could be to use the last space we saw instead of tab.
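
As a rough sketch of that idea, and only a sketch: the code below is not mutt's mutt_write_one_header, and the name fold_header, its signature, and the width parameter are assumptions made for illustration. It folds by inserting CRLF directly in front of a space or tab that is already in the header, so the continuation line starts with the original whitespace character rather than a freshly inserted tab. It makes no attempt to handle runs of whitespace or RfC2047 encoded words.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int is_wsp(char c)
{
    return c == ' ' || c == '\t';
}

/*
 * Hypothetical sketch (not mutt code): fold `in` so that lines are at most
 * `width` characters where possible. A CRLF is only ever inserted in front
 * of an existing SP/HTAB, and no character of the input is changed.
 * Returns the length written, or 0 if `out` is too small.
 */
size_t fold_header(const char *in, size_t width, char *out, size_t outlen)
{
    size_t len, start = 0, n = 0, rest;

    assert(in != NULL && out != NULL && width > 1);
    len = strlen(in);

    while (len - start > width) {
        size_t brk = start + width;
        size_t chunk;

        /* last existing WSP in the window, excluding the line's first char */
        while (brk > start && !is_wsp(in[brk]))
            brk--;
        if (brk == start) {
            /* no WSP in the window: take the next one, the line runs long */
            brk = start + width;
            while (brk < len && !is_wsp(in[brk]))
                brk++;
            if (brk == len)
                break;      /* nothing left to fold at */
        }

        chunk = brk - start;
        if (n + chunk + 2 >= outlen)
            return 0;
        memcpy(out + n, in + start, chunk);
        n += chunk;
        out[n++] = '\r';
        out[n++] = '\n';
        start = brk;        /* next line begins with the original WSP */
    }

    rest = len - start;
    if (n + rest >= outlen)
        return 0;
    memcpy(out + n, in + start, rest);
    n += rest;
    out[n] = '\0';
    return n;
}
```

With a width of 14, "Subject: This is a test" comes out as "Subject: This<CRLF> is a test", and removing the CRLF again gives back exactly the original header.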

If anyone is really going to touch that function, #3080 is another folding-related edge case to test against...
