Ticket #3061 (closed defect: fixed)

Opened 6 months ago

Last modified 4 months ago

display broken when a mail contains U+FEFF

Reported by: myon Owned by: mutt-dev
Priority: minor Milestone: 1.6
Component: display Version: 1.5.18
Keywords: patch Cc: cam@…

Description

----- Forwarded message from Julien Cristau <jcristau@debian.org> -----

Date: Sun, 18 May 2008 17:22:15 +0200
From: Julien Cristau <jcristau@debian.org>
Reply-To: Julien Cristau <jcristau@debian.org>, 481797@bugs.debian.org
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: Bug#481797: mutt: display broken when a mail contains U+FEFF

Package: mutt
Version: 1.5.17+20080114-1
Severity: normal

Some mails seem to embed a U+FEFF character at random locations, and
this breaks mutt's display.  See for example
<1211122278.21823.128.camel@holly.codehelp> on debian-devel@ldo.

Cheers,
Julien

----- End forwarded message -----
----- Forwarded message from Julien Cristau <jcristau@debian.org> -----

Date: Sun, 18 May 2008 18:08:40 +0200
From: Julien Cristau <jcristau@debian.org>
Reply-To: Julien Cristau <jcristau@debian.org>, 481797@bugs.debian.org
To: 481797@bugs.debian.org
Subject: Bug#481797: mutt: display broken when a mail contains U+FEFF

On Sun, May 18, 2008 at 17:22:15 +0200, Julien Cristau wrote:

> Some mails seem to embed a U+FEFF character at random locations, and
> this breaks mutt's display.  See for example
> <1211122278.21823.128.camel@holly.codehelp> on debian-devel@ldo.
> 
Attached a screenshot to show what I'm seeing.
One line wasn't refreshed ("I don't see that duplication is a major
problem" comes from the screen above this one), and the bar at the
bottom has disappeared.

Cheers,
Julien

[see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=481797 for the
attachment]

----- End forwarded message -----
----- Forwarded message from Christoph Berg <myon@debian.org> -----

Date: Mon, 26 May 2008 21:33:55 +0200
From: Christoph Berg <myon@debian.org>
To: Julien Cristau <jcristau@debian.org>, 481797@bugs.debian.org
Subject: Re: Bug#481797: mutt: display broken when a mail contains U+FEFF

tags 481797 confirmed
thanks

Re: Julien Cristau 2008-05-18 <20080518152213.GA1726@patate.is-a-geek.org>
> Some mails seem to embed a U+FEFF character at random locations, and
> this breaks mutt's display.  See for example
> <1211122278.21823.128.camel@holly.codehelp> on debian-devel@ldo.

In this terminal (xterm), U+FEFF makes mutt eat the next character.

| <U+FEFF>It appears that you are assigned to an Application Manager who is marked

is rendered as

| t appears that you are assigned to an Application Manager who is marked
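You can generate a reproduction line like the one above without the original debian-devel message. Write the UTF-8 encoding of U+FEFF (the bytes EF BB BF) in front of ordinary text. The following is a minimal sketch in C. `utf8_encode_bmp` and `write_repro_line` are hypothetical helpers, not part of mutt:

```c
#include <stddef.h>
#include <stdio.h>

/* Encode a code point up to U+FFFF (enough for U+FEFF and U+200B) as UTF-8.
   Surrogates and code points above U+FFFF are not handled.
   Returns the number of bytes written to buf. */
static size_t utf8_encode_bmp(unsigned int cp, unsigned char buf[3])
{
    if (cp < 0x80) {
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    buf[0] = (unsigned char)(0xE0 | (cp >> 12));
    buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Write a test line that starts with U+FEFF, like the quoted example. */
static void write_repro_line(FILE *out)
{
    unsigned char bom[3];
    size_t n = utf8_encode_bmp(0xFEFF, bom);
    fwrite(bom, 1, n, out);
    fputs("It appears that you are assigned to an Application Manager who is marked\n", out);
}
```

Write that line to a file and open it in the pager. This shows whether a given terminal and curses combination exhibits the problem.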

Christoph

Attachments

zerowidth.diff (366 bytes) - added by pdmef 5 months ago.
zerowidth.2.diff (0.5 kB) - added by pdmef 5 months ago.

Change History

Changed 5 months ago by pdmef

  • component changed from mutt to display

Changed 5 months ago by agriffis

I'm seeing this bug too, on mutt 1.5.18+. The culprit seems to be Evolution adding the bogus character (at least I think it's bogus). The result isn't quite as described, though... The following character displays on the right side of the terminal, causing it to *appear* as if it's been eaten. I suspect Christoph would find his missing "I" on the RHS just above the line "t appears that you..."

I'm not sure mutt is to blame here... the character is in the mail, thanks to Evo.

Changed 5 months ago by agriffis

Changed 5 months ago by agriffis

  • version set to 1.5.18
  • milestone set to 1.6

Changed 5 months ago by pdmef

  • priority changed from major to minor

Together with #3048, I think mutt should simply drop any U+FEFF and U+200B characters, since they have zero width (I'm not sure whether such a routine should also handle more zero-width characters like U+200C).
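The proposal can be sketched as a wide-character filter that removes exactly these two code points. This illustration assumes the text has already been converted to `wchar_t`. `is_dropped_zero_width` and `drop_zero_width` are hypothetical names. U+200C is deliberately left out, matching the open question in the comment:

```c
#include <stddef.h>
#include <wchar.h>

/* Nonzero for the two zero-width characters discussed in this ticket:
   U+FEFF ZERO WIDTH NO-BREAK SPACE and U+200B ZERO WIDTH SPACE. */
static int is_dropped_zero_width(wchar_t wc)
{
    return wc == (wchar_t)0xFEFF || wc == (wchar_t)0x200B;
}

/* Remove those characters from a NUL-terminated wide string in place.
   Returns how many characters were removed. */
static size_t drop_zero_width(wchar_t *s)
{
    wchar_t *dst = s;
    size_t removed = 0;

    for (const wchar_t *src = s; *src != L'\0'; src++) {
        if (is_dropped_zero_width(*src))
            removed++;
        else
            *dst++ = *src;
    }
    *dst = L'\0';
    return removed;
}
```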

Changed 5 months ago by pdmef

Changed 5 months ago by pdmef

Attached a patch to skip U+200B and U+FEFF in the pager when the charset is UTF-8. I'm not sure about limiting it to UTF-8, but it's more of a hack anyway, and it solves the display issues for me.
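The attached patch is not reproduced in this ticket. As an illustration only, a byte-level variant of the same idea for text already known to be UTF-8 could look like this. `strip_utf8_zero_width` is a hypothetical name:

```c
#include <stddef.h>
#include <string.h>

/* Strip the UTF-8 encodings of U+FEFF (EF BB BF) and U+200B (E2 80 8B)
   from buf[0..len) in place and return the new length.
   Only valid when buf is known to hold UTF-8 text. */
static size_t strip_utf8_zero_width(char *buf, size_t len)
{
    static const unsigned char feff[3] = { 0xEF, 0xBB, 0xBF };
    static const unsigned char zwsp[3] = { 0xE2, 0x80, 0x8B };
    size_t in = 0, out = 0;

    while (in < len) {
        if (len - in >= 3 &&
            (memcmp(buf + in, feff, 3) == 0 || memcmp(buf + in, zwsp, 3) == 0)) {
            in += 3; /* skip the whole 3-byte sequence */
            continue;
        }
        buf[out++] = buf[in++];
    }
    return out;
}
```

Matching raw bytes is safe here only because UTF-8 is self-synchronizing. 0xEF and 0xE2 are always lead bytes, never continuation bytes, so a match cannot start in the middle of another character. In other charsets the same bytes could mean something else, which is one reason to restrict such a filter to UTF-8.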

Changed 5 months ago by heycam

  • cc cam@… added

Changed 5 months ago by brendan

I'm ok with this patch, but it seems like mutt is converting 0-width characters to -1-width. I'd like to understand why that's happening and whether there's a more general fix. That if (k == 0) k = 1 stanza looks a bit suspicious, but I haven't yet dug into the code. Can we use something like testing for a 0 wcwidth instead of hardcoded values?

Changed 5 months ago by brendan

See #3048

Changed 5 months ago by pdmef

I'm not sure about k==0 either, but shouldn't that happen only when we read L'\0'?
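The ISO C contract of `mbrtowc()` supports this reading. It returns 0 only when the converted character is the null wide character. Otherwise it returns a positive byte count, or `(size_t)-1` or `(size_t)-2` for invalid or incomplete input. The following sketch assumes `k` holds `mbrtowc()`'s return value; the pager code itself is not shown here. `first_char_len` is a hypothetical helper:

```c
#include <string.h>
#include <wchar.h>

/* Convert the first multibyte character of s (at most n bytes) into *wc
   and return mbrtowc()'s result. A result of 0 means *wc is L'\0'. */
static size_t first_char_len(const char *s, size_t n, wchar_t *wc)
{
    mbstate_t st;
    memset(&st, 0, sizeof st); /* start from the initial shift state */
    return mbrtowc(wc, s, n, &st);
}
```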

You're right that testing for wcwidth returning 0 should be used instead, but wcwidth is broken on, for example, OS X. Linux says U+FEFF and U+200B have a width of 0 and are printable; OS X says U+200B has a width of 1 and is printable, but U+FEFF has a width of -1 (non-printable), and iswprint "correctly" returns 0 for it.
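Since the result differs by platform, it can help to probe what the local C library reports. This minimal sketch uses the standard `wcwidth()` and `iswprint()` calls. `probe` and `probe_all` are hypothetical helpers. The printed values depend on the platform and on `LC_CTYPE`:

```c
#define _XOPEN_SOURCE 700 /* makes wcwidth() visible on glibc */
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Print wcwidth() and iswprint() for one character. */
static void probe(FILE *out, wchar_t wc, const char *name)
{
    fprintf(out, "%s: wcwidth=%d iswprint=%d\n",
            name, wcwidth(wc), iswprint((wint_t)wc) != 0);
}

static void probe_all(FILE *out)
{
    setlocale(LC_CTYPE, ""); /* use the user's locale, ideally UTF-8 */
    probe(out, (wchar_t)0xFEFF, "U+FEFF");
    probe(out, (wchar_t)0x200B, "U+200B");
}
```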

I attached a patch that tests for wcwidth being zero and for the hard-coded constants.
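The idea of that patch, as described, can be sketched as a predicate that trusts `wcwidth()` but also hard-codes the two code points some libraries misreport. This is an assumption-based illustration, since the actual contents of zerowidth.2.diff are not shown in this ticket. `is_zero_width` is a hypothetical name:

```c
#define _XOPEN_SOURCE 700 /* makes wcwidth() visible on glibc */
#include <wchar.h>

/* Treat U+200B and U+FEFF as zero-width regardless of what the C library
   says (e.g. OS X reports other widths for them), and otherwise defer to
   wcwidth(). */
static int is_zero_width(wchar_t wc)
{
    if (wc == (wchar_t)0x200B || wc == (wchar_t)0xFEFF)
        return 1;
    return wcwidth(wc) == 0;
}
```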

Changed 5 months ago by pdmef

Changed 4 months ago by mlichvar

The problem should be fixed in ncurses-5.6-20080517.

Ignoring the two chars is ok, but ignoring all chars with wcwidth 0 (as in zerowidth.2.diff) is probably a bad idea.
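The comment does not say why. One likely reason, which is an editorial assumption, is that combining marks such as U+0301 COMBINING ACUTE ACCENT also have a `wcwidth()` of 0 in UTF-8 locales. A blanket filter would therefore strip accents from decomposed text. This minimal sketch shows that failure mode. `drop_all_zero_width` is a hypothetical helper:

```c
#define _XOPEN_SOURCE 700 /* makes wcwidth() visible on glibc */
#include <locale.h>       /* callers need setlocale(LC_CTYPE, ...) set to a UTF-8 locale */
#include <wchar.h>

/* Drop every character whose wcwidth() is 0, in place. In a UTF-8 locale
   this also removes combining marks, so "e" + U+0301 (a decomposed "é")
   loses its accent. */
static void drop_all_zero_width(wchar_t *s)
{
    wchar_t *dst = s;

    for (const wchar_t *src = s; *src != L'\0'; src++) {
        if (wcwidth(*src) != 0)
            *dst++ = *src;
    }
    *dst = L'\0';
}
```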

Changed 4 months ago by pdmef

  • keywords patch added

Changed 4 months ago by Rocco Rutte <pdmef@…>

  • status changed from new to closed
  • resolution set to fixed

(In [51bd7a47d552]) Ignore zero width characters U+200B/U+FEFF which may garble the display. Closes #3061, #3048.
