wiki:MuttFaq/Charset

Umlauts, accents, and other non-ASCII characters are displayed as '?' or '\123' -- locales

Short answer: set LC_CTYPE=en_US.ISO-8859-1.

Long answer: You have to configure your locale settings. This is done by setting environment variables.

If your system is already configured, you only have to set $LANG and/or some of the $LC_* variables. $LANG is the default for the $LC_* categories that are unset, and $LC_ALL overrides all other variables, so make sure the latter one is unset. Mutt cares mostly about these categories:

  • LC_CTYPE is the character set used by your terminal
  • LC_MESSAGES is the language used by the Mutt menus and messages printed
  • LC_TIME is used by strftime(3)
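
The precedence described above can be sketched as a small shell function. The name effective_ctype is hypothetical, and the fallback to POSIX when nothing is set reflects the usual C library behaviour rather than anything specific to Mutt.

```shell
# Hypothetical helper: effective_ctype LC_ALL LC_CTYPE LANG
# Prints the locale value that ends up governing the LC_CTYPE category.
effective_ctype() {
    lc_all=$1 lc_ctype=$2 lang=$3
    # An empty value counts as unset; LC_ALL wins, then LC_CTYPE, then LANG.
    printf '%s\n' "${lc_all:-${lc_ctype:-${lang:-POSIX}}}"
}

# Show the value for the current environment.
effective_ctype "${LC_ALL:-}" "${LC_CTYPE:-}" "${LANG:-}"
```

The same pattern works for the other categories by passing $LC_MESSAGES or $LC_TIME as the second argument.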

We will use $LANG here. Examples for settings are:

  • export LANG=de_DE.UTF-8 (sh/bash syntax, put that into your .bashrc/.bash_profile)
  • setenv LANG en_US.ISO-8859-1 (csh/tcsh syntax for .cshrc/.login)

The triple stands for language_country.charset. There are also variants like de_AT@euro, and aliases like deutsch. Check the output of "locale -a" to see which locale values your system supports. Type "locale" to check the current settings of all categories.

$ locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
...
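
To see which charset a given locale name asks for, the name can be split by hand. This is only a sketch: locale_charset is a hypothetical helper, and it reports failure for names such as de_AT@euro or deutsch, where the charset is implied by the system's locale definition rather than spelled out.

```shell
# Hypothetical helper: print the charset part of language_country.charset[@variant].
locale_charset() {
    name=${1%%@*}                  # drop a variant such as @euro
    case $name in
        *.*) printf '%s\n' "${name#*.}" ;;
        *)   return 1 ;;           # no explicit charset in the name
    esac
}

# Usage:
#   locale_charset "$LANG"
```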

Finally, verify Mutt correctly detects the charset of the locale. Restart Mutt and type:

:set &charset ?charset
charset="utf-8"

If you are running a Mutt older than 1.5.18, don't forget to empty the Mutt header cache when you change the charset.

Also, if you built Mutt yourself, it is critical that you link it against a Unicode-aware ncurses. Sometimes the package for that is ncursesw*. If there is no such package, chances are that your system's ncurses already includes Unicode support.

Problem: If the list printed by "locale --all-locales" is empty, or lacks a suitable value, you have to generate the locale files first. Check the localedef(1) manpage on how to do this. Debian users simply call "dpkg-reconfigure locales" (make sure the locales package is installed).

Further problems:

  • Some systems (MacOS X before 10.4, NetBSD...) have no locale command installed.

Use something like "ls /usr/share/locale/" or "ls /usr/lib/locale/" to list available values.

  • Some systems (libc5...) have no way to tell Mutt the locale's charset.

You have to set the $charset variable in muttrc yourself.

  • Some systems (HP-UX, AIX, OSF1, Irix...) use non-standard names for some charsets.

Use iconv-hooks to alias them to standard names. Example files come with the Mutt tarball.

  • Some systems (Cygwin...) have no working locales.

Use the --enable-locales-fix configure option, and set $charset yourself, but be prepared to have some limitations in functionality.

Umlauts, accents, and other non-ASCII characters are displayed fine in some mails, but hidden in others

Make sure the mails have a proper charset declaration in the header. For example:

Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 8bit
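
A quick way to see what a saved message declares is to pull the charset parameter out of its header. The following is a rough sketch, not Mutt's own parser: header_charset is a hypothetical name, and folded headers (where charset= sits on a continuation line) and CRLF line endings are not handled.

```shell
# Hypothetical helper: read a message on stdin, print the declared charset (if any).
header_charset() {
    # Keep only the header block (everything up to the first empty line),
    # then extract the charset parameter from the Content-Type line.
    sed -e '/^$/q' |
        grep -i '^Content-Type:' |
        sed -n 's/.*[Cc][Hh][Aa][Rr][Ss][Ee][Tt]="\{0,1\}\([^";[:space:]]*\).*/\1/p' |
        head -n 1
}

# Usage:
#   header_charset < saved-message
```

Empty output means that no charset is declared in a single-line Content-Type header.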

If the charset label is missing or wrong, or if these headers are absent entirely, you can still try to make Mutt work around the problem on the fly. Example for Western users who receive broken mails that are in fact mostly in Latin-1 or CP-1252: declare CP-1252 as the default assumed charset for broken mails.

charset-hook ^us-ascii$ cp1252
charset-hook ^iso-8859-1$ cp1252
unset strict_mime
set assumed_charset="cp1252"

or

charset-hook US-ASCII     ISO-8859-1
charset-hook x-unknown    ISO-8859-1
charset-hook windows-1250 CP1250
charset-hook windows-1251 CP1251
charset-hook windows-1252 CP1252
charset-hook windows-1253 CP1253
charset-hook windows-1254 CP1254
charset-hook windows-1255 CP1255
charset-hook windows-1256 CP1256
charset-hook windows-1257 CP1257
charset-hook windows-1258 CP1258
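
Why reading mail labelled iso-8859-1 as CP-1252 helps can be seen with iconv, used here only as a stand-in for Mutt's own conversion. The function names are hypothetical, and the sketch assumes an iconv that knows the name CP1252 (GNU libc and libiconv do).

```shell
# Decode stdin under two different charset assumptions and emit UTF-8.
as_latin1() { iconv -f ISO-8859-1 -t UTF-8; }
as_cp1252() { iconv -f CP1252 -t UTF-8; }

# Bytes 0x93 and 0x94 (octal 223 and 224) are curly quotes in CP-1252,
# but invisible C1 control characters in ISO-8859-1. Compare:
#   printf '\223hi\224\n' | as_cp1252
#   printf '\223hi\224\n' | as_latin1
```

For the printable Latin-1 range (0xA0 to 0xFF) both functions give the same result, which is why the hook does no harm to correctly labelled mail.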

Another example, for Chinese users who receive broken mails that are in fact mostly in GB2312:

charset-hook ^us-ascii$ gb2312
unset strict_mime
set assumed_charset="gb2312"

In more specific cases you can use the <edit-type> function to manually override a wrong label. By default it is bound to the ^E key. From the index or pager it acts on the body of the mail, while from the attachments menu it acts on the individual part selected.

See also: PatchList: assumed_charset

Umlauts, accents, and other non-ASCII characters are only displayed wrong when using auto_view

First, imagine a situation where you have to use MIME autoview, e.g. to display text/html content in the mutt-pager.

You get a mail with the following header:

Content-Type: text/html; charset="iso-8859-1"

your locales are:

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
(...)

your mailcap looks something like:

text/html; w3m -dump %s; copiousoutput

and there is one auto_view in your muttrc:

auto_view text/html

When you open this mail in the mutt-pager, mutt spawns w3m (or any other text-browser defined in mailcap), w3m dumps text generated from the input html-file (%s) back to mutt and the mutt-pager displays it -- unfortunately wrong.

The problem is that w3m does not know anything about the character encoding of the input file. w3m can only guess a charset from your locale, but in our example the two don't match (iso-8859-1 != UTF-8).

One can get around this with the %{charset} variable in mailcap:

text/html; w3m -I %{charset} -T text/html -dump; copiousoutput

(Don't be confused by the missing %s -- w3m can read data from stdin so %s is basically not needed. See Advanced mailcap Usage for details.)

The w3m documentation says:

$ w3m -h
   (...)
   -I charset       document charset
   -T type          specify content-type
   (...)

With this entry, your mutt-pager will print something like:

[-- Autoview using w3m -I 'iso-8859-1' -T text/html -dump --]
  (...)
  Here are some Umlauts: äöü ÄÖÜ
  (...)

As you can see, mutt resolved %{charset} correctly into iso-8859-1. Of course the input-charset options above depend on your preferred text-browser.

Characters are replaced by ? when charsets and fonts are correctly set up

The problem here is that some characters in the document's charset are simply not available in Mutt's current charset. This is particularly prevalent in documents created by Microsoft agents. Mutt can be instructed to make a best-effort attempt to replace the missing characters with something similar by appending //TRANSLIT to the charset setting (e.g. set charset=iso-8859-1//TRANSLIT).

Note: However nice this "approximation" trick may be, it is only a workaround. The best solution is to upgrade to a more capable terminal, with a charset that can directly display all the characters you want. But that is not always possible or easy.
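
Whether a given text can be shown in a charset at all can be probed with iconv before touching the Mutt configuration. This is a sketch under assumptions: both function names are hypothetical, the input is taken to be UTF-8, //TRANSLIT is an iconv extension (GNU libc, libiconv), and the substitutes it picks depend on the implementation and the locale.

```shell
# Succeeds only if every character on stdin (UTF-8) exists in the target charset.
representable_in() {
    iconv -f UTF-8 -t "$1" >/dev/null 2>&1
}

# Best-effort conversion: missing characters are replaced by something similar.
approximate_in() {
    iconv -f UTF-8 -t "$1//TRANSLIT"
}

# Usage:
#   representable_in ISO-8859-1 < message.txt || echo "needs TRANSLIT or a better charset"
#   approximate_in ISO-8859-1 < message.txt
```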

How can I check if locales work before I blame Mutt for it?

Perl is sensitive to proper locale settings. On certain distros (e.g. Debian) it will complain when the charset settings are incorrect. Try:

perl -e ""

This should do nothing and print nothing. If it gives a loud, ugly warning about LANG, LC_CTYPE and LC_ALL, something is wrong. But if it stays silent, that may only be because it is configured not to complain (how?). To test for that, run:

env LC_ALL=nocharset perl -e ""

and verify that you do get an ugly warning with it.

GNU ls also uses $LC_CTYPE. Simply create a file with non-ASCII characters in its name ("touch äöü") and check whether "ls" lists the proper name, or just "???". To test $LC_MESSAGES, call GNU grep without arguments; with a German locale it prints:

Aufruf: grep [OPTION]... MUSTER [DATEI]...
grep --help gibt Ihnen mehr Informationen.

(Obviously, this method does not work for English locales.)

UTF-8 chars are displayed fine, but the screen is garbled

Mutt has to be linked against a term library with wide char support. For ncurses, this is the libncursesw library.

$ mutt -v | grep using
System: Linux 2.4.25-planck (i686) [using ncurses 5.4]
$ ldd `which mutt` | grep curses
libncursesw.so.5 => /usr/lib/libncursesw.so.5 (0x40023000)
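
The ldd check can be wrapped for use in a script. A minimal sketch: links_ncursesw is a hypothetical name, and it only looks for the library name in the text that ldd prints.

```shell
# Reads ldd output on stdin; succeeds if a wide-character ncurses is linked in.
links_ncursesw() {
    grep -q 'libncursesw\.so'
}

# Usage:
#   ldd "$(command -v mutt)" | links_ncursesw && echo "wide-char curses"
```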

To get libncursesw, compile ncurses with --enable-widec. Debian users install the libncursesw5 package. (On Debian/Woody (stable), install mutt-utf8. Starting with Debian/Sarge, Mutt is already linked against libncursesw; try apt-get build-dep mutt if you compile your own Mutt.)

The default S-Lang does not seem to work with UTF-8; relink Mutt against libncursesw instead. (Hello Gentoo users :-)

S-Lang needs the UTF-8 patch to work with UTF-8. Here it is: http://www.emaillab.org/mutt/tools/slang-1.4.8-utf8.diff.gz (This displays CJK chars more correctly than ncursesw.)

I tuned all the variables correctly, but my messages are garbled

Miscoded characters can disturb the charset transcoding, or the charset auto-sensing done by your $editor. Make sure that your signature, aliases, muttrc, /etc/Muttrc, and any sourced files are written in the right charset. Make sure that the charset of $locale (used to localize date and time) matches your $charset. Make sure that the mail you quote was displayed cleanly beforehand.

Tip: Convert the config files on the fly from one fixed charset to the current $charset. First convert all your files, once and for all, to one charset of your choice (UTF-8 in this example), and from now on edit them only in this charset. Then add at the beginning of your muttrc:

set config_charset=utf-8
set signature="iconv -f utf-8 ~/.signature |"
set locale=`echo "${LC_ALL:-${LC_TIME:-${LANG}}}"`

Note: The $config_charset feature has been included since Mutt 1.5.7.

The $editor used by Mutt to compose messages must be configured to read and write files in the current locale's charset, without smart auto-sensing of the file's charset. When it is used for the <edit> function (edit the raw message), auto-sensing can help. When it is used to edit muttrc, signature, or aliases, hardcode the charset previously chosen as $config_charset.

Regarding your editor of choice: Some distros change the defaults of the editor you use or the defaults are not good enough. For example some distributions set the fileencoding of Vim to UTF-8 no matter what locale the user chooses to use. Say the user chooses LANG="de_DE@euro". Then displaying received messages containing umlauts or other special characters is most likely no problem at all. But writing messages results in a total mess. For instance sending a string containing "öäüß@€" results in "öÀÌÃ\237@â\202¬". You can fix this by setting up your own ~/.vimrc holding the following:

set encoding&		" terminal charset: follows current locale
set termencoding=
set fileencodings=	" charset auto-sensing: disabled
set fileencoding&	" auto-sensed charset of current buffer

Those settings are in fact resets to Vim's sensible defaults. Only fileencodings differs: its default value is very nice, but can sometimes hurt Mutt. Ideally it should be unset only when Vim is called from Mutt to compose a message, not in general (how?).
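
The kind of garbling described above appears when text written as UTF-8 is read as a single-byte charset. A small sketch reproducing it with iconv (the function name is hypothetical, and iconv merely stands in for whatever program misreads the bytes):

```shell
# Reinterpret UTF-8 bytes on stdin as the given single-byte charset,
# then print the result as UTF-8 so that the damage becomes visible.
misread_utf8_as() {
    iconv -f "$1" -t UTF-8
}

# Usage: "ö" stored as UTF-8 (bytes 0xC3 0xB6) turns into two characters:
#   printf '\303\266\n' | misread_utf8_as ISO-8859-1
```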

Attached text files get sent misencoded with wrong charset

By default Mutt assumes the text files you attach are originally in the same charset as your terminal. Upon sending, Mutt will convert those files from $charset to one of $send_charset. This fails badly for any file that was not originally in $charset.

There are 2 solutions:

  • Interactively change the attachment's charset to the file's real charset in the compose menu before sending, using the <edit-type> function (bound to the ^T key by default) and replying no to the "Convert?" question. This unfortunately bypasses automatic selection of the best-suited sending charset.

  • Activate original charset auto-sensing with:
    set file_charset="utf-8:iso-8859-1"

Mutt then checks each $file_charset in turn. The first charset in which the text file is entirely valid is assumed to be the file's charset. Upon sending, Mutt will convert this file from the auto-sensed charset to one of $send_charset.
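
The first-valid-charset idea can be imitated outside Mutt to predict what it is likely to pick for a file. This is a sketch, not Mutt's actual code: guess_charset is a hypothetical name, and "entirely valid" is approximated by a successful iconv conversion to UTF-8.

```shell
# guess_charset FILE LIST, where LIST is colon-separated like $file_charset.
# Prints the first charset in which FILE converts without error.
guess_charset() {
    file=$1
    old_ifs=$IFS
    IFS=:
    for cs in $2; do                 # unquoted on purpose: split on ":"
        if iconv -f "$cs" -t UTF-8 "$file" >/dev/null 2>&1; then
            IFS=$old_ifs
            printf '%s\n' "$cs"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1                         # no charset in the list fits
}

# Usage:
#   guess_charset attachment.txt "utf-8:iso-8859-1"
```

Because nearly any byte sequence is valid in an 8-bit charset such as iso-8859-1, it only makes sense as the last entry of the list.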

Note: This "auto-sensing" is really educated guessing, and can fail. Keep an eye on the compose menu, which displays for each attachment the charset chosen for sending (after the $charset or $file_charset to $send_charset conversion). In particular, it cannot distinguish similar 8-bit charsets such as Latin-1 from Latin-2, or from CP-850. UTF-8 plus one 8-bit charset is OK, but no more. Japanese users may use "iso-2022-jp:euc-jp:shift_jis:utf-8", which works well because those charsets are coded very differently and are thus easily distinguishable.

Note: $file_charset is one of the numerous features provided by Takashi Takizawa in his Japanese patch. It is also part of the compat patch, and of the tt.assumed_charset patch. See PatchList for more information. The feature is integrated in the Debian and Gentoo Mutt packages.

MIME attachment filenames are displayed as =?iso-8859-1?Q

The filename is encoded with RFC 2047 encoded-words, which the standard does not allow in MIME parameters (RFC 2231 defines the proper encoding for them); such filenames are commonly produced by Microsoft Outlook and some other MUAs.

Decode these filenames by setting this parameter:

set rfc2047_parameters
Last modified on Jul 28, 2013 2:26:07 PM