
Bug#555330: Bug#555331: [col] improperly fails with Invalid or incomplete multibyte or wide character



On Mon, Nov 09, 2009 at 12:48:03PM +0100, Raphael Hertzog wrote:
> Package: bsdmainutils
> Version: 8.0.1
> Severity: serious
> 
> Since today I get lots of lintian warnings (manpage-has-errors-from-man)
> on my dpkg builds because col fails with:
> col: Invalid or incomplete multibyte or wide character
> 
> You can reproduce it by doing this:
> LANG=C man --warnings -E UTF-8 -l /usr/share/man/man8/update-alternatives.8.gz >/dev/null
> 
> I don't know if it's col's fault or if it's man-db that does not use col
> properly but since col changed recently (and not man-db), I filed the bug
> against col. Note that dropping LANG=C makes the warning go away so it's
> most certainly locale related. Using any other locale seems to work, even
> one that is not UTF-8.
> 
> Severity serious to avoid propagation to testing until we know more on the
> nature of the problem. 

This bug is somewhere in the intersection of bsdmainutils, man-db,
lintian, and locales. Have fun. :-)

The proximate cause is that man uses -Tutf8 and thus outputs UTF-8
hyphens even under LANG=C (compare #547695). That confuses col now that
it knows about the encoding of its input data: under LANG=C those
multibyte sequences are not valid characters, hence the error above.
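
To illustrate the mismatch, here is a minimal Python sketch. It is not
col's actual code. It assumes the hyphen in question is U+2010 (which is
what I'd expect groff's utf8 device to emit for a plain hyphen), and the
helper name decodes_in is made up for the example. ASCII stands in for
the character set of the C locale.

```python
# Assumption: the "UTF-8 hyphen" is U+2010 HYPHEN.
HYPHEN = "\u2010"


def decodes_in(data: bytes, charset: str) -> bool:
    """Return True if every byte sequence in data is valid in charset."""
    try:
        data.decode(charset)
    except UnicodeDecodeError:
        return False
    return True


# A fragment of formatted output as man would hand it to col.
line = f"update{HYPHEN}alternatives".encode("utf-8")

print(line)                       # the hyphen is the three bytes e2 80 90
print(decodes_in(line, "ascii"))  # stand-in for a reader under LANG=C
print(decodes_in(line, "utf-8"))  # stand-in for a reader under a UTF-8 locale
```

A reader that trusts the locale's character set rejects the same bytes
that a UTF-8-aware reader accepts, which is the situation col is in
here.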

However, the upstream patch referred to in #547695 is not sufficient
here. lintian uses the '-E UTF-8' option, which forces man to use UTF-8,
overriding the default. This used to work fine when col was dumb; now
that it's smart, things are a bit more problematic. lintian does this
because it needs to force UTF-8 output somehow, or else CJK manual pages
tend not to work properly; and since no UTF-8 locale is guaranteed to be
available on all systems, it cannot simply select one.

In the short term, I think the best approach would be for man to set
LC_CTYPE to some appropriate locale that matches the encoding requested
by -E while running col. I'll see if I can arrange for this. However,
such a locale is not actually guaranteed to exist. Perhaps lintian needs
to generate a UTF-8 locale if it can't find one otherwise, a bit like
the hack in installation-locale; or perhaps we should just make sure
that there's always a C.UTF-8 locale on the system, which could be used
to get UTF-8 character type semantics without implying a particular
language or country.
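
The short-term idea could look roughly like the following Python sketch.
This is only an illustration under stated assumptions, not what man-db
does: the function names (codeset, pick_ctype_locale, env_for_col) are
hypothetical, and the list of available locales is assumed to come from
something like `locale -a`.

```python
from typing import Dict, Iterable, Optional


def codeset(locale_name: str) -> str:
    """Return the normalized codeset of a locale name, or '' if it has none."""
    base = locale_name.split("@", 1)[0]  # drop any @modifier
    if "." not in base:
        return ""
    # Normalize so that "UTF-8" and "utf8" compare equal.
    return base.split(".", 1)[1].lower().replace("-", "")


def pick_ctype_locale(encoding: str, available: Iterable[str]) -> Optional[str]:
    """Pick an available locale whose codeset matches encoding, if any."""
    wanted = encoding.lower().replace("-", "")
    matches = [name for name in available if codeset(name) == wanted]
    # Prefer a language-neutral C.<codeset> locale when one exists.
    for name in matches:
        if name.split(".", 1)[0] == "C":
            return name
    return matches[0] if matches else None


def env_for_col(env: Dict[str, str], encoding: str,
                available: Iterable[str]) -> Dict[str, str]:
    """Return a copy of env with LC_CTYPE set to match encoding, if possible."""
    new_env = dict(env)
    chosen = pick_ctype_locale(encoding, available)
    if chosen is not None:
        # LC_ALL overrides LC_CTYPE, so it has to go. Note that this makes
        # the other categories fall back to their own variables or LANG.
        new_env.pop("LC_ALL", None)
        new_env["LC_CTYPE"] = chosen
    return new_env


available = ["C", "POSIX", "en_US.utf8", "C.UTF-8", "ja_JP.eucjp"]
print(env_for_col({"LANG": "C"}, "UTF-8", available))
```

When no matching locale exists, the environment is returned unchanged,
which is exactly the gap described above: somebody still has to
guarantee that a suitable locale is present.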

-- 
Colin Watson                                       [cjwatson@debian.org]



