Re: OT: Python (was: Make Unicode bugs release critical?)
On Wed, Feb 16, 2011 at 01:01:07AM +0100, Vincent Lefevre wrote:
> On 2011-02-14 16:43:11 +0000, Ian Jackson wrote:
> > When LC_CTYPE=en_GB.utf-8, programs which attempt to print unicode
> > characters to stdout should use UTF-8. That's what LC_CTYPE means.
>
> So, "cat", "grep", etc. are all broken. :)
How come?
"cat" copies bytes through without interpreting them, so for any valid
UTF-8 character on input it will print a valid UTF-8 character on output,
and for any valid ISO-8859-1 character on input it will print a valid
ISO-8859-1 character on output.
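To illustrate the point about "cat", here is a minimal Python sketch of a
byte-for-byte copy. The function name cat() is made up for this example and
this is not how coreutils implements it; it only shows that a copy which never
decodes its input preserves whatever encoding that input was in.

```python
import io
import shutil


def cat(src, dst):
    """Copy a binary stream to another one without decoding anything."""
    shutil.copyfileobj(src, dst)


if __name__ == "__main__":
    # The same kind of copy works for both encodings, because the bytes
    # are never interpreted as characters.
    for text, encoding in (("ą", "utf-8"), ("é", "iso-8859-1")):
        data = text.encode(encoding)
        out = io.BytesIO()
        cat(io.BytesIO(data), out)
        print(encoding, out.getvalue() == data)
```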
"grep" on the other hand has to actually understand the encoding -- and it
does. Try this:
$ echo "ą"|LC_CTYPE=C grep --color=always .
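The output will be mangled: in the C locale "." matches a single byte, so
the color escapes end up between the two bytes of "ą".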
$ echo "ą"|LC_CTYPE=en_US.utf-8 grep --color=always .
This will be handled correctly: both bytes are matched as one character.
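The difference between the two grep runs can be modelled roughly in Python.
This is only a sketch, not grep's actual code: the START/END markers are
simplified stand-ins for the real color escapes, and the function names are
made up for the example. It contrasts matching "." per byte (as in the C
locale) with matching "." per decoded character (as in a UTF-8 locale).

```python
import re

# Simplified stand-ins for the escape sequences that switch color on and off.
START = "\x1b[01;31m"
END = "\x1b[m"


def highlight_bytes(data: bytes) -> bytes:
    """Byte-oriented matching: every single byte is wrapped on its own."""
    start, end = START.encode("ascii"), END.encode("ascii")
    return re.sub(rb".", lambda m: start + m.group() + end, data)


def highlight_chars(data: bytes, encoding: str = "utf-8") -> bytes:
    """Character-oriented matching: decode first, then wrap each character."""
    text = data.decode(encoding)
    wrapped = re.sub(r".", lambda m: START + m.group() + END, text)
    return wrapped.encode(encoding)


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes still decode as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    raw = "ą".encode("utf-8")  # two bytes: 0xC4 0x85
    print(is_valid_utf8(highlight_bytes(raw)))  # escapes split the sequence
    print(is_valid_utf8(highlight_chars(raw)))  # character kept intact
```

In the byte-oriented case the escape sequences land in the middle of the
two-byte sequence, so the terminal no longer sees a valid UTF-8 character.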
--
1KB // Microsoft corollary to Hanlon's razor:
// Never attribute to stupidity what can be
// adequately explained by malice.