Bug#265163: locales: locale.alias aliases some names to unsupported locales
Package: locales
Version: 2.3.2.ds1-15
Severity: normal
Tags: upstream
Some of the locale aliases in /etc/locale.alias map names to unsupported
locales. Namely, "eucJP" and "eucKR" don't match the spellings used in
/usr/share/i18n/SUPPORTED ("EUC-JP" and "EUC-KR"), and the "SJIS" codeset
isn't supported at all.
I'm attaching two files:
* A Python script I wrote that found this problem.
* A patch to correct the problem. I corrected all but one problem; I had
to drop the alias for "japanese.sjis", as adding support for the SJIS
character set to glibc is beyond my ability, and I don't even know if
that's a desirable solution.
Thanks for looking into this.
-- System Information:
Debian Release: 3.1
APT prefers unstable
APT policy: (500, 'unstable')
Architecture: powerpc (ppc)
Kernel: Linux 2.4.25-powerpc-smp
Locale: LANG=C, LC_CTYPE=en_US.UTF-8
Versions of packages locales depends on:
ii debconf 1.4.30 Debian configuration management sy
ii libc6 [glibc-2.3.2.ds1-15] 2.3.2.ds1-15 GNU C Library: Shared libraries an
-- debconf information:
* locales/default_environment_locale: None
* locales/locales_to_be_generated: en_US ISO-8859-1, en_US.ISO-8859-15 ISO-8859-15, en_US.UTF-8 UTF-8
#!/usr/bin/python
import os
import re

RUNTIME_DEBUG = True

# Build a dictionary of canonical locales according to the GNU C library. The
# keys in this dictionary are the locale names, and the values are the character
# sets used by each locale name.
glibc_locales_canonical = { }
glibc_locale_file = open(os.path.join("/", "usr", "share", "i18n", "SUPPORTED"))
for line in glibc_locale_file.readlines():
    (left_side, right_side) = re.split(r'\s', line, maxsplit=1)
    glibc_locales_canonical[(left_side.strip())] = right_side.strip()
glibc_locale_file.close()

if RUNTIME_DEBUG:
    print("Canonical glibc locales: %s" % (glibc_locales_canonical.keys(),))

# Build a dictionary of aliases, checking each alias's target as we go.
glibc_locales_aliased = { }
glibc_alias_file = open(os.path.join("/", "etc", "locale.alias"))
for line in glibc_alias_file.readlines():
    # Ignore blank lines and lines beginning with a comment character.
    if re.match(r'\s*$', line) \
    or re.match(r'#', line):
        continue
    (left_side, right_side) = re.split(r'\s', line, maxsplit=1)
    glibc_locales_aliased[(left_side.strip())] = right_side.strip()
    # glibc is a little weird; it aliases names to locale specifications
    # *including* the codeset, whereas it omits the codeset from the officially
    # supported list except when necessary for disambiguation purposes.
    # Consequently, if we don't find the alias's target in the canonical list,
    # we have to fall back to seeing if it is in the canonical list using the
    # same codeset that is explicitly stated.
    if right_side.strip() not in glibc_locales_canonical.keys():
        # Try harder to find it.
        goal_locale = right_side.strip()
        found = False
        for locale in glibc_locales_canonical.keys():
            # Only names without an explicit codeset need one appended.
            if '.' not in locale:
                locale_with_codeset = '.'.join([ locale,
                                                 glibc_locales_canonical[locale] ])
                if goal_locale == locale_with_codeset:
                    found = True
                    break
        if not found:
            print("Warning: glibc bug: glibc locale %s is aliased to"
                  " non-canonical glibc locale %s"
                  % (left_side.strip(), right_side.strip()))
glibc_alias_file.close()

if RUNTIME_DEBUG:
    print("Aliased glibc locales: %s" % (glibc_locales_aliased.keys(),))

# vim:set ai et sts=4 sw=4 tw=80:
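For anyone who wants to try the check without the real files, the same comparison can be sketched as a pure function over in-memory text. This is only a rough sketch: the function name find_bad_aliases and the sample data are hypothetical, and the sample lines are illustrative rather than copied from a real system. It assumes both files use whitespace-separated "name value" lines, as the script above does.

```python
def find_bad_aliases(supported_text, alias_text):
    """Return (alias, target) pairs whose target is not a supported locale."""
    # Map each supported locale name to its codeset.
    supported = {}
    for line in supported_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#"):
            supported[fields[0]] = fields[1]

    # Also accept names with the codeset spelled out,
    # e.g. "en_US" with codeset "ISO-8859-1" becomes "en_US.ISO-8859-1".
    qualified = set(supported)
    for name, codeset in supported.items():
        if "." not in name:
            qualified.add(name + "." + codeset)

    bad = []
    for line in alias_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and anything without a target.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        alias, target = fields[0], fields[1]
        if target not in qualified:
            bad.append((alias, target))
    return bad


# Illustrative sample data (hypothetical, not taken from a real system).
SAMPLE_SUPPORTED = """\
en_US ISO-8859-1
en_US.UTF-8 UTF-8
ja_JP.EUC-JP EUC-JP
ko_KR.EUC-KR EUC-KR
"""

SAMPLE_ALIASES = """\
# comment line
american en_US.ISO-8859-1
japanese ja_JP.eucJP
japanese.sjis ja_JP.SJIS
korean ko_KR.EUC-KR
"""

for alias, target in find_bad_aliases(SAMPLE_SUPPORTED, SAMPLE_ALIASES):
    print("%s -> %s is not in the supported list" % (alias, target))
```

With this sample data, only the "japanese" and "japanese.sjis" aliases are reported; "american" passes through the codeset fallback, and "korean" matches directly.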
--- /etc/locale.alias.dpkg-dist 2004-08-11 19:15:44.000000000 -0500
+++ /etc/locale.alias 2004-08-11 19:17:57.000000000 -0500
@@ -49,14 +49,13 @@
 hungarian hu_HU.ISO-8859-2
 icelandic is_IS.ISO-8859-1
 italian it_IT.ISO-8859-1
-japanese ja_JP.eucJP
-japanese.euc ja_JP.eucJP
-ja_JP ja_JP.eucJP
-ja_JP.ujis ja_JP.eucJP
-japanese.sjis ja_JP.SJIS
-korean ko_KR.eucKR
-korean.euc ko_KR.eucKR
-ko_KR ko_KR.eucKR
+japanese ja_JP.EUC-JP
+japanese.euc ja_JP.EUC-JP
+ja_JP ja_JP.EUC-JP
+ja_JP.ujis ja_JP.EUC-JP
+korean ko_KR.EUC-KR
+korean.euc ko_KR.EUC-KR
+ko_KR ko_KR.EUC-KR
 lithuanian lt_LT.ISO-8859-13
 norwegian no_NO.ISO-8859-1
 nynorsk nn_NO.ISO-8859-1
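The effect of the patch can also be described as a small per-line rewrite rule, sketched below. The names CODESET_FIXES, UNSUPPORTED_CODESETS and fix_alias_line are hypothetical; the two spelling fixes and the dropped SJIS alias are the only things taken from the patch itself, and the sketch assumes each alias line has exactly two whitespace-separated fields.

```python
# Codeset spellings rewritten by the patch.
CODESET_FIXES = {"eucJP": "EUC-JP", "eucKR": "EUC-KR"}
# Codesets with no supported locale; aliases pointing at them are dropped.
UNSUPPORTED_CODESETS = {"SJIS"}


def fix_alias_line(line):
    """Return the corrected alias line, or None if the alias is dropped."""
    fields = line.split()
    # Leave comments, blank lines, and unexpected shapes untouched.
    if len(fields) != 2 or fields[0].startswith("#"):
        return line
    target = fields[1]
    locale_name, _, codeset = target.partition(".")
    if codeset in UNSUPPORTED_CODESETS:
        return None
    if codeset not in CODESET_FIXES:
        return line
    # Replace only the target, keeping the original spacing before it.
    start = line.rindex(target)
    fixed = locale_name + "." + CODESET_FIXES[codeset]
    return line[:start] + fixed + line[start + len(target):]


before = [
    "italian it_IT.ISO-8859-1",
    "japanese ja_JP.eucJP",
    "japanese.sjis ja_JP.SJIS",
    "ko_KR ko_KR.eucKR",
]
after = [fixed for fixed in map(fix_alias_line, before) if fixed is not None]
print("\n".join(after))
```

Dropping the SJIS alias rather than remapping it mirrors the choice made in the patch; whether a different target would be preferable is left open.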