Bug#600310: wrong collation(?) order for et_EE.UTF-8 causes regexps to fail matching randomly

To: 600310@bugs.debian.org
Cc: Debian Bug Tracking System <owner@bugs.debian.org>
Subject: Bug#600310: wrong collation(?) order for et_EE.UTF-8 causes regexps to fail matching randomly
From: Michael Tokarev <mjt@tls.msk.ru>
Date: Sat, 16 Oct 2010 02:29:36 +0400
Message-id: <4CB8D5D0.1000304@msgid.tls.msk.ru>
Reply-to: Michael Tokarev <mjt@tls.msk.ru>, 600310@bugs.debian.org
In-reply-to: <handler.600310.B.128717926920032.ack@bugs.debian.org>
References: <20101015214742.17728.79430.reportbug@gandalf.local> <handler.600310.B.128717926920032.ack@bugs.debian.org>

OK, after discussing this on #debian-devel and some more thinking
(even though it's already 02:23 here), I now see that the problem
isn't actually in the locales package, and that it should affect
other locales too.

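The problem is that people are used to writing [a-z] to mean all 26
Latin letters, while various locales collate those letters in a
different order, as in this Estonian case (z sorts before t there,
so a range ending at z leaves out t through y):
http://en.wikipedia.org/wiki/Estonian_alphabet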

For now, there are two important problem cases: cron and
run-parts.  Both use [a-z]-style regexps to filter out
"invalid" filenames.
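
As an illustration of one locale-independent alternative (a sketch
only, not the actual cron or run-parts code): spell the allowed
characters out explicitly instead of relying on a range.  The function
name is_valid_name and the exact character set (ASCII letters, digits,
underscore, hyphen) are assumptions made for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical allow-list: every accepted byte is written out, so the
 * result does not depend on the collation order of the current locale. */
static const char allowed_chars[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789_-";

/* Returns 1 if name is non-empty and consists only of allowed bytes,
 * 0 otherwise.  strspn() compares bytes and ignores LC_COLLATE. */
int is_valid_name(const char *name)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    return len > 0 && strspn(name, allowed_chars) == len;
}
```

With this check a name such as "yearly-task" is accepted whatever
LC_COLLATE says, while "backup.old" is still rejected because of the
dot.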

In cron this comes with this context:

+       /* Get the default locale character set for the mail
+        * "Content-Type: ...; charset=" header
+        */
+       setlocale(LC_ALL,""); /* set locale to system defaults or to
+                                that specified by any  LC_* env vars */
+       /* Except that "US-ASCII" is preferred to "ANSI_x3.4-1968" in MIME,
+        * even though "ANSI_x3.4-1968" is the official charset name. */
+       if ( ( cs = nl_langinfo( CODESET ) ) != 0L &&
+               strcmp(cs, "ANSI_x3.4-1968") != 0 )
+           strncpy( cron_default_mail_charset, cs, MAX_ENVSTR );
+       else
+           strcpy( cron_default_mail_charset, "US-ASCII" );
+

So setlocale() is basically called only to get the "proper" charset
name for email notifications, but setlocale(LC_ALL, "") brings in not
only the charset name but also the collation sequence and other things.
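
One possible way to get the charset name without pulling in the
collation order is to set only the LC_CTYPE category.  This is a hedged
sketch, not the fix adopted by cron: the names mail_charset and
collation_is_c are hypothetical, and checking both spellings of the
ANSI charset name is an assumption made for this example.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the MIME charset name into buf.
 * Only LC_CTYPE is taken from the environment, so LC_COLLATE (and
 * with it the meaning of ranges like [a-z]) is left untouched. */
void mail_charset(char *buf, size_t len)
{
    const char *cs;

    assert(buf != NULL && len > 0);
    setlocale(LC_CTYPE, "");      /* charset only, no collation */
    cs = nl_langinfo(CODESET);

    /* Prefer "US-ASCII" over the official ANSI name, as in the cron
     * excerpt; both spellings are checked (assumption). */
    if (cs != NULL && cs[0] != '\0' &&
        strcmp(cs, "ANSI_X3.4-1968") != 0 &&
        strcmp(cs, "ANSI_x3.4-1968") != 0)
        snprintf(buf, len, "%s", cs);
    else
        snprintf(buf, len, "US-ASCII");
}

/* Returns 1 if the collation category is still the "C" locale. */
int collation_is_c(void)
{
    const char *cur = setlocale(LC_COLLATE, NULL);

    return cur != NULL && strcmp(cur, "C") == 0;
}
```

A program that starts in the default "C" locale and calls
mail_charset() should still see collation_is_c() return 1 afterwards.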

Note that both the cron and run-parts cases are release-critical too
and should be fixed; it's just that this is no longer a bug in
locales.

And it's too late already for me to think more ;)

Thanks!

/mjt
