
Bug#600310: wrong collation(?) order for et_EE.UTF-8 causes regexps to fail matching randomly



OK, after discussing this on #debian-devel and some more thinking
(even though it's 02:23 here already), I now see that the problem
isn't actually in the locales package, and that it should affect
other locales too.

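The problem is that people are used to writing [a-z] to mean all
26 Latin letters, but a range like that follows the collation
order of the current locale, and various locales order the
letters differently. In the Estonian alphabet, for example, z
sorts before t, so t through y fall outside [a-z]:
http://en.wikipedia.org/wiki/Estonian_alphabet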

For now, there are two important problem cases: cron and
run-parts.  Both use [a-z]-like regexps to filter out
"invalid" filenames.
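
To illustrate one way such a filter could avoid the collation
problem, here is a minimal sketch that lists the allowed characters
explicitly instead of using a range. The helper name
name_is_ascii_safe() and the exact allowed set (ASCII letters,
digits, '_' and '-') are my assumptions for illustration; this is
not the actual code of cron or run-parts.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from cron or run-parts): return 1 if
 * `name` is non-empty and made up only of ASCII letters, digits,
 * '_' and '-', else 0.  The allowed characters are spelled out
 * one by one, so the result does not depend on LC_COLLATE.
 */
int name_is_ascii_safe(const char *name)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789_-";

    if (name == NULL || *name == '\0')
        return 0;

    /* strspn() compares bytes; it ignores the collation order. */
    return name[strspn(name, allowed)] == '\0';
}
```

With this, a name like "logrotate" is accepted under any locale,
while "apt.bak" is rejected because of the dot.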

In cron, the locale gets set in this context:

+       /* Get the default locale character set for the mail
+        * "Content-Type: ...; charset=" header
+        */
+       setlocale(LC_ALL,""); /* set locale to system defaults or to
+                                that specified by any  LC_* env vars */
+       /* Except that "US-ASCII" is preferred to "ANSI_x3.4-1968" in MIME,
+        * even though "ANSI_x3.4-1968" is the official charset name. */
+       if ( ( cs = nl_langinfo( CODESET ) ) != 0L &&
+               strcmp(cs, "ANSI_x3.4-1968") != 0 )
+           strncpy( cron_default_mail_charset, cs, MAX_ENVSTR );
+       else
+           strcpy( cron_default_mail_charset, "US-ASCII" );
+

so it's basically used only to get the "proper" charset name for
email notifications, but setlocale(LC_ALL, "") sets not only the
charset but also the collation sequence and other things.
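
One possible way around this, sketched below, is to set only the
LC_CTYPE category, which is enough for nl_langinfo(CODESET), and to
leave LC_COLLATE at its default "C". The function name
get_mail_charset() and its buffer handling are hypothetical, and
the case-insensitive comparison is my assumption (to cover both
spellings of the ANSI_X3.4-1968 name); this is not a patch taken
from cron.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper: store the charset name for the mail header
 * in `buf` (at most len - 1 characters plus the terminating NUL).
 * Only LC_CTYPE is taken from the environment, so LC_COLLATE stays
 * "C" and ranges like [a-z] keep their ASCII meaning.
 */
void get_mail_charset(char *buf, size_t len)
{
    const char *cs;

    if (buf == NULL || len == 0)
        return;

    /* Character set only; collation and the rest are untouched. */
    setlocale(LC_CTYPE, "");

    cs = nl_langinfo(CODESET);
    if (cs == NULL || *cs == '\0' ||
        strcasecmp(cs, "ANSI_X3.4-1968") == 0)
        cs = "US-ASCII";   /* preferred name in MIME */

    strncpy(buf, cs, len - 1);
    buf[len - 1] = '\0';   /* strncpy() may not terminate */
}
```

After calling it, setlocale(LC_COLLATE, NULL) should still report
"C" as long as nothing else in the program has changed it.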

Note that both the cron and run-parts cases are release-critical
too and should be fixed -- it's just that this is no longer a bug
in locales.

And it's too late already for me to think more ;)

Thanks!

/mjt


