Re: Multibyte encoding - what should a package provide?

To: sen_ml@eccosys.com
Cc: debian-devel@lists.debian.org
Subject: Re: Multibyte encoding - what should a package provide?
From: Stephane Bortzmeyer <bortzmeyer@pasteur.fr>
Date: Wed, 08 Sep 1999 09:53:06 +0200
Message-id: <199909080753.JAA05138@ezili.sis.pasteur.fr>
In-reply-to: <19990908112735H.1000@eccosys.com> (sen_ml@eccosys.com's message of Wed, 08 Sep 1999 11:27:35 +0900)

On Wednesday 8 September 1999, at 11 h 27, the keyboard of sen_ml@eccosys.com 
wrote:

>   -the current approach in unicode and iso 10646 is to treat certain
>    characters (appearance - glyphs?) from different languages as the same 
>    character (byte representation - code point?).  
...
>    the most often cited example i hear of is for kanji (roughly, 
>    ideographs) -- some kanji from different locales are treated as 
>    identical. 

There is a classic example that is even simpler to grasp for users of European languages: the decimal dot in English (3.14) is the same glyph (== shape) as the dot that marks the end of a sentence.

In SGML style, they would be two different elements, because they have different semantics (for instance, when producing a French version, you have to replace the decimal dot with a comma, but not the end-of-sentence dot).

In Unicode, they are the same glyph and the same character. The rationale for this decision is explained in the standard (quick summary: it would be extremely long and complicated to list the possible semantics of a glyph).
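
To make this concrete, here is a small Python sketch (not part of any standard tool; the sample sentence, the names DECIMAL_DOT and to_french_decimals, and the digit-dot-digit rule are my own assumptions for illustration). It shows that every dot in a text is the same code point, so a program that wants the French version has to guess the semantics from context:

```python
import re
import unicodedata

# Sample sentence (an assumption for illustration): one decimal dot, two sentence dots.
text = "Pi is about 3.14. It is irrational."

# Record position, code point and Unicode name of every dot in the text.
dots = [
    (i, f"U+{ord(c):04X}", unicodedata.name(c))
    for i, c in enumerate(text)
    if c == "."
]

# Hypothetical heuristic: a dot with a digit on both sides is taken to be a
# decimal dot. The encoding does not say so; this is only a guess from context.
DECIMAL_DOT = re.compile(r"(?<=\d)\.(?=\d)")

def to_french_decimals(s: str) -> str:
    """Replace dots that look like decimal dots with commas."""
    return DECIMAL_DOT.sub(",", s)

french = to_french_decimals(text)

for entry in dots:
    print(entry)   # every entry reports U+002E FULL STOP
print(french)
```

The heuristic is deliberately naive: it would also rewrite version numbers or IP addresses such as 10.0.0.1, which is exactly the kind of ambiguity that markup carrying the semantics would avoid.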
