
Re: Multibyte encoding - what should a package provide?



On Wednesday 8 September 1999, at 11 h 27, the keyboard of sen_ml@eccosys.com 
wrote:

>   -the current approach in unicode and iso 10646 is to treat certain
>    characters (appearance - glyphs?) from different languages as the same 
>    character (byte representation - code point?).
...
>    the most often cited example i hear of is for kanji (roughly, 
>    ideographs) -- some kanji from different locales are treated as 
>    identical. 

There is a classic example that is even simpler to grasp for users of European languages: the decimal dot in English (3.14) is the same glyph (== shape) as the dot that marks the end of a sentence.

In SGML style, they would be two different elements, because they have different semantics (for instance, when producing a French version, you have to replace the decimal dot with a comma, but not the end-of-sentence dot).

In Unicode, they are the same glyph and the same character (U+002E FULL STOP). The rationale for this decision is explained in the standard (quick summary: listing every possible semantics of a glyph would be extremely long and complicated).
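A small Python sketch of the consequence. The function names `naive_french` and `digit_aware_french` are hypothetical, and the "dot between two digits" rule is only a heuristic I am assuming for illustration; nothing in Unicode itself tells you which dot is a decimal separator.

```python
import re
import unicodedata

number = "3.14"
sentence = "It is pi."

decimal_dot = number[1]   # the dot inside the number
final_dot = sentence[-1]  # the dot ending the sentence

# Both are the same code point, U+002E FULL STOP.
print(hex(ord(decimal_dot)), unicodedata.name(decimal_dot))
print(decimal_dot == final_dot)


def naive_french(text: str) -> str:
    """Hypothetical localizer: swaps every '.' for ',', because the
    character alone cannot say which dots are decimal separators."""
    return text.replace(".", ",")


def digit_aware_french(text: str) -> str:
    """Hypothetical localizer using an assumed heuristic: treat '.' as a
    decimal separator only when it sits between two digits. The semantics
    come from context, not from the character."""
    return re.sub(r"(?<=\d)\.(?=\d)", ",", text)


print(naive_french("It is 3.14."))        # also mangles the final dot
print(digit_aware_french("It is 3.14."))  # keeps the sentence-ending dot
```

The SGML approach would avoid the guesswork by tagging the decimal dot explicitly; with plain Unicode text, some such contextual rule (or markup) has to supply the missing semantics.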


