
Re: Using .XCompose



On Sat, Jul 11, 2020 at 08:32:34PM +0000, davidson wrote:
> '!' marks the spot of nonbreaking spaces that made it into OP's first
> report of odd behavior, upon testing the white scissors XCompose rule:
> 
>   $ grep "WHITE SCISSORS"  d-u_xcompose_2020-07-08.nbsp | tr $'\xc2\xa0' \!
>   <Multi_key> <s> <x>!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! : "✄"!!!! U2704 # WHITE SCISSORS

Note that tr does not handle multi-character sequences.  If you run
something like

tr abc xyz

it does *not* look for "abc" sequences and convert only those sequences.
Rather, it looks at single characters.  It converts 'a' to 'x', 'b'
to 'y', and 'c' to 'z'.
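
A quick way to see this for yourself (just a sketch; the sample input
is made up for illustration):

```shell
# tr maps characters position by position: a->x, b->y, c->z.
# It never treats "abc" as a unit, so every a, b and c is translated
# no matter what order they appear in.
out=$(printf 'cab abc bca\n' | tr abc xyz)
printf '%s\n' "$out"
```

All three words come out changed, not only the one that spells "abc".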

The number of characters in the first set is supposed to match the
number of characters in the second set, so that there is a 1:1
mapping.

GNU tr also does not handle multi-byte *characters* correctly (which
violates POSIX -- it's a known bug).

So, your tr command actually converts every c2 byte into ! and every a0
byte into ! as well, not *just* c2 a0 pairs.
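
Here is a small sketch of that byte-wise behavior.  The sample input is
made up: a copyright sign (c2 a9) and an a-grave (c3 a0), neither of
which is an NBSP.  I'm setting LC_ALL=C so that any tr works on bytes
and the result is the same everywhere; GNU tr acts this way even in a
UTF-8 locale.

```shell
# Input bytes: c2 a9 (copyright sign), 20 (space), c3 a0 (a-grave), 0a.
# There is no NBSP (c2 a0) anywhere in it, yet tr still rewrites the
# lone c2 byte and the lone a0 byte, leaving a9 and c3 behind.
out=$(printf '\302\251 \303\240\n' \
      | LC_ALL=C tr '\302\240' '!!' \
      | od -An -tx1 | tr -d ' \n')
printf '%s\n' "$out"
```

In the hex dump, 21 is the '!' that replaced each of those bytes, and
the orphaned a9 and c3 are no longer valid UTF-8 on their own.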

Nevertheless, this is useful as a first pass approximation to say that,
hey, there *might* be a bunch of NBSPs here, and you should take a
closer look.
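
For the closer look, one option is a tool that matches the two bytes
as a sequence, such as sed.  This is only a sketch, and the variable
names and sample line are made up for illustration:

```shell
# The UTF-8 encoding of U+00A0 (NBSP) is the byte pair c2 a0.
nbsp=$(printf '\302\240')

# Sample line: a real NBSP, then a copyright sign (c2 a9) and an
# a-grave (c3 a0), which only share one byte each with NBSP.
line=$(printf 'a\302\240b \302\251 \303\240')

# sed replaces the c2 a0 sequence as a whole and leaves the others alone.
marked=$(printf '%s\n' "$line" | sed "s/$nbsp/!/g")
printf '%s\n' "$marked"
```

Only the real NBSP becomes '!'; the other two characters pass through
untouched.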

NBSPs most often result when someone gets lazy and pastes a line from
a web page or from a Microsoft Word/Excel document into a Unix
terminal or X11 application, instead of pasting just the characters
they actually want.  Web pages, especially *older* web pages, often
use NBSPs for primitive formatting.

