Bug#802721: lintian: mojibake in HTML documentation
tags -1 patch
--
The only non-ASCII UTF-8 sequence that appears in these files is the
sequence for the section sign, used when referring to a section in the
Policy Manual. It appears repeatedly in a few of those files. This is
Unicode code point U+00A7, which in UTF-8 becomes octal 302 247. That
two-byte binary sequence could just be converted in sed to the HTML
name (HTML "entity") for that character: "&sect;". The resulting file
will be plain ASCII.
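
For reference, the byte values can be checked by hand: U+00A7 needs the
two-byte UTF-8 form 110xxxxx 10xxxxxx, and its bits 00010 100111 fill
that pattern to give 11000010 10100111, i.e. hex C2 A7 or octal 302 247.
A minimal sketch to confirm this in a shell (the function name is made
up for illustration):

```shell
# Emit the two bytes by their octal values and dump them back in octal.
# section_sign_octal is a hypothetical name used only for this example.
section_sign_octal() {
    printf '\302\247' | od -An -to1 | xargs
}

section_sign_octal
```

This should print the two octal values separated by a space.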
Adding something like the following in a post-install script should
fix this issue:
find $(DESTDIR)/usr/share/doc/lintian/lintian.html/*.html -type f -exec \
sed -i 's/\o302\o247/\&sect;/g' {} \;
The "-type f" will not be necessary if it is certain that the *.html
files will always be ordinary files, not directories. I have tried
the above without the "-type f" on the command line in bash and it
worked fine.
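
For anyone who wants to try the substitution outside the package build,
here is a minimal sketch against a throwaway directory. The helper name,
the sample file and its contents are made up for illustration. It
assumes GNU sed (for the \oNNN escapes and -i) and GNU find (for
-maxdepth); the LC_ALL=C setting is my own addition to force byte-wise
matching and is not part of the command above.

```shell
# Hypothetical helper: replace the UTF-8 section sign with &sect; in every
# regular *.html file directly under the directory given as $1.
# Assumes GNU sed and GNU find; LC_ALL=C forces byte-wise matching.
convert_section_signs() {
    LC_ALL=C find "$1" -maxdepth 1 -name '*.html' -type f -exec \
        sed -i 's/\o302\o247/\&sect;/g' {} \;
}

# Build a sample file containing the two-byte sequence, then convert it.
demo_dir=$(mktemp -d)
printf 'See Policy Manual \302\247 10.1\n' > "$demo_dir/sample.html"
convert_section_signs "$demo_dir"
cat "$demo_dir/sample.html"

# Report whether any byte outside printable ASCII and whitespace is left.
if LC_ALL=C grep -q '[^[:print:][:space:]]' "$demo_dir/sample.html"; then
    echo "non-ASCII bytes remain"
else
    echo "plain ASCII"
fi
rm -rf "$demo_dir"
```

The backslash before the ampersand matters: in a sed replacement a bare
"&" stands for the whole matched text, so it has to be escaped to get a
literal "&sect;".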
Paul Hardy