
Re: Filtering out some untranslatable strings



Frans Spiesschaert wrote on Sun 09-08-2020 at 13:00 [+0200]:
> Hi everyone and Wolfgang in particular ;)
> 
> I prepared a code snippet that is able to remove some untranslatable
> strings from the manuals.
> I tested it locally (as a separate script) on the Debian Edu bullseye
> manual, but more rigorous testing is probably advisable before taking it
> into production.
> I suppose the snippet could be appended to scripts/get_manual, although I
> could not test this here.
> 
> ---- begin code snippet -----
> 
> # create $name-stripped.xml,
> # which has some non-translatable strings removed
> # ---remove untranslatable image names--- #
> echo "removing image names"
> sed -e 's#<imagedata.*</imageobject>#</imageobject>#g' $xmlfile > $name-stripped.xml
> # ---remove paragraphs that just have a <ulink> and no other text--- #
> echo "removing link paragraphs"
>     #---# first copy those paragraphs to a tempfile #---#
>     TMPFILE3=$(mktemp)
>     cat $xmlfile | sed -n '/^<para><ulink/p' | sed -n '/> *$/p'  > $TMPFILE3
>     #---# then replace those links with an empty string #---#
>     #---# and keep only the <para> tag to prevent xml from being broken #---#
>     while read line ;
>         do sed -i "s#$line#<para>#" $name-stripped.xml
>         done < $TMPFILE3
> # ---remove FIXME: paragraphs--- #
> # ---(currently that colon is missing in some FIXME paragraphs)--- #
> echo "removing FIXME: paragraphs"
> sed -i '/^FIXME\:/d' $name-stripped.xml
                 ^
should be: sed -i '/^FIXME:/d' $name-stripped.xml
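
A minimal sketch of a variant of the snippet, untested against the real manual. It folds the three steps into a single sed call and replaces matching link paragraphs by address instead of looping over a tempfile, so that characters in URLs (such as '.', '*' or '#') are not interpreted as part of a regular expression or as the s### delimiter. The function name strip_manual and its two arguments are hypothetical and not part of scripts/get_manual.

```shell
#!/bin/sh
# Hypothetical helper: strip_manual INPUT.xml OUTPUT.xml
# Writes a copy of INPUT.xml with some untranslatable strings removed.
strip_manual() {
    in=$1
    out=$2
    # 1st expression: drop image names, keep the closing </imageobject>.
    # 2nd expression: a line that starts with <para><ulink and ends with '>'
    #                 is replaced by a bare <para>, keeping the XML balanced.
    # 3rd expression: delete lines that start with "FIXME:".
    sed -e 's#<imagedata.*</imageobject>#</imageobject>#g' \
        -e '/^<para><ulink.*> *$/s#.*#<para>#' \
        -e '/^FIXME:/d' \
        "$in" > "$out"
}
```

In get_manual this could be called as: strip_manual "$xmlfile" "$name-stripped.xml" (assuming those two variables are set as in the snippet above).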

> 
> ---- end code snippet -----
> 
> For this to be useful, po4a.cfg also needs a small addition.
> It should look like this (with the added pot_in line):
> [po_directory] .
> 
> [type: docbook] debian-edu-bullseye-manual.xml \
> pot_in:debian-edu-bullseye-manual-stripped.xml \
> $lang:$lang.xml \
> add_$lang:?./$lang.add \
> opt:"-o nodefault='<inlinemediaobject> <imagedata>' -o untranslated='<listitem> <inlinemediaobject> <imagedata>' -M UTF-8 -k 15"
> 
> With this enabled, the Debian Edu bullseye manual has 1154 translatable
> strings instead of the current 1210.
> More untranslatable strings could be removed with some adaptations to the
> wiki (e.g. rewording paragraphs that contain a link, so that those links
> become separate paragraphs in a meaningful way).
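
Counts like the ones quoted above could be checked with a rough sketch such as the following. It assumes the POT file starts with the usual gettext header entry (an empty msgid), which is subtracted from the total. The function name count_pot_strings is hypothetical, and other tools may count plural forms or obsolete entries differently.

```shell
#!/bin/sh
# Hypothetical helper: count_pot_strings FILE.pot
# Prints the number of msgid entries, not counting the header entry.
count_pot_strings() {
    # Every entry has exactly one line starting with 'msgid "';
    # 'msgid_plural' lines do not match this pattern.
    total=$(grep -c '^msgid "' "$1")
    # Assumption: the first entry is the gettext header.
    echo $((total - 1))
}
```

Running it once on the POT generated from the full manual and once on the POT generated from the stripped file should show the difference.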
> 
> -- 
> Kind regards,
> Frans Spiesschaert
> 


