Bug#1014029: marked as done (invisible malicious unicode in source code - detection and prevention)



Your message dated Thu, 30 Jun 2022 10:22:58 +0200
with message-id <20220630082257.dfe35x2cghs7ay2f@gpm.stappers.nl>
and subject line Too broad
has caused the Debian Bug report #1014029,
regarding invisible malicious unicode in source code - detection and prevention
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)


-- 
1014029: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1014029
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
Package: general
Severity: normal

Quoting https://trojansource.codes:

> Some Vulnerabilities are Invisible

> Rather than inserting logical bugs, adversaries can attack the encoding of source code files to inject vulnerabilities.
> 

> These adversarial encodings produce no visual artifacts.

> The trick is to use Unicode control characters to reorder tokens in
> source code at the encoding level.
> These visually reordered tokens can be used to display logic that,
> while semantically correct, diverges from the logic presented by the
> logical ordering of source code tokens.
> Compilers and interpreters adhere to the logical ordering of source
> code, not the visual order.

> The attack is to use control characters embedded in comments and strings to reorder source code characters in a way that changes its logic.

> Adversaries can leverage this deception to commit vulnerabilities into code that will not be seen by human reviewers.

> This attack is particularly powerful within the context of software supply chains.
> If an adversary successfully commits targeted vulnerabilities into open source code by deceiving human reviewers, downstream software will likely inherit the vulnerability.
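A quick way to see such hidden characters is to inspect the raw bytes instead of the rendered text. The bash sketch below assumes GNU coreutils; `make_sample` and `show_hidden_bytes` are hypothetical helper names. It writes a C-style line whose comment contains U+202E RIGHT-TO-LEFT OVERRIDE (UTF-8 bytes E2 80 AE), spelled with octal escapes so the script itself stays pure ASCII, and pipes it through `cat -v`, which prints bytes at or above 0x80 in visible `M-` notation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only.

# Emit one line whose comment contains U+202E (RLO), written as octal
# escapes so that this script itself contains only ASCII bytes.
make_sample() {
    printf 'access = 0; /* \342\200\256 hidden */\n'
}

# Show input with non-printing and non-ASCII bytes made visible:
# cat -v prints each byte >= 0x80 as "M-" plus a printable form of
# its low 7 bits (control values in ^X notation).
show_hidden_bytes() {
    LC_ALL=C cat -v -- "$@"
}
```

Running `make_sample | show_hidden_bytes` should print the override as `M-bM-^@M-.` in place of a character that a terminal or editor may not display at all.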

> The defense

> - Compilers, interpreters, and build pipelines supporting Unicode
>   should throw errors or warnings for unterminated bidirectional control
>   characters in comments or string literals, and for identifiers with
>   mixed-script confusable characters.

> - Language specifications should formally disallow unterminated
>   bidirectional control characters in comments and string literals.

> - Code editors and repository frontends should make bidirectional
>   control characters and mixed-script confusable characters perceptible
>   with visual symbols or warnings.
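The first recommendation, flagging unterminated bidirectional controls, can be roughly approximated outside the compiler. The sketch below is a hypothetical bash function, `find_unterminated_bidi`, built on awk. For each line it counts the embedding/override openers (U+202A, U+202B, U+202D, U+202E) against POP DIRECTIONAL FORMATTING (U+202C), and the isolate openers (U+2066 to U+2068) against POP DIRECTIONAL ISOLATE (U+2069), by matching their UTF-8 byte sequences in the C locale. It compares counts per line only, ignores nesting order and comment/string boundaries, and is therefore a heuristic rather than an implementation of the Unicode Bidirectional Algorithm.

```shell
#!/usr/bin/env bash
# Hypothetical check: report lines where bidi openers outnumber closers.
# Matches raw UTF-8 bytes under LC_ALL=C, so awk needs no Unicode support.
# Exit status: 1 if any such line was found, 0 otherwise.
find_unterminated_bidi() {
    LC_ALL=C awk '
    BEGIN {
        emb = "\342\200[\252\253\255\256]"  # LRE, RLE, LRO, RLO
        pdf = "\342\200\254"                 # PDF closes embeddings/overrides
        iso = "\342\201[\246\247\250]"       # LRI, RLI, FSI
        pdi = "\342\201\251"                 # PDI closes isolates
        found = 0
    }
    {
        line = $0
        # gsub returns the number of substitutions, i.e. the match count.
        opened_emb = gsub(emb, "", line)
        closed_emb = gsub(pdf, "", line)
        opened_iso = gsub(iso, "", line)
        closed_iso = gsub(pdi, "", line)
        if (opened_emb > closed_emb || opened_iso > closed_iso) {
            print FILENAME ":" FNR ": unterminated bidi control"
            found = 1
        }
    }
    END { exit found }
    ' "$@"
}
```

For example, `find_unterminated_bidi src/*.c` prints `file:line:` for each suspicious line and returns a non-zero status, which a build script can act on. Matching bytes rather than characters avoids depending on the awk implementation's Unicode handling.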

Additional ideas for protecting against this:

- **check for existing compromises:** scan all existing source code for
Unicode (non-ASCII) characters.

- **educate existing and future source code reviewers:** add a source
code reviewer policy that existing and future reviewers must
acknowledge, confirming that they understand the issue.

- **remove as much Unicode from source code as possible:** the less
Unicode source code contains, the simpler automated audits for malicious
Unicode become. Where a non-ASCII character is considered essential, it
should, if possible, be written in an ASCII escaped form, for example
`&reg;` instead of `®` in HTML.

- **local check by reviewers:** document tools that source code reviewers
could or should use to scan future contributions for malicious Unicode.

- **lintian check:** a lintian test that warns when Unicode is included
in the source code.

- **build scripts / CI scripts:** these should check all files for
Unicode, except those on an opt-in list of files expected to contain it.
If Unicode appears in any file not on that list, the build should fail.

- **scan upstream projects' source code:** check whether it has been
compromised by malicious Unicode.

- **notify upstream projects:** they might not be aware of this issue
and might already be compromised by malicious Unicode.
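The build/CI idea could be sketched in bash roughly as follows. The function name `check_unexpected_unicode` and the allowlist format (one path per line, written exactly as grep prints it when run from the tree root, e.g. `./docs/README.txt`) are assumptions for illustration, and GNU grep with `-P` support is assumed, as in the grep example in this report.

```shell
#!/usr/bin/env bash
# Hypothetical CI check: fail if any file outside an allowlist contains
# non-ASCII bytes. Usage: check_unexpected_unicode <tree> <allowlist>
# The allowlist holds one path per line, exactly as grep prints it when
# started from the tree root (e.g. ./docs/README.txt).
check_unexpected_unicode() {
    local tree=$1 allowlist=$2 unexpected
    if [[ ! -d $tree || ! -r $allowlist ]]; then
        echo "bad tree or unreadable allowlist: $tree $allowlist" >&2
        return 2
    fi

    # List files containing bytes outside 0x00-0x7F, then drop every path
    # that appears verbatim (-x -F) in the allowlist.
    # "|| true" keeps an empty result from aborting under "set -e".
    unexpected=$( (cd "$tree" &&
        LC_ALL=C grep -r -l -P --binary-files=without-match \
            --exclude-dir=.git '[^\x00-\x7F]' .) |
        grep -v -x -F -f "$allowlist" || true)

    if [[ -n $unexpected ]]; then
        echo "unexpected non-ASCII content in:" >&2
        printf '%s\n' "$unexpected" >&2
        return 1
    fi
    return 0
}
```

A CI job might run `check_unexpected_unicode . debian/unicode-allowlist` and fail on a non-zero status; that allowlist path is likewise hypothetical. Files that grep classifies as binary are skipped, just as with `--binary-files=without-match` in the example commands.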

Example of how to check:

grep_args=(--exclude=changelog.upstream --exclude-dir=.git
    --binary-files=without-match --recursive --color=auto -P -n)

# Either command lists every line containing a byte outside 7-bit ASCII.
LC_ALL=C grep "${grep_args[@]}" '[^\x00-\x7F]' .

LC_ALL=C grep "${grep_args[@]}" '[^[:ascii:]]' .

A few additional, independent tools might be desirable in case grep can
ever be tricked into missing something.
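As one grep-independent cross-check, non-ASCII bytes can be counted with `tr` and `wc`. The bash sketch below (the function name `count_non_ascii` is hypothetical) walks the tree with `find`, skips `.git`, and prints `<count> <path>` for every regular file containing at least one byte outside 7-bit ASCII. Unlike the grep commands above, it does not skip files that grep classifies as binary, so images and other binary files will be listed too.

```shell
#!/usr/bin/env bash
# Hypothetical cross-check that does not rely on grep: for every regular
# file under <tree> (skipping .git), count bytes >= 0x80 and report files
# with a non-zero count as "<count> <path>". Returns 1 if any were found.
count_non_ascii() {
    local tree=${1:-.} path n found=0
    while IFS= read -r -d '' path; do
        # tr -d deletes every byte in the ASCII range (octal 000-177),
        # so only non-ASCII bytes reach wc -c.
        n=$(LC_ALL=C tr -d '\000-\177' < "$path" | wc -c)
        if (( n > 0 )); then
            printf '%s %s\n' "$((n))" "$path"
            found=1
        fi
    done < <(find "$tree" -name .git -prune -o -type f -print0)
    return "$found"
}
```

Because it shares no code with grep, any disagreement between its output and grep's is itself worth investigating.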

--- End Message ---
--- Begin Message ---
Hi,

This email closes this far too broad bug report.
Closing it will prevent a further drain on human energy.

To those who think "but it is important", I recommend
taking smaller steps going forward.

Regards
Geert Stappers
DD
--
Silence is hard to parse



--- End Message ---
