[VDZ22] Trojan Source: Bad Characters Are Coming for Your Code by Nicholas Boucher


Your source code may be lying to you.

New research has uncovered a technique for creating vulnerabilities that are invisible to developers. By attacking the encoding of text in source code files, adversaries can craft code that shows different logic to compilers than to human reviewers. This poisoned code, which persists through copy and paste from internet resources, raises a new risk of particularly insidious supply chain attacks.
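To make the idea concrete, here is a minimal sketch that prints a line of text in the order a compiler actually reads it. The abstract does not name the encoding trick; this sketch assumes the Unicode bidirectional control characters described in the published Trojan Source work. The helper `reveal` and the sample string are hypothetical, for illustration only.

```python
import unicodedata

# Assumption: the Unicode bidirectional embedding, override, and isolate
# controls (U+202A..U+202E, U+2066..U+2069). A bidi-aware editor does not
# draw these characters, but they can change the order in which the
# surrounding text is displayed.
BIDI_CONTROLS = frozenset("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def reveal(text: str) -> str:
    """Return text in logical (stored) order with bidi controls spelled out."""
    return "".join(
        f"<{unicodedata.name(ch)}>" if ch in BIDI_CONTROLS else ch
        for ch in text
    )


# Hypothetical sample: a string literal that carries a hidden override.
sample = "level = 'user\u202e'"
print(reveal(sample))
```

Because `reveal` never reorders anything, its output is the sequence of code points a compiler or interpreter would process, regardless of how an editor chooses to draw the line.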

Due to the near-universal dependency on encoded text across all subfields of computer science, these evil encodings can also be used to attack a wide range of targets. Beyond source code, production systems implementing tasks such as toxic content identification and machine translation can be reduced to near-zero performance with these techniques.
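As a toy illustration of why text-processing systems are exposed, the sketch below shows an invisible character defeating a keyword filter. Everything here is an assumption: the abstract does not say which perturbations were used, and the blocklist, `naive_filter`, and `strip_format_chars` are hypothetical stand-ins, not the production systems the talk refers to.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE: not drawn on screen, category Cf

# Hypothetical one-word blocklist standing in for a real classifier.
BLOCKLIST = {"insult"}


def naive_filter(text: str) -> bool:
    """Flag text if any whitespace-separated token is on the blocklist."""
    return any(token in BLOCKLIST for token in text.lower().split())


def strip_format_chars(text: str) -> str:
    """Drop characters in Unicode general category Cf (format controls)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


clean = "that is an insult"
perturbed = "that is an in" + ZWSP + "sult"  # looks the same when rendered

print(naive_filter(clean))                          # flagged
print(naive_filter(perturbed))                      # slips through
print(naive_filter(strip_format_chars(perturbed)))  # flagged again
```

Stripping every format character is a blunt choice: category Cf also contains characters that some scripts and emoji sequences legitimately rely on, so a real pipeline would likely need a narrower rule.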

Starting from a set of initial attacks against machine learning systems deployed by some of the world's largest tech companies, this talk will describe a new family of encoding-based attacks that build to the result that source code can no longer be trusted to do what it says. The talk will include a series of practical defenses that can be used by all developers to mitigate their exposure to this threat vector.
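The abstract does not list the defenses themselves. One plausible example, sketched here under that caveat, is a scanner that reports bidirectional control characters in source text so a reviewer or CI job can reject them. The function name `find_bidi_controls` and the sample input are hypothetical.

```python
import unicodedata

# Assumption: flag the Unicode bidirectional embedding, override, and
# isolate controls (U+202A..U+202E, U+2066..U+2069).
BIDI_CONTROLS = frozenset("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def find_bidi_controls(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character name) for each bidi control found.

    Lines and columns are 1-based; columns count code points, not bytes.
    """
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((line_no, col, unicodedata.name(ch)))
    return findings


# Hypothetical input: the second line hides a RIGHT-TO-LEFT ISOLATE.
sample = "ok = True\nif x:\u2067 # y\n"
for line_no, col, name in find_bidi_controls(sample):
    print(f"line {line_no}, column {col}: {name}")
```

A check like this could run as a pre-commit hook or CI step. It only covers the bidirectional controls, so lookalike letters and other invisible characters would need separate rules.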