Adversarial Machine Learning: from Models to Practice
Machine learning (ML) techniques are increasingly used in a broad array of high-stakes applications, including cybersecurity and autonomous driving. However, ML models are often susceptible to adversarial example attacks, in which an adversary changes the input in order to cause misclassification; for example, an adversary may modify malware so that it bypasses ML-based malware detectors. A conventional approach to evaluating ML robustness to such attacks, and to designing robust ML, is to consider simplified feature-space models of attacks, in which the attacker changes ML features directly to effect misclassification while minimizing or constraining the magnitude of the change; we refer to methods for making ML robust against such attacks simply as “Robust ML”. We investigate the effectiveness of “Robust ML” in the face of realizable attacks. What constitutes a realizable attack varies by application. We consider two settings: PDF malware detection and image classification. In malware detection, a realizable attack involves modifying actual malware while preserving its malicious functionality. In image classification, realizable attacks (also known as physical attacks) are those that can be implemented in physical space without raising undue suspicion.
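As a rough illustration of the feature-space attack model described above, the sketch below perturbs a feature vector directly, under an L-infinity budget, to evade a linear detector. The weights, bias, feature values, and the name `linf_evasion` are all hypothetical and chosen for illustration; they are not taken from the talk. Note that nothing here checks whether the perturbed features correspond to a working piece of malware, which is exactly the gap between feature-space and realizable attacks.

```python
import numpy as np

def score(x, w, b):
    """Linear detector score; a positive score means 'malicious'."""
    return float(np.dot(w, x) + b)

def linf_evasion(x, w, eps):
    """Feature-space attack on a linear score under an L-infinity budget.

    For a linear model, the perturbation that lowers the score the most
    while changing each feature by at most eps is -eps * sign(w).
    """
    return x - eps * np.sign(w)

# Hypothetical detector and sample (assumptions for illustration only).
w = np.array([2.0, -1.0, 0.5])
b = -0.5
x = np.array([0.6, 0.2, 0.4])
eps = 0.3

x_adv = linf_evasion(x, w, eps)
print("original score:   ", score(x, w, b))      # positive: flagged
print("adversarial score:", score(x_adv, w, b))  # negative: evades
```

The score drops by exactly `eps` times the L1 norm of `w`, which is why a small per-feature budget can be enough to flip the decision of a linear model.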
The first part of the talk will focus on PDF malware detection. First, we demonstrate that Robust ML is surprisingly effective in defending against realizable attacks on PDF malware detection. However, we observe two issues: 1) the resulting classifier has a high false positive rate, and 2) robustness is more limited in structure-based detection. Next, we show that augmenting the feature-space models with conserved features (those that cannot be unilaterally modified without compromising malicious functionality) significantly improves performance. Finally, we show that feature-space models can enable generalized robustness when faced with multiple realizable attacks, as compared to classifiers that are tuned to be robust to a specific realizable attack.
The second part of the talk concerns image classification. First, we evaluate two methods for obtaining Robust ML, adversarial training à la Madry et al. and randomized smoothing, against several high-profile physically realizable attacks. Here, we find that Robust ML provides inadequate defense. Next, we present a novel stylized model, rectangular occlusion attacks, give several simple approaches for computing such attacks, and propose a defense based on the general adversarial training idea. We show that this approach yields robustness to three distinct physical attacks (adversarial eyeglass frames, stickers on stop signs, adversarial patch) in two unrelated domains (face recognition and traffic sign classification).
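To make the idea of a rectangular occlusion attack concrete, here is a minimal sketch that places a fixed-color rectangle at every position on a grid and keeps the position that lowers the true-class score the most. This exhaustive search is an assumption made for illustration and is not necessarily one of the approaches presented in the talk; the toy scoring function, image, and the names `occlude` and `best_occlusion` are likewise hypothetical.

```python
import numpy as np

def occlude(image, top, left, height, width, fill):
    """Return a copy of a 2-D image with a filled rectangle pasted in."""
    out = image.copy()
    out[top:top + height, left:left + width] = fill
    return out

def best_occlusion(image, score_fn, height, width, fill=0.5, stride=1):
    """Grid-search the rectangle position that minimizes score_fn.

    score_fn maps an image to the model's score for the true class,
    so a lower score means a more damaging occlusion.
    """
    rows, cols = image.shape
    best_pos, best_score = None, np.inf
    for top in range(0, rows - height + 1, stride):
        for left in range(0, cols - width + 1, stride):
            s = score_fn(occlude(image, top, left, height, width, fill))
            if s < best_score:
                best_pos, best_score = (top, left), s
    return best_pos, best_score

# Toy stand-in for a classifier: the score depends only on a 2x2 region.
weights = np.zeros((8, 8))
weights[2:4, 5:7] = 1.0
image = weights.copy()

def toy_score(img):
    return float(np.sum(weights * img))

pos, s = best_occlusion(image, toy_score, height=2, width=2, fill=0.0)
print(pos, s)
```

In an adversarial-training-style defense, occluded images found this way would be fed back as training examples; with a real model, the grid search would typically be replaced or narrowed by a cheaper heuristic, since scoring every position is expensive.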
Yevgeniy Vorobeychik is an Associate Professor of Computer Science & Engineering at Washington University in St. Louis. Previously, he was an Assistant Professor of Computer Science at Vanderbilt University. Between 2008 and 2010 he was a post-doctoral research associate in the Computer and Information Science department at the University of Pennsylvania. He received his Ph.D. (2008) and M.S.E. (2004) degrees in Computer Science and Engineering from the University of Michigan, and a B.S. degree in Computer Engineering from Northwestern University. His work focuses on game-theoretic modeling of security and privacy, adversarial machine learning, algorithmic and behavioral game theory and incentive design, and network science. Dr. Vorobeychik received an NSF CAREER award in 2017, and was invited to give an IJCAI-16 early career spotlight talk. He has also received several Best Paper awards, including one of the 2017 Best Papers in Health Informatics. He was nominated for the 2008 ACM Doctoral Dissertation Award and received an honorable mention for the 2008 IFAAMAS Distinguished Dissertation Award.