Machine Learning vs Machine Learning in Malware Evasion (available)

Starting Date: Summer 2019
Duration: 10-12 weeks
Time commitment: 20 hours a week
Prerequisites: Advanced knowledge of Python or a similar language, knowledge of how malware works; some familiarity with machine learning would be useful

Machine learning is a popular approach to signature-less malware detection because it can generalize to new (unseen) malware families. Some recent work has proposed using AI/ML-powered malware to bypass machine-learning-based anti-malware systems.

The goal of the project is to model malware and anti-malware systems as two opponents, each using AI/ML strategies to defeat the other. For instance, by applying reinforcement learning over a series of games played against the anti-malware system, malware can learn which sequences of operations are likely to evade detection; conversely, the defender can adapt its strategy to counter the malware's evasive techniques, and so on. The project aims to understand how this interactive ML vs ML system (modelled as a set of games) evolves: whether it converges to a stable state (and who benefits from that state: attackers or defenders?), whether it becomes a never-ending game, and whether convergence (if any) depends on initial assumptions (e.g., threat models).
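
As a rough illustration of what "modelled as a set of games" could mean, the sketch below plays a toy zero-sum game repeatedly using fictitious play, where each side best-responds to the opponent's observed history. Everything in it is an assumption made for illustration: the three action names, the payoff rule (the sample evades unless the defender monitors the exact action used), and the choice of fictitious play instead of reinforcement learning. It does not use gym-malware or any real detector.

```python
import random

# Hypothetical evasion actions; the names are illustrative only.
ACTIONS = ["pack", "pad_section", "rename_imports"]


def evades(attacker_action: int, defender_action: int) -> int:
    """Toy payoff: 1 if the attacker's action is not the monitored one."""
    return int(attacker_action != defender_action)


def best_response_attacker(defender_counts):
    # Use the action the defender has monitored least often so far.
    return min(range(len(ACTIONS)), key=lambda a: defender_counts[a])


def best_response_defender(attacker_counts):
    # Monitor the action the attacker has used most often so far.
    return max(range(len(ACTIONS)), key=lambda a: attacker_counts[a])


def fictitious_play(rounds: int, seed: int = 0):
    """Return (attacker_freq, defender_freq, evasion_rate) after `rounds`."""
    rng = random.Random(seed)
    n = len(ACTIONS)
    attacker_counts = [0] * n
    defender_counts = [0] * n
    # Both sides open with a random move.
    a, d = rng.randrange(n), rng.randrange(n)
    evasions = 0
    for _ in range(rounds):
        attacker_counts[a] += 1
        defender_counts[d] += 1
        evasions += evades(a, d)
        # Each side best-responds to the other's empirical play so far.
        a, d = (best_response_attacker(defender_counts),
                best_response_defender(attacker_counts))
    attacker_freq = [c / rounds for c in attacker_counts]
    defender_freq = [c / rounds for c in defender_counts]
    return attacker_freq, defender_freq, evasions / rounds


if __name__ == "__main__":
    a_freq, d_freq, rate = fictitious_play(20000)
    print("attacker action frequencies:", a_freq)
    print("defender action frequencies:", d_freq)
    print("fraction of rounds evaded:", rate)
```

In a finite zero-sum game like this one, the empirical action frequencies under fictitious play are known to converge to an equilibrium mixture, so this toy settles into a stable state. The questions posed above concern what happens when those simplifications are dropped, e.g. learned policies on both sides, non-zero-sum payoffs, or different threat models.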

Some references:

https://arxiv.org/abs/1801.08917
https://github.com/endgameinc/gym-malware
https://i.blackhat.com/us-18/Thu-August-9/us-18-Kirat-DeepLocker-Concealing-Targeted-Attacks-with-AI-Locksmithing.pdf
https://dl.acm.org/citation.cfm?id=3150378
https://arxiv.org/pdf/1811.01190.pdf
https://github.com/a0rtega/pafish
https://arxiv.org/abs/1712.03141