Author Attribution of Binaries (available)

Starting Date: Summer 2020
Duration: 12 weeks
Time commitment: 20 hours a week
Prerequisites: Binary Analysis, Python; some familiarity with Machine Learning may be useful

Attributing binaries, whether malicious or benign, is a difficult and time-consuming task. However, there is increasing demand for it, both for attributing cyber attacks and for preventing plagiarism.

The goal of this project is to use machine learning to predict the authorship of binaries. You will apply static or dynamic analysis to binaries built from a corpus of open source software, then classify or cluster those binaries by author. The project has several technical parts:

  • Statically or dynamically analysing binaries.
  • Extracting features usable for machine learning, e.g. control flow graphs.
  • Training a classifier, using scikit-learn, on the features extracted from one half of the binaries.
  • Testing the classifier on the other half to evaluate it.

The same classifier can then be used to predict the author of binaries whose author is unknown.
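The train-on-one-half, test-on-the-other workflow could be sketched roughly as follows. This is only an illustration under stated assumptions: the "binaries" are synthetic byte strings, the author names are hypothetical, and a normalised byte histogram stands in for real features such as control flow graphs, which would need a proper binary analysis tool. The choice of a random forest is also an assumption; any scikit-learn classifier could be substituted.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def byte_histogram(blob: bytes) -> np.ndarray:
    """Normalised 256-bin byte histogram (a simple stand-in feature, not a CFG)."""
    counts = np.bincount(np.frombuffer(blob, dtype=np.uint8), minlength=256)
    return counts / max(len(blob), 1)


def make_toy_corpus(n_per_author=40, size=2048, seed=0):
    """Build synthetic 'binaries'; each hypothetical author favours a byte range."""
    rng = np.random.default_rng(seed)
    byte_ranges = {"alice": (0, 128), "bob": (64, 192), "carol": (128, 256)}
    blobs, authors = [], []
    for author, (low, high) in byte_ranges.items():
        for _ in range(n_per_author):
            data = rng.integers(low, high, size=size, dtype=np.uint8)
            blobs.append(data.tobytes())
            authors.append(author)
    return blobs, authors


blobs, authors = make_toy_corpus()
X = np.vstack([byte_histogram(b) for b in blobs])

# Train on one half of the binaries and hold out the other half for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, authors, test_size=0.5, stratify=authors, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")

# The fitted classifier can then label a binary whose author is unknown.
unknown_blob = np.random.default_rng(1).integers(
    128, 256, size=2048, dtype=np.uint8
).tobytes()
predicted_author = clf.predict(byte_histogram(unknown_blob).reshape(1, -1))[0]
print("predicted author:", predicted_author)
```

On real binaries the authors are far less cleanly separated than in this toy corpus, so the held-out accuracy here says nothing about how hard the actual task is; the sketch only shows the shape of the pipeline.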

You should feel confident writing Python code, and know the basics of binary analysis or be willing to learn them quickly. Knowledge of machine learning isn’t necessary, but you should be interested in learning more about it.