Control Flow Graph Reconstruction using Control-Flow Integrity (ongoing)

Starting Date: Summer 2019
Duration: 10-12 weeks
Time commitment: 20 hours a week
Prerequisites: Advanced level of Python, knowledge of x86-64 assembly, systems programming

Control flow graphs (CFGs) describe the set of possible paths a computer program can take at run-time, and in particular how a given target can be reached inside (binary) code. CFGs are useful in almost every program analysis technique. In reverse engineering, for example, the control flow graph is the foundation that most other analyses build on. Reconstructing a control flow graph from a binary has been an open problem in computer science for decades. In particular, indirect control flow often cannot be analysed reliably.
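
As a minimal sketch of what "reaching a target" means on a CFG, assume a toy graph stored as a Python dictionary. The block names and the `reachable` helper are hypothetical and only serve as illustration; they are not taken from any existing tool.

```python
from collections import deque

# Toy CFG: each key is a basic block, each value lists its successor blocks.
# All block names are invented for illustration.
TOY_CFG = {
    "entry": ["check", "exit"],
    "check": ["handler_a", "handler_b"],
    "handler_a": ["exit"],
    "handler_b": ["exit"],
    "exit": [],
    "dead": ["exit"],  # no edge leads here, so it is unreachable from "entry"
}


def reachable(cfg, start):
    """Return the set of blocks reachable from `start` by following CFG edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        block = queue.popleft()
        for succ in cfg.get(block, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen
```

If an edge is missing from the graph, for instance because an indirect call could not be resolved, every block behind that edge looks unreachable. This is why unresolved indirect control flow hurts analyses built on top of the CFG.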

Consider a “call rax” instruction. When encountering this instruction during code analysis, one generally does not know all the addresses the call can go to, because the target depends on the value of rax at run-time. One technique that comes in handy here is Control-Flow Integrity (CFI, https://nebelwelt.net/blog/20160913-ControlFlowIntegrity.html). CFI is a defence technique that restricts every indirect control flow transfer to a limited set of targets. It is a generic mitigation against exploitation techniques and has many implementations.

To give an example, one implementation stores a table of pointers to every “allowed target”. Instead of performing the indirect call directly, the application reads a value from the table and jumps to that target. This ensures that all indirect control flow goes through the table, so any other target becomes impossible. In other words, CFI indirectly provides us with the control flow graph edges at each indirect jump or call instruction. Recovering this information will aid general binary analysis by making control flow graph recovery reliable and fast. In addition, it gives an upper bound on the extent to which attackers can influence control flow.
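
The table-based scheme can be sketched as follows. This is a simplified model under stated assumptions, not the layout of any particular CFI implementation: the table is assumed to be a flat list of target addresses shared by all indirect call sites, and every address, name and helper below is hypothetical.

```python
# Hypothetical CFI table: addresses of all allowed indirect targets.
CFI_TABLE = [0x401000, 0x401080, 0x401100]

# Hypothetical addresses of indirect call instructions found in the binary.
INDIRECT_CALL_SITES = [0x400510, 0x400620]


def recover_indirect_edges(call_sites, table):
    """Return every (call_site, target) edge the table permits.

    Assumption: each call site may use any table entry, so the result is an
    upper bound on the real indirect control flow.
    """
    return sorted((site, target) for site in call_sites for target in table)


def is_allowed_target(table, address):
    """Return True if `address` is a target the CFI table permits."""
    return address in table


INDIRECT_EDGES = recover_indirect_edges(INDIRECT_CALL_SITES, CFI_TABLE)
```

Under this assumption, two call sites and three table entries yield six possible edges. A finer-grained implementation that gives each call site its own set of entries would produce a tighter bound.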

The goal of this project is to build a component that detects a given CFI implementation in a binary executable, reconstructs the indirect control flow, combines it with the direct control flow, and integrates the result into the binary analysis framework angr (http://angr.io/). Please note that angr already has a large set of tools that should make this task significantly easier. You should be confident in Python 3.x, know at least the basics of x86-64 assembly, and be enthusiastic about the project. Some experience with reverse engineering and/or angr is preferred, but not required.
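
The combination step could look roughly like the sketch below. It is written in plain Python and does not use angr. In the actual project the direct edges would presumably come from an existing angr CFG analysis, while the edge lists, addresses and the `merge_edges` helper here are hypothetical.

```python
from collections import defaultdict


def merge_edges(direct_edges, indirect_edges):
    """Combine two edge lists into an adjacency map: source -> sorted targets."""
    graph = defaultdict(set)
    for src, dst in list(direct_edges) + list(indirect_edges):
        graph[src].add(dst)
    return {src: sorted(dsts) for src, dsts in graph.items()}


# Hypothetical direct edges, e.g. fall-through and direct jumps/calls.
DIRECT_EDGES = [(0x400500, 0x400510), (0x400510, 0x400520)]

# Hypothetical indirect edges recovered from a CFI table.
INDIRECT_EDGES_FROM_CFI = [(0x400510, 0x401000), (0x400510, 0x401080)]

MERGED_CFG = merge_edges(DIRECT_EDGES, INDIRECT_EDGES_FROM_CFI)
```

Storing targets in a set before sorting keeps the merge idempotent: an edge that appears in both lists is recorded only once.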