Published in

Hindawi, Security and Communication Networks, 2021, p. 1-19

DOI: 10.1155/2021/9954520

Hierarchical Attention Graph Embedding Networks for Binary Code Similarity against Compilation Diversity

Journal article published in 2021 by Yan Wang, Peng Jia, Cheng Huang, Jiayong Liu, Peisong He
This paper is made freely available by the publisher.

Preprint: archiving restricted
Postprint: archiving restricted
Published version: archiving allowed
Data provided by SHERPA/RoMEO

Abstract

Binary code similarity comparison determines whether two functions are similar by considering only their compiled form. It has many applications, including clone detection, malware classification, and vulnerability discovery. However, designing a robust code similarity comparison engine is challenging because different compilation settings can make logically similar assembly functions appear very different. Moreover, existing approaches suffer from high performance overheads, low robustness, or poor scalability. In this paper, a novel solution, HBinSim, is proposed that employs multiview features of the function to address these challenges. It first extracts the syntactic and semantic features of each basic block by static analysis. HBinSim then analyzes the function and constructs a syntactic attribute control flow graph and a semantic attribute control flow graph for each function. Next, a hierarchical attention graph embedding network is designed to process this graph-structured data. The network model has a hierarchical structure that mirrors the hierarchical structure of the function. It applies three levels of attention mechanisms, at the instruction, basic block, and function levels, enabling it to attend differentially to more and less critical content when constructing the function representation. We conduct extensive experiments to evaluate its effectiveness and efficiency. The results show that our tool outperforms state-of-the-art binary code similarity comparison tools by a large margin in clone search across diverse compilation settings. A real-world vulnerability search case further demonstrates the usefulness of our system.
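
The hierarchical pooling idea described in the abstract can be illustrated with a small sketch. This is not the HBinSim implementation: it assumes plain dot-product attention with fixed query vectors, it ignores the control flow graph edges, and it shows only two pooling steps (instructions into a basic block, basic blocks into a function). The names `attention_pool`, `embed_function`, `instr_query`, and `block_query` are hypothetical and chosen for illustration only.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(items: List[Vector], query: Vector) -> Vector:
    """Weighted sum of items; weights are softmax(dot(item, query)).

    Assumption: simple dot-product attention with a fixed query vector.
    In a trained model the query would be a learned parameter.
    """
    scores = [sum(x * q for x, q in zip(item, query)) for item in items]
    weights = softmax(scores)
    dim = len(items[0])
    return [sum(w * item[d] for w, item in zip(weights, items)) for d in range(dim)]


def embed_function(blocks: List[List[Vector]],
                   instr_query: Vector,
                   block_query: Vector) -> Vector:
    """Pool instruction vectors into block vectors, then blocks into one vector.

    Assumption: graph structure (edges between basic blocks) is omitted here.
    """
    block_vectors = [attention_pool(instrs, instr_query) for instrs in blocks]
    return attention_pool(block_vectors, block_query)


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two function embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy function: two basic blocks, each a list of 2-dimensional instruction vectors.
toy_function = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0]],
]
embedding = embed_function(toy_function, instr_query=[1.0, 0.0], block_query=[0.0, 1.0])
similarity_to_self = cosine_similarity(embedding, embedding)
```

With an all-zero query every item receives the same weight, so `attention_pool` reduces to a plain average; a non-zero query shifts weight toward the items that align with it, which is the sense in which attention lets a model weigh more and less critical content differently.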