Scalable and accurate detection of code clones
- Authors: Sargsyan S.1, Kurmangaleev S.1, Belevantsev A.1,2,3, Avetisyan A.1,2,3
- Affiliations:
  1. Institute for System Programming
  2. Moscow State University
  3. Moscow Institute of Physics and Technology State University
- Issue: Vol 42, No 1 (2016)
- Pages: 27-33
- Section: Article
- URL: https://journals.rcsi.science/0361-7688/article/view/176398
- DOI: https://doi.org/10.1134/S0361768816010072
- ID: 176398
Abstract
A method for detecting code clones is described in detail. The method is based on semantic analysis of programs and on new algorithms that make it scalable without sacrificing accuracy. It has two phases. In the first phase, the program dependence graph (PDG) is constructed while the program is compiled; LLVM is used as the compilation infrastructure. In the second phase, similar subgraphs of maximum size, which represent code clones, are detected. Before the search for similar subgraphs begins, the PDG is divided into subgraphs that are treated as potential clones of each other. To keep the search scalable, a combination of algorithms is used. The first algorithm checks, in linear time, whether a pair of graphs can be ruled out as having no similar subgraphs of the desired size. If the pair cannot be ruled out, a second (approximate) algorithm is executed to find similar subgraphs of maximum size. After similar subgraphs have been found, the positions of the source code lines corresponding to the detected clone candidates are additionally checked. Tests showed that the developed tool is more accurate than similar tools such as MOSS, CCFinder, and CloneDR. Results are presented for the Linux-2.6, Mozilla Firefox, LLVM/Clang, and OpenSSL projects.
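The second phase can be illustrated with a minimal Python sketch: a cheap linear-time filter followed by an approximate search. This is not the authors' implementation. The abstract does not say which filter or which approximate algorithm the tool uses, so the `Graph` class, the label-count filter in `may_share_clone`, and the greedy expansion in `greedy_match` are all assumptions chosen for illustration.

```python
from collections import Counter


class Graph:
    """Hypothetical PDG stand-in: node -> label (e.g. an opcode name)
    plus a successor map. Not taken from the paper's tool."""

    def __init__(self, labels, edges):
        self.labels = dict(labels)
        self.succ = {n: [] for n in self.labels}
        for src, dst in edges:
            self.succ[src].append(dst)


def may_share_clone(g1, g2, min_size):
    """Assumed linear-time filter. A common subgraph of min_size nodes with
    matching labels needs at least min_size shared labels, so False means
    the pair can be skipped safely."""
    c1 = Counter(g1.labels.values())
    c2 = Counter(g2.labels.values())
    shared = sum((c1 & c2).values())  # size of the multiset intersection
    return shared >= min_size


def greedy_match(g1, g2, seed1, seed2):
    """Assumed approximate search: grow a label-preserving node mapping from
    one seed pair along dependence edges. Greedy, so it may miss the maximum."""
    if g1.labels[seed1] != g2.labels[seed2]:
        return {}
    mapping = {seed1: seed2}
    used = {seed2}
    work = [(seed1, seed2)]
    while work:
        a, b = work.pop()
        for x in g1.succ[a]:
            if x in mapping:
                continue
            for y in g2.succ[b]:
                if y not in used and g1.labels[x] == g2.labels[y]:
                    mapping[x] = y
                    used.add(y)
                    work.append((x, y))
                    break
    return mapping


def find_clone(g1, g2, min_size):
    """Run the filter first; only pairs that survive reach the slower search."""
    if not may_share_clone(g1, g2, min_size):
        return {}
    best = {}
    for a in g1.labels:
        for b in g2.labels:
            m = greedy_match(g1, g2, a, b)
            if len(m) > len(best):
                best = m
    return best if len(best) >= min_size else {}


g1 = Graph({1: "load", 2: "add", 3: "store"}, [(1, 2), (2, 3)])
g2 = Graph({"a": "load", "b": "add", "c": "store", "d": "call"},
           [("a", "b"), ("b", "c"), ("c", "d")])
print(find_clone(g1, g2, min_size=3))
```

The filter only ever rejects pairs that cannot contain a match of the requested size, so it speeds up the search without changing what the slower stage can find. The final check of source line positions mentioned in the abstract is not modeled here.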
About the authors
S. Sargsyan
Institute for System Programming
Author for correspondence.
Email: sevaksargsyan@ispras.ru
Russian Federation, Moscow, 109004
Sh. Kurmangaleev
Institute for System Programming
Email: sevaksargsyan@ispras.ru
Russian Federation, Moscow, 109004
A. Belevantsev
Institute for System Programming; Moscow State University; Moscow Institute of Physics and Technology State University
Email: sevaksargsyan@ispras.ru
Russian Federation, Moscow, 109004; Moscow, 119991; Dolgoprudny, Moscow oblast, 141700
A. Avetisyan
Institute for System Programming; Moscow State University; Moscow Institute of Physics and Technology State University
Email: sevaksargsyan@ispras.ru
Russian Federation, Moscow, 109004; Moscow, 119991; Dolgoprudny, Moscow oblast, 141700