Measuring similarity between Karel programs using character and word n-grams


Abstract

We present a method for measuring similarity between source code programs. We treat the task as a machine learning problem, using character and word n-grams as features and comparing several machine learning algorithms. We also examine the contribution of latent semantic analysis to this task. To evaluate the proposed approach, we developed a corpus of around 10,000 programs written in the Karel programming language to solve 100 different tasks. The results show that the highest classification accuracy is achieved with a Support Vector Machine classifier, with latent semantic analysis applied, and with word trigrams as features.
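To make the feature representation concrete, the following is a minimal sketch of character and word n-gram extraction for two programs, compared here by cosine similarity over n-gram counts. It is not the authors' pipeline: the abstract describes a classification setup with latent semantic analysis and a Support Vector Machine, whereas this sketch swaps in a plain cosine comparison for illustration. The function names, the whitespace tokenization, and the Karel-like snippets are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n):
    """Return the overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return the overlapping n-grams of whitespace-separated tokens."""
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(prog_a, prog_b, n=3, unit="word"):
    """Compare two programs by their word or character n-gram counts."""
    extract = word_ngrams if unit == "word" else char_ngrams
    return cosine(Counter(extract(prog_a, n)), Counter(extract(prog_b, n)))


# Hypothetical Karel-like snippets, pre-tokenized with spaces (assumption).
prog_1 = "iterate ( 3 ) { move ( ) ; } turnleft ( ) ; turnoff ( ) ;"
prog_2 = "iterate ( 5 ) { move ( ) ; } turnleft ( ) ; turnoff ( ) ;"
prog_3 = "while ( frontIsClear ) { putbeeper ( ) ; move ( ) ; }"

print(similarity(prog_1, prog_2, n=3, unit="word"))
print(similarity(prog_1, prog_3, n=3, unit="word"))
print(similarity(prog_1, prog_2, n=3, unit="char"))
```

In this sketch, two programs that differ only in a loop count share most of their word trigrams and therefore score higher than two programs with different control structure. In a setup closer to the one the abstract describes, the same n-gram counts would presumably serve as feature vectors for a classifier rather than being compared directly.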

About the authors

G. Sidorov

Instituto Politécnico Nacional (IPN)

Principal contact for editorial correspondence.
Email: sidorov@cic.ipn.mx
Mexico, Mexico City

M. Ibarra Romero

Instituto Politécnico Nacional (IPN)

Email: francisco.castillo@upq.mx
Mexico, Mexico City

I. Markov

Instituto Politécnico Nacional (IPN)

Principal contact for editorial correspondence.
Email: markovilya@yahoo.com
Mexico, Mexico City

R. Guzman-Cabrera

Engineering Division

Principal contact for editorial correspondence.
Email: guzmanc81@gmail.com
Mexico, Guanajuato

L. Chanona-Hernández

Instituto Politécnico Nacional

Principal contact for editorial correspondence.
Email: lchanona@gmail.com
Mexico, Mexico City

F. Velásquez

Polytechnic University of Queretaro

Principal contact for editorial correspondence.
Email: francisco.castillo@upq.mx
Mexico, Queretaro

Copyright © Pleiades Publishing, Ltd., 2017