Measuring similarity between Karel programs using character and word n-grams
- Authors: Sidorov G.¹, Ibarra Romero M.¹, Markov I.¹, Guzman-Cabrera R.², Chanona-Hernández L.³, Velásquez F.⁴
- Affiliations:
- Instituto Politécnico Nacional (IPN)
- Engineering Division
- Instituto Politécnico Nacional
- Polytechnic University of Queretaro
- Issue: Vol. 43, No. 1 (2017)
- Pages: 47-50
- Section: Article
- URL: https://journals.rcsi.science/0361-7688/article/view/176478
- DOI: https://doi.org/10.1134/S0361768817010066
- ID: 176478
Abstract
We present a method for measuring similarity between source code programs. We approach the task from a machine learning perspective, using character and word n-grams as features and comparing several machine learning algorithms. We also explore the contribution of latent semantic analysis to this task. To evaluate the proposed approach, we developed a corpus of around 10,000 programs written in the Karel programming language that solve 100 different tasks. The results show that the highest classification accuracy is achieved with a Support Vector Machines classifier, applying latent semantic analysis and using word trigrams as features.
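The feature extraction mentioned in the abstract can be illustrated with a minimal sketch. It shows how character and word n-grams might be extracted from a program and compared. The Karel-like snippets, the function names, whitespace tokenization, and the use of cosine similarity over raw n-gram counts are all assumptions made for illustration. The article itself reports results for machine learning classifiers with latent semantic analysis, which this sketch does not reproduce.

```python
from collections import Counter
from math import sqrt


def word_ngrams(source, n=3):
    """Return word n-grams (as tuples) from whitespace-separated tokens."""
    tokens = source.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(source, n=3):
    """Return character n-grams after collapsing runs of whitespace."""
    text = " ".join(source.split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def cosine_similarity(grams_a, grams_b):
    """Cosine similarity between two n-gram frequency vectors."""
    a, b = Counter(grams_a), Counter(grams_b)
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty program shares no n-grams with anything
    return dot / (norm_a * norm_b)


# Hypothetical Karel-like snippets, invented for illustration only
# (pre-tokenized with spaces; not taken from the article's corpus).
prog_a = "iterate ( 4 ) { move ( ) ; turnleft ( ) ; }"
prog_b = "iterate ( 4 ) { move ( ) ; putbeeper ( ) ; }"
prog_c = "while ( frontIsClear ) { move ( ) ; }"

# Word trigrams: prog_a and prog_b differ by a single instruction.
sim_ab = cosine_similarity(word_ngrams(prog_a), word_ngrams(prog_b))
sim_ac = cosine_similarity(word_ngrams(prog_a), word_ngrams(prog_c))
print("word trigrams, a vs b:", round(sim_ab, 3))
print("word trigrams, a vs c:", round(sim_ac, 3))

# Character trigrams give a finer-grained view of the same pair.
sim_ab_char = cosine_similarity(char_ngrams(prog_a), char_ngrams(prog_b))
print("char trigrams, a vs b:", round(sim_ab_char, 3))
```

In a classification setting such as the one the abstract describes, the same n-gram counts would serve as feature vectors for a classifier rather than being compared pairwise.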
About the authors
G. Sidorov
Instituto Politécnico Nacional (IPN)
Corresponding author.
Email: sidorov@cic.ipn.mx
Mexico, Mexico City
M. Ibarra Romero
Instituto Politécnico Nacional (IPN)
Email: francisco.castillo@upq.mx
Mexico, Mexico City
I. Markov
Instituto Politécnico Nacional (IPN)
Corresponding author.
Email: markovilya@yahoo.com
Mexico, Mexico City
R. Guzman-Cabrera
Engineering Division
Corresponding author.
Email: guzmanc81@gmail.com
Mexico, Guanajuato
L. Chanona-Hernández
Instituto Politécnico Nacional
Corresponding author.
Email: lchanona@gmail.com
Mexico, Mexico City
F. Velásquez
Polytechnic University of Queretaro
Corresponding author.
Email: francisco.castillo@upq.mx
Mexico, Queretaro
