The Hybrid Method for Accurate Patent Classification
- Authors: Yadrintsev V.1,2, Sochenkov I.1,3
- Affiliations:
- Federal Research Center Computer Science and Control of the Russian Academy of Sciences
- Peoples’ Friendship University of Russia (RUDN University)
- Lomonosov Moscow State University
- Issue: Volume 40, No. 11 (2019)
- Pages: 1873-1880
- Section: Article
- URL: https://journals.rcsi.science/1995-0802/article/view/206101
- DOI: https://doi.org/10.1134/S1995080219110325
- ID: 206101
Abstract
This article presents a stacking of two approaches to patent classification. The first is based on a linguistically supported k-nearest neighbors algorithm that searches for topically similar documents by comparing vectors of lexical descriptors. The second is fastText, a word-embedding-based method in which the sentence (or document) vector is obtained by averaging the n-gram embeddings, and a multinomial logistic regression then uses these vectors as features. We show on Russian and English datasets that the stacking classifier achieves better results than either single classifier.
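The stacking idea described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: bag-of-words counts stand in for the vectors of lexical descriptors, character n-grams are hashed to fixed one-hot vectors rather than learned as fastText embeddings, the meta-classifier is a second softmax (multinomial logistic) regression, and the toy documents, the labels `medical` and `telecom`, and all function names are hypothetical.

```python
import math
import zlib
from collections import Counter

def bow(text):
    """Bag-of-words counts; a crude stand-in for vectors of lexical descriptors."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_proba(text, train, labels, k=3):
    """Base model 1: similarity-weighted class votes of the k most similar documents."""
    query = bow(text)
    nearest = sorted(((cosine(query, bow(d)), y) for d, y in train),
                     key=lambda pair: pair[0], reverse=True)[:k]
    votes = {c: 0.0 for c in labels}
    for sim, y in nearest:
        votes[y] += sim
    total = sum(votes.values())
    return [votes[c] / total if total else 1.0 / len(labels) for c in labels]

def doc_vector(text, dim=64, n=3):
    """Average of per-n-gram vectors. Assumption: each character n-gram is hashed
    to a fixed one-hot vector, whereas fastText learns dense embeddings."""
    padded = f"<{text.lower()}>"
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
    vec = [0.0] * dim
    for g in grams:
        vec[zlib.crc32(g.encode("utf-8")) % dim] += 1.0 / len(grams)
    return vec

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def predict_proba(W, x):
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])

def train_softmax(X, y, n_classes, epochs=200, lr=0.5):
    """Multinomial logistic regression trained with plain stochastic gradient descent."""
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, t in zip(X, y):
            p = predict_proba(W, x)
            for c in range(n_classes):
                grad = p[c] - (1.0 if c == t else 0.0)
                for j, xj in enumerate(x):
                    W[c][j] -= lr * grad * xj
    return W

def fit_stack(base, meta, labels):
    """Train base model 2 on `base`, then a meta-classifier on `meta` whose
    features are the concatenated class probabilities of both base models."""
    idx = {c: i for i, c in enumerate(labels)}
    W_avg = train_softmax([doc_vector(d) for d, _ in base],
                          [idx[y] for _, y in base], len(labels))

    def features(text):
        return knn_proba(text, base, labels) + predict_proba(W_avg, doc_vector(text))

    W_meta = train_softmax([features(d) for d, _ in meta],
                           [idx[y] for _, y in meta], len(labels))

    def predict(text):
        p = predict_proba(W_meta, features(text))
        return labels[p.index(max(p))]
    return predict

# Hypothetical toy data, for illustration only.
labels = ["medical", "telecom"]
base = [("catheter for delivering a drug into a blood vessel", "medical"),
        ("implantable stent with a drug coating", "medical"),
        ("antenna array for wireless signal transmission", "telecom"),
        ("method for encoding a wireless data signal", "telecom")]
meta = [("surgical catheter with a stent", "medical"),
        ("wireless antenna signal receiver", "telecom")]
predict = fit_stack(base, meta, labels)
print(predict("drug stent catheter"))
```

The sketch trains the meta-classifier on a separate held-out split so that it does not see base-model predictions on the base models' own training documents. Stacking setups commonly use out-of-fold predictions over several folds for the same purpose; the abstract does not say which variant the authors used.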
About the authors
V. Yadrintsev
Federal Research Center Computer Science and Control of the Russian Academy of Sciences; Peoples’ Friendship University of Russia (RUDN University)
Corresponding author
Email: vvyadrincev@gmail.com
Russia, Moscow, 119333; Moscow, 117198
I. Sochenkov
Federal Research Center Computer Science and Control of the Russian Academy of Sciences; Lomonosov Moscow State University
Corresponding author
Email: sochenkov@isa.ru
Russia, Moscow, 119333; Moscow, 119991