Optimization Approach to Selecting Methods of Detecting Anomalies in Homogeneous Text Collections
- Authors: Krasnov F.V., Smaznevich I.S., Baskakova E.N.
- Issue: Vol 20, No 4 (2021)
- Pages: 869-904
- Section: Information security
- URL: https://journals.rcsi.science/2713-3192/article/view/266326
- DOI: https://doi.org/10.15622/ia.20.4.5
- ID: 266326
About the authors
F. V. Krasnov
Email: fkrasnov@naumen.ru
Tatishcheva street 49A
I. S. Smaznevich
Email: ismaznevich@naumen.ru
Tatishcheva street 49A
E. N. Baskakova
Email: enbaskakova@naumen.ru
Tatishcheva street 49A
