Effect of Different Factors on Predicting Constants of Acidity of Low-Molecular Organic Compounds by Means of Machine Learning
- Authors: Matyushin D.1, Sholokhova A.1, Buryak A.1
- Affiliations:
  1. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences
- Issue: Vol. 97, No. 2 (2023)
- Pages: 262-269
- Section: CHEMOINFORMATICS AND COMPUTER MODELING
- URL: https://journals.rcsi.science/0044-4537/article/view/136524
- DOI: https://doi.org/10.31857/S0044453723020152
- EDN: https://elibrary.ru/ECXZUI
- ID: 136524
Abstract
This study examines how the method used to standardize molecular structures and the parameters used to compute molecular fingerprints affect the accuracy of predicting acidity constants. Standardization (i.e., the choice of the tautomeric form and of how the structure of the molecule is written) with OpenEye QuacPac gives the best results, although the RDKit library achieves comparable accuracy. The choice of charge state has a strong effect on prediction accuracy. Accuracy is also studied as a function of the radius (the size of the substructures) of circular molecular fingerprints, and the best results are obtained with a radius of r = 2. The machine learning algorithm used is a random forest. The support vector method is also shown to give fairly high accuracy once its hyperparameters are optimized.
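To illustrate what the radius of a circular fingerprint controls, here is a minimal sketch in plain Python. It is not the implementation used in the study (the abstract does not say how the fingerprints were computed) and it is not RDKit's or OpenEye's algorithm. The function name `circular_fingerprint`, the helper `_h`, the hand-written graph representation, and the choice of SHA-1 hashing and 2048 bits are all assumptions made for this example. Unlike production ECFP-style fingerprints, it uses only the element and degree as the initial atom description and does not remove structurally duplicate substructures.

```python
import hashlib


def _h(obj) -> int:
    """Deterministic 32-bit hash of a printable object (illustrative choice)."""
    return int.from_bytes(hashlib.sha1(repr(obj).encode()).digest()[:4], "big")


def circular_fingerprint(atoms, bonds, radius=2, n_bits=2048):
    """Return the set of 'on' bit positions of a simplified circular fingerprint.

    atoms: list of element symbols for the heavy atoms, e.g. ["C", "C", "O"]
    bonds: list of (i, j, order) tuples over atom indices
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        neighbours[i].append((order, j))
        neighbours[j].append((order, i))

    # Radius 0: each atom is described by its element and degree only.
    ids = [_h((sym, len(neighbours[i]))) for i, sym in enumerate(atoms)]
    features = set(ids)

    # Each iteration folds in the neighbours' current identifiers, so after
    # k iterations an identifier describes the environment within k bonds.
    for _ in range(radius):
        ids = [
            _h((ids[i], tuple(sorted((order, ids[j]) for order, j in neighbours[i]))))
            for i in range(len(atoms))
        ]
        features.update(ids)

    # Fold the identifiers into a fixed-length bit vector.
    return {f % n_bits for f in features}


# Acetic acid, heavy atoms only: CH3-C(=O)-OH
acetic_acid = (["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
fp_r0 = circular_fingerprint(*acetic_acid, radius=0)
fp_r2 = circular_fingerprint(*acetic_acid, radius=2)
print(len(fp_r0), len(fp_r2))
```

A larger radius keeps every bit set at smaller radii and adds bits for larger atom environments, so the radius trades the generality of small substructures against the specificity of large ones. Bit vectors of this kind are what a regression model such as a random forest would take as input.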
About the authors
D. Matyushin
Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences
Email: shonastya@yandex.ru
119071, Moscow, Russia
A. Sholokhova
Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences
Email: shonastya@yandex.ru
119071, Moscow, Russia
A. Buryak
Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences
Corresponding author.
Email: shonastya@yandex.ru
119071, Moscow, Russia