Development of a predictive model for two- and three-component inorganic systems in aqueous solutions using spectral analysis

Abstract

This study presents an algorithm that analyzes spectral data through mathematical modeling, builds predictive models, and selects optimal wavelength intervals for the design of LED-based multisensor systems. The algorithm is implemented in Python and validated on experimental data from aqueous solutions of inorganic salts.
Key methodological aspects include:
– Application of multivariate calibration methods (PLS regression and multiple linear regression);
– Utilization of Shapley values to identify informative spectral wavelengths;
– Systematic enumeration to determine optimal wavelength intervals.
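The three steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: a ridge-regularised linear regression stands in for PLS or multiple linear regression, the spectra are synthetic (a single Gaussian absorption band plus noise), and the window-scoring rule (summed standard deviation of Shapley values) is an assumption. All function names, `alpha`, and the window width are hypothetical. For a linear model with independent features, the Shapley value of channel j for sample i reduces to w_j (x_ij - mean_j), which is what `linear_shapley` computes.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Ridge-regularised least squares (stand-in for PLS/MLR); returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def linear_shapley(X, w):
    """Shapley values of a linear model under feature independence: w_j * (x_ij - mean_j)."""
    return (X - X.mean(axis=0)) * w

def best_interval(importance, width):
    """Enumerate every contiguous window of `width` channels and
    return the (start, stop) pair with the largest summed importance."""
    sums = np.convolve(importance, np.ones(width), mode="valid")
    start = int(np.argmax(sums))
    return start, start + width

# Synthetic spectra (assumption): one absorption band centred on channel 120.
rng = np.random.default_rng(0)
n_samples, n_channels, band_center = 40, 200, 120
channels = np.arange(n_channels)
band = np.exp(-0.5 * ((channels - band_center) / 10.0) ** 2)
conc = rng.uniform(0.1, 1.0, size=n_samples)
X = np.outer(conc, band) + rng.normal(0.0, 0.01, size=(n_samples, n_channels))

w, b = fit_ridge(X, conc)
shap_values = linear_shapley(X, w)      # shape: (n_samples, n_channels)
importance = shap_values.std(axis=0)    # spread of Shapley values per channel
start, stop = best_interval(importance, width=20)
print(f"most informative window: channels {start}..{stop - 1}")
```

Ranking channels by the spread of their Shapley values rather than by raw regression weights accounts for how much each channel actually varies across samples. A channel with a large weight but almost constant absorbance contributes little to any prediction.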
The developed model enables accurate prediction of two- and three-component systems in metal salt solutions using partial spectral data rather than full-spectrum analysis. Cross-validation demonstrates that:
– The model achieves comparable accuracy to full-spectrum approaches;
– The solution remains computationally efficient while maintaining predictive reliability.
The results confirm the model’s adequacy for quantitative spectral analysis, particularly in resource-constrained environments where partial spectral data acquisition is advantageous.
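The cross-validated comparison between a partial-spectrum model and a full-spectrum model can be sketched as follows. This is an illustration under stated assumptions, not the published procedure: the data are synthetic, ridge regression again stands in for PLS or multiple linear regression, and the selected interval (channels 110 to 129), the fold count, and `alpha` are hypothetical. The error metric is the mean absolute percentage error, MAPE = 100 · mean(|y - ŷ| / |y|), which requires non-zero reference concentrations.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; y_true must not contain zeros."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def fit_predict(X_train, y_train, X_test, alpha=0.1):
    """Fit ridge regression on the training fold and predict the test fold."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(Xc.shape[1]),
                        Xc.T @ (y_train - y_mean))
    return (X_test - x_mean) @ w + y_mean

def cv_mape(X, y, n_folds=5, seed=0):
    """Average MAPE over a shuffled K-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        y_pred = fit_predict(X[train_mask], y[train_mask], X[test_idx])
        scores.append(mape(y[test_idx], y_pred))
    return float(np.mean(scores))

# Synthetic spectra (assumption): one absorption band centred on channel 120.
rng = np.random.default_rng(1)
channels = np.arange(200)
band = np.exp(-0.5 * ((channels - 120) / 10.0) ** 2)
conc = rng.uniform(0.1, 1.0, size=40)
X = np.outer(conc, band) + rng.normal(0.0, 0.01, size=(40, 200))

mape_full = cv_mape(X, conc)                 # all 200 channels
mape_partial = cv_mape(X[:, 110:130], conc)  # hypothetical selected interval
print(f"full spectrum: {mape_full:.2f}%  selected interval: {mape_partial:.2f}%")
```

Both models are scored on the same folds (same `seed`), so any difference in MAPE reflects the choice of channels and not the split.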

About the authors

Kirill Y. Massalov

National Engineering Physics Institute “MEPhI”

Author for correspondence.
Email: kirill.massalov@yandex.ru
ORCID iD: 0009-0003-6214-7470
https://www.mathnet.ru/person228575

Master’s Student; Senior Researcher; Dept. of Elementary Particle Physics; Institute of Nuclear Physics and Engineering

Russian Federation, 115409, Moscow, Kashirskoe shosse, 31

Elena Y. Moshchenskaya

Samara State Technical University

Email: lmos@rambler.ru
ORCID iD: 0000-0002-1070-3151
https://www.mathnet.ru/person39351

Cand. Chem. Sci., Associate Professor; Associate Professor; Dept. of Analytical and Physical Chemistry

Russian Federation, 443100, Samara, Molodogvardeyskaya st., 244

Supplementary files

Figure 1. Approximation of standard deviation of Shapley values and optimum intervals for solution with nickel (a), cobalt (b), and copper (c) ions

Figure 2. Mean absolute percentage error (MAPE) for the model built using selected spectral intervals (left) and the model utilizing the full spectrum (right) for nickel (a), cobalt (b), and copper (c) ions

Copyright (c) 2025 Authors; Samara State Technical University (Compilation, Design, and Layout)

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
