Genetic Technologies and Methods of Combinatorial Chemistry and Biology in the Study of Biological Processes

Abstract

This article reviews recent advances in applying large language model (LLM) algorithms to current problems in structural bioinformatics. It focuses on demonstrated successes, including predicting antigen surface epitopes, assessing the antigen-binding capacity of specific CDRH3 fragments, and forecasting antibody cross-reactivity. Particular attention is given to concrete cases in which LLMs were used to identify antibodies that bind influenza virus hemagglutinin, to predict the effects of point mutations, and to improve the accuracy of protein sequence alignments. The review also examines key limitations of current LLM approaches: the difficulty of interpreting model weights, constraints imposed by the characteristics of training datasets, and the substantial computational resources required for model training.

About the authors

A. G. Gabibov

Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences; Lomonosov Moscow State University

Moscow, 117997 Russia; Moscow, 119991 Russia

V. D. Knorre

Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences

Email: vera.knorre@gmail.com
Moscow, 117997 Russia

Ya. V. Solov’ev

Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences

Moscow, 117997 Russia

Copyright (c) 2025 Russian Academy of Sciences
