Reduced amino acid alphabet-based encoding and its impact on modeling influenza antigenic evolution
- Authors: Forghani M.1, Firstkov A.L.2, Alyannezhadi M.M.3, Danilenko D.M.4, Komissarov A.B.4
Affiliations:
- N.N. Krasovskii Institute of Mathematics and Mechanics of the Ural Branch of the Russian Academy of Sciences (IMM UB RAS)
- N.N. Krasovskii Institute of Mathematics and Mechanics of the Ural Branch of the Russian Academy of Sciences (IMM UB RAS)
- University of Science and Technology of Mazandaran
- Smorodintsev Research Institute of Influenza, Ministry of Health of the Russian Federation
- Issue: Vol 12, No 5 (2022)
- Pages: 837-849
- Section: ORIGINAL ARTICLES
- URL: https://journals.rcsi.science/2220-7619/article/view/119100
- DOI: https://doi.org/10.15789/2220-7619-RAA-1968
- ID: 119100
Abstract
Currently, vaccination is one of the most effective ways to prevent and control influenza infection. Vaccine production relies largely on the results of laboratory assays, including hemagglutination inhibition and microneutralization assays, which are time-consuming and laborious. Because viruses can escape the immune response, vaccines must be revised and updated twice a year. The hemagglutination inhibition assay measures how effectively antibodies raised against a reference strain bind to and block the antigen of a test strain. Various computer-aided models have been developed to optimize the selection of candidate vaccine strains. A general problem in modeling antigenic evolution, and the motivation for this work, is how to represent genetic sequences as input to the model. This paper introduces a two-fold encoding approach based on reduced amino acid alphabets and the amino acid index database AAindex. We propose applying a simplified amino acid alphabet to the modeling of antigenic evolution. A simplified alphabet, also called a sub-alphabet or reduced amino acid alphabet, is obtained by clustering the 20 standard amino acids into groups. The proposed encoding allows mutations to be redefined in terms of the amino acid groups of a reduced alphabet. We investigated 40 reduced amino acid sets and their performance in modeling antigenic evolution. The experimental results indicate that the proposed reduced alphabets can achieve accuracy comparable to that of the standard alphabet. Moreover, these alphabets provide deeper insight into various aspects of the relationship between mutation and antigenic variation. By checking the identified high-impact sites in the Influenza Research Database, we found that not only antigenic sites but also amino acids located in close proximity to them significantly influence antigenicity.
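The reduced-alphabet idea can be illustrated with a short Python sketch. It is not the authors' implementation: the six-group partition below is a hypothetical grouping chosen only for illustration (it is not one of the 40 sets studied), and the two sequences are assumed to be already aligned. Each residue is mapped to its group label, and a mutation is counted only when the group changes, so a substitution such as K to R, which stays inside one group, is ignored.

```python
"""Sketch: reduced-alphabet encoding of aligned sequences.

The 6-group partition is a hypothetical, illustrative grouping; it is not
one of the reduced alphabets evaluated in the paper.
"""
from typing import Dict, List, Tuple

# Hypothetical grouping of the 20 standard amino acids by physicochemical class.
GROUPS: Dict[str, str] = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWY",     # aromatic
    "p": "STNQ",    # polar, uncharged
    "+": "KRH",     # positively charged side chains
    "-": "DE",      # negatively charged side chains
    "s": "GP",      # conformationally special
}

# Invert the partition into a residue -> group-label lookup table.
RESIDUE_TO_GROUP: Dict[str, str] = {
    aa: label for label, members in GROUPS.items() for aa in members
}
assert len(RESIDUE_TO_GROUP) == 20  # 20 distinct standard residues are mapped


def reduce_sequence(seq: str) -> str:
    """Map each residue to its group label; unknown symbols (e.g. X, gaps) become '?'."""
    return "".join(RESIDUE_TO_GROUP.get(aa, "?") for aa in seq.upper())


def group_mutations(ref: str, test: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, ref group, test group) wherever the groups differ.

    A substitution within one group (e.g. K->R) leaves the reduced
    sequence unchanged and is therefore not reported.
    """
    if len(ref) != len(test):
        raise ValueError("sequences must be aligned to the same length")
    r, t = reduce_sequence(ref), reduce_sequence(test)
    return [(i + 1, r[i], t[i]) for i in range(len(r)) if r[i] != t[i]]


if __name__ == "__main__":
    print(reduce_sequence("KDEVLS"))             # +--aap
    print(group_mutations("KDEVLS", "RDKVLS"))   # [(3, '-', '+')]
```

In the usage lines, K to R at position 1 is a within-group change and is dropped, while E to K at position 3 crosses from the negative to the positive group and is kept.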
The results indicate that all selected non-antigenic sites are related to immune responses. According to the Influenza Research Database, these sites have been experimentally identified as T-cell epitopes, B-cell epitopes, and MHC-binding epitopes of different classes. This highlights a caveat: when simulating antigenic evolution, a model should consider genetic information not only at antigenic sites but also at neighboring positions, as these may affect antigenicity indirectly. Additionally, our findings indicate that structural and charge characteristics are the most beneficial features for modeling antigenic evolution, in agreement with previous studies.
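The point that neighboring positions and charge characteristics both matter can be illustrated with a feature-extraction sketch. Everything here is a hypothetical stand-in rather than the paper's method: a simplified formal-charge scale (K and R as +1, D and E as -1, every other residue, including histidine, as 0) replaces an AAindex charge index, site numbers are 1-based positions in pre-aligned sequences, and `radius` is an illustrative window size, not a parameter reported by the authors.

```python
"""Sketch: charge-change features over chosen sites plus their neighbours.

Assumptions (hypothetical, for illustration only): a simplified formal-charge
scale at neutral pH, 1-based site numbering on aligned sequences, and a
symmetric window of +/- `radius` residues around each chosen site.
"""
from typing import Dict, Iterable, List

# Simplified formal charge at neutral pH; histidine and all others treated as 0.
CHARGE: Dict[str, float] = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def expand_sites(sites: Iterable[int], radius: int, length: int) -> List[int]:
    """Return sorted 1-based positions within `radius` of any site, clipped to the sequence."""
    positions = set()
    for s in sites:
        for p in range(s - radius, s + radius + 1):
            if 1 <= p <= length:
                positions.add(p)
    return sorted(positions)


def charge_change_features(
    ref: str, test: str, sites: Iterable[int], radius: int = 1
) -> List[float]:
    """Per-position charge difference (test minus reference) over the expanded sites."""
    if len(ref) != len(test):
        raise ValueError("sequences must be aligned to the same length")
    ref, test = ref.upper(), test.upper()
    positions = expand_sites(sites, radius, len(ref))
    return [
        CHARGE.get(test[p - 1], 0.0) - CHARGE.get(ref[p - 1], 0.0)
        for p in positions
    ]
```

For example, with reference `AKDGSE`, test `AEDGSK`, sites `[2, 6]` and `radius=1`, the window covers positions 1, 2, 3, 5 and 6 (position 7 is clipped) and the features are `[0.0, -2.0, 0.0, 0.0, 2.0]`. Setting `radius=0` would restrict the features to the chosen sites alone.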
About the authors
M. Forghani
N.N. Krasovskii Institute of Mathematics and Mechanics of the Ural Branch of the Russian Academy of Sciences (IMM UB RAS)
Author for correspondence.
Email: majid.forqani@gmail.com
PhD (Physics and Mathematics), Researcher
Russian Federation, Ekaterinburg
A. L. Firstkov
N.N. Krasovskii Institute of Mathematics and Mechanics of the Ural Branch of the Russian Academy of Sciences (IMM UB RAS)
Email: firstk121@gmail.com
Mathematician of the First Category
Russian Federation, Ekaterinburg
M. M. Alyannezhadi
University of Science and Technology of Mazandaran
Email: alyan.nezhadip@mazust.ac.ir
Doctor in Computer Science (Specialty: Artificial Intelligence), Associate Professor, Researcher and Lecturer
Iran, Islamic Republic of, Behshahr
D. M. Danilenko
Smorodintsev Research Institute of Influenza, Ministry of Health of the Russian Federation
Email: daria.baibus@gmail.com
PhD (Biology), Deputy Director for Scientific Work, Head of the Department of Etiology and Epidemiology
Russian Federation, St. Petersburg
A. B. Komissarov
Smorodintsev Research Institute of Influenza, Ministry of Health of the Russian Federation
Email: a.b.komissarov@gmail.com
Head of the Laboratory of Molecular Virology
Russian Federation, St. Petersburg
References
- Andersen C.A., Brunak S. Representation of protein-sequence information by amino acid subalphabets. AI Magazine, 2004, vol. 25, no. 1, pp. 97–101. doi: 10.1609/aimag.v25i1.1750
- Arinaminpathy N., Grenfell B. Dynamics of glycoprotein charge in the evolutionary history of human influenza. PLoS One, 2010, vol. 5, no. 12: e15674. doi: 10.1371/journal.pone.0015674
- Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., Shindyalov I.N., Bourne P.E. The protein data bank. Nucleic Acids Res., 2000, vol. 28, no. 1, pp. 235–242. doi: 10.1093/nar/28.1.235
- Burns A., Van der Mensbrugghe D., Timmer H. Evaluating the economic consequences of avian influenza. Washington, DC: World Bank, 2006. 6 p.
- Cannata N., Toppo S., Romualdi C., Valle G. Simplifying amino acid alphabets by means of a branch and bound algorithm and substitution matrices. Bioinformatics, 2002, vol. 18, no. 8, pp. 1102–1108. doi: 10.1093/bioinformatics/18.8.1102
- Cui H., Wei X., Huang Y., Hu B., Fang Y., Wang J. Using multiple linear regression and physicochemical changes of amino acid mutations to predict antigenic variants of influenza A/H3N2 viruses. Biomed. Mater. Eng., 2014, vol. 24, no. 6, pp. 3729–3735. doi: 10.3233/BME-141201
- De Brevern A.G. New assessment of a structural alphabet. In Silico Biol., 2005, vol. 5, no. 3, pp. 283–289.
- Edgar R.C. Local homology recognition and distance measures in linear time using compressed amino acid alphabets. Nucleic Acids Res., 2004, vol. 32, no. 1, pp. 380–385. doi: 10.1093/nar/gkh180
- Etchebest C., Benros C., Bornot A., Camproux A.C., De Brevern A.G. A reduced amino acid alphabet for understanding and designing protein adaptation to mutation. Eur. Biophys. J., 2007, vol. 36, no. 8, pp. 1059–1069. doi: 10.1007/s00249-007-0188-5
- Forghani M., Khachay M. Convolutional neural network based approach to in silico non-anticipating prediction of antigenic distance for influenza virus. Viruses, 2020, vol. 12, no. 9: 1019. doi: 10.3390/v12091019
- Forghani M., Khachay M., AlyanNezhadi M.M. The impact of amino acid encoding on the prediction of antigenic variants. In: 2020 6th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS), 2020, pp. 1–5. doi: 10.1109/ICSPIS51611.2020.9349560
- Gamblin S.J., Haire L.F., Russell R.J., Stevens D.J., Xiao B., Ha Y., Vasisht N., Steinhauer D.A., Daniels R.S., Elliot A., Wiley D.C., Skehel J.J. The structure and receptor binding properties of the 1918 influenza hemagglutinin. Science, 2004, vol. 303, no. 5665, pp. 1838–1842. doi: 10.1126/science.1093155
- Gregory V., Harvey W., Daniels R.S., Reeve R., Whittaker L., Halai C., Douglas A., Gonsalves R., Skehel J.J., Hay A.J., McCauley J.W., Haydon D. Human former seasonal Influenza A (H1N1) haemagglutination inhibition data 1977–2009 from the WHO Collaborating Centre for Reference and Research on Influenza, London, UK. University of Glasgow, 2016. doi: 10.5525/gla.researchdata.289
- Huang Z.Z., Yu L., Huang P., Liang L.J., Guo Q. Charged amino acid variability related to N-glyco-sylation and epitopes in A/H3N2 influenza: hem-agglutinin and neuraminidase. PLoS One, 2017, vol. 12, no. 7: e0178231. doi: 10.1371/journal.pone.0178231
- Kawashima S., Pokarowski P., Pokarowska M., Kolinski A., Katayama T., Kanehisa M. AAindex: amino acid index database, progress report 2008. Nucleic Acids Res., 2007, vol. 36, suppl. 1, pp. D202–D205. doi: 10.1093/nar/gkm998
- Klingen T.R., Reimering S., Guzmán C.A., McHardy A.C. In silico vaccine strain prediction for human influenza viruses. Trends Microbiol., 2018, vol. 26, no. 2, pp. 119–131. doi: 10.1016/j.tim.2017.09.001
- Kobayashi Y., Suzuki Y. Compensatory evolution of net-charge in influenza A virus hemagglutinin. PLoS One, 2012, vol. 7, no. 7: e40422. doi: 10.1371/journal.pone.0040422
- Lee M.S., Chen J.S.E. Predicting antigenic variants of influenza A/H3N2 viruses. Emerg. Infect. Dis., 2004, vol. 10, no. 8, pp. 1385–1390. doi: 10.3201/eid1008.040107
- Lenckowski J., Walczak K. Simplifying amino acid alphabets using a genetic algorithm and sequence alignment. In: European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, 2007, pp. 122–131. doi: 10.1007/978-3-540-71783-6_12
- Li T., Fan K., Wang J., Wang W. Reduction of protein sequence complexity by residue grouping. Protein Eng., 2003, vol. 16, no. 5, pp. 323–330. doi: 10.1093/protein/gzg044
- Nanni L., Lumini A. A genetic approach for building different alphabets for peptide and protein classification. BMC Bioinformatics, 2008, vol. 9, no. 1, pp. 1–10. doi: 10.1186/1471-2105-9-45
- Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V., Vanderplas J., Passos A., Cournapeau D., Brucher M., Perrot M., Duchesnay É. Scikit-learn: machine learning in Python. J. Mach. Learn. Res., 2011, vol. 12, pp. 2825–2830.
- Prlić A., Domingues F.S., Sippl M.J. Structure-derived substitution matrices for alignment of distantly related sequences. Protein Eng., 2000, vol. 13, no. 8, pp. 545–550. doi: 10.1093/protein/13.8.545
- Qiu J., Qiu T., Yang Y., Wu D., Cao Z. Incorporating structure context of HA protein to improve antigenicity calculation for influenza virus A/H3N2. Sci. Rep., 2016, vol. 6, no. 1, pp. 1–9. doi: 10.1038/srep31156
- Risler J.L., Delorme M.O., Delacroix H., Henaut A. Amino acid substitutions in structurally related proteins a pattern recognition approach: determination of a new and efficient scoring matrix. J. Mol. Biol., 1988, vol. 204, no. 4, pp. 1019–1029. doi: 10.1016/0022-2836(88)90058-7
- Schrödinger L.L.C. The PyMOL molecular graphics system, version 1.8, 2015.
- Smith D.J., Forrest S., Ackley D.H., Perelson A.S. Variable efficacy of repeated annual influenza vaccination. Proc. Natl. Acad. Sci. USA, 1999, vol. 96, no. 24, pp. 14001–14006. doi: 10.1073/pnas.96.24.14001
- Smith D.J., Lapedes A.S., De Jong J.C., Bestebroer T.M., Rimmelzwaan G.F., Osterhaus A.D., Fouchier R.A. Mapping the antigenic and genetic evolution of influenza virus. Science, 2004, vol. 305, no. 5682, pp. 371–376. doi: 10.1126/science.1097211
- Stephenson J.D., Freeland S.J. Unearthing the root of amino acid similarity. J. Mol. Evol., 2013, vol. 77, no. 4, pp. 159–169. doi: 10.1007/s00239-013-9565-0
- Su S., Fu X., Li G., Kerlin F., Veit M. Novel influenza D virus: epidemiology, pathology, evolution and biological characteristics. Virulence, 2017, vol. 8, no. 8, pp. 1580–1591. doi: 10.1080/21505594.2017.1365216
- Sylte M.J., Suarez D.L. Influenza neuraminidase as a vaccine antigen. In: Vaccines for Pandemic Influenza. Current Topics in Microbiology and Immunology. Eds.: R. Compans, W. Orenstein. Vol. 333. Berlin, Heidelberg: Springer, 2009, pp. 227–241. doi: 10.1007/978-3-540-92165-3_12
- Tomii K., Kanehisa M. Analysis of amino acid indices and mutation matrices for sequence comparison and structure prediction of proteins. Protein Eng., 1996, vol. 9, no. 1, pp. 27–36. doi: 10.1093/protein/9.1.27
- Tzarum N., de Vries R.P., Peng W., Thompson A.J., Bouwman K.M., McBride R., Yu W., Zhu X., Verheije M.H., Paulson J.C., Wilson I.A. The 150-loop restricts the host specificity of human H10N8 influenza virus. Cell Rep., 2017, vol. 19, no. 2, pp. 235–245. doi: 10.1016/j.celrep.2017.03.054
- Wang P., Zhu W., Liao B., Cai L., Peng L., Yang J. Predicting influenza antigenicity by matrix completion with antigen and antiserum similarity. Front. Microbiol., 2018, vol. 9: 2500. doi: 10.3389/fmicb.2018.02500
- Wikramaratna P.S., Sandeman M., Recker M., Gupta S. The antigenic evolution of influenza: drift or thrift? Philos. Trans. R. Soc. Lond. B Biol. Sci., 2013, vol. 368, no. 1614: 20120200. doi: 10.1098/rstb.2012.0200
- World Health Organization. Influenza fact sheet: Overview = Aide-mémoire sur la grippe: Généralités. Weekly Epidemiological Record = Relevé épidémiologique hebdomadaire, 2003, vol. 78, no. 11, pp. 77–80.
- Yang H., Carney P.J., Chang J.C., Guo Z., Villanueva J.M., Stevens J. Structure and receptor binding preferences of recombinant human A (H3N2) virus hemagglutinins. Virology, 2015, vol. 477, pp. 18–31. doi: 10.1016/j.virol.2014.12.024
- Yang X.Y., Shi X.H., Meng X., Li X.L., Lin K., Qian Z.L., Feng K.Y., Kong X.Y., Cai Y.D. Classification of transcription factors using protein primary structure. Protein Pept. Lett., 2010, vol. 17, no. 7, pp. 899–908. doi: 10.2174/092986610791306670
- Yao Y., Li X., Liao B., Huang L., He P., Wang F., Yang J., Sun H., Zhao Y., Yang J. Predicting influenza antigenicity from Hemagglutintin sequence data based on a joint random forest method. Sci. Rep., 2017, vol. 7, no. 1, pp. 1–10. doi: 10.1038/s41598-017-01699-z
- Zhang Y., Aevermann B.D., Anderson T.K., Burke D.F., Dauphin G., Gu Z., He S., Kumar S., Larsen C.N., Lee A.J., Li X., Macken C., Mahaffey C., Pickett B.E., Reardon B., Smith T., Stewart L., Suloway C., Sun G., Tong L., Vincent A.L., Walters B., Zaremba S., Zhao H., Zhou L., Zmasek C., Klem E.B., Scheuermann R.H. Influenza Research Database: an integrated bioinformatics resource for influenza virus research. Nucleic Acids Res., 2017, vol. 45, no. D1, pp. D466–D474. doi: 10.1093/nar/gkw857
- Zhang Z.H., Wang Z.H., Zhang Z.R., Wang Y.X. A novel method for apoptosis protein subcellular localization prediction combining encoding based on grouped weight and support vector machine. FEBS Lett., 2006, vol. 580, no. 26, pp. 6169–6174. doi: 10.1016/j.febslet.2006.10.017
- Zuo Y.C., Li Q.Z. Using reduced amino acid composition to predict defensin family and subfamily: integrating similarity measure and structural alphabet. Peptides, 2009, vol. 30, no. 10, pp. 1788–1793. doi: 10.1016/j.peptides.2009.06.032
