Reduced amino acid alphabet-based encoding and its impact on modeling influenza antigenic evolution

M. Forghani, A. L. Firstkov, M. M. Alyannezhadi, D. M. Danilenko, A. B. Komissarov

doi:10.15789/2220-7619-RAA-1968

Authors: Forghani M.¹, Firstkov A.L.², Alyannezhadi M.M.³, Danilenko D.M.⁴, Komissarov A.B.⁴
Affiliations:
1. N.N. Krasovskii Institute of Mathematics and Mechanics of the Ural Branch of the Russian Academy of Sciences (IMM UB RAS)
2. N.N. Krasovskii Institute of Mathematics and Mechanics of the Ural Branch of the Russian Academy of Sciences (IMM UB RAS)
3. University of Science and Technology of Mazandaran
4. Smorodintsev Research Institute of Influenza, Ministry of Health of the Russian Federation
Issue: Vol 12, No 5 (2022)
Pages: 837-849
Section: ORIGINAL ARTICLES
URL: https://journals.rcsi.science/2220-7619/article/view/119100
DOI: https://doi.org/10.15789/2220-7619-RAA-1968
ID: 119100


Abstract

Currently, vaccination is one of the most efficient ways to control and prevent influenza infection. Vaccine production relies largely on the results of laboratory assays, including hemagglutination inhibition and microneutralization assays, which are time-consuming and laborious. Because influenza viruses can escape the immune response, vaccines must be revised and updated twice a year. The hemagglutination inhibition assay measures how effectively antibodies raised against a reference strain bind to and block the antigen of a test strain. Various computer-aided models have been developed to optimize the selection of candidate vaccine strains. A general problem in modeling antigenic evolution is how to represent genetic sequences as input to the model, and this well-known encoding problem motivates our study. This paper introduces a two-fold encoding approach based on reduced amino acid alphabets and the amino acid index database AAindex. We propose applying a simplified amino acid alphabet to the modeling of antigenic evolution. A simplified alphabet, also called a sub-alphabet or reduced amino acid alphabet, is obtained by clustering the 20 standard amino acids into groups. The proposed encoding allows mutations to be redefined in terms of the amino acid groups of a reduced alphabet. We investigated 40 reduced amino acid sets and their performance in modeling antigenic evolution. The experimental results indicate that the proposed reduced amino acid alphabets can match the accuracy of the standard 20-letter alphabet. Moreover, these alphabets provide deeper insight into various aspects of the relationship between mutation and antigenic variation. By checking the identified high-impact sites against the Influenza Research Database, we found that not only antigenic sites but also amino acids located in close proximity to them have a significant influence on antigenicity.
The results indicate that all selected non-antigenic sites are related to immune responses: according to the Influenza Research Database, they have been experimentally determined to be T-cell epitopes, B-cell epitopes, and MHC-binding epitopes of different classes. This suggests that a model simulating antigenic evolution should consider not only the genetic information at antigenic sites but also that of neighboring positions, since these may indirectly affect antigenicity. Additionally, our findings indicate that structural and charge characteristics are the most beneficial properties for modeling antigenic evolution, in agreement with previous studies.
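The two-fold encoding mentioned in the abstract (a reduced alphabet combined with an AAindex property scale) can be illustrated with a minimal sketch. This is not the authors' implementation. The five-group partition, the use of the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101) as the property index, the group-mean "pseudo-index", and all helper names (`group_mutations`, `group_mean_scale`, `property_differences`) are hypothetical choices made only for illustration.

```python
"""Minimal sketch of a two-fold reduced-alphabet + property-index encoding.

Illustrative assumptions (NOT taken from the article): the five-group
partition below, the Kyte-Doolittle hydropathy scale (AAindex entry
KYTJ820101) as the property index, and group-mean averaging as the way the
reduced alphabet and the property index are combined.
"""

# Hypothetical reduced alphabet: the 20 standard amino acids split into 5 groups.
REDUCED_ALPHABET = {
    "hydrophobic": "AVLIMC",
    "aromatic": "FWY",
    "polar_or_special": "STNQGP",
    "positive": "KRH",
    "negative": "DE",
}

# Reverse lookup: amino acid -> group label.
AA_TO_GROUP = {aa: group for group, members in REDUCED_ALPHABET.items() for aa in members}

# Kyte-Doolittle hydropathy values (AAindex KYTJ820101).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _check_aligned(seq_a, seq_b):
    """Both sequences must already be aligned position by position."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")


def group_mutations(seq_a, seq_b):
    """Per-site 0/1 vector: 1 where the aligned residues belong to different groups."""
    _check_aligned(seq_a, seq_b)
    return [int(AA_TO_GROUP[a] != AA_TO_GROUP[b]) for a, b in zip(seq_a, seq_b)]


def group_mean_scale(scale):
    """Pseudo-index: every amino acid takes the mean scale value of its group."""
    means = {
        group: sum(scale[aa] for aa in members) / len(members)
        for group, members in REDUCED_ALPHABET.items()
    }
    return {aa: means[AA_TO_GROUP[aa]] for aa in scale}


def property_differences(seq_a, seq_b, scale):
    """Per-site absolute difference of a property scale between aligned residues."""
    _check_aligned(seq_a, seq_b)
    return [abs(scale[a] - scale[b]) for a, b in zip(seq_a, seq_b)]


if __name__ == "__main__":
    # Two short hypothetical aligned fragments (not real hemagglutinin sequences).
    ref, test = "KDLSA", "RELTE"
    residue_level = sum(a != b for a, b in zip(ref, test))
    print("residue-level mutations:", residue_level)          # 4
    print("group-level vector:", group_mutations(ref, test))  # [0, 0, 0, 0, 1]
    print("pseudo-index differences:",
          property_differences(ref, test, group_mean_scale(KYTE_DOOLITTLE)))
```

Under this hypothetical grouping, four residue-level substitutions collapse to a single group-level mutation (A to E), and the pseudo-index assigns a zero difference to every within-group substitution. The partition and scale here are purely illustrative; the article itself compares 40 reduced amino acid sets.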

Keywords

AAindex, antigenic evolution, hemagglutinin, influenza, modeling, reduced amino acid alphabet

About the authors

M. Forghani

N.N. Krasovskii Institute of Mathematics and Mechanics of the Ural Branch of the Russian Academy of Sciences (IMM UB RAS)

Author for correspondence.
Email: majid.forqani@gmail.com

PhD (Physics and Mathematics), Researcher

Russian Federation, Ekaterinburg

A. L. Firstkov

N.N. Krasovskii Institute of Mathematics and Mechanics of the Ural Branch of the Russian Academy of Sciences (IMM UB RAS)

Email: firstk121@gmail.com

Mathematician of the First Category

Russian Federation, Ekaterinburg

M. M. Alyannezhadi

University of Science and Technology of Mazandaran

Email: alyan.nezhadip@mazust.ac.ir

Doctor in Computer Science (Specialty: Artificial Intelligence), Associate Professor, Researcher and Lecturer

Islamic Republic of Iran, Behshahr

D. M. Danilenko

Smorodintsev Research Institute of Influenza, Ministry of Health of the Russian Federation

Email: daria.baibus@gmail.com

PhD (Biology), Deputy Director for Scientific Work, Head of the Department of Etiology and Epidemiology

Russian Federation, St. Petersburg

A. B. Komissarov

Smorodintsev Research Institute of Influenza, Ministry of Health of the Russian Federation

Email: a.b.komissarov@gmail.com

Head of the Laboratory of Molecular Virology

Russian Federation, St. Petersburg

References

Andersen C.A., Brunak S. Representation of protein-sequence information by amino acid subalphabets. AI Magazine, 2004, vol. 25, no. 1, pp. 97–101. doi: 10.1609/aimag.v25i1.1750
Arinaminpathy N., Grenfell B. Dynamics of glycoprotein charge in the evolutionary history of human influenza. PLoS One, 2010, vol. 5, no. 12: e15674. doi: 10.1371/journal.pone.0015674
Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., Shindyalov I.N., Bourne P.E. The protein data bank. Nucleic Acids Res., 2000, vol. 28, no. 1, pp. 235–242. doi: 10.1093/nar/28.1.235
Burns A., Van der Mensbrugghe D., Timmer H. Evaluating the economic consequences of avian influenza. Washington, DC: World Bank, 2006. 6 p.
Cannata N., Toppo S., Romualdi C., Valle G. Simplifying amino acid alphabets by means of a branch and bound algorithm and substitution matrices. Bioinformatics, 2002, vol. 18, no. 8, pp. 1102–1108. doi: 10.1093/bioinformatics/18.8.1102
Cui H., Wei X., Huang Y., Hu B., Fang Y., Wang J. Using multiple linear regression and physicochemical changes of amino acid mutations to predict antigenic variants of influenza A/H3N2 viruses. Biomed Mater. Eng., 2014, vol. 24, no. 6, pp. 3729–3735. doi: 10.3233/BME-141201
De Brevern A.G. New assessment of a structural alphabet. In Silico Biol., 2005, vol. 5, no. 3, pp. 283–289.
Edgar R.C. Local homology recognition and distance measures in linear time using compressed amino acid alphabets. Nucleic Acids Res., 2004, vol. 32, no. 1, pp. 380–385. doi: 10.1093/nar/gkh180
Etchebest C., Benros C., Bornot A., Camproux A.C., De Brevern A.G. A reduced amino acid alphabet for understanding and designing protein adaptation to mutation. Eur. Biophys. J., 2007, vol. 36, no. 8, pp. 1059–1069. doi: 10.1007/s00249-007-0188-5
Forghani M., Khachay M. Convolutional neural network based approach to in silico non-anticipating prediction of antigenic distance for influenza virus. Viruses, 2020, vol. 12, no. 9: 1019. doi: 10.3390/v12091019
Forghani M., Khachay M., AlyanNezhadi M.M. The impact of amino acid encoding on the prediction of antigenic variants. In: 2020 6th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS), pp. 1–5, 2020. doi: 10.1109/ICSPIS51611.2020.9349560
Gamblin S.J., Haire L.F., Russell R.J., Stevens D.J., Xiao B., Ha Y., Vasisht N., Steinhauer D.A., Daniels R.S., Elliot A., Wiley D.C., Skehel J.J. The structure and receptor binding properties of the 1918 influenza hemagglutinin. Science, 2004, vol. 303, no. 5665, pp. 1838–1842. doi: 10.1126/science.1093155
Gregory V., Harvey W., Daniels R.S., Reeve R., Whittaker L., Halai C., Douglas A., Gonsalves R., Skehel J.J., Hay A.J., McCauley J.W., Haydon D. Human former seasonal Influenza A (H1N1) haemagglutination inhibition data 1977–2009 from the WHO Collaborating Centre for Reference and Research on Influenza, London, UK. University of Glasgow, 2016. doi: 10.5525/gla.researchdata.289
Huang Z.Z., Yu L., Huang P., Liang L.J., Guo Q. Charged amino acid variability related to N-glycosylation and epitopes in A/H3N2 influenza: hemagglutinin and neuraminidase. PLoS One, 2017, vol. 12, no. 7: e0178231. doi: 10.1371/journal.pone.0178231
Kawashima S., Pokarowski P., Pokarowska M., Kolinski A., Katayama T., Kanehisa M. AAindex: amino acid index database, progress report 2008. Nucleic Acids Res., 2007, vol. 36, suppl. 1, pp. D202–D205. doi: 10.1093/nar/gkm998
Klingen T.R., Reimering S., Guzmán C.A., McHardy A.C. In silico vaccine strain prediction for human influenza viruses. Trends Microbiol., 2018, vol. 26, no. 2, pp. 119–131. doi: 10.1016/j.tim.2017.09.001
Kobayashi Y., Suzuki Y. Compensatory evolution of net-charge in influenza A virus hemagglutinin. PLoS One, 2012, vol. 7, no. 7: e40422. doi: 10.1371/journal.pone.0040422
Lee M.S., Chen J.S.E. Predicting antigenic variants of influenza A/H3N2 viruses. Emerg. Infect. Dis., 2004, vol. 10, no. 8, pp. 1385–1390. doi: 10.3201/eid1008.040107
Lenckowski J., Walczak K. Simplifying amino acid alphabets using a genetic algorithm and sequence alignment. In: European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, 2007, pp. 122–131. doi: 10.1007/978-3-540-71783-6_12
Li T., Fan K., Wang J., Wang W. Reduction of protein sequence complexity by residue grouping. Protein Eng., 2003, vol. 16, no. 5, pp. 323–330. doi: 10.1093/protein/gzg044
Nanni L., Lumini A. A genetic approach for building different alphabets for peptide and protein classification. BMC Bioinformatics, 2008, vol. 9, no. 1, pp. 1–10. doi: 10.1186/1471-2105-9-45
Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V., Vanderplas J., Passos A., Cournapeau D., Brucher M., Perrot M., Duchesnay É. Scikit-learn: machine learning in Python. J. Mach. Learn. Res., 2011, vol. 12, pp. 2825–2830.
Prlić A., Domingues F.S., Sippl M.J. Structure-derived substitution matrices for alignment of distantly related sequences. Protein Eng., 2000, vol. 13, no. 8, pp. 545–550. doi: 10.1093/protein/13.8.545
Qiu J., Qiu T., Yang Y., Wu D., Cao Z. Incorporating structure context of HA protein to improve antigenicity calculation for influenza virus A/H3N2. Sci. Rep., 2016, vol. 6, no. 1, pp. 1–9. doi: 10.1038/srep31156
Risler J.L., Delorme M.O., Delacroix H., Henaut A. Amino acid substitutions in structurally related proteins a pattern recognition approach: determination of a new and efficient scoring matrix. J. Mol. Biol., 1988, vol. 204, no. 4, pp. 1019–1029. doi: 10.1016/0022-2836(88)90058-7
Schrödinger, LLC. The PyMOL molecular graphics system, version 1.8, 2015.
Smith D.J., Forrest S., Ackley D.H., Perelson A.S. Variable efficacy of repeated annual influenza vaccination. Proc. Natl. Acad. Sci. USA, 1999, vol. 96, no. 24, pp. 14001–14006. doi: 10.1073/pnas.96.24.14001
Smith D.J., Lapedes A.S., De Jong J.C., Bestebroer T.M., Rimmelzwaan G.F., Osterhaus A.D., Fouchier R.A. Mapping the antigenic and genetic evolution of influenza virus. Science, 2004, vol. 305, no. 5682, pp. 371–376. doi: 10.1126/science.1097211
Stephenson J.D., Freeland S.J. Unearthing the root of amino acid similarity. J. Mol. Evol., 2013, vol. 77, no. 4, pp. 159–169. doi: 10.1007/s00239-013-9565-0
Su S., Fu X., Li G., Kerlin F., Veit M. Novel influenza D virus: epidemiology, pathology, evolution and biological characteristics. Virulence, 2017, vol. 8, no. 8, pp. 1580–1591. doi: 10.1080/21505594.2017.1365216
Sylte M.J., Suarez D.L. Influenza neuraminidase as a vaccine antigen. In: Vaccines for Pandemic Influenza. Current Topics in Microbiology and Immunology. Eds.: R. Compans, W. Orenstein. Vol. 333. Berlin, Heidelberg: Springer, 2009, pp. 227–241. doi: 10.1007/978-3-540-92165-3_12
Tomii K., Kanehisa M. Analysis of amino acid indices and mutation matrices for sequence comparison and structure prediction of proteins. Protein Eng., 1996, vol. 9, no. 1, pp. 27–36. doi: 10.1093/protein/9.1.27
Tzarum N., de Vries R.P., Peng W., Thompson A.J., Bouwman K.M., McBride R., Yu W., Zhu X., Verheije M.H., Paulson J.C., Wilson I.A. The 150-loop restricts the host specificity of human H10N8 influenza virus. Cell Rep., 2017, vol. 19, no. 2, pp. 235–245. doi: 10.1016/j.celrep.2017.03.054
Wang P., Zhu W., Liao B., Cai L., Peng L., Yang J. Predicting influenza antigenicity by matrix completion with antigen and antiserum similarity. Front. Microbiol., 2018, vol. 9: 2500. doi: 10.3389/fmicb.2018.02500
Wikramaratna P.S., Sandeman M., Recker M., Gupta S. The antigenic evolution of influenza: drift or thrift? Philos. Trans. R. Soc. Lond. B Biol. Sci., 2013, vol. 368, no. 1614: 20120200. doi: 10.1098/rstb.2012.0200
World Health Organization. Influenza fact sheet: Overview = Aide-mémoire sur la grippe: Généralités. Weekly Epidemiological Record = Relevé épidémiologique hebdomadaire, 2003, vol. 78, no. 11, pp. 77–80.
Yang H., Carney P.J., Chang J.C., Guo Z., Villanueva J.M., Stevens J. Structure and receptor binding preferences of recombinant human A (H3N2) virus hemagglutinins. Virology, 2015, vol. 477, pp. 18–31. doi: 10.1016/j.virol.2014.12.024
Yang X.Y., Shi X.H., Meng X., Li X.L., Lin K., Qian Z.L., Feng K.Y., Kong X.Y., Cai Y.D. Classification of transcription factors using protein primary structure. Protein Pept. Lett., 2010, vol. 17, no. 7, pp. 899–908. doi: 10.2174/092986610791306670
Yao Y., Li X., Liao B., Huang L., He P., Wang F., Yang J., Sun H., Zhao Y., Yang J. Predicting influenza antigenicity from Hemagglutintin sequence data based on a joint random forest method. Sci. Rep., 2017, vol. 7, no. 1, pp. 1–10. doi: 10.1038/s41598-017-01699-z
Zhang Y., Aevermann B.D., Anderson T.K., Burke D.F., Dauphin G., Gu Z., He S., Kumar S., Larsen C.N., Lee A.J., Li X., Macken C., Mahaffey C., Pickett B.E., Reardon B., Smith T., Stewart L., Suloway C., Sun G., Tong L., Vincent A.L., Walters B., Zaremba S., Zhao H., Zhou L., Zmasek C., Klem E.B., Scheuermann R.H. Influenza Research Database: an integrated bioinformatics resource for influenza virus research. Nucleic Acids Res., 2017, vol. 45, no. D1, pp. D466–D474. doi: 10.1093/nar/gkw857
Zhang Z.H., Wang Z.H., Zhang Z.R., Wang Y.X. A novel method for apoptosis protein subcellular localization prediction combining encoding based on grouped weight and support vector machine. FEBS Lett., 2006, vol. 580, no. 26, pp. 6169–6174. doi: 10.1016/j.febslet.2006.10.017
Zuo Y.C., Li Q.Z. Using reduced amino acid composition to predict defensin family and subfamily: integrating similarity measure and structural alphabet. Peptides, 2009, vol. 30, no. 10, pp. 1788–1793. doi: 10.1016/j.peptides.2009.06.032

Supplementary files

1. Figure 1. General scheme of the computational pipeline
2. Figure 2. Generation of the pseudo-AAindex1 database from the hydrophobicity index
3. Figure 5. Explained variance ratios of the PCA components
4. Figure 3. Visualization of high-impact sites on the surface of the hemagglutinin protein by PyMOL [26]. Note: top, H1 protein (PDB ID: 1RUY [3, 12]); bottom, H3 protein (PDB ID: 5THF [3, 33]).
5. Figure 4. Correlation matrix of 11 unique AAindex1 entries from Table 5
