Polarization- and CGR-based binary representations as identifiers of the nucleotide sequences in bioinformatics

Abstract

Purpose. This work compares two approaches to synthesizing two-dimensional binary identifiers of nucleotide sequences obtained by DNA sequencing of biological objects.

Methods. The first approach models the polarization-dependent diffraction of a coherent readout beam by a two-dimensional phase-modulating structure (phase screen) derived from the symbolic sequence produced by DNA sequencing. The second approach maps the symbolic sequence onto a plane using the chaos game representation (CGR). To obtain a finite-element CGR map, the representation is partitioned into a given number of cells. This number is chosen so that the synthesized binary identifier stays acceptably sensitive to structural changes in the mapped sequence.

Results. The comparison was carried out on fragments of the symbolic sequences of several SARS-CoV-2 strains (Wuhan, Delta, Omicron). Correlation coefficients between the binary identifiers of different strains were computed and compared.

Conclusion. Binary identifiers synthesized with the polarization encoding technique are significantly more sensitive to structural changes in the analyzed sequences and are smaller than CGR binary identifiers.
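The CGR branch of the comparison can be illustrated with a short sketch. This is a minimal Python example built on stated assumptions and is not the authors' implementation. It places A, C, G and T at the corners (0,0), (0,1), (1,1) and (1,0) of the unit square, which is Jeffrey's commonly used layout. It counts CGR points on a 2^k × 2^k cell grid, the frequency CGR, so each cell collects positions sharing the same final k-mer. Cell counts are binarized against their mean, and two identifiers are compared with the Pearson correlation coefficient. The abstract does not say how cell counts are turned into bits. The mean threshold and the helper names (`cgr_points`, `fcgr`, `binary_identifier`, `correlation`) are therefore hypothetical.

```python
"""Sketch: binary identifier from a frequency chaos game representation (FCGR).

Assumed choices (not taken from the article): Jeffrey's corner layout,
a 2**k x 2**k grid, and a mean threshold on the cell counts.
"""
from math import sqrt

# Corner of the unit square assigned to each nucleotide (Jeffrey's layout).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """CGR points: starting at the centre, move halfway toward each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:  # skip ambiguous symbols such as N
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def fcgr(seq, k):
    """Count CGR points per cell of a 2**k x 2**k grid (row from y, column from x)."""
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    for x, y in cgr_points(seq):
        # min() guards against x or y rounding up to exactly 1.0 after long runs
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        grid[row][col] += 1
    return grid


def binary_identifier(seq, k):
    """Hypothetical binarization: bit = 1 where a cell count exceeds the mean count."""
    counts = [c for row in fcgr(seq, k) for c in row]
    mean = sum(counts) / len(counts)
    return [1 if c > mean else 0 for c in counts]


def correlation(a, b):
    """Pearson correlation coefficient between two equal-length bit vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("identifiers must be non-empty and of equal length")
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")  # undefined when an identifier is all zeros or all ones
    return cov / sqrt(va * vb)


if __name__ == "__main__":
    # Synthetic toy sequences, not SARS-CoV-2 data; seq_b differs by one substitution.
    seq_a = "ACGTTGCA" * 20
    seq_b = seq_a[:80] + "T" + seq_a[81:]
    id_a = binary_identifier(seq_a, 2)
    id_b = binary_identifier(seq_b, 2)
    print(len(id_a), correlation(id_a, id_b))
```

Raising k refines the grid and typically makes the identifier more sensitive to local substitutions. The cost is a longer bit vector of 4^k bits. This is the cell-count versus sensitivity trade-off that the Methods paragraph refers to.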

About the authors

Dmitry Aleksandrovich Zimnyakov

Yuri Gagarin State Technical University of Saratov

ORCID iD: 0000-0002-9787-7903
SPIN-code: 1918-5220
Scopus Author ID: 7005323820
ResearcherId: A-7951-2014
ul. Politechnicheskaya, 77, Saratov, 410054, Russia

Marina Vasilevna Alonova

Yuri Gagarin State Technical University of Saratov

ORCID iD: 0000-0001-7772-3985
Scopus Author ID: 56035731500
ResearcherId: AAB-1593-2021
ul. Politechnicheskaya, 77, Saratov, 410054, Russia

Anatolij Vladimirovich Skripal

Saratov State University

ORCID iD: 0000-0002-9080-0057
Scopus Author ID: 57255442300
ResearcherId: E-1327-2013
ul. Astrakhanskaya, 83, Saratov, 410012, Russia

Maksim Glebovich Inkin

Saratov State University

ORCID iD: 0000-0002-1580-5413
SPIN-code: 7323-2398
Scopus Author ID: 57202515018
ul. Astrakhanskaya, 83, Saratov, 410012, Russia

Sergey S. Zaytsev

Saratov State University of Genetics, Biotechnology and Engineering named after N.I. Vavilov

ul. Sokolovaya, 335, Saratov, Russia

Valentina Feodorova

Saratov State University of Genetics, Biotechnology and Engineering named after N.I. Vavilov

ul. Sokolovaya, 335, Saratov, Russia
