Analysis of coverage of Alu repeats by aligned genomic reads
- Authors: Tamazian G.S., Kanapin A.A., Samsonova A.A.
- Affiliations: Institute of Translational Biomedicine, St. Petersburg State University
- Issue: Vol 68, No 3 (2023)
- Pages: 496-500
- Section: Articles
- URL: https://journals.rcsi.science/0006-3029/article/view/144450
- DOI: https://doi.org/10.31857/S0006302923030109
- EDN: https://elibrary.ru/FRPXRZ
- ID: 144450
Abstract
Alu repeats occupy a notable part of the human genome and greatly affect processes related to genome integrity maintenance. One of the basic methods for studying genomic variation, including variation in Alu repeats, is genome sequencing followed by mapping the sequenced reads to a reference genome sequence. A key characteristic of a read alignment is the depth of coverage of reference genome regions by the mapped reads. This paper proposes a new method for analyzing the coverage of Alu repeats and their flanking regions by whole-genome sequencing reads and explores the distribution of mean coverage in these two region types.
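As an illustration of the quantity discussed in the abstract, the following minimal sketch computes the mean coverage of a repeat interval and of its two flanking regions from a per-base depth vector. It is not the authors' method, which the abstract does not detail. The function names `mean_coverage` and `repeat_and_flank_coverage`, the half-open 0-based coordinates, and the default flank length of 100 bases are all assumptions made for the example.

```python
from statistics import mean


def mean_coverage(depth, start, end):
    """Mean per-base depth over the half-open interval [start, end).

    The interval is clipped to the bounds of `depth`; None is returned
    when nothing remains after clipping.
    """
    start = max(start, 0)
    end = min(end, len(depth))
    if start >= end:
        return None
    return mean(depth[start:end])


def repeat_and_flank_coverage(depth, start, end, flank=100):
    """Mean coverage of a repeat at [start, end) and of its two flanks.

    `flank` is a hypothetical flank length in bases, not a value taken
    from the paper.
    """
    return {
        "repeat": mean_coverage(depth, start, end),
        "left_flank": mean_coverage(depth, start - flank, start),
        "right_flank": mean_coverage(depth, end, end + flank),
    }


# Toy depth vector: 5 bases at depth 10, 10 bases at depth 30, 5 at depth 20.
toy_depth = [10] * 5 + [30] * 10 + [20] * 5
toy_result = repeat_and_flank_coverage(toy_depth, start=5, end=15, flank=5)
```

In practice the per-base depth vector would come from a tool that reports depth per position in an alignment, and the repeat coordinates from a repeat annotation; both inputs are outside the scope of this sketch.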
About the authors
G. S. Tamazian
Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
A. A. Kanapin
Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
A. A. Samsonova
Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
Email: a.samsonova@spbu.ru