Bioinformatic approaches for detection of fusion genes and trans-splicing products

Abstract

Chimeric genes and transcripts can serve as biological markers and can also drive tumor development and progression. Modern algorithms and high-throughput sequencing complement each other in addressing the origin of tumors and cancer detection, as well as the fundamental question of how chimeric genes arise and how they influence molecular processes in the cell. A wide range of algorithms for chimeric gene detection has been developed; they differ in computing speed, sensitivity, specificity, and the experimental design they target. There are three main types of bioinformatic approaches, distinguished by sequencing read length: algorithms designed exclusively for short-read high-throughput sequencing (read length of about 50–300 bp), algorithms designed exclusively for long-read sequencing (read length of about 5000–100000 bp), and algorithms that combine the results of both short- and long-read sequencing. These algorithms are further subdivided into: 1) mapping-first approaches (STAR-Fusion, Arriba), which map reads directly to the genome or transcriptome and search for reads that support a fused gene or transcript; 2) assembly-first approaches (Fusion-Bloom), which assemble the genome or transcriptome from overlapping reads and then compare the assembly with the reference genome or transcriptome to find genes or transcripts that are absent from the reference and therefore warrant closer examination; 3) pseudoalignment approaches, which do not perform local alignment but instead use a precomputed index of all reference transcripts to find the transcript subsequence closest to each read seed. This article describes the main classes of software tools available for chimeric gene detection, together with their characteristics, advantages, and disadvantages. To date, assembly-first algorithms remain the slowest and most resource-intensive. Mapping-first approaches are fairly fast and accurate at fusion detection. Pseudoalignment algorithms are the fastest and least resource-demanding, but their speed comes at the cost of lower chimera detection quality.
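The pseudoalignment idea summarized in the abstract can be illustrated with a toy sketch. It is not the implementation of any published tool: the function names, the two made-up "genes", and the choice of k = 5 are all assumptions introduced for illustration. Each read k-mer is looked up in a precomputed index of reference transcripts, and a read is flagged as a chimera candidate when its leading k-mers are compatible with one transcript and its trailing k-mers with a different one.

```python
from collections import defaultdict

# Hypothetical toy reference: two short "transcripts" with no shared 5-mers.
GENE_A = "ATGGCCTTGACCGTAGGCTA"
GENE_B = "TTCAGGAACGTTCGAGTCCA"
K = 5  # k-mer length, chosen only for this toy example


def build_kmer_index(transcripts, k):
    """Precompute a map from every k-mer to the transcripts that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def pseudoalign(read, index, k):
    """Return the compatible transcript set of each indexed k-mer of the read.

    k-mers absent from the index (for example, those spanning a fusion
    junction) are skipped; no base-level alignment is performed.
    """
    hits = []
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in index:
            hits.append(index[kmer])
    return hits


def classify_read(read, index, k):
    """Label a read as consistent, chimera_candidate, ambiguous, or unmapped."""
    hits = pseudoalign(read, index, k)
    if not hits:
        return ("unmapped", None)
    common = set.intersection(*hits)
    if common:
        return ("consistent", sorted(common))
    # No single transcript explains every k-mer: look for a split point where
    # the left and right parts are explained by disjoint transcript sets.
    for split in range(1, len(hits)):
        left = set.intersection(*hits[:split])
        right = set.intersection(*hits[split:])
        if left and right and left.isdisjoint(right):
            return ("chimera_candidate", (sorted(left), sorted(right)))
    return ("ambiguous", None)


INDEX = build_kmer_index({"GENE_A": GENE_A, "GENE_B": GENE_B}, K)
NORMAL_READ = GENE_A[2:14]             # lies entirely within GENE_A
FUSED_READ = GENE_A[:10] + GENE_B[10:]  # simulated fusion junction
```

With this toy data, `classify_read(FUSED_READ, INDEX, K)` reports a chimera candidate joining `GENE_A` and `GENE_B`, while `NORMAL_READ` is reported as consistent with `GENE_A`. The sketch also hints at the trade-off noted in the abstract: set lookups are fast, but without base-level alignment the junction position and mismatches are not resolved.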

About the authors

I. Y. Musatov

Moscow Institute of Physics and Technology; Institute for Personalized Oncology of World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Federal State Autonomous Educational Institution of Higher Education I.M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation (Sechenov University)

Author for correspondence.
Email: musatov.mailbox@yandex.ru
Russian Federation, Institutskiy per. 9, Dolgoprudniy, 141701; ul. Trubetskaya 8/2, Moscow, 119048

M. I. Sorokin

Institute for Personalized Oncology of World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Federal State Autonomous Educational Institution of Higher Education I.M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation (Sechenov University)

Email: musatov.mailbox@yandex.ru
Russian Federation, ul. Trubetskaya 8/2, Moscow, 119048

A. A. Buzdin

Moscow Institute of Physics and Technology; Shemyakin–Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences; Endocrinology Research Centre

Email: musatov.mailbox@yandex.ru
Russian Federation, Institutskiy per. 9, Dolgoprudniy, 141701; ul. Miklukho-Maklaya 16/10, Moscow, 117997; ul. Dm. Ulyanova 11, Moscow, 117292

References

  1. Barresi V., Cosentini I., Scuderi C., Napoli S., Di Bella V., Spampinato G., Condorelli D.F. // Int. J. Mol. Sci. 2019. V. 20. P. E5252. https://doi.org/10.3390/ijms20215252
  2. Friedrich S., Sonnhammer E.L.L. // BMC Med. Genomics. 2020. V. 13. P. 110. https://doi.org/10.1186/s12920-020-00738-5
  3. Sun Y., Li H. // Genes (Basel). 2022. V. 13. P. 741. https://doi.org/10.3390/genes13050741
  4. Li Z., Qin F., Li H. // Curr. Opin. Genet. Dev. 2018. V. 48. P. 36–43. https://doi.org/10.1016/j.gde.2017.10.002
  5. Xie Z., Babiceanu M., Kumar S., Jia Y., Qin F., Barr F.G., Li H. // Proc. Natl. Acad. Sci. USA. 2016. V. 113. P. 13126–13131. https://doi.org/10.1073/pnas.1612734113
  6. Shtivelman E., Lifshitz B., Gale R.P., Canaani E. // Nature. 1985. V. 315. P. 550–554. https://doi.org/10.1038/315550a0
  7. Pagani I.S., Dang P., Kommers I.O., Goyne J.M., Nicola M., Saunders V.A., Braley J., White D.L., Yeung D.T., Branford S., Hughes T.P., Ross D.M. // Haematologica. 2018. V. 103. P. 2026–2032. https://doi.org/10.3324/haematol.2018.189787
  8. Zhou T., Medeiros L.J., Hu S. // Curr. Hematol. Malig. Rep. 2018. V. 13. P. 435–445. https://doi.org/10.1007/s11899-018-0474-6
  9. Mertens F., Johansson B., Fioretos T., Mitelman F. // Nat. Rev. Cancer. 2015. V. 15. P. 371–381. https://doi.org/10.1038/nrc3947
  10. Sorokin M., Rabushko E., Rozenberg J.M., Mohammad T., Seryakov A., Sekacheva M., Buzdin A. // Ther. Adv. Med. Oncol. 2022. V. 14. P. 108. https://doi.org/10.1177/17588359221144108
  11. Salokas K., Dashi G., Varjosalo M. // Cancers (Basel). 2023. V. 15. P. 3678. https://doi.org/10.3390/cancers15143678
  12. Stransky N., Cerami E., Schalm S., Kim J.L., Lengauer C. // Nat. Commun. 2014. V. 5. P. 4846. https://doi.org/10.1038/ncomms5846
  13. Salokas K., Weldatsadik R.G., Varjosalo M. // Sci. Rep. 2020. V. 10. P. 14169. https://doi.org/10.1038/s41598-020-71040-8
  14. Chu Y.-H. // Surg. Pathol. Clin. 2023. V. 16. P. 57–73. https://doi.org/10.1016/j.path.2022.09.007
  15. Nagy Z., Jeselsohn R. // Front. Oncol. 2022. V. 12. P. 1037531. https://doi.org/10.3389/fonc.2022.1037531
  16. Apfelbaum A.A., Wrenn E.D., Lawlor E.R. // Front. Oncol. 2022. V. 12. P. 1044707. https://doi.org/10.3389/fonc.2022.1044707
  17. Bowling G.C., Rands M.G., Dobi A., Eldhose B. // Mol. Cancer Ther. 2023. V. 22. P. 168–178. https://doi.org/10.1158/1535-7163.MCT-22-0527
  18. Shen Z., Qiu B., Li L., Yang B., Li G. // Front. Oncol. 2022. V. 12. P. 1033484. https://doi.org/10.3389/fonc.2022.1033484
  19. Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. // Bioinformatics. 2013. V. 29. P. 15–21. https://doi.org/10.1093/bioinformatics/bts635
  20. Petrov S.N., Uroshlev L.A., Kasyanov A.S., Makeev V.Yu. // Mol. Biofizika. 2018. V. 63. P. 421–429.
  21. Haas B.J., Dobin A., Li B., Stransky N., Pochet N., Regev A. // Genome Biol. 2019. V. 20. P. 213. https://doi.org/10.1186/s13059-019-1842-9
  22. Nurk S., Bankevich A., Antipov D., Gurevich A.A., Korobeynikov A., Lapidus A., Prjibelski A.D., Pyshkin A., Sirotkin A., Sirotkin Y., Stepanauskas R., Clingenpeel S.R., Woyke T., McLean J.S., Lasken R., Tesler G., Alekseyev M.A., Pevzner P.A. // J. Comput. Biol. 2013. V. 20. P. 714–737. https://doi.org/10.1089/cmb.2013.0084
  23. Benoit-Pilven C., Marchet C., Chautard E., Lima L., Lambert M.-P., Sacomoto G., Rey A., Cologne A., Terrone S., Dulaurier L., Claude J.-B., Bourgeois C.F., Auboeuf D., Lacroix V. // Sci. Rep. 2018. V. 8. P. 4307. https://doi.org/10.1038/s41598-018-21770-7
  24. Haas B., Dobin A., Stransky N., Li B., Yang X., Tickle T., Bankapur A., Ganote C., Doak T., Pochet N., Sun J., Wu C., Gingeras T., Regev A. // BioRxiv. 2017. P. 120295. https://doi.org/10.1101/120295
  25. Križanovic K., Echchiki A., Roux J., Šikic M. // Bioinformatics. 2018. V. 34. P. 748–754. https://doi.org/10.1093/bioinformatics/btx668
  26. Chen Y., Ye W., Zhang Y., Xu Y. // Nucleic Acids Res. 2015. V. 43. P. 7762–7768. https://doi.org/10.1093/nar/gkv784
  27. Conesa A., Madrigal P., Tarazona S., Gomez-Cabrero D., Cervera A., McPherson A., Szcześniak M.W., Gaffney D.J., Elo L.L., Zhang X., Mortazavi A. // Genome Biol. 2016. V. 17. P. 13. https://doi.org/10.1186/s13059-016-0881-8
  28. Uhrig S., Ellermann J., Walther T., Burkhardt P., Fröhlich M., Hutter B., Toprak U.H., Neumann O., Stenzinger A., Scholl C., Fröhling S., Brors B. // Genome Res. 2021. V. 31. P. 448–460. https://doi.org/10.1101/gr.257246.119
  29. Uhlén M., Fagerberg L., Hallström B.M., Lindskog C., Oksvold P., Mardinoglu A., Sivertsson Å., Kampf C., Sjöstedt E., Asplund A., Olsson I., Edlund K., Lundberg E., Navani S., Szigyarto C.A., Odeberg J., Djureinovic D., Takanen J.O., Hober S., Alm T., Pontén F. // Science. 2015. V. 347. P. 1260419. https://doi.org/10.1126/science.1260419
  30. Barbosa-Morais N.L., Irimia M., Pan Q., Xiong H.Y., Gueroussov S., Lee L.J., Slobodeniuc V., Kutter C., Watt S., Colak R., Kim T., Misquitta-Ali C.M., Wilson M.D., Kim P.M., Odom D.T., Frey B.J., Blencowe B.J. // Science. 2012. V. 338. P. 1587–1593. https://doi.org/10.1126/science.1230612
  31. Expression Atlas. RNA-Seq of human individual tissues and mixture of 16 tissues (Illumina Body Map). https://www.ebi.ac.uk/gxa/experiments/E-MTAB-513/Results
  32. ENCODE Project Consortium // PLoS Biol. 2011. V. 9. P. e1001046. https://doi.org/10.1371/journal.pbio.1001046
  33. Roadmap Epigenomics Consortium, Kundaje A., Meuleman W., Ernst J., Bilenky M., Yen A., Heravi-Moussavi A., Kheradpour P., Zhang Z., Wang J., Ziller M.J., Amin V., Whitaker J.W., Schultz M.D., Ward L.D., Sarkar A., Quon G., Sandstrom R.S., Eaton M.L., Wu Y.-C., Kellis M. // Nature. 2015. V. 518. P. 317–330. https://doi.org/10.1038/nature14248
  34. Jahn A., Rump A., Widmann T.J., Heining C., Horak P., Hutter B., Paramasivam N., Uhrig S., Gieldon L., Drukewitz S., Kübler A., Bermudez M., Hackmann K., Porrmann J., Wagner J., Arlt M., Franke M., Fischer J., Kowalzyk Z., William D., Klink B. // Ann. Oncol. 2022. V. 33. P. 1186–1199. https://doi.org/10.1016/j.annonc.2022.07.008
  35. Arriba. Documentation: workflow, internal algorithm, visualization. https://arriba.readthedocs.io/en/latest/visualization/
  36. Chiu R., Nip K.M., Birol I. // Bioinformatics. 2020. V. 36. P. 2256–2257. https://doi.org/10.1093/bioinformatics/btz902
  37. Nip K.M., Chiu R., Yang C., Chu J., Mohamadi H., Warren R.L., Birol I. // BioRxiv. 2019. P. 701607. https://doi.org/10.1101/701607
  38. PAVFinder – Post Assembly Variants Finder (Github). https://github.com/bcgsc/pavfinder
  39. Quinlan A.R., Hall I.M. // Bioinformatics. 2010. V. 26. P. 841–842. https://doi.org/10.1093/bioinformatics/btq033
  40. Quinlan A.R., Hall I.M. Bedtools 2.31.0 documentation. BEDPE format. 2010. https://bedtools.readthedocs.io/en/latest/content/general-usage.html#bedpe-format
  41. Bray N.L., Pimentel H., Melsted P., Pachter L. // Nat. Biotechnol. 2016. V. 34. P. 525–527. https://doi.org/10.1038/nbt.3519
  42. Melsted P., Hateley S., Joseph I.C., Pimentel H., Bray N., Pachter L. // bioRxiv. 2017. P. 166322. https://doi.org/10.1101/166322
  43. Frankish A., Diekhans M., Jungreis I., Lagarde J., Loveland J.E., Mudge J.M., Sisu C., Wright J.C., Armstrong J., Barnes I., Berry A., Bignell A., Boix C., Carbonell Sala S., Cunningham F., Di Domenico T., Donaldson S., Fiddes I.T., García Girón C., Gonzalez J.M., Flicek P. // Nucleic Acids Res. 2021. V. 49. P. D916–D923. https://doi.org/10.1093/nar/gkaa1087
  44. Davidson N.M., Majewski I.J., Oshlack A. // Genome Med. 2015. V. 7. P. 43. https://doi.org/10.1186/s13073-015-0167-x
  45. Kent W.J. // Genome Res. 2002. V. 12. P. 656–664. https://doi.org/10.1101/gr.229202
  46. Schulz M.H., Zerbino D.R., Vingron M., Birney E. // Bioinformatics. 2012. V. 28. P. 1086–1092. https://doi.org/10.1093/bioinformatics/bts094
  47. Zerbino D.R., Birney E. // Genome Res. 2008. V. 18. P. 821–829. https://doi.org/10.1101/gr.074492.107
  48. Hon T., Mars K., Young G., Tsai Y.-C., Karalius J.W., Landolin J.M., Maurer N., Kudrna D., Hardigan M.A., Steiner C.C., Knapp S.J., Ware D., Shapiro B., Peluso P., Rank D.R. // Sci. Data. 2020. V. 7. P. 399. https://doi.org/10.1038/s41597-020-00743-4
  49. Logsdon G.A., Vollger M.R., Eichler E.E. // Nat. Rev. Genet. 2020. V. 21. P. 597–614. https://doi.org/10.1038/s41576-020-0236-x
  50. Kasianowicz J.J., Brandin E., Branton D., Deamer D.W. // Proc. Natl. Acad. Sci. USA. 1996. V. 93. P. 13770–13773. https://doi.org/10.1073/pnas.93.24.13770
  51. Davidson N.M., Chen Y., Sadras T., Ryland G.L., Blombery P., Ekert P.G., Göke J., Oshlack A. // Genome Biol. 2022. V. 23. P. 10. https://doi.org/10.1186/s13059-021-02588-5
  52. Sadedin S.P., Pope B., Oshlack A. // Bioinformatics. 2012. V. 28. P. 1525–1526. https://doi.org/10.1093/bioinformatics/bts167
  53. Li H. // Bioinformatics. 2018. V. 34. P. 3094–3100. https://doi.org/10.1093/bioinformatics/bty191
  54. Harrow J., Frankish A., Gonzalez J.M., Tapanari E., Diekhans M., Kokocinski F., Aken B.L., Barrell D., Zadissa A., Searle S., Barnes I., Bignell A., Boychenko V., Hunt T., Kay M., Mukherjee G., Rajan J., Despacio-Reyes G., Saunders G., Steward C., Hubbard T.J. // Genome Res. 2012. V. 22. P. 1760–1774. https://doi.org/10.1101/gr.135350.111
  55. Lei Q., Li C., Zuo Z., Huang C., Cheng H., Zhou R. // Genome Biol. Evol. 2016. V. 8. P. 562–577. https://doi.org/10.1093/gbe/evw025
  56. Molania R., Foroutan M., Gagnon-Bartsch J.A., Gandolfo L.C., Jain A., Sinha A., Olshansky G., Dobrovic A., Papenfuss A.T., Speed T.P. // Nat. Biotechnol. 2023. V. 41. P. 82–95. https://doi.org/10.1038/s41587-022-01440-w
  57. Dorney R., Dhungel B.P., Rasko J.E.J., Hebbard L., Schmitz U. // Brief. Bioinformatics. 2023. V. 24. https://doi.org/10.1093/bib/bbac519
  58. Liu Q., Hu Y., Stucky A., Fang L., Zhong J.F., Wang K. // BMC Genomics. 2020. V. 21. P. 793. https://doi.org/10.1186/s12864-020-07207-4
  59. Chen Y., Wang Y., Chen W., Tan Z., Song Y., Human Genome Structural Variation Consortium, Chen H., Chong Z. // Cancer Res. 2023. V. 83. P. 28–33. https://doi.org/10.1158/0008-5472.CAN-22-1628
  60. Ester M., Kriegel H.-P., Sander J., Xu X. // KDD’96: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. 1996. P. 226–231. https://dl.acm.org/doi/10.5555/3001460.3001507
  61. GitHub – ruanjue/bsalign: Banded Striped DNA Sequence Alignment. https://github.com/ruanjue/bsalign
  62. Illumina Online Support Service – RNAseq Analysis Methods – STAR. https://support.illumina.com/help/BS_App_RNASeq_Alignment_OLH_1000000006112/Content/Source/Informatics/STAR_RNAseq.htm
  63. Alser M., Rotman J., Deshpande D., Taraszka K., Shi H., Baykal P.I., Yang H.T., Xue V., Knyazev S., Singer B.D., Balliu B., Koslicki D., Skums P., Zelikovsky A., Alkan C., Mutlu O., Mangul S. // Genome Biol. 2021. V. 22. P. 249. https://doi.org/10.1186/s13059-021-02443-7
  64. Jain M., Koren S., Miga K.H., Quick J., Rand A.C., Sasani T.A., Tyson J.R., Beggs A.D., Dilthey A.T., Fiddes I.T., Malla S., Marriott H., Nieto T., O’Grady J., Olsen H.E., Pedersen B.S., Rhie A., Richardson H., Quinlan A.R., Snutch T.P., Loose M. // Nat. Biotechnol. 2018. V. 36. P. 338–345. https://doi.org/10.1038/nbt.4060
  65. Merker J.D., Wenger A.M., Sneddon T., Grove M., Zappala Z., Fresard L., Waggott D., Utiramerur S., Hou Y., Smith K.S., Montgomery S.B., Wheeler M., Buchan J.G., Lambert C.C., Eng K.S., Hickey L., Korlach J., Ford J., Ashley E.A. // Genet. Med. 2018. V. 20. P. 159–163. https://doi.org/10.1038/gim.2017.86
  66. Carrara M., Beccuti M., Lazzarato F., Cavallo F., Cordero F., Donatelli S., Calogero R.A. // Biomed Res. Int. 2013. V. 2013. P. 340620. https://doi.org/10.1155/2013/340620
  67. Kumar S., Razzaq S.K., Vo A.D., Gautam M., Li H. // Wiley Interdiscip. Rev. RNA. 2016. V. 7. P. 811–823. https://doi.org/10.1002/wrna.1382
  68. Suntsova M., Gaifullin N., Allina D., Reshetun A., Li X., Mendeleeva L., Surin V., Sergeeva A., Spirin P., Prassolov V., Morgan A., Garazha A., Sorokin M., Buzdin A. // Sci. Data. 2019. V. 6. P. 36. https://doi.org/10.1038/s41597-019-0043-4
  69. Yi Q.-Q., Yang R., Shi J.-F., Zeng N.-Y., Liang D.-Y., Sha S., Chang Q. // J. Int. Med. Res. 2020. V. 48. P. 1259. https://doi.org/10.1177/0300060520931259
  70. Langmead B., Salzberg S.L. // Nat. Methods. 2012. V. 9. P. 357–359. https://doi.org/10.1038/nmeth.1923
  71. Rabushko E., Sorokin M., Suntsova M., Seryakov A.P., Kuzmin D.V., Poddubskaya E., Buzdin A.A. // Biomedicines. 2022. V. 10. P. 1866. https://doi.org/10.3390/biomedicines10081866
  72. The Harmonizome 3.0: Integrated Knowledge about Genes and Proteins. https://maayanlab.cloud/Harmonizome/about
  73. Rouillard A.D., Gundersen G.W., Fernandez N.F., Wang Z., Monteiro C.D., McDermott M.G., Ma’ayan A. // Database (Oxford). 2016. V. 2016. P. baw100. https://doi.org/10.1093/database/baw100
  74. Borisov N., Buzdin A. // Biomedicines. 2022. V. 10. P. 2318. https://doi.org/10.3390/biomedicines10092318
  75. Tembe W.D., Pond S.J., Legendre C., Chuang H.Y., Liang W.S., Kim N.E., Montel V., Wong S., McDaniel T.K., Craig D.W., Carpten J.D. // BMC Genomics. 2014. V. 15. P. 824. https://doi.org/10.1186/1471-2164-15-824
  76. Wick R.R. // J. Open Source Software. 2019. V. 4. P. 1316. https://doi.org/10.21105/joss.01316
  77. Ono Y., Asai K., Hamada M. // Bioinformatics. 2013. V. 29. P. 119–121. https://doi.org/10.1093/bioinformatics/bts649

Supplementary files

Fig. 1. The main stages of algorithms that map reads to the genome (transcriptome). The figure was adapted from Alser et al. [63].

Fig. 2. The main stages of de novo genome (transcriptome) assembly algorithms.

Fig. 3. Construction of a de Bruijn graph from a set of DNA/RNA sequencing reads using the k-mers (subsequences) of those reads.

Scheme 1. General procedure for examining a DNA/RNA sequencing sample for the presence of chimeras.

Copyright (c) 2024 Russian Academy of Sciences
