MSSD: An Efficient Method for Constructing Accurate and Stable Phylogenetic Networks by Merging Subtrees of Equal Depth


Cite item

Full Text

Abstract

Background:Systematic phylogenetic networks are essential for studying the evolutionary relationships and diversity among species. These networks are particularly important for capturing non-tree-like processes resulting from reticulate evolutionary events. However, existing methods for constructing phylogenetic networks are influenced by the order of inputs. The different orders can lead to inconsistent experimental results. Moreover, constructing a network for large datasets is time-consuming and the network often does not include all of the input tree nodes.

Aims:This paper aims to propose a novel method, called as MSSD, which can construct a phylogenetic network from gene trees by Merging Subtrees with the Same Depth in a bottom-up way.

Methods:The MSSD first decomposes trees into subtrees based on depth. Then it merges subtrees with the same depth from 0 to the maximum depth. For all subtrees of one depth, it inserts each subtree into the current networks by means of identical subtrees.

Results:We test the MSSD on the simulated data and real data. The experimental results show that the networks constructed by the MSSD can represent all input trees and the MSSD is more stable than other methods. The MSSD can construct networks faster and the constructed networks have more similar information with the input trees than other methods.

Conclusion:MSSD is a powerful tool for studying the evolutionary relationships among species in biologyand is free available at https://github.com/xingjiajie2023/MSSD.

About the authors

Jiajie Xing

School of Computer Science, Inner Mongolia University

Email: info@benthamscience.net

Xu Song

School of Computer Science, Inner Mongolia University

Email: info@benthamscience.net

Meiju Yu

School of Computer Science, Inner Mongolia University

Email: info@benthamscience.net

Juan Wang

School of Computer Science, Inner Mongolia University

Author for correspondence.
Email: info@benthamscience.net

Jing Yu

School of Education, Inner Mongolia Normal University

Author for correspondence.
Email: info@benthamscience.net

References

  1. Leventhal GE, Kouyos R, Stadler T, et al. Inferring epidemic contact structure from phylogenetic trees. PLOS Comput Biol 2012; 8(3): e1002413. doi: 10.1371/journal.pcbi.1002413 PMID: 22412361
  2. Forster P, Forster L, Renfrew C, Forster M. Phylogenetic network analysis of SARS-CoV-2 genomes. Proc Natl Acad Sci 2020; 117(17): 9241-3. doi: 10.1073/pnas.2004999117 PMID: 32269081
  3. Sapoval N, Aghazadeh A, Nute MG, et al. Current progress and open challenges for applying deep learning across the biosciences. Nat Commun 2022; 13(1): 1728. doi: 10.1038/s41467-022-29268-7 PMID: 35365602
  4. Linder CR. Network (reticulate) evolution: Biology, models, and algorithms Ninth pacific symposium on biocomputing, Hawaii. USA. 2004.
  5. Maddison WP. Gene trees in species trees. Syst Biol 1997; 46(3): 523-36. doi: 10.1093/sysbio/46.3.523
  6. Nakhleh L. Evolutionary phylogenetic networks: Models and issues. In: Heath L, Ramakrishnan N, Eds. Problem Solving Handbook in Computational Biology and Bioinformatics. Boston, MA: Springer 2010; pp. 125-58. doi: 10.1007/978-0-387-09760-2_7
  7. Gusfield D, Bansal V. A fundamental decomposition theory for phylogenetic networks and incompatible characters. In: Miyano S, Mesirov J, Kasif S, Istrail S, Pevzner PA, Waterman M, Eds. Research in Computational Molecular BiologyRECOMB 2005 Lecture Notes in Computer Science. Berlin, Heidelberg: Springer 2005; p. 3500. doi: 10.1007/11415770_17
  8. Song YS, Hein J. Constructing minimal ancestral recombination graphs. J Comput Biol 2005; 12(2): 147-69. doi: 10.1089/cmb.2005.12.147 PMID: 15767774
  9. Huson DH, Scornavacca C. A survey of combinatorial methods for phylogenetic networks. Genome Biol Evol 2011; 3: 23-35. doi: 10.1093/gbe/evq077 PMID: 21081312
  10. Huson DH, Bryant D. Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 2006; 23(2): 254-67. doi: 10.1093/molbev/msj030 PMID: 16221896
  11. Jin G, Nakhleh L, Snir S, Tuller T. Maximum likelihood of phylogenetic networks. Bioinformatics 2006; 22(21): 2604-11. doi: 10.1093/bioinformatics/btl452 PMID: 16928736
  12. Zhu J, Liu X, Ogilvie HA, Nakhleh LK. A divide-and-conquer method for scalable phylogenetic network inference from multilocus data. Bioinformatics 2019; 35(14): i370-8. doi: 10.1093/bioinformatics/btz359 PMID: 31510688
  13. Yu Y, Nakhleh L. A maximum pseudo-likelihood approach for phylo genetic networks. BMC Genomics 2015; 16(10): 1-10.
  14. Zhu J, Nakhleh L. Inference of species phylogenies from bi-allelic markers using pseudo-likelihood. Bioinformatics 2018; 34(13): i376-85. doi: 10.1093/bioinformatics/bty295 PMID: 29950004
  15. Wheeler WC. Phylogenetic network analysis as a parsimony optimization problem. BMC Bioinformatics 2015; 16(1): 296. doi: 10.1186/s12859-015-0675-0 PMID: 26382078
  16. Van Iersel L, Jones M, Scornavacca C. Improved maximum parsimony models for phylogenetic networks. Syst Biol 2018; 67(3): 518-42. doi: 10.1093/sysbio/syx094 PMID: 29272537
  17. Bryant C, Fischer M, Linz S, Semple C. On the quirks of maximum parsimony and likelihood on phylogenetic networks. J Theor Biol 2017; 417: 100-8. doi: 10.1016/j.jtbi.2017.01.013 PMID: 28087420
  18. Nakhleh L. Reconstructing phylogenetic networks using maximum parsimony. IEEE Computational Systems Bioinformatics Conference (CSB’05). Stanford, CA, USA. 2017; pp. 93-102.
  19. Kannan L, Wheeler WC. Maximum parsimony on phylogenetic networks. Algorithms Mol Biol 2012; 7(1): 9. doi: 10.1186/1748-7188-7-9 PMID: 22551229
  20. Yan Z, et al. Maximum parsimony inference of phylogenetic networks in the presence of polyploid complexes. Syst Biol 2022; 71(3): 706-20. PMID: 34605924
  21. Hejase HA. Fast and accurate statistical inference of phylogenetic networks using large-scale genomic sequence data. RECOMB International conference on Comparative Genomics Cham: Springer 2018; 242-59. doi: 10.1007/978-3-030-00834-5_14
  22. Lutteropp S. NetRAX: Accurate and fast maximum likelihood phylogenetic network inference. Bioinformatics 2021; 38(15): 3725-3. PMID: 35713506
  23. Zhang C, Ogilvie HA, Drummond AJ, Stadler T. Bayesian inference of species networks from multilocus sequence data. Mol Biol Evol 2018; 35(2): 504-17. doi: 10.1093/molbev/msx307 PMID: 29220490
  24. Poormohammadi H. Constructing rooted phylogenetic networks from triplets based on height function. JETAE 2014; 2: 389-93.
  25. Reyhani MH, Poormohammadi H. RPNCH: A method for constructing rooted phylogenetic networks from rooted triplets based on height function. Archives Adv Biosci 2017; 8(4): 14-20.
  26. Tusserkani R, Hadi P, Azin A, Changiz E. Inferring phylogenies from minimal information. Third International Conference on Advances in Electrical. Chennai, India. 2017; pp. 202-6.
  27. Poormohammadi H, Zarchi MS, Ghaneai H. NCHB: A method for constructing rooted phylogenetic networks from rooted triplets based on height function and binarization. J Theor Biol 2020; 489: 110144. doi: 10.1016/j.jtbi.2019.110144 PMID: 31911141
  28. Poormohammadi H, Mohsen SZ. Netcombin: An algorithm for constructing optimal phylogenetic network from rooted triplets. PLoS One 2020; 15(9): e0227842.
  29. Huson DH, Rupp R. Summarizing multiple gene trees using cluster networks. In: Algorithms in Bioinformatics (WABI) Berlin: Springer . 2008; 5251: pp. 296-305.
  30. Huson DH, Rupp R, Berry V, Gambette P, Paul C. Computing galled networks from real data. Bioinformatics 2009; 25(12): i85-93. doi: 10.1093/bioinformatics/btp217 PMID: 19478021
  31. van Iersel L, Kelk S, Rupp R, Huson D. Phylogenetic networks do not need to be complex: Using fewer reticulations to represent conflicting clusters. Bioinformatics 2010; 26(12): i124-31. doi: 10.1093/bioinformatics/btq202 PMID: 20529896
  32. Wang J, Guo M, Xing L, Che K, Liu X, Wang C. BIMLR: A method for constructing rooted phylogenetic networks from rooted phylogenetic trees. Gene 2013; 527(1): 344-51. doi: 10.1016/j.gene.2013.06.036 PMID: 23816409
  33. Wang J, Guo M, Liu X, et al. Lnetwork: An efficient and effective method for constructing phylogenetic networks. Bioinformatics 2013; 29(18): 2269-76. doi: 10.1093/bioinformatics/btt378 PMID: 23811095
  34. Wang J, Zhibin Z. yanjuan L. Constructing phylogenetic networks based on the isomorphism of datasets. Biomed Res Int 2016; 2016: 4236858.
  35. van Iersel L, Janssen R, Jones M, et al. A practical fixed-parameter algorithm for constructing tree-child networks from multiple binary trees. Algorithmica 2019; 84: 917-60.
  36. Solís-Lemus C, Ané C. Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting. PLoS Genet 2016; 12(3): e1005896.
  37. Allman ES, Baños H, Rhodes JA. NANUQ: A method for inferring species networks from gene trees under the coalescent mode. Algorithms Mol Biol 2019; 14: 24.
  38. Markin A. Robinson-Foulds reticulation networks BCB ’19: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. Niagara Falls, NY, USA. 2019; pp. 77-86.
  39. Alexey M. RF-Net 2: Fast inference of virus reassortment and hybridization networks. Bioinformatics 2022; 38(8): 2144-52.
  40. Tutsoy O. Graph theory based large-scale machine learning with multi-dimensional constrained optimization approaches for exact epidemiological modelling of pandemic diseases. IEEE Trans Pattern Anal Mach Intell 2023; 1-10. doi: 10.1109/TPAMI.2023.3256421
  41. Jukes TH, Cantor CR. Evolution of protein molecules. In: Mammalian Protein Metabolism Amsterdam: Elsevier. 2019; 3: pp. 21-132.
  42. Wang J, Guo MZ, Xing LL. Methodology FastJoin, an improved neighbor-joining algorithm. Genet Mol Res 2012; 11(3): 1909-22. doi: 10.4238/2012.July.19.10 PMID: 22869546
  43. Wang J, Guo M. A metric on the space of rooted phylogenetic trees. Curr Bioinform 2018; 13(5): 487-91. doi: 10.2174/1574893612666171122145801
  44. Wang J, Qi X, Cui B, Guo M. A survey of metrics measuring difference for rooted phylogenetic trees. Curr Bioinform 2020; 15(7): 697-702. doi: 10.2174/1574893614666191017130217
  45. Rambaut A, Grass NC. Seq-Gen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Bioinformatics 1997; 13(3): 235-8. doi: 10.1093/bioinformatics/13.3.235 PMID: 9183526
  46. Wang J, Guo M. A review of metrics measuring dissimilarity for rooted phylogenetic networks. Brief Bioinform 2019; 20(6): 1972-80. doi: 10.1093/bib/bby062 PMID: 30020404
  47. Larkin MA, Blackshields G, Brown NP, et al. Clustal W and Clustal X version 2.0. Bioinformatics 2007; 23(21): 2947-8. doi: 10.1093/bioinformatics/btm404 PMID: 17846036
  48. Tamura K, Stecher G, Kumar S. MEGA11: Molecular evolutionary genetics analysis version 11. Mol Biol Evol 2021; 38(7): 3022-7. doi: 10.1093/molbev/msab120 PMID: 33892491

Supplementary files

Supplementary Files
Action
1. JATS XML

Copyright (c) 2024 Bentham Science Publishers