MSSD: An Efficient Method for Constructing Accurate and Stable Phylogenetic Networks by Merging Subtrees of Equal Depth
- Authors: Xing J.1, Song X.1, Yu M.1, Wang J.1, Yu J.2
-
Affiliations:
- School of Computer Science, Inner Mongolia University
- School of Education, Inner Mongolia Normal University
- Issue: Vol 19, No 9 (2024)
- Pages: 879-889
- Section: Life Sciences
- URL: https://gynecology.orscience.ru/1574-8936/article/view/644100
- DOI: https://doi.org/10.2174/0115748936256923230927081102
- ID: 644100
Cite item
Full Text
Abstract
Background:Systematic phylogenetic networks are essential for studying the evolutionary relationships and diversity among species. These networks are particularly important for capturing non-tree-like processes resulting from reticulate evolutionary events. However, existing methods for constructing phylogenetic networks are influenced by the order of inputs. The different orders can lead to inconsistent experimental results. Moreover, constructing a network for large datasets is time-consuming and the network often does not include all of the input tree nodes.
Aims:This paper aims to propose a novel method, called as MSSD, which can construct a phylogenetic network from gene trees by Merging Subtrees with the Same Depth in a bottom-up way.
Methods:The MSSD first decomposes trees into subtrees based on depth. Then it merges subtrees with the same depth from 0 to the maximum depth. For all subtrees of one depth, it inserts each subtree into the current networks by means of identical subtrees.
Results:We test the MSSD on the simulated data and real data. The experimental results show that the networks constructed by the MSSD can represent all input trees and the MSSD is more stable than other methods. The MSSD can construct networks faster and the constructed networks have more similar information with the input trees than other methods.
Conclusion:MSSD is a powerful tool for studying the evolutionary relationships among species in biologyand is free available at https://github.com/xingjiajie2023/MSSD.
Keywords
About the authors
Jiajie Xing
School of Computer Science, Inner Mongolia University
Email: info@benthamscience.net
Xu Song
School of Computer Science, Inner Mongolia University
Email: info@benthamscience.net
Meiju Yu
School of Computer Science, Inner Mongolia University
Email: info@benthamscience.net
Juan Wang
School of Computer Science, Inner Mongolia University
Author for correspondence.
Email: info@benthamscience.net
Jing Yu
School of Education, Inner Mongolia Normal University
Author for correspondence.
Email: info@benthamscience.net
References
- Leventhal GE, Kouyos R, Stadler T, et al. Inferring epidemic contact structure from phylogenetic trees. PLOS Comput Biol 2012; 8(3): e1002413. doi: 10.1371/journal.pcbi.1002413 PMID: 22412361
- Forster P, Forster L, Renfrew C, Forster M. Phylogenetic network analysis of SARS-CoV-2 genomes. Proc Natl Acad Sci 2020; 117(17): 9241-3. doi: 10.1073/pnas.2004999117 PMID: 32269081
- Sapoval N, Aghazadeh A, Nute MG, et al. Current progress and open challenges for applying deep learning across the biosciences. Nat Commun 2022; 13(1): 1728. doi: 10.1038/s41467-022-29268-7 PMID: 35365602
- Linder CR. Network (reticulate) evolution: Biology, models, and algorithms Ninth pacific symposium on biocomputing, Hawaii. USA. 2004.
- Maddison WP. Gene trees in species trees. Syst Biol 1997; 46(3): 523-36. doi: 10.1093/sysbio/46.3.523
- Nakhleh L. Evolutionary phylogenetic networks: Models and issues. In: Heath L, Ramakrishnan N, Eds. Problem Solving Handbook in Computational Biology and Bioinformatics. Boston, MA: Springer 2010; pp. 125-58. doi: 10.1007/978-0-387-09760-2_7
- Gusfield D, Bansal V. A fundamental decomposition theory for phylogenetic networks and incompatible characters. In: Miyano S, Mesirov J, Kasif S, Istrail S, Pevzner PA, Waterman M, Eds. Research in Computational Molecular BiologyRECOMB 2005 Lecture Notes in Computer Science. Berlin, Heidelberg: Springer 2005; p. 3500. doi: 10.1007/11415770_17
- Song YS, Hein J. Constructing minimal ancestral recombination graphs. J Comput Biol 2005; 12(2): 147-69. doi: 10.1089/cmb.2005.12.147 PMID: 15767774
- Huson DH, Scornavacca C. A survey of combinatorial methods for phylogenetic networks. Genome Biol Evol 2011; 3: 23-35. doi: 10.1093/gbe/evq077 PMID: 21081312
- Huson DH, Bryant D. Application of phylogenetic networks in evolutionary studies. Mol Biol Evol 2006; 23(2): 254-67. doi: 10.1093/molbev/msj030 PMID: 16221896
- Jin G, Nakhleh L, Snir S, Tuller T. Maximum likelihood of phylogenetic networks. Bioinformatics 2006; 22(21): 2604-11. doi: 10.1093/bioinformatics/btl452 PMID: 16928736
- Zhu J, Liu X, Ogilvie HA, Nakhleh LK. A divide-and-conquer method for scalable phylogenetic network inference from multilocus data. Bioinformatics 2019; 35(14): i370-8. doi: 10.1093/bioinformatics/btz359 PMID: 31510688
- Yu Y, Nakhleh L. A maximum pseudo-likelihood approach for phylo genetic networks. BMC Genomics 2015; 16(10): 1-10.
- Zhu J, Nakhleh L. Inference of species phylogenies from bi-allelic markers using pseudo-likelihood. Bioinformatics 2018; 34(13): i376-85. doi: 10.1093/bioinformatics/bty295 PMID: 29950004
- Wheeler WC. Phylogenetic network analysis as a parsimony optimization problem. BMC Bioinformatics 2015; 16(1): 296. doi: 10.1186/s12859-015-0675-0 PMID: 26382078
- Van Iersel L, Jones M, Scornavacca C. Improved maximum parsimony models for phylogenetic networks. Syst Biol 2018; 67(3): 518-42. doi: 10.1093/sysbio/syx094 PMID: 29272537
- Bryant C, Fischer M, Linz S, Semple C. On the quirks of maximum parsimony and likelihood on phylogenetic networks. J Theor Biol 2017; 417: 100-8. doi: 10.1016/j.jtbi.2017.01.013 PMID: 28087420
- Nakhleh L. Reconstructing phylogenetic networks using maximum parsimony. IEEE Computational Systems Bioinformatics Conference (CSB05). Stanford, CA, USA. 2017; pp. 93-102.
- Kannan L, Wheeler WC. Maximum parsimony on phylogenetic networks. Algorithms Mol Biol 2012; 7(1): 9. doi: 10.1186/1748-7188-7-9 PMID: 22551229
- Yan Z, et al. Maximum parsimony inference of phylogenetic networks in the presence of polyploid complexes. Syst Biol 2022; 71(3): 706-20. PMID: 34605924
- Hejase HA. Fast and accurate statistical inference of phylogenetic networks using large-scale genomic sequence data. RECOMB International conference on Comparative Genomics Cham: Springer 2018; 242-59. doi: 10.1007/978-3-030-00834-5_14
- Lutteropp S. NetRAX: Accurate and fast maximum likelihood phylogenetic network inference. Bioinformatics 2021; 38(15): 3725-3. PMID: 35713506
- Zhang C, Ogilvie HA, Drummond AJ, Stadler T. Bayesian inference of species networks from multilocus sequence data. Mol Biol Evol 2018; 35(2): 504-17. doi: 10.1093/molbev/msx307 PMID: 29220490
- Poormohammadi H. Constructing rooted phylogenetic networks from triplets based on height function. JETAE 2014; 2: 389-93.
- Reyhani MH, Poormohammadi H. RPNCH: A method for constructing rooted phylogenetic networks from rooted triplets based on height function. Archives Adv Biosci 2017; 8(4): 14-20.
- Tusserkani R, Hadi P, Azin A, Changiz E. Inferring phylogenies from minimal information. Third International Conference on Advances in Electrical. Chennai, India. 2017; pp. 202-6.
- Poormohammadi H, Zarchi MS, Ghaneai H. NCHB: A method for constructing rooted phylogenetic networks from rooted triplets based on height function and binarization. J Theor Biol 2020; 489: 110144. doi: 10.1016/j.jtbi.2019.110144 PMID: 31911141
- Poormohammadi H, Mohsen SZ. Netcombin: An algorithm for constructing optimal phylogenetic network from rooted triplets. PLoS One 2020; 15(9): e0227842.
- Huson DH, Rupp R. Summarizing multiple gene trees using cluster networks. In: Algorithms in Bioinformatics (WABI) Berlin: Springer . 2008; 5251: pp. 296-305.
- Huson DH, Rupp R, Berry V, Gambette P, Paul C. Computing galled networks from real data. Bioinformatics 2009; 25(12): i85-93. doi: 10.1093/bioinformatics/btp217 PMID: 19478021
- van Iersel L, Kelk S, Rupp R, Huson D. Phylogenetic networks do not need to be complex: Using fewer reticulations to represent conflicting clusters. Bioinformatics 2010; 26(12): i124-31. doi: 10.1093/bioinformatics/btq202 PMID: 20529896
- Wang J, Guo M, Xing L, Che K, Liu X, Wang C. BIMLR: A method for constructing rooted phylogenetic networks from rooted phylogenetic trees. Gene 2013; 527(1): 344-51. doi: 10.1016/j.gene.2013.06.036 PMID: 23816409
- Wang J, Guo M, Liu X, et al. Lnetwork: An efficient and effective method for constructing phylogenetic networks. Bioinformatics 2013; 29(18): 2269-76. doi: 10.1093/bioinformatics/btt378 PMID: 23811095
- Wang J, Zhibin Z. yanjuan L. Constructing phylogenetic networks based on the isomorphism of datasets. Biomed Res Int 2016; 2016: 4236858.
- van Iersel L, Janssen R, Jones M, et al. A practical fixed-parameter algorithm for constructing tree-child networks from multiple binary trees. Algorithmica 2019; 84: 917-60.
- Solís-Lemus C, Ané C. Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting. PLoS Genet 2016; 12(3): e1005896.
- Allman ES, Baños H, Rhodes JA. NANUQ: A method for inferring species networks from gene trees under the coalescent mode. Algorithms Mol Biol 2019; 14: 24.
- Markin A. Robinson-Foulds reticulation networks BCB 19: Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. Niagara Falls, NY, USA. 2019; pp. 77-86.
- Alexey M. RF-Net 2: Fast inference of virus reassortment and hybridization networks. Bioinformatics 2022; 38(8): 2144-52.
- Tutsoy O. Graph theory based large-scale machine learning with multi-dimensional constrained optimization approaches for exact epidemiological modelling of pandemic diseases. IEEE Trans Pattern Anal Mach Intell 2023; 1-10. doi: 10.1109/TPAMI.2023.3256421
- Jukes TH, Cantor CR. Evolution of protein molecules. In: Mammalian Protein Metabolism Amsterdam: Elsevier. 2019; 3: pp. 21-132.
- Wang J, Guo MZ, Xing LL. Methodology FastJoin, an improved neighbor-joining algorithm. Genet Mol Res 2012; 11(3): 1909-22. doi: 10.4238/2012.July.19.10 PMID: 22869546
- Wang J, Guo M. A metric on the space of rooted phylogenetic trees. Curr Bioinform 2018; 13(5): 487-91. doi: 10.2174/1574893612666171122145801
- Wang J, Qi X, Cui B, Guo M. A survey of metrics measuring difference for rooted phylogenetic trees. Curr Bioinform 2020; 15(7): 697-702. doi: 10.2174/1574893614666191017130217
- Rambaut A, Grass NC. Seq-Gen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Bioinformatics 1997; 13(3): 235-8. doi: 10.1093/bioinformatics/13.3.235 PMID: 9183526
- Wang J, Guo M. A review of metrics measuring dissimilarity for rooted phylogenetic networks. Brief Bioinform 2019; 20(6): 1972-80. doi: 10.1093/bib/bby062 PMID: 30020404
- Larkin MA, Blackshields G, Brown NP, et al. Clustal W and Clustal X version 2.0. Bioinformatics 2007; 23(21): 2947-8. doi: 10.1093/bioinformatics/btm404 PMID: 17846036
- Tamura K, Stecher G, Kumar S. MEGA11: Molecular evolutionary genetics analysis version 11. Mol Biol Evol 2021; 38(7): 3022-7. doi: 10.1093/molbev/msab120 PMID: 33892491
Supplementary files
