SVM-Root: Identification of Root-Associated Proteins in Plants by Employing the Support Vector Machine with Sequence-Derived Features


Background: Root traits are desirable targets for modern plant breeding programs, as roots play a pivotal role in plant growth and development. Identifying the genes that govern root traits is therefore an essential research component. However, existing wet-lab experiments for identifying root-associated genes/proteins are resource-intensive, and gene expression studies are species-specific. We therefore proposed a supervised learning-based computational method for identifying root-associated proteins.

Method: The problem was formulated as a binary classification task, with root-associated and non-root-associated proteins constituting the two classes. Four machine learning algorithms, namely support vector machine (SVM), extreme gradient boosting (XGBoost), random forest (RF), and adaptive boosting (AdaBoost), were employed to classify proteins into the two classes. Sequence-derived features, namely amino acid composition (AAC), dipeptide composition (DPC), composition-transition-distribution (CTD), pseudo-amino acid composition (PAAC), and ACF features, were used as input to the learning algorithms.
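Each protein sequence must be turned into a fixed-length numeric vector before it can be given to a classifier. As a minimal sketch, the Python below computes two of the feature groups named above, AAC (20 residue frequencies) and DPC (400 adjacent-pair frequencies), under their commonly used definitions. It is not the SVM-Root implementation; the names `aac`, `dpc`, and `aac_dpc` are hypothetical, and skipping non-standard residues (e.g., X, B, Z) is an assumption, since tools differ in how they handle them.

```python
from itertools import product

# Illustrative sketch with hypothetical names; not the SVM-Root feature code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def aac(seq: str) -> list[float]:
    """Amino acid composition: fraction of each standard residue (20 values)."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for ch in seq.upper():
        if ch in counts:  # assumption: non-standard residues are skipped
            counts[ch] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """Dipeptide composition: fraction of each of the 400 adjacent residue pairs."""
    seq = seq.upper()
    counts = {dp: 0 for dp in DIPEPTIDES}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:  # sequences shorter than 2 residues have no dipeptides
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


def aac_dpc(seq: str) -> list[float]:
    """Concatenate AAC and DPC into one 420-dimensional feature vector."""
    return aac(seq) + dpc(seq)
```

Concatenating feature groups this way yields one vector of the same length for every protein, regardless of sequence length. A combined AAC+DPC+CTD set, as described in the Results, would append CTD descriptors to these 420 values before any feature selection is applied.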

Results: SVM trained on 250 features selected from the combined AAC+DPC+CTD feature set achieved higher accuracy than any other combination of feature set and learning algorithm. Specifically, SVM with the selected features achieved overall accuracies of 0.74, 0.73, and 0.73 when evaluated with single 5-fold cross-validation (5F-CV), repeated 5F-CV, and an independent test set, respectively.
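The difference between single and repeated 5F-CV can be illustrated with a small sketch. Assumptions, none of which come from the abstract: folds are formed by a seeded random shuffle without stratification, accuracy is pooled over all held-out predictions, and the classifier is a nearest-centroid stand-in rather than an SVM so that the code runs with the Python standard library alone. All function names are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, Sequence

# Illustrative sketch with hypothetical names; not the SVM-Root evaluation code.
# A "fitter" takes training features and labels and returns a predict function.
Predictor = Callable[[Sequence[float]], int]
Fitter = Callable[[list, list], Predictor]


def k_fold_indices(n: int, k: int, rng: random.Random) -> list[list[int]]:
    """Shuffle 0..n-1 and deal the indices round-robin into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_accuracy(X, y, fit: Fitter, k: int = 5, seed: int = 0) -> float:
    """Single k-fold CV: each fold is held out once; accuracy is pooled over all folds."""
    rng = random.Random(seed)
    correct = 0
    for test_idx in k_fold_indices(len(y), k, rng):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(y)


def repeated_cv_accuracy(X, y, fit: Fitter, k: int = 5, repeats: int = 10) -> float:
    """Repeated k-fold CV: mean accuracy over `repeats` differently shuffled splits."""
    return mean(cv_accuracy(X, y, fit, k, seed=r) for r in range(repeats))


def fit_nearest_centroid(X_train, y_train) -> Predictor:
    """Stand-in classifier (NOT an SVM): assign each sample to the closest class mean."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(x):
        # Squared Euclidean distance to each class centroid.
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict
```

Repeating the shuffle and averaging reduces the dependence of the estimate on one particular split. A real study would pass a fitter wrapping an SVM in place of `fit_nearest_centroid`, and the independent test set is a separate hold-out that never enters these folds.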

Conclusions: A web-enabled prediction tool, SVM-Root (https://iasri-sg.icar.gov.in/svmroot/), has been developed for the computational prediction of root-associated proteins. As the first tool of its kind, the proposed model is expected to supplement existing experimental methods as well as high-throughput GWAS and transcriptome studies.

About the authors

Prabina Kumar Meher

Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute

Primary contact for editorial correspondence.
Email: info@benthamscience.net

Siddhartha Hati

Department of Bioinformatics, Odisha University of Agriculture and Technology

Email: info@benthamscience.net

Tanmaya Sahu

Division of Genomic Resources, National Bureau of Plant Genetic Resources

Email: info@benthamscience.net

Upendra Pradhan

Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute

Email: info@benthamscience.net

Ajit Gupta

Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute

Email: info@benthamscience.net

Surya Rath

Department of Bioinformatics, Odisha University of Agriculture and Technology

Email: info@benthamscience.net

Copyright © Bentham Science Publishers, 2024