ABSTRACT
The evolution of bootstrap proportions (BP) with sequence length was studied using a 28S ribosomal RNA data set. For different sequence lengths, informative sites were jackknifed several times. Bootstrapping was subsequently performed on each of these subsamples. For each node, BPs so obtained were plotted against sequence length, showing the evolution of the robustness with increasing number of informative sites. For robust nodes (BP of 100%), the pattern of BPs is unvarying and is described by a simple function $BP = 100\,(1 - e^{-b(x - x')})$, where x is the number of informative sites and b and x' are two parameters estimated using a nonlinear regression procedure. When a node has a BP < 100% and the pattern of BPs fits this function, it is possible to estimate the number of informative sites required to obtain a given average BP. The method also identifies nonrobust nodes (nonascending clusters of BP dots), for which it seems to be more cost effective and fruitful to turn to other species and/or genes rather than to continue sequencing longer gene lengths from the same species to reach a BP of 95%.
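The saturation function above can be inverted to give the number of informative sites needed for a target BP: x = x' - ln(1 - BP/100) / b. The sketch below illustrates that arithmetic under stated assumptions. The function names and the example parameter values are hypothetical and are not taken from the study, and `fit_linearized` uses ordinary least squares on the log-transformed model as a simple stand-in for the nonlinear regression procedure the abstract mentions; it is not the authors' method.

```python
import math


def bp_model(x, b, x0):
    """Expected BP (%) at x informative sites: 100 * (1 - exp(-b * (x - x0))).

    x0 plays the role of x' in the text.
    """
    return 100.0 * (1.0 - math.exp(-b * (x - x0)))


def sites_for_bp(target_bp, b, x0):
    """Invert the model: informative sites needed to reach target_bp (< 100)."""
    if not 0.0 <= target_bp < 100.0:
        raise ValueError("target_bp must be in [0, 100)")
    return x0 - math.log(1.0 - target_bp / 100.0) / b


def fit_linearized(xs, bps):
    """Estimate (b, x0) by least squares on ln(1 - BP/100) = -b*x + b*x0.

    Assumption: a log-linearized fit, not the nonlinear regression used in
    the study. Points with BP >= 100 are skipped because the log is undefined.
    """
    pts = [(x, math.log(1.0 - bp / 100.0)) for x, bp in zip(xs, bps) if bp < 100.0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    b = -slope            # slope of the transformed line is -b
    x0 = intercept / b    # intercept of the transformed line is b * x0
    return b, x0


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise the functions.
    b_true, x0_true = 0.02, 5.0
    xs = [20, 40, 60, 80]
    bps = [bp_model(x, b_true, x0_true) for x in xs]
    b_hat, x0_hat = fit_linearized(xs, bps)
    print("sites needed for BP = 95%:", sites_for_bp(95.0, b_hat, x0_hat))
```

Real BP clusters are noisy, so the log transform will weight points near 100% heavily; a proper nonlinear fit, as in the abstract, would be the more faithful choice when a fitting library is available.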
Subjects
Base Sequence; DNA, Ribosomal/genetics; Data Interpretation, Statistical; Fishes/classification; Phylogeny; RNA, Ribosomal, 28S/genetics; Animals; Consensus Sequence; Fishes/genetics; Random Allocation; Sequence Homology, Nucleic Acid; Software; Species Specificity
ABSTRACT
Representative properties of gnathostome species of a rich 28S rRNA database were studied through the analysis of the fluctuations they provoked in bootstrap proportions (BPs) of nodes of parsimonious trees. Using original programs which permit BP comparison between different trees, it is empirically demonstrated that 4- to 24-species trees are highly sensitive to species sampling: the inferences obtained from subsets of 4, 8, 16, or 24 species are not congruent with those from the whole set of 31 species. Study of trees obtained from exhaustively sampling all combinations of single species taken from each presumed monophyletic group shows precisely the impact of each species on the BP of each node. This procedure also shows that the impact of species changes within a given group on tree BPs is localized to its two or three neighboring nodes. The observation of differing impacts of species emphasizes the importance of sampling several species per presumed monophyletic group. It is also concluded that it is necessary to sample several successive outgroups and that the impact of a species on BPs depends mainly on the sampling context. Before undertaking extensive sequencing, the impact of species should be considered more often, since its effect on BPs is stronger than previously thought.
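The exhaustive sampling design described above (every combination of one species drawn from each presumed monophyletic group) amounts to a Cartesian product over the groups. A minimal sketch follows; the function name, group labels, and species labels are hypothetical placeholders, not the taxa or programs used in the study, and tree building and bootstrapping are left out.

```python
from itertools import product


def one_per_group_samples(groups):
    """Yield every species sample that takes exactly one species per group.

    `groups` maps a group label to a list of species labels. Each yielded
    sample is a dict mapping group label -> chosen species.
    """
    names = sorted(groups)  # fixed order so the output is reproducible
    for combo in product(*(groups[name] for name in names)):
        yield dict(zip(names, combo))


if __name__ == "__main__":
    # Placeholder labels only; not the species analysed in the study.
    groups = {
        "group_A": ["sp_A1", "sp_A2"],
        "group_B": ["sp_B1", "sp_B2", "sp_B3"],
        "outgroup": ["sp_O1"],
    }
    for sample in one_per_group_samples(groups):
        print(sample)
```

The number of samples is the product of the group sizes, so it grows quickly as groups or species are added; each sample would then be passed to whatever tree-building and bootstrap routine is in use so that BPs can be compared node by node.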