Results 1 - 17 of 17
1.
Sci Data ; 11(1): 321, 2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38548727

ABSTRACT

Flexible bronchoscopy has revolutionized respiratory disease diagnosis. It offers direct visualization and detection of airway abnormalities, including lung cancer lesions. Accurate identification of airway lesions during flexible bronchoscopy plays an important role in lung cancer diagnosis. The application of artificial intelligence (AI) aims to support physicians in recognizing anatomical landmarks and lung cancer lesions within bronchoscopic imagery. This work describes the development of BM-BronchoLC, a rich bronchoscopy dataset encompassing 106 lung cancer and 102 non-lung cancer patients. The dataset incorporates detailed localization and categorical annotations for both anatomical landmarks and lesions, meticulously conducted by senior doctors at Bach Mai Hospital, Vietnam. To assess the dataset's quality, we evaluate two prevalent AI backbone models, namely UNet++ and ESFPNet, on the image segmentation and classification tasks with single-task and multi-task learning paradigms. We present BM-BronchoLC as a reference dataset for developing AI models to assist diagnostic accuracy for anatomical landmarks and lung cancer lesions in bronchoscopy data.


Subject(s)
Bronchoscopy , Lung Neoplasms , Humans , Artificial Intelligence , Lung Neoplasms/diagnostic imaging , Thorax/diagnostic imaging , Anatomic Landmarks/diagnostic imaging
2.
J Evol Biol ; 37(2): 256-265, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38366253

ABSTRACT

Estimating parameters of amino acid substitution models is a crucial task in bioinformatics. The maximum likelihood (ML) approach has been proposed to estimate amino acid substitution models from large datasets. The quality of newly estimated models is normally assessed by comparing them with existing models in building ML trees. Two important questions that remain are the correlation of the estimated models with the true models and the size of the training datasets required to estimate reliable models. In this article, we performed a simulation study to answer these two questions. We simulated genome datasets with different numbers of genes/alignments based on predefined models (called true models) and predefined trees (called true trees). The simulated datasets were used to estimate amino acid substitution models using the ML estimation methods. Our experiments showed that models estimated by the ML methods from simulated datasets with more than 100 genes have high correlations with the true models. The estimated models performed well in building ML trees in comparison with the true models. The results suggest that amino acid substitution models estimated by the ML methods from large genome datasets are a reliable tool for analyzing amino acid sequences.


Subject(s)
Algorithms , Genome , Amino Acid Substitution , Phylogeny , Computer Simulation , Models, Genetic
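
The comparison between estimated and true models described in the abstract above can be illustrated with a small correlation check. This is a minimal sketch, assuming that "correlation" means the Pearson correlation between corresponding off-diagonal entries of two symmetric exchangeability matrices; the abstract does not state which coefficient the authors used. The 4-state matrices and the function names are hypothetical toy values, not data from the study.

```python
import math


def upper_triangle(matrix):
    """Flatten the strictly upper triangle of a square matrix into a list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical 4-state exchangeability matrices (real amino acid models are 20x20).
true_model = [
    [0.0, 1.0, 2.0, 0.5],
    [1.0, 0.0, 0.8, 1.5],
    [2.0, 0.8, 0.0, 3.0],
    [0.5, 1.5, 3.0, 0.0],
]
estimated_model = [
    [0.0, 1.1, 1.9, 0.6],
    [1.1, 0.0, 0.7, 1.6],
    [1.9, 0.7, 0.0, 2.8],
    [0.6, 1.6, 2.8, 0.0],
]

correlation = pearson(upper_triangle(true_model), upper_triangle(estimated_model))
print(f"correlation between true and estimated rates: {correlation:.4f}")
```

Only the upper triangle is compared because a reversible model's exchangeability matrix is symmetric, so the lower triangle carries no extra information.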
3.
J Evol Biol ; 36(3): 499-506, 2023 03.
Article in English | MEDLINE | ID: mdl-36598184

ABSTRACT

Amino acid substitution models represent the substitution rates among amino acids during the evolution of protein sequences. The models are a prerequisite for maximum likelihood or Bayesian methods to analyse the phylogenetic relationships among species based on their protein sequences. Estimating amino acid substitution models requires large protein datasets and intensive computation. In this paper, we presented the estimation of both a time-reversible model (Q.met) and a time non-reversible model (NQ.met) for multicellular animals (Metazoa). Analyses showed that the Q.met and NQ.met models were significantly better than existing models in analysing metazoan protein sequences. Moreover, the time non-reversible model NQ.met enables us to reconstruct the rooted phylogenetic tree for Metazoa. We recommend that researchers employ the Q.met and NQ.met models in analysing metazoan protein sequences.


Subject(s)
Evolution, Molecular , Proteins , Animals , Phylogeny , Amino Acid Substitution , Bayes Theorem , Models, Genetic
4.
Ann Hum Biol ; 49(2): 152-155, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35289678

ABSTRACT

BACKGROUND: Human cytochrome P450 (CYP) genes are essential in metabolising drugs. Due to their high polymorphism, population-specific studies are of great interest. AIM: This research examined six CYP genes, namely CYP2B6, CYP2C9, CYP2C19, CYP2D6, CYP3A5, and CYP4F2, in the Kinh Vietnamese (KHV) for population-scale precision medicine. SUBJECTS AND METHODS: We processed data from a genomics database of 206 healthy and unrelated KHV individuals to calculate CYP allele frequencies. First, we compared the CYP genes of the KHV to six other populations retrieved from the 1000 Genomes Project. Second, we searched the PharmGKB database for drug-CYP interaction data to compile a drug dosage recommendation for the KHV. RESULTS: We observed diverging trends in genetic variations of CYP2B6, CYP2D6, and CYP3A5 in the KHV. Regarding phenotypic drug responses in the KHV, CYP2C19 exhibited all metabolic phenotypes at a non-trivial frequency. In addition, CYP3A5 metabolised drugs at a lower rate compared to the other five CYPs. CONCLUSION: This is the first large-scale study to investigate multiple CYP genes in the KHV for precision medicine from a public health perspective. Differences found in the distributions of metabolizers for the KHV suggest careful prescriptions for CYP2C19 and CYP3A5-metabolised drugs.


Subject(s)
Cytochrome P-450 CYP2D6 , Cytochrome P-450 CYP3A , Asian People/genetics , Cytochrome P-450 CYP2B6 , Cytochrome P-450 CYP2C19/genetics , Cytochrome P-450 CYP2C9 , Cytochrome P-450 CYP2D6/genetics , Cytochrome P-450 CYP3A/genetics , Cytochrome P-450 Enzyme System/genetics , Genomics , Humans , Public Health
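
The allele-frequency calculation mentioned in the abstract above can be sketched for a single biallelic site. This is an illustrative example only: the function name and the genotype counts are hypothetical and were chosen merely to sum to the 206 individuals reported; the study's actual pipeline and counts are not given in the abstract.

```python
def allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """Frequency of the alternate allele among diploid individuals.

    Each individual carries two alleles, so heterozygotes contribute one
    alternate allele and alternate homozygotes contribute two.
    """
    n_individuals = n_hom_ref + n_het + n_hom_alt
    if n_individuals == 0:
        raise ValueError("at least one genotyped individual is required")
    return (n_het + 2 * n_hom_alt) / (2 * n_individuals)


# Hypothetical genotype counts for 206 individuals (150 + 50 + 6).
freq = allele_frequency(n_hom_ref=150, n_het=50, n_hom_alt=6)
print(f"alternate allele frequency: {freq:.4f}")
```

Star-allele calling for CYP genes is more involved than this single-site count, since a star allele is typically defined by a combination of variants.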
5.
Syst Biol ; 71(5): 1110-1123, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35139203

ABSTRACT

Amino acid substitution models are a key component in phylogenetic analyses of protein sequences. All commonly used amino acid models available to date are time-reversible, an assumption designed for computational convenience but not for biological reality. Another significant downside to time-reversible models is that they do not allow inference of rooted trees without outgroups. In this article, we introduce a maximum likelihood approach nQMaker, an extension of the recently published QMaker method, that allows the estimation of time nonreversible amino acid substitution models and rooted phylogenetic trees from a set of protein sequence alignments. We show that the nonreversible models estimated with nQMaker are a much better fit to empirical alignments than pre-existing reversible models, across a wide range of data sets including mammals, birds, plants, fungi, and other taxa, and that the improvements in model fit scale with the size of the data set. Notably, for the recently published plant and bird trees, these nonreversible models correctly recovered the commonly estimated root placements with very high statistical support without the need to use an outgroup. We provide nQMaker as an easy-to-use feature in the IQ-TREE software (http://www.iqtree.org), allowing users to estimate nonreversible models and rooted phylogenies from their own protein data sets. The data sets and scripts used in this article are available at https://doi.org/10.5061/dryad.3tx95x6hx. [amino acid sequence analyses; amino acid substitution models; maximum likelihood model estimation; nonreversible models; phylogenetic inference; reversible models.].


Subject(s)
Models, Genetic , Software , Amino Acid Substitution , Animals , Evolution, Molecular , Likelihood Functions , Mammals , Phylogeny , Proteins
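
The distinction between time-reversible and nonreversible models drawn in the abstract above can be made concrete with the detailed-balance condition: a rate matrix Q with stationary frequencies pi is time-reversible exactly when pi_i * q_ij = pi_j * q_ji for every pair of states. The sketch below checks that condition on two hypothetical 3-state matrices; it is not part of nQMaker, and the function name and toy values are assumptions made for illustration.

```python
def is_time_reversible(q, pi, tol=1e-9):
    """Return True if rate matrix q satisfies detailed balance under pi."""
    n = len(pi)
    return all(
        abs(pi[i] * q[i][j] - pi[j] * q[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


# Hypothetical reversible matrix: q_ij = r_ij * pi_j with symmetric r.
pi_rev = [0.5, 0.3, 0.2]
q_rev = [
    [-0.7, 0.3, 0.4],
    [0.5, -1.1, 0.6],
    [1.0, 0.9, -1.9],
]

# Hypothetical nonreversible matrix: states only move 0 -> 1 -> 2 -> 0.
pi_cyclic = [1 / 3, 1 / 3, 1 / 3]
q_cyclic = [
    [-1.0, 1.0, 0.0],
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
]

print(is_time_reversible(q_rev, pi_rev))        # detailed balance holds
print(is_time_reversible(q_cyclic, pi_cyclic))  # detailed balance fails
```

The cyclic example has a direction of flow through the states, which is the kind of asymmetry that lets a nonreversible model carry information about where the root lies.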
6.
Neurogenetics ; 22(2): 133-136, 2021 05.
Article in English | MEDLINE | ID: mdl-33674996

ABSTRACT

Variants in the SCN1A gene have been identified in epilepsy patients with widely variable phenotypes and they are generally heterozygous. Here, we report a homozygous missense variant, NM_001165963.4: c.4319C>T (p.Ala1440Val), in the SCN1A gene which seemed to occur de novo together with a gene conversion event. It is highly possible that this variant, although located in a critical functional domain of protein Nav1.1, may not cause the complete loss of protein function, depending on the nature of the amino acid substitution. The accumulated effect of having this variant on both alleles results in a Dravet syndrome phenotype which is more severe than average. This first report of a de novo homozygous variant in the SCN1A gene, therefore, provides a clear illustration of a complex genotype-phenotype relationship.


Subject(s)
Brain Diseases/etiology , Epilepsies, Myoclonic/genetics , Mutation, Missense , NAV1.1 Voltage-Gated Sodium Channel/genetics , Point Mutation , Amino Acid Substitution , Autism Spectrum Disorder/genetics , Child Behavior Disorders/genetics , Drug Resistant Epilepsy/genetics , Epilepsies, Myoclonic/complications , Genetic Association Studies , Homozygote , Humans , Infant , Male , Protein Domains/genetics , Sleep Wake Disorders/genetics
7.
Syst Biol ; 70(5): 1046-1060, 2021 08 11.
Article in English | MEDLINE | ID: mdl-33616668

ABSTRACT

Amino acid substitution models play a crucial role in phylogenetic analyses. Maximum likelihood (ML) methods have been proposed to estimate amino acid substitution models; however, they are typically complicated and slow. In this article, we propose QMaker, a new ML method to estimate a general time-reversible Q matrix from a large protein data set consisting of multiple sequence alignments. QMaker combines an efficient ML tree search algorithm, a model selection for handling the model heterogeneity among alignments, and the consideration of rate mixture models among sites. We provide QMaker as a user-friendly function in the IQ-TREE software package (http://www.iqtree.org) supporting the use of multiple CPU cores so that biologists can easily estimate amino acid substitution models from their own protein alignments. We used QMaker to estimate new empirical general amino acid substitution models from the current Pfam database as well as five clade-specific models for mammals, birds, insects, yeasts, and plants. Our results show that the new models considerably improve the fit between model and data and in some cases influence the inference of phylogenetic tree topologies. [Amino acid replacement matrices; amino acid substitution models; maximum likelihood estimation; phylogenetic inferences.].


Subject(s)
Evolution, Molecular , Models, Genetic , Animals , Likelihood Functions , Phylogeny , Proteins/genetics , Sequence Alignment
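
For readers unfamiliar with the "general time-reversible Q matrix" named in the abstract above, the sketch below builds one from its two ingredients using the standard textbook parameterization: off-diagonal rates q_ij = r_ij * pi_j, diagonals set so each row sums to zero, and the whole matrix scaled to one expected substitution per unit time. This illustrates the object being estimated, not QMaker's estimation procedure; the function name and the 3-state toy values are assumptions (amino acid models use 20 states).

```python
def build_reversible_q(exchangeabilities, frequencies):
    """Build a normalized time-reversible rate matrix.

    exchangeabilities: symmetric matrix r (diagonal ignored)
    frequencies: stationary frequencies pi, summing to 1
    """
    n = len(frequencies)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exchangeabilities[i][j] * frequencies[j]
        # Diagonal makes the row sum to zero (q[i][i] is still 0.0 here).
        q[i][i] = -sum(q[i])
    # Expected number of substitutions per unit time at stationarity.
    rate = -sum(frequencies[i] * q[i][i] for i in range(n))
    return [[value / rate for value in row] for row in q]


# Hypothetical 3-state example.
r = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 3.0],
    [2.0, 3.0, 0.0],
]
pi = [0.5, 0.3, 0.2]
q_matrix = build_reversible_q(r, pi)
for row in q_matrix:
    print(" ".join(f"{value:8.4f}" for value in row))
```

Normalizing the matrix this way means branch lengths on a tree can be read as expected substitutions per site.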
8.
J Mol Evol ; 88(5): 445-452, 2020 07.
Article in English | MEDLINE | ID: mdl-32356020

ABSTRACT

Amino acid substitution models represent substitution rates among amino acids during evolution. The models play an important role in analyzing protein sequences, especially in inferring phylogenies. The rapid evolution of flaviviruses is an expanding threat to public health. A number of models have been estimated for some viruses; however, they are unable to properly represent amino acid substitution patterns of flaviviruses. In this study, we collected protein sequences from the flavivirus genus to specifically estimate an amino acid substitution model, called FLAVI, for flaviviruses. Experiments showed that the collected dataset was sufficient to estimate a stable model. More importantly, the FLAVI model was remarkably better than other existing models in analyzing flavivirus protein sequences. We recommend that researchers use the FLAVI model when studying protein sequences of flaviviruses or closely related viruses.


Subject(s)
Amino Acid Substitution , Flavivirus , Models, Genetic , Amino Acid Sequence , Flavivirus/genetics
9.
BMC Evol Biol ; 18(1): 11, 2018 02 02.
Article in English | MEDLINE | ID: mdl-29390973

ABSTRACT

BACKGROUND: The nonparametric bootstrap is widely used to measure the branch support of phylogenetic trees. However, bootstrapping is computationally expensive and remains a bottleneck in phylogenetic analyses. Recently, an ultrafast bootstrap approximation (UFBoot) approach was proposed for maximum likelihood analyses. However, such an approach is still missing for maximum parsimony. RESULTS: To close this gap we present MPBoot, an adaptation and extension of UFBoot to compute branch supports under the maximum parsimony principle. MPBoot works for both uniform and non-uniform cost matrices. Our analyses on biological DNA and protein showed that under uniform cost matrices, MPBoot runs on average 4.7 (DNA) to 7 times (protein data) (range: 1.2-20.7) faster than the standard parsimony bootstrap implemented in PAUP*; but 1.6 (DNA) to 4.1 times (protein data) slower than the standard bootstrap with a fast search routine in TNT (fast-TNT). However, for non-uniform cost matrices MPBoot is 5 (DNA) to 13 times (protein data) (range: 0.3-63.9) faster than fast-TNT. We note that MPBoot achieves better scores more frequently than PAUP* and fast-TNT. However, this effect is less pronounced if an intensive but slower search in TNT is invoked. Moreover, experiments on large-scale simulated data show that while both PAUP* and TNT bootstrap estimates are too conservative, MPBoot bootstrap estimates appear more unbiased. CONCLUSIONS: MPBoot provides an efficient alternative to the standard maximum parsimony bootstrap procedure. It shows favorable performance in terms of run time, the capability of finding a maximum parsimony tree, and high bootstrap accuracy on simulated as well as empirical data sets. MPBoot is easy-to-use, open-source and available at http://www.cibiv.at/software/mpboot.


Subject(s)
Phylogeny , Software , DNA/genetics , Likelihood Functions , Models, Genetic , Sequence Alignment , Time Factors
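
The nonparametric bootstrap that the abstract above takes as its baseline starts from a simple resampling step: alignment columns are drawn with replacement to form a pseudo-alignment of the same length, and a tree is then inferred from each replicate. The sketch below shows only that resampling step under the assumption of an alignment stored as a name-to-sequence dictionary; it is not MPBoot or UFBoot, whose contribution is avoiding a full tree search per replicate. The function name and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths)
    rng: a random.Random instance, passed in for reproducibility
    """
    n_sites = len(next(iter(alignment.values())))
    # The same sampled column indices are applied to every taxon so that
    # each resampled site remains a genuine column of the original data.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(sequence[c] for c in columns)
        for name, sequence in alignment.items()
    }


# Hypothetical toy alignment.
toy_alignment = {
    "taxon_A": "ACGTACGT",
    "taxon_B": "ACGTTCGT",
    "taxon_C": "ACCTACGA",
}
replicate = bootstrap_alignment(toy_alignment, random.Random(42))
for name, sequence in replicate.items():
    print(name, sequence)
```

Branch support is then the fraction of replicate trees that contain a given branch, which is why the cost of the standard procedure grows with the number of replicates.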
10.
Mol Biol Evol ; 35(2): 518-522, 2018 02 01.
Article in English | MEDLINE | ID: mdl-29077904

ABSTRACT

The standard bootstrap (SBS), despite being computationally intensive, is widely used in maximum likelihood phylogenetic analyses. We recently proposed the ultrafast bootstrap approximation (UFBoot) to reduce computing time while achieving more unbiased branch supports than SBS under mild model violations. UFBoot has been steadily adopted as an efficient alternative to SBS and other bootstrap approaches. Here, we present UFBoot2, which substantially accelerates UFBoot and reduces the risk of overestimating branch supports due to polytomies or severe model violations. Additionally, UFBoot2 provides suitable bootstrap resampling strategies for phylogenomic data. UFBoot2 is 778 times (median) faster than SBS and 8.4 times (median) faster than RAxML rapid bootstrap on tested data sets. UFBoot2 is implemented in the IQ-TREE software package version 1.6 and freely available at http://www.iqtree.org.


Subject(s)
Likelihood Functions , Phylogeny , Software , Models, Genetic
11.
J Biosci ; 40(1): 113-24, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25740146

ABSTRACT

We here present the first whole genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were deeply sequenced to 30-fold average coverage. The resulting short reads covered 99.91 percent of the human reference genome (GRCh37d5). We identified 4,719,412 SNPs and 827,385 short indels that satisfied the Mendelian inheritance law. Among them, 109,914 (2.3 percent) SNPs and 59,119 (7.1 percent) short indels were novel. We also detected 30,171 structural variants of which 27,604 (91.5 percent) were large indels. There were 6,681 large indels in the range 0.1-100 kbp occurring in the child genome that were also confirmed in either the father or mother genome. We compared these large indels against the DGV database and found that 1,499 (22.44 percent) were KHV specific. De novo assembly of high-quality unmapped reads yielded 789 contigs with the length greater than or equal to 300 bp. There were 235 contigs from the child genome of which 199 (84.7 percent) were significantly matched with at least one contig from the father or mother genome. Blasting these 199 contigs against other alternative human genomes revealed 4 novel contigs. The novel variants identified from our study demonstrated the necessity of conducting more genome-wide studies not only for Kinh but also for other ethnic groups in Vietnam.


Subject(s)
Ethnicity/genetics , Genome, Human/genetics , Asian People/genetics , Base Sequence , DNA/analysis , DNA/genetics , Family , Humans , INDEL Mutation/genetics , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA , Vietnam
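
The percentages quoted in the abstract above follow directly from the reported counts, and a short calculation makes the denominators explicit. The counts below are taken from the abstract; the dictionary layout and labels are only an illustrative way of organising them.

```python
# (part, whole) pairs as reported in the abstract.
reported_counts = {
    "novel SNPs": (109_914, 4_719_412),
    "novel short indels": (59_119, 827_385),
    "large indels among structural variants": (27_604, 30_171),
    "KHV-specific large indels": (1_499, 6_681),
    "child contigs matched to a parent": (199, 235),
}

percentages = {
    label: 100 * part / whole
    for label, (part, whole) in reported_counts.items()
}

for label, value in percentages.items():
    print(f"{label}: {value:.2f} percent")
```

Each computed value agrees with the figure in the abstract after rounding to the precision the authors used.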
12.
Mol Biol Evol ; 28(2): 873-7, 2011 Feb.
Article in English | MEDLINE | ID: mdl-20705907

ABSTRACT

Approaches to reconstruct phylogenies abound and are widely used in the study of molecular evolution. Partially through extensive simulations, we are beginning to understand the potential pitfalls as well as the advantages of different methods. However, little work has been done on possible biases introduced by the methods if the input data are random and do not carry any phylogenetic signal. Although Tree-Puzzle (Strimmer K, von Haeseler A. 1996. Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies. Mol Biol Evol. 13:964-969; Schmidt HA, Strimmer K, Vingron M, von Haeseler A. 2002. Tree-Puzzle: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502-504) has become common in phylogenetics, the resulting distribution of labeled unrooted bifurcating trees when data do not carry any phylogenetic signal has not been investigated. Our note shows that the distribution converges to the well-known Yule-Harding distribution. However, the bias of the Yule-Harding distribution will be diminished by a tiny amount of phylogenetic information. [maximum likelihood; phylogenetic reconstruction; Tree-Puzzle; tree distribution; Yule-Harding distribution.]


Subject(s)
Models, Genetic , Phylogeny , Algorithms , Bayes Theorem
13.
Cladistics ; 26(1): 72-85, 2010 Feb.
Article in English | MEDLINE | ID: mdl-34875752

ABSTRACT

We present POY version 4, an open source program for the phylogenetic analysis of morphological, prealigned sequence, unaligned sequence, and genomic data. POY allows phylogenetic inference when not only substitutions, but insertions, deletions, and rearrangement events are allowed (computed using the breakpoint or inversion distance). Compared with previous versions, POY 4 provides greater flexibility, a larger number of supported parameter sets, numerous execution time improvements, a vastly improved user interface, greater quality control, and extensive documentation. We introduce POY's basic features, and present a simple example illustrating the performance improvements over previous versions of the application. © The Willi Hennig Society 2009.

14.
Genome Inform ; 17(2): 141-51, 2006.
Article in English | MEDLINE | ID: mdl-17503387

ABSTRACT

The increase of available genomes poses new optimization problems in genome comparisons. A genome can be considered as a sequence of characters (loci) which are genes or segments of nucleotides. Genomes are subject to both nucleotide transformation and character order rearrangement processes. In this context, we define a problem of so-called pairwise alignment with rearrangements (PAR) between two genomes. The PAR generalizes the ordinary pairwise alignment by allowing the rearrangement of character order. The objective is to find the optimal PAR that minimizes the total cost, which is composed of three factors: the edit cost between characters, the deletion/insertion cost of characters, and the rearrangement cost between character orders. To this end, we propose simple and effective heuristic methods: character moving and simultaneous character swapping. The efficiency of the methods is tested on Metazoa mitochondrial genomes. Experiments show that pairwise alignments with rearrangements give better performance than ordinary pairwise alignments without rearrangements. The best proposed method, simultaneous character swapping, is implemented as an essential subroutine in our software POY version 4.0 to reconstruct genome-based phylogenies.


Subject(s)
Gene Rearrangement , Genome , Phylogeny , Recombination, Genetic , Sequence Alignment , Algorithms , Animals , Base Sequence , DNA, Mitochondrial/genetics , Databases, Genetic , Evolution, Molecular , Gene Order , Mitochondria/genetics , Models, Genetic , Mutagenesis, Insertional , Software
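
One simple way to put a number on the "rearrangement cost between character orders" mentioned in the abstract above is the breakpoint distance, which the POY 4 entry also names. The sketch below is a simplified version under stated assumptions: gene orders are linear, unsigned (orientation ignored), and contain the same genes exactly once each. It is not POY's implementation, and the function name and gene labels are hypothetical.

```python
def breakpoint_distance(order_a, order_b):
    """Count adjacencies in order_a that do not occur in order_b.

    Both arguments are lists of the same gene labels, each appearing once.
    Adjacency is treated as unordered, so a reversed genome has distance 0.
    """
    if len(set(order_a)) != len(order_a) or sorted(order_a) != sorted(order_b):
        raise ValueError("gene orders must contain the same genes exactly once")
    adjacencies_b = {frozenset(pair) for pair in zip(order_b, order_b[1:])}
    return sum(
        1
        for pair in zip(order_a, order_a[1:])
        if frozenset(pair) not in adjacencies_b
    )


# Hypothetical gene orders: genes b and c have swapped places.
genome_1 = ["a", "b", "c", "d"]
genome_2 = ["a", "c", "b", "d"]
print(breakpoint_distance(genome_1, genome_2))
```

In the swapped example only the b-c adjacency survives, so two of the three adjacencies in the first order are breakpoints.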
15.
Bioinformatics ; 21(19): 3794-6, 2005 Oct 01.
Article in English | MEDLINE | ID: mdl-16046495

ABSTRACT

SUMMARY: IQPNNI is a program to infer maximum-likelihood phylogenetic trees from DNA or protein data with a large number of sequences. We present an improved and MPI-parallel implementation showing very good scaling and speed-up behavior.


Subject(s)
Algorithms , Computing Methodologies , Evolution, Molecular , Models, Genetic , Phylogeny , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Animals , Base Sequence , Humans , Likelihood Functions , Molecular Sequence Data
16.
BMC Bioinformatics ; 6: 92, 2005 Apr 08.
Article in English | MEDLINE | ID: mdl-15819989

ABSTRACT

BACKGROUND: Understanding the evolutionary relationships among species based on their genetic information is one of the primary objectives in phylogenetic analysis. Reconstructing phylogenies for large data sets is still a challenging task in Bioinformatics. RESULTS: We propose a new distance-based clustering method, the shortest triplet clustering algorithm (STC), to reconstruct phylogenies. The main idea is the introduction of a natural definition of so-called k-representative sets. Based on k-representative sets, shortest triplets are reconstructed and serve as building blocks for the STC algorithm to agglomerate sequences for tree reconstruction in O(n^2) time for n sequences. Simulations show that STC gives better topological accuracy than other tested methods that also build a first starting tree. STC appears as a very good method to start the tree reconstruction. However, all tested methods give similar results if balanced nearest neighbor interchange (BNNI) is applied as a post-processing step. BNNI leads to an improvement in all instances. The program is available at http://www.bi.uni-duesseldorf.de/software/stc/. CONCLUSION: The results demonstrate that the new approach efficiently reconstructs phylogenies for large data sets. We found that BNNI boosts the topological accuracy of all methods including STC; therefore, one should use BNNI as a post-processing step to get better topological accuracy.


Subject(s)
Computational Biology/methods , Data Interpretation, Statistical , Algorithms , Base Sequence , Cluster Analysis , Computer Simulation , Computers , Evolution, Molecular , Internet , Likelihood Functions , Models, Genetic , Models, Statistical , Pattern Recognition, Automated , Phylogeny , Sequence Alignment , Sequence Analysis, DNA , Software
17.
Mol Biol Evol ; 21(8): 1565-71, 2004 Aug.
Article in English | MEDLINE | ID: mdl-15163768

ABSTRACT

An efficient tree reconstruction method (IQPNNI) is introduced to reconstruct a phylogenetic tree based on DNA or amino acid sequence data. Our approach combines various fast algorithms to generate a list of potential candidate trees. The key ingredient is the definition of so-called important quartets (IQs), which allow the computation of an intermediate tree in O(n^2) time for n sequences. The resulting tree is then further optimized by applying the nearest neighbor interchange (NNI) operation. Subsequently a random fraction of the sequences is deleted from the best tree found so far. The deleted sequences are then re-inserted in the smaller tree using the important quartet puzzling (IQP) algorithm. These steps are repeated several times and the best tree, with respect to the likelihood criterion, is considered as the inferred phylogenetic tree. Moreover, we suggest a rule which indicates when to stop the search. Simulations show that IQPNNI gives a slightly better accuracy than other programs tested. Moreover, we applied the approach to 218 small subunit rRNA sequences and 500 rbcL sequences. We found trees with higher likelihood compared to the results by others. A program to reconstruct DNA or amino acid based phylogenetic trees is available online (http://www.bi.uni-duesseldorf.de/software/iqpnni).


Subject(s)
Algorithms , Evolution, Molecular , Models, Genetic , Phylogeny , Amino Acid Sequence , Animals , Base Sequence , Humans , Molecular Sequence Data , Sequence Analysis, DNA/methods