1.
PLoS One ; 13(2): e0192081, 2018.
Article in English | MEDLINE | ID: mdl-29389981

ABSTRACT

BACKGROUND & METHODS: The ICONIC project has developed an automated high-throughput pipeline to generate HIV nearly full-length genomes (NFLG, i.e. from gag to nef) from next-generation sequencing (NGS) data. The pipeline was applied to 420 HIV samples collected at University College London Hospitals NHS Trust and Barts Health NHS Trust (London) and sequenced using an Illumina MiSeq at the Wellcome Trust Sanger Institute (Cambridge). Consensus genomes were generated and subtyped using COMET, and unique recombinants were studied with jpHMM and SimPlot. Maximum-likelihood phylogenetic trees were constructed using RAxML to identify transmission networks using the Cluster Picker.
RESULTS: The pipeline generated sequences of at least 1 kb in length (median = 7.46 kb, IQR = 4.01 kb) for 375 out of the 420 samples (89%), with 174 (46.4%) being NFLG. A total of 365 sequences (169 of them NFLG) corresponded to unique subjects and were included in the downstream analyses. The most frequent HIV subtypes were B (n = 149, 40.8%) and C (n = 77, 21.1%) and the circulating recombinant form CRF02_AG (n = 32, 8.8%). We found 14 different CRFs (n = 66, 18.1%) and multiple URFs (n = 32, 8.8%) that involved recombination between 12 different subtypes/CRFs. The most frequent URFs were B/CRF01_AE (4 cases) and A1/D, B/C, and B/CRF02_AG (3 cases each). Most URFs (19/26, 73%) lacked breakpoints in the PR+RT pol region, rendering them undetectable if only that region were sequenced. Twelve (37.5%) of the URFs could have emerged within the UK, whereas the rest were probably imported from sub-Saharan Africa, South East Asia and South America. For 2 URFs we found highly similar pol sequences circulating in the UK. We detected 31 phylogenetic clusters using the full dataset: 25 pairs (mostly subtypes B and C), 4 triplets and 2 quadruplets. Some of these were not consistent across different genes due to inter- and intra-subtype recombination. Clusters involved 70 sequences, 19.2% of the dataset.
CONCLUSIONS: The initial analysis of genome sequences detected substantial hidden variability in the London HIV epidemic. Analysing full genome sequences, as opposed to only PR+RT, identified previously undetected recombinants. It provided a more reliable description of CRFs (which would otherwise be misclassified) and transmission clusters.
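
The headline counts in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the numbers are copied from the abstract, while the variable names and the `pct` helper are assumptions introduced here, not part of the ICONIC pipeline.

```python
# Figures copied from the abstract; names are illustrative only.
samples_total = 420
sequences_ge_1kb = 375      # sequences of at least 1 kb
nflg = 174                  # nearly full-length genomes among those 375
unique_subjects = 365       # sequences kept for downstream analyses

# Cluster size -> number of clusters of that size (pairs, triplets, quadruplets).
clusters = {2: 25, 3: 4, 4: 2}

n_clusters = sum(clusters.values())
n_clustered = sum(size * count for size, count in clusters.items())


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print("clusters:", n_clusters)
print("clustered sequences:", n_clustered)
print("share of dataset (%):", pct(n_clustered, unique_subjects))
print("NFLG share of >=1 kb sequences (%):", pct(nflg, sequences_ge_1kb))
```

The cluster sizes add up to the 31 clusters and 70 sequences reported, and 70 of 365 sequences matches the stated 19.2%.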


Subjects
Viral Genome , HIV-1/classification , Adult , Female , HIV-1/genetics , Humans , London , Male , Middle Aged , Phylogeny , Genetic Recombination
2.
Artif Intell Med ; 69: 22-32, 2016 05.
Article in English | MEDLINE | ID: mdl-27235802

ABSTRACT

OBJECTIVE: This work aims to predict the patient discharge outcome on each hospitalization day by introducing a new paradigm: evolving classification of event data streams. Most classification algorithms implicitly assume the values of all predictive features to be available at the time of making the prediction. This assumption does not necessarily hold in the evolving classification setting (such as intensive care patient monitoring), where we may be interested in classifying the monitored entities as early as possible, based on the attributes initially available to the classifier, and then keep refining our classification model at each time step (e.g., on a daily basis) with the arrival of additional attributes.
MATERIALS AND METHODS: An oblivious read-once decision-tree algorithm, called information network (IN), is extended to deal with evolving classification. The new algorithm, named incremental information network (IIN), restricts the order of selected features by the temporal order of feature arrival. The IIN algorithm is compared to six other evolving classification approaches on an 8-year dataset of adult patients admitted to two Intensive Care Units (ICUs) in the United Kingdom.
RESULTS: Retrospective study of 3452 episodes of adult patients (≥16 years of age) admitted to the ICUs of Guy's and St. Thomas' hospitals in London between 2002 and 2009. Random partition (66:34) into a development (training) set n=2287 and validation set n=1165. Episode-related time steps: Day 0, time of ICU admission; Day x, end of the x-th day in the ICU. The most accurate decision-tree models, based on the area under the curve (AUC): Day 0: IN (AUC=0.652), Day 1: IIN (AUC=0.660), Day 2: J48 decision-tree algorithm (AUC=0.678), Days 3-7: regenerative IN (AUC=0.717-0.772). Logistic regression AUC: 0.582 (Day 0) to 0.827 (Day 7).
CONCLUSIONS: Our experimental results have not identified a single optimal approach for evolving classification of ICU episodes. On Days 0 and 1, the IIN algorithm produced the simplest and the most accurate models, which incorporate the temporal order of feature arrival. However, starting with Day 2, regenerative approaches reached better performance in terms of predictive accuracy.
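
Two ideas in this abstract lend themselves to a small sketch: the constraint that features are used in the order in which they arrive, and the AUC used to compare models. The code below is a minimal illustration under stated assumptions, not the authors' IIN implementation. The feature names and arrival days are hypothetical, `respects_arrival_order` is one plausible reading of "restricts the order of selected features by the temporal order of feature arrival", and `auc` uses the standard pairwise definition (the probability that a positive case is scored above a negative one, with ties counted as one half).

```python
def respects_arrival_order(selected, arrival_day):
    """True if features are listed in non-decreasing order of arrival day."""
    days = [arrival_day[name] for name in selected]
    return all(earlier <= later for earlier, later in zip(days, days[1:]))


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 outcomes; scores: predicted scores, same length.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical features and the ICU day on which each becomes available.
arrival_day = {"age": 0, "admission_type": 0, "day1_lactate": 1, "day2_urea": 2}

print(respects_arrival_order(["age", "day1_lactate", "day2_urea"], arrival_day))
print(respects_arrival_order(["day2_urea", "age"], arrival_day))

# Toy validation set: outcome labels and model scores (made up for illustration).
print(auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for a validation set of about a thousand episodes; a rank-based formulation would be the usual choice for larger data.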


Subjects
Algorithms , Decision Trees , Intensive Care Units/statistics & numerical data , Area Under Curve , Critical Care , Humans , Logistic Models , Computer Neural Networks , Retrospective Studies
3.
Nucleic Acids Res ; 42(Web Server issue): W252-8, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24782522

ABSTRACT

Protein structure homology modelling has become a routine technique to generate 3D models for proteins when experimental structures are not available. Fully automated servers such as SWISS-MODEL with user-friendly web interfaces generate reliable models without the need for complex software packages or downloading large databases. Here, we describe the latest version of the SWISS-MODEL expert system for protein structure modelling. The SWISS-MODEL template library provides annotation of quaternary structure and essential ligands and co-factors to allow for building of complete structural models, including their oligomeric structure. The improved SWISS-MODEL pipeline makes extensive use of model quality estimation for selection of the most suitable templates and provides estimates of the expected accuracy of the resulting models. The accuracy of the models generated by SWISS-MODEL is continuously evaluated by the CAMEO system. The new web site allows users to interactively search for templates, cluster them by sequence similarity, structurally compare alternative templates and select the ones to be used for model building. In cases where multiple alternative template structures are available for a protein of interest, a user-guided template selection step allows building models in different functional states. SWISS-MODEL is available at http://swissmodel.expasy.org/.


Subjects
Molecular Models , Quaternary Protein Structure , Tertiary Protein Structure , Software , Protein Structural Homology , Molecular Evolution , Internet
4.
Proteins ; 82 Suppl 2: 154-63, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24339001

ABSTRACT

The identification of amino acid residues in proteins involved in binding small molecule ligands is an important step for their functional characterization, as the function of a protein often depends on specific interactions with other molecules. The accuracy of computational methods aiming to predict such binding residues was evaluated within the "function prediction (prediction of binding sites, FN)" category of the critical assessment of protein structure prediction (CASP) experiment. In the last edition of the experiment (CASP10), 17 research groups participated in this category, and their predictions were evaluated on 13 prediction targets containing biologically relevant ligands. The results of this experiment indicate that several methods achieved an overall good performance, showing the usefulness of such methods in predicting ligand binding residues. As in previous years, methods based on a homology transfer approach dominated. In comparison to CASP9, a larger fraction of the top predictors are automated servers. However, due to the small number of targets and the characteristics of the prediction format, the differences observed among the first ten methods were not statistically significant, and it was also not possible to analyze differences in accuracy for different ligand types or overall structure difficulty. To overcome these limitations and to allow for a more detailed evaluation, in future editions of CASP, methods in the FN category will no longer be evaluated on the "normal" CASP targets, but assessed continuously by CAMEO (continuous automated model evaluation) based on weekly prereleased sequences from the PDB.


Subjects
Binding Sites , Computational Biology/methods , Protein Conformation , Proteins , Molecular Models , Statistical Models , Proteins/chemistry , Proteins/metabolism , Protein Sequence Analysis/methods
5.
Proteins ; 79 Suppl 10: 126-36, 2011.
Article in English | MEDLINE | ID: mdl-21987472

ABSTRACT

Interactions between proteins and their ligands play central roles in many physiological processes. The structural details for most of these interactions, however, have not yet been characterized experimentally. Therefore, various computational tools have been developed to predict the location of binding sites and the amino acid residues interacting with ligands. In this manuscript, we assess the performance of 33 methods participating in the ligand-binding site prediction category in CASP9. The overall accuracy of ligand-binding site predictions in CASP9 appears rather high (average Matthews correlation coefficient of 0.62 for the 10 top performing groups) and, compared to previous experiments, more groups performed equally well. However, this should be seen in the context of a strong bias in the test data toward easy template-based models. Overall, the top performing methods have converged to a similar approach using ligand-binding site inference from related homologous structures, which limits their applicability for difficult de novo prediction targets. Here, we present the results of the CASP9 assessment of the ligand-binding site category, discuss examples of successful and challenging prediction targets in CASP9, and finally suggest changes in the format of the experiment to overcome the current limitations of the assessment.
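
For readers unfamiliar with the score quoted above, the Matthews correlation coefficient (MCC) summarizes a binary prediction, here "residue binds ligand" versus "residue does not", in a single value between -1 and +1. The sketch below uses the standard MCC definition; the per-residue framing, the function name and the example counts are assumptions for illustration and are not taken from the CASP9 assessment itself. Returning 0.0 when the denominator is zero is a common convention, also assumed here.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    Returns 0.0 when any marginal total is zero (assumed convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Made-up example: 6 binding residues predicted correctly, 1 false alarm,
# 2 binding residues missed, 3 non-binding residues predicted correctly.
print(round(mcc(tp=6, tn=3, fp=1, fn=2), 3))
```

Because binding residues are usually a small minority of a protein's residues, MCC is more informative than plain accuracy, which a predictor could inflate by labelling every residue as non-binding.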


Subjects
Computational Biology/methods , Proteins/chemistry , Proteins/metabolism , Amino Acid Sequence , Binding Sites , Protein Databases , Ligands , Biological Models , Molecular Models , Protein Binding