Results 1 - 20 of 29
1.
Bioinformatics ; 36(24): 5582-5589, 2021 Apr 05.
Article in English | MEDLINE | ID: mdl-33399819

ABSTRACT

MOTIVATION: Population-scale sequenced cohorts are foundational resources for genetic analyses, but processing raw reads into analysis-ready cohort-level variants remains challenging. RESULTS: We introduce an open-source cohort-calling method that uses the highly accurate caller DeepVariant and scalable merging tool GLnexus. Using callset quality metrics based on variant recall and precision in benchmark samples and Mendelian consistency in father-mother-child trios, we optimize the method across a range of cohort sizes, sequencing methods and sequencing depths. The resulting callsets show consistent quality improvements over those generated using existing best practices with reduced cost. We further evaluate our pipeline in the deeply sequenced 1000 Genomes Project (1KGP) samples and show superior callset quality metrics and imputation reference panel performance compared to an independently generated GATK Best Practices pipeline. AVAILABILITY AND IMPLEMENTATION: We publicly release the 1KGP individual-level variant calls and cohort callset (https://console.cloud.google.com/storage/browser/brain-genomics-public/research/cohort/1KGP) to foster additional development and evaluation of cohort merging methods as well as broad studies of genetic variation. Both DeepVariant (https://github.com/google/deepvariant) and GLnexus (https://github.com/dnanexus-rnd/GLnexus) are open-source, and the optimized GLnexus setup discovered in this study is also integrated into GLnexus public releases v1.2.2 and later. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

2.
Bioinformatics ; 36(22-23): 5537-5538, 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33300997

ABSTRACT

SUMMARY: Variant Call Format (VCF), the prevailing representation for germline genotypes in population sequencing, suffers rapid size growth as larger cohorts are sequenced and more rare variants are discovered. We present Sparse Project VCF (spVCF), an evolution of VCF with judicious entropy reduction and run-length encoding, delivering >10× size reduction for modern studies with practically minimal information loss. spVCF interoperates with VCF efficiently, including tabix-based random access. We demonstrate its effectiveness with the DiscovEHR and UK Biobank whole-exome sequencing cohorts. AVAILABILITY AND IMPLEMENTATION: Apache-licensed reference implementation: github.com/mlin/spVCF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics; Software; Base Sequence; Genotype; Germ Cells
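
The run-length encoding mentioned in the spVCF abstract can be illustrated with a generic sketch. This is not the spVCF file format or its reference implementation; the function names and the example row are hypothetical, and the code only shows why long runs of identical genotype cells compress well.

```python
from itertools import groupby


def rle_encode(cells):
    """Collapse consecutive identical cells into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(cells)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original cell list."""
    cells = []
    for value, count in pairs:
        cells.extend([value] * count)
    return cells


# Hypothetical row of genotype calls: a rare variant carried by one sample.
row = ["0/0"] * 6 + ["0/1"] + ["0/0"] * 3
encoded = rle_encode(row)
print(encoded)
```

In a cohort where most samples are homozygous reference at a rare-variant site, the ten cells above shrink to three pairs, and decoding restores the row exactly.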
3.
Gigascience ; 9(10), 2020 Oct 15.
Article in English | MEDLINE | ID: mdl-33057676

ABSTRACT

BACKGROUND: Metagenomic next-generation sequencing (mNGS) has enabled the rapid, unbiased detection and identification of microbes without pathogen-specific reagents, culturing, or a priori knowledge of the microbial landscape. mNGS data analysis requires a series of computationally intensive processing steps to accurately determine the microbial composition of a sample. Existing mNGS data analysis tools typically require bioinformatics expertise and access to local server-class hardware resources. For many research laboratories, this presents an obstacle, especially in resource-limited environments. FINDINGS: We present IDseq, an open source cloud-based metagenomics pipeline and service for global pathogen detection and monitoring (https://idseq.net). The IDseq Portal accepts raw mNGS data, performs host and quality filtration steps, then executes an assembly-based alignment pipeline, which results in the assignment of reads and contigs to taxonomic categories. The taxonomic relative abundances are reported and visualized in an easy-to-use web application to facilitate data interpretation and hypothesis generation. Furthermore, IDseq supports environmental background model generation and automatic internal spike-in control recognition, providing statistics that are critical for data interpretation. IDseq was designed with the specific intent of detecting novel pathogens. Here, we benchmark novel virus detection capability using both synthetically evolved viral sequences and real-world samples, including IDseq analysis of a nasopharyngeal swab sample acquired and processed locally in Cambodia from a tourist from Wuhan, China, infected with the recently emergent SARS-CoV-2. CONCLUSION: The IDseq Portal reduces the barrier to entry for mNGS data analysis and enables bench scientists, clinicians, and bioinformaticians to gain insight from mNGS datasets for both known and novel pathogens.


Subjects
Betacoronavirus/genetics; Cloud Computing; Coronavirus Infections/virology; Metagenome; Metagenomics/methods; Pneumonia, Viral/virology; Betacoronavirus/pathogenicity; COVID-19; Coronavirus Infections/diagnosis; Databases, Genetic; High-Throughput Nucleotide Sequencing/methods; Humans; Pandemics; Pneumonia, Viral/diagnosis; SARS-CoV-2; Software
4.
J Ultrasound Med ; 39(7): 1335-1342, 2020 Jul.
Article in English | MEDLINE | ID: mdl-31995242

ABSTRACT

OBJECTIVES: To determine patient and procedural risk factors for major complications in ultrasound (US)-guided random renal core biopsy. METHODS: Random renal biopsies performed by radiologists in the US department at a single institution between 2014 and 2018 were retrospectively reviewed. The patient's age, sex, race, and estimated glomerular filtration rate (eGFR) were recorded. The biopsy approach, needle gauge, length of cores, number of throws, and presence of a color flow tract were recorded. Outcome data included minor and major complications. Associations between variables were tested with χ2 analyses and univariable/multivariable logistic regression models. RESULTS: A total of 231 biopsies (167 native and 64 allografts) were reviewed. There was no significant difference in the sex, age, race, or eGFR between native and allograft groups. The overall rate for any complication was 18.2%, with a 4.3% rate of major complications, which was significantly greater in native compared to allograft biopsies (6% versus 0%; P = .045). A risk analysis in native biopsies only showed that major complications were significantly associated with a low eGFR such that patients with stage 4 or 5 kidney disease had higher odds of complications (odds ratio [95% confidence interval]: stage 4, 9.405 [1.995-44.338]; P = .0393; stage 5, 10.749 [2.218-52.080]; P = .0203) than patients with normal function (eGFR >60 mL/min). The presence of a color flow tract portended a 10.7 times greater risk of having any complication (95% confidence interval, 4.595-24.994; P < .001). Other procedural factors were not significantly associated with complications. CONCLUSIONS: There is an increased risk of major complications in US-guided random native kidney biopsy in patients with a low eGFR (<30 mL/min) and a patent color flow tract in the immediate postbiopsy setting.


Subjects
Image-Guided Biopsy; Ultrasonography, Interventional; Biopsy; Biopsy, Large-Core Needle; Humans; Kidney/diagnostic imaging; Retrospective Studies
5.
F1000Res ; 8: 1751, 2019.
Article in English | MEDLINE | ID: mdl-34386196

ABSTRACT

In March 2019, 45 scientists and software engineers from around the world converged at the University of California, Santa Cruz for the first pangenomics codeathon. The purpose of the meeting was to propose technical specifications and standards for a usable human pangenome as well as to build relevant tools for genome graph infrastructures. During the meeting, the group held several intense and productive discussions covering a diverse set of topics, including advantages of graph genomes over a linear reference representation, design of new methods that can leverage graph-based data structures, and novel visualization and annotation approaches for pangenomes. Additionally, the participants self-organized themselves into teams that worked intensely over a three-day period to build a set of pipelines and tools for specific pangenomic applications. A summary of the questions raised and the tools developed are reported in this manuscript.

6.
J Ultrasound Med ; 38(3): 581-586, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30043431

ABSTRACT

OBJECTIVES: Image-guided tissue sampling in the workup of suspected lymphoma can be performed by core needle biopsy (CNB) or CNB with fine-needle aspiration (FNA). We compared the yield of clinically actionable diagnoses between these methods of tissue sampling. METHODS: All ultrasound-guided percutaneous peripheral lymph node biopsies from 2010 to 2017 at a single institution were retrospectively reviewed for biopsy type (CNB versus CNB + FNA), prior diagnosis of lymphoma, size of the target lymph node, number of cores, length of core specimens, and pathologic diagnosis. Lymphoma and lymphoid tissue were included; metastatic disease and nonlymphoid tissue were excluded. An oncologist specializing in lymphoma independently determined whether an actionable diagnosis could be made with the pathologic results in the context of the patient's medical record. χ2 analyses and univariable/multivariable logistic regression models were used for statistical analyses. RESULTS: Of 578 lymph node biopsies, 306 (53%) had a prior diagnosis of lymphoma; 273 (47%) were CNB, and 305 (53%) were CNB + FNA. There was no significant difference between biopsy types (CNB versus CNB + FNA) in the number of cores (median [25th, 75th percentiles], 3 [3, 4] versus 4 [3, 4]; P = .47) or total length of tissue (4.1 [2.5, 6.1] versus 3.7 [2.3, 6] cm; P = .09). There was no difference in obtaining an actionable diagnosis between biopsy types after controlling for a known history of lymphoma (P = .271) or after controlling for the number of core specimens (P = .826). CONCLUSIONS: In cases of suspected lymphoma, CNB without FNA was sufficient to obtain an actionable diagnosis.


Subjects
Lymph Nodes/diagnostic imaging; Lymph Nodes/pathology; Lymphoma/diagnostic imaging; Lymphoma/pathology; Ultrasonography, Interventional/methods; Adult; Aged; Aged, 80 and over; Biopsy, Fine-Needle; Biopsy, Large-Core Needle; Female; Humans; Image-Guided Biopsy/methods; Male; Middle Aged; Retrospective Studies; Young Adult
7.
Nat Biotechnol ; 36(9): 875-879, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30125266

ABSTRACT

Reference genomes guide our interpretation of DNA sequence data. However, conventional linear references represent only one version of each locus, ignoring variation in the population. Poor representation of an individual's genome sequence impacts read mapping and introduces bias. Variation graphs are bidirected DNA sequence graphs that compactly represent genetic variation across a population, including large-scale structural variation such as inversions and duplications. Previous graph genome software implementations have been limited by scalability or topological constraints. Here we present vg, a toolkit of computational methods for creating, manipulating, and using these structures as references at the scale of the human genome. vg provides an efficient approach to mapping reads onto arbitrary variation graphs using generalized compressed suffix arrays, with improved accuracy over alignment to a linear reference, and effectively removing reference bias. These capabilities make using variation graphs as references for DNA sequencing practical at a gigabase scale, or at the topological complexity of de novo assemblies.


Subjects
Genetic Variation; Computer Simulation; DNA/genetics; Humans
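
As a rough illustration of the variation-graph idea described in the vg abstract, the sketch below stores sequence on nodes and spells out each haplotype as a walk through the graph. It is a simplified, forward-strand-only toy: it ignores the bidirected edges that real variation graphs use, and none of the names correspond to the vg toolkit's API.

```python
# Hypothetical toy graph: a single-base variant (A or C) between two shared segments.
nodes = {1: "GATT", 2: "A", 3: "C", 4: "ACA"}
edges = {1: [2, 3], 2: [4], 3: [4], 4: []}


def spell(path):
    """Concatenate node sequences along a path, checking every step is an edge."""
    for src, dst in zip(path, path[1:]):
        if dst not in edges[src]:
            raise ValueError(f"no edge from node {src} to node {dst}")
    return "".join(nodes[node_id] for node_id in path)


def all_paths(start, end):
    """Enumerate every walk from start to end (assumes the graph is acyclic)."""
    if start == end:
        return [[end]]
    return [[start] + rest for nxt in edges[start] for rest in all_paths(nxt, end)]


haplotypes = [spell(path) for path in all_paths(1, 4)]
print(haplotypes)
```

Both alleles share nodes 1 and 4, so the graph stores the common sequence once, whereas a linear reference would carry only one of the two spelled sequences.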
8.
Mol Biol Evol ; 33(12): 3108-3132, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27604222

ABSTRACT

Translational stop codon readthrough emerged as a major regulatory mechanism affecting hundreds of genes in animal genomes, based on recent comparative genomics and ribosomal profiling evidence, but its evolutionary properties remain unknown. Here, we leverage comparative genomic evidence across 21 Anopheles mosquitoes to systematically annotate readthrough genes in the malaria vector Anopheles gambiae, and to provide the first study of abundant readthrough evolution, by comparison with 20 Drosophila species. Using improved comparative genomics methods for detecting readthrough, we identify evolutionary signatures of conserved, functional readthrough of 353 stop codons in the malaria vector, Anopheles gambiae, and of 51 additional Drosophila melanogaster stop codons, including several cases of double and triple readthrough and of readthrough of two adjacent stop codons. We find that most differences between the readthrough repertoires of the two species arose from readthrough gain or loss in existing genes, rather than birth of new genes or gene death; that readthrough-associated RNA structures are sometimes gained or lost while readthrough persists; that readthrough is more likely to be lost at TAA and TAG stop codons; and that readthrough is under continued purifying evolutionary selection in mosquito, based on population genetic evidence. We also determine readthrough-associated gene properties that predate readthrough, and identify differences in the characteristic properties of readthrough genes between clades. We estimate more than 600 functional readthrough stop codons in mosquito and 900 in fruit fly, provide evidence of readthrough control of peroxisomal targeting, and refine the phylogenetic extent of abundant readthrough as following divergence from centipede.


Subjects
Anopheles/genetics; Anopheles/metabolism; Codon, Terminator; Peptide Chain Termination, Translational; Animals; Biological Evolution; Codon; Drosophila melanogaster; Evolution, Molecular; Genomics; Open Reading Frames; Phylogeny; Protein Biosynthesis; Ribosomes/genetics; Ribosomes/metabolism
9.
Genome Biol ; 16: 38, 2015 Feb 17.
Article in English | MEDLINE | ID: mdl-25853568

ABSTRACT

BACKGROUND: The increasing availability of sequence data for many viruses provides power to detect regions under unusual evolutionary constraint at a high resolution. One approach leverages the synonymous substitution rate as a signature to pinpoint genic regions encoding overlapping or embedded functional elements. Protein-coding regions in viral genomes often contain overlapping RNA structural elements, reading frames, regulatory elements, microRNAs, and packaging signals. Synonymous substitutions in these regions would be selectively disfavored and thus these regions are characterized by excess synonymous constraint. Codon choice can also modulate transcriptional efficiency, translational accuracy, and protein folding. RESULTS: We developed a phylogenetic codon model-based framework, FRESCo, designed to find regions of excess synonymous constraint in short, deep alignments, such as individual viral genes across many sequenced isolates. We demonstrated the high specificity of our approach on simulated data and applied our framework to the protein-coding regions of approximately 30 distinct species of viruses with diverse genome architectures. CONCLUSIONS: FRESCo recovers known multifunctional regions in well-characterized viruses such as hepatitis B virus, poliovirus, and West Nile virus, often at a single-codon resolution, and predicts many novel functional elements overlapping viral genes, including in Lassa and Ebola viruses. In a number of viruses, the synonymously constrained regions that we identified also display conserved, stable predicted RNA structures, including putative novel elements in multiple viral species.


Subjects
Evolution, Molecular; Genome, Viral; Open Reading Frames/genetics; Viruses/genetics; Codon/genetics; Conserved Sequence; Ebolavirus/genetics; Hepatitis B virus/genetics; Humans; Lassa virus/genetics; MicroRNAs/genetics; Phylogeny; Poliovirus/genetics; Sequence Alignment; Silent Mutation/genetics; West Nile virus/genetics
10.
Abdom Imaging ; 40(6): 1666-74, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25488345

ABSTRACT

OBJECTIVE: To determine the effectiveness of the CT histogram method to characterize indeterminate adrenal nodules above 10 Hounsfield units (HU) on noncontrast CT. MATERIALS AND METHODS: Retrospective review of clinical CT data from January 2005 through 2008 identified 194 indeterminate adrenal nodules (>10 HU on noncontrast CT) in 175 patients. 20 nodules in 18 patients were excluded due to large standard deviation (SD > 30) of HU values. Of the remaining 174 nodules, 131 were classified as benign lipid-poor nodules based on size stability for ≥1 year (104), in- and opposed-phase MRI (17), adrenal washout CT (3), or biopsy (7). 43 were classified as malignant by size increase over a short time (30), avid FDG uptake on PET/CT (15), or biopsy (5). Histogram analysis was performed by drawing a circular region of interest on all adrenal nodules. Mean attenuation, total number of pixels, number of negative pixels, and percentage of negative pixels were recorded for each nodule. RESULTS: At the threshold value of >10% negative pixels, 59/131 benign nodules were correctly characterized, but 1/43 malignant nodules was falsely characterized as benign (sensitivity 45%, specificity 98%, positive predictive value 98%). With a slightly higher threshold value of >15% negative pixels, there were no false benign judgments. 36 nodules had more than 15% negative pixels, all of which were benign (sensitivity 27%, specificity 100%, positive predictive value 100%). In the subgroup of benign nodules measuring 11-20 HU, 80% and 54% were identified with threshold values of >10% and >15% negative pixels, respectively. CONCLUSION: The CT histogram method with a threshold value of >10% negative pixels can identify many benign adrenal nodules with attenuation values >10 HU on unenhanced CT with extremely high specificity. A threshold of >15% negative pixels can achieve 100% specificity. This method is highly robust provided very "noisy" CT examinations (SD > 30) are eliminated.


Subjects
Adrenal Gland Neoplasms/diagnostic imaging; Adrenal Glands/diagnostic imaging; Image Interpretation, Computer-Assisted/methods; Tomography, X-Ray Computed/methods; Diagnosis, Differential; Female; Humans; Male; Middle Aged; Reproducibility of Results; Retrospective Studies; Sensitivity and Specificity
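
The CT histogram abstract rests on two small calculations: the percentage of negative pixels in a region of interest, and the diagnostic metrics at the >10% threshold. The sketch below reproduces both under stated assumptions. The ROI values are made up for illustration, the function names are hypothetical, and "positive" means "called benign", as in the abstract.

```python
def percent_negative(hu_values):
    """Percentage of pixels in the region of interest with attenuation below 0 HU."""
    if not hu_values:
        raise ValueError("region of interest is empty")
    negatives = sum(1 for hu in hu_values if hu < 0)
    return 100.0 * negatives / len(hu_values)


def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and positive predictive value from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


# Hypothetical ROI: 3 of 10 pixels are negative.
roi = [24, -3, 15, -8, 31, 12, 9, -1, 18, 22]
pct = percent_negative(roi)

# Counts taken from the abstract: 59 of 131 benign nodules called benign,
# 1 of 43 malignant nodules falsely called benign.
metrics = diagnostic_metrics(tp=59, fn=131 - 59, tn=43 - 1, fp=1)
print(pct, {name: round(100 * value) for name, value in metrics.items()})
```

With these counts the rounded metrics agree with the reported 45% sensitivity, 98% specificity and 98% positive predictive value.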
12.
Radiology ; 265(1): 151-7, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22798224

ABSTRACT

PURPOSE: To determine which measurement of donor renal size on computed tomographic (CT) angiograms has the greatest correlation with renal function preoperatively in the donor and postoperatively in the transplant recipient. MATERIALS AND METHODS: Informed consent was waived for this retrospective HIPAA-compliant study approved by the institutional review board. Renal length, total volume, and cortical volume were measured on renal donor CT angiograms in 111 patients. Preoperative serum creatinine values for donors and postoperative creatinine values for recipients at hospital discharge and 6, 12, 24, and 36 months after transplant were collected, and estimated glomerular filtration rate (eGFR) was calculated. Correlation coefficients with 95% confidence intervals (CIs) were obtained for renal measures and donor eGFR and for renal measures adjusted to recipient body habitus and posttransplant creatinine level in the recipient. Thresholds were set for adjusted length and volumes, and the odds ratio (OR) for creatinine level greater than 1.5 mg/dL at 36 months was calculated. RESULTS: Renal volumes and length were correlated with donor eGFR (r=0.58 [95% CI: 0.44, 0.69] for cortical volume, 0.56 [95% CI: 0.42, 0.68] for total volume, and 0.43 [95% CI: 0.27, 0.57] for renal length). All three measures, adjusted to recipient body habitus, were correlated with recipient renal function from discharge (r=-0.41 to -0.43) up to 36 months after transplantation (r=-0.33 to -0.41). By using a threshold of 1.5 for cortical volume to recipient weight, 2.25 for total volume to recipient weight, and 0.175 for renal length to recipient weight, the odds of creatinine level greater than 1.5 mg/dL were four times as great for smaller kidney-to-recipient weight ratios, a statistically significant pattern for cortical volume (OR, 4.07; 95% CI: 1.10, 15.09) but not total volume (OR, 4.24; 95% CI: 0.90, 20.01) or renal length (OR, 4.08; 95% CI: 0.48, 34.29). CONCLUSION: Renal length and volumes correlated with recipient renal function up to 36 months after transplant. A low ratio of cortical volume to recipient weight was associated with diminished renal function at 36 months after transplant.


Subjects
Angiography/methods; Kidney/diagnostic imaging; Kidney Transplantation; Tomography, X-Ray Computed/methods; Adolescent; Adult; Aged; Biomarkers/blood; Confidence Intervals; Creatinine/blood; Female; Glomerular Filtration Rate; Humans; Kidney Function Tests; Male; Middle Aged; Nephrectomy; Organ Size; Radiographic Image Interpretation, Computer-Assisted; Reproducibility of Results; Retrospective Studies
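
The odds ratios and confidence intervals in the renal donor abstract can be sanity-checked with a short calculation. It assumes, without confirmation from the source, that the intervals are Wald-type and therefore symmetric on the log scale, so the geometric mean of the two bounds should land near the reported odds ratio. The helper name is hypothetical.

```python
import math


def log_midpoint(lower, upper):
    """Geometric mean of the CI bounds: the point estimate of a log-symmetric interval."""
    return math.exp((math.log(lower) + math.log(upper)) / 2)


# (reported OR, lower 95% bound, upper 95% bound), as given in the abstract.
reported = {
    "cortical volume": (4.07, 1.10, 15.09),
    "total volume": (4.24, 0.90, 20.01),
    "renal length": (4.08, 0.48, 34.29),
}

recovered = {name: log_midpoint(lo, hi) for name, (_, lo, hi) in reported.items()}
# An interval that excludes 1 corresponds to significance at the 5% level.
significant = {name: lo > 1.0 for name, (_, lo, _) in reported.items()}

for name, (odds_ratio, _, _) in reported.items():
    print(f"{name}: reported {odds_ratio}, recovered {recovered[name]:.2f}, "
          f"significant={significant[name]}")
```

Under that assumption the recovered values agree with the reported odds ratios to within rounding of the bounds, and only the cortical volume interval excludes 1, matching the abstract's statement about significance.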
13.
Genome Res ; 22(3): 577-91, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22110045

ABSTRACT

Long noncoding RNAs (lncRNAs) comprise a diverse class of transcripts that structurally resemble mRNAs but do not encode proteins. Recent genome-wide studies in humans and the mouse have annotated lncRNAs expressed in cell lines and adult tissues, but a systematic analysis of lncRNAs expressed during vertebrate embryogenesis has been elusive. To identify lncRNAs with potential functions in vertebrate embryogenesis, we performed a time-series of RNA-seq experiments at eight stages during early zebrafish development. We reconstructed 56,535 high-confidence transcripts in 28,912 loci, recovering the vast majority of expressed RefSeq transcripts while identifying thousands of novel isoforms and expressed loci. We defined a stringent set of 1133 noncoding multi-exonic transcripts expressed during embryogenesis. These include long intergenic ncRNAs (lincRNAs), intronic overlapping lncRNAs, exonic antisense overlapping lncRNAs, and precursors for small RNAs (sRNAs). Zebrafish lncRNAs share many of the characteristics of their mammalian counterparts: relatively short length, low exon number, low expression, and conservation levels comparable to that of introns. Subsets of lncRNAs carry chromatin signatures characteristic of genes with developmental functions. The temporal expression profile of lncRNAs revealed two novel properties: lncRNAs are expressed in narrower time windows than are protein-coding genes and are specifically enriched in early-stage embryos. In addition, several lncRNAs show tissue-specific expression and distinct subcellular localization patterns. Integrative computational analyses associated individual lncRNAs with specific pathways and functions, ranging from cell cycle regulation to morphogenesis. Our study provides the first systematic identification of lncRNAs in a vertebrate embryo and forms the foundation for future genetic, genomic, and evolutionary studies.


Subjects
Embryonic Development/genetics; RNA, Untranslated/genetics; Zebrafish/embryology; Zebrafish/genetics; Animals; Chromatin; Cluster Analysis; Computational Biology/methods; Gene Expression; Gene Expression Profiling; Gene Expression Regulation, Developmental; Genomics; Mice; Open Reading Frames; Organ Specificity/genetics; Transcription, Genetic
14.
Nature ; 478(7370): 476-82, 2011 Oct 12.
Article in English | MEDLINE | ID: mdl-21993624

ABSTRACT

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.


Subjects
Evolution, Molecular; Genome, Human/genetics; Genome/genetics; Mammals/genetics; Animals; Disease; Exons/genetics; Genomics; Health; Humans; Molecular Sequence Annotation; Phylogeny; RNA/classification; RNA/genetics; Selection, Genetic/genetics; Sequence Alignment; Sequence Analysis, DNA
15.
Genome Res ; 21(11): 1916-28, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21994248

ABSTRACT

The degeneracy of the genetic code allows protein-coding DNA and RNA sequences to simultaneously encode additional, overlapping functional elements. A sequence in which both protein-coding and additional overlapping functions have evolved under purifying selection should show increased evolutionary conservation compared to typical protein-coding genes--especially at synonymous sites. In this study, we use genome alignments of 29 placental mammals to systematically locate short regions within human ORFs that show conspicuously low estimated rates of synonymous substitution across these species. The 29-species alignment provides statistical power to locate more than 10,000 such regions with resolution down to nine-codon windows, which are found within more than a quarter of all human protein-coding genes and contain ∼2% of their synonymous sites. We collect numerous lines of evidence that the observed synonymous constraint in these regions reflects selection on overlapping functional elements including splicing regulatory elements, dual-coding genes, RNA secondary structures, microRNA target sites, and developmental enhancers. Our results show that overlapping functional elements are common in mammalian genes, despite the vast genomic landscape.


Subjects
Genome; Mammals/genetics; Open Reading Frames/genetics; Selection, Genetic; Animals; Base Composition; Base Sequence; Codon; Codon, Initiator; Computational Biology; Conserved Sequence; Enhancer Elements, Genetic; Exons; Gene Order; Genes, BRCA1; Homeodomain Proteins/genetics; Humans; MicroRNAs/metabolism; Molecular Sequence Data; Mutation Rate; Nucleic Acid Conformation; Nucleosomes/metabolism; Peptide Chain Initiation, Translational; RNA Splicing; Sequence Alignment; Transcription, Genetic
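
The nine-codon windows mentioned in the synonymous-constraint abstract suggest a sliding-window scan along each open reading frame. The sketch below is a deliberately naive stand-in, not the statistical test used in the study: it averages hypothetical per-codon synonymous rate estimates and flags windows whose mean falls below an arbitrary cutoff. The function name, the input rates and the threshold are all assumptions.

```python
def low_rate_windows(rates, width=9, threshold=0.45):
    """Return start indices of windows whose mean synonymous rate is below threshold."""
    hits = []
    for start in range(len(rates) - width + 1):
        window = rates[start:start + width]
        if sum(window) / width < threshold:
            hits.append(start)
    return hits


# Hypothetical per-codon rates relative to the gene average:
# a nine-codon constrained stretch flanked by typical codons.
rates = [1.0] * 5 + [0.1] * 9 + [1.0] * 5
hits = low_rate_windows(rates)
print(hits)
```

Windows that only partly overlap the constrained stretch are still flagged when enough of their codons are slow, which is one reason a real analysis would use a formal per-window test rather than a fixed cutoff.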
16.
Genome Res ; 21(12): 2096-113, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21994247

ABSTRACT

While translational stop codon readthrough is often used by viral genomes, it has been observed for only a handful of eukaryotic genes. We previously used comparative genomics evidence to recognize protein-coding regions in 12 species of Drosophila and showed that for 149 genes, the open reading frame following the stop codon has a protein-coding conservation signature, hinting that stop codon readthrough might be common in Drosophila. We return to this observation armed with deep RNA sequence data from the modENCODE project, an improved higher-resolution comparative genomics metric for detecting protein-coding regions, comparative sequence information from additional species, and directed experimental evidence. We report an expanded set of 283 readthrough candidates, including 16 double-readthrough candidates; these were manually curated to rule out alternatives such as A-to-I editing, alternative splicing, dicistronic translation, and selenocysteine incorporation. We report experimental evidence of translation using GFP tagging and mass spectrometry for several readthrough regions. We find that the set of readthrough candidates differs from other genes in length, composition, conservation, stop codon context, and in some cases, conserved stem-loops, providing clues about readthrough regulation and potential mechanisms. Lastly, we expand our studies beyond Drosophila and find evidence of abundant readthrough in several other insect species and one crustacean, and several readthrough candidates in nematode and human, suggesting that functionally important translational stop codon readthrough is significantly more prevalent in Metazoa than previously recognized.


Subjects
Codon, Terminator/physiology; Genes, Insect/physiology; Open Reading Frames/physiology; Protein Biosynthesis/physiology; Animals; Drosophila Proteins/biosynthesis; Drosophila Proteins/genetics; Drosophila melanogaster; Humans
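
The region examined in the readthrough abstract, "the open reading frame following the stop codon", can be located with a simple in-frame scan. This sketch only finds the candidate extension between the annotated stop and the next in-frame stop; it does not score protein-coding conservation, which is what the study actually measures. The function name and the example transcript are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def readthrough_extension(transcript, stop_index):
    """Codons between the annotated stop codon and the next in-frame stop.

    stop_index is the 0-based position of the first base of the annotated stop.
    Returns None if no second in-frame stop is found before the sequence ends.
    """
    if transcript[stop_index:stop_index + 3] not in STOP_CODONS:
        raise ValueError("stop_index does not point at a stop codon")
    codons = []
    for i in range(stop_index + 3, len(transcript) - 2, 3):
        codon = transcript[i:i + 3]
        if codon in STOP_CODONS:
            return codons
        codons.append(codon)
    return None


# Hypothetical transcript: ATG GCC | TGA (annotated stop) | CAA GGT CTG | TAA | GC
transcript = "ATGGCC" + "TGA" + "CAAGGTCTG" + "TAA" + "GC"
extension = readthrough_extension(transcript, stop_index=6)
print(extension)
```

A conservation-based method would then ask whether the codons in this extension evolve like protein-coding sequence across the aligned species.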
17.
Nat Genet ; 43(7): 621-9, 2011 Jun 05.
Article in English | MEDLINE | ID: mdl-21642992

ABSTRACT

Transcription of long noncoding RNAs (lncRNAs) within gene regulatory elements can modulate gene activity in response to external stimuli, but the scope and functions of such activity are not known. Here we use an ultrahigh-density array that tiles the promoters of 56 cell-cycle genes to interrogate 108 samples representing diverse perturbations. We identify 216 transcribed regions that encode putative lncRNAs, many with RT-PCR-validated periodic expression during the cell cycle, show altered expression in human cancers and are regulated in expression by specific oncogenic stimuli, stem cell differentiation or DNA damage. DNA damage induces five lncRNAs from the CDKN1A promoter, and one such lncRNA, named PANDA, is induced in a p53-dependent manner. PANDA interacts with the transcription factor NF-YA to limit expression of pro-apoptotic genes; PANDA depletion markedly sensitized human fibroblasts to apoptosis by doxorubicin. These findings suggest potentially widespread roles for promoter lncRNAs in cell-growth control.


Subjects
Genes, cdc/physiology; Neoplasms/genetics; Promoter Regions, Genetic/genetics; RNA, Untranslated/genetics; Transcription, Genetic/genetics; Apoptosis; Biomarkers/metabolism; Cell Cycle/physiology; Cell Differentiation; Chromatin Immunoprecipitation; DNA Damage; Gene Expression Profiling; Humans; Immunoprecipitation; Molecular Sequence Data; Neoplasms/pathology; Oligonucleotide Array Sequence Analysis; RNA, Messenger/genetics; Reverse Transcriptase Polymerase Chain Reaction; Transcriptional Activation
18.
Bioinformatics ; 27(13): i275-82, 2011 Jul 01.
Article in English | MEDLINE | ID: mdl-21685081

ABSTRACT

MOTIVATION: As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multispecies nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models. RESULTS: We show that PhyloCSF's classification performance in 12-species Drosophila genome alignments exceeds all other methods we compared in a previous study. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE, and as interest grows in long non-coding RNAs, often initially recognized by their lack of protein coding potential rather than conserved RNA secondary structures. AVAILABILITY AND IMPLEMENTATION: The Objective Caml source code and executables for GNU/Linux and Mac OS X are freely available at http://compbio.mit.edu/PhyloCSF CONTACT: mlin@mit.edu; manoli@mit.edu.


Subjects
Drosophila melanogaster/genetics , Genomics/methods , Open Reading Frames , Sequence Alignment/methods , Animals , Base Sequence , Drosophila/classification , Drosophila/genetics , Gene Expression Profiling , Mammals/genetics , Schizosaccharomyces/genetics
19.
Science ; 332(6032): 930-6, 2011 May 20.
Article in English | MEDLINE | ID: mdl-21511999

ABSTRACT

The fission yeast clade, comprising Schizosaccharomyces pombe, S. octosporus, S. cryophilus, and S. japonicus, occupies the basal branch of Ascomycete fungi and is an important model of eukaryote biology. A comparative annotation of these genomes identified a near extinction of transposons and the associated innovation of transposon-free centromeres. Expression analysis established that meiotic genes are subject to antisense transcription during vegetative growth, which suggests a mechanism for their tight regulation. In addition, trans-acting regulators control new genes within the context of expanded functional modules for meiosis and stress response. Differences in gene content and regulation also explain why, unlike the budding yeast of Saccharomycotina, fission yeasts cannot use ethanol as a primary carbon source. These analyses elucidate the genome structure and gene regulation of fission yeast and provide tools for investigation across the Schizosaccharomyces clade.


Subjects
Fungal Genome , Schizosaccharomyces/genetics , Centromere/genetics , Centromere/physiology , Centromere/ultrastructure , DNA Transposable Elements , Molecular Evolution , Gene Expression Profiling , Fungal Gene Expression Regulation , Fungal Mating-Type Genes , Genomics , Glucose/metabolism , Meiosis , Molecular Sequence Annotation , Molecular Sequence Data , Phylogeny , Antisense RNA/genetics , Fungal RNA/genetics , Small Interfering RNA/genetics , Untranslated RNA/genetics , Transcriptional Regulatory Elements , Schizosaccharomyces/growth & development , Schizosaccharomyces/metabolism , Schizosaccharomyces pombe Proteins/genetics , Schizosaccharomyces pombe Proteins/metabolism , DNA Sequence Analysis , Species Specificity , Transcription Factors/genetics , Transcription Factors/metabolism , Genetic Transcription
20.
PLoS One ; 6(2): e17034, 2011 Feb 14.
Article in English | MEDLINE | ID: mdl-21340033

ABSTRACT

The recent release of twenty-two new genome sequences has dramatically increased the data available for mammalian comparative genomics, but twenty of these new sequences are currently limited to ∼2× coverage. Here we examine the extent of sequencing error in these 2× assemblies, and its potential impact in downstream analyses. By comparing 2× assemblies with high-quality sequences from the ENCODE regions, we estimate the rate of sequencing error to be 1-4 errors per kilobase. While this error rate is fairly modest, sequencing error can still have surprising effects. For example, an apparent lineage-specific insertion in a coding region is more likely to reflect sequencing error than a true biological event, and the length distribution of coding indels is strongly distorted by error. We find that most errors are contributed by a small fraction of bases with low quality scores, in particular, by the ends of reads in regions of single-read coverage in the assembly. We explore several approaches for automatic sequencing error mitigation (SEM), making use of the localized nature of sequencing error, the fact that it is well predicted by quality scores, and information about errors that comes from comparisons across species. Our automatic methods for error mitigation cannot replace the need for additional sequencing, but they do allow substantial fractions of errors to be masked or eliminated at the cost of modest amounts of over-correction, and they can reduce the impact of error in downstream phylogenomic analyses. Our error-mitigated alignments are available for download.
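
One of the ideas named in the abstract, that errors are well predicted by quality scores, can be sketched as a simple quality-based mask. This is a minimal illustration under stated assumptions, not the authors' SEM procedure: the function names, the `N` mask character, and the threshold of 20 are all hypothetical choices, and the cross-species and localization signals mentioned in the abstract are not modeled.

```python
def mask_low_quality(bases, quals, min_qual=20, mask_char="N"):
    """Replace every base whose quality score is below min_qual with mask_char.

    `bases` is a nucleotide string and `quals` a same-length sequence of
    integer quality scores. The threshold is an illustrative assumption.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    return "".join(b if q >= min_qual else mask_char for b, q in zip(bases, quals))


def errors_per_kb(n_errors, n_bases):
    """Convert an error count over n_bases into errors per kilobase."""
    return 1000.0 * n_errors / n_bases


if __name__ == "__main__":
    print(mask_low_quality("ACGT", [30, 10, 25, 5]))
    print(errors_per_kb(3, 1500))
```

Masking trades a few correct low-quality bases for the removal of many erroneous ones, which corresponds to the over-correction cost the abstract mentions.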


Subjects
Nucleic Acid Databases/standards , Molecular Sequence Annotation/standards , Research Design , DNA Sequence Analysis/standards , Animals , Chromosome Mapping/methods , Genome/genetics , Genomics/methods , Humans , Mammals/genetics , Molecular Sequence Annotation/methods , DNA Sequence Analysis/methods
...