Results 1 - 20 of 60
1.
Bioinformatics ; 38(Suppl 1): i84-i91, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758812

ABSTRACT

MOTIVATION: Molecular carcinogenicity is a preventable cause of cancer, but systematically identifying carcinogenic compounds, which involves performing experiments on animal models, is expensive, time consuming and low throughput. As a result, carcinogenicity information is limited and building data-driven models with good prediction accuracy remains a major challenge. RESULTS: In this work, we propose CONCERTO, a deep learning model that uses a graph transformer in conjunction with a molecular fingerprint representation for carcinogenicity prediction from molecular structure. Special efforts have been made to overcome the data size constraint, such as multi-round pre-training on related but lower quality mutagenicity data, and transfer learning from a large self-supervised model. Extensive experiments demonstrate that our model performs well and can generalize to external validation sets. CONCERTO could be useful for guiding future carcinogenicity experiments and could provide insight into the molecular basis of carcinogenicity. AVAILABILITY AND IMPLEMENTATION: The code and data underlying this article are available on GitHub at https://github.com/bowang-lab/CONCERTO.


Subjects
Carcinogens , Neural Networks, Computer , Animals , Carcinogens/toxicity , Forecasting , Mutagens
2.
NPJ Genom Med ; 5: 16, 2020.
Article in English | MEDLINE | ID: mdl-32284880

ABSTRACT

Wilson disease is a recessive genetic disorder caused by pathogenic loss-of-function variants in the ATP7B gene. It is characterized by disrupted copper homeostasis resulting in liver disease and/or neurological abnormalities. The variant NM_000053.3:c.1934T>G (Met645Arg) has been reported as compound heterozygous, and is highly prevalent among Wilson disease patients of Spanish descent. Accordingly, it is classified as pathogenic by leading molecular diagnostic centers. However, functional studies suggest that the amino acid change does not alter protein function, leading one ClinVar submitter to question its pathogenicity. Here, we used a minigene system and gene-edited HepG2 cells to demonstrate that c.1934T>G causes ~70% skipping of exon 6. Exon 6 skipping results in frameshift and stop-gain, leading to loss of ATP7B function. The elucidation of the mechanistic effect for this variant resolves any doubt about its pathogenicity and enables the development of genetic medicines for restoring correct splicing.

3.
Nat Biotechnol ; 36(9): 829-838, 2018 10.
Article in English | MEDLINE | ID: mdl-30188539

ABSTRACT

Deep learning is beginning to impact biological research and biomedical applications as a result of its ability to integrate vast datasets, learn arbitrarily complex relationships and incorporate existing knowledge. Already, deep learning models can predict, with varying degrees of success, how genetic variation alters cellular processes involved in pathogenesis, which small molecules will modulate the activity of therapeutically relevant proteins, and whether radiographic images are indicative of disease. However, the flexibility of deep learning creates new challenges in guaranteeing the performance of deployed systems and in establishing trust with stakeholders, clinicians and regulators, who require a rationale for decision making. We argue that these challenges will be overcome using the same flexibility that created them; for example, by training deep models so that they can output a rationale for their predictions. Significant research in this direction will be needed to realize the full potential of deep learning in biomedicine.


Subjects
Deep Learning , Algorithms , Humans
4.
Bioinformatics ; 34(13): i429-i437, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29949959

ABSTRACT

Motivation: Alternative splice site selection is inherently competitive, and the probability of a given splice site being used also depends on the strength of neighboring sites. Here, we present a new model named the competitive splice site model (COSSMO), which explicitly accounts for these competitive effects and predicts the percent selected index (PSI) distribution over any number of putative splice sites. We model an alternative splicing event as the choice of a 3' acceptor site conditional on a fixed upstream 5' donor site or the choice of a 5' donor site conditional on a fixed 3' acceptor site. We build four different architectures that use convolutional layers, communication layers, long short-term memory and residual networks, respectively, to learn relevant motifs from sequence alone. We also construct a new dataset from genome annotations and RNA-Seq read data that we use to train our model. Results: COSSMO is able to predict the most frequently used splice site with an accuracy of 70% on unseen test data, and achieves an R² of 0.6 in modeling the PSI distribution. We visualize the motifs that COSSMO learns from sequence and show that COSSMO recognizes the consensus splice site sequences and many known splicing factors with high specificity. Availability and implementation: Model predictions, our training dataset, and code are available from http://cossmo.genes.toronto.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Alternative Splicing , Deep Learning , RNA Splice Sites , Sequence Analysis, RNA/methods , Computational Biology/methods , Humans , Models, Genetic , Probability , Software
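The abstract of entry 4 describes predicting a PSI distribution over any number of competing splice sites. One common way to turn per-site scores into such a distribution is a softmax, sketched below. This is an illustrative assumption, not the published COSSMO implementation: the function name `psi_distribution` and the example scores are hypothetical, and the abstract does not state which normalization the authors use.

```python
import math


def psi_distribution(site_scores):
    """Normalize per-site scores into a probability distribution (softmax).

    Assumption: each putative splice site has one real-valued score;
    a higher score means a stronger site.
    """
    if not site_scores:
        raise ValueError("need at least one candidate splice site")
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(site_scores)
    weights = [math.exp(score - peak) for score in site_scores]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical scores for three competing acceptor sites.
psi = psi_distribution([2.0, 0.5, -1.0])
```

Because the scores are normalized jointly, raising the score of one site lowers the predicted usage of its neighbors, which is the competitive effect the abstract refers to.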
5.
Bioinformatics ; 34(17): 2889-2898, 2018 09 01.
Article in English | MEDLINE | ID: mdl-29648582

ABSTRACT

Motivation: Processing of transcripts at the 3'-end involves cleavage at a polyadenylation site followed by the addition of a poly(A)-tail. By selecting which site is cleaved, the process of alternative polyadenylation enables genes to produce transcript isoforms with different 3'-ends. To facilitate the identification and treatment of disease-causing mutations that affect polyadenylation and to understand the sequence determinants underlying this regulatory process, a computational model that can accurately predict polyadenylation patterns from genomic features is desirable. Results: Previous works have focused on identifying candidate polyadenylation sites and classifying tissue-specific sites. By training on how multiple sites in genes are competitively selected for polyadenylation from 3'-end sequencing data, we developed a deep learning model that can predict the tissue-specific strength of a polyadenylation site in the 3' untranslated region of the human genome given only its genomic sequence. We demonstrate the model's broad utility on multiple tasks, without any application-specific training. The model can be used to predict which polyadenylation site is more likely to be selected in genes with multiple sites. It can be used to scan the 3' untranslated region to find candidate polyadenylation sites. It can be used to classify the pathogenicity of variants near annotated polyadenylation sites in ClinVar. It can also be used to anticipate the effect of antisense oligonucleotide experiments to redirect polyadenylation. We provide an analysis of how different features affect the model's predictive performance and a method to identify sensitive regions of the genome at single-base resolution that can affect polyadenylation regulation. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Polyadenylation , 3' Untranslated Regions , Gene Expression Regulation , Genome, Human , Genomics , Humans , Poly A
6.
Nat Med ; 23(8): 984-989, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28714989

ABSTRACT

Splice-site defects account for about 10% of pathogenic mutations that cause Mendelian diseases. Prevalence is higher in neuromuscular disorders (NMDs), owing to the unusually large size and multi-exonic nature of genes encoding muscle structural proteins. Therapeutic genome editing to correct disease-causing splice-site mutations has been accomplished only through the homology-directed repair pathway, which is extremely inefficient in postmitotic tissues such as skeletal muscle. Here we describe a strategy using nonhomologous end-joining (NHEJ) to correct a pathogenic splice-site mutation. As a proof of principle, we focus on congenital muscular dystrophy type 1A (MDC1A), which is characterized by severe muscle wasting and paralysis. Specifically, we correct a splice-site mutation that causes the exclusion of exon 2 from Lama2 mRNA and the truncation of Lama2 protein in the dy2J/dy2J mouse model of MDC1A. Through systemic delivery of adeno-associated virus (AAV) carrying clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 genome-editing components, we simultaneously excise an intronic region containing the mutation and create a functional donor splice site through NHEJ. This strategy leads to the inclusion of exon 2 in the Lama2 transcript and restoration of full-length Lama2 protein. Treated dy2J/dy2J mice display substantial improvement in muscle histopathology and function without signs of paralysis.


Subjects
DNA End-Joining Repair , Genetic Therapy/methods , Laminin/genetics , Muscular Dystrophies/genetics , RNA Splice Sites/genetics , RNA, Messenger/genetics , Animals , Blotting, Western , CRISPR-Cas Systems , Disease Models, Animal , Fluorescent Antibody Technique , Laminin/metabolism , Mice , Muscle, Skeletal/metabolism , Muscle, Skeletal/pathology , Muscular Dystrophies/pathology , Mutation , Real-Time Polymerase Chain Reaction
7.
Mol Syst Biol ; 13(4): 924, 2017 04 18.
Article in English | MEDLINE | ID: mdl-28420678

ABSTRACT

Existing computational pipelines for quantitative analysis of high-content microscopy data rely on traditional machine learning approaches that fail to accurately classify more than a single dataset without substantial tuning and training, requiring extensive analysis. Here, we demonstrate that the application of deep learning to biological image data can overcome the pitfalls associated with conventional machine learning classifiers. Using a deep convolutional neural network (DeepLoc) to analyze yeast cell images, we show improved performance over traditional approaches in the automated classification of protein subcellular localization. We also demonstrate the ability of DeepLoc to classify highly divergent image sets, including images of pheromone-arrested cells with abnormal cellular morphology, as well as images generated in different genetic backgrounds and in different laboratories. We offer an open-source implementation that enables updating DeepLoc on new microscopy datasets. This study highlights deep learning as an important tool for the expedited analysis of high-content microscopy data.


Subjects
Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/ultrastructure , Systems Biology/methods , Machine Learning , Microscopy , Neural Networks, Computer , Saccharomyces cerevisiae/metabolism
8.
BMC Genomics ; 17(1): 787, 2016 10 07.
Article in English | MEDLINE | ID: mdl-27717327

ABSTRACT

BACKGROUND: Alternative mRNA splicing is critical to proteomic diversity and tissue and species differentiation. Exclusion of cassette exons, also called exon skipping, is the most common type of alternative splicing in mammals. RESULTS: We present a computational model that predicts absolute (though not tissue-differential) percent-spliced-in of cassette exons more accurately than previous models, despite not using any 'hand-crafted' biological features such as motif counts. We achieve nearly identical performance using only the conservation score (mammalian phastCons) of each splice junction normalized by average conservation over 100 bp of the corresponding flanking intron, demonstrating that conservation is an unexpectedly powerful indicator of alternative splicing patterns. Using this method, we provide evidence that intronic splicing regulation occurs predominantly within 100 bp of the alternative splice sites and that conserved elements in this region are, as expected, functioning as splicing regulators. We show that among conserved cassette exons, increased conservation of flanking introns is associated with reduced inclusion. We also propose a new definition of intronic splicing regulatory elements (ISREs) that is independent of conservation, and show that most ISREs do not match known binding sites or splicing factors despite being predictive of percent-spliced-in. CONCLUSIONS: These findings suggest that one mechanism for the evolutionary transition from constitutive to alternative splicing is the emergence of cis-acting splicing inhibitors. The association of our ISREs with differences in splicing suggests the existence of novel RNA-binding proteins and/or novel splicing roles for known RNA-binding proteins.


Subjects
Alternative Splicing , Evolution, Molecular , Models, Biological , Animals , Area Under Curve , Brain/metabolism , Exons , Gene Expression Regulation , Humans , Introns , Organ Specificity/genetics , RNA Splice Sites , Regulatory Sequences, Nucleic Acid
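The abstract of entry 8 uses the conservation score of each splice junction normalized by the average conservation over 100 bp of the flanking intron. A minimal sketch of one reading of that feature follows. It assumes "normalized by" means division by the window mean, which the abstract does not spell out; the function name and the example values are hypothetical.

```python
def normalized_conservation(junction_score, flank_scores):
    """Divide a splice junction's conservation score by the mean score
    of its flanking intron window (assumed here to be per-base scores,
    for example 100 phastCons values).
    """
    if not flank_scores:
        raise ValueError("flank_scores must not be empty")
    flank_mean = sum(flank_scores) / len(flank_scores)
    if flank_mean == 0:
        raise ValueError("flanking window has zero mean conservation")
    return junction_score / flank_mean


# Hypothetical values: a well-conserved junction in a weakly conserved intron.
flank = [0.2] * 100
ratio = normalized_conservation(0.8, flank)
```

A ratio well above 1 marks a junction that is more conserved than its intronic surroundings, while a ratio near 1 marks a junction sitting in a flank that is about as conserved as the junction itself.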
9.
NPJ Genom Med ; 1: 16027, 2016 Aug 03.
Article in English | MEDLINE | ID: mdl-27525107

ABSTRACT

De novo mutations (DNMs) are important in Autism Spectrum Disorder (ASD), but so far analyses have mainly been on the ~1.5% of the genome encoding genes. Here, we performed whole genome sequencing (WGS) of 200 ASD parent-child trios and characterized germline and somatic DNMs. We confirmed that the majority of germline DNMs (75.6%) originated from the father, and these increased significantly with paternal age only (p=4.2×10⁻¹⁰). However, when clustered DNMs (those within 20 kb) were found in ASD, not only did they mostly originate from the mother (p=7.7×10⁻¹³), but they could also be found adjacent to de novo copy number variations (CNVs) where the mutation rate was significantly elevated (p=2.4×10⁻²⁴). By comparing DNMs detected in controls, we found a significant enrichment of predicted damaging DNMs in ASD cases (p=8.0×10⁻⁹; OR=1.84), of which 15.6% (p=4.3×10⁻³) and 22.5% (p=7.0×10⁻⁵) were in the non-coding or genic non-coding, respectively. The non-coding elements most enriched for DNM were untranslated regions of genes, boundaries involved in exon-skipping and DNase I hypersensitive regions. Using microarrays and a novel outlier detection test, we also found aberrant methylation profiles in 2/185 (1.1%) of ASD cases. These same individuals carried independently identified DNMs in the ASD risk- and epigenetic- genes DNMT3A and ADNP. Our data begin to characterize different genome-wide DNMs, and highlight the contribution of non-coding variants, to the etiology of ASD.

10.
Bioinformatics ; 32(12): i52-i59, 2016 06 15.
Article in English | MEDLINE | ID: mdl-27307644

ABSTRACT

MOTIVATION: High-content screening (HCS) technologies have enabled large scale imaging experiments for studying cell biology and for drug screening. These systems produce hundreds of thousands of microscopy images per day and their utility depends on automated image analysis. Recently, deep learning approaches that learn feature representations directly from pixel intensity values have dominated object recognition challenges. These tasks typically have a single centered object per image and existing models are not directly applicable to microscopy datasets. Here we develop an approach that combines deep convolutional neural networks (CNNs) with multiple instance learning (MIL) in order to classify and segment microscopy images using only whole image level annotations. RESULTS: We introduce a new neural network architecture that uses MIL to simultaneously classify and segment microscopy images with populations of cells. We base our approach on the similarity between the aggregation function used in MIL and pooling layers used in CNNs. To facilitate aggregating across large numbers of instances in CNN feature maps we present the Noisy-AND pooling function, a new MIL operator that is robust to outliers. Combining CNNs with MIL enables training CNNs using whole microscopy images with image level labels. We show that training end-to-end MIL CNNs outperforms several previous methods on both mammalian and yeast datasets without requiring any segmentation steps. AVAILABILITY AND IMPLEMENTATION: Torch7 implementation available upon request. CONTACT: oren.kraus@mail.utoronto.ca.


Subjects
Image Interpretation, Computer-Assisted , Machine Learning , Microscopy , Algorithms , Humans , Neural Networks, Computer , Yeasts/cytology
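The abstract of entry 10 introduces Noisy-AND, a pooling function that aggregates many instance-level predictions into one image-level prediction while staying robust to outliers. The abstract gives no formula, so the sketch below is an assumption: it uses a rescaled sigmoid of the mean instance probability, where `a` controls the slope and `b` is the fraction of positive instances needed to activate the output. The parameter values are illustrative only, and the code is not the authors' Torch7 implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def noisy_and(instance_probs, a=10.0, b=0.5):
    """Pool instance probabilities into one bag-level probability.

    Assumed form: a sigmoid of (mean probability - b), rescaled so that
    a mean of 0 maps to 0 and a mean of 1 maps to 1.
    """
    if not instance_probs:
        raise ValueError("need at least one instance")
    mean_prob = sum(instance_probs) / len(instance_probs)
    low = sigmoid(-a * b)            # value at mean_prob = 0
    high = sigmoid(a * (1.0 - b))    # value at mean_prob = 1
    return (sigmoid(a * (mean_prob - b)) - low) / (high - low)


# Nine confident positive instances and one outlier still give a high output.
pooled = noisy_and([0.9] * 9 + [0.0])
```

Unlike max pooling, a single strongly activated instance cannot switch the output on by itself, and unlike a strict AND, a few stray instances cannot switch it off.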
11.
Crit Rev Biochem Mol Biol ; 51(2): 102-9, 2016.
Article in English | MEDLINE | ID: mdl-26806341

ABSTRACT

High Content Screening (HCS) technologies that combine automated fluorescence microscopy with high throughput biotechnology have become powerful systems for studying cell biology and drug screening. These systems can produce more than 100 000 images per day, making their success dependent on automated image analysis. In this review, we describe the steps involved in quantifying microscopy images and different approaches for each step. Typically, individual cells are segmented from the background using a segmentation algorithm. Each cell is then quantified by extracting numerical features, such as area and intensity measurements. As these feature representations are typically high dimensional (>500), modern machine learning algorithms are used to classify, cluster and visualize cells in HCS experiments. Machine learning algorithms that learn feature representations, in addition to the classification or clustering task, have recently advanced the state of the art on several benchmarking tasks in the computer vision community. These techniques have also recently been applied to HCS image analysis.


Subjects
Image Processing, Computer-Assisted , Microscopy, Fluorescence , Algorithms , Biotechnology , Machine Learning , Software , Vision, Ocular
12.
NPJ Genom Med ; 1, 2016 Jan 13.
Article in English | MEDLINE | ID: mdl-28567303

ABSTRACT

The standard of care for first-tier clinical investigation of the etiology of congenital malformations and neurodevelopmental disorders is chromosome microarray analysis (CMA) for copy number variations (CNVs), often followed by gene(s)-specific sequencing searching for smaller insertion-deletions (indels) and single nucleotide variant (SNV) mutations. Whole genome sequencing (WGS) has the potential to capture all classes of genetic variation in one experiment; however, the diagnostic yield for mutation detection of WGS compared to CMA, and other tests, needs to be established. In a prospective study we utilized WGS and comprehensive medical annotation to assess 100 patients referred to a paediatric genetics service and compared the diagnostic yield versus standard genetic testing. WGS identified genetic variants meeting clinical diagnostic criteria in 34% of cases, representing a 4-fold increase in diagnostic rate over CMA (8%) (p-value = 1.42e-05) alone and >2-fold increase in CMA plus targeted gene sequencing (13%) (p-value = 0.0009). WGS identified all rare clinically significant CNVs that were detected by CMA. In 26 patients, WGS revealed indel and missense mutations presenting in a dominant (63%) or a recessive (37%) manner. We found four subjects with mutations in at least two genes associated with distinct genetic disorders, including two cases harboring a pathogenic CNV and SNV. When considering medically actionable secondary findings in addition to primary WGS findings, 38% of patients would benefit from genetic counseling. Clinical implementation of WGS as a primary test will provide a higher diagnostic yield than conventional genetic testing and potentially reduce the time required to reach a genetic diagnosis.
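The fold increases quoted in the abstract of entry 12 follow directly from the reported diagnostic rates. The short check below recomputes them from the percentages given in the abstract (34% for WGS, 8% for CMA alone, 13% for CMA plus targeted gene sequencing); the helper name `fold_increase` is hypothetical.

```python
def fold_increase(new_rate, baseline_rate):
    """Ratio of two diagnostic rates, both expressed as fractions."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return new_rate / baseline_rate


wgs_rate = 0.34        # WGS diagnostic rate from the abstract
cma_rate = 0.08        # CMA alone
cma_plus_rate = 0.13   # CMA plus targeted gene sequencing

fold_vs_cma = fold_increase(wgs_rate, cma_rate)            # about 4.25
fold_vs_cma_plus = fold_increase(wgs_rate, cma_plus_rate)  # roughly 2.6
```

Both values agree with the abstract's wording: a 4-fold increase over CMA alone and a more than 2-fold increase over CMA plus targeted sequencing.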

13.
G3 (Bethesda) ; 5(11): 2453-61, 2015 Sep 16.
Article in English | MEDLINE | ID: mdl-26384369

ABSTRACT

Chromosome 22q11.2 microdeletions impart a high but incomplete risk for schizophrenia. Possible mechanisms include genome-wide effects of DGCR8 haploinsufficiency. In a proof-of-principle study to assess the power of this model, we used high-quality, whole-genome sequencing of nine individuals with 22q11.2 deletions and extreme phenotypes (schizophrenia, or no psychotic disorder at age >50 years). The schizophrenia group had a greater burden of rare, damaging variants impacting protein-coding neurofunctional genes, including genes involved in neuron projection (nominal P = 0.02, joint burden of three variant types). Variants in the intact 22q11.2 region were not major contributors. Restricting to genes affected by a DGCR8 mechanism tended to amplify between-group differences. Damaging variants in highly conserved long intergenic noncoding RNA genes also were enriched in the schizophrenia group (nominal P = 0.04). The findings support the 22q11.2 deletion model as a threshold-lowering first hit for schizophrenia risk. If applied to a larger and thus better-powered cohort, this appears to be a promising approach to identify genome-wide rare variants in coding and noncoding sequence that perturb gene networks relevant to idiopathic schizophrenia. Similarly designed studies exploiting genetic models may prove useful to help delineate the genetic architecture of other complex phenotypes.


Subjects
DiGeorge Syndrome/complications , Genome, Human , Schizophrenia/genetics , Adolescent , Adult , Case-Control Studies , DiGeorge Syndrome/genetics , Female , Humans , Male , Middle Aged , RNA, Long Noncoding/genetics , RNA-Binding Proteins/genetics , Schizophrenia/epidemiology
14.
Nat Biotechnol ; 33(8): 831-8, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26213851

ABSTRACT

Knowing the sequence specificities of DNA- and RNA-binding proteins is essential for developing models of the regulatory processes in biological systems and for identifying causal disease variants. Here we show that sequence specificities can be ascertained from experimental data with 'deep learning' techniques, which offer a scalable, flexible and unified computational approach for pattern discovery. Using a diverse array of experimental data and evaluation metrics, we find that deep learning outperforms other state-of-the-art methods, even when training on in vitro data and testing on in vivo data. We call this approach DeepBind and have built a stand-alone software tool that is fully automatic and handles millions of sequences per experiment. Specificities determined by DeepBind are readily visualized as a weighted ensemble of position weight matrices or as a 'mutation map' that indicates how variations affect binding within a specific sequence.


Subjects
Computational Biology/methods , DNA-Binding Proteins/chemistry , RNA-Binding Proteins/chemistry , Sequence Analysis, Protein/methods , Software , Position-Specific Scoring Matrices
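The abstract of entry 14 mentions a "mutation map" that indicates how variations affect binding within a specific sequence. A minimal sketch of the idea is to score every single-nucleotide substitution and record the change relative to the unmodified sequence. Everything here is an assumption for illustration: `mutation_map` and `toy_score` are hypothetical names, and the toy scorer (a motif counter) merely stands in for a trained binding model; it is not the DeepBind tool or its API.

```python
def mutation_map(sequence, score):
    """Return score changes for every single-nucleotide substitution.

    The result maps each nucleotide to a list with one entry per position:
    score(mutated sequence) - score(original sequence).
    """
    baseline = score(sequence)
    result = {}
    for nucleotide in "ACGT":
        result[nucleotide] = [
            score(sequence[:i] + nucleotide + sequence[i + 1:]) - baseline
            for i in range(len(sequence))
        ]
    return result


def toy_score(sequence):
    # Hypothetical stand-in for a trained model: counts a made-up motif.
    return sequence.count("TGCA")


effects = mutation_map("ATGCAT", toy_score)
```

Positions where any substitution produces a large negative change mark the bases the scorer depends on, which is the kind of information such a map is meant to visualize.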
15.
Nat Biotechnol ; 33(5): 555-62, 2015 May.
Article in English | MEDLINE | ID: mdl-25690854

ABSTRACT

Cys2-His2 zinc finger (C2H2-ZF) proteins represent the largest class of putative human transcription factors. However, for most C2H2-ZF proteins it is unknown whether they even bind DNA or, if they do, to which sequences. Here, by combining data from a modified bacterial one-hybrid system with protein-binding microarray and chromatin immunoprecipitation analyses, we show that natural C2H2-ZFs encoded in the human genome bind DNA both in vitro and in vivo, and we infer the DNA recognition code using DNA-binding data for thousands of natural C2H2-ZF domains. In vivo binding data are generally consistent with our recognition code and indicate that C2H2-ZF proteins recognize more motifs than all other human transcription factors combined. We provide direct evidence that most KRAB-containing C2H2-ZF proteins bind specific endogenous retroelements (EREs), ranging from currently active to ancient families. The majority of C2H2-ZF proteins, including KRAB proteins, also show widespread binding to regulatory regions, indicating that the human genome contains an extensive and largely unstudied adaptive C2H2-ZF regulatory network that targets a diverse range of genes and pathways.


Subjects
Carrier Proteins/metabolism , Genome, Human , Nuclear Proteins/metabolism , Repressor Proteins/metabolism , Retroelements/genetics , Carrier Proteins/genetics , Chromatin/metabolism , DNA-Binding Proteins/genetics , Gene Expression Regulation , Humans , Nuclear Proteins/genetics , Protein Binding , Regulatory Sequences, Nucleic Acid , Repressor Proteins/genetics
16.
Science ; 347(6218): 1254806, 2015 Jan 09.
Article in English | MEDLINE | ID: mdl-25525159

ABSTRACT

To facilitate precision medicine and whole-genome annotation, we developed a machine-learning technique that scores how strongly genetic variants affect RNA splicing, whose alteration contributes to many diseases. Analysis of more than 650,000 intronic and exonic variants revealed widespread patterns of mutation-driven aberrant splicing. Intronic disease mutations that are more than 30 nucleotides from any splice site alter splicing nine times as often as common variants, and missense exonic disease mutations that have the least impact on protein function are five times as likely as others to alter splicing. We detected tens of thousands of disease-causing mutations, including those involved in cancers and spinal muscular atrophy. Examination of intronic and exonic variants found using whole-genome sequencing of individuals with autism revealed misspliced genes with neurodevelopmental phenotypes. Our approach provides evidence for causal variants and should enable new discoveries in precision medicine.


Subjects
Artificial Intelligence , Child Development Disorders, Pervasive/genetics , Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , Genome-Wide Association Study/methods , Molecular Sequence Annotation/methods , Muscular Atrophy, Spinal/genetics , RNA Splicing/genetics , Adaptor Proteins, Signal Transducing/genetics , Computer Simulation , DNA/genetics , Exons/genetics , Genetic Code , Genetic Markers , Genetic Variation , Humans , Introns/genetics , Models, Genetic , MutL Protein Homolog 1 , Mutation, Missense , Nuclear Proteins/genetics , Polymorphism, Single Nucleotide , Quantitative Trait Loci , RNA Splice Sites/genetics , RNA-Binding Proteins/genetics
17.
Genome Res ; 24(11): 1774-86, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25258385

ABSTRACT

Alternative splicing (AS) of precursor RNAs is responsible for greatly expanding the regulatory and functional capacity of eukaryotic genomes. Of the different classes of AS, intron retention (IR) is the least well understood. In plants and unicellular eukaryotes, IR is the most common form of AS, whereas in animals, it is thought to represent the least prevalent form. Using high-coverage poly(A)+ RNA-seq data, we observe that IR is surprisingly frequent in mammals, affecting transcripts from as many as three-quarters of multiexonic genes. A highly correlated set of cis features comprising an "IR code" reliably discriminates retained from constitutively spliced introns. We show that IR acts widely to reduce the levels of transcripts that are less or not required for the physiology of the cell or tissue type in which they are detected. This "transcriptome tuning" function of IR acts through both nonsense-mediated mRNA decay and nuclear sequestration and turnover of IR transcripts. We further show that IR is linked to a cross-talk mechanism involving localized stalling of RNA polymerase II (Pol II) and reduced availability of spliceosomal components. Collectively, the results implicate a global checkpoint-type mechanism whereby reduced recruitment of splicing components coupled to Pol II pausing underlies widespread IR-mediated suppression of inappropriately expressed transcripts.


Subjects
Alternative Splicing , Introns/genetics , Mammals/genetics , Transcriptome/genetics , 3T3 Cells , Animals , Cell Differentiation/genetics , Cell Line , Cell Line, Tumor , Cells, Cultured , Evolution, Molecular , HeLa Cells , Humans , K562 Cells , Mammals/classification , Mice , Models, Genetic , Organ Specificity , Principal Component Analysis , RNA Polymerase II/metabolism , RNA Precursors/genetics , RNA Precursors/metabolism , Reverse Transcriptase Polymerase Chain Reaction , Species Specificity , Vertebrates/classification , Vertebrates/genetics
18.
Bioinformatics ; 30(12): i121-9, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24931975

ABSTRACT

MOTIVATION: Alternative splicing (AS) is a regulated process that directs the generation of different transcripts from single genes. A computational model that can accurately predict splicing patterns based on genomic features and cellular context is highly desirable, both in understanding this widespread phenomenon, and in exploring the effects of genetic variations on AS. METHODS: Using a deep neural network, we developed a model inferred from mouse RNA-Seq data that can predict splicing patterns in individual tissues and differences in splicing patterns across tissues. Our architecture uses hidden variables that jointly represent features in genomic sequences and tissue types when making predictions. A graphics processing unit was used to greatly reduce the training time of our models with millions of parameters. RESULTS: We show that the deep architecture surpasses the performance of the previous Bayesian method for predicting AS patterns. With the proper optimization procedure and selection of hyperparameters, we demonstrate that deep architectures can be beneficial, even with a moderately sparse dataset. An analysis of what the model has learned in terms of the genomic features is presented.


Subjects
Alternative Splicing , Artificial Intelligence , Algorithms , Animals , Bayes Theorem , Genomics/methods , Humans , Mice , Neural Networks, Computer , Sequence Analysis, RNA
19.
Nat Genet ; 46(7): 742-7, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24859339

ABSTRACT

A universal challenge in genetic studies of autism spectrum disorders (ASDs) is determining whether a given DNA sequence alteration will manifest as disease. Among different population controls, we observed, for specific exons, an inverse correlation between exon expression level in brain and burden of rare missense mutations. For genes that harbor de novo mutations predicted to be deleterious, we found that specific critical exons were significantly enriched in individuals with ASD relative to their siblings without ASD (P < 1.13 × 10⁻³⁸; odds ratio (OR) = 2.40). Furthermore, our analysis of genes with high exonic expression in brain and low burden of rare mutations demonstrated enrichment for known ASD-associated genes (P < 3.40 × 10⁻¹¹; OR = 6.08) and ASD-relevant fragile-X protein targets (P < 2.91 × 10⁻¹⁵⁷; OR = 9.52). Our results suggest that brain-expressed exons under purifying selection should be prioritized in genotype-phenotype studies for ASD and related neurodevelopmental conditions.


Subjects
Brain/metabolism , Child Development Disorders, Pervasive/genetics , Exons/genetics , Mutation, Missense/genetics , Adolescent , Adult , Brain/pathology , Case-Control Studies , Child, Preschool , Female , Gene Regulatory Networks , Genetic Predisposition to Disease , Humans , Infant , Male , Phenotype , RNA, Messenger/genetics , Real-Time Polymerase Chain Reaction , Reverse Transcriptase Polymerase Chain Reaction
20.
Genome Biol ; 14(10): R114, 2013.
Article in English | MEDLINE | ID: mdl-24156756

ABSTRACT

Transcriptome complexity and its relation to numerous diseases underpins the need to predict in silico splice variants and the regulatory elements that affect them. Building upon our recently described splicing code, we developed AVISPA, a Galaxy-based web tool for splicing prediction and analysis. Given an exon and its proximal sequence, the tool predicts whether the exon is alternatively spliced, displays tissue-dependent splicing patterns, and whether it has associated regulatory elements. We assess AVISPA's accuracy on an independent dataset of tissue-dependent exons, and illustrate how the tool can be applied to analyze a gene of interest. AVISPA is available at http://avispa.biociphers.org.


Subjects
Alternative Splicing , Computational Biology/methods , Web Browser , Algorithms , Databases, Nucleic Acid , Exons , Genomics/methods , Organ Specificity/genetics , ROC Curve , Transcriptome , Vascular Endothelial Growth Factor A/genetics