Search | VHL Regional Portal (test)

1.

CircNetVis: an interactive web application for visualizing interaction networks of circular RNAs.

Nguyen, Thi-Hau; Nguyen, Ha-Nam; Vu, Trung Nghia.

BMC Bioinformatics ; 25(1): 31, 2024 Jan 17.

Article in English | MEDLINE | ID: mdl-38233808

ABSTRACT

Analyzing the interactions of circular RNAs (circRNAs) is a crucial step in understanding their functional impacts. While there are numerous visualization tools available for investigating circRNA interaction networks, these tools are typically limited to known circRNAs from specific databases. Moreover, these existing tools usually require complex installation procedures which can be time-consuming and challenging for users. There is a lack of a user-friendly web application that facilitates interactive exploration and visualization of circRNA interaction networks. CircNetVis is an interactive online web application to enhance the analysis of human/mouse circRNA interactions. The tool allows three different input formats of circRNAs including circRNA IDs from CircBase, circRNA coordinates (chromosome, start position, end position), and circRNA sequences in the FASTA format. It integrates multiple interaction networks for visualization and investigation of the interplay between circRNA, microRNAs, mRNAs and RNA binding proteins. CircNetVis also enables users to interactively explore the interactions of unknown circRNAs which are not reported from previous databases. The tool can generate interactive plots and allows users to save results as output files for offline usage. CircNetVis is implemented as a web application using R-shiny and freely available for academic use at https://www.meb.ki.se/shiny/truvu/CircNetVis/ .

Subjects

MicroRNAs, Circular RNA, Humans, Mice, Animals, MicroRNAs/genetics, MicroRNAs/metabolism, Messenger RNA/genetics, Software, Factual Databases, Gene Regulatory Networks

2.

Hidden Genetic Regulation of Human Complex Traits via Brain Isoforms.

Pan, Lu; Zheng, Chenqing; Yang, Zhijian; Pawitan, Yudi; Vu, Trung Nghia; Shen, Xia.

Phenomics ; 3(3): 217-227, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37325708

ABSTRACT

Alternative splicing exists in most multi-exonic genes, and exploring these complex alternative splicing events and their resultant isoform expressions is essential. However, it has become conventional that RNA sequencing results have often been summarized into gene-level expression counts mainly due to the multiple ambiguous mapping of reads at highly similar regions. Transcript-level quantification and interpretation are often overlooked, and biological interpretations are often deduced based on combined transcript information at the gene level. Here, for the most variable tissue of alternative splicing, the brain, we estimate isoform expressions in 1,191 samples collected by the Genotype-Tissue Expression (GTEx) Consortium using a powerful method that we previously developed. We perform genome-wide association scans on the isoform ratios per gene and identify isoform-ratio quantitative trait loci (irQTL), which could not be detected by studying gene-level expressions alone. By analyzing the genetic architecture of the irQTL, we show that isoform ratios regulate educational attainment via multiple tissues including the frontal cortex (BA9), cortex, cervical spinal cord, and hippocampus. These tissues are also associated with different neuro-related traits, including Alzheimer's or dementia, mood swings, sleep duration, alcohol intake, intelligence, anxiety or depression, etc. Mendelian randomization (MR) analysis revealed 1,139 pairs of isoforms and neuro-related traits with plausible causal relationships, showing much stronger causal effects than on general diseases measured in the UK Biobank (UKB). Our results highlight essential transcript-level biomarkers in the human brain for neuro-related complex traits and diseases, which could be missed by merely investigating overall gene expressions. Supplementary Information: The online version contains supplementary material available at 10.1007/s43657-023-00100-6.

3.

Prediction model for drug response of acute myeloid leukemia patients.

Trac, Quang Thinh; Pawitan, Yudi; Mou, Tian; Erkers, Tom; Östling, Päivi; Bohlin, Anna; Österroos, Albin; Vesterlund, Mattias; Jafari, Rozbeh; Siavelis, Ioannis; Bäckvall, Helena; Kiviluoto, Santeri; Orre, Lukas M; Rantalainen, Mattias; Lehtiö, Janne; Lehmann, Sören; Kallioniemi, Olli; Vu, Trung Nghia.

NPJ Precis Oncol ; 7(1): 32, 2023 Mar 24.

Article in English | MEDLINE | ID: mdl-36964195

ABSTRACT

Despite some encouraging successes, predicting the therapy response of acute myeloid leukemia (AML) patients remains highly challenging due to tumor heterogeneity. Here we aim to develop and validate MDREAM, a robust ensemble-based prediction model for drug response in AML based on an integration of omics data, including mutations and gene expression, and large-scale drug testing. Briefly, MDREAM is first trained in the BeatAML cohort (n = 278), and then validated in the BeatAML (n = 183) and two external cohorts, including a Swedish AML cohort (n = 45) and a relapsed/refractory acute leukemia cohort (n = 12). The final prediction is based on 122 ensemble models, each corresponding to a drug. A confidence score metric is used to convey the uncertainty of predictions; among predictions with a confidence score >0.75, the validated proportion of good responders is 77%. The Spearman correlations between the predicted and the observed drug response are 0.68 (95% CI: [0.64, 0.68]) in the BeatAML validation set, -0.49 (95% CI: [-0.53, -0.44]) in the Swedish cohort and 0.59 (95% CI: [0.51, 0.67]) in the relapsed/refractory cohort. A web-based implementation of MDREAM is publicly available at https://www.meb.ki.se/shiny/truvu/MDREAM/ .
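For readers unfamiliar with the metric reported above, the following is a minimal Python sketch of a Spearman rank correlation between predicted and observed responses. The function names are illustrative assumptions; this is not MDREAM code, and no values from the study are reproduced here.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

A negative value, such as the one reported for the Swedish cohort, means that higher predicted scores tend to pair with lower observed values on whatever response scale that cohort used.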

4.

A Comprehensive Landscape of Imaging Feature-Associated RNA Expression Profiles in Human Breast Tissue.

Mou, Tian; Liang, Jianwen; Vu, Trung Nghia; Tian, Mu; Gao, Yi.

Sensors (Basel) ; 23(3), 2023 Jan 28.

Article in English | MEDLINE | ID: mdl-36772473

ABSTRACT

The expression abundance of transcripts in nondiseased breast tissue varies among individuals. The association study of genotypes and imaging phenotypes may help us to understand this individual variation. Since existing reports mainly focus on tumors or lesion areas, the heterogeneity of pathological image features and their correlations with RNA expression profiles for nondiseased tissue are not clear. The aim of this study is to discover the association between the nucleus features and the transcriptome-wide RNAs. We analyzed both microscopic histology images and RNA-sequencing data of 456 breast tissues from the Genotype-Tissue Expression (GTEx) project and constructed an automatic computational framework. We classified all samples into four clusters based on their nucleus morphological features and discovered feature-specific gene sets. The biological pathway analysis was performed on each gene set. The proposed framework evaluates the morphological characteristics of the cell nucleus quantitatively and identifies the associated genes. We found image features that capture population variation in breast tissue associated with RNA expressions, suggesting that the variation in expression pattern affects population variation in the morphological traits of breast tissue. This study provides a comprehensive transcriptome-wide view of imaging-feature-specific RNA expression for healthy breast tissue. Such a framework could also be used for understanding the connection between RNA expression and morphology in other tissues and organs. Pathway analysis indicated that the gene sets we identified were involved in specific biological processes, such as immune processes.

Subjects

Breast Neoplasms, Transcriptome, Humans, Female, Transcriptome/genetics, RNA/genetics, RNA Sequence Analysis, Genotype, Phenotype, Breast Neoplasms/diagnostic imaging, Breast Neoplasms/genetics

5.

Whole-genome sequencing of antimicrobial-resistant Salmonella enterica isolates from a Cairina moschata carcass.

Nguyen, Trung Thanh; Le, Hoa Vinh; Xuan, Da Pham; Vu, Trung Nghia; Nguyen, Minh Hong; Tran, Huyen Thi Thanh.

Data Brief ; 47: 108932, 2023 Apr.

Article in English | MEDLINE | ID: mdl-36819900

ABSTRACT

Salmonella enterica is one of the most common agents of foodborne bacterial illness, with poultry being an important reservoir. The indiscriminate use of antimicrobial compounds in poultry farming increasingly leads to antimicrobial resistance (AMR), which threatens the health of both animals and humans. Antimicrobial-resistant Salmonella enterica from poultry can spread to humans through direct contact with infected poultry or fecally contaminated environments. Antimicrobial-resistant S. enterica, especially fluoroquinolone-resistant nontyphoidal Salmonella, is on the list of global health concerns stated by the World Health Organization (WHO). Here we report the whole-genome sequencing data and de novo genome assembly of antimicrobial-resistant S. enterica strains S8 and S9 from a C. moschata carcass collected in Vietnam. Genomic DNA of S. enterica was extracted and subjected to whole-genome sequencing using the Illumina MiSeq platform. The genome size of antimicrobial-resistant S. enterica strain S8 is 4,707,459 bp with a GC content of 52.38%, containing 10 antimicrobial resistance genes. The genome size of antimicrobial-resistant S. enterica strain S9 is 4,923,944 bp with a GC content of 52.39%, containing 10 antimicrobial resistance genes. Our data provide insights into the antimicrobial resistance genes of S. enterica isolates from the C. moschata carcass, which help to understand the infection mechanism of antimicrobial-resistant S. enterica in humans.
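The GC content quoted for each assembly is a simple base-composition ratio. A minimal Python sketch of how such a figure could be computed from assembled contigs follows; the function name and the choice to ignore ambiguous bases such as N are assumptions for illustration, not the authors' pipeline.

```python
def gc_content(contigs):
    """Return the pooled GC percentage over an iterable of DNA sequences.

    Only unambiguous bases (A, C, G, T) are counted; anything else,
    such as N, is ignored. This is an assumed convention.
    """
    gc = 0
    at = 0
    for contig in contigs:
        seq = contig.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no A, C, G or T bases found")
    return 100.0 * gc / (gc + at)
```

Pooling counts across contigs, rather than averaging per-contig percentages, weights each contig by its length, which is the usual convention for a whole-assembly figure.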

6.

T cell responses at diagnosis of amyotrophic lateral sclerosis predict disease progression.

Yazdani, Solmaz; Seitz, Christina; Cui, Can; Lovik, Anikó; Pan, Lu; Piehl, Fredrik; Pawitan, Yudi; Kläppe, Ulf; Press, Rayomand; Samuelsson, Kristin; Yin, Li; Vu, Trung Nghia; Joly, Anne-Laure; Westerberg, Lisa S; Evertsson, Björn; Ingre, Caroline; Andersson, John; Fang, Fang.

Nat Commun ; 13(1): 6733, 2022 Nov 08.

Article in English | MEDLINE | ID: mdl-36347843

ABSTRACT

Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease, involving neuroinflammation and T cell infiltration in the central nervous system. However, the contribution of T cell responses to the pathology of the disease is not fully understood. Here we show, by flow cytometric analysis of blood and cerebrospinal fluid (CSF) samples of a cohort of 89 newly diagnosed ALS patients in Stockholm, Sweden, that T cell phenotypes at the time of diagnosis are good predictors of disease outcome. High frequency of CD4+FOXP3- effector T cells in blood and CSF is associated with poor survival, whereas high frequency of activated regulatory T (Treg) cells and high ratio between activated and resting Treg cells in blood are associated with better survival. Besides survival, phenotypic profiling of T cells could also predict disease progression rate. Single cell transcriptomics analysis of CSF samples shows clonally expanded CD4+ and CD8+ T cells in CSF, with characteristic gene expression patterns. In summary, T cell responses associate with and likely contribute to disease progression in ALS, supporting modulation of adaptive immunity as a viable therapeutic option.

Subjects

Amyotrophic Lateral Sclerosis, Neurodegenerative Diseases, Humans, Amyotrophic Lateral Sclerosis/diagnosis, Amyotrophic Lateral Sclerosis/genetics, Amyotrophic Lateral Sclerosis/pathology, CD8-Positive T-Lymphocytes/pathology, Neurodegenerative Diseases/metabolism, Regulatory T-Lymphocytes, Disease Progression

7.

Discovery of druggable cancer-specific pathways with application in acute myeloid leukemia.

Trac, Quang Thinh; Zhou, Tingyou; Pawitan, Yudi; Vu, Trung Nghia.

Gigascience ; 11, 2022 Sep 29.

Article in English | MEDLINE | ID: mdl-36173247

ABSTRACT

An individualized cancer therapy is ideally chosen to target the cancer's driving biological pathways, but identifying such pathways is challenging because of their underlying heterogeneity and there is no guarantee that they are druggable. We hypothesize that a cancer with an activated druggable cancer-specific pathway (DCSP) is more likely to respond to the relevant drug. Here we develop and validate a systematic method to search for such DCSPs, by (i) introducing a pathway activation score (PAS) that integrates cancer-specific driver mutations and gene expression profile and drug-specific gene targets, (ii) applying the method to identify DCSPs from pan-cancer datasets, and (iii) analyzing the correlation between PAS and the response to relevant drugs. In total, 4,794 DCSPs from 23 different cancers have been discovered in the Genomics of Drug Sensitivity in Cancer database and validated in The Cancer Genome Atlas database. Supporting the hypothesis, for the DCSPs in acute myeloid leukemia, cancers with higher PASs are shown to have stronger drug response, and this is validated in the BeatAML cohort. All DCSPs are publicly available at https://www.meb.ki.se/shiny/truvu/DCSP/.

Subjects

Acute Myeloid Leukemia, Genomics/methods, Humans, Acute Myeloid Leukemia/drug therapy, Acute Myeloid Leukemia/genetics, Transcriptome

8.

Quantification of mutant-allele expression at isoform level in cancer from RNA-seq data.

Deng, Wenjiang; Mou, Tian; Pawitan, Yudi; Vu, Trung Nghia.

NAR Genom Bioinform ; 4(3): lqac052, 2022 Sep.

Article in English | MEDLINE | ID: mdl-35855322

ABSTRACT

Even though the role of DNA mutations in cancer is well recognized, current quantification of the RNA expression, performed either at gene or isoform level, typically ignores the mutation status. Standard methods for estimating allele-specific expression (ASE) consider gene-level expression, but the functional impact of a mutation is best assessed at isoform level. Hence our goal is to quantify the mutant-allele expression at isoform level. We have developed and implemented a method, named MAX, for quantifying mutant-allele expression given a list of mutations. For a gene of interest, a mutant reference is constructed by incorporating all possible mutant versions of the wild-type isoforms in the transcriptome annotation. The mutant reference is then used for the RNA-seq reads mapping, which in principle works similarly for any quantification tool. We apply an alternating EM algorithm to the read-count data from the mapping step. In a simulation study, MAX performs well against standard isoform-quantification methods. Also, MAX achieves higher accuracy than conventional gene-based ASE methods such as ASEP. An analysis of a real dataset of acute myeloid leukemia reveals a subgroup of NPM1-mutated patients responding well to a kinase inhibitor. Our findings indicate that quantification of mutant-allele expression at isoform level is feasible and has potential added values for assessing the functional impact of DNA mutations in cancers.

9.

Fusion Gene Detection Using Whole-Exome Sequencing Data in Cancer Patients.

Deng, Wenjiang; Murugan, Sarath; Lindberg, Johan; Chellappa, Venkatesh; Shen, Xia; Pawitan, Yudi; Vu, Trung Nghia.

Front Genet ; 13: 820493, 2022.

Article in English | MEDLINE | ID: mdl-35251131

ABSTRACT

Several fusion genes are directly involved in the initiation and progression of cancers. Numerous bioinformatics tools have been developed to detect fusion events, but they are mainly based on RNA-seq data. Whole-exome sequencing (WES) is a powerful technology that is widely used for disease-related DNA variant detection. In this study, we build a novel analysis pipeline called Fuseq-WES to detect fusion genes at the DNA level based on WES data. The same method also applies to targeted panel sequencing data. We apply the method to real datasets of acute myeloid leukemia (AML) and prostate cancer patients. The results show that two of the main AML fusion genes discovered in RNA-seq data, PML-RARA and CBFB-MYH11, are detected in the WES data in 36% and 63% of the available samples, respectively. For the targeted deep sequencing of prostate cancer patients, detection of the TMPRSS2-ERG fusion, which is the most frequent chimeric alteration in prostate cancer, is 91% concordant with a manually curated procedure based on four other methods. In summary, the overall results indicate that it is challenging to detect fusion genes in WES data with a standard coverage of ~15-30x, where fusion candidates discovered in the RNA-seq data are often not detected in the WES data and vice versa. A subsampling study of the prostate data suggests that a coverage of at least 75x is necessary to achieve high accuracy.

10.

Evaluation of methods to detect circular RNAs from single-end RNA-sequencing data.

Nguyen, Manh Hung; Nguyen, Ha-Nam; Vu, Trung Nghia.

BMC Genomics ; 23(1): 106, 2022 Feb 08.

Article in English | MEDLINE | ID: mdl-35135477

ABSTRACT

BACKGROUND: Circular RNA (circRNA), a class of RNA molecules with a loop structure, has recently attracted researchers due to its diverse biological functions and its potential as a biomarker of human diseases. Most of the current circRNA detection methods for RNA-sequencing (RNA-Seq) data utilize the mapping information of paired-end (PE) reads to eliminate false positives. However, much of the practical RNA-Seq data, such as cross-linking immunoprecipitation sequencing (CLIP-Seq) data, usually contain single-end (SE) reads. It is not clear how well these tools perform on SE RNA-Seq data. RESULTS: In this study, we present a systematic evaluation of six advanced RNA-Seq based methods and two CLIP-Seq based methods for detecting circRNAs from SE RNA-Seq data. The performances of the methods are rigorously assessed based on precision, sensitivity, F1 score, and true discovery rate. We investigate the impacts of read length, false positive ratio, sequencing depth and PE mapping information on the performances of the methods using simulated SE RNA-Seq datasets. The real datasets used in this study consist of four experimental RNA-Seq datasets with ≥100bp read length and 124 CLIP-Seq samples from 45 studies that contain mostly short-read (≤50bp) RNA-Seq data. The simulation study shows that the sensitivities of most of the methods can be improved by increasing either read length or sequencing depth, and that the levels of false positive rates significantly affect the precision of all methods. Furthermore, the PE mapping information can improve a method's precision but cannot always guarantee an increase in F1 score. Overall, no method is dominant for all SE RNA-Seq data. The RNA-Seq based methods perform better for the long-read datasets but worse for the short-read datasets. In contrast, the CLIP-Seq based methods outperform the RNA-Seq based methods for all the short-read samples. Combining the results of these methods can significantly improve precision in the CLIP-Seq data. CONCLUSIONS: The results provide a systematic evaluation of circRNA detection methods on SE RNA-Seq data that would facilitate researchers' strategies in circRNA analysis.
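The evaluation metrics named above can be illustrated with a short Python sketch. It assumes each circRNA call is represented by a hashable key such as a (chromosome, start, end) tuple and that a call counts as correct only on an exact match with the truth set; both are assumptions for illustration, not the matching rules used in the study.

```python
def detection_metrics(predicted, truth):
    """Compute precision, sensitivity and F1 for a set of circRNA calls.

    predicted, truth: iterables of hashable circRNA identifiers,
    for example (chromosome, start, end) tuples.
    """
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)   # calls that are in the truth set
    fp = len(predicted - truth)   # calls that are not in the truth set
    fn = len(truth - predicted)   # true circRNAs that were missed
    precision = tp / (tp + fp) if predicted else 0.0
    sensitivity = tp / (tp + fn) if truth else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"precision": precision, "sensitivity": sensitivity, "f1": f1}
```

Because F1 is the harmonic mean of precision and sensitivity, a filter that raises precision while discarding true calls can leave F1 unchanged or lower, which is consistent with the observation about PE mapping information.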

Subjects

Circular RNA, RNA, High-Throughput Nucleotide Sequencing, Humans, Immunoprecipitation, RNA/genetics, RNA-Seq, RNA Sequence Analysis

11.

Isoform-level quantification for single-cell RNA sequencing.

Pan, Lu; Dinh, Huy Q; Pawitan, Yudi; Vu, Trung Nghia.

Bioinformatics ; 38(5): 1287-1294, 2022 Feb 07.

Article in English | MEDLINE | ID: mdl-34864849

ABSTRACT

MOTIVATION: RNA expression at isoform level is biologically more informative than at gene level and can potentially reveal cellular subsets and corresponding biomarkers that are not visible at gene level. However, due to the strong 3' bias sequencing protocol, mRNA quantification for high-throughput single-cell RNA sequencing such as Chromium Single Cell 3' 10× Genomics is currently performed at the gene level. RESULTS: We have developed an isoform-level quantification method for high-throughput single-cell RNA sequencing by exploiting the concepts of transcription clusters and isoform paralogs. The method, called Scasa, compares well in simulations against competing approaches including Alevin, Cellranger, Kallisto, Salmon, Terminus and STARsolo at both isoform- and gene-level expression. The reanalysis of a CITE-Seq dataset with isoform-based Scasa reveals a subgroup of CD14 monocytes missed by gene-based methods. AVAILABILITY AND IMPLEMENTATION: Implementation of Scasa including source code, documentation, tutorials and test data supporting this study is available at Github: https://github.com/eudoraleer/scasa and Zenodo: https://doi.org/10.5281/zenodo.5712503. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Gene Expression Profiling, Software, Gene Expression Profiling/methods, RNA Sequence Analysis/methods, Protein Isoforms/genetics, Protein Isoforms/metabolism, Messenger RNA/genetics, RNA

12.

Circall: fast and accurate methodology for discovery of circular RNAs from paired-end RNA-sequencing data.

Nguyen, Dat Thanh; Trac, Quang Thinh; Nguyen, Thi-Hau; Nguyen, Ha-Nam; Ohad, Nir; Pawitan, Yudi; Vu, Trung Nghia.

BMC Bioinformatics ; 22(1): 495, 2021 Oct 13.

Article in English | MEDLINE | ID: mdl-34645386

ABSTRACT

BACKGROUND: Circular RNA (circRNA) is an emerging class of RNA molecules attracting researchers due to its potential for serving as markers for diagnosis, prognosis, or therapeutic targets of cancer, cardiovascular, and autoimmune diseases. Current methods for detection of circRNA from RNA sequencing (RNA-seq) focus mostly on improving mapping quality of reads supporting the back-splicing junction (BSJ) of a circRNA to eliminate false positives (FPs). We show that mapping information alone often cannot predict if a BSJ-supporting read is derived from a true circRNA or not, thus increasing the rate of FP circRNAs. RESULTS: We have developed Circall, a novel circRNA detection method from RNA-seq. Circall controls the FPs using a robust multidimensional local false discovery rate method based on the length and expression of circRNAs. It is computationally highly efficient by using a quasi-mapping algorithm for fast and accurate RNA read alignments. We applied Circall on two simulated datasets and three experimental datasets of human cell-lines. The results show that Circall achieves high sensitivity and precision in the simulated data. In the experimental datasets it performs well against current leading methods. Circall is also substantially faster than the other methods, particularly for large datasets. CONCLUSIONS: With those better performances in the detection of circRNAs and in computational time, Circall facilitates the analyses of circRNAs in large numbers of samples. Circall is implemented in C++ and R, and available for use at https://www.meb.ki.se/sites/biostatwiki/circall and https://github.com/datngu/Circall.

Subjects

Circular RNA, RNA, Humans, RNA/genetics, RNA Splicing, RNA-Seq, RNA Sequence Analysis

13.

The transcriptome-wide landscape of molecular subtype-specific mRNA expression profiles in acute myeloid leukemia.

Mou, Tian; Pawitan, Yudi; Stahl, Matthias; Vesterlund, Mattias; Deng, Wenjiang; Jafari, Rozbeh; Bohlin, Anna; Österroos, Albin; Siavelis, Ioannis; Bäckvall, Helena; Erkers, Tom; Kiviluoto, Santeri; Seashore-Ludlow, Brinton; Östling, Päivi; Orre, Lukas M; Kallioniemi, Olli; Lehmann, Sören; Lehtiö, Janne; Vu, Trung Nghia.

Am J Hematol ; 96(5): 580-588, 2021 May 01.

Article in English | MEDLINE | ID: mdl-33625756

ABSTRACT

Molecular classification of acute myeloid leukemia (AML) aids prognostic stratification and clinical management. Our aim in this study is to identify transcriptome-wide mRNAs that are specific to each of the molecular subtypes of AML. We analyzed RNA-sequencing data of 955 AML samples from three cohorts, including the BeatAML project, the Cancer Genome Atlas, and a cohort of Swedish patients to provide a comprehensive transcriptome-wide view of subtype-specific mRNA expression. We identified 729 subtype-specific mRNAs, discovered in the BeatAML project and validated in the other two cohorts. Using unique proteomics data, we also validated the presence of subtype-specific mRNAs at the protein level, yielding a rich collection of potential protein-based biomarkers for the AML community. To enable the exploration of subtype-specific mRNA expression by the broader scientific community, we provide an interactive resource to the public.

Subjects

Acute Myeloid Leukemia/genetics, Messenger RNA/biosynthesis, Neoplasm RNA/biosynthesis, Transcriptome, Tumor Biomarkers, Neoplasm Genes, Humans, Acute Myeloid Leukemia/classification, Acute Myeloid Leukemia/metabolism, Neoplasm Proteins/biosynthesis, Neoplasm Proteins/genetics, Oncogene Fusion Proteins/biosynthesis, Oncogene Fusion Proteins/genetics, Proteome, Messenger RNA/genetics, Neoplasm RNA/genetics, RNA-Seq, Retrospective Studies, Sweden

14.

Alternating EM algorithm for a bilinear model in isoform quantification from RNA-seq data.

Deng, Wenjiang; Mou, Tian; Kalari, Krishna R; Niu, Nifang; Wang, Liewei; Pawitan, Yudi; Vu, Trung Nghia.

Bioinformatics ; 36(3): 805-812, 2020 Feb 01.

Article in English | MEDLINE | ID: mdl-31400221

ABSTRACT

MOTIVATION: Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide bias correction step(s), which are based on biological considerations, such as GC content, and are applied to single samples separately. The main problem is that not all biases are known. RESULTS: We have developed a novel method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model Xβ, where the design matrix X is known and is computed based on the simplifying assumptions. In contrast, XAEM considers Xβ as a bilinear model with both X and β unknown. Joint estimation of X and β is made possible by a simultaneous analysis of multi-sample RNA-seq data. Compared to existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm, alternating between estimation of X and β. For speed, XAEM utilizes quasi-mapping for read alignment, thus leading to a fast algorithm. Overall, XAEM performs favorably compared to recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets. AVAILABILITY AND IMPLEMENTATION: The method and pipeline are implemented as a tool and freely available for use at http://fafner.meb.ki.se/biostatwiki/xaem/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
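One way to read the bilinear model described above can be written out as a sketch. The notation here (read-count vector y_s for sample s, a design matrix X shared across S samples, a generic likelihood L, iteration index t) is assumed for illustration and is not taken from the paper; the abstract states only that X and β are both unknown and are estimated alternately from multi-sample data.

```latex
% Linear model used by existing methods: X is fixed in advance
% and only the isoform abundances \beta_s are estimated per sample.
\mathbb{E}[y_s] = X \beta_s, \qquad s = 1, \dots, S

% Bilinear reading: X is also unknown but shared by all samples,
% so the S samples jointly inform its estimate.
% Generic alternating scheme (the EM details of XAEM are not shown):
\beta_s^{(t+1)} = \arg\max_{\beta_s} \; L\!\left(X^{(t)}, \beta_s ;\, y_s\right),
\qquad s = 1, \dots, S

X^{(t+1)} = \arg\max_{X} \; \sum_{s=1}^{S} \log L\!\left(X, \beta_s^{(t+1)} ;\, y_s\right)
```

With a single sample the product Xβ could not be separated into its two factors, which is why the joint estimate depends on analyzing several samples at once.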

Subjects

Gene Expression Profiling, RNA-Seq, Algorithms, Protein Isoforms/genetics, RNA Sequence Analysis, Software

15.

Cell-level somatic mutation detection from single-cell RNA sequencing.

Vu, Trung Nghia; Nguyen, Ha-Nam; Calza, Stefano; Kalari, Krishna R; Wang, Liewei; Pawitan, Yudi.

Bioinformatics ; 35(22): 4679-4687, 2019 Nov 01.

Article in English | MEDLINE | ID: mdl-31028395

ABSTRACT

MOTIVATION: Both single-cell RNA sequencing (scRNA-seq) and DNA sequencing (scDNA-seq) have been applied for cell-level genomic profiling. For mutation profiling, the latter seems more natural. However, the task is highly challenging due to the limited input materials from only two copies of DNA molecules, while whole-genome amplification generates biases and other technical noise. ScRNA-seq starts with a higher input amount, so it generally has better data quality. Although various methods exist for mutation detection from DNA sequencing, it is not clear whether these methods work for scRNA-seq data. RESULTS: Mutation detection methods developed for either bulk-cell sequencing data or scDNA-seq data do not work well for the scRNA-seq data, as they produce substantial numbers of false positives. We develop a novel and robust statistical method, called SCmut, to identify specific cells that harbor mutations discovered in bulk-cell data. Statistically, SCmut controls the false positives using the 2D local false discovery rate method. We apply SCmut to several scRNA-seq datasets. In scRNA-seq breast cancer datasets, SCmut identifies a number of highly confident cell-level mutations that are recurrent in many cells and consistent in different samples. In a scRNA-seq glioblastoma dataset, we discover a recurrent cell-level mutation in the PDGFRA gene that is highly correlated with a well-known in-frame deletion in the gene. To conclude, this study contributes a novel method to discover cell-level mutation information from scRNA-seq that can facilitate investigation of cell-to-cell heterogeneity. AVAILABILITY AND IMPLEMENTATION: The source codes and bioinformatics pipeline of SCmut are available at https://github.com/nghiavtr/SCmut. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Mutation, Gene Expression Profiling, Humans, RNA Sequence Analysis, Single-Cell Analysis, Software

16.

Reproducibility of Methods to Detect Differentially Expressed Genes from Single-Cell RNA Sequencing.

Mou, Tian; Deng, Wenjiang; Gu, Fengyun; Pawitan, Yudi; Vu, Trung Nghia.

Front Genet ; 10: 1331, 2019.

Article in English | MEDLINE | ID: mdl-32010190

ABSTRACT

Detection of differentially expressed genes is a common task in single-cell RNA-seq (scRNA-seq) studies. Various methods based on both bulk-cell and single-cell approaches are in current use. Due to the unique distributional characteristics of single-cell data, it is important to compare these methods with rigorous statistical assessments. In this study, we assess the reproducibility of 9 tools for differential expression analysis in scRNA-seq data. These tools include four methods originally designed for scRNA-seq data, three popular methods originally developed for bulk-cell RNA-seq data but have been applied in scRNA-seq analysis, and two general statistical tests. Instead of comparing the performance across all genes, we compare the methods in terms of the rediscovery rates (RDRs) of top-ranked genes, separately for highly and lowly expressed genes. Three real and one simulated scRNA-seq data sets are used for the comparisons. The results indicate that some widely used methods, such as edgeR and monocle, have worse RDR performances compared to the other methods, especially for the top-ranked genes. For highly expressed genes, many bulk-cell-based methods can perform similarly to the methods designed for scRNA-seq data. But for the lowly expressed genes performance varies substantially; edgeR and monocle are too liberal and have poor control of false positives, while DESeq2 is too conservative and consequently loses sensitivity compared to the other methods. BPSC, Limma, DEsingle, MAST, t-test and Wilcoxon have similar performances in the real data sets. Overall, the scRNA-seq based method BPSC performs well against the other methods, particularly when there is a sufficient number of cells.
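The abstract does not define the rediscovery rate, so the Python sketch below assumes one common reading: among the top-k genes ranked by p-value in a training set, the fraction that are also significant at level alpha in an independent validation set. The function name, the p-value dictionaries and the default threshold are all illustrative assumptions rather than the authors' implementation.

```python
def rediscovery_rate(train_pvalues, valid_pvalues, top_k, alpha=0.05):
    """Fraction of the top_k training genes rediscovered in validation.

    train_pvalues, valid_pvalues: dicts mapping gene name -> p-value.
    A gene missing from valid_pvalues is treated as not rediscovered.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Rank genes from most to least significant in the training set.
    ranked = sorted(train_pvalues, key=train_pvalues.get)
    top = ranked[:top_k]
    if not top:
        raise ValueError("no genes in the training set")
    hits = sum(1 for gene in top if valid_pvalues.get(gene, 1.0) <= alpha)
    return hits / len(top)
```

Computing this separately for highly and lowly expressed genes, as the study does, only requires filtering the two dictionaries by expression level before the call.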

17.

A fast detection of fusion genes from paired-end RNA-seq data.

Vu, Trung Nghia; Deng, Wenjiang; Trac, Quang Thinh; Calza, Stefano; Hwang, Woochang; Pawitan, Yudi.

BMC Genomics ; 19(1): 786, 2018 Nov 01.

Article in English | MEDLINE | ID: mdl-30382840

ABSTRACT

BACKGROUND: Fusion genes are known to be drivers of many common cancers, so they are potential markers for diagnosis, prognosis or therapy response. The advent of paired-end RNA sequencing enhances our ability to discover fusion genes. While there are available methods, routine analyses of large numbers of samples are still limited by high computational demands. RESULTS: We develop FuSeq, a fast and accurate method to discover fusion genes based on quasi-mapping to quickly map the reads, extract initial candidates from split reads and fusion equivalence classes of mapped reads, and finally apply multiple filters and statistical tests to get the final candidates. We apply FuSeq to four validated datasets: breast cancer, melanoma and glioma datasets, and one spike-in dataset. The results reveal high sensitivity and specificity in all datasets, and compare well against other methods such as FusionMap, TRUP, TopHat-Fusion, SOAPfuse and JAFFA. In terms of computational time, FuSeq is two-fold faster than FusionMap and orders of magnitude faster than the other methods. CONCLUSIONS: With its lower computational demands, FuSeq makes it practical to investigate fusion genes in large numbers of samples. FuSeq is implemented in C++ and R, and available at https://github.com/nghiavtr/FuSeq for non-commercial use.

Subjects

Gene Fusion; RNA/genetics; Sequence Analysis, RNA; Algorithms; Cell Line, Tumor; Computational Biology/methods; Databases, Nucleic Acid; Gene Expression Profiling; High-Throughput Nucleotide Sequencing; Humans; Neoplasms/genetics; Oncogene Proteins, Fusion/genetics; Reproducibility of Results; Sequence Analysis, RNA/methods

18.

Accumulation of potential driver genes with genomic alterations predicts survival of high-risk neuroblastoma patients.

Suo, Chen; Deng, Wenjiang; Vu, Trung Nghia; Li, Mingrui; Shi, Leming; Pawitan, Yudi.

Biol Direct ; 13(1): 14, 2018 Jul 16.

Article in English | MEDLINE | ID: mdl-30012197

ABSTRACT

BACKGROUND: Neuroblastoma is the most common pediatric malignancy with heterogeneous clinical behaviors, ranging from spontaneous regression to aggressive progression. Many studies have identified aberrations related to the pathogenesis and prognosis, broadly classifying neuroblastoma patients into high- and low-risk groups, but predicting tumor progression and clinical management of high-risk patients remain a major challenge. RESULTS: We integrate gene-level expression, array-based comparative genomic hybridization and functional gene-interaction network of 145 neuroblastoma patients to detect potential driver genes. The drivers are summarized into a driver-gene score (DGscore) for each patient, and we then validate its clinical relevance in terms of association with patient survival. Focusing on a subset of 48 clinically defined high-risk patients, we identify 193 recurrent regions of copy number alterations (CNAs), resulting in 274 altered genes whose copy-number gain or loss has a parallel impact on the gene expression. Using a network enrichment analysis, we detect four common driver genes, ERCC6, HECTD2, KIAA1279, EMX2, and 66 patient-specific driver genes. Patients with high DGscore, thus carrying more copy-number-altered genes with correspondingly up- or down-regulated expression and functional implications, have worse survival than those with low DGscore (P = 0.006). Furthermore, Cox proportional-hazards regression analysis shows that, adjusted for age, tumor stage and MYCN amplification, DGscore is the only significant prognostic factor for high-risk neuroblastoma patients (P = 0.008). CONCLUSIONS: Integration of genomic copy number alteration, expression and functional interaction-network data reveals clinically relevant and prognostic putative driver genes in high-risk neuroblastoma patients. The identified putative drivers are potential drug targets for individualized therapy. REVIEWERS: This article was reviewed by Armand Valsesia, Susmita Datta and Aleksandra Gruca.

Subjects

Comparative Genomic Hybridization/methods; Neuroblastoma/genetics; Animals; DNA Copy Number Variations/genetics; Gene Dosage/genetics; Gene Expression Profiling/methods; Gene Expression Regulation, Neoplastic; Humans; Proportional Hazards Models

19.

Isoform-level gene expression patterns in single-cell RNA-sequencing data.

Vu, Trung Nghia; Wills, Quin F; Kalari, Krishna R; Niu, Nifang; Wang, Liewei; Pawitan, Yudi; Rantalainen, Mattias.

Bioinformatics ; 34(14): 2392-2400, 2018 Jul 15.

Article in English | MEDLINE | ID: mdl-29490015

ABSTRACT

Motivation: RNA sequencing of single cells enables characterization of transcriptional heterogeneity in seemingly homogeneous cell populations. Single-cell sequencing has been applied in a wide range of research fields. However, few studies have focused on characterization of isoform-level expression patterns at the single-cell level. In this study, we propose and apply a novel method, ISOform-Patterns (ISOP), based on mixture modeling, to characterize the expression patterns of isoform pairs from the same gene in single-cell isoform-level expression data. Results: We define six principal patterns of isoform expression relationships and describe a method for differential-pattern analysis. We demonstrate ISOP through analysis of single-cell RNA-sequencing data from a breast cancer cell line, with replication in three independent datasets. We assigned the pattern types to each of 16 562 isoform pairs from 4929 genes. Among those, 26% of the discovered patterns were significant (P < 0.05), while the remaining patterns are possibly effects of transcriptional bursting, drop-out and stochastic biological heterogeneity. Furthermore, 32% of genes discovered through differential-pattern analysis were not detected by differential-expression analysis. Finally, the effects of drop-out events and expression levels of isoforms on ISOP's performance were investigated through simulated datasets. To conclude, ISOP provides a novel approach for characterization of isoform-level preference, commitment and heterogeneity in single-cell RNA-sequencing data. Availability and implementation: The ISOP method has been implemented as an R package and is available at https://github.com/nghiavtr/ISOP under a GPL-3 license. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Gene Expression Profiling/methods; Gene Expression; RNA Isoforms/genetics; Sequence Analysis, RNA/methods; Software; Breast Neoplasms/genetics; Cell Line, Tumor; Female; Humans

20.

speaq 2.0: A complete workflow for high-throughput 1D NMR spectra processing and quantification.

Beirnaert, Charlie; Meysman, Pieter; Vu, Trung Nghia; Hermans, Nina; Apers, Sandra; Pieters, Luc; Covaci, Adrian; Laukens, Kris.

PLoS Comput Biol ; 14(3): e1006018, 2018 Mar.

Article in English | MEDLINE | ID: mdl-29494588

ABSTRACT

Nuclear Magnetic Resonance (NMR) spectroscopy is, together with liquid chromatography-mass spectrometry (LC-MS), the most established platform to perform metabolomics. In contrast to LC-MS, however, NMR data are predominantly processed with commercial software. Moreover, the data processing remains tedious and dependent on user intervention. As a follow-up to speaq, a previously released workflow for NMR spectral alignment and quantitation, we present speaq 2.0. This completely revised framework to automatically analyze 1D NMR spectra uses wavelets to efficiently summarize the raw spectra with minimal information loss or user interaction. The tool offers a fast and easy workflow that starts with the common approach of peak-picking, followed by grouping, thus avoiding the binning step. This yields a matrix consisting of features, samples and peak values that can be conveniently processed either by using included multivariate statistical functions or by using many other recently developed methods for NMR data analysis. speaq 2.0 facilitates robust and high-throughput metabolomics based on 1D NMR but is also compatible with other NMR frameworks or complementary LC-MS workflows. The methods are benchmarked using a simulated dataset and two publicly available datasets. speaq 2.0 is distributed through the existing speaq R package to provide a complete solution for NMR data processing. The package and the code for the presented case studies are freely available on CRAN (https://cran.r-project.org/package=speaq) and GitHub (https://github.com/beirnaert/speaq).

Subjects

Magnetic Resonance Spectroscopy/methods; Metabolomics/methods; Algorithms; Chromatography, Liquid/methods; Magnetic Resonance Imaging/methods; Software; Workflow
