Results 1 - 20 of 170
1.
Nat Methods ; 21(5): 793-797, 2024 May.
Article in English | MEDLINE | ID: mdl-38509328

ABSTRACT

SQANTI3 is a tool designed for the quality control, curation and annotation of long-read transcript models obtained with third-generation sequencing technologies. Leveraging its annotation framework, SQANTI3 calculates quality descriptors of transcript models, junctions and transcript ends. With this information, potential artifacts can be identified and replaced with reliable sequences. Furthermore, the integrated functional annotation feature enables subsequent functional iso-transcriptomics analyses.


Subject(s)
Molecular Sequence Annotation , Transcriptome , Humans , Molecular Sequence Annotation/methods , Software , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Protein Isoforms/genetics , High-Throughput Nucleotide Sequencing/methods
2.
Nucleic Acids Res ; 52(5): e28, 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38340337

ABSTRACT

Advances in affordable transcriptome sequencing combined with better exon and gene prediction have motivated many to compare transcription across the tree of life. We develop a mathematical framework to calculate complexity and compare transcript models. Structural features, i.e. intron retention (IR), donor/acceptor site variation, alternative exon cassettes, alternative 5'/3' UTRs, are compared and the distance between transcript models is calculated with nucleotide-level precision. All metrics are implemented in a PyPI package, TranD, and output can be used to summarize splicing patterns for a transcriptome (1GTF) and between transcriptomes (2GTF). TranD output enables quantitative comparisons between: annotations augmented by empirical RNA-seq data and the original transcript models; transcript model prediction tools for long-read RNA-seq (e.g. FLAIR versus Isoseq3); alternate annotations for a species (e.g. RefSeq vs Ensembl); and between closely related species. In C. elegans, Z. mays, D. melanogaster, D. simulans and H. sapiens, alternative exons were observed more frequently in combination with an alternative donor/acceptor than alone. Transcript models in RefSeq and Ensembl are linked and both have unique transcript models with empirical support. D. melanogaster and D. simulans share many transcript models and long-read RNA-seq data suggest that both species are under-annotated. We recommend combined references.


Subject(s)
Alternative Splicing , Transcriptome , Animals , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Gene Expression Profiling , Nucleotides , RNA Splicing , Sequence Analysis, RNA , Species Specificity , Transcriptome/genetics , Software
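
The nucleotide-level comparison of two transcript models described in this abstract can be illustrated with a minimal sketch. It is not TranD's API or its actual distance metric: the function names and example coordinates are hypothetical, and exons are assumed to be 1-based inclusive (start, end) pairs on the same strand, as in GTF.

```python
def exonic_positions(exons):
    """Expand 1-based inclusive (start, end) exons into a set of positions."""
    return {pos for start, end in exons for pos in range(start, end + 1)}


def nucleotide_difference(model_a, model_b):
    """Count nucleotides shared by two transcript models and unique to each.

    Hypothetical helper for illustration only; not part of TranD.
    """
    a = exonic_positions(model_a)
    b = exonic_positions(model_b)
    return {
        "shared": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
    }


# Hypothetical pair of models differing by a 10-nt alternative donor site.
model_a = [(100, 200), (300, 400)]
model_b = [(100, 210), (300, 400)]
difference = nucleotide_difference(model_a, model_b)
print(difference)
```

Expanding intervals into position sets is simple but memory-hungry for long transcripts; an interval-arithmetic version would scale better and give the same counts.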
3.
Genome Biol ; 24(1): 286, 2023 Dec 11.
Article in English | MEDLINE | ID: mdl-38082294

ABSTRACT

Long-read RNA sequencing has emerged as a powerful tool for transcript discovery, even in well-annotated organisms. However, assessing the accuracy of different methods in identifying annotated and novel transcripts remains a challenge. Here, we present SQANTI-SIM, a versatile tool that wraps around popular long-read simulators to allow precise management of transcript novelty based on the structural categories defined by SQANTI3. By selectively excluding specific transcripts from the reference dataset, SQANTI-SIM effectively emulates scenarios involving unannotated transcripts. Furthermore, the tool provides customizable features and supports the simulation of additional types of data, representing the first multi-omics simulation tool for the lrRNA-seq field.


Subject(s)
Benchmarking , Transcriptome , Sequence Analysis, RNA , Base Sequence , Computer Simulation , High-Throughput Nucleotide Sequencing , Gene Expression Profiling
4.
bioRxiv ; 2023 Aug 24.
Article in English | MEDLINE | ID: mdl-37662216

ABSTRACT

Long-read RNA-seq has emerged as a powerful tool for transcript discovery, even in well-annotated organisms. However, assessing the accuracy of different methods in identifying annotated and novel transcripts remains a challenge. Here, we present SQANTI-SIM, a versatile utility that wraps around popular long-read simulators to allow precise management of transcript novelty based on the structural categories defined by SQANTI3. By selectively excluding specific transcripts from the reference dataset, SQANTI-SIM effectively emulates scenarios involving unannotated transcripts. Furthermore, the tool provides customizable features and supports the simulation of additional types of data, representing the first multi-omics simulation tool for the lrRNA-seq field. We demonstrate the effectiveness of SQANTI-SIM by benchmarking five transcriptome reconstruction pipelines using the simulated data.

5.
Front Microbiol ; 14: 1174685, 2023.
Article in English | MEDLINE | ID: mdl-37577445

ABSTRACT

Microbes continually shape Earth's biochemical and physical landscapes by inhabiting diverse metabolic niches. Despite the important role microbes play in ecosystem functioning, most microbial species remain unknown, highlighting a gap in our understanding of structured complex ecosystems. To elucidate the relevance of these unknown taxa, often referred to as "microbial dark matter," the integration of multiple high throughput sequencing technologies was used to evaluate the co-occurrence and connectivity of all microbes within the community. Since there are no standard methodologies for multi-omics integration of microbiome data, we evaluated the abundance of "microbial dark matter" in microbialite-forming communities using different types of meta-omic datasets: amplicon, metagenomic, and metatranscriptomic sequencing previously generated for this ecosystem. Our goal was to compare the community structure and abundances of unknown taxa within the different data types rather than to perform a functional characterization of the data. Metagenomic and metatranscriptomic data were input into SortMeRNA to extract 16S rRNA gene reads. The output, as well as amplicon sequences, were processed through QIIME2 for taxonomy analysis. The R package mdmnets was utilized to build co-occurrence networks. Most hubs presented unknown classifications, even at the phylum level. Comparisons of the highest scoring hubs of each data type using sequence similarity networks allowed the identification of the most relevant hubs within the microbialite-forming communities. This work highlights the importance of unknown taxa in community structure and proposes that ecosystem network construction can be used on several types of data to identify keystone taxa and their potential function within microbial ecosystems.

6.
bioRxiv ; 2023 Jun 03.
Article in English | MEDLINE | ID: mdl-37398077

ABSTRACT

The emergence of long-read RNA sequencing (lrRNA-seq) has provided an unprecedented opportunity to analyze transcriptomes at isoform resolution. However, the technology is not free from biases, and transcript models inferred from these data require quality control and curation. In this study, we introduce SQANTI3, a tool specifically designed to perform quality analysis on transcriptomes constructed using lrRNA-seq data. SQANTI3 provides an extensive naming framework to describe transcript model diversity in comparison to the reference transcriptome. Additionally, the tool incorporates a wide range of metrics to characterize various structural properties of transcript models, such as transcription start and end sites, splice junctions, and other structural features. These metrics can be utilized to filter out potential artifacts. Moreover, SQANTI3 includes a Rescue module that prevents the loss of known genes and transcripts exhibiting evidence of expression but displaying low-quality features. Lastly, SQANTI3 incorporates IsoAnnotLite, which enables functional annotation at the isoform level and facilitates functional iso-transcriptomics analyses. We demonstrate the versatility of SQANTI3 in analyzing different data types, isoform reconstruction pipelines, and sequencing platforms, and how it provides novel biological insights into isoform biology. The SQANTI3 software is available at https://github.com/ConesaLab/SQANTI3 .

7.
Front Nutr ; 10: 1118679, 2023.
Article in English | MEDLINE | ID: mdl-37153913

ABSTRACT

A previous double-blind, randomized clinical trial of 42 healthy individuals conducted with Lactobacillus johnsonii N6.2 found that the probiotic's mechanistic tryptophan pathway was significantly modified when the data were stratified based on the individuals' lactic acid bacteria (LAB) stool content. These results suggest that confounding factors such as dietary intake which impact stool LAB content may affect the response to the probiotic treatment. Using dietary intake, serum metabolite, and stool LAB colony forming unit (CFU) data from a previous clinical trial, the relationships between diet, metabolic response, and fecal LAB were assessed. The diets of subject groups with high vs. low CFUs of LAB/g of wet stool differed in their intakes of monounsaturated fatty acids, vegetables, proteins, and dairy. Individuals with high LAB consumed greater amounts of cheese, fermented meats, soy, nuts and seeds, alcoholic beverages, and oils whereas individuals with low LAB consumed higher amounts of tomatoes, starchy vegetables, and poultry. Several dietary variables correlated with LAB counts; positive correlations were determined for nuts and seeds, fish high in N-3 fatty acids, soy, and processed meats, and negative correlations for consumption of vegetables including tomatoes. Using machine learning, predictors of LAB count included cheese, nuts and seeds, fish high in N-3 fatty acids, and erucic acid. Erucic acid alone accurately predicted LAB categorization, and was shown to be utilized as a sole fatty acid source by several Lactobacillus species regardless of their mode of fermentation. Several metabolites were significantly upregulated in each group based on LAB titers, notably polypropylene glycol, caproic acid, pyrazine, and chondroitin sulfate; however, none were correlated with the dietary intake variables. These findings suggest that dietary variables may drive the presence of LAB in the human gastrointestinal tract and potentially impact response to probiotic interventions.

8.
Life Sci Alliance ; 6(1), 2023 01.
Article in English | MEDLINE | ID: mdl-36302651

ABSTRACT

Obesity and elevated circulating lipids may impair metabolism by disrupting the molecular circadian clock. We tested the hypothesis that lipid overload may interact with the circadian clock and alter the rhythmicity of gene expression through epigenomic mechanisms in skeletal muscle. Palmitate reprogrammed the circadian transcriptome in myotubes without altering the rhythmic mRNA expression of core clock genes. Genes with enhanced cycling in response to palmitate were associated with post-translational modification of histones. The cycling of histone 3 lysine 27 acetylation (H3K27ac), a marker of active gene enhancers, was modified by palmitate treatment. Chromatin immunoprecipitation and sequencing confirmed that palmitate exposure altered the cycling of DNA regions associated with H3K27ac. The overlap between mRNA and DNA regions associated with H3K27ac and the pharmacological inhibition of histone acetyltransferases revealed novel cycling genes associated with lipid exposure of primary human myotubes. Palmitate exposure disrupts transcriptomic rhythmicity and modifies enhancers through changes in histone H3K27 acetylation in a circadian manner. Thus, histone acetylation is responsive to lipid overload and may redirect the circadian chromatin landscape, leading to the reprogramming of circadian genes and pathways involved in lipid biosynthesis in skeletal muscle.


Subject(s)
Histones , Transcriptome , Humans , Histones/metabolism , Transcriptome/genetics , Palmitates/pharmacology , Palmitates/metabolism , Histone Code/genetics , Protein Processing, Post-Translational , RNA, Messenger/metabolism , Muscle Fibers, Skeletal/metabolism , DNA/metabolism
9.
Genome Biol ; 23(1): 252, 2022 12 09.
Article in English | MEDLINE | ID: mdl-36494864

ABSTRACT

BACKGROUND: JUNB transcription factor contributes to the formation of the ubiquitous transcriptional complex AP-1 involved in the control of many physiological and disease-associated functions. The roles of JUNB in the control of cell division and tumorigenic processes are acknowledged but still unclear. RESULTS: Here, we report the results of combined transcriptomic, genomic, and functional studies showing that JUNB promotes cell cycle progression via induction of cyclin E1 and repression of transforming growth factor (TGF)-β2 genes. We also show that high levels of JUNB switch the response of TGF-β2 stimulation from an antiproliferative to a pro-invasive one, induce endogenous TGF-β2 production by promoting TGF-β2 mRNA translation, and enhance tumor growth and metastasis in mice. Moreover, tumor genomic data indicate that JUNB amplification associates with poor prognosis in breast and ovarian cancer patients. CONCLUSIONS: Our results reveal novel functions for JUNB in cell proliferation and tumor aggressiveness through regulation of cyclin E1 and TGF-β2 expression, which might be exploited for cancer prognosis and therapy.


Subject(s)
Neoplasms , Transforming Growth Factor beta2 , Mice , Animals , Transforming Growth Factor beta2/genetics , Transcription Factor AP-1 , Cell Division , Cell Cycle Checkpoints , Carcinogenesis , Transcription Factors/genetics
10.
Nat Metab ; 4(9): 1150-1165, 2022 09.
Article in English | MEDLINE | ID: mdl-36097183

ABSTRACT

Studies in genetically 'identical' individuals indicate that as much as 50% of complex trait variation cannot be traced to genetics or to the environment. The mechanisms that generate this 'unexplained' phenotypic variation (UPV) remain largely unknown. Here, we identify neuronatin (NNAT) as a conserved factor that buffers against UPV. We find that Nnat deficiency in isogenic mice triggers the emergence of a bi-stable polyphenism, where littermates emerge into adulthood either 'normal' or 'overgrown'. Mechanistically, this is mediated by an insulin-dependent overgrowth that arises from histone deacetylase (HDAC)-dependent β-cell hyperproliferation. A multi-dimensional analysis of monozygotic twin discordance reveals the existence of two patterns of human UPV, one of which (Type B) phenocopies the NNAT-buffered polyphenism identified in mice. Specifically, Type-B monozygotic co-twins exhibit coordinated increases in fat and lean mass across the body; decreased NNAT expression; increased HDAC-responsive gene signatures; and clinical outcomes linked to insulinemia. Critically, the Type-B UPV signature stratifies both childhood and adult cohorts into four metabolic states, including two phenotypically and molecularly distinct types of obesity.


Subject(s)
Membrane Proteins , Nerve Tissue Proteins , Adaptation, Physiological , Adult , Animals , Child , Histone Deacetylases , Humans , Insulin , Membrane Proteins/metabolism , Mice , Nerve Tissue Proteins/genetics , Obesity/genetics , Obesity/metabolism
11.
Metallomics ; 14(9), 2022 09 24.
Article in English | MEDLINE | ID: mdl-36066904

ABSTRACT

Queuosine (Q) is a conserved hypermodification of the wobble base of tRNA containing GUN anticodons but the physiological consequences of Q deficiency are poorly understood in bacteria. This work combines transcriptomic, proteomic and physiological studies to characterize a Q-deficient Escherichia coli K12 MG1655 mutant. The absence of Q led to an increased resistance to nickel and cobalt, and to an increased sensitivity to cadmium, compared to the wild-type (WT) strain. Transcriptomic analysis of the WT and Q-deficient strains, grown in the presence and absence of nickel, revealed that the nickel transporter genes (nikABCDE) are downregulated in the Q- mutant, even when nickel is not added. This mutant is therefore primed to resist high nickel levels. Downstream analysis of the transcriptomic data suggested that the absence of Q triggers an atypical oxidative stress response, confirmed by the detection of slightly elevated reactive oxygen species (ROS) levels in the mutant, increased sensitivity to hydrogen peroxide and paraquat, and a subtle growth phenotype in a strain prone to accumulation of ROS.


Subject(s)
Escherichia coli K12 , Nucleoside Q , Anticodon , Cadmium , Cobalt , Escherichia coli K12/genetics , Escherichia coli K12/metabolism , Homeostasis , Hydrogen Peroxide , Nickel , Nucleoside Q/metabolism , Oxidative Stress , Paraquat , Phenotype , Proteomics , RNA, Transfer/genetics , RNA, Transfer/metabolism , Reactive Oxygen Species
12.
Cell Rep Methods ; 2(8): 100269, 2022 08 22.
Article in English | MEDLINE | ID: mdl-36046619

ABSTRACT

B and T cell receptor (immune) repertoires can represent an individual's immune history. While current repertoire analysis methods aim to discriminate between health and disease states, they are typically based on only a limited number of parameters. Here, we introduce immuneREF: a quantitative multidimensional measure of adaptive immune repertoire (and transcriptome) similarity that allows interpretation of immune repertoire variation by relying on both repertoire features and cross-referencing of simulated and experimental datasets. To quantify immune repertoire similarity landscapes across health and disease, we applied immuneREF to >2,400 datasets from individuals with varying immune states (healthy, [autoimmune] disease, and infection). We discovered, in contrast to the current paradigm, that blood-derived immune repertoires of healthy and diseased individuals are highly similar for certain immune states, suggesting that repertoire changes to immune perturbations are less pronounced than previously thought. In conclusion, immuneREF enables the population-wide study of adaptive immune response similarity across immune states.


Subject(s)
Adaptive Immunity , Autoimmune Diseases , Humans , Receptors, Antigen, T-Cell/genetics , Receptors, Immunologic
14.
Database (Oxford) ; 2022, 2022 08 12.
Article in English | MEDLINE | ID: mdl-35961013

ABSTRACT

Over the last 25 years, biology has entered the genomic era and is becoming a science of 'big data'. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. Such a large gap in knowledge hampers all aspects of biological enterprise and, thereby, is standing in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue funded by the National Science Foundation was held during 3-4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues that were obstructing the field were identified and discussed, which ultimately allowed for the proposal of solutions on how to move forward.


Subject(s)
Genomics , Proteins , Base Sequence , Computational Biology , Genome , Molecular Sequence Annotation
15.
Genome Biol ; 23(1): 149, 2022 07 07.
Article in English | MEDLINE | ID: mdl-35799267

ABSTRACT

BACKGROUND: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies provide improved integrity of transcript structures including alternative splicing, and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation, or incomplete cDNA synthesis. RESULTS: We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 169,000 transcripts, twice that of the best current Arabidopsis transcriptome, and includes over 1500 novel genes. Seventy-eight percent of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We develop novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provide a powerful feature to distinguish correct splice junctions and remove false splice junctions. Stratified approaches identify high-confidence transcription start and end sites and remove fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provides higher resolution of transcript expression profiling and identifies cold-induced differential transcription start and polyadenylation site usage. CONCLUSIONS: AtRTD3 is currently the most comprehensive Arabidopsis transcriptome. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage analysis from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single-molecule sequencing analysis from any species.


Subject(s)
Arabidopsis , Transcriptome , Alternative Splicing , Arabidopsis/genetics , Gene Expression Profiling/methods , RNA-Seq , Sequence Analysis, RNA/methods
16.
Obes Surg ; 32(8): 2598-2604, 2022 08.
Article in English | MEDLINE | ID: mdl-35687255

ABSTRACT

PURPOSE: Bariatric surgery is currently considered the most effective and durable treatment option for morbid obesity. Laparoscopic sleeve gastrectomy (LSG) has become a popular technique and may currently be the most frequently practiced surgical operation to treat obesity. However, no objective analyses of its learning curve have been reported. OBJECTIVE: To analyze the learning curve for LSG. MATERIALS AND METHODS: We included all LSGs performed in our hospital (University Hospital, Spain; Public Practice) from April 2013 to February 2016. The learning curve for LSG was evaluated using cumulative sum (CUSUM) analysis. All variables among the learning curve phases were compared. RESULTS: According to the CUSUM analysis, the learning curve was divided into three unique phases: early learning (the initial 26 patients), acquisition of skills (the middle 30 patients), and mastery of technique (the final 56 patients). The operative time and gastric stenosis significantly decreased with progression of the learning curve without differences in the 30-day postoperative complication rate, postoperative stay, or weight loss. CONCLUSION: According to this study, the learning curve for LSG can be divided into 3 distinct phases, and about 25 patients are needed to demonstrate an improvement in surgical skill.


Subject(s)
Laparoscopy , Obesity, Morbid , Gastrectomy/methods , Humans , Laparoscopy/methods , Learning Curve , Obesity, Morbid/surgery , Postoperative Complications/epidemiology , Postoperative Complications/surgery , Retrospective Studies , Treatment Outcome
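
The CUSUM analysis named in this abstract can be sketched as follows. The abstract does not say which variable or target value the authors used, so this assumes the common learning-curve form: the running sum of each case's operative time minus the overall mean. The operative times are invented for illustration and are not data from the study.

```python
from itertools import accumulate


def cusum(values):
    """Running sum of deviations from the series mean (assumed CUSUM form)."""
    mean = sum(values) / len(values)
    return list(accumulate(v - mean for v in values))


# Hypothetical operative times in minutes, in chronological case order.
op_times = [120, 110, 115, 100, 90, 85, 80, 80]
curve = cusum(op_times)

# The curve rises while cases run slower than average and falls once they
# run faster; the peak marks a candidate boundary between learning phases.
peak_case = max(range(len(curve)), key=curve.__getitem__) + 1
print(curve, peak_case)
```

In this form the curve always returns to zero at the last case, so phase boundaries are read from changes in slope rather than from the final value.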
17.
Nucleic Acids Res ; 50(W1): W551-W559, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35609982

ABSTRACT

PaintOmics is a web server for the integrative analysis and visualisation of multi-omics datasets using biological pathway maps. PaintOmics 4 has several notable updates that improve and extend analyses. Three pathway databases are now supported: KEGG, Reactome and MapMan, providing more comprehensive pathway knowledge for animals and plants. New metabolite analysis methods fill gaps in traditional pathway-based enrichment methods. The metabolite hub analysis selects compounds with a high number of significant genes in their neighbouring network, suggesting regulation by gene expression changes. The metabolite class activity analysis tests the hypothesis that a metabolic class has a higher-than-expected proportion of significant elements, indicating that these compounds are regulated in the experiment. Finally, PaintOmics 4 includes a regulatory omics module to analyse the contribution of trans-regulatory layers (microRNA and transcription factors, RNA-binding proteins) to regulate pathways. We show the performance of PaintOmics 4 on both mouse and plant data to highlight how these new analysis features provide novel insights into regulatory biology. PaintOmics 4 is available at https://paintomics.org/.


Subject(s)
MicroRNAs , Multiomics , Animals , Mice , Databases, Factual , MicroRNAs/genetics , Transcription Factors , Computational Biology/methods
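
The metabolite class activity analysis in this abstract tests whether a class holds a higher-than-expected proportion of significant elements. The abstract does not state which statistical test PaintOmics 4 applies, so the sketch below uses a one-sided hypergeometric test as one common choice for that kind of hypothesis. The function name and the counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(total, in_class, significant, significant_in_class):
    """P(X >= significant_in_class) when `significant` compounds are drawn
    without replacement from `total`, of which `in_class` belong to the class.

    Illustrative helper; not taken from the PaintOmics code base.
    """
    upper = min(in_class, significant)
    favourable = sum(
        comb(in_class, k) * comb(total - in_class, significant - k)
        for k in range(significant_in_class, upper + 1)
    )
    return favourable / comb(total, significant)


# Hypothetical counts: 10 measured compounds, 3 in the class,
# 3 significant overall, all 3 of them in the class.
p_value = hypergeom_upper_tail(total=10, in_class=3, significant=3,
                               significant_in_class=3)
print(p_value)
```

A small p-value here means the class contains more significant compounds than random sampling would explain; in practice the values for many classes would also need multiple-testing correction.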
18.
Genetics ; 221(4), 2022 07 30.
Article in English | MEDLINE | ID: mdl-35579358

ABSTRACT

We examine the impact of sustained elevated ozone concentration on the leaf transcriptome of 5 diverse maize inbred genotypes, which vary in physiological sensitivity to ozone (B73, Mo17, Hp301, C123, and NC338), using long reads to assemble transcripts and short reads to quantify expression of these transcripts. More than 99% of the long reads, 99% of the assembled transcripts, and 97% of the short reads map to both B73 and Mo17 reference genomes. Approximately 95% of the genes with assembled transcripts belong to known B73-Mo17 syntenic loci and 94% of genes with assembled transcripts are present in all temperate lines in the nested association mapping pan-genome. While there is limited evidence for alternative splicing in response to ozone stress, there is a difference in the magnitude of differential expression among the 5 genotypes. The transcriptional response to sustained ozone stress in the ozone resistant B73 genotype (151 genes) was modest, while more than 3,300 genes were significantly differentially expressed in the more sensitive NC338 genotype. There is the potential for tandem duplication in 30% of genes with assembled transcripts, but there is no obvious association between potential tandem duplication and differential expression. Genes with a common response across the 5 genotypes (83 genes) were associated with photosynthesis, in particular photosystem I. The functional annotation of genes not differentially expressed in B73 but responsive in the other 4 genotypes (789) identifies reactive oxygen species. This suggests that B73 has a different response to long-term ozone exposure than the other 4 genotypes. The relative magnitude of the genotypic response to ozone, and the enrichment analyses are consistent regardless of whether aligning short reads to: long read assembled transcripts; the B73 reference; the Mo17 reference. We find that prolonged ozone exposure directly impacts the photosynthetic machinery of the leaf.


Subject(s)
Ozone , Zea mays , Gene Expression Regulation, Plant , Genotype , Ozone/metabolism , Ozone/toxicity , Plant Leaves/genetics , Plant Leaves/metabolism , Transcriptome , Zea mays/genetics , Zea mays/metabolism
19.
Nat Commun ; 13(1): 1828, 2022 04 05.
Article in English | MEDLINE | ID: mdl-35383181

ABSTRACT

Alternative splicing (AS) is a highly-regulated post-transcriptional mechanism known to modulate isoform expression within genes and contribute to cell-type identity. However, the extent to which alternative isoforms establish co-expression networks that may be relevant in cellular function has not been explored yet. Here, we present acorde, a pipeline that successfully leverages bulk long reads and single-cell data to confidently detect alternative isoform co-expression relationships. To achieve this, we develop and validate percentile correlations, an innovative approach that overcomes data sparsity and yields accurate co-expression estimates from single-cell data. Next, acorde uses correlations to cluster co-expressed isoforms into a network, unraveling cell type-specific alternative isoform usage patterns. By selecting same-gene isoforms between these clusters, we subsequently detect and characterize genes with co-differential isoform usage (coDIU) across cell types. Finally, we predict functional elements from long read-defined isoforms and provide insight into biological processes, motifs, and domains potentially controlled by the coordination of post-transcriptional regulation. The code for acorde is available at https://github.com/ConesaLab/acorde .


Subject(s)
Alternative Splicing , Protein Isoforms/genetics , Protein Isoforms/metabolism , Sequence Analysis, RNA
20.
Genome Biol ; 23(1): 69, 2022 03 03.
Article in English | MEDLINE | ID: mdl-35241129

ABSTRACT

BACKGROUND: The detection of physiologically relevant protein isoforms encoded by the human genome is critical to biomedicine. Mass spectrometry (MS)-based proteomics is the preeminent method for protein detection, but isoform-resolved proteomic analysis relies on accurate reference databases that match the sample; neither a subset nor a superset database is ideal. Long-read RNA sequencing (e.g., PacBio or Oxford Nanopore) provides full-length transcripts which can be used to predict full-length protein isoforms. RESULTS: We describe here a long-read proteogenomics approach for integrating sample-matched long-read RNA-seq and MS-based proteomics data to enhance isoform characterization. We introduce a classification scheme for protein isoforms, discover novel protein isoforms, and present the first protein inference algorithm for the direct incorporation of long-read transcriptome data to enable detection of protein isoforms previously intractable to MS-based detection. We have released an open-source Nextflow pipeline that integrates long-read sequencing in a proteomic workflow for isoform-resolved analysis. CONCLUSIONS: Our work suggests that the incorporation of long-read sequencing and proteomic data can facilitate improved characterization of human protein isoform diversity. Our first-generation pipeline provides a strong foundation for future development of long-read proteogenomics and its adoption for both basic and translational research.


Subject(s)
Proteogenomics , Alternative Splicing , Humans , Protein Isoforms/genetics , Proteomics , Sequence Analysis, RNA/methods , Transcriptome