Results 1 - 20 of 43
1.
Br J Clin Pharmacol ; 88(10): 4297-4310, 2022 10.
Article in English | MEDLINE | ID: mdl-34907575

ABSTRACT

Pharmacogenomics (PGx) relates to the study of genetic factors determining variability in drug response. Implementing PGx testing in paediatric patients can enhance drug safety, helping to improve drug efficacy or reduce the risk of toxicity. Despite its clinical relevance, the implementation of PGx testing in paediatric practice to date has been variable and limited. As with most paediatric pharmacological studies, there are well-recognised barriers to obtaining high-quality PGx evidence, particularly when patient numbers may be small, and off-label or unlicensed prescribing remains widespread. Furthermore, trials enrolling small numbers of children can rarely, in isolation, provide sufficient PGx evidence to change clinical practice, so extrapolation from larger PGx studies in adult patients, where scientifically sound, is essential. This review paper discusses the relevance of PGx to paediatrics and considers implementation strategies from a child health perspective. Examples are provided from Canada, the Netherlands and the UK, with consideration of the different healthcare systems and their distinct approaches to implementation, followed by future recommendations based on these cumulative experiences. Improving the evidence base demonstrating the clinical utility and cost-effectiveness of paediatric PGx testing will be critical to drive implementation forwards. International, interdisciplinary collaborations will enhance paediatric data collation, interpretation and evidence curation, while also supporting dedicated paediatric PGx educational initiatives. PGx consortia and paediatric clinical research networks will continue to play a central role in the streamlined development of effective PGx implementation strategies to help optimise paediatric pharmacotherapy.


Subject(s)
Pediatrics , Pharmacogenomic Testing , Child , Cost-Benefit Analysis , Humans , Netherlands , Pharmacogenetics
2.
Dement Geriatr Cogn Disord ; 49(3): 295-302, 2020.
Article in English | MEDLINE | ID: mdl-32854092

ABSTRACT

INTRODUCTION: Caregivers for people with dementia face a number of challenges such as changing family relationships, social isolation, or financial difficulties. Internet usage and social media are increasingly being recognised as resources to increase support and general public health. OBJECTIVE: Using automated analysis, the aim of this study was to explore (i) the age and sex of people who post to the social media forum Reddit about dementia diagnoses, (ii) the affected person and their diagnosis, (iii) which subreddits authors are posting to, (iv) the types of messages posted, and (v) the content of these posts. METHODS: We analysed Reddit posts concerning dementia diagnoses and used a previously developed text analysis pipeline to determine attributes of the posts and their authors. The posts were further examined through manual annotation of the diagnosis provided and the person affected. Lastly, we investigated the communities posters engage with and assessed the contents of the posts with an automated topic gathering/clustering technique. RESULTS: Five hundred and thirty-five Reddit posts were identified as relevant and further processed. The majority of posters in our dataset are females and predominantly close relatives, such as parents and grandparents, are mentioned. The communities frequented and topics gathered reflect not only the person's diagnosis but also potential outcomes, for example hardships experienced by the caregiver or the requirement for legal support. CONCLUSIONS: This work demonstrates the value of social media data as a resource for in-depth examination of caregivers' experience after a dementia diagnosis. It is important to study groups actively posting online, both in topic-specific and general communities, as they are most likely to benefit from novel internet-based support systems or interventions.


Subject(s)
Caregivers/psychology , Dementia , Internet-Based Intervention/statistics & numerical data , Social Media/statistics & numerical data , Social Support , Dementia/diagnosis , Dementia/economics , Dementia/psychology , Family Relations , Financial Stress , Humans , Social Isolation
3.
Nucleic Acids Res ; 47(D1): D766-D773, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357393

ABSTRACT

The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.


Subject(s)
Databases, Genetic , Genome, Human/genetics , Genomics , Pseudogenes/genetics , Animals , Computational Biology , Humans , Internet , Mice , Molecular Sequence Annotation , Software
5.
Database (Oxford) ; 2017(1), 2017 01 01.
Article in English | MEDLINE | ID: mdl-28365743

ABSTRACT

Neurodegenerative disorders such as Parkinson's and Alzheimer's disease are devastating and costly illnesses, a source of major global burden. In order to provide successful interventions for patients and reduce costs, both causes and pathological processes need to be understood. The ApiNATOMY project aims to contribute to our understanding of neurodegenerative disorders by manually curating and abstracting data from the vast body of literature amassed on these illnesses. As curation is labour-intensive, we aimed to speed up the process by automatically highlighting those parts of the PDF document of primary importance to the curator. Using techniques similar to those of summarisation, we developed an algorithm that relies on linguistic, semantic and spatial features. Employing this algorithm on a test set manually corrected for tool imprecision, we achieved a macro F1-measure of 0.51, which is an increase of 132% compared to the best bag-of-words baseline model. A user-based evaluation was also conducted to assess the usefulness of the methodology on 40 unseen publications, which reveals that in 85% of cases all highlighted sentences are relevant to the curation task and in about 65% of the cases, the highlights are sufficient to support the knowledge curation task without needing to consult the full text. In conclusion, we believe that these are promising results for a step in automating the recognition of curation-relevant sentences. Refining our approach to pre-digest papers will lead to faster processing and cost reduction in the curation process. Database URL: https://github.com/KHP-Informatics/NapEasy.


Subject(s)
Alzheimer Disease , Data Curation/methods , Data Mining/methods , Parkinson Disease , Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Animals , Data Curation/standards , Data Mining/standards , Humans , Parkinson Disease/genetics , Parkinson Disease/metabolism
6.
Sci Rep ; 7: 45141, 2017 03 22.
Article in English | MEDLINE | ID: mdl-28327593

ABSTRACT

The number of people affected by mental illness is on the increase and with it the burden on health and social care use, as well as the loss of both productivity and quality-adjusted life-years. Natural language processing of electronic health records is increasingly used to study mental health conditions and risk behaviours on a large scale. However, narrative notes written by clinicians do not capture first-hand the patients' own experiences, and only record cross-sectional, professional impressions at the point of care. Social media platforms have become a source of 'in the moment' daily exchange, with topics including well-being and mental health. In this study, we analysed posts from the social media platform Reddit and developed classifiers to recognise and classify posts related to mental illness according to 11 disorder themes. Using a neural network and deep learning approach, we could automatically recognise mental illness-related posts in our balanced dataset with an accuracy of 91.08% and select the correct theme with a weighted average accuracy of 71.37%. We believe that these results are a first step in developing methods to characterise large amounts of user-generated content that could support content curation and targeted interventions.

7.
PLoS One ; 12(2): e0171526, 2017.
Article in English | MEDLINE | ID: mdl-28207753

ABSTRACT

The UK government has recently recognised the need to improve mental health services in the country. Electronic health records provide a rich source of patient data which could help policymakers to better understand needs of the service users. The main objective of this study is to unveil statistics of diagnoses recorded in the Case Register of the South London and Maudsley NHS Foundation Trust, one of the largest mental health providers in the UK and Europe serving a source population of over 1.2 million people residing in south London. Based on over 500,000 diagnoses recorded in ICD10 codes for a cohort of approximately 200,000 mental health patients, we established frequency rate of each diagnosis (the ratio of the number of patients for whom a diagnosis has ever been recorded to the number of patients in the entire population who have made contact with mental disorders). We also investigated differences in diagnoses prevalence between subgroups of patients stratified by gender and ethnicity. The most common diagnoses in the considered population were (recurrent) depression (ICD10 codes F32-33; 16.4% of patients), reaction to severe stress and adjustment disorders (F43; 7.1%), mental/behavioural disorders due to use of alcohol (F10; 6.9%), and schizophrenia (F20; 5.6%). We also found many diagnoses which were more likely to be recorded in patients of a certain gender or ethnicity. For example, mood (affective) disorders (F31-F39); neurotic, stress-related and somatoform disorders (F40-F48, except F42); and eating disorders (F50) were more likely to be found in records of female patients, while males were more likely to be diagnosed with mental/behavioural disorders due to psychoactive substance use (F10-F19). Furthermore, mental/behavioural disorders due to use of alcohol and opioids were more likely to be recorded in patients of white ethnicity, and disorders due to use of cannabinoids in those of black ethnicity.


Subject(s)
Electronic Health Records/statistics & numerical data , Mental Disorders/diagnosis , Mental Health/statistics & numerical data , Registries/statistics & numerical data , Female , Humans , Male
8.
Bioinformatics ; 31(24): 4029-31, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26315906

ABSTRACT

High-throughput sequencing technologies survey genetic variation at genome scale and are increasingly used to study the contribution of rare and low-frequency genetic variants to human traits. As part of the Cohorts arm of the UK10K project, genetic variants called from low-read depth (average 7×) whole genome sequencing of 3621 cohort individuals were analysed for statistical associations with 64 different phenotypic traits of biomedical importance. Here, we describe a novel genome browser based on the Biodalliance platform developed to provide interactive access to the association results of the project. AVAILABILITY AND IMPLEMENTATION: The browser is available at http://www.uk10k.org/dalliance.html. Source code for the Biodalliance platform is available under a BSD license from http://github.com/dasmoth/dalliance, and for the LD-display plugin and backend from http://github.com/dasmoth/ldserv.


Subject(s)
Genetic Association Studies , Genetic Variation , Genome, Human , Software , High-Throughput Nucleotide Sequencing , Humans , Linkage Disequilibrium
9.
Nature ; 512(7515): 445-8, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164755

ABSTRACT

The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.


Subject(s)
Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Gene Expression Profiling , Transcriptome/genetics , Animals , Caenorhabditis elegans/embryology , Caenorhabditis elegans/growth & development , Chromatin/genetics , Cluster Analysis , Drosophila melanogaster/growth & development , Gene Expression Regulation, Developmental/genetics , Histones/metabolism , Humans , Larva/genetics , Larva/growth & development , Models, Genetic , Molecular Sequence Annotation , Promoter Regions, Genetic/genetics , Pupa/genetics , Pupa/growth & development , RNA, Untranslated/genetics , Sequence Analysis, RNA
10.
Am J Med Genet C Semin Med Genet ; 166C(1): 93-104, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24634402

ABSTRACT

Genome-wide association studies, DNA sequencing studies, and other genomic studies are finding an increasing number of genetic variants associated with clinical phenotypes that may be useful in developing diagnostic, preventive, and treatment strategies for individual patients. However, few variants have been integrated into routine clinical practice. The reasons for this are several, but two of the most significant are limited evidence about the clinical implications of the variants and a lack of a comprehensive knowledge base that captures genetic variants, their phenotypic associations, and other pertinent phenotypic information that is openly accessible to clinical groups attempting to interpret sequencing data. As the field of medicine begins to incorporate genome-scale analysis into clinical care, approaches need to be developed for collecting and characterizing data on the clinical implications of variants, developing consensus on their actionability, and making this information available for clinical use. The National Human Genome Research Institute (NHGRI) and the Wellcome Trust thus convened a workshop to consider the processes and resources needed to: (1) identify clinically valid genetic variants; (2) decide whether they are actionable and what the action should be; and (3) provide this information for clinical use. This commentary outlines the key discussion points and recommendations from the workshop.


Subject(s)
Genetic Variation/genetics , Medical Informatics/methods , Phenotype , Precision Medicine/methods , Education , Humans , Information Dissemination/methods , National Human Genome Research Institute (U.S.) , Precision Medicine/trends , United States
11.
Nucleic Acids Res ; 42(Database issue): D749-55, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24316576

ABSTRACT

Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training.


Subject(s)
Databases, Genetic , Genomics , Animals , Chordata/genetics , Genetic Variation , Humans , Internet , Mice , Molecular Sequence Annotation , Phenotype , Rats
12.
Nat Methods ; 10(12): 1185-91, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24185836

ABSTRACT

High-throughput RNA sequencing is an increasingly accessible method for studying gene structure and activity on a genome-wide scale. A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. To assess the performance of current mapping software, we invited developers of RNA-seq aligners to process four large human and mouse RNA-seq data sets. In total, we compared 26 mapping protocols based on 11 programs and pipelines and found major performance differences between methods on numerous benchmarks, including alignment yield, basewise accuracy, mismatch and gap placement, exon junction discovery and suitability of alignments for transcript reconstruction. We observed concordant results on real and simulated RNA-seq data, confirming the relevance of the metrics employed. Future developments in RNA-seq alignment methods would benefit from improved placement of multimapped reads, balanced utilization of existing gene annotation and a reduced false discovery rate for splice junctions.


Subject(s)
RNA Splicing , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Animals , Chromosome Mapping/methods , Computational Biology/methods , Exons , False Positive Reactions , High-Throughput Nucleotide Sequencing/methods , Humans , K562 Cells , Mice , RNA, Messenger/metabolism , Reproducibility of Results , Software
13.
Nat Methods ; 10(12): 1177-84, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24185837

ABSTRACT

We evaluated 25 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data.


Subject(s)
Computational Biology/methods , RNA Splicing , Sequence Analysis, RNA/methods , Algorithms , Animals , Caenorhabditis elegans , Drosophila melanogaster , Exons , Gene Expression Profiling , Genome , Humans , Introns , RNA Splice Sites , RNA, Messenger/metabolism , Software
14.
PLoS One ; 8(7): e69853, 2013.
Article in English | MEDLINE | ID: mdl-23922824

ABSTRACT

BACKGROUND: DNase I is an enzyme which cuts duplex DNA at a rate that depends strongly upon its chromatin environment. In combination with high-throughput sequencing (HTS) technology, it can be used to infer genome-wide landscapes of open chromatin regions. Using this technology, systematic identification of hundreds of thousands of DNase I hypersensitive sites (DHS) per cell type has been possible, and this in turn has helped to precisely delineate genomic regulatory compartments. However, to date there has been relatively little investigation into possible biases affecting this data. RESULTS: We report a significant degree of sequence preference spanning sites cut by DNase I in a number of published data sets. The two major protocols in current use each show a different pattern, but for a given protocol the pattern of sequence specificity seems to be quite consistent. The patterns are substantially different from biases seen in other types of HTS data sets, and in some cases the most constrained position lies outside the sequenced fragment, implying that this constraint must relate to the digestion process rather than events occurring during library preparation or sequencing. CONCLUSIONS: DNase I is a sequence-specific enzyme, with a specificity that may depend on experimental conditions. This sequence specificity is not taken into account by existing pipelines for identifying open chromatin regions. Care must be taken when interpreting DNase I results, especially when looking at the precise locations of the reads. Future studies may be able to improve the sensitivity and precision of chromatin state measurement by compensating for sequence bias.


Subject(s)
Chromatin/metabolism , Databases, Nucleic Acid , Deoxyribonuclease I/metabolism , Base Sequence , Bias , Cell Line , Humans , Molecular Sequence Data , Nucleotide Motifs/genetics , Substrate Specificity
15.
Nucleic Acids Res ; 41(Database issue): D48-55, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203987

ABSTRACT

The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidence-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces.


Subject(s)
Databases, Genetic , Genomics , Animals , Gene Expression Regulation , Genetic Variation , Humans , Internet , Mice , Molecular Sequence Annotation , Rats , Software , Zebrafish/genetics
16.
Genome Biol ; 13(9): R51, 2012 Sep 26.
Article in English | MEDLINE | ID: mdl-22951037

ABSTRACT

BACKGROUND: Pseudogenes have long been considered as nonfunctional genomic sequences. However, recent evidence suggests that many of them might have some form of biological activity, and the possibility of functionality has increased interest in their accurate annotation and integration with functional genomics data. RESULTS: As part of the GENCODE annotation of the human genome, we present the first genome-wide pseudogene assignment for protein-coding genes, based on both large-scale manual annotation and in silico pipelines. A key aspect of this coupled approach is that it allows us to identify pseudogenes in an unbiased fashion as well as untangle complex events through manual evaluation. We integrate the pseudogene annotations with the extensive ENCODE functional genomics information. In particular, we determine the expression level, transcription-factor and RNA polymerase II binding, and chromatin marks associated with each pseudogene. Based on their distribution, we develop simple statistical models for each type of activity, which we validate with large-scale RT-PCR-Seq experiments. Finally, we compare our pseudogenes with conservation and variation data from primate alignments and the 1000 Genomes project, producing lists of pseudogenes potentially under selection. CONCLUSIONS: At one extreme, some pseudogenes possess conventional characteristics of functionality; these may represent genes that have recently died. On the other hand, we find interesting patterns of partial activity, which may suggest that dead genes are being resurrected as functioning non-coding RNAs. The activity data of each pseudogene are stored in an associated resource, psiDR, which will be useful for the initial identification of potentially functional pseudogenes.


Subject(s)
Genome, Human , Pseudogenes , Transcription, Genetic , Animals , Binding Sites , Chromatin/chemistry , Chromatin/genetics , Humans , Models, Genetic , Models, Statistical , Molecular Sequence Annotation , Phylogeny , Primates , RNA Polymerase II/metabolism , Regulatory Sequences, Nucleic Acid , Selection, Genetic , Sequence Analysis, DNA , Transcription Factors/metabolism
17.
Genome Res ; 22(9): 1698-710, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955982

ABSTRACT

Within the ENCODE Consortium, GENCODE aimed to accurately annotate all protein-coding genes, pseudogenes, and noncoding transcribed loci in the human genome through manual curation and computational methods. Annotated transcript structures were assessed, and less well-supported loci were systematically, experimentally validated. Predicted exon-exon junctions were evaluated by RT-PCR amplification followed by highly multiplexed sequencing readout, a method we called RT-PCR-seq. Seventy-nine percent of all assessed junctions are confirmed by this evaluation procedure, demonstrating the high quality of the GENCODE gene set. RT-PCR-seq was also efficient to screen gene models predicted using the Human Body Map (HBM) RNA-seq data. We validated 73% of these predictions, thus confirming 1168 novel genes, mostly noncoding, which will further complement the GENCODE annotation. Our novel experimental validation pipeline is extremely sensitive, far more than unbiased transcriptome profiling through RNA sequencing, which is becoming the norm. For example, exon-exon junctions unique to GENCODE annotated transcripts are five times more likely to be corroborated with our targeted approach than with extensive large human transcriptome profiling. Data sets such as the HBM and ENCODE RNA-seq data fail sampling of low-expressed transcripts. Our RT-PCR-seq targeted approach also has the advantage of identifying novel exons of known genes, as we discovered unannotated exons in ~11% of assessed introns. We thus estimate that at least 18% of known loci have yet-unannotated exons. Our work demonstrates that the cataloging of all of the genic elements encoded in the human genome will necessitate a coordinated effort between unbiased and targeted approaches, like RNA-seq and RT-PCR-seq.


Subject(s)
Gene Expression Profiling/methods , Genome, Human , Transcriptome , Computational Biology/methods , Exons , High-Throughput Nucleotide Sequencing , Humans , Introns , Molecular Sequence Annotation , Open Reading Frames , RNA Isoforms , RNA, Messenger/chemistry , RNA, Messenger/genetics , Reproducibility of Results , Reverse Transcriptase Polymerase Chain Reaction , Sensitivity and Specificity
18.
Genome Res ; 22(9): 1760-74, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955987

ABSTRACT

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.


Subject(s)
Databases, Genetic , Genome, Human , Genomics/methods , Molecular Sequence Annotation , Animals , Computational Biology/methods , DNA, Complementary/chemistry , DNA, Complementary/genetics , Evolution, Molecular , Exons , Genetic Loci , Humans , Internet , Models, Molecular , Open Reading Frames , Pseudogenes , Quality Control , RNA Splice Sites , RNA, Long Noncoding , Reproducibility of Results , Untranslated Regions
19.
Genome Res ; 22(9): 1775-89, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955988

ABSTRACT

The human genome contains many thousands of long noncoding RNAs (lncRNAs). While several studies have demonstrated compelling biological and disease roles for individual examples, analytical and experimental approaches to investigate these genes have been hampered by the lack of comprehensive lncRNA annotation. Here, we present and analyze the most complete human lncRNA annotation to date, produced by the GENCODE consortium within the framework of the ENCODE project and comprising 9277 manually annotated genes producing 14,880 transcripts. Our analyses indicate that lncRNAs are generated through pathways similar to that of protein-coding genes, with similar histone-modification profiles, splicing signals, and exon/intron lengths. In contrast to protein-coding genes, however, lncRNAs display a striking bias toward two-exon transcripts, they are predominantly localized in the chromatin and nucleus, and a fraction appear to be preferentially processed into small RNAs. They are under stronger selective pressure than neutrally evolving sequences, particularly in their promoter regions, which display levels of selection comparable to protein-coding genes. Importantly, about one-third seem to have arisen within the primate lineage. Comprehensive analysis of their expression in multiple human organs and brain regions shows that lncRNAs are generally lower expressed than protein-coding genes, and display more tissue-specific expression patterns, with a large fraction of tissue-specific lncRNAs expressed in the brain. Expression correlation analysis indicates that lncRNAs show particularly striking positive correlation with the expression of antisense coding genes. This GENCODE annotation represents a valuable resource for future studies of lncRNAs.


Subject(s)
Databases, Genetic , RNA, Long Noncoding/genetics , Alternative Splicing , Animals , Cell Nucleus/genetics , Cell Nucleus/metabolism , Cluster Analysis , Evolution, Molecular , Exons , Gene Expression Profiling , Gene Expression Regulation , Histones/metabolism , Humans , Molecular Sequence Annotation , Open Reading Frames , Organ Specificity/genetics , Primates/genetics , RNA Processing, Post-Transcriptional , RNA Splice Sites , RNA, Messenger/genetics , Selection, Genetic , Transcription, Genetic
20.
Nucleic Acids Res ; 40(Database issue): D84-90, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22086963

ABSTRACT

The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.


Subject(s)
Databases, Genetic , Genomics , Animals , Gene Expression Regulation , Genetic Variation , Humans , Mice , Molecular Sequence Annotation , Rats