1.
Nat Commun ; 12(1): 892, 2021 02 09.
Article in English | MEDLINE | ID: mdl-33563972

ABSTRACT

Given their copy number differences and unique modes of inheritance, the evolved gene content and expression of sex chromosomes are unusual. In many organisms the X and Y chromosomes are inactivated in spermatocytes, possibly as a defense mechanism against insertions into unpaired chromatin. In addition to current sex chromosomes, Drosophila has a small gene-poor X-chromosome relic (4th) that re-acquired autosomal status. Here we use single-cell RNA-Seq on fly larvae to demonstrate that the single X and pair of 4th chromosomes are specifically inactivated in primary spermatocytes, based on measuring all genes or a set of broadly expressed genes in testis we identified. In contrast, genes on the single Y chromosome become maximally active in primary spermatocytes. Reduced X transcript levels are due to failed activation of RNA-Polymerase-II by phosphorylation of Serine 2 and 5.


Subject(s)
Drosophila/genetics , Sex Chromosomes/genetics , Spermatocytes/metabolism , Animals , Drosophila/growth & development , Gene Expression Regulation , Genes, X-Linked/genetics , Genes, Y-Linked/genetics , Larva/genetics , Larva/growth & development , Male , Organ Specificity , RNA Polymerase II/metabolism , Sex Chromosomes/metabolism , Spermatogenesis/genetics , Testis/cytology , Testis/metabolism , Transcription, Genetic
2.
Proc Int Conf Comput Ling ; 2020: 5640-5646, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33293900

ABSTRACT

Recent work has shown that pre-trained Transformers obtain remarkable performance on many natural language processing tasks, including automatic summarization. However, most work has focused on (relatively) data-rich single-document summarization settings. In this paper, we explore highly-abstractive multi-document summarization, where the summary is explicitly conditioned on a user-given topic statement or question. We compare the summarization quality produced by three state-of-the-art transformer-based models: BART, T5, and PEGASUS. We report the performance on four challenging summarization datasets: three from the general domain and one from consumer health in both zero-shot and few-shot learning settings. While prior work has shown significant differences in performance for these models on standard summarization tasks, our results indicate that with as few as 10 labeled examples, there is no statistically significant difference in summary quality, suggesting the need for more abstractive benchmark collections when determining state-of-the-art.

3.
Proc Conf Empir Methods Nat Lang Process ; 2020: 3215-3226, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33364629

ABSTRACT

Automatic summarization research has traditionally focused on providing high quality general-purpose summaries of documents. However, there are many applications that require more specific summaries, such as supporting question answering or topic-based literature discovery. In this paper, we study the problem of conditional summarization in which content selection and surface realization are explicitly conditioned on an ad-hoc natural language question or topic description. Because of the difficulty in obtaining sufficient reference summaries to support arbitrary conditional summarization, we explore the use of multi-task fine-tuning (MTFT) on twenty-one natural language tasks to enable zero-shot conditional summarization on five tasks. We present four new summarization datasets, two novel "online" or adaptive task-mixing strategies, and report zero-shot performance using T5 and BART, demonstrating that MTFT can improve zero-shot summarization quality.

4.
Sci Data ; 7(1): 322, 2020 10 02.
Article in English | MEDLINE | ID: mdl-33009402

ABSTRACT

Automatic summarization of natural language is a widely studied area in computer science, one that is broadly applicable to anyone who needs to understand large quantities of information. In the medical domain, automatic summarization has the potential to make health information more accessible to people without medical expertise. However, to evaluate the quality of summaries generated by summarization algorithms, researchers first require gold-standard, human-generated summaries. Unfortunately, there is no available data for assessing summaries that help consumers of health information answer their questions. To address this issue, we present the MEDIQA-Answer Summarization dataset, the first dataset designed for question-driven, consumer-focused summarization. It contains 156 health questions asked by consumers, answers to these questions, and manually generated summaries of these answers. The dataset's unique structure allows it to be used for at least eight different types of summarization evaluations. We also benchmark the performance of baseline and state-of-the-art deep learning approaches on the dataset, demonstrating how it can be used to evaluate automatically generated summaries.


Subject(s)
Consumer Health Informatics , Information Storage and Retrieval , Natural Language Processing
5.
AMIA Jt Summits Transl Sci Proc ; 2020: 561-568, 2020.
Article in English | MEDLINE | ID: mdl-32477678

ABSTRACT

Chemical entity recognition is essential for indexing scientific literature in the MEDLINE database at the National Library of Medicine. However, the tool currently used to suggest terms for indexing, the Medical Text Indexer, was not originally conceived as a chemical recognition tool. It has instead been adapted to the task via its use of MetaMap and the addition of in-house patterns and rules. In order to develop a tool more suitable for chemical recognition, we have created a collection of 200 MEDLINE titles and abstracts annotated with genes, proteins, inorganic and organic chemicals, as well as other biological molecules. We use this collection to evaluate eleven chemical entity recognition systems, where we seek to identify a tool that effectively recognizes chemical entities for indexing and also performs well on chemical recognition beyond the indexing task. We observe the highest performance with a SciBERT ensemble.

6.
AMIA Annu Symp Proc ; 2019: 727-734, 2019.
Article in English | MEDLINE | ID: mdl-32308868

ABSTRACT

MEDLINE is the National Library of Medicine's premier bibliographic database for biomedical literature. A highly valuable feature of the database is that each record is manually indexed with a controlled vocabulary called MeSH. Most MEDLINE journals are indexed cover-to-cover, but there are about 200 selectively indexed journals for which only articles related to biomedicine and life sciences are indexed. In recent years, the selection process has become an increasing burden for indexing staff, and this paper presents a machine-learning-based system that offers very significant time savings by semi-automating the task. At the core of the system is a high-recall classifier for the identification of journal articles that are in-scope for MEDLINE. The system is shown to reduce the number of articles requiring manual review by 54%, equivalent to approximately 40,000 articles per year.


Subject(s)
Abstracting and Indexing , MEDLINE , Machine Learning , Neural Networks, Computer , Medical Subject Headings , National Library of Medicine (U.S.) , United States