Search | BVS Regional Portal

WormBase 2024: status and transitioning to Alliance infrastructure.

Sternberg, Paul W; Van Auken, Kimberly; Wang, Qinghua; Wright, Adam; Yook, Karen; Zarowiecki, Magdalena; Arnaboldi, Valerio; Becerra, Andrés; Brown, Stephanie; Cain, Scott; Chan, Juancarlos; Chen, Wen J; Cho, Jaehyoung; Davis, Paul; Diamantakis, Stavros; Dyer, Sarah; Grigoriadis, Dionysis; Grove, Christian A; Harris, Todd; Howe, Kevin; Kishore, Ranjana; Lee, Raymond; Longden, Ian; Luypaert, Manuel; Müller, Hans-Michael; Nuin, Paulo; Quinton-Tulloch, Mark; Raciti, Daniela; Schedl, Tim; Schindelman, Gary; Stein, Lincoln.

Genetics ; 227(1), 2024 05 07.

Article in English | MEDLINE | ID: mdl-38573366

ABSTRACT

WormBase has been the major repository and knowledgebase of information about the genome and genetics of Caenorhabditis elegans and other nematodes of experimental interest for over 2 decades. We have 3 goals: to keep current with the fast-paced C. elegans research, to provide better integration with other resources, and to be sustainable. Here, we discuss the current state of WormBase as well as progress and plans for moving core WormBase infrastructure to the Alliance of Genome Resources (the Alliance). As an Alliance member, WormBase will continue to interact with the C. elegans community, develop new features as needed, and curate key information from the literature and large-scale projects.

Subjects

Caenorhabditis elegans; Caenorhabditis elegans/genetics; Animals; Databases, Genetic; Genome, Helminth; Genomics/methods

WormBase single-cell tools.

da Veiga Beltrame, Eduardo; Arnaboldi, Valerio; Sternberg, Paul W.

Bioinform Adv ; 2(1): vbac018, 2022.

Article in English | MEDLINE | ID: mdl-35814290

ABSTRACT

We present two web apps for interactively performing common tasks with single-cell RNA sequencing data: scdefg for differential expression and wormcells-viz for visualization of gene expression. We deployed these tools with public Caenorhabditis elegans datasets curated by WormBase at https://single-cell.wormbase.org. Source code for deploying these tools with other datasets is available at https://github.com/WormBase/scdefg and https://github.com/WormBase/wormcells-viz. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

Accelerated variant curation from scientific literature using biomedical text mining.

Mallick, Rishab; Arnaboldi, Valerio; Davis, Paul; Diamantakis, Stavros; Zarowiecki, Magdalena; Howe, Kevin.

MicroPubl Biol ; 2022, 2022.

Article in English | MEDLINE | ID: mdl-35663412

ABSTRACT

Biological databases collect and standardize data through biocuration. Even though major model organism databases have adopted some automation of curation methods, a large portion of biocuration is still performed manually. To speed up the extraction of the genomic positions of variants, we have developed a hybrid approach that combines regular expressions, Named Entity Recognition based on BERT (Bidirectional Encoder Representations from Transformers) and bag-of-words to extract variant genomic locations from C. elegans papers for WormBase. Our model has a precision of 82.59% for the gene-mutation matches tested on extracted text from 100 papers, and even recovers some data not discovered during manual curation. Code at: https://github.com/WormBase/genomic-info-from-papers.
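The regular-expression component mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline (their code is at the repository linked above): the gene-name and mutation patterns, the function names, and the example sentence are all simplified assumptions made for illustration. The BERT-based Named Entity Recognition and bag-of-words components are not shown. The `precision` helper shows how a figure such as the reported precision would be computed from predicted and curated gene-mutation pairs.

```python
import re

# Hypothetical, simplified patterns (assumptions, not taken from the paper):
# a C. elegans-style gene name such as "unc-22" and a protein change such as "G12D".
GENE_RE = re.compile(r"\b[a-z]{3,4}-\d+\b")
MUTATION_RE = re.compile(r"\b([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])\b")


def extract_gene_mutation_pairs(sentence):
    """Pair every gene name with every mutation found in the same sentence."""
    genes = GENE_RE.findall(sentence)
    mutations = ["".join(m) for m in MUTATION_RE.findall(sentence)]
    return [(g, m) for g in genes for m in mutations]


def precision(predicted, correct):
    """Fraction of predicted pairs that also appear in the curated set."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(correct)) / len(predicted)


pairs = extract_gene_mutation_pairs("The unc-22 allele carries a G12D substitution.")
```

Pairing every gene with every mutation in a sentence is deliberately naive; it over-generates matches, which is one reason a hybrid approach with additional filtering is attractive.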

WormBase in 2022: data, processes, and tools for analyzing Caenorhabditis elegans.

Davis, Paul; Zarowiecki, Magdalena; Arnaboldi, Valerio; Becerra, Andrés; Cain, Scott; Chan, Juancarlos; Chen, Wen J; Cho, Jaehyoung; da Veiga Beltrame, Eduardo; Diamantakis, Stavros; Gao, Sibyl; Grigoriadis, Dionysis; Grove, Christian A; Harris, Todd W; Kishore, Ranjana; Le, Tuan; Lee, Raymond Y N; Luypaert, Manuel; Müller, Hans-Michael; Nakamura, Cecilia; Nuin, Paulo; Paulini, Michael; Quinton-Tulloch, Mark; Raciti, Daniela; Rodgers, Faye H; Russell, Matthew; Schindelman, Gary; Singh, Archana; Stickland, Tim; Van Auken, Kimberly; Wang, Qinghua; Williams, Gary; Wright, Adam J; Yook, Karen; Berriman, Matt; Howe, Kevin L; Schedl, Tim; Stein, Lincoln; Sternberg, Paul W.

Genetics ; 220(4), 2022 04 04.

Article in English | MEDLINE | ID: mdl-35134929

ABSTRACT

WormBase (www.wormbase.org) is the central repository for the genetics and genomics of the nematode Caenorhabditis elegans. We provide the research community with data and tools to facilitate the use of C. elegans and related nematodes as model organisms for studying human health, development, and many aspects of fundamental biology. Throughout our 22-year history, we have continued to evolve to reflect progress and innovation in the science and technologies involved in the study of C. elegans. We strive to incorporate new data types and richer data sets, and to provide integrated displays and services that avail the knowledge generated by the published nematode genetics literature. Here, we provide a broad overview of the current state of WormBase in terms of data type, curation workflows, analysis, and tools, including exciting new advances for analysis of single-cell data, text mining and visualization, and the new community collaboration forum. Concurrently, we continue the integration and harmonization of infrastructure, processes, and tools with the Alliance of Genome Resources, of which WormBase is a founding member.

Subjects

Caenorhabditis; Nematoda; Animals; Caenorhabditis/genetics; Caenorhabditis elegans/genetics; Databases, Genetic; Genome; Genomics; Humans; Nematoda/genetics

Wormicloud: a new text summarization tool based on word clouds to explore the C. elegans literature.

Arnaboldi, Valerio; Cho, Jaehyoung; Sternberg, Paul W.

Database (Oxford) ; 2021, 2021 03 31.

Article in English | MEDLINE | ID: mdl-33787871

ABSTRACT

Finding relevant information from newly published scientific papers is becoming increasingly difficult due to the pace at which articles are published every year as well as the increasing amount of information per paper. Biocuration and model organism databases provide a map for researchers to navigate through the complex structure of the biomedical literature by distilling knowledge into curated and standardized information. In addition, scientific search engines such as PubMed and text-mining tools such as Textpresso allow researchers to easily search for specific biological aspects from newly published papers, facilitating knowledge transfer. However, digesting the information returned by these systems (often a large number of documents) still requires considerable effort. In this paper, we present Wormicloud, a new tool that summarizes scientific articles in a graphical way through word clouds. This tool is aimed at facilitating the discovery of new experimental results not yet curated by model organism databases and is designed for both researchers and biocurators. Wormicloud is customized for the Caenorhabditis elegans literature and provides several advantages over existing solutions, including being able to perform full-text searches through Textpresso, which provides more accurate results than other existing literature search engines. Wormicloud is integrated through direct links from gene interaction pages in WormBase. Additionally, it allows analysis on the gene sets obtained from literature searches with other WormBase tools such as SimpleMine and Gene Set Enrichment. Database URL: https://wormicloud.textpressolab.com.

Subjects

Caenorhabditis elegans; Data Mining; Animals; Caenorhabditis elegans/genetics; PubMed; Publications; Search Engine
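The idea behind a word-cloud summary, as described in the Wormicloud abstract above, can be sketched as a word-frequency count over the texts returned by a search. This is only an illustration under stated assumptions: the stop-word list, the tokenization rule, and the function name `word_weights` are hypothetical and are not taken from Wormicloud itself, which retrieves its texts through Textpresso.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real tools use larger lists).
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "with", "by"}


def word_weights(abstracts, top_n=10):
    """Count non-stop-words across texts; a cloud would scale word size by count."""
    counts = Counter()
    for text in abstracts:
        # Lowercase tokens; keep hyphens and digits so gene names like "daf-16" survive.
        tokens = re.findall(r"[a-z][a-z0-9-]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)


weights = word_weights(
    [
        "daf-16 regulates lifespan in the worm",
        "Loss of daf-16 shortens lifespan",
    ],
    top_n=2,
)
```

Keeping hyphenated tokens intact matters for this literature, since splitting on hyphens would break most C. elegans gene names.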

Automated generation of gene summaries at the Alliance of Genome Resources.

Kishore, Ranjana; Arnaboldi, Valerio; Van Slyke, Ceri E; Chan, Juancarlos; Nash, Robert S; Urbano, Jose M; Dolan, Mary E; Engel, Stacia R; Shimoyama, Mary; Sternberg, Paul W; The Alliance of Genome Resources.

Database (Oxford) ; 2020, 2020 01 01.

Article in English | MEDLINE | ID: mdl-32559296

ABSTRACT

Short paragraphs that describe gene function, referred to as gene summaries, are valued by users of biological knowledgebases for the ease with which they convey key aspects of gene function. Manual curation of gene summaries, while desirable, is difficult for knowledgebases to sustain. We developed an algorithm that uses curated, structured gene data at the Alliance of Genome Resources (Alliance; www.alliancegenome.org) to automatically generate gene summaries that simulate natural language. The gene data used for this purpose include curated associations (annotations) to ontology terms from the Gene Ontology, Disease Ontology, model organism knowledgebase (MOK)-specific anatomy ontologies and Alliance orthology data. The method uses sentence templates for each data category included in the gene summary in order to build a natural language sentence from the list of terms associated with each gene. To improve readability of the summaries when numerous gene annotations are present, we developed a new algorithm that traverses ontology graphs in order to group terms by their common ancestors. The algorithm optimizes the coverage of the initial set of terms and limits the length of the final summary, using measures of information content of each ontology term as a criterion for inclusion in the summary. The automated gene summaries are generated with each Alliance release, ensuring that they reflect current data at the Alliance. Our method effectively leverages category-specific curation efforts of the Alliance member databases to create modular, structured and standardized gene summaries for seven member species of the Alliance. These automatically generated gene summaries make cross-species gene function comparisons tenable and increase discoverability of potential models of human disease. In addition to being displayed on Alliance gene pages, these summaries are also included on several MOK gene pages.

Subjects

Databases, Genetic; Genomics; Molecular Sequence Annotation/methods; Gene Ontology; Information Storage and Retrieval
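The two ideas in the gene-summary abstract above, sentence templates and grouping terms under a common ancestor to shorten long lists, can be sketched roughly as follows. This is a toy illustration, not the Alliance algorithm: the ontology, the template text, the gene name, and the function names are hypothetical, each term has a single parent, and the information-content weighting described in the abstract is omitted.

```python
# Toy ontology: child term -> parent term (hypothetical, for illustration only).
PARENT = {
    "pharyngeal muscles": "muscles",
    "body wall muscles": "muscles",
    "muscles": "cells",
    "neurons": "cells",
}

# Hypothetical sentence template for one data category (expression).
TEMPLATE = "{gene} is expressed in {terms}."


def group_by_parent(terms, max_terms):
    """Replace sibling terms with their shared parent until the list is short enough."""
    terms = list(dict.fromkeys(terms))  # de-duplicate while keeping order
    while len(terms) > max_terms:
        # Collect, for each parent, the current terms that are its children.
        by_parent = {}
        for term in terms:
            if term in PARENT:
                by_parent.setdefault(PARENT[term], []).append(term)
        # Pick the parent that merges the most terms; stop if nothing can be merged.
        parent, children = max(
            by_parent.items(), key=lambda kv: len(kv[1]), default=(None, [])
        )
        if len(children) < 2:
            break
        terms = [t for t in terms if t not in children]
        if parent not in terms:
            terms.append(parent)
    return terms


def join_terms(terms):
    """Join terms as 'a, b and c'."""
    if len(terms) <= 1:
        return terms[0] if terms else ""
    return ", ".join(terms[:-1]) + " and " + terms[-1]


def summarize(gene, terms, max_terms=2):
    return TEMPLATE.format(gene=gene, terms=join_terms(group_by_parent(terms, max_terms)))


summary = summarize("gene-1", ["pharyngeal muscles", "body wall muscles", "neurons"])
```

A real ontology is a graph in which terms can have several parents, so the grouping step there has to weigh alternative ancestors against each other; the abstract indicates that information content is used for that purpose.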

Text mining meets community curation: a newly designed curation platform to improve author experience and participation at WormBase.

Arnaboldi, Valerio; Raciti, Daniela; Van Auken, Kimberly; Chan, Juancarlos N; Müller, Hans-Michael; Sternberg, Paul W.

Database (Oxford) ; 2020, 2020 01 01.

Article in English | MEDLINE | ID: mdl-32185395

ABSTRACT

Biological knowledgebases rely on expert biocuration of the research literature to maintain up-to-date collections of data organized in machine-readable form. To enter information into knowledgebases, curators need to follow three steps: (i) identify papers containing relevant data, a process called triaging; (ii) recognize named entities; and (iii) extract and curate data in accordance with the underlying data models. WormBase (WB), the authoritative repository for research data on Caenorhabditis elegans and other nematodes, uses text mining (TM) to semi-automate its curation pipeline. In addition, WB engages its community, via an Author First Pass (AFP) system, to help recognize entities and classify data types in their recently published papers. In this paper, we present a new WB AFP system that combines TM and AFP into a single application to enhance community curation. The system employs string-searching algorithms and statistical methods (e.g. support vector machines (SVMs)) to extract biological entities and classify data types, and it presents the results to authors in a web form where they validate the extracted information, rather than enter it de novo as the previous form required. With this new system, we lessen the burden for authors, while at the same time receive valuable feedback on the performance of our TM tools. The new user interface also links out to specific structured data submission forms, e.g. for phenotype or expression pattern data, giving the authors the opportunity to contribute a more detailed curation that can be incorporated into WB with minimal curator review. Our approach is generalizable and could be applied to additional knowledgebases that would like to engage their user community in assisting with the curation. In the five months succeeding the launch of the new system, the response rate has been comparable with that of the previous AFP version, but the quality and quantity of the data received has greatly improved.

Subjects

Caenorhabditis elegans/genetics; Data Curation/methods; Data Mining/methods; Databases, Factual; Knowledge Bases; Animals; Caenorhabditis elegans/metabolism; Internet; Support Vector Machine; User-Computer Interface
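The string-searching step described in the Author First Pass abstract above, in which entities are extracted from a paper and then shown to authors for validation, might look roughly like the sketch below. The entity lists, the function name, the `min_mentions` threshold, and the example text are assumptions made for illustration; the SVM-based data-type classification is not shown, and the actual WormBase implementation may differ.

```python
import re

# Hypothetical entity lists (assumptions); a real system would load curated names.
KNOWN_ENTITIES = {
    "gene": ["daf-2", "daf-16", "unc-22"],
    "strain": ["N2", "CB4856"],
}


def extract_entities(text, min_mentions=1):
    """Return, per entity type, the names mentioned at least `min_mentions` times."""
    found = {}
    for entity_type, names in KNOWN_ENTITIES.items():
        hits = []
        for name in names:
            # Match whole names only, so "daf-1" is not found inside "daf-16".
            pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
            if len(re.findall(pattern, text)) >= min_mentions:
                hits.append(name)
        found[entity_type] = hits
    return found


paper_text = (
    "We crossed daf-2 mutants into N2 and scored daf-16 localization; "
    "daf-2 animals lived longer."
)
suggestions = extract_entities(paper_text, min_mentions=1)
```

Raising `min_mentions` is one simple way to trade recall for precision before the list is presented to authors, who then confirm or correct it rather than entering entities from scratch.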

WormBase: a modern Model Organism Information Resource.

Harris, Todd W; Arnaboldi, Valerio; Cain, Scott; Chan, Juancarlos; Chen, Wen J; Cho, Jaehyoung; Davis, Paul; Gao, Sibyl; Grove, Christian A; Kishore, Ranjana; Lee, Raymond Y N; Muller, Hans-Michael; Nakamura, Cecilia; Nuin, Paulo; Paulini, Michael; Raciti, Daniela; Rodgers, Faye H; Russell, Matthew; Schindelman, Gary; Van Auken, Kimberly; Wang, Qinghua; Williams, Gary; Wright, Adam J; Yook, Karen; Howe, Kevin L; Schedl, Tim; Stein, Lincoln; Sternberg, Paul W.

Nucleic Acids Res ; 48(D1): D762-D767, 2020 01 08.

Article in English | MEDLINE | ID: mdl-31642470

ABSTRACT

WormBase (https://wormbase.org/) is a mature Model Organism Information Resource supporting researchers using the nematode Caenorhabditis elegans as a model system for studies across a broad range of basic biological processes. Toward this mission, WormBase efforts are arranged in three primary facets: curation, user interface and architecture. In this update, we describe progress in each of these three areas. In particular, we discuss the status of literature curation and recently added data, detail new features of the web interface and options for users wishing to conduct data mining workflows, and discuss our efforts to build a robust and scalable architecture by leveraging commercial cloud offerings. We conclude with a description of WormBase's role as a founding member of the nascent Alliance of Genome Resources.

Subjects

Caenorhabditis elegans/genetics; Databases, Genetic; Genes, Helminth; Animals; Data Mining; Genomics; Internet; User-Computer Interface

WormBase 2017: molting into a new stage.

Lee, Raymond Y N; Howe, Kevin L; Harris, Todd W; Arnaboldi, Valerio; Cain, Scott; Chan, Juancarlos; Chen, Wen J; Davis, Paul; Gao, Sibyl; Grove, Christian; Kishore, Ranjana; Muller, Hans-Michael; Nakamura, Cecilia; Nuin, Paulo; Paulini, Michael; Raciti, Daniela; Rodgers, Faye; Russell, Matt; Schindelman, Gary; Tuli, Mary Ann; Van Auken, Kimberly; Wang, Qinghua; Williams, Gary; Wright, Adam; Yook, Karen; Berriman, Matthew; Kersey, Paul; Schedl, Tim; Stein, Lincoln; Sternberg, Paul W.

Nucleic Acids Res ; 46(D1): D869-D874, 2018 01 04.

Article in English | MEDLINE | ID: mdl-29069413

ABSTRACT

WormBase (http://www.wormbase.org) is an important knowledge resource for biomedical researchers worldwide. To accommodate the ever increasing amount and complexity of research data, WormBase continues to advance its practices on data acquisition, curation and retrieval to most effectively deliver comprehensive knowledge about Caenorhabditis elegans, and genomic information about other nematodes and parasitic flatworms. Recent notable enhancements include user-directed submission of data, such as micropublication; genomic data curation and presentation, including additional genomes and JBrowse, respectively; new query tools, such as SimpleMine, Gene Enrichment Analysis; new data displays, such as the Person Lineage browser and the Summary of Ontology-based Annotations. Anticipating more rapid data growth ahead, WormBase continues the process of migrating to a cutting-edge database technology to achieve better stability, scalability, reproducibility and a faster response time. To better serve the broader research community, WormBase, with five other Model Organism Databases and The Gene Ontology project, have begun to collaborate formally as the Alliance of Genome Resources.

Subjects

Databases, Genetic; Genome; Nematoda/genetics; Animals; Caenorhabditis/genetics; Caenorhabditis elegans/genetics; Data Curation; Data Mining; Datasets as Topic; Disease Models, Animal; Forecasting; Gene Ontology; Humans; Information Storage and Retrieval; Platyhelminths/genetics; Publishing; RNA Interference; Sequence Alignment; User-Computer Interface; Web Browser
