Results 1 - 20 of 34
1.
BMC Bioinformatics ; 24(1): 159, 2023 Apr 20.
Article in English | MEDLINE | ID: mdl-37081398

ABSTRACT

BACKGROUND: Biomedical researchers are strongly encouraged to make their research outputs more Findable, Accessible, Interoperable, and Reusable (FAIR). While many biomedical research outputs are more readily accessible through open data efforts, finding relevant outputs remains a significant challenge. Schema.org is a metadata vocabulary standardization project that enables web content creators to make their content more FAIR. Leveraging Schema.org could benefit biomedical research resource providers, but it can be challenging to apply Schema.org standards to biomedical research outputs. We created an online browser-based tool that empowers researchers and repository developers to utilize Schema.org or other biomedical schema projects. RESULTS: Our browser-based tool includes features that help address many of the barriers to Schema.org compliance, such as the ability to easily browse for relevant Schema.org classes, the ability to extend and customize a class to be more suitable for biomedical research outputs, the ability to create data validation to ensure adherence of a research output to a customized class, and the ability to register a custom class to our schema registry, enabling others to search and re-use it. We demonstrate the use of our tool with the creation of the Outbreak.info schema, a large multi-class schema for harmonizing various COVID-19 related resources. CONCLUSIONS: We have created a browser-based tool to empower biomedical research resource providers to leverage Schema.org classes to make their research outputs more FAIR.
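
The idea of extending a Schema.org class and validating records against it can be illustrated with a minimal sketch. This is not the tool described in the abstract: the `CUSTOM_DATASET` structure, the required/optional split, and the `validate` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical extension of the Schema.org "Dataset" class. The property
# names exist in Schema.org, but which ones are required is an assumption.
CUSTOM_DATASET = {
    "extends": "schema:Dataset",
    "required": ["name", "description", "identifier"],
    "optional": ["license", "measurementTechnique"],
}


def validate(record, schema_class):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for prop in schema_class["required"]:
        if not record.get(prop):
            problems.append(f"missing required property: {prop}")
    allowed = set(schema_class["required"]) | set(schema_class["optional"])
    allowed |= {"@context", "@type"}  # JSON-LD keywords are always allowed
    for prop in record:
        if prop not in allowed:
            problems.append(f"unexpected property: {prop}")
    return problems


# Illustrative record, not a real dataset.
record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example COVID-19 case counts",
    "description": "Illustrative record, not a real dataset.",
}
problems = validate(record, CUSTOM_DATASET)
print(problems)
```

A production validator would more likely express the customized class as a JSON Schema document and use an existing validation library, rather than a hand-written check like this one.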


Subjects
Biomedical Research , COVID-19 , Humans , Metadata
2.
Nucleic Acids Res ; 51(W1): W350-W356, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37070209

ABSTRACT

Gene definitions and identifiers can be painful to manage, more so when trying to include gene function annotations, as these can be highly context-dependent. Creating groups of genes or gene sets can help provide such context, but it compounds the issue as each gene within the gene set can map to multiple identifiers and have annotations derived from multiple sources. We developed MyGeneset.info to provide an API for integrated annotations for gene sets suitable for use in analytical pipelines or web servers. Leveraging our previous work with MyGene.info (a server that provides gene-centric annotations and identifiers), MyGeneset.info addresses the challenge of managing gene sets from multiple resources. With our API, users readily have read-only access to gene sets imported from commonly used resources such as Wikipathways, CTD, Reactome, SMPDB, MSigDB, GO, and DO. In addition to supporting the access and reuse of approximately 180k gene sets from humans, common model organisms (mice, yeast, etc.), and less-common ones (e.g. black cottonwood tree), MyGeneset.info supports user-created gene sets, providing an important means for making gene sets more FAIR. User-created gene sets can serve as a way to store and manage collections for analysis or easy dissemination through a consistent API.
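
The identifier problem described above, where each gene in a set can appear under several identifiers, can be sketched as follows. This is a toy illustration, not the MyGeneset.info implementation: `ID_TABLE`, `build_index`, and `normalize_geneset` are hypothetical names, and a real service would use a far larger identifier table.

```python
# Tiny identifier table for two human genes, for illustration only.
ID_TABLE = [
    {"entrez": "1017", "symbol": "CDK2", "ensembl": "ENSG00000123374"},
    {"entrez": "7157", "symbol": "TP53", "ensembl": "ENSG00000141510"},
]


def build_index(table):
    """Map every known identifier (of any type) to its Entrez gene ID."""
    index = {}
    for row in table:
        for value in row.values():
            index[value] = row["entrez"]
    return index


def normalize_geneset(members, index):
    """Resolve mixed identifiers to unique Entrez IDs, keeping input order."""
    resolved, unmapped = [], []
    for member in members:
        entrez = index.get(member)
        if entrez is None:
            unmapped.append(member)
        elif entrez not in resolved:
            resolved.append(entrez)
    return resolved, unmapped


index = build_index(ID_TABLE)
# "FOO" is a deliberately unknown identifier.
resolved, unmapped = normalize_geneset(
    ["CDK2", "ENSG00000141510", "1017", "FOO"], index
)
print(resolved, unmapped)
```

Here `CDK2` and `1017` collapse to one entry, which is the kind of deduplication a gene set needs before annotations from several sources can be attached to it.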


Subjects
Internet , Software , Humans , Animals , Mice , Molecular Sequence Annotation , User-Computer Interface
3.
Sci Data ; 10(1): 99, 2023 02 23.
Article in English | MEDLINE | ID: mdl-36823157

ABSTRACT

Biomedical datasets are increasing in size, stored in many repositories, and face challenges in FAIRness (findability, accessibility, interoperability, reusability). As a Consortium of infectious disease researchers from 15 Centers, we aim to adopt open science practices to promote transparency, encourage reproducibility, and accelerate research advances through data reuse. To improve FAIRness of our datasets and computational tools, we evaluated metadata standards across established biomedical data repositories. The vast majority do not adhere to a single standard, such as Schema.org, which is widely adopted by generalist repositories. Consequently, datasets in these repositories are not findable in aggregation projects like Google Dataset Search. We addressed this gap by creating a reusable metadata schema based on Schema.org and catalogued nearly 400 datasets and computational tools we collected. The approach is easily reusable to create schemas interoperable with community standards, but customized to a particular context. Our approach enabled data discovery, increased the reusability of datasets from a large research consortium, and accelerated research. Lastly, we discuss ongoing challenges with FAIRness beyond discoverability.


Subjects
Communicable Diseases , Datasets as Topic , Metadata , Reproducibility of Results , Datasets as Topic/standards , Humans
5.
Nat Methods ; 20(4): 536-540, 2023 04.
Article in English | MEDLINE | ID: mdl-36823331

ABSTRACT

Outbreak.info Research Library is a standardized, searchable interface of coronavirus disease 2019 (COVID-19) and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) publications, clinical trials, datasets, protocols and other resources, built with a reusable framework. We developed a rigorous schema to enforce consistency across different sources and resource types and linked related resources. Researchers can quickly search the latest research across data repositories, regardless of resource type or repository location, via a search interface, public application programming interface (API) and R package.
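
Enforcing one schema across different sources and resource types can be sketched roughly as below. The `Resource` fields, the two converter functions, and the raw record shapes are assumptions made for illustration, not the Outbreak.info schema or its API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Common record shape; these field names are illustrative assumptions."""
    identifier: str
    resource_type: str  # e.g. "Publication", "ClinicalTrial", "Dataset"
    name: str
    source: str


def from_publication(raw):
    # Hypothetical literature-source record with "pmid" and "title" keys.
    return Resource(f"pmid{raw['pmid']}", "Publication",
                    raw["title"].strip(), "literature")


def from_trial(raw):
    # Hypothetical trial-registry record with "nct_id" and "brief_title" keys.
    return Resource(raw["nct_id"], "ClinicalTrial",
                    raw["brief_title"].strip(), "registry")


def search(resources, term):
    """Case-insensitive name search across all resource types."""
    term = term.lower()
    return [r for r in resources if term in r.name.lower()]


# Made-up records; the identifiers are placeholders.
library = [
    from_publication({"pmid": 1, "title": " Remdesivir in hospitalized adults "}),
    from_trial({"nct_id": "NCT00000000", "brief_title": "Remdesivir dosing study"}),
    from_publication({"pmid": 2, "title": "Mask use in schools"}),
]
hits = search(library, "remdesivir")
print([h.resource_type for h in hits])
```

Once every source is converted to the same record shape, a single search function covers publications and trials alike, which is the practical benefit the abstract attributes to a shared schema.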


Subjects
COVID-19 , Humans , SARS-CoV-2 , Disease Outbreaks
6.
Nat Methods ; 20(4): 512-522, 2023 04.
Article in English | MEDLINE | ID: mdl-36823332

ABSTRACT

In response to the emergence of SARS-CoV-2 variants of concern, the global scientific community, through unprecedented effort, has sequenced and shared over 11 million genomes through GISAID, as of May 2022. This extraordinarily high sampling rate provides a unique opportunity to track the evolution of the virus in near real-time. Here, we present outbreak.info, a platform that currently tracks over 40 million combinations of Pango lineages and individual mutations, across over 7,000 locations, to provide insights for researchers, public health officials and the general public. We describe the interpretable visualizations available in our web application, the pipelines that enable the scalable ingestion of heterogeneous sources of SARS-CoV-2 variant data and the server infrastructure that enables widespread data dissemination via a high-performance API that can be accessed using an R package. We show how outbreak.info can be used for genomic surveillance and as a hypothesis-generation tool to understand the ongoing pandemic at varying geographic and temporal scales.
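
Tracking combinations of lineages and mutations amounts, at its simplest, to counting how often each mutation occurs among the sequences assigned to a lineage. The sketch below shows that calculation on a handful of made-up records. The record layout and the `mutation_prevalence` helper are hypothetical and far simpler than a production pipeline.

```python
from collections import Counter

# Toy sequence records; the layout and values are for illustration only.
sequences = [
    {"lineage": "B.1.1.7", "location": "USA", "mutations": {"S:N501Y", "S:P681H"}},
    {"lineage": "B.1.1.7", "location": "USA", "mutations": {"S:N501Y"}},
    {"lineage": "B.1.1.7", "location": "GBR", "mutations": {"S:N501Y", "S:P681H"}},
    {"lineage": "B.1.351", "location": "ZAF", "mutations": {"S:N501Y", "S:E484K"}},
]


def mutation_prevalence(records, lineage):
    """Fraction of sequences in a lineage that carry each mutation."""
    in_lineage = [r for r in records if r["lineage"] == lineage]
    counts = Counter(m for r in in_lineage for m in r["mutations"])
    total = len(in_lineage)
    # An unknown lineage yields an empty Counter, so no division occurs.
    return {mutation: count / total for mutation, count in counts.items()}


prevalence = mutation_prevalence(sequences, "B.1.1.7")
for mutation, fraction in sorted(prevalence.items()):
    print(f"{mutation}: {fraction:.2f}")
```

Restricting `in_lineage` by `location` or by a collection date (a field not included in this sketch) would give the geographic and temporal breakdowns the abstract mentions.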


Subjects
COVID-19 , SARS-CoV-2 , Humans , Genomics , Disease Outbreaks , Mutation
7.
Res Sq ; 2022 Jun 28.
Article in English | MEDLINE | ID: mdl-35794893

ABSTRACT

The emergence of SARS-CoV-2 variants of concern has prompted the need for near real-time genomic surveillance to inform public health interventions. In response to this need, the global scientific community, through unprecedented effort, has sequenced and shared over 11 million genomes through GISAID, as of May 2022. This extraordinarily high sampling rate provides a unique opportunity to track the evolution of the virus in near real-time. Here, we present outbreak.info, a platform that currently tracks over 40 million combinations of PANGO lineages and individual mutations, across over 7,000 locations, to provide insights for researchers, public health officials, and the general public. We describe the interpretable and opinionated visualizations in the variant- and location-focused reports available in our web application, the pipelines that enable the scalable ingestion of heterogeneous sources of SARS-CoV-2 variant data, and the server infrastructure that enables widespread data dissemination via a high-performance API that can be accessed using an R package. We present a case study that illustrates how outbreak.info can be used for genomic surveillance and as a hypothesis-generation tool to understand the ongoing pandemic at varying geographic and temporal scales. With an emphasis on scalability, interactivity, interpretability, and reusability, outbreak.info provides a template to enable genomic surveillance at a global and localized scale.

8.
bioRxiv ; 2022 Jun 02.
Article in English | MEDLINE | ID: mdl-35677074

ABSTRACT

Background: Biomedical researchers are strongly encouraged to make their research outputs more Findable, Accessible, Interoperable, and Reusable (FAIR). While many biomedical research outputs are more readily accessible through open data efforts, finding relevant outputs remains a significant challenge. Schema.org is a metadata vocabulary standardization project that enables web content creators to make their content more FAIR. Leveraging schema.org could benefit biomedical research resource providers, but it can be challenging to apply schema.org standards to biomedical research outputs. We created an online browser-based tool that empowers researchers and repository developers to utilize schema.org or other biomedical schema projects. Results: Our browser-based tool includes features that help address many of the barriers to schema.org compliance, such as the ability to easily browse for relevant schema.org classes, the ability to extend and customize a class to be more suitable for biomedical research outputs, the ability to create data validation to ensure adherence of a research output to a customized class, and the ability to register a custom class to our schema registry, enabling others to search and re-use it. We demonstrate the use of our tool with the creation of the Outbreak.info schema, a large multi-class schema for harmonizing various COVID-19 related resources. Conclusions: We have created a browser-based tool to empower biomedical research resource providers to leverage schema.org classes to make their research outputs more FAIR.

10.
bioRxiv ; 2022 Dec 07.
Article in English | MEDLINE | ID: mdl-35132411

ABSTRACT

To combat the ongoing COVID-19 pandemic, scientists have been conducting research at breakneck speeds, producing over 52,000 peer-reviewed articles within the first year. To address the challenge in tracking the vast amount of new research located in separate repositories, we developed outbreak.info Research Library, a standardized, searchable interface of COVID-19 and SARS-CoV-2 resources. Unifying metadata from sixteen repositories, we assembled a collection of over 350,000 publications, clinical trials, datasets, protocols, and other resources as of October 2022. We used a rigorous schema to enforce consistency across different sources and resource types and linked related resources. Researchers can quickly search the latest research across data repositories, regardless of resource type or repository location, via a search interface, public API, and R package. Finally, we discuss the challenges inherent in combining metadata from scattered and heterogeneous resources and provide recommendations to streamline this process to aid scientific research.

11.
Bioinformatics ; 38(7): 2077-2079, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35020801

ABSTRACT

SUMMARY: To meet the increased need of making biomedical resources more accessible and reusable, Web Application Programming Interfaces (APIs) or web services have become a common way to disseminate knowledge sources. The BioThings APIs are a collection of high-performance, scalable, annotation-as-a-service APIs that automate the integration of biological annotations from disparate data sources. This collection of APIs currently includes MyGene.info, MyVariant.info and MyChem.info for integrating annotations on genes, variants and chemical compounds, respectively. These APIs are used by both individual researchers and application developers to simplify the process of annotation retrieval and identifier mapping. Here, we describe the BioThings Software Development Kit (SDK), a generalizable and reusable toolkit for integrating data from multiple disparate data sources and creating high-performance APIs. This toolkit allows users to easily create their own BioThings APIs for any data type of interest to them, as well as keep APIs up-to-date with their underlying data sources. AVAILABILITY AND IMPLEMENTATION: The BioThings SDK is built in Python and released via PyPI (https://pypi.org/project/biothings/). Its source code is hosted at its GitHub repository (https://github.com/biothings/biothings.api). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
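
The core integration step, combining annotations from several sources into one document per entity, can be sketched in plain Python. This does not use or reproduce the BioThings SDK API: `merge_sources`, the `_id` key convention as used here, and the sample records are all assumptions for illustration.

```python
def merge_sources(*sources):
    """Merge per-source documents into one document per shared identifier.

    Each source is a (name, documents) pair. Every document carries an
    "_id" key, and its remaining fields are nested under the source name so
    that annotations from different sources cannot overwrite each other.
    """
    merged = {}
    for source_name, docs in sources:
        for doc in docs:
            entry = merged.setdefault(doc["_id"], {"_id": doc["_id"]})
            entry[source_name] = {k: v for k, v in doc.items() if k != "_id"}
    return merged


# Made-up annotation records keyed by gene ID, for illustration only.
entrez_docs = [{"_id": "1017", "symbol": "CDK2"}]
go_docs = [
    {"_id": "1017", "terms": ["cell cycle"]},
    {"_id": "1018", "terms": ["kinase activity"]},
]
merged = merge_sources(("entrez", entrez_docs), ("go", go_docs))
print(merged["1017"])
```

Namespacing fields by source is one simple way to keep provenance visible in the merged document. A real toolkit also has to handle scheduled updates, indexing, and query serving, none of which this sketch attempts.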


Subjects
Biomedical Research , Software , Information Storage and Retrieval
12.
Elife ; 9, 2020 03 17.
Article in English | MEDLINE | ID: mdl-32180547

ABSTRACT

Wikidata is a community-maintained knowledge base that has been assembled from repositories in the fields of genomics, proteomics, genetic variants, pathways, chemical compounds, and diseases, and that adheres to the FAIR principles of findability, accessibility, interoperability and reusability. Here we describe the breadth and depth of the biomedical knowledge contained within Wikidata, and discuss the open-source tools we have built to add information to Wikidata and to synchronize it with source databases. We also demonstrate several use cases for Wikidata, including the crowdsourced curation of biomedical ontologies, phenotype-based diagnosis of disease, and drug repurposing.


Subjects
Biological Science Disciplines , Computational Biology , Databases, Factual , Genomics , Proteomics , Humans , Pattern Recognition, Automated
13.
Bioinformatics ; 36(4): 1226-1233, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31504205

ABSTRACT

MOTIVATION: Biomedical literature is growing at a rate that outpaces our ability to harness the knowledge contained therein. To mine valuable inferences from the large volume of literature, many researchers use information extraction algorithms to harvest information in biomedical texts. Information extraction is usually accomplished via a combination of manual expert curation and computational methods. Advances in computational methods usually depend on the time-consuming generation of gold standards by a limited number of expert curators. Citizen science is public participation in scientific research. We previously found that citizen scientists are willing and able to perform named entity recognition of disease mentions in biomedical abstracts, but did not know if this was true with relationship extraction (RE). RESULTS: In this article, we introduce the Relationship Extraction Module of the web-based application Mark2Cure (M2C) and demonstrate that citizen scientists can perform RE. We confirm the importance of accurate named entity recognition on user performance of RE and identify design issues that impacted data quality. We find that the data generated by citizen scientists can be used to identify relationship types not currently available in the M2C Relationship Extraction Module. We compare the citizen science-generated data with algorithm-mined data and identify ways in which the two approaches may complement one another. We also discuss opportunities for future improvement of this system, as well as the potential synergies between citizen science, manual biocuration and natural language processing. AVAILABILITY AND IMPLEMENTATION: Mark2Cure platform: https://mark2cure.org; Mark2Cure source code: https://github.com/sulab/mark2cure; and data and analysis code for this article: https://github.com/gtsueng/M2C_rel_nb. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
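
Combining relationship annotations from many volunteers usually requires some aggregation rule. The sketch below uses a simple majority vote. This is an assumed technique for illustration, not necessarily the method used in Mark2Cure, and the relation labels, entity pairs, and `aggregate` helper are hypothetical.

```python
from collections import Counter


def aggregate(votes, min_agreement=0.5):
    """Pick the most common label per entity pair.

    A label is accepted only if its share of the votes is strictly greater
    than min_agreement; otherwise the pair is marked unresolved with None.
    """
    result = {}
    for pair, labels in votes.items():
        label, count = Counter(labels).most_common(1)[0]
        result[pair] = label if count / len(labels) > min_agreement else None
    return result


# Made-up volunteer annotations for two entity pairs.
votes = {
    ("BRCA1", "breast cancer"): ["relates to", "relates to", "no relation"],
    ("aspirin", "headache"): ["treats", "causes"],
}
consensus = aggregate(votes)
print(consensus)
```

Pairs left unresolved (`None`) are natural candidates for expert review or for comparison against algorithm-mined relations.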


Subjects
Citizen Science , Natural Language Processing , Information Storage and Retrieval , Research Design , Software
14.
Hum Comput (Fairfax) ; 6(1): 56-82, 2019.
Article in English | MEDLINE | ID: mdl-31363486

ABSTRACT

Citizen science is the participation in scientific research by members of the public, and it is an increasingly valuable tool for both scientists and educators. For researchers, citizen science is a means of more quickly investigating questions which would otherwise be time-consuming and costly to study. For educators, citizen science offers a means to engage students in actual research and improve learning outcomes. Since most citizen science projects are usually designed with research goals in mind, many lack the necessary educator materials for successful integration in a formal science education (FSE) setting. In an ideal world, researchers and educators would build the necessary materials together; however, many researchers lack the time, resources, and networks to create these materials early on in the life of a citizen science project. For resource-poor projects, we propose an intermediate entry point for recruiting from the educational setting: community service or service learning requirements (CSSLRs). Many schools require students to participate in community service or service learning activities in order to graduate. When implemented well, CSSLRs provide students with growth and development opportunities outside the classroom while contributing to the community and other worthwhile causes. However, CSSLRs take time, resources, and effort to implement well. Just as citizen science projects need to establish relationships to transition well into formal science education, schools need to cultivate relationships with community service organizations. Students and educators at schools with CSSLRs where implementation is still a work in progress may be left with a burdensome requirement and inadequate support. 
With the help of a volunteer fulfilling a CSSLR, we investigated the number of students impacted by CSSLRs set at different levels of government and explored the qualifications needed for citizen science projects to fulfill CSSLRs by examining the explicitly-stated justifications for having CSSLRs, surveying how CSSLRs are verified, and using these qualifications to demonstrate how an online citizen science project, Mark2Cure, could use this information to meet the needs of students fulfilling CSSLRs.

15.
BMC Bioinformatics ; 19(1): 30, 2018 02 01.
Article in English | MEDLINE | ID: mdl-29390967

ABSTRACT

BACKGROUND: Application Programming Interfaces (APIs) are now widely used to distribute biological data, and many popular biological APIs developed by different research teams have adopted JavaScript Object Notation (JSON) as their primary data format. While usage of a common data format offers significant advantages, that alone is not sufficient for rich integrative queries across APIs. RESULTS: Here, we have implemented JSON for Linking Data (JSON-LD) technology on the BioThings APIs that we have developed, MyGene.info, MyVariant.info and MyChem.info. JSON-LD provides a standard way to add semantic context to the existing JSON data structure, for the purpose of enhancing the interoperability between APIs. We demonstrated several use cases that were facilitated by semantic annotations using JSON-LD, including simpler and more precise query capabilities as well as API cross-linking. CONCLUSIONS: We believe that this pattern offers a generalizable solution for interoperability of APIs in the life sciences.
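
How a JSON-LD `@context` can support cross-linking between APIs is sketched below. The field names and the IRIs they map to are illustrative assumptions, not the published contexts of the BioThings APIs. The point is only that two APIs using different field names can be linked once both names resolve to the same IRI.

```python
import json

# Hypothetical contexts for a gene API and a variant API. Both field names
# and IRIs are assumptions for illustration.
gene_context = {
    "entrezgene": "http://identifiers.org/ncbigene/",
    "symbol": "http://identifiers.org/hgnc.symbol/",
}
variant_context = {
    "gene_id": "http://identifiers.org/ncbigene/",
    "rsid": "http://identifiers.org/dbsnp/",
}


def add_context(doc, context):
    """Attach a JSON-LD @context to a plain JSON document."""
    return {"@context": context, **doc}


def shared_terms(ctx_a, ctx_b):
    """Pair up field names from two APIs that map to the same IRI."""
    by_iri = {iri: field for field, iri in ctx_b.items()}
    return {field: by_iri[iri] for field, iri in ctx_a.items() if iri in by_iri}


gene_doc = add_context({"entrezgene": 1017, "symbol": "CDK2"}, gene_context)
links = shared_terms(gene_context, variant_context)
print(json.dumps(gene_doc, indent=2))
print(links)
```

With the mapping in `links`, a client could take the `entrezgene` value from a gene record and query the other API's `gene_id` field without hard-coding knowledge of either API.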


Subjects
Information Storage and Retrieval/methods , Software , Biological Science Disciplines , Databases, Factual , Humans , Internet
16.
Gene ; 592(2): 235-8, 2016 Nov 05.
Article in English | MEDLINE | ID: mdl-27150585

ABSTRACT

Wikipedia and other openly available resources are increasingly used as sources of information, not just among the lay public but also in academic circles, including undergraduate students and postgraduate trainees. To enhance the quality of the Wikipedia articles, in 2013, we initiated the Gene Wiki Reviews on genes and proteins as a series of invited reviews that stipulated editing the corresponding Wikipedia article(s) that would also be subject to peer review. Thus, while the review article serves as an authoritative snapshot of the field, the "living article" can continue to evolve with the crowdsourcing model of Wikipedia. After publication of over 50 articles, we surveyed the authors to assess the impact of the project. The author survey results revealed that the Gene Wiki project is achieving its major objectives to increase the involvement of scientists in authoring Wikipedia articles and to enhance the quantity and quality of the information about genes and their protein products. Thus, the dual publication model introduced in the Gene Wiki Reviews series represents a valuable innovation in scientific publishing and biomedical knowledge management. We invite experts on specific genes to contact the editors to take part in this project to enhance the quality and accessibility of information about the human genome.


Subjects
Access to Information , Databases, Nucleic Acid/standards , Genome, Human , Molecular Sequence Annotation/standards , Humans
17.
Genome Biol ; 17(1): 91, 2016 05 06.
Article in English | MEDLINE | ID: mdl-27154141

ABSTRACT

Efficient tools for data management and integration are essential for many aspects of high-throughput biology. In particular, annotations of genes and human genetic variants are commonly used but highly fragmented across many resources. Here, we describe MyGene.info and MyVariant.info, high-performance web services for querying gene and variant annotation information. These web services are currently accessed more than three million times per month. They also demonstrate a generalizable cloud-based model for organizing and querying biological annotation information. MyGene.info and MyVariant.info are provided as high-performance web services, accessible at http://mygene.info and http://myvariant.info. Both are offered free of charge to the research community.
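
Querying such web services typically starts with building a request URL. The sketch below only constructs URLs and makes no network calls. The versioned base paths (`/v3`, `/v1`), the `query` and `variant` endpoints, and the parameter names are assumptions that should be checked against the services' current documentation, and the helper function names are hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumed base URLs; verify the version prefixes against the live API docs.
MYGENE_BASE = "https://mygene.info/v3"
MYVARIANT_BASE = "https://myvariant.info/v1"


def gene_query_url(query, species="human", fields="symbol,name,entrezgene"):
    """Build a gene search URL; parameter names are assumptions."""
    params = urlencode({"q": query, "species": species, "fields": fields})
    return f"{MYGENE_BASE}/query?{params}"


def variant_url(hgvs_id):
    """Build a variant lookup URL from an HGVS-style identifier."""
    # Keep ':' readable but percent-encode characters such as '>'.
    return f"{MYVARIANT_BASE}/variant/{quote(hgvs_id, safe=':')}"


gene_url = gene_query_url("symbol:CDK2")
var_url = variant_url("chr7:g.140453134T>C")
print(gene_url)
print(var_url)
```

The resulting strings can be passed to any HTTP client. Keeping URL construction separate from the request makes the encoding rules easy to test without network access.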


Subjects
Genetic Variation , Molecular Sequence Annotation , Sequence Analysis, DNA , Software , Database Management Systems , Databases, Genetic , Humans , Internet , User-Computer Interface
18.
Nucleic Acids Res ; 44(D1): D313-6, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26578587

ABSTRACT

BioGPS (http://biogps.org) is a centralized gene-annotation portal that enables researchers to access distributed gene annotation resources. This article focuses on the updates to BioGPS since our last paper (2013 database issue). The unique features of BioGPS, compared to those of other gene portals, are its community extensibility and user customizability. Users contribute the gene-specific resources accessible from BioGPS ('plugins'), which helps ensure that the resource collection is always up-to-date and that it will continue expanding over time (since the 2013 paper, 162 resources have been added, for a 34% increase in the number of resources available). BioGPS users can create their own collections of relevant plugins and save them as customized gene-report pages or 'layouts' (since the 2013 paper, 488 user-created layouts have been added, for a 22% increase in the number of layouts). In addition, we recently updated the most popular plugin, the 'Gene expression/activity chart', to include ∼ 6000 datasets (from ∼ 2000 datasets) and we enhanced user interactivity. We also added a new 'gene list' feature that allows users to save query results for future reference.


Subjects
Databases, Genetic , Gene Expression Profiling , Genes , Molecular Sequence Annotation , Animals , Humans , Mice , Rats
19.
Citiz Sci ; 1(2), 2016.
Article in English | MEDLINE | ID: mdl-30416754

ABSTRACT

Biomedical literature represents one of the largest and fastest growing collections of unstructured biomedical knowledge. Finding critical information buried in the literature can be challenging. To extract information from free-flowing text, researchers need to: 1. identify the entities in the text (named entity recognition), 2. apply a standardized vocabulary to these entities (normalization), and 3. identify how entities in the text are related to one another (relationship extraction). Researchers have primarily approached these information extraction tasks through manual expert curation and computational methods. We have previously demonstrated that named entity recognition (NER) tasks can be crowdsourced to a group of non-experts via the paid microtask platform, Amazon Mechanical Turk (AMT), and can dramatically reduce the cost and increase the throughput of biocuration efforts. However, given the size of the biomedical literature, even information extraction via paid microtask platforms is not scalable. With our web-based application Mark2Cure (http://mark2cure.org), we demonstrate that NER tasks also can be performed by volunteer citizen scientists with high accuracy. We apply metrics from the Zooniverse Matrices of Citizen Science Success and provide the results here to serve as a basis of comparison for other citizen science projects. Further, we discuss design considerations, issues, and the application of analytics for successfully moving a crowdsourcing workflow from a paid microtask platform to a citizen science platform. To our knowledge, this study is the first application of citizen science to a natural language processing task.
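
The three information extraction steps listed in the abstract (named entity recognition, normalization, and relationship extraction) can be sketched with a toy dictionary-based pipeline. The vocabulary, the placeholder identifiers, and the pair-every-entity relation step are simplifying assumptions, far cruder than either expert curation or the methods the abstract discusses.

```python
import re

# Toy vocabulary mapping surface forms to (entity type, normalized ID).
# The IDs are placeholders, not real ontology identifiers.
VOCAB = {
    "aspirin": ("chemical", "CHEM:0001"),
    "acetylsalicylic acid": ("chemical", "CHEM:0001"),
    "headache": ("disease", "DIS:0001"),
}


def recognize(sentence):
    """Step 1: find vocabulary terms in the text (dictionary-based NER)."""
    lowered = sentence.lower()
    found = []
    for term in VOCAB:
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            found.append((match.start(), term))
    return [term for _, term in sorted(found)]


def normalize(terms):
    """Step 2: map each mention to its normalized identifier."""
    return [VOCAB[term][1] for term in terms]


def relate(ids):
    """Step 3: propose a candidate relation for each distinct ID pair."""
    unique = list(dict.fromkeys(ids))  # de-duplicate, keep first-seen order
    return [(a, b) for i, a in enumerate(unique) for b in unique[i + 1:]]


sentence = "Acetylsalicylic acid is often taken for headache."
mentions = recognize(sentence)
ids = normalize(mentions)
candidates = relate(ids)
print(mentions, ids, candidates)
```

Each candidate pair would still need a human or a classifier to decide whether a relationship actually holds and of what type, which is the task the abstract proposes to crowdsource.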

20.
PLoS Pathog ; 10(4): e1004045, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24722773

ABSTRACT

Coxsackievirus B3 (CVB3), a member of the picornavirus family and enterovirus genus, causes viral myocarditis, aseptic meningitis, and pancreatitis in humans. We genetically engineered a unique molecular marker, "fluorescent timer" protein, within our infectious CVB3 clone and isolated a high-titer recombinant viral stock (Timer-CVB3) following transfection in HeLa cells. "Fluorescent timer" protein undergoes slow conversion of fluorescence from green to red over time, and Timer-CVB3 can be utilized to track virus infection and dissemination in real time. Upon infection with Timer-CVB3, HeLa cells, neural progenitor and stem cells (NPSCs), and C2C12 myoblast cells slowly changed fluorescence from green to red over 72 hours as determined by fluorescence microscopy or flow cytometric analysis. The conversion of "fluorescent timer" protein in HeLa cells infected with Timer-CVB3 could be interrupted by fixation, suggesting that the fluorophore was stabilized by formaldehyde cross-linking reactions. Induction of a type I interferon response or ribavirin treatment reduced the progression of cell-to-cell virus spread in HeLa cells or NPSCs infected with Timer-CVB3. Time lapse photography of partially differentiated NPSCs infected with Timer-CVB3 revealed substantial intracellular membrane remodeling and the assembly of discrete virus replication organelles which changed fluorescence color in an asynchronous fashion within the cell. "Fluorescent timer" protein colocalized closely with viral 3A protein within virus replication organelles. Intriguingly, infection of partially differentiated NPSCs or C2C12 myoblast cells induced the release of abundant extracellular microvesicles (EMVs) containing matured "fluorescent timer" protein and infectious virus representing a novel route of virus dissemination. 
CVB3 virions were readily observed within purified EMVs by transmission electron microscopy, and infectious virus was identified within low-density isopycnic iodixanol gradient fractions consistent with membrane association. The preferential detection of the lipidated form of LC3 protein (LC3 II) in released EMVs harboring infectious virus suggests that the autophagy pathway plays a crucial role in microvesicle shedding and virus release, similar to a process previously described as autophagosome-mediated exit without lysis (AWOL) observed during poliovirus replication. Through the use of this novel recombinant virus which provides more dynamic information from static fluorescent images, we hope to gain a better understanding of CVB3 tropism, intracellular membrane reorganization, and virus-associated microvesicle dissemination within the host.


Subjects
Cell-Derived Microparticles/virology , Enterovirus B, Human/physiology , Enterovirus Infections/metabolism , Phagosomes/virology , Virus Shedding/physiology , Animals , Cell-Derived Microparticles/genetics , Cell-Derived Microparticles/metabolism , Enterovirus Infections/genetics , HeLa Cells , Humans , Mice , Microtubule-Associated Proteins/genetics , Microtubule-Associated Proteins/metabolism , Phagosomes/genetics , Phagosomes/metabolism , Viral Proteins/genetics , Viral Proteins/metabolism