Search | BVS Regional Portal (test)

Establishing Institutional Scores With the Rigor and Transparency Index: Large-scale Analysis of Scientific Reporting Quality.

Menke, Joe; Eckmann, Peter; Ozyurt, Ibrahim Burak; Roelandse, Martijn; Anderson, Nathan; Grethe, Jeffrey; Gamst, Anthony; Bandrowski, Anita.

J Med Internet Res ; 24(6): e37324, 2022 06 27.

Article in English | MEDLINE | ID: mdl-35759334

ABSTRACT

BACKGROUND: Improving rigor and transparency measures should lead to improvements in reproducibility across the scientific literature; however, the assessment of measures of transparency tends to be very difficult if performed manually. OBJECTIVE: This study addresses the enhancement of the Rigor and Transparency Index (RTI, version 2.0), which attempts to automatically assess the rigor and transparency of journals, institutions, and countries using manuscripts scored on criteria found in reproducibility guidelines (eg, Materials Design, Analysis, and Reporting checklist criteria). METHODS: The RTI tracks 27 entity types using natural language processing techniques such as Bidirectional Long Short-term Memory Conditional Random Field-based models and regular expressions; this allowed us to assess over 2 million papers accessed through PubMed Central. RESULTS: Between 1997 and 2020 (where data were readily available in our data set), rigor and transparency measures showed general improvement (RTI 2.29 to 4.13), suggesting that authors are taking the need for improved reporting seriously. The top-scoring journals in 2020 were the Journal of Neurochemistry (6.23), British Journal of Pharmacology (6.07), and Nature Neuroscience (5.93). We extracted the institution and country of origin from the author affiliations to expand our analysis beyond journals. Among institutions publishing >1000 papers in 2020 (in the PubMed Central open access set), Capital Medical University (4.75), Yonsei University (4.58), and University of Copenhagen (4.53) were the top performers in terms of RTI. In country-level performance, we found that Ethiopia and Norway consistently topped the RTI charts of countries with 100 or more papers per year. In addition, we tested our assumption that the RTI may serve as a reliable proxy for scientific replicability (ie, a high RTI represents papers containing sufficient information for replication efforts). Using work by the Reproducibility Project: Cancer Biology, we determined that replication papers (RTI 7.61, SD 0.78) scored significantly higher (P<.001) than the original papers (RTI 3.39, SD 1.12), which according to the project required additional information from authors to begin replication efforts. CONCLUSIONS: These results align with our view that RTI may serve as a reliable proxy for scientific replicability. Unfortunately, RTI measures for journals, institutions, and countries fall short of the replicated paper average. If we consider the RTI of these replication studies as a target for future manuscripts, more work will be needed to ensure that the average manuscript contains sufficient information for replication attempts.
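The institution-level ranking reported in this abstract can be read as averaging paper-level scores per institution and keeping only institutions that meet a paper-count threshold. The following is a minimal sketch of that aggregation, not the authors' pipeline: the function name `rank_institutions`, the record layout, the cutoff, and all sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_institutions(papers, min_papers):
    """Return (institution, mean score, paper count) tuples, best first.

    `papers` is an iterable of (institution, score) pairs. Institutions
    with fewer than `min_papers` papers are dropped, which mimics a
    publication threshold (the toy cutoff below is far smaller than the
    one used in the study).
    """
    scores = defaultdict(list)
    for institution, score in papers:
        scores[institution].append(score)

    ranked = [
        (name, mean(values), len(values))
        for name, values in scores.items()
        if len(values) >= min_papers
    ]
    # Sort by mean score, highest first.
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked


# Hypothetical toy data, not values from the paper.
toy_papers = [
    ("Institution A", 5.0), ("Institution A", 4.0), ("Institution A", 6.0),
    ("Institution B", 3.0), ("Institution B", 4.0), ("Institution B", 5.0),
    ("Institution C", 9.0),  # a single paper, filtered out by the threshold
]
ranking = rank_institutions(toy_papers, min_papers=3)
```

The threshold matters because a mean over very few papers is unstable; in the toy data, Institution C would otherwise top the list on the strength of one paper.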

Subjects

Checklist; Publishing; Humans; Norway; Reproducibility of Results; Research Design

Antibody Watch: Text mining antibody specificity from the literature.

Hsu, Chun-Nan; Chang, Chia-Hui; Poopradubsil, Thamolwan; Lo, Amanda; William, Karen A; Lin, Ko-Wei; Bandrowski, Anita; Ozyurt, Ibrahim Burak; Grethe, Jeffrey S; Martone, Maryann E.

PLoS Comput Biol ; 17(5): e1008967, 2021 05.

Article in English | MEDLINE | ID: mdl-34043624

ABSTRACT

Antibodies are widely used reagents to test for expression of proteins and other antigens. However, they might not always reliably produce results when they do not specifically bind to the target proteins that their providers designed them for, leading to unreliable research results. While many proposals have been developed to deal with the problem of antibody specificity, it is still challenging to cover the millions of antibodies that are available to researchers. In this study, we investigate the feasibility of automatically generating alerts to users of problematic antibodies by extracting statements about antibody specificity reported in the literature. The extracted alerts can be used to construct an "Antibody Watch" knowledge base containing supporting statements of problematic antibodies. We developed a deep neural network system and tested its performance with a corpus of more than two thousand articles that reported uses of antibodies. We divided the problem into two tasks. Given an input article, the first task is to identify snippets about antibody specificity and classify if the snippets report that any antibody exhibits non-specificity, and thus is problematic. The second task is to link each of these snippets to one or more antibodies mentioned in the snippet. The experimental evaluation shows that our system can accurately perform the classification task with 0.925 weighted F1-score, linking with 0.962 accuracy, and 0.914 weighted F1 when combined to complete the joint task. We leveraged Research Resource Identifiers (RRID) to precisely identify antibodies linked to the extracted specificity snippets. The result shows that it is feasible to construct a reliable knowledge base about problematic antibodies by text mining.
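The classification results in this abstract are reported as weighted F1-scores. As a reminder of what that metric computes, here is a minimal pure-Python sketch: per-class F1 averaged with weights proportional to each class's share of the true labels. The function name `weighted_f1` and the label names are hypothetical, and the toy labels are not data from the study.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = count - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        # Each class contributes in proportion to its support.
        score += f1 * count / total
    return score


# Hypothetical toy labels for four snippets.
toy_true = ["nonspecific", "nonspecific", "specific", "specific"]
toy_pred = ["nonspecific", "specific", "specific", "specific"]
toy_score = weighted_f1(toy_true, toy_pred)
```

Weighting by support keeps a rare class from dominating the average, which is relevant when one snippet class is much less frequent than the others.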

Subjects

Antibody Specificity; Data Mining; Animals; Humans; Mice; Neural Networks, Computer

Bio-AnswerFinder: a system to find answers to questions from biomedical texts.

Ozyurt, Ibrahim Burak; Bandrowski, Anita; Grethe, Jeffrey S.

Database (Oxford) ; 2020, 2020 01 01.

Article in English | MEDLINE | ID: mdl-31925435

ABSTRACT

The ever-accelerating pace of biomedical research results in a corresponding acceleration in the volume of biomedical literature created. Since new research builds upon existing knowledge, the rate of increase in the available knowledge encoded in biomedical literature makes easy access to that implicit knowledge more vital over time. Toward the goal of making implicit knowledge in the biomedical literature easily accessible to biomedical researchers, we introduce a question answering system called Bio-AnswerFinder. Bio-AnswerFinder uses a weighted-relaxed word mover's distance-based similarity on word/phrase embeddings learned from PubMed abstracts to rank answers after question focus entity type filtering. Our approach retrieves relevant documents iteratively via enhanced keyword queries from a traditional search engine. To improve document retrieval performance, we introduced a supervised long short-term memory neural network to select keywords from the question to facilitate iterative keyword search. Our unsupervised baseline system achieves a mean reciprocal rank (MRR) score of 0.46 and Precision@1 of 0.32 on 936 questions from BioASQ. The answer sentences are further ranked by a fine-tuned bidirectional encoder representations from transformers (BERT) classifier trained using 100 answer candidate sentences per question for 492 BioASQ questions. To test ranking performance, we report a blind test on 100 questions that three independent annotators scored. These experts preferred BERT-based reranking, with a 7% improvement in MRR and a 13% improvement in Precision@1 scores on average.
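The two ranking metrics quoted in this abstract, mean reciprocal rank and Precision@1, can be computed as in the sketch below. It is an illustration of the standard definitions only: the function names, the `(ranked_answers, gold_answers)` data layout, and the toy questions are assumptions, not BioASQ data or the authors' evaluation code.

```python
def reciprocal_rank(ranked_answers, gold_answers):
    """1/rank of the first correct answer, or 0.0 if none is correct."""
    for rank, answer in enumerate(ranked_answers, start=1):
        if answer in gold_answers:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_answers, gold_answers) pairs."""
    return sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / len(runs)


def precision_at_1(runs):
    """Fraction of questions whose top-ranked answer is correct."""
    hits = sum(1 for ranked, gold in runs if ranked and ranked[0] in gold)
    return hits / len(runs)


# Hypothetical toy results for four questions.
toy_runs = [
    (["a", "b", "c"], {"a"}),       # correct at rank 1
    (["x", "y", "z"], {"y"}),       # correct at rank 2
    (["m", "n"], {"q"}),            # no correct answer returned
    (["p", "q", "r", "s"], {"s"}),  # correct at rank 4
]
toy_mrr = mean_reciprocal_rank(toy_runs)
toy_p1 = precision_at_1(toy_runs)
```

Precision@1 only rewards a correct top answer, whereas MRR also gives partial credit for correct answers further down the list, which is why the two scores differ for the same system.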

Subjects

Biomedical Research; Information Storage and Retrieval/methods; Natural Language Processing; Neural Networks, Computer; Data Mining; Humans

Foundry: a message-oriented, horizontally scalable ETL system for scientific data integration and enhancement.

Ozyurt, Ibrahim Burak; Grethe, Jeffrey S.

Database (Oxford) ; 2018, 2018 01 01.

Article in English | MEDLINE | ID: mdl-30576493

ABSTRACT

Data generated by scientific research enables further advancement in science through reanalyses and pooling of data for novel analyses. With the increasing amounts of scientific data generated by biomedical research providing researchers with more data than they have ever had access to, finding the data matching the researchers' requirements continues to be a major challenge and will only grow more challenging as more data is produced and shared. In this paper, we introduce a horizontally scalable distributed extract-transform-load system to tackle scientific data aggregation, transformation and enhancement for scientific data discovery and retrieval. We also introduce a data transformation language for biomedical curators allowing for the transformation and combination of data/metadata from heterogeneous data sources. Applicability of the system for scientific data is illustrated in biomedical and earth science domains.

Subjects

Abstracting and Indexing/methods; Computational Biology/methods; Data Curation/methods; Databases, Factual; Biomedical Research; Information Storage and Retrieval; Programming Languages

Resource Disambiguator for the Web: Extracting Biomedical Resources and Their Citations from the Scientific Literature.

Ozyurt, Ibrahim Burak; Grethe, Jeffrey S; Martone, Maryann E; Bandrowski, Anita E.

PLoS One ; 11(1): e0146300, 2016.

Article in English | MEDLINE | ID: mdl-26730820

ABSTRACT

The NIF Registry, developed and maintained by the Neuroscience Information Framework, is a cooperative project aimed at cataloging research resources, e.g., software tools, databases and tissue banks, funded largely by governments and available as tools to research scientists. Although originally conceived for neuroscience, the NIF Registry has over the years broadened in scope to include research resources of general relevance to biomedical research. The Registry currently lists over 13K research resources. The broadening in scope to biomedical science led us to re-christen the NIF Registry platform as SciCrunch. The NIF/SciCrunch Registry has been cataloging the resource landscape since 2006; as such, it serves as a valuable dataset for tracking the breadth, fate and utilization of these resources. Our experience shows that research resources like databases are dynamic objects that can change location and scope over time. Although each record is entered manually and human-curated, the current size of the registry requires tools that can aid in curation efforts to keep content up to date, including when and where such resources are used. To address this challenge, we have developed an open source tool suite, collectively termed RDW (Resource Disambiguator for the Web). RDW is designed to help in the upkeep and curation of the registry as well as in enhancing the content of the registry by automated extraction of resource candidates from the literature. The RDW toolkit includes a URL extractor from papers, a resource candidate screen, a resource URL change tracker, and a resource content change tracker. Curators access these tools via a web-based user interface. Several strategies are used to optimize these tools, including supervised and unsupervised learning algorithms as well as statistical text analysis. The complete tool suite is used to enhance and maintain the resource registry as well as track the usage of individual resources through an innovative literature citation index honed for research resources. Here we present an overview of the Registry and show how the RDW tools are used in curation and usage tracking.
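One of the tools listed in this abstract is a URL extractor that pulls resource candidates from papers. The sketch below shows one simple way such a step could look, using a regular expression; it is an assumption for illustration and not the RDW implementation. The pattern, the function name `extract_urls`, and the sample sentence are all hypothetical.

```python
import re

# Match http(s) URLs up to whitespace, angle brackets, parentheses, or
# square brackets. This is a deliberately simple pattern, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_urls(text):
    """Return unique URLs in order of first appearance."""
    seen = set()
    urls = []
    for match in URL_PATTERN.finditer(text):
        # Drop sentence punctuation that often trails a URL in prose.
        url = match.group(0).rstrip(".,;:")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical sample text using reserved example domains.
sample_text = (
    "Data are available at http://example.org/data. "
    "The tool (https://example.com/tool) is open source; "
    "see http://example.org/data, too."
)
candidate_urls = extract_urls(sample_text)
```

Deduplicating while preserving order keeps one candidate per resource even when a paper cites the same URL several times.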

Subjects

Computational Biology/methods; Information Storage and Retrieval/methods; Internet; Neurosciences/methods; Software; Biomedical Research/methods; Biomedical Research/statistics & numerical data; Computational Biology/statistics & numerical data; Databases, Factual; Humans; Information Storage and Retrieval/statistics & numerical data; Neurosciences/statistics & numerical data; Publications/statistics & numerical data; Registries/statistics & numerical data; Reproducibility of Results

Automatic identification and classification of noun argument structures in biomedical literature.

Ozyurt, Ibrahim Burak.

IEEE/ACM Trans Comput Biol Bioinform ; 9(6): 1639-48, 2012.

Article in English | MEDLINE | ID: mdl-22868678

ABSTRACT

The accelerating increase in the biomedical literature makes keeping up with recent advances challenging for researchers, thus making automatic extraction and discovery of knowledge from this vast literature a necessity. Building such systems requires automatic detection of lexico-semantic event structures governed by the syntactic and semantic constraints of human languages in sentences of biomedical texts. The lexico-semantic event structures in sentences are centered around the predicates, and most semantic role labeling (SRL) approaches focus only on the arguments of verb predicates and neglect argument-taking nouns, which also convey information in a sentence. In this article, a noun argument structure (NAS) annotated corpus named BioNom and an SRL system to identify and classify these structures are introduced. Also, a genetic algorithm-based feature selection (GAFS) method is introduced and global inference is applied to significantly improve the performance of the NAS Bio SRL system.
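The abstract mentions genetic algorithm-based feature selection without giving details. The sketch below illustrates the general technique only, under stated assumptions: each candidate is a bit mask over features, the better half of the population is kept unchanged (elitism), and children come from one-point crossover plus bit-flip mutation. All names, parameters, and the toy fitness function are hypothetical; in a real setting the fitness would be something like cross-validated classifier performance on the selected features.

```python
import random


def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         mutation_rate=0.1, seed=0):
    """Search for a feature bit mask (list of 0/1) that maximizes `fitness`.

    Returns the best mask found and the best fitness per generation.
    Requires n_features >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        elite = population[: pop_size // 2]  # better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            mother, father = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Bit-flip mutation: toggle each bit with a small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = elite + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


# Hypothetical toy fitness: reward three "useful" features, penalize mask size.
USEFUL = {0, 2, 5}


def toy_fitness(mask):
    hits = sum(1 for index in USEFUL if mask[index])
    return hits - 0.1 * sum(mask)


best_mask, best_history = ga_feature_selection(8, toy_fitness)
```

Because the elite half is carried over untouched, the best fitness never decreases from one generation to the next, which makes the search easy to monitor.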

Subjects

Computational Biology/methods; Data Mining/methods; Databases, Factual; Natural Language Processing; Models, Genetic; Semantics; Software; Support Vector Machine
