Search | VHL Regional Portal (test)

Use it or lose it: citations predict the continued online availability of published bioinformatics resources.

Wren, Jonathan D; Georgescu, Constantin; Giles, Cory B; Hennessey, Jason.

Nucleic Acids Res ; 45(7): 3627-3633, 2017 Apr 20.

Article in English | MEDLINE | ID: mdl-28334982

ABSTRACT

Scientific Data Analysis Resources (SDARs) such as bioinformatics programs, web servers and databases are integral to modern science, but previous studies have shown that the Uniform Resource Locators (URLs) linking to them decay in a time-dependent manner, with ~27% decayed to date. Because SDARs are overrepresented among science's most cited papers over the past 20 years, loss of widely used SDARs could be particularly disruptive to scientific research. We identified URLs in MEDLINE abstracts and used crowdsourcing to identify which reported the creation of SDARs. We used the Internet Archive's Wayback Machine to approximate 'death dates' and calculate citations/year over each SDAR's lifespan. At first glance, decayed SDARs did not significantly differ from available SDARs in their average citations per year over their lifespan or journal impact factor (JIF). But the most cited SDARs were 94% likely to be relocated to another URL versus only 34% of uncited ones. Taking relocation into account, we find that citations are the strongest predictors of current online availability after time since publication, and JIF is modestly predictive. This suggests that URL decay is a general, persistent phenomenon affecting all URLs, but the most useful/recognized SDARs are more likely to persist.
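The citations/year measure described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name, the use of the current date for resources that are still online, and the one-year floor on lifespan are all assumptions made for illustration.

```python
from datetime import date
from typing import Optional


def citations_per_year(citations: int,
                       published: date,
                       death_date: Optional[date],
                       today: date) -> float:
    """Average citations per year over a resource's lifespan.

    death_date is the approximate date the URL stopped resolving
    (for example, the last successful archive capture); None means
    the resource is still available, so the lifespan runs to `today`.
    """
    end = death_date if death_date is not None else today
    lifespan_years = (end - published).days / 365.25
    # Assumption: floor the lifespan at one year so very young
    # resources do not produce inflated rates.
    lifespan_years = max(lifespan_years, 1.0)
    return citations / lifespan_years


# Hypothetical resources: one decayed after about ten years, one still online.
decayed = citations_per_year(100, date(2005, 1, 1), date(2015, 1, 1), date(2017, 1, 1))
alive = citations_per_year(50, date(2007, 1, 1), None, date(2017, 1, 1))
```

Normalizing by lifespan rather than by time since publication keeps a decayed resource from being penalized for years in which it could no longer be used or cited as a working tool.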

Subjects

Computational Biology , Internet , Periodicals as Topic , Journal Impact Factor , MEDLINE

Trends in the production of scientific data analysis resources.

Hennessey, Jason; Georgescu, Constantin; Wren, Jonathan D.

BMC Bioinformatics ; 15 Suppl 11: S7, 2014.

Article in English | MEDLINE | ID: mdl-25350391

ABSTRACT

BACKGROUND: As the amount of scientific data grows, peer-reviewed Scientific Data Analysis Resources (SDARs) such as published software programs, databases and web servers have had a strong impact on the productivity of scientific research. SDARs are typically linked to via Internet URLs, which have been shown to decay in a time-dependent fashion. What is less clear is whether or not SDAR-producing group size or prior experience in SDAR production correlates with SDAR persistence or whether certain institutions or regions account for a disproportionate number of peer-reviewed resources. METHODS: We first quantified the current availability of over 26,000 unique URLs published in MEDLINE abstracts/titles over the past 20 years, then extracted authorship, institutional and ZIP code data. We estimated which URLs were SDARs by using keyword proximity analysis. RESULTS: We identified 23,820 non-archival URLs produced between 1996 and 2013, out of which 11,977 were classified as SDARs. Production of SDARs as measured with the Gini coefficient is more widely distributed among institutions (0.62) and ZIP codes (0.65) than scientific research in general, which tends to be disproportionately clustered within elite institutions (0.91) and ZIPs (0.96). An estimated one percent of institutions produced 68% of published research whereas the top 1% only accounted for 16% of SDARs. Some labs produced many SDARs (maximum detected = 64), but 74% of SDAR-producing authors have only published one SDAR. Interestingly, decayed SDARs have significantly fewer average authors (4.33 ± 3.06) than available SDARs (4.88 ± 3.59) (p < 8.32 × 10^-4). Approximately 3.4% of URLs, as published, contain errors in their entry/format, including DOIs and links to clinical trials registry numbers. CONCLUSION: SDAR production is less dependent upon institutional location and resources, and SDAR online persistence does not seem to be a function of infrastructure or expertise. Yet, SDAR team size correlates positively with SDAR accessibility, suggesting a possible sociological factor involved. While a detectable URL entry error rate of 3.4% is relatively low, it raises the question of whether or not this is a general error rate that extends to additional published entities.
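The Gini coefficient used in this abstract to compare concentration across institutions and ZIP codes can be computed as in the sketch below. The abstract does not say which estimator the authors used, so the standard rank-based formula here is an assumption, and the input counts are invented for illustration.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly even).

    Uses the rank-based formula
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    with x sorted in ascending order and ranks i starting at 1.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero count")
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical output counts for four institutions.
even = gini([5, 5, 5, 5])        # every institution produces the same amount
skewed = gini([0, 0, 0, 20])     # one institution produces everything
```

With this estimator a fully concentrated sample of n units gives (n - 1) / n rather than exactly 1, so values near 0.9 (as reported for research in general) indicate far stronger clustering than values near 0.6 (as reported for SDARs).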

Subjects

Computational Biology/trends , Periodicals as Topic/trends , Authorship , Databases, Factual , Internet , MEDLINE , Research/trends , Software

A cross disciplinary study of link decay and the effectiveness of mitigation techniques.

Hennessey, Jason; Ge, Steven.

BMC Bioinformatics ; 14 Suppl 14: S5, 2013.

Article in English | MEDLINE | ID: mdl-24266891

ABSTRACT

BACKGROUND: The dynamic, decentralized World Wide Web has become an essential part of scientific research and communication. Researchers create thousands of web sites every year to share software, data and services. These valuable resources tend to disappear over time. The problem has been documented in many subject areas. Our goal is to conduct a cross-disciplinary investigation of the problem and test the effectiveness of existing remedies. RESULTS: We accessed 14,489 unique web pages found in the abstracts within Thomson Reuters' Web of Science citation index that were published between 1996 and 2010 and found that the median lifespan of these web pages was 9.3 years, with 62% of them being archived. Survival analysis and logistic regression were used to find significant predictors of URL lifespan. The availability of a web page is most dependent on the time it is published and the top-level domain names. Similar statistical analysis revealed biases in current solutions: the Internet Archive favors web pages with fewer layers in the Uniform Resource Locator (URL) while WebCite is significantly influenced by the source of publication. We also created a prototype for a process to submit web pages to the archives and increased coverage of our list of scientific webpages in the Internet Archive and WebCite by 22% and 255%, respectively. CONCLUSION: Our results show that link decay continues to be a problem across different disciplines and that current solutions for static web pages are helping and can be improved.
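Two of the URL properties named in this abstract, the top-level domain and the number of layers in the URL, can be extracted as in the sketch below. How the authors counted "layers" is not stated; treating it as the number of non-empty path segments is an assumption, and the function name and example URL are hypothetical.

```python
from urllib.parse import urlsplit


def url_features(url: str) -> dict:
    """Return the top-level domain and path depth of a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Last dot-separated label of the host, e.g. "org" or "edu".
    tld = host.rsplit(".", 1)[-1] if "." in host else ""
    # Assumption: one "layer" per non-empty path segment.
    depth = len([segment for segment in parts.path.split("/") if segment])
    return {"tld": tld, "depth": depth}


# Hypothetical URL used only for illustration.
features = url_features("http://www.example.org/tools/db/index.html")
```

Features like these could then serve as covariates in a survival or logistic regression model of the kind the abstract mentions.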

Subjects

Databases, Factual , Internet , Publishing , Archives , Bibliographies as Topic , Humans , Software Design , Time Factors

AraPath: a knowledgebase for pathway analysis in Arabidopsis.

Lai, Liming; Liberzon, Arthur; Hennessey, Jason; Jiang, Gaixin; Qi, Jianli; Mesirov, Jill P; Ge, Steven X.

Bioinformatics ; 28(17): 2291-2, 2012 Sep 01.

Article in English | MEDLINE | ID: mdl-22760305

ABSTRACT

Studying plants using high-throughput genomics technologies is becoming routine, but interpretation of genome-wide expression data in terms of biological pathways remains a challenge, partly due to the lack of pathway databases. To create a knowledgebase for plant pathway analysis, we collected 1683 lists of differentially expressed genes from 397 gene-expression studies, which constitute a molecular signature database of various genetic and environmental perturbations of Arabidopsis. In addition, we extracted 1909 gene sets from various sources such as Gene Ontology, KEGG, AraCyc, Plant Ontology, predicted target genes of microRNAs and transcription factors, and computational gene clusters defined by meta-analysis. With this knowledgebase, we applied Gene Set Enrichment Analysis to an expression profile of cold acclimation and identified expected functional categories and pathways. Our results suggest that the AraPath database can be used to generate specific, testable hypotheses regarding plant molecular pathways from gene expression data. AVAILABILITY: http://bioinformatics.sdstate.edu/arapath/.
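The idea behind Gene Set Enrichment Analysis, as applied in this abstract, can be sketched with a simplified running-sum statistic. This is not the AraPath or GSEA software: it is the unweighted (Kolmogorov-Smirnov-like) form of the enrichment score, it omits the correlation weighting and permutation testing of the full method, it assumes the ranked list has no duplicate genes, and the gene identifiers are placeholders.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted enrichment score of gene_set in a ranked gene list.

    Walk down the ranked list, stepping up at each gene in the set and
    down at each gene outside it; return the largest deviation from zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        if gene in members:
            running += 1.0 / n_hit
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Placeholder genes ranked from most up-regulated to most down-regulated.
ranking = ["g1", "g2", "g3", "g4"]
top_heavy = enrichment_score(ranking, {"g1", "g2"})
bottom_heavy = enrichment_score(ranking, {"g3", "g4"})
```

A score near +1 means the set's genes cluster at the top of the ranking, and a score near -1 means they cluster at the bottom; in practice significance is judged against scores from permuted data.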

Subjects

Arabidopsis/genetics , Databases, Genetic , Knowledge Bases , Gene Expression , Gene Expression Profiling/methods , Genome, Plant , Genomics/methods , Multigene Family
