Results 1 - 20 of 29
1.
Nucleic Acids Res ; 50(D1): D387-D390, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850094

ABSTRACT

The Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra/) stores raw sequencing data and alignment information to enhance reproducibility and facilitate new discoveries through data analysis. Here we note changes in storage designed to increase access and highlight analyses that augment metadata with taxonomic insight to help users select data. In addition, we present three unanticipated applications of taxonomic analysis.


Subjects
Bacteria/genetics , Databases, Genetic , Metadata/statistics & numerical data , Software , Viruses/genetics , Bacteria/classification , Base Sequence , High-Throughput Nucleotide Sequencing , Internet , Phylogeny , Reproducibility of Results , SARS-CoV-2/genetics , Sequence Analysis, RNA , Viruses/classification
2.
Nucleic Acids Res ; 50(D1): D380-D386, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34570235

ABSTRACT

Single-cell bisulfite sequencing methods are widely used to assess epigenomic heterogeneity in cell states. Over the past few years, large amounts of data have been generated, facilitating a deeper understanding of the epigenetic regulation of many key biological processes, including early embryonic development, cell differentiation and tumor progression. There is an urgent need to build a functional resource platform for this massive amount of data. Here, we present scMethBank, the first open access and comprehensive database dedicated to the collection, integration, analysis and visualization of single-cell DNA methylation data and metadata. The current release of scMethBank includes processed single-cell bisulfite sequencing data and curated metadata of 8328 samples derived from 15 public single-cell datasets, involving two species (human and mouse), 29 cell types and two diseases. In summary, scMethBank aims to assist researchers who are interested in cell heterogeneity to explore and utilize whole genome methylation data at the single-cell level by providing browse, search, visualization and download functions and user-friendly online tools. The database is accessible at: https://ngdc.cncb.ac.cn/methbank/scm/.


Subjects
DNA Methylation , Databases, Genetic , Epigenesis, Genetic , Genome , Metadata/statistics & numerical data , Software , Animals , Chromosome Mapping , Datasets as Topic , Humans , Internet , Mice , Molecular Sequence Annotation , Single-Cell Analysis , Whole Genome Sequencing
3.
Nucleic Acids Res ; 50(D1): D543-D552, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34723319

ABSTRACT

The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. The number of datasets submitted to PRIDE Archive (the archival component of PRIDE) reached an average of around 500 per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to the submitted mass spectra using Universal Spectrum Identifiers. As a key point, the file format MAGE-TAB for proteomics has been developed to enable the improvement of sample metadata annotation. Additionally, the resource PRIDE Peptidome provides access to aggregated peptide/protein evidence across PRIDE Archive. Furthermore, we describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data into other added-value resources such as UniProt, Ensembl and Expression Atlas.


Subjects
Databases, Protein , Metadata/statistics & numerical data , Molecular Sequence Annotation/statistics & numerical data , Peptides/chemistry , Proteins/chemistry , Software , Amino Acid Sequence , Bibliometrics , Datasets as Topic , Humans , Information Storage and Retrieval , Internet , Mass Spectrometry , Peptides/genetics , Peptides/metabolism , Proteins/genetics , Proteins/metabolism , Proteomics/instrumentation , Proteomics/methods , Sequence Alignment
4.
Nucleic Acids Res ; 50(D1): D980-D987, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34791407

ABSTRACT

The European Genome-phenome Archive (EGA - https://ega-archive.org/) is a resource for long-term secure archiving of all types of potentially identifiable genetic, phenotypic, and clinical data resulting from biomedical research projects. Its mission is to foster hosted data reuse, enable reproducibility, and accelerate biomedical and translational research in line with the FAIR principles. Launched in 2008, the EGA has grown quickly, currently archiving over 4,500 studies from nearly one thousand institutions. The EGA operates a distributed data access model in which requests are made to the data controller, not to the EGA; the submitter therefore keeps control over who has access to the data and under which conditions. Given the size and value of data hosted, the EGA is constantly improving its value chain, that is, how the EGA can contribute to enhancing the value of human health data by facilitating its submission, discovery, access, and distribution, as well as leading the design and implementation of standards and methods necessary to deliver the value chain. The EGA has become a key GA4GH Driver Project, leading multiple development efforts and implementing new standards and tools, and has been appointed as an ELIXIR Core Data Resource.


Subjects
Confidentiality/legislation & jurisprudence , Genome, Human , Information Dissemination/methods , Phenomics/organization & administration , Translational Research, Biomedical/methods , Datasets as Topic , Genotype , History, 20th Century , History, 21st Century , Humans , Information Dissemination/ethics , Metadata/ethics , Metadata/statistics & numerical data , Phenomics/history , Phenotype
7.
Cancer Res ; 81(23): 5810-5812, 2021 12 01.
Article in English | MEDLINE | ID: mdl-34853038

ABSTRACT

Profound advances in computational methods, including artificial intelligence (AI), present the opportunity to use the exponentially growing volume and complexity of available cancer measurements toward data-driven personalized care. While exciting, this opportunity has highlighted the disconnect between the promise of compute and the supply of high-quality data. The current paradigm of ad-hoc aggregation and curation of data needs to be replaced with a "metadata supply chain" that provides robust data in context with known provenance, that is, lineage and comprehensive data governance that will allow the promise of AI technology to be realized to its full potential in clinical practice.


Subjects
Artificial Intelligence , Metadata/statistics & numerical data , Neoplasms/diagnosis , Neoplasms/therapy , Humans
10.
Nat Biomed Eng ; 5(6): 533-545, 2021 06.
Article in English | MEDLINE | ID: mdl-34131321

ABSTRACT

Regular screening for the early detection of common chronic diseases might benefit from the use of deep-learning approaches, particularly in resource-poor or remote settings. Here we show that deep-learning models can be used to identify chronic kidney disease and type 2 diabetes solely from fundus images or in combination with clinical metadata (age, sex, height, weight, body-mass index and blood pressure) with areas under the receiver operating characteristic curve of 0.85-0.93. The models were trained and validated with a total of 115,344 retinal fundus photographs from 57,672 patients and can also be used to predict estimated glomerular filtration rates and blood-glucose levels, with mean absolute errors of 11.1-13.4 ml min⁻¹ per 1.73 m² and 0.65-1.1 mmol l⁻¹, and to stratify patients according to disease-progression risk. We evaluated the generalizability of the models for the identification of chronic kidney disease and type 2 diabetes with population-based external validation cohorts and via a prospective study with fundus images captured with smartphones, and assessed the feasibility of predicting disease progression in a longitudinal cohort.


Subjects
Deep Learning , Diabetes Mellitus, Type 2/diagnostic imaging , Image Interpretation, Computer-Assisted/statistics & numerical data , Photography/statistics & numerical data , Renal Insufficiency, Chronic/diagnostic imaging , Retina/diagnostic imaging , Area Under Curve , Blood Glucose/metabolism , Body Height , Body Mass Index , Body Weight , Diabetes Mellitus, Type 2/metabolism , Diabetes Mellitus, Type 2/pathology , Disease Progression , Female , Fundus Oculi , Glomerular Filtration Rate , Humans , Male , Metadata/statistics & numerical data , Middle Aged , Neural Networks, Computer , Photography/methods , Prospective Studies , ROC Curve , Renal Insufficiency, Chronic/metabolism , Renal Insufficiency, Chronic/pathology , Retina/metabolism , Retina/pathology
12.
Can Assoc Radiol J ; 72(4): 694-700, 2021 Nov.
Article in English | MEDLINE | ID: mdl-32412312

ABSTRACT

PURPOSE: To determine whether computed tomography radiation dose data could be captured electronically across hospitals to derive regional diagnostic reference levels for quality improvement. METHODS: Data on consecutive computed tomography examinations from 8 hospitals were collected automatically in a central database (Repository) from April 2017 to September 2017. The most frequently performed examinations were used to determine the standard protocols for each hospital. Diagnostic reference levels across hospitals were derived using statistical distribution for 2 radiation dose metrics. These values were compared between hospitals, within and between hospitals by scanner and against national Health Canada achievable doses and diagnostic reference levels. RESULTS: Three master protocol groups, Head, Abdomen-Pelvis, and Chest-Abdomen-Pelvis, accounted for 43% of all valid studies (N = 40 277). For the Repository, 11 of 12 mean values and 75th percentile diagnostic reference levels were below the Health Canada mean and 75th percentile values, and one was the same as the Health Canada value. Mean radiation dose by protocol varied by as much as 97% between hospitals. There was no consistent pattern in the difference between mean doses between large and small hospitals. CONCLUSION: This electronic data acquisition process could be used to continually update achievable doses for frequently used computed tomography examinations in Ontario and eliminate the need for nationwide manual surveys. Results compared across institutions will allow hospitals to maintain achievable doses and lower patient exposure.


Subjects
Diagnostic Reference Levels , Medical Informatics/methods , Metadata/statistics & numerical data , Tomography, X-Ray Computed/statistics & numerical data , Humans , Ontario
13.
Nucleic Acids Res ; 49(D1): D121-D124, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33166387

ABSTRACT

The International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org/) has been the core infrastructure for collecting and providing nucleotide sequence data and metadata for >30 years. Three partner organizations, the DNA Data Bank of Japan (DDBJ) at the National Institute of Genetics in Mishima, Japan; the European Nucleotide Archive (ENA) at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK; and GenBank at the National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health in Bethesda, Maryland, USA, have been collaboratively maintaining the INSDC for the benefit of not only science but all types of communities worldwide.


Subjects
Databases, Nucleic Acid , Metadata/statistics & numerical data , Nucleotides/genetics , Sequence Analysis, DNA/statistics & numerical data , Sequence Analysis, RNA/statistics & numerical data , Academies and Institutes , Base Sequence , Europe , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , International Cooperation , Japan , Nucleotides/metabolism , United States
14.
BMC Cancer ; 20(1): 486, 2020 May 29.
Article in English | MEDLINE | ID: mdl-32471384

ABSTRACT

BACKGROUND: Thousands of research articles on neuroblastoma have been published over the past few decades; however, the heterogeneity and variable quality of scholarly data may challenge scientists or clinicians to survey all of the available information. Hence, holistic measurement and analysis of neuroblastoma-related literature with the help of sophisticated mathematical tools could provide deep insights into global research performance and the collaborative architectonical structure within the neuroblastoma scientific community. In this scientometric study, we aim to determine the extent of the scientific output related to neuroblastoma research between 1980 and 2018. METHODS: We applied novel scientometric tools, including Bibliometrix R package, biblioshiny, VOSviewer, and CiteSpace IV for comprehensive science mapping analysis of extensive bibliographic metadata, which was retrieved from the Web of Science™ Core Collection database. RESULTS: We demonstrate the enormous proliferation of neuroblastoma research during the last 38 years, including 12,435 documents published in 1828 academic journals by 36,908 authors from 86 different countries. These documents received a total of 316,017 citations with an average citation per document of 28.35 ± 7.7. We determine the proportion of highly cited and never cited papers, "occasional" and prolific authors and journals. Further, we show that 12 (13.9%) of 86 countries were responsible for 80.4% of neuroblastoma-related research output. CONCLUSIONS: These findings are crucial for researchers, clinicians, journal editors, and others working in neuroblastoma research to understand the strengths and potential gaps in the current literature and to plan future investments in data collection and science policy. This first scientometric study of global neuroblastoma research performance provides valuable insight into the scientific landscape, co-authorship network architecture, international collaboration, and interaction within the neuroblastoma community.


Subjects
Bibliometrics , Biomedical Research/statistics & numerical data , Metadata/statistics & numerical data , Neuroblastoma , Child , Databases, Factual/statistics & numerical data , Humans
15.
Nucleic Acids Res ; 48(4): e23, 2020 02 28.
Article in English | MEDLINE | ID: mdl-31956905

ABSTRACT

The diverse and growing omics data in public domains provide researchers with tremendous opportunity to extract hidden, yet undiscovered, knowledge. However, the vast majority of archived data remain unused. Here, we present MetaOmGraph (MOG), a free, open-source, standalone software for exploratory analysis of massive datasets. Researchers, without coding, can interactively visualize and evaluate data in the context of its metadata, honing-in on groups of samples or genes based on attributes such as expression values, statistical associations, metadata terms and ontology annotations. Interaction with data is easy via interactive visualizations such as line charts, box plots, scatter plots, histograms and volcano plots. Statistical analyses include co-expression analysis, differential expression analysis and differential correlation analysis, with significance tests. Researchers can send data subsets to R for additional analyses. Multithreading and indexing enable efficient big data analysis. A researcher can create new MOG projects from any numerical data; or explore an existing MOG project. MOG projects, with history of explorations, can be saved and shared. We illustrate MOG by case studies of large curated datasets from human cancer RNA-Seq, where we identify novel putative biomarker genes in different tumors, and microarray and metabolomics data from Arabidopsis thaliana. MOG executable and code: http://metnetweb.gdcb.iastate.edu/ and https://github.com/urmi-21/MetaOmGraph/.


Subjects
Big Data , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation/genetics , Software , Data Analysis , Data Interpretation, Statistical , Humans , Metadata/statistics & numerical data
16.
Clin Res Cardiol ; 109(7): 810-818, 2020 Jul.
Article in English | MEDLINE | ID: mdl-31686209

ABSTRACT

AIMS: We aimed to develop a structured study protocol utilizing the bibliographic web-application science performance evaluation (SciPE) to perform comprehensive scientometric analyses. METHODS AND RESULTS: Metadata related to publications derived from online databases were processed and visualized by transferring the information to an undirected multipartite graph and distinct partitioned sets of nodes. Also, institution-specific data were normalized and merged, allowing precise geocoordinate positioning to enable heatmapping and valid identification. As a result, verified, processed data regarding articles, institutions, journals, authors' gender, nations and subject categories can be obtained. We recommend including the total number of publications, citations, the population, research institutions, gross domestic product, and the country-specific modified Hirsch Index, and forming corresponding ratios (e.g., population/publication). Also, our approach includes implementation of bioinformatical methods such as heatmapping based on exact geocoordinates, simple chord diagrams, and the central implementation of specific ratios with plain visualization techniques. CONCLUSION: This protocol allows contemporaneous scientometric analyses based on bioinformatic and meta-analytical techniques to be conducted precisely, so that scientific efforts can be evaluated and contextualized. Data presentation with the depicted visualization techniques is mandatory for transparent and consistent analyses of research output across different nations and topics. Research performance can then be discussed in a synopsis of all findings.


Subjects
Bibliometrics , Biomedical Research/statistics & numerical data , International Cooperation , Metadata/statistics & numerical data , Humans
17.
Soc Stud Sci ; 49(5): 732-757, 2019 10.
Article in English | MEDLINE | ID: mdl-31354073

ABSTRACT

'Metadata' has received a fraction of the attention that 'data' has received in sociological studies of scientific research. A neglect of 'metadata' reduces the attention on a number of critical aspects of scientific work processes, including documentary work, accountability relations, and collaboration routines. Metadata processes and products are essential components of the work needed to practically accomplish day-to-day scientific research tasks, and are central to ensuring that research findings and products meet externally driven standards or requirements. This article is an attempt to open up the discussion on and conceptualization of metadata within the sociology of science and the sociology of data. It presents ethnographic research of metadata creation within everyday scientific practice, focusing on how researchers document, describe, annotate, organize and manage their data, both for their own use and the use of researchers outside of their project. In particular, this article argues that the role and significance of metadata within scientific research contexts are intimately tied to the nature of evidence and accountability within particular social situations. Studying metadata can (1) provide insight into the production of evidence, that is, how something we might call 'data' becomes able to serve an evidentiary role, and (2) provide a mechanism for revealing what people in research contexts are held accountable for, and what they achieve accountability with.


Subjects
Metadata/statistics & numerical data , Research Design
19.
PLoS One ; 14(6): e0218789, 2019.
Article in English | MEDLINE | ID: mdl-31233549

ABSTRACT

The aim of Jscatter is the processing of experimental data and physical models, with a focus on enabling users to develop or modify their own models and use them in experimental data evaluation. The basic structures dataArray and dataList contain matrix-like data of different sizes, including attributes to store corresponding metadata. The attributes are used in fit routines as parameters, allowing multidimensional attribute-dependent fitting. Several modules provide models mainly applied in neutron and X-ray scattering for small angle scattering (form factors and structure factors) and inelastic neutron scattering. The intention is to provide an environment with fit routines, data handling routines (based on NumPy arrays) and a model library so that the user can focus on user-written models for data analysis, with the benefit of convenient documentation of scientific data evaluation in a scripting environment.


Subjects
Data Interpretation, Statistical , Software , Algorithms , Dynamic Light Scattering/statistics & numerical data , Metadata/statistics & numerical data , Models, Statistical , Neutron Diffraction/statistics & numerical data , Scattering, Small Angle , X-Ray Diffraction/statistics & numerical data
20.
J Digit Imaging ; 32(5): 870-879, 2019 10.
Article in English | MEDLINE | ID: mdl-31201587

ABSTRACT

In the last decades, the number of medical imaging studies and the amount of associated metadata have been rapidly increasing. Although these studies are mostly used to support medical diagnosis and treatment, many recent initiatives call for their use in clinical research scenarios and also to improve the business practices of medical institutions. However, the continuous production of medical imaging studies, coupled with the tremendous amount of associated data, makes the real-time analysis of medical imaging repositories difficult using conventional tools and methodologies. Those archives contain not only the image data itself but also a wide range of valuable metadata describing all the stakeholders involved in the examination. The exploration of such technologies will increase the efficiency and quality of medical practice. In major centers, it represents a big data scenario where Business Intelligence (BI) and Data Analytics (DA) are rare and implemented through data warehousing approaches. This article proposes an Extract, Transform, Load (ETL) framework for medical imaging repositories that can feed a purpose-built BI application in real time. The solution was designed to provide the necessary environment for conducting research on top of live institutional repositories without requiring the creation of a data warehouse. It features an extensible dashboard with customizable charts and reports, with an intuitive web-based interface that enables the use of novel data mining techniques, namely, a variety of data cleansing tools, filters, and clustering functions. Therefore, the user is not required to master the programming skills commonly needed by data analysts and scientists, such as Python and R.


Subjects
Data Mining/methods , Data Warehousing/methods , Metadata/statistics & numerical data , Radiology Information Systems/organization & administration , Radiology Information Systems/statistics & numerical data , Data Mining/statistics & numerical data , Data Warehousing/statistics & numerical data , Humans