Results 1 - 12 of 12
1.
Cell Genom ; 3(2): 100246, 2023 Feb 08.
Article in English | MEDLINE | ID: mdl-36819661

ABSTRACT

The Solve-RD project objectives include solving undiagnosed rare diseases (RD) through collaborative research on shared genome-phenome datasets. The RD-Connect Genome-Phenome Analysis Platform (GPAP), for data collation and analysis, and the European Genome-phenome Archive (EGA), for file storage, are two key components of the Solve-RD infrastructure. Clinical researchers can identify candidate genetic variants within the RD-Connect GPAP and, thanks to the developments presented here as part of joint ELIXIR activities, can remotely visualize the corresponding alignments stored at the EGA. The Global Alliance for Genomics and Health (GA4GH) htsget streaming application programming interface (API) is used to retrieve alignment slices, which are rendered by an Integrative Genomics Viewer (IGV) instance embedded in the GPAP. As a result, for over 11,000 datasets it is no longer necessary to download large alignment files in order to visualize them locally. This work highlights the advantages, from both the user and infrastructure perspectives, of implementing interoperability standards for establishing federated genomics data networks.
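The step from a viewer region to an htsget request can be sketched in Python. This is a minimal sketch, assuming the `/reads/<id>` endpoint and the `format`, `referenceName`, `start` and `end` query parameters of the htsget specification, which uses 0-based, half-open coordinates; the server URL, the read ID and the function name are hypothetical and are not taken from the GPAP or EGA code.

```python
from urllib.parse import urlencode


def htsget_reads_url(base_url: str, read_id: str, reference_name: str,
                     start_1based: int, end_1based: int,
                     fmt: str = "BAM") -> str:
    """Build an htsget reads ticket URL for a 1-based, closed genome interval.

    htsget expects 0-based, half-open coordinates, so a viewer region
    chr1:1000-2000 (1-based, both ends included) becomes start=999, end=2000.
    """
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    params = {
        "format": fmt,
        "referenceName": reference_name,
        "start": start_1based - 1,  # shift the inclusive start to 0-based
        "end": end_1based,          # a closed 1-based end equals the exclusive 0-based end
    }
    return f"{base_url.rstrip('/')}/reads/{read_id}?{urlencode(params)}"


# Hypothetical server and read ID, for illustration only.
print(htsget_reads_url("https://htsget.example.org/", "sample1", "chr1", 1000, 2000))
# https://htsget.example.org/reads/sample1?format=BAM&referenceName=chr1&start=999&end=2000
```

The server answers such a request with a JSON ticket listing the URLs of the data blocks, which the client then fetches and concatenates.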

2.
Bioinformatics ; 37(17): 2753-2754, 2021 Sep 09.
Article in English | MEDLINE | ID: mdl-33543751

ABSTRACT

MOTIVATION: The majority of genome analysis tools and pipelines require data to be decrypted for access. This potentially leaves sensitive genetic data exposed, either because the unencrypted data is not removed after analysis, or because the data leaves traces on the permanent storage medium. RESULTS: We defined a file container specification enabling direct byte-level compatible random access to encrypted genetic data stored in community-standard formats such as SAM/BAM/CRAM/VCF/BCF. By standardizing this format, we show how it can be added as a native file format to genomic libraries, enabling direct analysis of encrypted data without the need to create a decrypted copy. AVAILABILITY AND IMPLEMENTATION: The Crypt4GH specification can be found at: http://samtools.github.io/hts-specs/crypt4gh.pdf. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
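Random access works because the encrypted payload is split into fixed-size, independently encrypted segments, so a plaintext byte range maps to a computable encrypted byte range. The Python sketch below only does that offset arithmetic; it assumes the ChaCha20-Poly1305 data-segment layout of the Crypt4GH specification (64 KiB plaintext segments, each stored as a 12-byte nonce, the ciphertext and a 16-byte MAC), ignores the optional edit list, and uses a hypothetical function name. It does not decrypt anything.

```python
# Assumed Crypt4GH data-segment geometry (chacha20_ietf_poly1305):
# each segment holds up to 64 KiB of plaintext, stored as nonce || ciphertext || MAC.
SEGMENT_SIZE = 65536                                        # plaintext bytes per segment
NONCE_SIZE = 12                                             # ChaCha20-Poly1305 (IETF) nonce
MAC_SIZE = 16                                               # Poly1305 authentication tag
CIPHER_SEGMENT_SIZE = NONCE_SIZE + SEGMENT_SIZE + MAC_SIZE  # 65564 bytes


def encrypted_span(plain_start: int, plain_end: int) -> tuple[int, int, int]:
    """Map a plaintext byte range [plain_start, plain_end) to encrypted offsets.

    Returns (enc_start, enc_end, skip):
      enc_start, enc_end: offsets within the encrypted data section (after the
          header packets) covering every segment that overlaps the range; for
          the final segment of a file, enc_end may overshoot and should be
          clamped to the file size.
      skip: plaintext bytes to discard from the first decrypted segment.
    """
    if not 0 <= plain_start < plain_end:
        raise ValueError("expected 0 <= start < end")
    first_seg = plain_start // SEGMENT_SIZE
    last_seg = (plain_end - 1) // SEGMENT_SIZE
    enc_start = first_seg * CIPHER_SEGMENT_SIZE
    enc_end = (last_seg + 1) * CIPHER_SEGMENT_SIZE
    skip = plain_start - first_seg * SEGMENT_SIZE
    return enc_start, enc_end, skip


print(encrypted_span(100_000, 140_000))  # (65564, 196692, 34464)
```

Only segments 1 and 2 need to be fetched and decrypted for this range, which is what lets a genomic library seek inside an encrypted BAM or VCF without producing a decrypted copy of the whole file.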

3.
IEEE/ACM Trans Comput Biol Bioinform ; 16(4): 1324-1327, 2019.
Article in English | MEDLINE | ID: mdl-31095492

ABSTRACT

Sensitive genomic data should remain secure, whether on disk for storage, during analysis, or in transport. However, secure storage, delivery, and usage of genomic data are complicated by the size of files and the diversity of workflows. This paper presents solutions developed by GA4GH and EGA that use customized encryption, encrypted file formats, toolchain integration, and intelligent APIs to help solve this problem.


Subjects
Computer Security; Genomics/methods; Medical Informatics/methods; Access to Information; Algorithms; Cloud Computing; Databases, Factual; Europe; Human Genetics; Humans; Information Storage and Retrieval; United Kingdom; User-Computer Interface
4.
Bioinformatics ; 35(1): 119-121, 2019 Jan 01.
Article in English | MEDLINE | ID: mdl-29931085

ABSTRACT

Summary: Standardized interfaces for efficiently accessing high-throughput sequencing data are a fundamental requirement for large-scale genomic data sharing. We have developed htsget, a protocol for secure, efficient and reliable access to sequencing read and variation data. We demonstrate four independent client and server implementations, and the results of a comprehensive interoperability demonstration. Availability and implementation: http://samtools.github.io/hts-specs/htsget.html. Supplementary information: Supplementary data are available at Bioinformatics online.
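An htsget server does not return the data directly but a JSON "ticket": an object under the key `htsget` with a `format` and an ordered `urls` list, where each entry has a `url` and optionally `headers` and `class`, and small blocks may be inlined as `data:` URIs. The client concatenates the blocks in order. The offline Python sketch below handles only `data:` URIs; fetching `http(s)` blocks with their per-block headers is left out, and the function name is hypothetical.

```python
import base64
import json
from urllib.parse import unquote_to_bytes


def assemble_data_uris(ticket_json: str) -> bytes:
    """Concatenate the blocks of an htsget ticket, in order, into one byte string.

    Offline sketch: only data: URIs are decoded. A real client would also
    fetch http(s) URLs, sending any per-block "headers" given in the ticket.
    """
    ticket = json.loads(ticket_json)["htsget"]
    out = bytearray()
    for block in ticket["urls"]:
        url = block["url"]
        if not url.startswith("data:"):
            raise NotImplementedError("this sketch only decodes data: URIs")
        meta, _, payload = url.partition(",")
        if meta.endswith(";base64"):
            out += base64.b64decode(payload)
        else:
            out += unquote_to_bytes(payload)  # percent-encoded payload
    return bytes(out)


# A made-up two-block ticket; real tickets carry BAM/CRAM/VCF/BCF bytes.
ticket = json.dumps({"htsget": {"format": "BAM", "urls": [
    {"url": "data:;base64," + base64.b64encode(b"header").decode(), "class": "header"},
    {"url": "data:;base64," + base64.b64encode(b"body").decode(), "class": "body"},
]}})
print(assemble_data_uris(ticket))  # b'headerbody'
```

Keeping the ticket separate from the data is what allows different server implementations to point clients at block storage, pre-signed URLs or inline data interchangeably.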


Subjects
Computational Biology; Genomics; High-Throughput Nucleotide Sequencing; Software; Genome
5.
F1000Res ; 6, 2017.
Article in English | MEDLINE | ID: mdl-29123641

ABSTRACT

The availability of high-throughput molecular profiling techniques has provided more accurate and informative data for regular clinical studies. Nevertheless, complex computational workflows are required to interpret these data. Over the past years, the data volume has been growing explosively, requiring robust human data management to organise and integrate the data efficiently. For this reason, we set up an ELIXIR implementation study, together with the Translational research IT (TraIT) programme, to design a data ecosystem that is able to link raw and interpreted data. In this project, the data from the TraIT Cell Line Use Case (TraIT-CLUC) are used as a test case for this system. Within this ecosystem, we use the European Genome-phenome Archive (EGA) to store raw molecular profiling data; tranSMART to collect interpreted molecular profiling data and clinical data for corresponding samples; and Galaxy to store, run and manage the computational workflows. We can integrate these data by linking their repositories systematically. To showcase our design, we have structured the TraIT-CLUC data, which contain a variety of molecular profiling data types, for storage in both tranSMART and EGA. The metadata provided allows referencing between tranSMART and EGA, fulfilling the cycle of data submission and discovery; we have also designed a data flow from EGA to Galaxy, enabling reanalysis of the raw data in Galaxy. In this way, users can select patient cohorts in tranSMART, trace them back to the raw data and perform (re)analysis in Galaxy. Our conclusion is that the majority of metadata does not necessarily need to be stored (redundantly) in both databases, but that instead FAIR persistent identifiers should be available for well-defined data ontology levels: study, data access committee, physical sample, data sample and raw data file. This approach will pave the way for the stable linkage and reuse of data.
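The linkage idea, storing persistent identifiers rather than duplicated metadata, can be illustrated with a small Python data structure. This is a sketch under assumptions: the single-parent hierarchy (raw data file, data sample, physical sample, study, with the Data Access Committee attached to the study) is one plausible reading of the ontology levels listed above, and all identifiers and names are hypothetical rather than real EGA or tranSMART accessions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """A metadata entity at one ontology level, addressed by a persistent ID."""
    pid: str                      # persistent identifier (hypothetical values below)
    level: str                    # e.g. "study", "physical_sample", "raw_file"
    parent: Optional[str] = None  # PID of the enclosing entity, if any
    dac: Optional[str] = None     # PID of the Data Access Committee, if set at this level


def lineage(pid: str, registry: dict[str, Entity]) -> list[Entity]:
    """Walk parent PIDs from an entity (e.g. a raw file) up to its study."""
    chain: list[Entity] = []
    current: Optional[str] = pid
    while current is not None:
        entity = registry[current]
        chain.append(entity)
        current = entity.parent
    return chain


def access_committee(pid: str, registry: dict[str, Entity]) -> Optional[str]:
    """Return the first DAC PID found while walking up the lineage."""
    for entity in lineage(pid, registry):
        if entity.dac is not None:
            return entity.dac
    return None


# Hypothetical identifiers; a clinical record would only need to store "file-1".
registry = {e.pid: e for e in [
    Entity("study-1", "study", dac="dac-1"),
    Entity("psample-1", "physical_sample", parent="study-1"),
    Entity("dsample-1", "data_sample", parent="psample-1"),
    Entity("file-1", "raw_file", parent="dsample-1"),
]}
print([e.level for e in lineage("file-1", registry)])
# ['raw_file', 'data_sample', 'physical_sample', 'study']
print(access_committee("file-1", registry))  # dac-1
```

Because every level is resolvable from a single identifier, a repository that holds interpreted data needs to keep only that identifier to trace a cohort back to its raw files and the committee that governs access.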

6.
F1000Res ; 5, 2016.
Article in English | MEDLINE | ID: mdl-28232859

ABSTRACT

High-throughput molecular profiling techniques are routinely generating vast amounts of data for translational medicine studies. Secure, access-controlled systems are needed to manage, store, transfer and distribute these data because of their personally identifiable nature. The European Genome-phenome Archive (EGA) was created to facilitate access to, and management of, long-term archives of bio-molecular data. Each data provider is responsible for ensuring that a Data Access Committee is in place to grant access to data stored in the EGA. Moreover, data are encrypted in transfer during upload and download. ELIXIR, a European research infrastructure for life-science data, initiated a project (2016 Human Data Implementation Study) to understand and document the ELIXIR requirements for secure management of controlled-access data. As part of this project, a full ecosystem was designed to connect archived raw experimental molecular profiling data with interpreted data and the computational workflows, using the CTMM Translational Research IT (CTMM-TraIT) infrastructure http://www.ctmm-trait.nl as an example. Here we present the first outcome of this project, a framework that enables secure download of EGA data to a Galaxy server. Galaxy provides an intuitive user interface for molecular biologists and bioinformaticians to run and design data analysis workflows. More specifically, we developed a tool, ega_download_streamer, that can download data securely from EGA into a Galaxy server, where the data can then be processed further. This tool allows a user to run, within the browser, an entire analysis containing sensitive data from EGA, and to make this analysis available to other researchers in a reproducible manner, as shown with a proof-of-concept study. The tool ega_download_streamer is available in the Galaxy tool shed: https://toolshed.g2.bx.psu.edu/view/yhoogstrate/ega_download_streamer.

8.
Nucleic Acids Res ; 43(Database issue): D23-9, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25404130

ABSTRACT

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe's primary resource for nucleotide sequence information. With the growing volume and diversity of public sequencing data comes the need for increased sophistication in data organisation, presentation and search services so as to maximise its discoverability and usability. In response to this, ENA has been introducing and improving checklists for use during submission and expanding its search facilities to provide targeted search results. Here, we give a brief update on ENA content and some major developments undertaken in data submission services during 2014. We then describe in more detail the services we offer for data discovery and retrieval.


Subjects
Databases, Nucleic Acid; Base Sequence; Genomics; Molecular Sequence Annotation; Sequence Analysis
9.
Nucleic Acids Res ; 42(Database issue): D38-43, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24214989

ABSTRACT

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is a repository for the world public domain nucleotide sequence data output. ENA content covers a spectrum of data types including raw reads, assembly data and functional annotation. ENA has faced a dramatic growth in genome assembly submission rates, data volumes and complexity of datasets. This has prompted a broad reworking of assembly submission services, for which we now reach the end of a major programme of work and many enhancements have already been made available over the year to components of the submission service. In this article, we briefly review ENA content and growth over 2013, describe our rapidly developing services for genome assembly information and outline further major developments over the last year.


Subjects
Databases, Nucleic Acid; Genomics; Europe; Internet
10.
Nucleic Acids Res ; 41(Database issue): D30-5, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23203883

ABSTRACT

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/) collects, maintains and presents comprehensive nucleic acid sequence and related information as part of the permanent public scientific record. Here, we provide brief updates on ENA content developments and major service enhancements in 2012 and describe in more detail two important areas of development and policy that are driven by ongoing growth in sequencing technologies. First, we describe the ENA data warehouse, a resource for which we provide a programmatic entry point to integrated content across the breadth of ENA. Second, we detail our plans for the deployment of CRAM data compression technology in ENA.
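The core idea behind CRAM is reference-based compression: an aligned read is stored as its position plus the ways it differs from the reference, rather than as its full base sequence. The toy Python sketch below shows only that idea for ungapped alignments with substitutions. Real CRAM also encodes insertions, deletions, soft clips, quality scores and read names, and entropy-codes all of them; none of that is modelled here, and the function names are hypothetical.

```python
def encode_read(read: str, reference: str, pos: int) -> list[tuple[int, str]]:
    """Return (offset, base) pairs where an ungapped aligned read differs from the reference."""
    ref_slice = reference[pos:pos + len(read)]
    if len(ref_slice) != len(read):
        raise ValueError("read extends past the end of the reference")
    return [(i, base) for i, (base, ref) in enumerate(zip(read, ref_slice)) if base != ref]


def decode_read(diffs: list[tuple[int, str]], reference: str, pos: int, length: int) -> str:
    """Rebuild the read from the reference, its position, its length and the stored differences."""
    bases = list(reference[pos:pos + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTACGT"
diffs = encode_read("CGTTCGT", reference, 1)
print(diffs)                                # [(3, 'T')]
print(decode_read(diffs, reference, 1, 7))  # CGTTCGT
```

A 7-base read collapses to one substitution, which is why the gain grows with read depth and how closely the reads match the reference.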


Subjects
Base Sequence; Databases, Nucleic Acid; Data Compression; Genomics; High-Throughput Nucleotide Sequencing; Internet; User-Computer Interface
11.
Nucleic Acids Res ; 40(Database issue): D43-7, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22080548

ABSTRACT

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena), Europe's primary nucleotide sequence resource, captures and presents globally comprehensive nucleic acid sequence and associated information. Covering the spectrum from raw data to assembled and functionally annotated genomes, the ENA has witnessed a dramatic growth resulting from advances in sequencing technology and ever broadening application of the methodology. During 2011, we have continued to operate and extend the broad range of ENA services. In particular, we have released major new functionality in our interactive web submission system, Webin, through developments in template-based submissions for annotated sequences and support for raw next-generation sequence read submissions.


Subjects
Databases, Nucleic Acid; Sequence Analysis, DNA; Sequence Analysis, RNA; Genomics; High-Throughput Nucleotide Sequencing; Internet; Molecular Sequence Annotation; Software; User-Computer Interface
12.
Bioinformatics ; 25(22): 2945-54, 2009 Nov 15.
Article in English | MEDLINE | ID: mdl-19720676

ABSTRACT

MOTIVATION: The sequencing of whole genomes from various species has provided us with a wealth of genetic information. Making use of the vast amounts of data available today requires computer-based analysis techniques. RESULTS: We propose a Hidden Markov Model (HMM)-based algorithm to detect groups of genes functionally similar to a set of input genes from microarray expression data. A subset of experiments from a microarray is selected based on a set of related input genes. HMMs are trained from the input genes and from a group of random gene input sets to provide significance estimates. Every gene in the microarray is scored using all HMMs, and significant matches with the input genes are retained. We ran this algorithm on the Drosophila life-cycle microarray data set, with KEGG pathways for cell cycle and translation factors as input data sets. Results show high functional similarity in the resulting gene sets, increasing our biological insight into gene pathways and KEGG annotations. The algorithm performed very well compared to the Signature Algorithm and a purely correlation-based approach. AVAILABILITY: Java source code and data sets are available at http://www.ittc.ku.edu/~xwchen/software.htm
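Scoring a gene against a trained HMM amounts to computing the likelihood of its expression profile under that model. A minimal Python sketch of the scoring step, assuming discrete observation symbols (for example, a discretized expression profile) and already-trained parameters: the forward algorithm in log space, plus an empirical p-value comparing a gene's score under the input-gene HMM with its scores under the random-set HMMs. Training (for example Baum-Welch), the discretization and the exact significance rule used in the paper are not shown, and all function names are hypothetical.

```python
import math


def _log(p: float) -> float:
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def logsumexp(values: list[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit) -> float:
    """Log P(obs | HMM) via the forward algorithm.

    obs:   sequence of observation symbol indices
    start: start[i] = P(first state = i)
    trans: trans[i][j] = P(next state = j | current state = i)
    emit:  emit[i][k] = P(symbol k | state i)
    """
    n = len(start)
    alpha = [_log(start[i]) + _log(emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + _log(trans[i][j]) for i in range(n)]) + _log(emit[j][o])
            for j in range(n)
        ]
    return logsumexp(alpha)


def empirical_p_value(score: float, null_scores: list[float]) -> float:
    """Share of random-set scores at least as high as the observed one (add-one smoothed)."""
    return (1 + sum(s >= score for s in null_scores)) / (1 + len(null_scores))


# Two-state model that strictly alternates states, each state emitting one symbol.
start, trans, emit = [1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]
print(forward_log_likelihood([0, 1, 0], start, trans, emit))  # 0.0, i.e. probability 1
```

Working in log space avoids the numerical underflow that plain products of many probabilities would cause on long expression profiles.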


Subjects
Computational Biology/methods; Gene Expression Profiling/methods; Markov Chains; Algorithms; Animals; Drosophila/genetics; Oligonucleotide Array Sequence Analysis; Sequence Analysis, DNA