Results 1 - 5 of 5
1.
Gigascience ; 10(6)2021 06 29.
Article in English | MEDLINE | ID: mdl-34184051

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS) involving 1 million GWAS samples from dozens of population-based biobanks present a considerable computational challenge and are carried out by large scientific groups under great expenditure of time and personnel. Automating these processes requires highly efficient and scalable methods and software, but so far there is no workflow solution to easily process 1 million GWAS samples. RESULTS: Here we present BIGwas, a portable, fully automated quality control and association testing pipeline for large-scale binary and quantitative trait GWAS data provided by biobank resources. By using Nextflow workflow and Singularity software container technology, BIGwas performs resource-efficient and reproducible analyses on a local computer or any high-performance compute (HPC) system with just 1 command, with no need to manually install a software execution environment or various software packages. For a single-command GWAS analysis with 974,818 individuals and 92 million genetic markers, BIGwas takes ∼16 days on a small HPC system with only 7 compute nodes to perform a complete GWAS QC and association analysis protocol. Our dynamic parallelization approach enables shorter runtimes for large HPCs. CONCLUSIONS: Researchers without extensive bioinformatics knowledge and with few computer resources can use BIGwas to perform multi-cohort GWAS with 1 million GWAS samples and, if desired, use it to build their own (genome-wide) PheWAS resource. BIGwas is freely available for download from http://github.com/ikmb/gwas-qc and http://github.com/ikmb/gwas-assoc.


Subject(s)
Biological Specimen Banks , Genome-Wide Association Study , Genome , Humans , Phenotype , Polymorphism, Single Nucleotide , Quality Control , Software
2.
Methods Mol Biol ; 2212: 17-35, 2021.
Article in English | MEDLINE | ID: mdl-33733347

ABSTRACT

We present SNPInt-GPU, a software package providing several methods for statistical epistasis testing. SNPInt-GPU supports GPU acceleration using the Nvidia CUDA framework, but can also be used without GPU hardware. The software implements logistic regression (as in PLINK epistasis testing), BOOST, log-linear regression, mutual information (MI), and information gain (IG) for pairwise testing as well as mutual information and information gain for third-order tests. Optionally, r² scores for testing for linkage disequilibrium (LD) can be calculated on-the-fly. SNPInt-GPU is publicly available on GitHub. The software requires a Linux-based operating system and CUDA libraries. This chapter describes detailed installation and usage instructions as well as examples for basic preliminary quality control and analysis of results.
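
To illustrate the mutual information (MI) and information gain (IG) measures named in the abstract, the following is a minimal CPU-only sketch in plain Python. It is not SNPInt-GPU code: the function names are hypothetical, and the IG definition used here, IG = I(A,B;Y) - I(A;Y) - I(B;Y), is one common formulation in the epistasis literature that may differ from the one the chapter implements.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) combination
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def pairwise_information_gain(snp_a, snp_b, phenotype):
    """Assumed definition: IG = I(A,B;Y) - I(A;Y) - I(B;Y).

    A positive value means the SNP pair carries information about the
    phenotype beyond what the two SNPs provide individually.
    """
    pair = list(zip(snp_a, snp_b))  # treat the genotype pair as one variable
    return (mutual_information(pair, phenotype)
            - mutual_information(snp_a, phenotype)
            - mutual_information(snp_b, phenotype))


# Toy data (not from the chapter): the phenotype is the XOR of two binary
# markers, so neither marker is informative alone but the pair is.
snp_a = [0, 0, 1, 1]
snp_b = [0, 1, 0, 1]
phenotype = [0, 1, 1, 0]
print(pairwise_information_gain(snp_a, snp_b, phenotype))
```

In the toy data each marker alone has zero MI with the phenotype, while the pair determines it completely, which is the pure-interaction pattern that pairwise tests are meant to detect.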


Subject(s)
Algorithms , Data Curation/statistics & numerical data , Epistasis, Genetic , Software , Entropy , Humans , Linkage Disequilibrium , Logistic Models , Quality Control
3.
Commun Biol ; 4(1): 113, 2021 01 25.
Article in English | MEDLINE | ID: mdl-33495542

ABSTRACT

The Wartberg culture (WBC, 3500-2800 BCE) dates to the Late Neolithic period, a time of important demographic and cultural transformations in western Europe. We performed genome-wide analyses of 42 individuals who were interred in a WBC collective burial in Niedertiefenbach, Germany (3300-3200 cal. BCE). The results showed that the farming population of Niedertiefenbach carried a surprisingly large hunter-gatherer ancestry component (34-58%). This component was most likely introduced during the cultural transformation that led to the WBC. In addition, the Niedertiefenbach individuals exhibited a distinct human leukocyte antigen gene pool, possibly reflecting an immune response that was geared towards detecting viral infections.


Subject(s)
Agriculture , Feeding Behavior/physiology , HLA Antigens/genetics , Predatory Behavior/physiology , Animals , Archaeology , DNA, Ancient/analysis , Europe , Evolution, Molecular , Genetic Variation , Genetics, Population , Genome, Human , Genome-Wide Association Study , Germany , History, Ancient , Human Migration , Humans , Polymorphism, Single Nucleotide , Racial Groups/genetics , Residence Characteristics
4.
J Allergy Clin Immunol ; 145(4): 1208-1218, 2020 04.
Article in English | MEDLINE | ID: mdl-31707051

ABSTRACT

BACKGROUND: Fifteen percent of atopic dermatitis (AD) liability-scale heritability could be attributed to 31 susceptibility loci identified by using genome-wide association studies, with only 3 of them (IL13, IL-6 receptor [IL6R], and filaggrin [FLG]) resolved to protein-coding variants. OBJECTIVE: We examined whether a significant portion of unexplained AD heritability is further explained by low-frequency and rare variants in the gene-coding sequence. METHODS: We evaluated common, low-frequency, and rare protein-coding variants using exome chip and replication genotype data of 15,574 patients and 377,839 control subjects combined with whole-transcriptome data on lesional, nonlesional, and healthy skin samples of 27 patients and 38 control subjects. RESULTS: An additional 12.56% (SE, 0.74%) of AD heritability is explained by rare protein-coding variation. We identified docking protein 2 (DOK2) and CD200 receptor 1 (CD200R1) as novel genome-wide significant susceptibility genes. Rare coding variants associated with AD are further enriched in 5 genes (IL-4 receptor [IL4R], IL13, Janus kinase 1 [JAK1], JAK2, and tyrosine kinase 2 [TYK2]) of the IL13 pathway, all of which are targets for novel systemic AD therapeutics. Multiomics-based network and RNA sequencing analysis revealed DOK2 as a central hub interacting with, among others, CD200R1, IL6R, and signal transducer and activator of transcription 3 (STAT3). Multitissue gene expression profile analysis for 53 tissue types from the Genotype-Tissue Expression project showed that disease-associated protein-coding variants exert their greatest effect in skin tissues. CONCLUSION: Our discoveries highlight a major role of rare coding variants in AD acting independently of common variants. Further extensive functional studies are required to detect all potential causal variants and to specify the contribution of the novel susceptibility genes DOK2 and CD200R1 to overall disease susceptibility.


Subject(s)
Adaptor Proteins, Signal Transducing/genetics , Dermatitis, Atopic/genetics , Genotype , Orexin Receptors/genetics , Phosphoproteins/genetics , Skin/metabolism , Adult , Cohort Studies , Filaggrin Proteins , Gene Frequency , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Organ Specificity , Polymorphism, Genetic , Risk , Transcriptome
5.
Article in English | MEDLINE | ID: mdl-26451813

ABSTRACT

High-throughput genotyping technologies (such as SNP-arrays) allow the rapid collection of up to a few million genetic markers of an individual. Detecting epistasis (based on 2-SNP interactions) in Genome-Wide Association Studies is an important but time-consuming operation since statistical computations have to be performed for each pair of measured markers. Computational methods to detect epistasis therefore suffer from prohibitively long runtimes; e.g., processing a moderately sized dataset consisting of about 500,000 SNPs and 5,000 samples requires several days using state-of-the-art tools on a standard 3 GHz CPU. In this paper, we demonstrate how this task can be accelerated using a combination of fine-grained and coarse-grained parallelism on two different computing systems. The first architecture is based on reconfigurable hardware (FPGAs) while the second architecture uses multiple GPUs connected to the same host. We show that both systems can achieve speedups of around four orders of magnitude compared to the sequential implementation. This significantly reduces the runtimes for detecting epistasis to only a few minutes for moderately sized datasets and to a few hours for large-scale datasets.
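
The scale behind the quoted runtimes can be made concrete with a short calculation. The sketch below counts the SNP pairs for the dataset size given in the abstract (500,000 SNPs, 5,000 samples). The throughput figure is a hypothetical placeholder chosen only for illustration; it is not a measurement from the paper.

```python
from math import comb


def pair_count(num_snps: int) -> int:
    """Number of unordered SNP pairs, n * (n - 1) / 2."""
    return comb(num_snps, 2)


def estimated_days(num_pairs: int, pairs_per_second: float) -> float:
    """Runtime in days at a given (assumed) pair-testing throughput."""
    return num_pairs / pairs_per_second / 86_400


num_snps = 500_000   # dataset size stated in the abstract
num_samples = 5_000  # sample count stated in the abstract

pairs = pair_count(num_snps)
genotype_pair_reads = pairs * num_samples  # one genotype pair per sample per SNP pair

# Hypothetical sequential throughput, NOT taken from the paper.
HYPOTHETICAL_PAIRS_PER_SECOND = 1_000_000

print(pairs)                # 124999750000
print(genotype_pair_reads)  # 624998750000000
print(estimated_days(pairs, HYPOTHETICAL_PAIRS_PER_SECOND))
```

Because the pair count grows quadratically with the number of markers, doubling the SNP count roughly quadruples the work, which is why the abstract treats exhaustive pairwise testing as a target for massive parallelism.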


Subject(s)
Computer Graphics/instrumentation , DNA Mutational Analysis/instrumentation , Epistasis, Genetic/genetics , Genome-Wide Association Study/instrumentation , High-Throughput Nucleotide Sequencing/instrumentation , Polymorphism, Single Nucleotide/genetics , Chromosome Mapping/instrumentation , Chromosome Mapping/methods , Equipment Design , Equipment Failure Analysis , Genome-Wide Association Study/methods , Reproducibility of Results , Sensitivity and Specificity , Signal Processing, Computer-Assisted/instrumentation