Search | VHL Regional Portal

1.

Genotyping of SNPs in bread wheat at reduced cost from pooled experiments and imputation.

Clouard, Camille; Nettelblad, Carl.

Theor Appl Genet ; 137(1): 26, 2024 Jan 19.

Article in English | MEDLINE | ID: mdl-38243086

ABSTRACT

KEY MESSAGE: Pooling and imputation are computational methods that can be combined for achieving cost-effective and accurate high-density genotyping of both common and rare variants, as demonstrated in a MAGIC wheat population. The plant breeding industry has shown growing interest in using the genotype data of relevant markers for performing selection of new competitive varieties. The selection usually benefits from large amounts of marker data, and it is therefore crucial to dispose of data collection methods that are both cost-effective and reliable. Computational methods such as genotype imputation have been proposed earlier in several plant science studies for addressing the cost challenge. Genotype imputation methods have though been used more frequently and investigated more extensively in human genetics research. The various algorithms that exist have shown lower accuracy at inferring the genotype of genetic variants occurring at low frequency, while these rare variants can have great significance and impact in the genetic studies that underlie selection. In contrast, pooling is a technique that can efficiently identify low-frequency items in a population, and it has been successfully used for detecting the samples that carry rare variants in a population. In this study, we propose to combine pooling and imputation and demonstrate this by simulating a hypothetical microarray for genotyping a population of recombinant inbred lines in a cost-effective and accurate manner, even for rare variants. We show that with an adequate imputation model, it is feasible to accurately predict the individual genotypes at lower cost than sample-wise genotyping and time-effectively. Moreover, we provide code resources for reproducing the results presented in this study in the form of a containerized workflow.

Subject(s)

Polymorphism, Single Nucleotide , Triticum , Humans , Genotype , Triticum/genetics , Bread , Plant Breeding , Genotyping Techniques/methods

2.

Observation of a single protein by ultrafast X-ray diffraction.

Ekeberg, Tomas; Assalauova, Dameli; Bielecki, Johan; Boll, Rebecca; Daurer, Benedikt J; Eichacker, Lutz A; Franken, Linda E; Galli, Davide E; Gelisio, Luca; Gumprecht, Lars; Gunn, Laura H; Hajdu, Janos; Hartmann, Robert; Hasse, Dirk; Ignatenko, Alexandr; Koliyadu, Jayanath; Kulyk, Olena; Kurta, Ruslan; Kuster, Markus; Lugmayr, Wolfgang; Lübke, Jannik; Mancuso, Adrian P; Mazza, Tommaso; Nettelblad, Carl; Ovcharenko, Yevheniy; Rivas, Daniel E; Rose, Max; Samanta, Amit K; Schmidt, Philipp; Sobolev, Egor; Timneanu, Nicusor; Usenko, Sergey; Westphal, Daniel; Wollweber, Tamme; Worbs, Lena; Xavier, Paul Lourdu; Yousef, Hazem; Ayyer, Kartik; Chapman, Henry N; Sellberg, Jonas A; Seuring, Carolin; Vartanyants, Ivan A; Küpper, Jochen; Meyer, Michael; Maia, Filipe R N C.

Light Sci Appl ; 13(1): 15, 2024 Jan 12.

Article in English | MEDLINE | ID: mdl-38216563

ABSTRACT

The idea of using ultrashort X-ray pulses to obtain images of single proteins frozen in time has fascinated and inspired many. It was one of the arguments for building X-ray free-electron lasers. According to theory, the extremely intense pulses provide sufficient signal to dispense with using crystals as an amplifier, and the ultrashort pulse duration permits capturing the diffraction data before the sample inevitably explodes. This was first demonstrated on biological samples a decade ago on the giant mimivirus. Since then, a large collaboration has been pushing the limit of the smallest sample that can be imaged. The ability to capture snapshots on the timescale of atomic vibrations, while keeping the sample at room temperature, may allow probing the entire conformational phase space of macromolecules. Here we show the first observation of an X-ray diffraction pattern from a single protein, that of Escherichia coli GroEL which at 14 nm in diameter is the smallest biological sample ever imaged by X-rays, and demonstrate that the concept of diffraction before destruction extends to single proteins. From the pattern, it is possible to determine the approximate orientation of the protein. Our experiment demonstrates the feasibility of ultrafast imaging of single proteins, opening the way to single-molecule time-resolved studies on the femtosecond timescale.

3.

Achieving improved accuracy for imputation of ancient DNA.

Ausmees, Kristiina; Nettelblad, Carl.

Bioinformatics ; 39(1)2023 01 01.

Article in English | MEDLINE | ID: mdl-36377787

ABSTRACT

MOTIVATION: Genotype imputation has the potential to increase the amount of information that can be gained from the often limited biological material available in ancient samples. As many widely used tools have been developed with modern data in mind, their design is not necessarily reflective of the requirements in studies of ancient DNA. Here, we investigate if an imputation method based on the full probabilistic Li and Stephens model of haplotype frequencies might be beneficial for the particular challenges posed by ancient data. RESULTS: We present an implementation called prophaser and compare imputation performance to two alternative pipelines that have been used in the ancient DNA community based on the Beagle software. Considering empirical ancient data downsampled to lower coverages as well as present-day samples with artificially thinned genotypes, we show that the proposed method is advantageous at lower coverages, where it yields improved accuracy and ability to capture rare variation. The software prophaser is optimized for running in a massively parallel manner and achieved reasonable runtimes on the experiments performed when executed on a GPU. AVAILABILITY AND IMPLEMENTATION: The C++ code for prophaser is available in the GitHub repository https://github.com/scicompuu/prophaser. SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.

Subject(s)

DNA, Ancient , Software , Animals , Dogs , Humans , Genotype , Haplotypes , Ethnicity

4.

A joint use of pooling and imputation for genotyping SNPs.

Clouard, Camille; Ausmees, Kristiina; Nettelblad, Carl.

BMC Bioinformatics ; 23(1): 421, 2022 Oct 13.

Article in English | MEDLINE | ID: mdl-36229780

ABSTRACT

BACKGROUND: Despite continuing technological advances, the cost for large-scale genotyping of a high number of samples can be prohibitive. The purpose of this study is to design a cost-saving strategy for SNP genotyping. We suggest making use of pooling, a group testing technique, to drop the amount of SNP arrays needed. We believe that this will be of the greatest importance for non-model organisms with more limited resources in terms of cost-efficient large-scale chips and high-quality reference genomes, such as application in wildlife monitoring, plant and animal breeding, but it is in essence species-agnostic. The proposed approach consists in grouping and mixing individual DNA samples into pools before testing these pools on bead-chips, such that the number of pools is less than the number of individual samples. We present a statistical estimation algorithm, based on the pooling outcomes, for inferring marker-wise the most likely genotype of every sample in each pool. Finally, we input these estimated genotypes into existing imputation algorithms. We compare the imputation performance from pooled data with the Beagle algorithm, and a local likelihood-aware phasing algorithm closely modeled on MaCH that we implemented. RESULTS: We conduct simulations based on human data from the 1000 Genomes Project, to aid comparison with other imputation studies. Based on the simulated data, we find that pooling impacts the genotype frequencies of the directly identifiable markers, without imputation. We also demonstrate how a combinatorial estimation of the genotype probabilities from the pooling design can improve the prediction performance of imputation models. Our algorithm achieves 93% concordance in predicting unassayed markers from pooled data, thus it outperforms the Beagle imputation model which reaches 80% concordance. We observe that the pooling design gives higher concordance for the rare variants than traditional low-density to high-density imputation commonly used for cost-effective genotyping of large cohorts. CONCLUSIONS: We present promising results for combining a pooling scheme for SNP genotyping with computational genotype imputation on human data. These results could find potential applications in any context where the genotyping costs form a limiting factor on the study size, such as in marker-assisted selection in plant breeding.

Subject(s)

Genome , Polymorphism, Single Nucleotide , Algorithms , Animals , Dogs , Genotype , Genotyping Techniques/methods , Humans

5.

An empirical evaluation of genotype imputation of ancient DNA.

Ausmees, Kristiina; Sanchez-Quinto, Federico; Jakobsson, Mattias; Nettelblad, Carl.

G3 (Bethesda) ; 12(6)2022 05 30.

Article in English | MEDLINE | ID: mdl-35482488

ABSTRACT

With capabilities of sequencing ancient DNA to high coverage often limited by sample quality or cost, imputation of missing genotypes presents a possibility to increase the power of inference as well as cost-effectiveness for the analysis of ancient data. However, the high degree of uncertainty often associated with ancient DNA poses several methodological challenges, and performance of imputation methods in this context has not been fully explored. To gain further insights, we performed a systematic evaluation of imputation of ancient data using Beagle v4.0 and reference data from phase 3 of the 1000 Genomes project, investigating the effects of coverage, phased reference, and study sample size. Making use of five ancient individuals with high-coverage data available, we evaluated imputed data for accuracy, reference bias, and genetic affinities as captured by principal component analysis. We obtained genotype concordance levels of over 99% for data with 1× coverage, and similar levels of accuracy and reference bias at levels as low as 0.75×. Our findings suggest that using imputed data can be a realistic option for various population genetic analyses even for data in coverage ranges below 1×. We also show that a large and varied phased reference panel as well as the inclusion of low- to moderate-coverage ancient individuals in the study sample can increase imputation performance, particularly for rare alleles. In-depth analysis of imputed data with respect to genetic variants and allele frequencies gave further insight into the nature of errors arising during imputation, and can provide practical guidelines for postprocessing and validation prior to downstream analysis.

Subject(s)

DNA, Ancient , Genotype , Alleles , Gene Frequency , Humans , Software

6.

A deep learning framework for characterization of genotype data.

Ausmees, Kristiina; Nettelblad, Carl.

G3 (Bethesda) ; 12(3)2022 03 04.

Article in English | MEDLINE | ID: mdl-35078229

ABSTRACT

Dimensionality reduction is a data transformation technique widely used in various fields of genomics research. The application of dimensionality reduction to genotype data is known to capture genetic similarity between individuals, and is used for visualization of genetic variation, identification of population structure as well as ancestry mapping. Among frequently used methods are principal component analysis, which is a linear transform that often misses more fine-scale structures, and neighbor-graph based methods which focus on local relationships rather than large-scale patterns. Deep learning models are a type of nonlinear machine learning method in which the features used in data transformation are decided by the model in a data-driven manner, rather than by the researcher, and have been shown to present a promising alternative to traditional statistical methods for various applications in omics research. In this study, we propose a deep learning model based on a convolutional autoencoder architecture for dimensionality reduction of genotype data. Using a highly diverse cohort of human samples, we demonstrate that the model can identify population clusters and provide richer visual information in comparison to principal component analysis, while preserving global geometry to a higher extent than t-SNE and UMAP, yielding results that are comparable to an alternative deep learning approach based on variational autoencoders. We also discuss the use of the methodology for more general characterization of genotype data, showing that it preserves spatial properties in the form of decay of linkage disequilibrium with distance along the genome and demonstrating its use as a genetic clustering method, comparing results to the ADMIXTURE software frequently used in population genetic studies.

Subject(s)

Computer Simulation , Deep Learning , Cluster Analysis , Genome, Human/genetics , Genotype , Humans , Software

7.

Training algorithm matters for the performance of neural network potential: A case study of Adam and the Kalman filter optimizers.

Shao, Yunqi; Dietrich, Florian M; Nettelblad, Carl; Zhang, Chao.

J Chem Phys ; 155(20): 204108, 2021 Nov 28.

Article in English | MEDLINE | ID: mdl-34852491

ABSTRACT

One hidden yet important issue for developing neural network potentials (NNPs) is the choice of training algorithm. In this article, we compare the performance of two popular training algorithms, the adaptive moment estimation algorithm (Adam) and the extended Kalman filter algorithm (EKF), using the Behler-Parrinello neural network and two publicly accessible datasets of liquid water [Morawietz et al., Proc. Natl. Acad. Sci. U. S. A. 113, 8368-8373, (2016) and Cheng et al., Proc. Natl. Acad. Sci. U. S. A. 116, 1110-1115, (2019)]. This is achieved by implementing EKF in TensorFlow. It is found that NNPs trained with EKF are more transferable and less sensitive to the value of the learning rate, as compared to Adam. In both cases, error metrics of the validation set do not always serve as a good indicator for the actual performance of NNPs. Instead, we show that their performance correlates well with a Fisher information based similarity measure.

8.

Diffraction data from aerosolized Coliphage PR772 virus particles imaged with the Linac Coherent Light Source.

Li, Haoyuan; Nazari, Reza; Abbey, Brian; Alvarez, Roberto; Aquila, Andrew; Ayyer, Kartik; Barty, Anton; Berntsen, Peter; Bielecki, Johan; Pietrini, Alberto; Bucher, Maximilian; Carini, Gabriella; Chapman, Henry N; Contreras, Alice; Daurer, Benedikt J; DeMirci, Hasan; Fluckiger, Leonie; Frank, Matthias; Hajdu, Janos; Hantke, Max F; Hogue, Brenda G; Hosseinizadeh, Ahmad; Hunter, Mark S; Jönsson, H Olof; Kirian, Richard A; Kurta, Ruslan P; Loh, Duane; Maia, Filipe R N C; Mancuso, Adrian P; Morgan, Andrew J; McFadden, Matthew; Muehlig, Kerstin; Munke, Anna; Reddy, Hemanth Kumar Narayana; Nettelblad, Carl; Ourmazd, Abbas; Rose, Max; Schwander, Peter; Marvin Seibert, M; Sellberg, Jonas A; Sierra, Raymond G; Sun, Zhibin; Svenda, Martin; Vartanyants, Ivan A; Walter, Peter; Westphal, Daniel; Williams, Garth; Xavier, P Lourdu; Yoon, Chun Hong; Zaare, Sahba.

Sci Data ; 7(1): 404, 2020 11 19.

Article in English | MEDLINE | ID: mdl-33214568

ABSTRACT

Single Particle Imaging (SPI) with intense coherent X-ray pulses from X-ray free-electron lasers (XFELs) has the potential to produce molecular structures without the need for crystallization or freezing. Here we present a dataset of 285,944 diffraction patterns from aerosolized Coliphage PR772 virus particles injected into the femtosecond X-ray pulses of the Linac Coherent Light Source (LCLS). Additional exposures with background information are also deposited. The diffraction data were collected at the Atomic, Molecular and Optical Science Instrument (AMO) of the LCLS in 4 experimental beam times during a period of four years. The photon energy was either 1.2 or 1.7 keV and the pulse energy was between 2 and 4 mJ in a focal spot of about 1.3 µm x 1.7 µm full width at half maximum (FWHM). The X-ray laser pulses captured the particles in random orientations. The data offer insight into aerosolised virus particles in the gas phase, contain information relevant to improving experimental parameters, and provide a basis for developing algorithms for image analysis and reconstruction.

Subject(s)

Coliphages , Lasers , Particle Accelerators , Virion , X-Ray Diffraction

9.

Flash X-ray diffraction imaging in 3D: a proposed analysis pipeline.

Liu, Jing; Engblom, Stefan; Nettelblad, Carl.

J Opt Soc Am A Opt Image Sci Vis ; 37(10): 1673-1686, 2020 Oct 01.

Article in English | MEDLINE | ID: mdl-33104615

ABSTRACT

Modern Flash X-ray diffraction Imaging (FXI) acquires diffraction signals from single biomolecules at a high repetition rate from X-ray Free Electron Lasers (XFELs), easily obtaining millions of 2D diffraction patterns from a single experiment. Due to the stochastic nature of FXI experiments and the massive volumes of data, retrieving 3D electron densities from raw 2D diffraction patterns is a challenging and time-consuming task. We propose a semi-automatic data analysis pipeline for FXI experiments, which includes four steps: hit-finding and preliminary filtering, pattern classification, 3D Fourier reconstruction, and post-analysis. We also include a recently developed bootstrap methodology in the post-analysis step for uncertainty analysis and quality control. To achieve the best possible resolution, we further suggest using background subtraction, signal windowing, and convex optimization techniques when retrieving the Fourier phases in the post-analysis step. As an application example, we quantified the 3D electron structure of the PR772 virus using the proposed data analysis pipeline. The retrieved structure was above the detector edge resolution and clearly showed the pseudo-icosahedral capsid of the PR772.

10.

The presence and impact of reference bias on population genomic studies of prehistoric human populations.

Günther, Torsten; Nettelblad, Carl.

PLoS Genet ; 15(7): e1008302, 2019 07.

Article in English | MEDLINE | ID: mdl-31348818

ABSTRACT

Haploid high quality reference genomes are an important resource in genomic research projects. A consequence is that DNA fragments carrying the reference allele will be more likely to map successfully, or receive higher quality scores. This reference bias can have effects on downstream population genomic analysis when heterozygous sites are falsely considered homozygous for the reference allele. In palaeogenomic studies of human populations, mapping against the human reference genome is used to identify endogenous human sequences. Ancient DNA studies usually operate with low sequencing coverages and fragmentation of DNA molecules causes a large proportion of the sequenced fragments to be shorter than 50 bp-reducing the amount of accepted mismatches, and increasing the probability of multiple matching sites in the genome. These ancient DNA specific properties are potentially exacerbating the impact of reference bias on downstream analyses, especially since most studies of ancient human populations use pseudo-haploid data, i.e. they randomly sample only one sequencing read per site. We show that reference bias is pervasive in published ancient DNA sequence data of prehistoric humans with some differences between individual genomic regions. We illustrate that the strength of reference bias is negatively correlated with fragment length. Most genomic regions we investigated show little to no mapping bias but even a small proportion of sites with bias can impact analyses of those particular loci or slightly skew genome-wide estimates. Therefore, reference bias has the potential to cause minor but significant differences in the results of downstream analyses such as population allele sharing, heterozygosity estimates and estimates of archaic ancestry. These spurious results highlight how important it is to be aware of these technical artifacts and that we need strategies to mitigate the effect. Therefore, we suggest some post-mapping filtering strategies to resolve reference bias which help to reduce its impact substantially.

Subject(s)

DNA, Ancient/analysis , Hominidae/genetics , Metagenomics/methods , Animals , Bias , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, DNA/methods , Software

11.

Erratum: Experimental strategies for imaging bioparticles with femtosecond hard X-ray pulses. Corrigendum.

Daurer, Benedikt J; Okamoto, Kenta; Bielecki, Johan; Maia, Filipe R N C; Mühlig, Kerstin; Seibert, M Marvin; Hantke, Max F; Nettelblad, Carl; Benner, W Henry; Svenda, Martin; Tîmneanu, Nicusor; Ekeberg, Tomas; Loh, N Duane; Pietrini, Alberto; Zani, Alessandro; Rath, Asawari D; Westphal, Daniel; Kirian, Richard A; Awel, Salah; Wiedorn, Max O; van der Schot, Gijs; Carlsson, Gunilla H; Hasse, Dirk; Sellberg, Jonas A; Barty, Anton; Andreasson, Jakob; Boutet, Sébastien; Williams, Garth; Koglin, Jason; Andersson, Inger; Hajdu, Janos; Larsson, Daniel S D.

IUCrJ ; 6(Pt 3): 500, 2019 05 01.

Article in English | MEDLINE | ID: mdl-31098030

ABSTRACT

[This corrects the article DOI: 10.1107/S2052252517003591.].

12.

Electrospray sample injection for single-particle imaging with x-ray lasers.

Bielecki, Johan; Hantke, Max F; Daurer, Benedikt J; Reddy, Hemanth K N; Hasse, Dirk; Larsson, Daniel S D; Gunn, Laura H; Svenda, Martin; Munke, Anna; Sellberg, Jonas A; Flueckiger, Leonie; Pietrini, Alberto; Nettelblad, Carl; Lundholm, Ida; Carlsson, Gunilla; Okamoto, Kenta; Timneanu, Nicusor; Westphal, Daniel; Kulyk, Olena; Higashiura, Akifumi; van der Schot, Gijs; Loh, Ne-Te Duane; Wysong, Taylor E; Bostedt, Christoph; Gorkhover, Tais; Iwan, Bianca; Seibert, M Marvin; Osipov, Timur; Walter, Peter; Hart, Philip; Bucher, Maximilian; Ulmer, Anatoli; Ray, Dipanwita; Carini, Gabriella; Ferguson, Ken R; Andersson, Inger; Andreasson, Jakob; Hajdu, Janos; Maia, Filipe R N C.

Sci Adv ; 5(5): eaav8801, 2019 05.

Article in English | MEDLINE | ID: mdl-31058226

ABSTRACT

The possibility of imaging single proteins constitutes an exciting challenge for x-ray lasers. Despite encouraging results on large particles, imaging small particles has proven to be difficult for two reasons: not quite high enough pulse intensity from currently available x-ray lasers and, as we demonstrate here, contamination of the aerosolized molecules by nonvolatile contaminants in the solution. The amount of contamination on the sample depends on the initial droplet size during aerosolization. Here, we show that, with our electrospray injector, we can decrease the size of aerosol droplets and demonstrate virtually contaminant-free sample delivery of organelles, small virions, and proteins. The results presented here, together with the increased performance of next-generation x-ray lasers, constitute an important stepping stone toward the ultimate goal of protein structure determination from imaging at room temperature and high temporal resolution.

13.

Using convex optimization of autocorrelation with constrained support and windowing for improved phase retrieval accuracy.

Pietrini, Alberto; Nettelblad, Carl.

Opt Express ; 26(19): 24422-24443, 2018 Sep 17.

Article in English | MEDLINE | ID: mdl-30469561

ABSTRACT

In imaging modalities recording diffraction data, such as the imaging of viruses at X-ray free electron laser facilities, the original image can be reconstructed assuming known phases. When phases are unknown, oversampling and a constraint on the support region in the original object can be used to solve a non-convex optimization problem using iterative alternating-projection methods. Such schemes are ill-suited for finding the optimum solution for sparse data, since the recorded pattern does not correspond exactly to the original wave function. Different iteration starting points can give rise to different solutions. We construct a convex optimization problem, where the only local optimum is also the global optimum. This is achieved using a modified support constraint and a maximum-likelihood treatment of the recorded data as a sample from the underlying wave function. This relaxed problem is solved in order to provide a new set of most probable "healed" signal intensities, without sparseness and missing data. For these new intensities, it should be possible to satisfy the support constraint and intensity constraint exactly, without conflicts between them. By making both constraints satisfiable, traditional phase retrieval with superior results is made possible. On simulated data, we demonstrate the benefits of our approach visually, and quantify the improvement in terms of the crystallographic R factor for the recovered scalar amplitudes relative to true simulations from .405 to .097, as well as the mean-squared error in the reconstructed image from .233 to .139. We also compare our approach, with regards to theory and simulation results, to other approaches for healing as well as noise-tolerant phase retrieval. These tests indicate that the COACS pre-processing allows for best-in-class results.

14.

Considerations for three-dimensional image reconstruction from experimental data in coherent diffractive imaging.

Lundholm, Ida V; Sellberg, Jonas A; Ekeberg, Tomas; Hantke, Max F; Okamoto, Kenta; van der Schot, Gijs; Andreasson, Jakob; Barty, Anton; Bielecki, Johan; Bruza, Petr; Bucher, Max; Carron, Sebastian; Daurer, Benedikt J; Ferguson, Ken; Hasse, Dirk; Krzywinski, Jacek; Larsson, Daniel S D; Morgan, Andrew; Mühlig, Kerstin; Müller, Maria; Nettelblad, Carl; Pietrini, Alberto; Reddy, Hemanth K N; Rupp, Daniela; Sauppe, Mario; Seibert, Marvin; Svenda, Martin; Swiggers, Michelle; Timneanu, Nicusor; Ulmer, Anatoli; Westphal, Daniel; Williams, Garth; Zani, Alessandro; Faigel, Gyula; Chapman, Henry N; Möller, Thomas; Bostedt, Christoph; Hajdu, Janos; Gorkhover, Tais; Maia, Filipe R N C.

IUCrJ ; 5(Pt 5): 531-541, 2018 Sep 01.

Article in English | MEDLINE | ID: mdl-30224956

ABSTRACT

Diffraction before destruction using X-ray free-electron lasers (XFELs) has the potential to determine radiation-damage-free structures without the need for crystallization. This article presents the three-dimensional reconstruction of the Melbournevirus from single-particle X-ray diffraction patterns collected at the LINAC Coherent Light Source (LCLS) as well as reconstructions from simulated data exploring the consequences of different kinds of experimental sources of noise. The reconstruction from experimental data suffers from a strong artifact in the center of the particle. This could be reproduced with simulated data by adding experimental background to the diffraction patterns. In those simulations, the relative density of the artifact increases linearly with background strength. This suggests that the artifact originates from the Fourier transform of the relatively flat background, concentrating all power in a central feature of limited extent. We support these findings by significantly reducing the artifact through background removal before the phase-retrieval step. Large amounts of blurring in the diffraction patterns were also found to introduce diffuse artifacts, which could easily be mistaken as biologically relevant features. Other sources of noise such as sample heterogeneity and variation of pulse energy did not significantly degrade the quality of the reconstructions. Larger data volumes, made possible by the recent inauguration of high repetition-rate XFELs, allow for increased signal-to-background ratio and provide a way to minimize these artifacts. The anticipated development of three-dimensional Fourier-volume-assembly algorithms which are background aware is an alternative and complementary solution, which maximizes the use of data.

15.

Assessing uncertainties in x-ray single-particle three-dimensional reconstruction.

Liu, Jing; Engblom, Stefan; Nettelblad, Carl.

Phys Rev E ; 98(1-1): 013303, 2018 Jul.

Article in English | MEDLINE | ID: mdl-30110794

ABSTRACT

Modern technology for producing extremely bright and coherent x-ray laser pulses provides the possibility to acquire a large number of diffraction patterns from individual biological nanoparticles, including proteins, viruses, and DNA. These two-dimensional diffraction patterns can be practically reconstructed and retrieved down to a resolution of a few angstroms. In principle, a sufficiently large collection of diffraction patterns will contain the required information for a full three-dimensional reconstruction of the biomolecule. The computational methodology for this reconstruction task is still under development and highly resolved reconstructions have not yet been produced. We analyze the expansion-maximization-compression scheme, the current state of the art approach for this very challenging application, by isolating different sources of resolution-limiting factors. Through numerical experiments on synthetic data we evaluate their respective impact. We reach conclusions of relevance for handling actual experimental data, and we also point out certain improvements to the underlying estimation algorithm. We also introduce a practically applicable computational methodology in the form of bootstrap procedures for assessing reconstruction uncertainty in the real data case. We evaluate the sharpness of this approach and argue that this type of procedure will be critical in the near future when handling the increasing amount of data.

16.

BAMSI: a multi-cloud service for scalable distributed filtering of massive genome data.

Ausmees, Kristiina; John, Aji; Toor, Salman Z; Hellander, Andreas; Nettelblad, Carl.

BMC Bioinformatics ; 19(1): 240, 2018 06 26.

Article in English | MEDLINE | ID: mdl-29940842

ABSTRACT

BACKGROUND: The advent of next-generation sequencing (NGS) has made whole-genome sequencing of cohorts of individuals a reality. Primary datasets of raw or aligned reads of this sort can get very large. For scientific questions where curated called variants are not sufficient, the sheer size of the datasets makes analysis prohibitively expensive. In order to make re-analysis of such data feasible without the need to have access to a large-scale computing facility, we have developed a highly scalable, storage-agnostic framework, an associated API and an easy-to-use web user interface to execute custom filters on large genomic datasets. RESULTS: We present BAMSI, a Software as-a Service (SaaS) solution for filtering of the 1000 Genomes phase 3 set of aligned reads, with the possibility of extension and customization to other sets of files. Unique to our solution is the capability of simultaneously utilizing many different mirrors of the data to increase the speed of the analysis. In particular, if the data is available in private or public clouds - an increasingly common scenario for both academic and commercial cloud providers - our framework allows for seamless deployment of filtering workers close to data. We show results indicating that such a setup improves the horizontal scalability of the system, and present a possible use case of the framework by performing an analysis of structural variation in the 1000 Genomes data set. CONCLUSIONS: BAMSI constitutes a framework for efficient filtering of large genomic data sets that is flexible in the use of compute as well as storage resources. The data resulting from the filter is assumed to be greatly reduced in size, and can easily be downloaded or routed into e.g. a Hadoop cluster for subsequent interactive analysis using Hive, Spark or similar tools. In this respect, our framework also suggests a general model for making very large datasets of high scientific value more accessible by offering the possibility for organizations to share the cost of hosting data on hot storage, without compromising the scalability of downstream analysis.

Subject(s)

Cloud Computing/standards , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans

17.

Correlations in Scattered X-Ray Laser Pulses Reveal Nanoscale Structural Features of Viruses.

Kurta, Ruslan P; Donatelli, Jeffrey J; Yoon, Chun Hong; Berntsen, Peter; Bielecki, Johan; Daurer, Benedikt J; DeMirci, Hasan; Fromme, Petra; Hantke, Max Felix; Maia, Filipe R N C; Munke, Anna; Nettelblad, Carl; Pande, Kanupriya; Reddy, Hemanth K N; Sellberg, Jonas A; Sierra, Raymond G; Svenda, Martin; van der Schot, Gijs; Vartanyants, Ivan A; Williams, Garth J; Xavier, P Lourdu; Aquila, Andrew; Zwart, Peter H; Mancuso, Adrian P.

Phys Rev Lett ; 119(15): 158102, 2017 Oct 13.

Article in English | MEDLINE | ID: mdl-29077445

ABSTRACT

We use extremely bright and ultrashort pulses from an x-ray free-electron laser (XFEL) to measure correlations in x rays scattered from individual bioparticles. This allows us to go beyond the traditional crystallography and single-particle imaging approaches for structure investigations. We employ angular correlations to recover the three-dimensional (3D) structure of nanoscale viruses from x-ray diffraction data measured at the Linac Coherent Light Source. Correlations provide us with a comprehensive structural fingerprint of a 3D virus, which we use both for model-based and ab initio structure recovery. The analyses reveal a clear indication that the structure of the viruses deviates from the expected perfect icosahedral symmetry. Our results anticipate exciting opportunities for XFEL studies of the structure and dynamics of nanoscale objects by means of angular correlations.

Subject(s)

Viruses/ultrastructure , X-Ray Diffraction , Lasers , Radiography , Viruses/chemistry

18.

Artifact reduction in the CSPAD detectors used for LCLS experiments.

Pietrini, Alberto; Nettelblad, Carl.

J Synchrotron Radiat ; 24(Pt 5): 1092-1097, 2017 Sep 01.

Article in English | MEDLINE | ID: mdl-28862634

ABSTRACT

The existence of noise and column-wise artifacts in the CSPAD-140K detector and in a module of the CSPAD-2.3M large camera, respectively, is reported for the L730 and L867 experiments performed at the CXI Instrument at the Linac Coherent Light Source (LCLS), in low-flux and low signal-to-noise ratio regime. Possible remedies are discussed and an additional step in the preprocessing of data is introduced, which consists of performing a median subtraction along the columns of the detector modules. Thus, we reduce the overall variation in the photon count distribution, lowering the mean false-positive photon detection rate by about 4% (from 5.57 × 10-5 to 5.32 × 10-5âphoton counts pixel-1 frame-1 in L867, cxi86715) and 7% (from 1.70 × 10-3 to 1.58 × 10-3âphoton counts pixel-1 frame-1 in L730, cxi73013), and the standard deviation in false-positive photon count per shot by 15% and 35%, while not making our average photon detection threshold more stringent. Such improvements in detector noise reduction and artifact removal constitute a step forward in the development of flash X-ray imaging techniques for high-resolution, low-signal and in serial nano-crystallography experiments at X-ray free-electron laser facilities.

19.

Coherent soft X-ray diffraction imaging of coliphage PR772 at the Linac coherent light source.

Reddy, Hemanth K N; Yoon, Chun Hong; Aquila, Andrew; Awel, Salah; Ayyer, Kartik; Barty, Anton; Berntsen, Peter; Bielecki, Johan; Bobkov, Sergey; Bucher, Maximilian; Carini, Gabriella A; Carron, Sebastian; Chapman, Henry; Daurer, Benedikt; DeMirci, Hasan; Ekeberg, Tomas; Fromme, Petra; Hajdu, Janos; Hanke, Max Felix; Hart, Philip; Hogue, Brenda G; Hosseinizadeh, Ahmad; Kim, Yoonhee; Kirian, Richard A; Kurta, Ruslan P; Larsson, Daniel S D; Duane Loh, N; Maia, Filipe R N C; Mancuso, Adrian P; Mühlig, Kerstin; Munke, Anna; Nam, Daewoong; Nettelblad, Carl; Ourmazd, Abbas; Rose, Max; Schwander, Peter; Seibert, Marvin; Sellberg, Jonas A; Song, Changyong; Spence, John C H; Svenda, Martin; Van der Schot, Gijs; Vartanyants, Ivan A; Williams, Garth J; Xavier, P Lourdu.

Sci Data ; 4: 170079, 2017 06 27.

Article in English | MEDLINE | ID: mdl-28654088

ABSTRACT

Single-particle diffraction from X-ray Free Electron Lasers offers the potential for molecular structure determination without the need for crystallization. In an effort to further develop the technique, we present a dataset of coherent soft X-ray diffraction images of Coliphage PR772 virus, collected at the Atomic Molecular Optics (AMO) beamline with pnCCD detectors in the LAMP instrument at the Linac Coherent Light Source. The diameter of PR772 ranges from 65-70 nm, which is considerably smaller than the previously reported ~600 nm diameter Mimivirus. This reflects continued progress in XFEL-based single-particle imaging towards the single molecular imaging regime. The data set contains significantly more single particle hits than collected in previous experiments, enabling the development of improved statistical analysis, reconstruction algorithms, and quantitative metrics to determine resolution and self-consistency.

Subject(s)

Coliphages , Algorithms , Molecular Structure , X-Ray Diffraction

20.

Experimental strategies for imaging bioparticles with femtosecond hard X-ray pulses.

Daurer, Benedikt J; Okamoto, Kenta; Bielecki, Johan; Maia, Filipe R N C; Mühlig, Kerstin; Seibert, M Marvin; Hantke, Max F; Nettelblad, Carl; Benner, W Henry; Svenda, Martin; Tîmneanu, Nicusor; Ekeberg, Tomas; Loh, N Duane; Pietrini, Alberto; Zani, Alessandro; Rath, Asawari D; Westphal, Daniel; Kirian, Richard A; Awel, Salah; Wiedorn, Max O; van der Schot, Gijs; Carlsson, Gunilla H; Hasse, Dirk; Sellberg, Jonas A; Barty, Anton; Andreasson, Jakob; Boutet, Sébastien; Williams, Garth; Koglin, Jason; Andersson, Inger; Hajdu, Janos; Larsson, Daniel S D.

IUCrJ ; 4(Pt 3): 251-262, 2017 May 01.

Article in English | MEDLINE | ID: mdl-28512572

ABSTRACT

This study explores the capabilities of the Coherent X-ray Imaging Instrument at the Linac Coherent Light Source to image small biological samples. The weak signal from small samples puts a significant demand on the experiment. Aerosolized Omono River virus particles of â¼40ânm in diameter were injected into the submicrometre X-ray focus at a reduced pressure. Diffraction patterns were recorded on two area detectors. The statistical nature of the measurements from many individual particles provided information about the intensity profile of the X-ray beam, phase variations in the wavefront and the size distribution of the injected particles. The results point to a wider than expected size distribution (from â¼35 to â¼300ânm in diameter). This is likely to be owing to nonvolatile contaminants from larger droplets during aerosolization and droplet evaporation. The results suggest that the concentration of nonvolatile contaminants and the ratio between the volumes of the initial droplet and the sample particles is critical in such studies. The maximum beam intensity in the focus was found to be 1.9 × 1012 photons per µm2 per pulse. The full-width of the focus at half-maximum was estimated to be 500ânm (assuming 20% beamline transmission), and this width is larger than expected. Under these conditions, the diffraction signal from a sample-sized particle remained above the average background to a resolution of 4.25ânm. The results suggest that reducing the size of the initial droplets during aerosolization is necessary to bring small particles into the scope of detailed structural studies with X-ray lasers.

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL