Search | VHL Regional Portal

1.

Multisubstituted pyrimidines effectively inhibit bacterial growth and biofilm formation of Staphylococcus aureus.

Provenzani, Riccardo; San-Martin-Galindo, Paola; Hassan, Ghada; Legehar, Ashenafi; Kallio, Aleksi; Xhaard, Henri; Fallarero, Adyary; Yli-Kauhaluoma, Jari.

Sci Rep ; 11(1): 7931, 2021 04 12.

Article in English | MEDLINE | ID: mdl-33846401

ABSTRACT

Biofilms are multicellular communities of microorganisms that generally attach to surfaces in a self-produced matrix. Unlike planktonic cells, biofilms can withstand conventional antibiotics, causing significant challenges in the healthcare system. Currently, new chemical entities are urgently needed to develop novel anti-biofilm agents. In this study, we designed and synthesized a set of 2,4,5,6-tetrasubstituted pyrimidines and assessed their antibacterial activity against planktonic cells and biofilms formed by Staphylococcus aureus. Compounds 9e, 10d, and 10e displayed potent activity for inhibiting the onset of biofilm formation as well as for killing pre-formed biofilms of S. aureus ATCC 25923 and Newman strains, with half-maximal inhibitory concentration (IC50) values ranging from 11.6 to 62.0 µM. These pyrimidines, at 100 µM, not only decreased the number of viable bacteria within the pre-formed biofilm by 2-3 log10 but also reduced the amount of total biomass by 30-50%. Furthermore, these compounds were effective against planktonic cells with minimum inhibitory concentration (MIC) values lower than 60 µM for both staphylococcal strains. Compound 10d inhibited the growth of S. aureus ATCC 25923 in a concentration-dependent manner and displayed a bactericidal anti-staphylococcal activity. Taken together, our study highlights the value of multisubstituted pyrimidines to develop novel anti-biofilm agents.

Subject(s)

Biofilms/growth & development , Pyrimidines/pharmacology , Staphylococcus aureus/growth & development , Staphylococcus aureus/physiology , Anti-Infective Agents/chemical synthesis , Anti-Infective Agents/chemistry , Anti-Infective Agents/pharmacology , Biomass , Cell Death/drug effects , Cell Line , Drug Design , Humans , Microbial Sensitivity Tests , Plankton/drug effects , Pyrimidines/chemical synthesis , Pyrimidines/chemistry , Staphylococcus aureus/drug effects , Structure-Activity Relationship

2.

Properties of Fixed-Fixed Models and Alternatives in Presence-Absence Data Analysis.

Kallio, Aleksi.

PLoS One ; 11(11): e0165456, 2016.

Article in English | MEDLINE | ID: mdl-27812126

ABSTRACT

Assessing the significance of patterns in presence-absence data is an important question in ecological data analysis, e.g., when studying nestedness. Significance testing can be performed with the commonly used fixed-fixed models, which preserve the row and column sums while permuting the data. The manuscript considers the properties of fixed-fixed models and points out how their strict constraints can lead to limited randomizability. The manuscript considers the question of relaxing row and column sum constraints of the fixed-fixed models. The Rasch models are presented as an alternative with relaxed constraints and sound statistical properties. Models are compared on presence-absence data and surprisingly the fixed-fixed models are observed to produce unreasonably optimistic measures of statistical significance, giving interesting insight into practical effects of limited randomizability.

Subject(s)

Data Interpretation, Statistical , Ecological and Environmental Phenomena , Models, Statistical , Stochastic Processes
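
The fixed-fixed idea in the abstract above (permute a presence-absence matrix while keeping every row and column sum) can be illustrated with a swap-based shuffle. This is a minimal sketch under stated assumptions, not the procedure used in the paper: the function name `swap_randomize`, the number of attempted swaps, and the seed are all hypothetical choices made for illustration.

```python
import random

def swap_randomize(matrix, n_swaps, seed=0):
    """Return a shuffled copy of a 0/1 matrix with unchanged row and column sums.

    Illustrative sketch only; assumes at least two rows and two columns.
    """
    rng = random.Random(seed)
    m = [row[:] for row in matrix]  # work on a copy, leave the input intact
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_swaps):
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        # A "checkerboard" 2x2 submatrix (1 0 / 0 1 or 0 1 / 1 0) can be
        # flipped without changing any row sum or column sum.
        is_checkerboard = (
            m[r1][c1] == m[r2][c2]
            and m[r1][c2] == m[r2][c1]
            and m[r1][c1] != m[r1][c2]
        )
        if is_checkerboard:
            m[r1][c1], m[r1][c2] = m[r1][c2], m[r1][c1]
            m[r2][c1], m[r2][c2] = m[r2][c2], m[r2][c1]
    return m

data = [[1, 0, 1],
        [0, 1, 0],
        [1, 1, 0]]
shuffled = swap_randomize(data, n_swaps=100)

nested = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0]]
stuck = swap_randomize(nested, n_swaps=50)
```

A perfectly nested matrix such as `nested` contains no checkerboard submatrix, so no swap ever applies and `stuck` equals the input. That is one simple way to see how strict row and column constraints can limit randomizability.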

3.

Recommendations on e-infrastructures for next-generation sequencing.

Spjuth, Ola; Bongcam-Rudloff, Erik; Dahlberg, Johan; Dahlö, Martin; Kallio, Aleksi; Pireddu, Luca; Vezzi, Francesco; Korpelainen, Eija.

Gigascience ; 5: 26, 2016 06 07.

Article in English | MEDLINE | ID: mdl-27267963

ABSTRACT

With ever-increasing amounts of data being produced by next-generation sequencing (NGS) experiments, the requirements placed on supporting e-infrastructures have grown. In this work, we provide recommendations based on the collective experiences from participants in the EU COST Action SeqAhead for the tasks of data preprocessing, upstream processing, data delivery, and downstream analysis, as well as long-term storage and archiving. We cover demands on computational and storage resources, networks, software stacks, automation of analysis, education, and also discuss emerging trends in the field. E-infrastructures for NGS require substantial effort to set up and maintain over time, and with sequencing technologies and best practices for data analysis evolving rapidly it is important to prioritize both processing capacity and e-infrastructure flexibility when making strategic decisions to support the data analysis demands of tomorrow. Due to increasingly demanding technical requirements we recommend that e-infrastructure development and maintenance be handled by a professional service unit, be it internal or external to the organization, and emphasis should be placed on collaboration between researchers and IT professionals.

Subject(s)

High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Computational Biology/methods , Humans , Information Storage and Retrieval , Internet , Software

4.

Individual FEV1 Trajectories Can Be Identified from a COPD Cohort.

Koskela, Jukka; Katajisto, Milla; Kallio, Aleksi; Kilpeläinen, Maritta; Lindqvist, Ari; Laitinen, Tarja.

COPD ; 13(4): 425-30, 2016 08.

Article in English | MEDLINE | ID: mdl-26807738

ABSTRACT

OBJECTIVE: We aim to make use of clinical spirometry data in order to identify individual COPD-patients with divergent trajectories of lung function over time. STUDY DESIGN AND SETTING: Hospital-based COPD cohort (N = 607) was followed on average 4.6 years. Each patient had a mean of 8.4 spirometries available. We used a Hierarchical Bayesian Model (HBM) to identify the individuals presenting constant trends in lung function. RESULTS: At a probability level of 95%, one third of the patients (180/607) presented rapidly declining FEV1 (mean -78 ml/year, 95% CI -73 to -83 ml) compared to that in the rest of the patients (mean -26 ml/year, 95% CI -23 to -29 ml, p ≤ 2.2 × 10⁻¹⁶). Constant improvement of FEV1 was very rare. The rapid decliners more frequently suffered from exacerbations measured by various outcome markers. CONCLUSION: Clinical data of unique patients can be utilized to identify diverging trajectories of FEV1 with a high probability. Frequent exacerbations were more prevalent in FEV1-decliners than in the rest of the patients. The result confirmed previously reported association between FEV1 decline and exacerbation rate and further suggested that in clinical practice HBM could improve the identification of high-risk individuals at early stages of the disease.

Subject(s)

Pulmonary Disease, Chronic Obstructive/physiopathology , Aged , Bayes Theorem , Disease Progression , Female , Follow-Up Studies , Forced Expiratory Volume , Humans , Male , Middle Aged , Retrospective Studies , Spirometry

5.

BioImg.org: A Catalog of Virtual Machine Images for the Life Sciences.

Dahlö, Martin; Haziza, Frédéric; Kallio, Aleksi; Korpelainen, Eija; Bongcam-Rudloff, Erik; Spjuth, Ola.

Bioinform Biol Insights ; 9: 125-8, 2015.

Article in English | MEDLINE | ID: mdl-26401099

ABSTRACT

Virtualization is becoming increasingly important in bioscience, enabling assembly and provisioning of complete computer setups, including operating system, data, software, and services packaged as virtual machine images (VMIs). We present an open catalog of VMIs for the life sciences, where scientists can share information about images and optionally upload them to a server equipped with a large file system and fast Internet connection. Other scientists can then search for and download images that can be run on the local computer or in a cloud computing environment, providing easy access to bioinformatics environments. We also describe applications where VMIs aid life science research, including distributing tools and data, supporting reproducible analysis, and facilitating education. BioImg.org is freely available at: https://bioimg.org.

6.

Experiences with workflows for automating data-intensive bioinformatics.

Spjuth, Ola; Bongcam-Rudloff, Erik; Hernández, Guillermo Carrasco; Forer, Lukas; Giovacchini, Mario; Guimera, Roman Valls; Kallio, Aleksi; Korpelainen, Eija; Kandula, Maciej M; Krachunov, Milko; Kreil, David P; Kulev, Ognyan; Labaj, Pawel P; Lampa, Samuel; Pireddu, Luca; Schönherr, Sebastian; Siretskiy, Alexey; Vassilev, Dimitar.

Biol Direct ; 10: 43, 2015 Aug 19.

Article in English | MEDLINE | ID: mdl-26282399

ABSTRACT

High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out data management and analysis tasks on large scale. Workflow systems can be useful to simplify construction of analysis pipelines that automate tasks, support reproducibility and provide measures for fault-tolerance. However, workflow systems can incur significant development and administration overhead so bioinformatics pipelines are often still built without them. We present the experiences with workflows and workflow systems within the bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead. The organizations are working on similar problems, but we have addressed them with different strategies and solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our experiences we define a set of recommendations for future systems to enable efficient yet simple bioinformatics workflow construction and execution.

Subject(s)

Computational Biology/methods , Electronic Data Processing/methods , Workflow , High-Throughput Nucleotide Sequencing , Reproducibility of Results

7.

Quantitative analysis of colony morphology in yeast.

Ruusuvuori, Pekka; Lin, Jake; Scott, Adrian C; Tan, Zhihao; Sorsa, Saija; Kallio, Aleksi; Nykter, Matti; Yli-Harja, Olli; Shmulevich, Ilya; Dudley, Aimée M.

Biotechniques ; 56(1): 18-27, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24447135

ABSTRACT

Microorganisms often form multicellular structures such as biofilms and structured colonies that can influence the organism's virulence, drug resistance, and adherence to medical devices. Phenotypic classification of these structures has traditionally relied on qualitative scoring systems that limit detailed phenotypic comparisons between strains. Automated imaging and quantitative analysis have the potential to improve the speed and accuracy of experiments designed to study the genetic and molecular networks underlying different morphological traits. For this reason, we have developed a platform that uses automated image analysis and pattern recognition to quantify phenotypic signatures of yeast colonies. Our strategy enables quantitative analysis of individual colonies, measured at a single time point or over a series of time-lapse images, as well as the classification of distinct colony shapes based on image-derived features. Phenotypic changes in colony morphology can be expressed as changes in feature space trajectories over time, thereby enabling the visualization and quantitative analysis of morphological development. To facilitate data exploration, results are plotted dynamically through an interactive Yeast Image Analysis web application (YIMAA; http://yimaa.cs.tut.fi) that integrates the raw and processed images across all time points, allowing exploration of the image-based features and principal components associated with morphological development.

Subject(s)

Image Processing, Computer-Assisted , Saccharomyces cerevisiae/genetics , Software , Algorithms , Internet , Saccharomyces cerevisiae/growth & development

8.

SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop.

Schumacher, André; Pireddu, Luca; Niemenmaa, Matti; Kallio, Aleksi; Korpelainen, Eija; Zanetti, Gianluigi; Heljanko, Keijo.

Bioinformatics ; 30(1): 119-20, 2014 Jan 01.

Article in English | MEDLINE | ID: mdl-24149054

ABSTRACT

SUMMARY: Hadoop MapReduce-based approaches have become increasingly popular due to their scalability in processing large sequencing datasets. However, as these methods typically require in-depth expertise in Hadoop and Java, they are still out of reach of many bioinformaticians. To solve this problem, we have created SeqPig, a library and a collection of tools to manipulate, analyze and query sequencing datasets in a scalable and simple manner. SeqPig scripts use the Hadoop-based distributed scripting engine Apache Pig, which automatically parallelizes and distributes data processing tasks. We demonstrate SeqPig's scalability over many computing nodes and illustrate its use with example scripts. AVAILABILITY AND IMPLEMENTATION: Available under the open source MIT license at http://sourceforge.net/projects/seqpig/

Subject(s)

High-Throughput Screening Assays/methods , Software Design

9.

POMO--Plotting Omics analysis results for Multiple Organisms.

Lin, Jake; Kreisberg, Richard; Kallio, Aleksi; Dudley, Aimée M; Nykter, Matti; Shmulevich, Ilya; May, Patrick; Autio, Reija.

BMC Genomics ; 14: 918, 2013 Dec 24.

Article in English | MEDLINE | ID: mdl-24365393

ABSTRACT

BACKGROUND: Systems biology experiments studying different topics and organisms produce thousands of data values across different types of genomic data. Further, data mining analyses are yielding ranked and heterogeneous results and association networks distributed over the entire genome. The visualization of these results is often difficult and standalone web tools allowing for custom inputs and dynamic filtering are limited. RESULTS: We have developed POMO (http://pomo.cs.tut.fi), an interactive web-based application to visually explore omics data analysis results and associations in circular, network and grid views. The circular graph represents the chromosome lengths as perimeter segments, as a reference outer ring, such as cytoband for human. The inner arcs between nodes represent the uploaded network. Further, multiple annotation rings, for example depiction of gene copy number changes, can be uploaded as text files and represented as bar, histogram or heatmap rings. POMO has built-in references for human, mouse, nematode, fly, yeast, zebrafish, rice, tomato, Arabidopsis, and Escherichia coli. In addition, POMO provides custom options that allow integrated plotting of unsupported strains or closely related species associations, such as human and mouse orthologs or two yeast wild types, studied together within a single analysis. The web application also supports interactive label and weight filtering. Every iterative filtered result in POMO can be exported as image file and text file for sharing or direct future input. CONCLUSIONS: The POMO web application is a unique tool for omics data analysis, which can be used to visualize and filter the genome-wide networks in the context of chromosomal locations as well as multiple network layouts. With the several illustration and filtering options the tool supports the analysis and visualization of any heterogeneous omics data analysis association results for many organisms. 
POMO is freely available and does not require any installation or registration.

Subject(s)

Computational Biology/methods , Genomics/methods , Software , Systems Biology , Internet

10.

Optimizing detection of transcription factor-binding sites in ChIP-seq experiments.

Kallio, Aleksi; Elo, Laura L.

Methods Mol Biol ; 1038: 181-91, 2013.

Article in English | MEDLINE | ID: mdl-23872976

ABSTRACT

Chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) offers a powerful means to study transcription factor binding on a genome-wide scale. While a number of advanced software packages have already become available for identifying ChIP-seq-binding sites, it has become evident that the choice of the package together with its adjustable parameters can considerably affect the biological conclusions made from the data. Therefore, to aid these choices, we have recently introduced a reproducibility-optimization procedure, which computationally adjusts the parameters of the popular peak detection algorithms for each ChIP-seq data separately. Here, we provide a detailed description of the procedure together with practical guidelines on how to apply its implementation, the peakROTS R-package, in a given ChIP-seq experiment.

Subject(s)

High-Throughput Nucleotide Sequencing/methods , Transcription Factors/metabolism , Animals , Binding Sites , Chromatin Immunoprecipitation/methods , Genome , Humans , Reproducibility of Results , Software

11.

Hadoop-BAM: directly manipulating next generation sequencing data in the cloud.

Niemenmaa, Matti; Kallio, Aleksi; Schumacher, André; Klemelä, Petri; Korpelainen, Eija; Heljanko, Keijo.

Bioinformatics ; 28(6): 876-7, 2012 Mar 15.

Article in English | MEDLINE | ID: mdl-22302568

ABSTRACT

Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can directly operate on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API are expected to be easily convertible to support large-scale distributed processing. In this article we demonstrate the use of Hadoop-BAM by building a coverage summarizing tool for the Chipster genome browser. Our results show that Hadoop offers good scalability, and one should avoid moving data in and out of Hadoop between analysis steps.

Subject(s)

High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Genome , User-Computer Interface

12.

Optimized detection of transcription factor-binding sites in ChIP-seq experiments.

Elo, Laura L; Kallio, Aleksi; Laajala, Teemu D; Hawkins, R David; Korpelainen, Eija; Aittokallio, Tero.

Nucleic Acids Res ; 40(1): e1, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22009681

ABSTRACT

We developed a computational procedure for optimizing the binding site detections in a given ChIP-seq experiment by maximizing their reproducibility under bootstrap sampling. We demonstrate how the procedure can improve the detection accuracies beyond those obtained with the default settings of popular peak calling software, or inform the user whether the peak detection results are compromised, circumventing the need for arbitrary re-iterative peak calling under varying parameter settings. The generic, open-source implementation is easily extendable to accommodate additional features and to promote its widespread application in future ChIP-seq studies. The peakROTS R-package and user guide are freely available at http://www.nic.funet.fi/pub/sci/molbio/peakROTS.

Subject(s)

Chromatin Immunoprecipitation/methods , Transcription Factors/analysis , Animals , Binding Sites , High-Throughput Nucleotide Sequencing , Humans , Mice , Sequence Analysis, DNA , Software

13.

Randomization techniques for assessing the significance of gene periodicity results.

Kallio, Aleksi; Vuokko, Niko; Ojala, Markus; Haiminen, Niina; Mannila, Heikki.

BMC Bioinformatics ; 12: 330, 2011 Aug 09.

Article in English | MEDLINE | ID: mdl-21827656

ABSTRACT

BACKGROUND: Modern high-throughput measurement technologies such as DNA microarrays and next generation sequencers produce extensive datasets. With large datasets the emphasis has been moving from traditional statistical tests to new data mining methods that are capable of detecting complex patterns, such as clusters, regulatory networks, or time series periodicity. Study of periodic gene expression is an interesting research question that also is a good example of challenges involved in the analysis of high-throughput data in general. Unlike for classical statistical tests, the distribution of test statistic for data mining methods cannot be derived analytically. RESULTS: We describe the randomization based approach to significance testing, and show how it can be applied to detect periodically expressed genes. We present four randomization methods, three of which have previously been used for gene cycle data. We propose a new method for testing significance of periodicity in gene expression short time series data, such as from gene cycle and circadian clock studies. We argue that the underlying assumptions behind existing significance testing approaches are problematic and some of them unrealistic. We analyze the theoretical properties of the existing and proposed methods, showing how our method can be robustly used to detect genes with exceptionally high periodicity. We also demonstrate the large differences in the number of significant results depending on the chosen randomization methods and parameters of the testing framework. By reanalyzing gene cycle data from various sources, we show how previous estimates on the number of gene cycle controlled genes are not supported by the data. Our randomization approach combined with widely adopted Benjamini-Hochberg multiple testing method yields better predictive power and produces more accurate null distributions than previous methods.
CONCLUSIONS: Existing methods for testing significance of periodic gene expression patterns are simplistic and optimistic. Our testing framework allows strict levels of statistical significance with more realistic underlying assumptions, without losing predictive power. As DNA microarrays have now become mainstream and new high-throughput methods are rapidly being adopted, we argue that not only there will be need for data mining methods capable of coping with immense datasets, but there will also be need for solid methods for significance testing.

Subject(s)

Data Mining/methods , Gene Expression Regulation , Periodicity , Circadian Clocks , Cluster Analysis , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis
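
The abstract above combines randomization-based significance testing with the Benjamini-Hochberg multiple testing method. The sketch below shows only the generic shape of that combination. The abstract does not specify the authors' four randomization methods, so the null scores here are assumed to come from some unspecified randomization, and the function names `empirical_p_value` and `benjamini_hochberg` are hypothetical.

```python
def empirical_p_value(observed, null_scores):
    """Share of randomized scores at least as large as the observed score.

    The +1 in numerator and denominator counts the observed data as one
    draw, so the estimate is never exactly zero.
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)

def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k whose sorted p-value is <= k / m * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected

# Toy periodicity score for one gene against four randomized scores.
p_gene = empirical_p_value(5, [1, 2, 3, 6])

# Toy p-values for four genes.
calls = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], alpha=0.05)
```

With only four randomized scores the smallest attainable p-value is 1/5, so in practice the number of randomizations bounds how strict a significance level can be tested.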

14.

MicroRNA expression profiling reveals miRNA families regulating specific biological pathways in mouse frontal cortex and hippocampus.

Juhila, Juuso; Sipilä, Tessa; Icay, Katherine; Nicorici, Daniel; Ellonen, Pekka; Kallio, Aleksi; Korpelainen, Eija; Greco, Dario; Hovatta, Iiris.

PLoS One ; 6(6): e21495, 2011.

Article in English | MEDLINE | ID: mdl-21731767

ABSTRACT

MicroRNAs (miRNAs) are small regulatory molecules that cause post-transcriptional gene silencing. Although some miRNAs are known to have region-specific expression patterns in the adult brain, the functional consequences of the region-specificity to the gene regulatory networks of the brain nuclei are not clear. Therefore, we studied miRNA expression patterns by miRNA-Seq and microarrays in two brain regions, frontal cortex (FCx) and hippocampus (HP), which have separate biological functions. We identified 354 miRNAs from FCx and 408 from HP using miRNA-Seq, and 245 from FCx and 238 from HP with microarrays. Several miRNA families and clusters were differentially expressed between FCx and HP, including the miR-8 family, miR-182|miR-96|miR-183 cluster, and miR-212|miR-312 cluster overexpressed in FCx and miR-34 family overexpressed in HP. To visualize the clusters, we developed support for viewing genomic alignments of miRNA-Seq reads in the Chipster genome browser. We carried out pathway analysis of the predicted target genes of differentially expressed miRNA families and clusters to assess their putative biological functions. Interestingly, several miRNAs from the same family/cluster were predicted to regulate specific biological pathways. We have developed a miRNA-Seq approach with a bioinformatic analysis workflow that is suitable for studying miRNA expression patterns from specific brain nuclei. FCx and HP were shown to have distinct miRNA expression patterns which were reflected in the predicted gene regulatory pathways. This methodology can be applied for the identification of brain region-specific and phenotype-specific miRNA-mRNA-regulatory networks from the adult and developing rodent brain.

Subject(s)

Frontal Lobe/metabolism , Gene Expression Profiling , Hippocampus/metabolism , MicroRNAs/genetics , Signal Transduction/genetics , Animals , Cluster Analysis , Computational Biology , Gene Expression Regulation , Genome/genetics , Mice , MicroRNAs/metabolism , Oligonucleotide Array Sequence Analysis , Organ Specificity/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sequence Analysis, RNA

15.

Optimized detection of differential expression in global profiling experiments: case studies in clinical transcriptomic and quantitative proteomic datasets.

Elo, Laura L; Hiissa, Jukka; Tuimala, Jarno; Kallio, Aleksi; Korpelainen, Eija; Aittokallio, Tero.

Brief Bioinform ; 10(5): 547-55, 2009 Sep.

Article in English | MEDLINE | ID: mdl-19549804

ABSTRACT

Identification of reliable molecular markers that show differential expression between distinct groups of samples has remained a fundamental research problem in many large-scale profiling studies, such as those based on DNA microarray or mass-spectrometry technologies. Despite the availability of a wide spectrum of statistical procedures, the users of the high-throughput platforms are still facing the crucial challenge of deciding which test statistic is best adapted to the intrinsic properties of their own datasets. To meet this challenge, we recently introduced an adaptive procedure, named ROTS (Reproducibility-Optimized Test Statistic), which learns an optimal statistic directly from the given data, and whose relative benefits have previously been shown in comparison with state-of-the-art procedures for detecting differential expression. Using gene expression microarray and mass-spectrometry (MS)-based protein expression datasets as case studies, we illustrate here the practical usage and advantages of ROTS toward detecting reliable marker lists in clinical transcriptomic and proteomic studies. In a public leukemia microarray dataset, the procedure could improve the sensitivity of the gene marker lists detected with high specificity. When applied to a recent LC-MS dataset, involving plasma samples from severe burn patients, the procedure could identify several peptide markers that remained undetected in the conventional analysis, thus demonstrating the effectiveness of ROTS also for global quantitative proteomic studies. To promote its widespread usage, we have made freely available efficient implementations of ROTS, which are easily accessible either as a stand-alone R-package or as integrated in the open-source data analysis software Chipster.

Subject(s)

Biomarkers/metabolism , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Proteomics/methods , Software , Humans , Models, Statistical , Reproducibility of Results , Sensitivity and Specificity
