1.
PLoS One ; 7(7): e33088, 2012.
Article in English | MEDLINE | ID: mdl-22815673

ABSTRACT

BACKGROUND: The object of this study was to identify temperament patterns in the Finnish population, and to determine the relationship between these profiles and life habits, socioeconomic status, and health. METHODS/PRINCIPAL FINDINGS: A cluster analysis of the Temperament and Character Inventory subscales was performed on 3,761 individuals from the Northern Finland Birth Cohort 1966 and replicated on 2,097 individuals from the Cardiovascular Risk in Young Finns study. Clusters were formed using the k-means method and their relationship with 115 variables from the areas of life habits, socioeconomic status and health was examined. RESULTS: Four clusters were identified for both genders. Individuals from Cluster I are characterized by high persistence, low extravagance and disorderliness. They have healthy life habits, and lowest scores in most of the measures for psychiatric disorders. Cluster II individuals are characterized by low harm avoidance and high novelty seeking. They report the best physical capacity and highest level of income, but also high rate of divorce, smoking, and alcohol consumption. Individuals from Cluster III are not characterized by any extreme characteristic. Individuals from Cluster IV are characterized by high levels of harm avoidance, low levels of exploratory excitability and attachment, and score the lowest in most measures of health and well-being. CONCLUSIONS: This study shows that the temperament subscales do not distribute randomly but have an endogenous structure, and that these patterns have strong associations to health, life events, and well-being.


Subject(s)
Disease , Health , Temperament , Adolescent , Adult , Child , Child, Preschool , Cluster Analysis , Cohort Studies , Female , Finland , Habits , Humans , Longitudinal Studies , Male , Middle Aged , Social Class , Young Adult
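Illustrative note on record 1: the abstract states that clusters were formed with the k-means method. The sketch below shows the basic (Lloyd's) form of that algorithm on plain tuples. It is not the authors' code. The function name `kmeans`, its parameters, and the random initialisation from the data are assumptions made for illustration, and the abstract does not say how the subscales were scaled or how the four-cluster solution was chosen.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples (hypothetical helper)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: initial centers drawn from the data
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers
```

In a setting like the one described, each tuple would hold one individual's subscale scores and `k` would be 4. The result depends on the initial centers, so repeated runs with different seeds are a common precaution.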
2.
PLoS One ; 7(7): e38065, 2012.
Article in English | MEDLINE | ID: mdl-22815688

ABSTRACT

BACKGROUND: Investigation of the environmental influences on human behavioral phenotypes is important for our understanding of the causation of psychiatric disorders. However, there are complexities associated with the assessment of environmental influences on behavior. METHODS/PRINCIPAL FINDINGS: We conducted a series of analyses using a prospective, longitudinal study of a nationally representative birth cohort from Finland (the Northern Finland 1966 Birth Cohort). Participants included a total of 3,761 male and female cohort members who were living in Finland at the age of 16 years and who had complete temperament scores. Our initial analyses (Wessman et al., in press) provide evidence in support of four stable and robust temperament clusters. Using these temperament clusters, as well as independent temperament dimensions for comparison, we conducted a data-driven analysis to assess the influence of a broad set of life course measures, assessed prenatally, in infancy, and during adolescence, on adult temperament. RESULTS: Measures of early environment, neurobehavioral development, and adolescent behavior significantly predict adult temperament, classified by both cluster membership and temperament dimensions. Specifically, our results suggest that a relatively consistent set of life course measures is associated with adult temperament profiles, including maternal education, characteristics of the family's location and residence, adolescent academic performance, and adolescent smoking. CONCLUSIONS: Our finding that a consistent set of life course measures predicts temperament clusters indicates that these clusters represent distinct developmental temperament trajectories and that information about a subset of life course measures has implications for adult health outcomes.


Subject(s)
Behavior/physiology , Environment , Nervous System Physiological Phenomena , Temperament/physiology , Adolescent , Adult , Child , Child, Preschool , Cluster Analysis , Female , Humans , Infant , Infant, Newborn , Male , Pregnancy , Time Factors , Young Adult
3.
BMC Bioinformatics ; 12: 330, 2011 Aug 09.
Article in English | MEDLINE | ID: mdl-21827656

ABSTRACT

BACKGROUND: Modern high-throughput measurement technologies such as DNA microarrays and next generation sequencers produce extensive datasets. With large datasets the emphasis has been moving from traditional statistical tests to new data mining methods that are capable of detecting complex patterns, such as clusters, regulatory networks, or time series periodicity. Study of periodic gene expression is an interesting research question that also is a good example of challenges involved in the analysis of high-throughput data in general. Unlike for classical statistical tests, the distribution of the test statistic for data mining methods cannot be derived analytically. RESULTS: We describe the randomization based approach to significance testing, and show how it can be applied to detect periodically expressed genes. We present four randomization methods, three of which have previously been used for gene cycle data. We propose a new method for testing significance of periodicity in gene expression short time series data, such as from gene cycle and circadian clock studies. We argue that the underlying assumptions behind existing significance testing approaches are problematic and some of them unrealistic. We analyze the theoretical properties of the existing and proposed methods, showing how our method can be robustly used to detect genes with exceptionally high periodicity. We also demonstrate the large differences in the number of significant results depending on the chosen randomization methods and parameters of the testing framework. By reanalyzing gene cycle data from various sources, we show how previous estimates on the number of gene cycle controlled genes are not supported by the data. Our randomization approach combined with the widely adopted Benjamini-Hochberg multiple testing method yields better predictive power and produces more accurate null distributions than previous methods. CONCLUSIONS: Existing methods for testing significance of periodic gene expression patterns are simplistic and optimistic. Our testing framework allows strict levels of statistical significance with more realistic underlying assumptions, without losing predictive power. As DNA microarrays have now become mainstream and new high-throughput methods are rapidly being adopted, we argue that there will be a need not only for data mining methods capable of coping with immense datasets, but also for solid methods for significance testing.


Subject(s)
Data Mining/methods , Gene Expression Regulation , Periodicity , Circadian Clocks , Cluster Analysis , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis
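Illustrative note on record 3: the abstract combines randomization-based significance testing with the Benjamini-Hochberg multiple testing method. The sketch below shows the general shape of such a pipeline, under assumptions. The null model here simply shuffles the time points, which is only one of the simplest randomizations and is not the new method the authors propose. The periodicity statistic (a lagged autocovariance) and all function names are hypothetical choices for illustration.

```python
import random

def lag_autocovariance(series, lag):
    """Hypothetical periodicity score: autocovariance of the series at the given lag."""
    mean = sum(series) / len(series)
    return sum((series[t] - mean) * (series[t + lag] - mean)
               for t in range(len(series) - lag))

def permutation_pvalue(series, statistic, n_perm=999, seed=0):
    """Empirical p-value of statistic(series) under random shuffling of the time points."""
    rng = random.Random(seed)
    observed = statistic(series)
    shuffled = list(series)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed series itself, so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected at false discovery rate alpha (step-up procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank  # remember the largest rank that passes its threshold
    return sorted(order[:cutoff])
```

In use, one p-value would be computed per gene, for example with `permutation_pvalue(expr, lambda s: lag_autocovariance(s, period))`, and the full list passed to `benjamini_hochberg`. The smallest attainable p-value is 1 / (n_perm + 1), so the number of permutations limits how strict a significance level can be reached.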
4.
Int J Data Min Bioinform ; 4(6): 675-700, 2010.
Article in English | MEDLINE | ID: mdl-21355501

ABSTRACT

Segmentation is a general data mining technique for summarising and analysing sequential data. Segmentation can be applied, e.g., when studying large-scale genomic structures such as isochores. Choosing the number of segments remains a challenging question. We present extensive experimental studies on model selection techniques, Bayesian Information Criterion (BIC) and Cross Validation (CV). We successfully identify segments with different means or variances, and demonstrate the effect of linear trends and outliers, frequently occurring in real data. Results are given for real DNA sequences with respect to changes in their codon, G + C, and bigram frequencies, and copy-number variation from CGH data.


Subject(s)
Data Mining/methods , Genomics/methods , Base Sequence , Codon/genetics , Comparative Genomic Hybridization/methods , DNA/chemistry , Genome
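Illustrative note on record 4: the abstract names the Bayesian Information Criterion (BIC) as one way to choose the number of segments. The sketch below shows one common form of that criterion for a piecewise-constant Gaussian model, under assumptions: a single shared variance, and a parameter count of 2k (k segment means, k - 1 breakpoints, one variance). The abstract does not give the exact formula the authors used, and the function names are hypothetical.

```python
import math

def bic_piecewise_constant(rss, n, k):
    """BIC, up to an additive constant, for a k-segment piecewise-constant model.

    rss: residual sum of squares of the best k-segment fit (must be > 0)
    n:   number of data points
    """
    n_params = 2 * k  # assumption: k means + (k - 1) breakpoints + 1 shared variance
    return n * math.log(rss / n) + n_params * math.log(n)

def choose_k(rss_by_k, n):
    """Pick the number of segments with the smallest BIC from a {k: rss} mapping."""
    return min(rss_by_k, key=lambda k: bic_piecewise_constant(rss_by_k[k], n, k))
```

For example, with 100 points and residual sums of squares of 500, 100, 98 and 97 for one to four segments, the small gains beyond two segments do not pay for the extra parameters, so `choose_k` returns 2.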
5.
Biol Psychiatry ; 66(11): 990-6, 2009 Dec 01.
Article in English | MEDLINE | ID: mdl-19782967

ABSTRACT

BACKGROUND: While DTNBP1, DISC1, and NRG1 have been extensively studied as candidate genes of schizophrenia, results remain inconclusive. Possible explanations for this are that the genes might be relevant only to certain subtypes of the disease and/or only in certain populations. METHODS: We performed unsupervised clustering of individuals from Finnish schizophrenia families, based on extensive clinical and neuropsychological data, including Structured Clinical Interview for DSM-IV (SCID) information. Families with at least one affected member with DSM-IV diagnosis of a schizophrenia spectrum psychosis were included in a register-based ascertainment. Final sample consisted of 904 individuals from 288 families. We then used the cluster phenotypes in a genetic association study of candidate genes. RESULTS: A robust three-class clustering of individuals emerged: 1) psychotic disorder with mood symptoms (n = 172), 2) core schizophrenia (n = 223), and 3) absence of psychotic disorder (n = 509). One third of the individuals diagnosed with schizophrenia were assigned to cluster 1. These individuals had fewer negative and positive psychotic symptoms and cognitive deficits but more depressive symptoms than individuals in cluster 2. There was a significant association of cluster 2 cases with the DTNBP1 gene, while the DISC1 gene indicated a significant association with schizophrenia spectrum disorders based on the DSM-IV criteria. CONCLUSIONS: In the Finnish population, DTNBP1 gene is associated with a schizophrenia phenotype characterized by prominent negative symptoms, generalized cognitive impairment, and few mood symptoms. Identification of genes and pathways related to schizophrenia necessitates novel definitions of disease phenotypes associated more directly with underlying biology.


Subject(s)
Carrier Proteins/genetics , Cluster Analysis , Phenotype , Schizophrenia/diagnosis , Schizophrenia/genetics , Adult , Aged , Alleles , Dysbindin , Dystrophin-Associated Proteins , Genetic Association Studies/methods , Humans , Middle Aged , Nerve Tissue Proteins/genetics , Neuregulin-1/genetics , Polymorphism, Single Nucleotide , Psychotic Disorders/diagnosis , Psychotic Disorders/genetics , Schizophrenia/classification , Schizophrenic Psychology
6.
Am Nat ; 173(2): 264-72, 2009 Feb.
Article in English | MEDLINE | ID: mdl-20374142

ABSTRACT

An ever larger proportion of Earth's biota is affected by the current accelerating environmental change. The mismatches between organisms and their environments are now increasing in both magnitude and frequency, resulting in lowered fitness and hence the decline of populations. Under this scenario, species with behavioral and/or physiological traits that provide them shelter from the environment are predicted to be less vulnerable to population declines than species that are always exposed to the elements. Here, we coded 4,536 living mammal species for sleep-or-hide (SLOH) behavior, including hibernation, torpor, and the use of burrows, among other related traits. We demonstrate that species that exhibit SLOH behavior are underrepresented in high-risk International Union for Conservation of Nature Red List categories. We found that SLOH behavior contributes to lowering extinction risk even after we accounted for other factors that directly or indirectly buffer species against extinction, such as larger geographic ranges and smaller body sizes. This result is robust to analyses using phylogenetically independent contrasts. Sleep-or-hide behavior, made possible by a related suite of physiological adaptations, allows mammals to function at lower metabolic rates and/or buffer them from changing physical elements. Mammals with SLOH behavior have a greater propensity to survive in the current extinction crisis and probably also in past crises because of reduced exposure to environmental stress.


Subject(s)
Adaptation, Physiological/physiology , Behavior, Animal/physiology , Environment , Extinction, Biological , Mammals/physiology , Animals , Hibernation/physiology , Models, Statistical , Phylogeny , Regression Analysis , Risk Factors , Species Specificity
7.
BMC Bioinformatics ; 9: 336, 2008 Aug 08.
Article in English | MEDLINE | ID: mdl-18691400

ABSTRACT

BACKGROUND: Event sequences where different types of events often occur close together arise, e.g., when studying potential transcription factor binding sites (TFBS, events) of certain transcription factors (TF, types) in a DNA sequence. These events tend to occur in bursts: in some genomic regions there are more genes and therefore potentially more binding sites, while in some, possibly very long regions, hardly any events occur. Also some types of events may occur in the sequence more often than others. Tendencies of co-occurrence of binding sites of two or more TFs are interesting, as they may imply a co-operative role between the TFs in regulatory processes. Determining a numerical value to summarize the tendency for co-occurrence between two TFs can be done in a number of ways. However, testing for the significance of such values should be done with respect to a relevant null model that takes into account the global sequence structure. RESULTS: We extend the existing techniques that have been considered for determining the significance of co-occurrence patterns between a pair of event types under different null models. These models range from very simple ones to more complex models that take the burstiness of sequences into account. We evaluate the models and techniques on synthetic event sequences, and on real data consisting of potential transcription factor binding sites. CONCLUSION: We show that simple null models are poorly suited for bursty data, and they yield many false positives. More sophisticated models give better results in our experiments. We also demonstrate the effect of the window size, i.e., maximum co-occurrence distance, on the significance results.


Subject(s)
Algorithms , Data Interpretation, Statistical , Sequence Alignment/methods , Sequence Analysis/methods , Reproducibility of Results , Sensitivity and Specificity
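Illustrative note on record 7: the abstract tests co-occurrence of two event types within a maximum distance (the window size) against null models of varying complexity. The sketch below implements only the simplest kind of null, in which events of one type are placed uniformly at random along the sequence. The abstract reports that simple models of this kind are poorly suited to bursty data, so this is a baseline for understanding the setup, not the recommended test. All names and parameters are hypothetical.

```python
import random
from bisect import bisect_left, bisect_right

def cooccurrence_count(a_positions, b_positions, window):
    """Number of pairs (a, b) whose positions differ by at most `window`."""
    b_sorted = sorted(b_positions)
    return sum(
        bisect_right(b_sorted, a + window) - bisect_left(b_sorted, a - window)
        for a in a_positions
    )

def uniform_null_pvalue(a_positions, b_positions, window, length, n_rand=999, seed=0):
    """Empirical p-value when type-b events are re-placed uniformly on [0, length)."""
    rng = random.Random(seed)
    observed = cooccurrence_count(a_positions, b_positions, window)
    hits = 0
    for _ in range(n_rand):
        fake_b = [rng.randrange(length) for _ in b_positions]
        if cooccurrence_count(a_positions, fake_b, window) >= observed:
            hits += 1
    return (hits + 1) / (n_rand + 1)
```

Because the uniform null ignores the clustering of events into dense regions, it tends to report co-occurrence as significant whenever both types are bursty in the same regions, which is the false-positive problem the abstract describes.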
8.
Eur J Hum Genet ; 16(9): 1142-50, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18398430

ABSTRACT

We studied how well the European CEU samples used in the Haplotype Mapping Project (HapMap) represent five European populations by analyzing nuclear family samples from the Swedish, Finnish, Dutch, British and Australian (European ancestry) populations. The number of samples from each population (about 30 parent-offspring trios) was similar to that in the HapMap sample sets. A panel of 186 single nucleotide polymorphisms (SNPs) distributed over the 1.5 Mb region of the GRID2 gene on chromosome 4 was genotyped. The genotype data were compared pair-wise between the HapMap sample and the other population samples. Principal component analysis (PCA) was used to cluster the data from different populations with respect to allele frequencies and to define the markers responsible for observed variance. The only sample with detectable differences in allele frequencies was that from Kuusamo, Finland. This sample also separated from the others, including the other Finnish sample, in the PCA analysis. A set of tagSNPs was defined based on the HapMap data and applied to the samples. The tagSNPs were found to capture the genetic variation in the analyzed region at r² > 0.8 at levels ranging from 95% in the Kuusamo sample to 87% in the Australian sample. To capture the maximal genetic variation in the region, the Kuusamo, HapMap and Australian samples required 58, 63 and 73 native tagSNPs, respectively. The HapMap CEU sample represents the European samples well for tagSNP selection, with some caution regarding estimation of allele frequencies in the Finnish Kuusamo sample, and a slight reduction in tagging efficiency in the Australian sample.


Subject(s)
Chromosome Mapping , Databases, Genetic , Haplotypes , White People/genetics , Gene Frequency , Humans , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Principal Component Analysis/methods , Receptors, Glutamate/genetics
9.
Proc Natl Acad Sci U S A ; 105(16): 6097-102, 2008 Apr 22.
Article in English | MEDLINE | ID: mdl-18417455

ABSTRACT

Do large mammals evolve faster than small mammals or vice versa? Because the answer to this question contributes to our understanding of how life-history affects long-term and large-scale evolutionary patterns, and how microevolutionary rates scale-up to macroevolutionary rates, it has received much attention. A satisfactory or consistent answer to this question is lacking, however. Here, we take a fresh look at this problem using a large fossil dataset of mammals from the Neogene of the Old World (NOW). Controlling for sampling biases, calculating per capita origination and extinction rates of boundary-crossers and estimating survival probabilities using capture-mark-recapture (CMR) methods, we found the recurring pattern that large mammal genera and species have higher origination and extinction rates, and therefore shorter durations. This pattern is surprising in the light of molecular studies, which show that smaller animals, with their shorter generation times and higher metabolic rates, have greater absolute rates of evolution. However, higher molecular rates do not necessarily translate to higher taxon rates because both the biotic and physical environments interact with phenotypic variation, in part fueled by mutations, to affect origination and extinction rates. To explain the observed pattern, we propose that the ability to evolve and maintain behavior such as hibernation, torpor and burrowing, collectively termed "sleep-or-hide" (SLOH) behavior, serves as a means of environmental buffering during expected and unexpected environmental change. SLOH behavior is more common in some small mammals, and, as a result, SLOH small mammals contribute to higher average survivorship and lower origination probabilities among small mammals.


Subject(s)
Biological Evolution , Extinction, Biological , Fossils , Mammals , Animals , Databases, Factual
10.
BMC Bioinformatics ; 8 Suppl 2: S9, 2007 May 03.
Article in English | MEDLINE | ID: mdl-17493258

ABSTRACT

BACKGROUND: Haplotype Reconstruction is the problem of resolving the hidden phase information in genotype data obtained from laboratory measurements. Solving this problem is an important intermediate step in gene association studies, which seek to uncover the genetic basis of complex diseases. We propose a novel approach for haplotype reconstruction based on constrained hidden Markov models. Models are constructed by incrementally refining and regularizing the structure of a simple generative model for genotype data under Hardy-Weinberg equilibrium. RESULTS: The proposed method is evaluated on real-world and simulated population data. Results show that it is competitive with other recently proposed methods in terms of reconstruction accuracy, while offering a particularly good trade-off between computational costs and quality of results for large datasets. CONCLUSION: Relatively simple probabilistic approaches for haplotype reconstruction based on structured hidden Markov models are competitive with more complex, well-established techniques in this field.


Subject(s)
Artificial Intelligence , Chromosome Mapping/methods , DNA Mutational Analysis/methods , Genetics, Population , Models, Genetic , Pattern Recognition, Automated/methods , Sequence Analysis, DNA/methods , Algorithms , Base Sequence , Genetic Linkage/genetics , Haplotypes , Markov Chains , Models, Statistical , Molecular Sequence Data , Polymorphism, Single Nucleotide/genetics
11.
BMC Bioinformatics ; 8: 171, 2007 May 23.
Article in English | MEDLINE | ID: mdl-17521423

ABSTRACT

BACKGROUND: There exist many segmentation techniques for genomic sequences, and the segmentations can also be based on many different biological features. We show how to evaluate and compare the quality of segmentations obtained by different techniques and alternative biological features. RESULTS: We apply randomization techniques for evaluating the quality of a given segmentation. Our example applications include isochore detection and the discovery of coding-noncoding structure. We obtain segmentations of relevant sequences by applying different techniques, and use alternative features to segment on. We show that some of the obtained segmentations are very similar to the underlying true segmentations, and this similarity is statistically significant. For some other segmentations, we show that equally good results are likely to appear by chance. CONCLUSION: We introduce a framework for evaluating segmentation quality, and demonstrate its use on two examples of segmental genomic structures. We transform the process of quality evaluation from simply viewing the segmentations, to obtaining p-values denoting significance of segmentation similarity.


Subject(s)
Algorithms , Chromosome Mapping/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Base Sequence , Computer Simulation , Data Interpretation, Statistical , Models, Genetic , Models, Statistical , Molecular Sequence Data , Sequence Homology, Nucleic Acid
12.
Gene ; 394(1-2): 53-60, 2007 Jun 01.
Article in English | MEDLINE | ID: mdl-17389148

ABSTRACT

The isochore structure of a genome is observable by variation in the G+C (guanine and cytosine) content within and between the chromosomes. Describing the isochore structure of vertebrate genomes is a challenging task, and many computational methods have been developed and applied to it. Here we apply a well-known least-squares optimal segmentation algorithm to isochore discovery. The algorithm finds the best division of the sequence into k pieces, such that the segments are internally as homogeneous as possible. We show how this simple segmentation method can be applied to isochore discovery using as input the G+C content of sliding windows on the sequence. To evaluate the performance of this segmentation technique on isochore detection, we present results from segmenting previously studied isochore regions of the human genome. Detailed results on the MHC locus, on parts of chromosomes 21 and 22, and on a 100 Mb region from chromosome 1 are similar to previously suggested isochore structures. We also give results on segmenting all 22 autosomal human chromosomes. An advantage of this technique is that oversegmentation of G+C rich regions can generally be avoided. This is because the technique concentrates on greater global, instead of smaller local, differences in the sequence composition. The effect is further emphasized by a log-transformation of the data that lowers the high variance that is observed in G+C rich regions. We conclude that the least-squares optimal segmentation method is computationally efficient and yields results close to previous biologically motivated isochore structures.


Subject(s)
Isochores/genetics , Algorithms , Chromosomes, Human/genetics , Chromosomes, Human, Pair 1/genetics , Chromosomes, Human, Pair 21/genetics , Chromosomes, Human, Pair 22/genetics , Chromosomes, Human, Pair 6/genetics , GC Rich Sequence , Genome, Human , Genomics/statistics & numerical data , Humans , Isochores/chemistry , Least-Squares Analysis , Major Histocompatibility Complex
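Illustrative note on record 12: the abstract describes a least-squares optimal segmentation that divides a sequence into k pieces so that the segments are internally as homogeneous as possible. The sketch below is the standard dynamic-programming solution to that problem, written under assumptions: the input is a list of numbers such as G+C content per sliding window, and the function name and return format are hypothetical. It omits the log-transformation mentioned in the abstract.

```python
def optimal_segmentation(values, k):
    """Split values into k contiguous segments minimising total squared error.

    Returns (total_error, starts), where starts lists each segment's first index.
    Requires 1 <= k <= len(values).
    """
    n = len(values)
    # Prefix sums make the squared error of any segment an O(1) lookup.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def sse(i, j):
        """Squared error of values[i:j] around its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[m][j]: least error for the first j values using exactly m segments.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):  # i is where the last segment starts
                c = cost[m - 1][i] + sse(i, j)
                if c < cost[m][j]:
                    cost[m][j], back[m][j] = c, i
    # Follow the back-pointers from the end to recover the segment starts.
    starts, j = [], n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    return cost[k][n], starts[::-1]
```

This version runs in O(k n²) time, which is practical for window-level data of moderate length. The global objective is what lets the method favour large-scale differences over small local ones, as the abstract notes.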
13.
PLoS Comput Biol ; 2(2): e6, 2006 Feb.
Article in English | MEDLINE | ID: mdl-16477311

ABSTRACT

Given a collection of fossil sites with data about the taxa that occur in each site, the task in biochronology is to find good estimates for the ages or ordering of sites. We describe a full probabilistic model for fossil data. The parameters of the model are natural: the ordering of the sites, the origination and extinction times for each taxon, and the probabilities of different types of errors. We show that the posterior distributions of these parameters can be estimated reliably by using Markov chain Monte Carlo techniques. The posterior distributions of the model parameters can be used to answer many different questions about the data, including seriation (finding the best ordering of the sites) and outlier detection. We demonstrate the usefulness of the model and estimation method on synthetic data and on real data on large late Cenozoic mammals. As an example, for the sites with a large number of occurrences of common genera, our methods give orderings whose correlation with geochronologic ages is 0.95.


Subject(s)
Computational Biology/methods , Paleontology/methods , Animals , Fossils , Likelihood Functions , Mammals , Markov Chains , Models, Biological , Models, Statistical , Models, Theoretical , Monte Carlo Method , Phylogeny , Population Dynamics , Probability , Time Factors
14.
Br J Haematol ; 119(4): 905-15, 2002 Dec.
Article in English | MEDLINE | ID: mdl-12472567

ABSTRACT

Mantle cell lymphoma (MCL) is a non-Hodgkin's lymphoma of B-cell lineage. The blastoid variant of MCL, characterized by high mitotic rate, is clinically more aggressive than common MCL. We used the cDNA array technology to examine the gene expression profiles of both blastoid variant and common MCL. The data was analysed by regression analysis, principal component analysis and the naive Bayes' classifier. Eight genes were identified as differentially deregulated between the two groups. Oncogenes CMYC, BCL2 and PIM1 were upregulated more frequently in the blastoid variant than in common MCL. This implied that the gp130-mediated signal transducer and activator of transcription 3 (STAT3) signalling pathway was involved in the blastoid variant transformation of MCL. Other differentially deregulated genes were TOP1, CD23, CD45, CD70 and NFATC. By using the eight differentially deregulated genes, we created a classifier to distinguish the blastoid variant from common MCL with high accuracy. We also identified 18 genes that were deregulated in both groups. Among them, BCL1, CALLA/CD10 and GRN were suggested to be oncogenes. The products of RGS1, RGS2, ANX2 and CD44H were suggested to promote tumour metastasis. CD66D was suggested to be a tumour suppressor gene.


Subject(s)
Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Lymphoma, Mantle-Cell/genetics , Aged , Aged, 80 and over , Bayes Theorem , DNA, Complementary/genetics , DNA, Neoplasm/genetics , Female , Humans , Immunophenotyping , Lymphoma, Mantle-Cell/immunology , Male , Middle Aged , Principal Component Analysis , Regression Analysis , Reverse Transcriptase Polymerase Chain Reaction
15.
Bioinformatics ; 18 Suppl 2: S211-8, 2002.
Article in English | MEDLINE | ID: mdl-12386005

ABSTRACT

The existence of whole genome sequences makes it possible to search for global structure in the genome. We consider modeling the occurrence frequencies of discrete patterns (such as starting points of ORFs or other interesting phenomena) along the genome. We use piecewise constant intensity models with varying number of pieces, and show how a reversible jump Markov Chain Monte Carlo (RJMCMC) method can be used to obtain an a posteriori distribution on the intensity of the patterns along the genome. We apply the method to modeling the occurrence of ORFs in the human genome. The results show that the chromosomes consist of 5-35 clearly distinct segments, and that the a posteriori number and length of the segments show significant variation. On the other hand, for the yeast genome the intensity of ORFs is nearly constant.


Subject(s)
Algorithms , Chromosome Mapping/methods , DNA Mutational Analysis/methods , Models, Genetic , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Computer Simulation , Genetic Variation/genetics , Markov Chains , Models, Statistical , Monte Carlo Method