1.
Cytometry A ; 89(1): 44-58, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26097104

ABSTRACT

Many methods have been described for automated clustering analysis of complex flow cytometry data, but so far the goal to efficiently estimate multivariate densities and their modes for a moderate number of dimensions and potentially millions of data points has not been attained. We have devised a novel approach to describing modes using second order polynomial histogram estimators (SOPHE). The method divides the data into multivariate bins and determines the shape of the data in each bin based on second order polynomials, which is an efficient computation. These calculations yield local maxima and allow joining of adjacent bins to identify clusters. The use of second order polynomials also optimally uses wide bins, such that in most cases each parameter (dimension) need only be divided into 4-8 bins, again reducing computational load. We have validated this method using defined mixtures of up to 17 fluorescent beads in 16 dimensions, correctly identifying all populations in data files of 100,000 beads in <10 s, on a standard laptop. The method also correctly clustered granulocytes, lymphocytes, including standard T, B, and NK cell subsets, and monocytes in 9-color stained peripheral blood, within seconds. SOPHE successfully clustered up to 36 subsets of memory CD4 T cells using differentiation and trafficking markers, in 14-color flow analysis, and up to 65 subpopulations of PBMC in 33-dimensional CyTOF data, showing its usefulness in discovery research. SOPHE has the potential to greatly increase efficiency of analysing complex mixtures of cells in higher dimensions.
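The bin-and-join idea in this abstract can be illustrated with a much simpler stand-in. The sketch below is not SOPHE: it uses plain bin counts (a zeroth-order histogram) instead of second order polynomial fits, and it joins bins by steepest ascent to the fullest adjacent bin. The function names, the shared `lo`/`hi` range for every dimension, and the requirement that all data lie inside that range are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

def bin_points(points, lo, hi, bins):
    """Count points per multivariate bin (assumes every coordinate lies in [lo, hi])."""
    width = (hi - lo) / bins
    counts = Counter()
    for p in points:
        # Clamp so that a coordinate equal to `hi` falls in the last bin.
        idx = tuple(min(int((x - lo) / width), bins - 1) for x in p)
        counts[idx] += 1
    return counts

def neighbours(idx):
    """Yield every bin index adjacent to `idx`, including diagonal neighbours."""
    for offset in product((-1, 0, 1), repeat=len(idx)):
        if any(offset):
            yield tuple(i + o for i, o in zip(idx, offset))

def climb(idx, counts):
    """Move to the fullest adjacent bin until no neighbour holds more points."""
    while True:
        best = max(neighbours(idx), key=lambda n: counts.get(n, 0))
        if counts.get(best, 0) <= counts[idx]:
            return idx  # local maximum of the histogram
        idx = best

def cluster_bins(counts):
    """Map each occupied bin to the local-maximum bin it climbs to."""
    return {idx: climb(idx, counts) for idx in counts}

# Hypothetical one-dimensional data with two groups, split into 4 bins.
data = [(0.1,), (0.15,), (0.2,), (0.3,), (0.7,), (0.8,), (0.85,), (0.9,)]
labels = cluster_bins(bin_points(data, lo=0.0, hi=1.0, bins=4))
```

Bins that share a label form one cluster. Adjacent bins with equal counts each remain their own mode in this simplified version, a case that a polynomial description of the density within each bin would handle more gracefully.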


Subject(s)
Cluster Analysis , Computational Biology/methods , Flow Cytometry/methods , Adult , Algorithms , B-Lymphocytes/cytology , Biomarkers/analysis , Data Interpretation, Statistical , Electronic Data Processing/methods , Granulocytes/cytology , Humans , Killer Cells, Natural/cytology , T-Lymphocyte Subsets/cytology
2.
Stat Appl Genet Mol Biol ; 11(1): Article 3, 2012.
Article in English | MEDLINE | ID: mdl-22624182

ABSTRACT

The D2 statistic, defined as the number of matches of words of some pre-specified length k, is a computationally fast alignment-free measure of biological sequence similarity. However, there is some debate about its suitability for this purpose, as the variability in D2 may be dominated by the terms that reflect the noise in each of the single sequences only. We examine the extent of the problem and the effectiveness of overcoming it by using two mean-centred variants of this statistic, D2* and D2c. We conclude that all three statistics are potentially useful measures of sequence similarity, for which reasonably accurate p-values can be estimated under a null hypothesis of sequences composed of identically and independently distributed letters. We show that D2 and D2c, and to a somewhat lesser extent D2*, perform well in tests to classify moderate length query sequences as putative cis-regulatory modules.
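As a minimal sketch of the definition given in this abstract, the uncentred statistic can be computed by counting every word of length k in each sequence and summing the products of the counts over shared words, which equals the number of word matches between the two sequences. The function names are hypothetical, and the mean-centred variants are not implemented here because the abstract does not give their formulas.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def d2(seq_a, seq_b, k):
    """Number of k-word matches between seq_a and seq_b."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Each occurrence in seq_a matches each occurrence of the same word in seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())

print(d2("ACGT", "ACGT", 2))  # words AC, CG, GT each match once: 3
```

The cost is linear in the total sequence length, which is why the measure is described as computationally fast compared with alignment.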


Subject(s)
Sequence Alignment , Sequence Analysis, DNA/methods , Base Sequence , Databases, Factual , Sequence Analysis, DNA/statistics & numerical data
3.
J Theor Biol ; 262(3): 383-90, 2010 Feb 07.
Article in English | MEDLINE | ID: mdl-19854205

ABSTRACT

We derive a new continuous free energy formula for protein folding. We obtain the formula first by adding hydrophobic effect to a classical free energy formula for cavities in water. We then obtain the same formula by geometrically pursuing the structure that fits best the well-known global geometric features of native structures of globular proteins: (1) high density; (2) small surface area; (3) hydrophobic core; (4) forming domains for long polypeptide chains. Conformations of a protein are presented as an all-atom CPK model $P = \bigcup_{i=1}^{N} B(x_i, r_i)$, where each atom is a ball $B(x_i, r_i)$. All conformations satisfy generally defined steric conditions. For each conformation $P$ of a globular protein, there is a closed thermodynamic system $\Omega(P) \supset P$ bounded by the molecular surface $M(P)$. Both methods derive the same free energy $aV(P) + bA(P) + cW(P)$, where $a, b, c > 0$, and $V(P)$, $A(P)$, and $W(P)$ are the volume of $\Omega(P)$, the area of $M(P)$, and the area of the hydrophobic surface $W(P) \subset M(P)$, which quantifies hydrophobic effect. Minimizing $W(P)$ is sufficient to produce statistically significant native-like secondary structures and hydrogen bonds in the proteins we simulated.
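The two ingredients named in this abstract, a conformation modelled as a union of atomic balls and a free energy that is linear in volume, surface area, and hydrophobic area, can be sketched as follows. The class and function names are hypothetical, the numeric values are made up, and the sketch does not compute the volume, molecular surface, or hydrophobic surface; those are taken as inputs because the abstract does not say how they are obtained.

```python
from dataclasses import dataclass
from math import dist

@dataclass(frozen=True)
class Ball:
    """One atom of the all-atom model: centre x_i and radius r_i."""
    center: tuple
    radius: float

def in_union(point, balls):
    """True if point lies in P, the union of all atomic balls."""
    return any(dist(point, b.center) <= b.radius for b in balls)

def free_energy(volume, area, hydrophobic_area, a, b, c):
    """Evaluate a*V + b*A + c*W with strictly positive coefficients."""
    if min(a, b, c) <= 0:
        raise ValueError("a, b and c must all be positive")
    return a * volume + b * area + c * hydrophobic_area

# Hypothetical two-atom conformation and made-up geometric quantities.
atoms = [Ball((0.0, 0.0, 0.0), 1.0), Ball((1.5, 0.0, 0.0), 1.0)]
inside = in_union((2.0, 0.0, 0.0), atoms)
energy = free_energy(2.0, 3.0, 1.0, a=1.0, b=0.5, c=2.0)
```

Because all three coefficients are positive, lowering any of the three geometric terms lowers the energy, which is consistent with the compactness and hydrophobic-core features listed in the abstract.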


Subject(s)
Protein Folding , Proteins/chemistry , Hydrophobic and Hydrophilic Interactions , Models, Chemical , Molecular Dynamics Simulation , Software , Thermodynamics