ABSTRACT
In this paper, we compare the performance of two iterative clustering methods when applied to an extensive data set describing strains of the bacterial family Enterobacteriaceae. In both methods, the classification (i.e. the number of classes and the partitioning) is determined by minimizing stochastic complexity. The first method performs the minimization by repeated application of the generalized Lloyd algorithm (GLA). The second method uses an optimization technique known as local search (LS). LS modifies the current solution by making global changes to the class structure and then performs local fine-tuning to find a local optimum. When the number of classes is fixed, LS finds a classification with a lower stochastic complexity value than GLA. In addition, the variance of the solutions is much smaller for LS due to its more systematic method of searching. Overall, the two algorithms produce similar classifications, but they merge certain natural classes with microbiological relevance in different ways.
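The two search strategies described above can be sketched as follows. This is only an illustration under stated assumptions: strains are taken to be binary feature vectors, `code_length` is a simplified description length with Laplace-smoothed Bernoulli estimates (a stand-in, not the stochastic complexity formula of the paper), and the perturbation in `local_search` (relabelling one randomly chosen item) is a hypothetical substitute for the paper's global changes to the class structure. All function names are invented for this example.

```python
import math
import random


def code_length(data, labels, k):
    """Simplified description length in bits (assumed stand-in for SC)."""
    n, d = len(data), len(data[0])
    bits = 0.0
    for c in range(k):
        members = [x for x, lab in zip(data, labels) if lab == c]
        m = len(members)
        if m == 0:
            continue  # empty classes cost nothing in this sketch
        bits -= m * math.log2(m / n)  # bits for the class labels
        for i in range(d):
            ones = sum(x[i] for x in members)
            p = (ones + 1) / (m + 2)  # Laplace-smoothed estimate
            bits -= ones * math.log2(p) + (m - ones) * math.log2(1 - p)
    return bits


def gla(data, labels, k, max_iter=50):
    """Lloyd-style alternation: estimate class parameters, then reassign."""
    n, d = len(data), len(data[0])
    labels = list(labels)
    for _ in range(max_iter):
        params = []
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            m = len(members)
            weight = (m + 1) / (n + k)  # smoothed class weight
            probs = [(sum(x[i] for x in members) + 1) / (m + 2)
                     for i in range(d)]
            params.append((weight, probs))

        def cost(x, c):
            # Bits needed to encode item x if it is placed in class c.
            weight, probs = params[c]
            return -math.log2(weight) - sum(
                math.log2(p if xi else 1 - p) for xi, p in zip(x, probs))

        new_labels = [min(range(k), key=lambda c: cost(x, c)) for x in data]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


def local_search(data, labels, k, steps=100, seed=0):
    """Perturb, fine-tune with gla, keep the result only if it is shorter."""
    rng = random.Random(seed)
    best = gla(data, labels, k)
    best_bits = code_length(data, best, k)
    for _ in range(steps):
        trial = list(best)
        trial[rng.randrange(len(data))] = rng.randrange(k)  # assumed move
        trial = gla(data, trial, k)
        bits = code_length(data, trial, k)
        if bits < best_bits:
            best, best_bits = trial, bits
    return best
```

Because `local_search` starts from the `gla` result and accepts only improvements, its code length in this sketch is never worse than that of a single `gla` run from the same starting labels.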
Subject(s)
Algorithms , Bacteria/classification , Cluster Analysis , Enterobacteriaceae/classification , Stochastic Processes
ABSTRACT
In this paper we give a mathematically precise formulation of an old idea in bacterial taxonomy, namely cumulative classification, where the taxonomy is continuously updated and possibly augmented as new strains are identified. Our formulation is based on Bayesian predictive probability distributions. The criterion for founding a new taxon rests on a firm theoretical foundation based on prediction and has a clear-cut interpretation. We formulate an algorithm for cumulative classification and apply it to a large database of bacteria belonging to the family Enterobacteriaceae. The resulting taxonomy makes microbiological sense.
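The idea of founding a new taxon on predictive grounds can be illustrated with a minimal sketch. The assumptions are ours, not the paper's: strains are binary feature vectors, each feature has a uniform Beta(1, 1) prior, an existing taxon with m members is weighted by m / (n + 1), and a new taxon is weighted by 1 / (n + 1). The names `predictive_log2` and `cumulative_classify` are hypothetical.

```python
import math


def predictive_log2(x, counts, m):
    """log2 predictive probability of x for a taxon with m members.

    counts[i] is the number of members with feature i equal to 1;
    a uniform Beta(1, 1) prior on each feature is assumed.
    """
    return sum(
        math.log2((c + 1) / (m + 2)) if xi
        else math.log2((m - c + 1) / (m + 2))
        for xi, c in zip(x, counts))


def cumulative_classify(strains):
    """Process strains one at a time, founding a new taxon when it predicts better."""
    taxa = []    # each taxon: {"m": member count, "counts": ones per feature}
    labels = []
    for x in strains:
        n, d = len(labels), len(x)
        # Score of founding a new taxon: assumed weight 1/(n+1), and a
        # predictive probability of 1/2 per feature for an empty taxon.
        best, best_score = len(taxa), math.log2(1 / (n + 1)) - d
        for j, t in enumerate(taxa):
            score = (math.log2(t["m"] / (n + 1))
                     + predictive_log2(x, t["counts"], t["m"]))
            if score > best_score:
                best, best_score = j, score
        if best == len(taxa):
            taxa.append({"m": 0, "counts": [0] * d})
        taxa[best]["m"] += 1
        taxa[best]["counts"] = [c + xi
                                for c, xi in zip(taxa[best]["counts"], x)]
        labels.append(best)
    return labels
```

In this sketch a strain is assigned to whichever option, an existing taxon or a newly founded one, gives it the highest weighted predictive probability, so the taxonomy grows only when no existing taxon predicts the strain well.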
Subject(s)
Bacteria/classification , Bayes Theorem , Databases, Factual , Enterobacteriaceae/classification , Models, Statistical
ABSTRACT
In this paper we propose a method of constructing a hierarchical classification based on the notion of stochastic complexity. Minimization of stochastic complexity amounts to maximization of the information content of the classification. A dendrogram is obtained by first finding the classification which minimizes stochastic complexity and then by step-wise merging of groups such that at each step there is a minimum loss of information. The method was applied to a database containing 5313 strains of Enterobacteriaceae. The results are in reasonable accordance with present-day views on the taxonomy of Enterobacteriaceae.
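The step-wise merging described above can be sketched as a greedy loop. As an assumption for illustration, the information loss of a merge is measured as the increase in a simplified description length (Laplace-smoothed Bernoulli code for binary features), which is a stand-in for the stochastic complexity used in the paper. The names `class_bits` and `merge_dendrogram` are hypothetical.

```python
import math
from itertools import combinations


def class_bits(members, n):
    """Bits to encode one class of binary vectors (simplified, assumed model)."""
    m, d = len(members), len(members[0])
    bits = -m * math.log2(m / n)  # bits for the class labels
    for i in range(d):
        ones = sum(x[i] for x in members)
        p = (ones + 1) / (m + 2)  # Laplace-smoothed estimate
        bits -= ones * math.log2(p) + (m - ones) * math.log2(1 - p)
    return bits


def merge_dendrogram(groups):
    """groups: dict mapping a group name (str) to a list of binary vectors.

    Returns the merge sequence as (name_a, name_b, increase_in_bits).
    """
    groups = dict(groups)
    n = sum(len(g) for g in groups.values())
    merges = []
    while len(groups) > 1:
        def loss(pair):
            a, b = pair
            return (class_bits(groups[a] + groups[b], n)
                    - class_bits(groups[a], n)
                    - class_bits(groups[b], n))

        # Merge the pair whose union costs the fewest additional bits.
        a, b = min(combinations(sorted(groups), 2), key=loss)
        merges.append((a, b, loss((a, b))))
        groups[a + "+" + b] = groups.pop(a) + groups.pop(b)
    return merges
```

Read from first to last, the returned sequence gives the dendrogram from the leaves upward. If the starting groups do not minimize the description length, the recorded increase for an early merge can be negative in this simplified model.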
Subject(s)
Enterobacteriaceae/classification , Stochastic Processes
ABSTRACT
A new method for classifying bacteria is presented and applied to a large set of biochemical data for the Enterobacteriaceae. The method minimizes the bits needed to encode the classes and the items or, equivalently, maximizes the information content of the classification. The resulting taxonomy of Enterobacteriaceae corresponds well to the general structure of earlier classifications. Minimization of stochastic complexity can be considered as a useful tool to create bacterial classifications that are optimal from the point of view of information theory.
ABSTRACT
The bacterial species concept has different bearings. It is used to define "natural" entities with low intra-group variation, but also to serve more subjective purposes. One of the problems in Streptomyces taxonomy is that it applies the species concept in both ways, i.e. both to clarify natural relationships and to protect potential (bio)technological inventions. The latter usage has introduced a streptomycetal "technospecies" which may require definition and description in other terms and by other tools than the "nomentaxospecies" which represent a more objective approach to Streptomyces taxonomy. Genetic engineering creates "man-made" microorganisms which are characterized by completely different sets of criteria from those of their natural counterparts, which may imply a need for different taxonomies for the two kinds of organisms. However, since they may occur side by side in one environment, both "man-made" and "natural" streptomycetes have to be identified and classified by the same methods and tools, but in a way that allows their separation.
Subject(s)
Streptomyces/classification
ABSTRACT
The reported work describes experiments on automatic identification based on a collection of 636 clinical isolates of bacteria. It is shown that three different and mutually independent automatic identification principles produce almost identical identification conclusions. It is also shown that an automatic procedure which continuously corrects the basic classification gives results which are independent of the classification background. Automatic correction logic develops conventionally based as well as numerically based initial reference systems toward nearly the same solutions. This may indicate that automated identification methods based on numerical classifications possess general validity.