ABSTRACT
This article discusses the results in Boys and Henderson (2004, Biometrics 60, 573-581), in which the authors propose a new approach to the classification of genomic DNA into a number of hidden Markov states with a variable order of dependency, potentially allowing for the high-throughput detection of structure within genomic DNA. Their article is likely to be an important point of departure for further modeling of this type. We question whether the genome of the bacteriophage lambda is the most appropriate example with which to demonstrate the method's effectiveness, and whether the method can be expected to carry over to genomes where there is only one direction of transcription and no operon structure. We also suggest a graphical display that seems to offer insight into the results. It would be interesting to see an analysis that uses the codon alphabet.
Subject(s)
Bayes Theorem , Sequence Analysis, DNA/statistics & numerical data , Algorithms , Bacteriophage lambda/genetics , Biometry , DNA, Viral/genetics , Genome, Viral , Markov Chains , Models, Genetic , Models, Statistical , Monte Carlo Method , Principal Component Analysis
ABSTRACT
We describe an alternative method for scoring the pairwise alignment of two biological sequences. Designed to overcome the bias due to the composition of the alignment, it measures the distance (in standard deviations) between the given alignment and the mean value of all other alignments that can be obtained by a permutation of either sequence. We demonstrate that the standard deviation can be calculated efficiently. In the ungapped case, the mean and standard deviation can be calculated exactly and in two steps: the first takes O(N) time, where N is the length of the sequence, and the second takes a fixed number of calculations, i.e., O(1) time. We argue that this statistic is a more consistent measure than a similarity score based upon a standard scoring matrix. Even in the ungapped case, the statistic proves in many cases to be more accurate than the commonly used gapped Z-score of FASTA (Pearson and Lipman, 1988), in which the sequence is matched against a random sample of the database. We demonstrate the use of the POZ-score as a secondary filter which screens out several well-known types of false positive, reducing the amount of manual screening to be done by the biologist.
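The two-step calculation described above can be illustrated with a minimal sketch. It assumes two equal-length sequences and uses the standard exact moments of a sum over a uniformly random permutation (the doubly centered form of the score table); this is not necessarily the authors' own derivation. The function names and the match/mismatch scores are hypothetical, and the sketch averages over all permutations, including the identity.

```python
from collections import Counter
from math import sqrt


def match_score(x, y):
    # Hypothetical toy scoring function; a real use would look up
    # a substitution matrix instead.
    return 2.0 if x == y else -1.0


def permutation_z_score(a, b, score=match_score):
    """Return (observed, mean, std, z) for the ungapped alignment of a and b.

    The mean and standard deviation are taken over all permutations of b.
    z is None when every permutation gives the same score (std == 0).
    """
    if len(a) != len(b):
        raise ValueError("ungapped alignment needs equal-length sequences")
    n = len(a)
    if n < 2:
        raise ValueError("need at least two positions")

    # Step 1, O(N): observed score and residue counts.
    observed = sum(score(x, y) for x, y in zip(a, b))
    count_a, count_b = Counter(a), Counter(b)

    # Step 2, O(1) for a fixed alphabet: moments from the counts alone.
    row_mean = {x: sum(count_b[y] * score(x, y) for y in count_b) / n
                for x in count_a}
    col_mean = {y: sum(count_a[x] * score(x, y) for x in count_a) / n
                for y in count_b}
    grand_mean = sum(count_a[x] * row_mean[x] for x in count_a) / n

    mean = n * grand_mean
    # Exact permutation variance: doubly centered scores, divided by N - 1.
    variance = sum(
        count_a[x] * count_b[y]
        * (score(x, y) - row_mean[x] - col_mean[y] + grand_mean) ** 2
        for x in count_a for y in count_b
    ) / (n - 1)
    std = sqrt(variance)

    z = (observed - mean) / std if std > 0 else None
    return observed, mean, std, z


if __name__ == "__main__":
    print(permutation_z_score("ACGTA", "AGCTT"))
```

Because the second step depends only on the residue counts, its cost is bounded by the square of the alphabet size regardless of N, which matches the O(N) plus O(1) split described in the abstract.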
Subject(s)
Algorithms , Sequence Alignment/statistics & numerical data , Amino Acid Sequence , Computational Biology , Databases, Protein , Proteins/genetics , Sequence Homology, Amino Acid
ABSTRACT
The Kappa class of GSTs (glutathione transferases) comprises soluble enzymes originally isolated from the mitochondrial matrix of rats. We have characterized a Kappa class cDNA from human breast. The cDNA is derived from a single gene comprising eight exons and seven introns located on chromosome 7q34-35. Recombinant hGSTK1-1 was expressed in Escherichia coli as a homodimer (subunit molecular mass approximately 25.5 kDa). Significant glutathione-conjugating activity was found only with the model substrate CDNB (1-chloro-2,4-dinitrobenzene). Hyperbolic kinetics were obtained for GSH (parameters: K(m)app, 3.3 ± 0.95 mM; V(max)app, 21.4 ± 1.8 µmol/min per mg of enzyme), while sigmoidal kinetics were obtained for CDNB (parameters: S0.5app, 1.5 ± 1.0 mM; V(max)app, 40.3 ± 0.3 µmol/min per mg of enzyme; Hill coefficient, 1.3), reflecting low affinities for both substrates. Sequence analyses, homology modelling and secondary structure predictions show that hGSTK1 has (a) most similarity to bacterial HCCA (2-hydroxychromene-2-carboxylate) isomerases and (b) a predicted C-terminal domain structure that is almost identical to that of bacterial disulphide-bond-forming DsbA oxidoreductase (root mean square deviation 0.5-0.6 Å). The structures of hGSTK1 and HCCA isomerase are predicted to possess a thioredoxin fold with a polyhelical domain (alpha(x)) embedded between the beta-strands (betaalphabetaalpha(x)betabetaalpha, where the underlined elements represent the N and C motifs of the thioredoxin fold), as occurs in the bacterial disulphide-bond-forming oxidoreductases. This is in contrast with the cytosolic GSTs, where the helical domain occurs exclusively at the C-terminus (betaalphabetaalphabetabetaalphaalpha(x)). Although hGSTK1-1 catalyses some typical GST reactions, we propose that it is structurally distinct from other classes of cytosolic GSTs.
The present study suggests that the Kappa class may have arisen in prokaryotes well before the divergence of the cytosolic GSTs.