Results 1 - 12 of 12
1.
Neural Comput ; 13(12): 2681-708, 2001 Dec.
Article in English | MEDLINE | ID: mdl-11705407

ABSTRACT

The detection of a specific stochastic pattern embedded in unknown background noise is a difficult pattern recognition problem, encountered in many applications such as word spotting in speech. A similar problem arises when trying to detect a multineural spike pattern in a single electrical recording, embedded in the complex cortical activity of a behaving animal. Solving this problem is crucial for identifying neuronal code words with specific meaning. The technical difficulty of this detection lies in the lack of a good statistical model for the background activity, which changes rapidly with the recording conditions and the activity of the animal. This work introduces the use of an adversary background model. This model assumes that the background "knows" the pattern sought, up to first-order statistics, and this "knowledge" creates a background composed of all the permutations of our pattern. We show that this background model is tightly connected to the type-based information-theoretic approach. Furthermore, we show that computing the likelihood ratio amounts to decomposing the log-likelihood distribution according to the types of the empirical counts. We demonstrate the application of this method to the detection of reward patterns in the basal ganglia of behaving monkeys, yielding some unexpected biological results.
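A minimal sketch of the type-based idea, assuming the pattern is a short sequence of discrete symbols (for example, the identities of the neurons that fired) and reading the adversary background as uniform over all distinct permutations of the pattern. The helper names are hypothetical; this illustrates empirical types and type classes, not the authors' detector.

```python
from collections import Counter
from itertools import permutations
from math import lgamma, log

def empirical_type(seq):
    """Empirical type of a sequence: how many times each symbol occurs."""
    return Counter(seq)

def log_type_class_size(seq):
    """Natural log of the number of distinct rearrangements of seq: n! / prod(c_k!)."""
    counts = list(Counter(seq).values())
    n = sum(counts)
    return lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)

def log_prob_permutation_background(seq, pattern):
    """Log-probability of seq under a background that is uniform over the permutations
    of pattern (an assumed reading of the adversary model)."""
    if empirical_type(seq) != empirical_type(pattern):
        return float("-inf")  # seq is not a rearrangement of the pattern
    return -log_type_class_size(pattern)

def iid_log_likelihood(seq, probs):
    """Log-likelihood under an i.i.d. symbol model; it depends on seq only through its type."""
    return sum(c * log(probs[s]) for s, c in empirical_type(seq).items())

# Usage: all rearrangements of "AABC" share one type, so an i.i.d. model cannot tell them apart.
probs = {"A": 0.5, "B": 0.3, "C": 0.2}
rearrangements = {"".join(p) for p in permutations("AABC")}
print(len(rearrangements))  # 12 distinct permutations
print(len({round(iid_log_likelihood(r, probs), 12) for r in rearrangements}))  # 1 distinct value
```

Only the arrangement of symbols, not their counts, can then distinguish the pattern from such a background, which is why the decomposition by types is natural here.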


Subjects
Algorithms; Basal Ganglia/physiology; Models, Neurological; Pattern Recognition, Automated; Reward; Action Potentials; Animals; Chlorocebus aethiops; Likelihood Functions; Poisson Distribution; Stochastic Processes
2.
Neural Comput ; 13(11): 2409-63, 2001 Nov.
Article in English | MEDLINE | ID: mdl-11674845

ABSTRACT

We define predictive information I_pred(T) as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T: I_pred(T) can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then I_pred(T) grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power-law growth is associated, for example, with the learning of infinite-parameter (or nonparametric) models such as continuous functions with smoothness constraints. There are connections between the predictive information and measures of complexity that have been defined both in learning theory and in the analysis of physical systems through statistical mechanics and dynamical systems theory. Furthermore, just as entropy provides the unique measure of available information consistent with some simple and plausible conditions, we argue that the divergent part of I_pred(T) provides the unique measure for the complexity of the dynamics underlying a time series. Finally, we discuss how these ideas may be useful in problems in physics, statistics, and biology.
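A worked illustration of the "remain finite" case, assuming a stationary two-state first-order Markov chain in discrete time (our choice of example, not one taken from the paper). By the Markov property the past and the future interact only through the present state, so for every window length T >= 1 the predictive information equals the one-step mutual information I(X_0; X_1), which the sketch computes in bits. The function names are hypothetical.

```python
from math import log2

def stationary(p01, p10):
    """Stationary distribution of a 2-state chain that leaves state 0 w.p. p01 and state 1 w.p. p10."""
    total = p01 + p10
    if total == 0:
        return [0.5, 0.5]  # the chain never moves; a uniform start is assumed
    return [p10 / total, p01 / total]

def predictive_information_bits(p01, p10):
    """I(X_0; X_1) in bits for the stationary chain; for a first-order chain this is I_pred(T), T >= 1."""
    P = [[1 - p01, p01], [p10, 1 - p10]]  # transition matrix P[a][b] = P(X_1 = b | X_0 = a)
    pi = stationary(p01, p10)
    joint = [[pi[a] * P[a][b] for b in range(2)] for a in range(2)]
    next_marginal = [joint[0][b] + joint[1][b] for b in range(2)]
    info = 0.0
    for a in range(2):
        for b in range(2):
            if joint[a][b] > 0:
                info += joint[a][b] * log2(joint[a][b] / (pi[a] * next_marginal[b]))
    return info

print(predictive_information_bits(0.5, 0.5))  # 0.0: successive states are independent
print(round(predictive_information_bits(0.1, 0.1), 3))  # 0.531
print(predictive_information_bits(0.0, 0.0))  # 1.0: the present state fixes the whole future
```

Because the value does not depend on T, such a chain has no divergent part of I_pred(T); the logarithmic and power-law cases require models whose parameters must be learned from ever longer observations.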


Subjects
Information Theory; Learning/physiology; Models, Psychological; Forecasting; Humans
3.
Bioinformatics ; 17(10): 927-34, 2001 Oct.
Article in English | MEDLINE | ID: mdl-11673237

ABSTRACT

MOTIVATION: Characterization of a protein family by its distinct sequence domains is crucial for functional annotation and correct classification of newly discovered proteins. Conventional methods based on Multiple Sequence Alignment (MSA) have difficulty with heterogeneous groups of proteins. Moreover, even many protein families that do share a common domain contain instances of several other domains, without any common underlying linear ordering. Ignoring this modularity may lead to poor or even false classification results. An automated method that can decompose a group of proteins into the sequence domains it contains is therefore highly desirable. RESULTS: We apply a novel method to the problem of protein domain detection. The method takes as input an unaligned group of protein sequences. It segments them and clusters the segments into groups sharing the same underlying statistics. A Variable Memory Markov (VMM) model is built for each group of segments, using a Prediction Suffix Tree (PST) data structure. Refinement is achieved by letting the PSTs compete over the segments, and a deterministic annealing framework infers the number of underlying PST models while avoiding many inferior solutions. We show that regions of similar statistics correlate well with protein sequence domains, by matching a unique signature to each domain. This is done in a fully automated manner and neither requires nor attempts an MSA. Several representative cases are analyzed. We identify a protein fusion event, refine an HMM superfamily classification into the underlying families that the HMM cannot separate, and detect all 12 instances of a short domain in a group of 396 sequences. CONTACT: jill@cs.huji.ac.il; tishby@cs.huji.ac.il.
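A minimal sketch of how a Prediction Suffix Tree can score a sequence and how several PSTs can "compete" for a segment, assuming each tree is stored as a dictionary from context strings to next-symbol distributions and always contains the empty (root) context. The trees and function names are hypothetical toy examples; learning the PSTs, segmenting the sequences, and the deterministic annealing step are not shown.

```python
from math import log

def pst_predict(tree, history, symbol):
    """P(symbol | history) using the longest suffix of history that is a context in the tree."""
    for start in range(len(history) + 1):
        context = history[start:]  # start=0 is the full history; the last context tried is ''
        if context in tree:
            return tree[context].get(symbol, 0.0)
    raise KeyError("the tree must contain the empty context ''")

def pst_log_likelihood(tree, seq, max_depth):
    """Log-likelihood of seq, looking back at most max_depth symbols at each position."""
    total = 0.0
    for i, symbol in enumerate(seq):
        p = pst_predict(tree, seq[max(0, i - max_depth):i], symbol)
        total += log(max(p, 1e-12))  # floor avoids log(0); a real PST would smooth instead
    return total

def best_model(trees, segment, max_depth):
    """Index of the tree that assigns the segment the highest likelihood."""
    return max(range(len(trees)), key=lambda k: pst_log_likelihood(trees[k], segment, max_depth))

# Two toy trees over the alphabet {A, C}: one with memory, one memoryless.
tree_alt = {"": {"A": 0.5, "C": 0.5}, "A": {"A": 0.1, "C": 0.9}, "CA": {"A": 0.8, "C": 0.2}}
tree_rich = {"": {"A": 0.9, "C": 0.1}}
print(best_model([tree_alt, tree_rich], "CACA", 2))  # 0
print(best_model([tree_alt, tree_rich], "AAAA", 2))  # 1
```

The variable memory shows in `pst_predict`: after "CA" the tree uses a two-symbol context, after a lone "A" only one symbol, and otherwise it falls back to the root distribution.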


Subjects
Peptide Mapping/statistics & numerical data; Proteins/genetics; Algorithms; Computational Biology; DNA Topoisomerases, Type II/genetics; Glutathione Transferase/genetics; Homeodomain Proteins/genetics; Markov Chains; Protein Structure, Tertiary; Sequence Alignment/statistics & numerical data; Transcription Factors/genetics
4.
Article in English | MEDLINE | ID: mdl-9783227

ABSTRACT

We investigate the space of all protein sequences. We combine the standard measures of similarity (SW, FASTA, BLAST) to associate with each sequence an exhaustive list of neighboring sequences. These lists induce a (weighted directed) graph whose vertices are the sequences. The weight of an edge connecting two sequences represents their degree of similarity. This graph encodes many of the fundamental properties of the sequence space. We look for clusters of related proteins in this graph. These clusters correspond to strongly connected sets of vertices. Two main ideas underlie our work: i) interesting homologies among proteins can be deduced by transitivity; ii) transitivity should be applied restrictively in order to prevent unrelated proteins from clustering together. Our analysis starts from a very conservative classification with many classes, based on very significant similarities. Subsequently, classes are merged to include less significant similarities. Merging is performed via a novel two-phase algorithm. First, the algorithm identifies groups of possibly related clusters (based on transitivity and strong connectivity) using local considerations, and merges them. Then, a global test is applied to identify nuclei of strong relationships within these groups of clusters, and the classification is refined accordingly. This process takes place at varying thresholds of statistical significance: at each step the algorithm is applied to the classes of the previous classification to obtain the next one, at the more permissive threshold. Consequently, a hierarchical organization of all proteins is obtained. The resulting classification splits the space of all protein sequences into well-defined groups of proteins. The results show that the automatically induced sets of proteins are closely correlated with natural biological families and superfamilies.
The hierarchical organization reveals finer subfamilies that make up known families of proteins, as well as many interesting relations between protein families. The proposed hierarchical organization may be considered the first map of the space of all protein sequences. An interactive web site including the results of our analysis has been constructed and is now accessible at http://www.protomap.cs.huji.ac.il
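To illustrate only the transitivity idea, here is a sketch of plain single-linkage clustering by union-find at a series of increasingly permissive E-value thresholds. Assumptions: the similarity graph is treated as undirected, each edge carries an E-value (smaller is more significant), and the function names are hypothetical. This is exactly the unrestricted transitivity that the abstract warns can merge unrelated proteins; the authors' two-phase merging with a global test is not reproduced.

```python
def clusters_at_threshold(n, edges, threshold):
    """Connected components of sequences 0..n-1 using only edges with E-value <= threshold."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, evalue in edges:
        if evalue <= threshold:
            parent[find(i)] = find(j)  # transitivity: join the two groups
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())

def hierarchy(n, edges, thresholds):
    """Classification at each threshold, from the most conservative to the most permissive."""
    return {t: clusters_at_threshold(n, edges, t) for t in sorted(thresholds)}

# Hypothetical E-values between five sequences.
edges = [(0, 1, 1e-50), (1, 2, 1e-20), (3, 4, 1e-30), (2, 3, 1e-2)]
for t, groups in hierarchy(5, edges, [1e-40, 1e-10, 1.0]).items():
    print(t, groups)  # groups become coarser as the threshold is relaxed
```

The single weak edge (2, 3) is enough to fuse both groups at the loosest threshold, which is the failure mode the restrictive merging in the paper is designed to avoid.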


Subjects
Proteins/classification; Proteins/genetics; Algorithms; Artificial Intelligence; Cluster Analysis; Databases, Factual; Proteins/chemistry; Sequence Alignment; Sequence Homology, Amino Acid
5.
J Mol Biol ; 268(2): 539-56, 1997 May 02.
Article in English | MEDLINE | ID: mdl-9159489

ABSTRACT

A global classification of all currently known protein sequences is performed. Every protein sequence is partitioned into segments of 50 amino acid residues, and a dynamic programming distance is calculated between each pair of segments. This space of segments is first embedded into Euclidean space. The algorithm that we apply embeds every finite metric space into Euclidean space so that (1) the dimension of the host space is small and (2) the metric distortion is small. A novel self-organized, cross-validated clustering algorithm is then applied to the embedded space with Euclidean distances. We monitor the validity of our clustering by randomly splitting the data into two parts and running a hierarchical clustering algorithm independently on each part. At every level of the hierarchy we cross-validate the clusters in one part against the clusters in the other. The resulting hierarchical tree of clusters offers a new representation of protein sequences and families, which compares favorably with the most up-to-date classifications based on functional and structural data about proteins. Some of the known families clustered into well-separated clusters. Motifs and domains such as the zinc finger, EF hand, homeobox, EGF-like and others are correctly identified automatically, and relations between protein families are revealed by examining the splits along the tree. This clustering leads to a novel representation of protein families, from which the functional biological kinship of protein families can be deduced, as demonstrated for the transporter family. Finally, we introduce a new concise representation for complete proteins that is very useful for presenting multiple alignments and for searching for close relatives in the database. The self-organization method presented is very general and applies to any data with a consistent and computable measure of similarity between data items.
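A minimal sketch of the first step, assuming non-overlapping segments (with any shorter tail kept as a final segment) and using the unit-cost Levenshtein edit distance as a stand-in for the paper's dynamic programming distance, whose scoring scheme the abstract does not specify. The function names are hypothetical; the Euclidean embedding and the cross-validated clustering are not shown.

```python
def segments(seq, length=50):
    """Split seq into consecutive, non-overlapping pieces of `length` residues (tail may be shorter)."""
    return [seq[i:i + length] for i in range(0, len(seq), length)]

def edit_distance(a, b):
    """Levenshtein distance by dynamic programming, keeping one row of the table at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]

def pairwise_distances(segs):
    """Symmetric matrix of edit distances between all pairs of segments."""
    n = len(segs)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = edit_distance(segs[i], segs[j])
    return d

segs = segments("MKT" * 40)  # a hypothetical 120-residue sequence
print([len(s) for s in segs])  # [50, 50, 20]
print(pairwise_distances(segs)[0][1])
```

The resulting distance matrix is the finite metric space that the abstract's embedding step would then map into a low-dimensional Euclidean space.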


Subjects
Cluster Analysis; Proteins/classification; Sequence Analysis/methods; Amino Acid Sequence; Animals; DNA-Binding Proteins/classification; Hemeproteins/classification; Homeodomain Proteins/classification; Humans; Metalloproteins/classification; Molecular Sequence Data; Zinc Fingers
6.
Proc Natl Acad Sci U S A ; 92(19): 8616-20, 1995 Sep 12.
Article in English | MEDLINE | ID: mdl-7567985

ABSTRACT

Parallel recordings of spike trains of several single cortical neurons in behaving monkeys were analyzed as a hidden Markov process. The parallel spike trains were considered as a multivariate Poisson process whose vector firing rates change with time. As a consequence of this approach, the complete recording can be segmented into a sequence of a few statistically discriminated hidden states, whose dynamics are modeled as a first-order Markov chain. The biological validity and benefits of this approach were examined in several independent ways: (i) the statistical consistency of the segmentation and its correspondence to the behavior of the animals; (ii) direct measurement of the collective flips of activity obtained by the model; and (iii) the relation between the segmentation and the pairwise short-term cross-correlations between the recorded spike trains. Comparison with surrogate data was also carried out for each of the above examinations to establish their significance. Our results indicated the existence of well-separated states of activity, within which the firing rates were approximately stationary. With our present data we could reliably discriminate six to eight such states. The transitions between states were fast and were associated with concomitant changes in the firing rates of several neurons. Different behavioral modes and stimuli were consistently reflected by different states of neural activity. Moreover, the pairwise correlations between neurons varied considerably between the different states, supporting the hypothesis that these distinct states were brought about by the cooperative action of many neurons.
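A minimal sketch of how such a model segments a recording, assuming spike counts in fixed time bins, independent Poisson counts for each neuron given the hidden state, and hand-set rates and transition probabilities (in practice these would have to be estimated from the recording; that step is not shown). The Viterbi algorithm below returns the most probable sequence of hidden states; all names and numbers are hypothetical.

```python
from math import lgamma, log

def poisson_log_pmf(k, rate):
    """log P(k spikes) for a Poisson count with mean `rate` (rate must be > 0)."""
    return k * log(rate) - rate - lgamma(k + 1)

def viterbi_poisson(counts, rates, trans, init):
    """Most probable hidden-state path for binned multineuron spike counts.

    counts[t][n]: spikes of neuron n in bin t
    rates[s][n]:  expected count of neuron n per bin in state s
    trans[s][u]:  P(state u at t+1 | state s at t); all entries must be > 0 here
    init[s]:      P(state s at t = 0)
    """
    n_states = len(rates)

    def emit(s, obs):
        # Neurons are assumed independent given the state, so log-probabilities add.
        return sum(poisson_log_pmf(k, r) for k, r in zip(obs, rates[s]))

    score = [log(init[s]) + emit(s, counts[0]) for s in range(n_states)]
    backpointers = []
    for obs in counts[1:]:
        new_score, pointers = [], []
        for u in range(n_states):
            best = max(range(n_states), key=lambda s: score[s] + log(trans[s][u]))
            new_score.append(score[best] + log(trans[best][u]) + emit(u, obs))
            pointers.append(best)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the most probable final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Two hypothetical states: in state 0 neuron 1 fires fast, in state 1 neuron 0 does.
rates = [[1.0, 8.0], [8.0, 1.0]]
trans = [[0.9, 0.1], [0.1, 0.9]]
init = [0.5, 0.5]
counts = [[0, 9], [1, 7], [2, 8], [9, 1], [7, 0], [8, 2]]
print(viterbi_poisson(counts, rates, trans, init))  # [0, 0, 0, 1, 1, 1]
```

The single switch in the decoded path corresponds to a collective change in the firing rates of both neurons, the kind of fast state transition the abstract reports.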


Subjects
Cerebral Cortex/physiology; Models, Neurological; Nerve Net; Animals; Behavior, Animal/physiology; Cerebral Cortex/cytology; Haplorhini; Higher Nervous Activity; Markov Chains; Neurons/physiology
7.
Biol Cybern ; 71(3): 227-37, 1994.
Article in English | MEDLINE | ID: mdl-7918801

ABSTRACT

A model-based approach to on-line cursive handwriting analysis and recognition is presented and evaluated. In this model, on-line handwriting is considered as a modulation of a simple cycloidal pen motion, described by two coupled oscillations with a constant linear drift along the line of the writing. By slow modulations of the amplitudes and phase lags of the two oscillators, a general pen trajectory can be efficiently encoded. These parameters are then quantized into a small number of values without affecting the intelligibility of the writing. A general procedure for the estimation and quantization of these cycloidal motion parameters for arbitrary handwriting is presented. The result is a discrete motor control representation of the continuous pen motion, via the quantized levels of the model parameters. This motor control representation enables successful word spotting and matching of cursive scripts. Our experiments clearly indicate the potential of this dynamic representation for complete cursive handwriting recognition.
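A minimal sketch of the pen-motion model as the abstract describes it: a horizontal and a vertical oscillation plus a constant horizontal drift along the writing line. The exact parameterization (a shared angular frequency `omega`, amplitudes `ax` and `ay`, phases `phx` and `phy`, drift speed `drift`) is our assumption, as is the nearest-level quantizer; the authors' estimation procedure and slow modulation of the parameters are not reproduced.

```python
from math import pi, sin

def pen_position(t, ax, ay, phx, phy, omega, drift):
    """Pen position at time t: coupled horizontal/vertical oscillations plus horizontal drift."""
    x = ax * sin(omega * t + phx) + drift * t
    y = ay * sin(omega * t + phy)
    return x, y

def trajectory(n_steps, dt, **params):
    """Sample the pen position at n_steps equally spaced times."""
    return [pen_position(k * dt, **params) for k in range(n_steps)]

def quantize(value, levels):
    """Replace a continuous parameter by the nearest of a few allowed levels."""
    return min(levels, key=lambda q: abs(q - value))

# Hypothetical values: the horizontal oscillation speed (ax * omega, about 6.3) exceeds
# the drift, so x periodically moves backward and the trace forms loops.
params = dict(ax=1.0, ay=2.0, phx=0.0, phy=pi / 2, omega=2 * pi, drift=0.8)
points = trajectory(100, 0.01, **params)
print(quantize(0.37, [0.0, 0.5, 1.0]))  # 0.5
```

In this representation, a word becomes a sequence of quantized amplitude and phase levels rather than a raw list of pen coordinates.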


Subjects
Computer Simulation; Handwriting; Models, Neurological; Pattern Recognition, Automated; Humans; Motion; Motor Activity/physiology; Time Factors
8.
Phys Rev A ; 45(8): 6056-6091, 1992 Apr 15.
Article in English | MEDLINE | ID: mdl-9907706
9.
Phys Rev Lett ; 65(13): 1683-1686, 1990 Sep 24.
Article in English | MEDLINE | ID: mdl-10042332
11.
Phys Rev A Gen Phys ; 36(10): 4957-4967, 1987 Nov 15.
Article in English | MEDLINE | ID: mdl-9898755
12.
Phys Rev Lett ; 58(6): 527-530, 1987 Feb 09.
Article in English | MEDLINE | ID: mdl-10034964