Results 1 - 10 of 10
1.
J Assoc Inf Sci Technol ; 66(9): 1847-1856, 2015 Sep 01.
Article in English | MEDLINE | ID: mdl-26478903

ABSTRACT

We analyze access statistics of 150 blog entries and news articles over periods of up to 3 years. The access rate falls as an inverse power of the time elapsed since publication. The power law holds for periods of up to 1,000 days. The exponents differ between blogs and are distributed between 0.6 and 3.2. We argue that the decay of attention to a web article is caused by the link to it first dropping down the list of links on the website's front page, then disappearing from the front page, and then moving further into the background. Alternative explanations that invoke a novelty factor decaying with time, or some intricate theory of human dynamics, cannot account for all of the experimental observations.
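
Since the abstract centers on a power-law fit of access rate versus time since publication, a minimal sketch of estimating such a decay exponent may be useful; the synthetic hit counts, the normalization constant, and the log-log least-squares fit are illustrative assumptions, not the authors' procedure.

```python
# Minimal sketch (not the authors' code): estimate a power-law decay exponent
# of daily access counts by a least-squares fit in log-log coordinates.
import numpy as np

rng = np.random.default_rng(0)
days = np.arange(1, 1001)                      # up to 1,000 days after publication
true_alpha = 1.3                               # hypothetical exponent in the reported 0.6-3.2 range
hits = rng.poisson(500 * days.astype(float) ** -true_alpha)  # synthetic daily access counts

mask = hits > 0                                # keep days with at least one hit for the log fit
slope, intercept = np.polyfit(np.log(days[mask]), np.log(hits[mask]), 1)
print(f"estimated decay exponent: {-slope:.2f}")
```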

2.
Phys Rev Lett ; 106(17): 176801, 2011 Apr 29.
Article in English | MEDLINE | ID: mdl-21635055

ABSTRACT

The error rate in complementary transistor circuits is suppressed exponentially in electron number, arising from an intrinsic physical implementation of fault-tolerant error correction. Contrariwise, explicit assembly of gates into the most efficient known fault-tolerant architecture is characterized by a subexponential suppression of error rate with electron number, and incurs significant overhead in wiring and complexity. We conclude that it is more efficient to prevent logical errors with physical fault tolerance than to correct logical errors with fault-tolerant architecture.
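
To make the exponential-versus-subexponential contrast concrete, here is a purely illustrative numeric comparison; the functional forms exp(-aN) and exp(-b*sqrt(N)) and the constants are assumptions for illustration, not quantities derived in the paper.

```python
# Illustrative comparison of scaling forms (assumed, not taken from the paper):
# exponential suppression exp(-a*N) versus subexponential suppression exp(-b*sqrt(N))
# of the logical error rate with electron number N.
import numpy as np

N = np.array([10, 100, 1000, 10000], dtype=float)
a, b = 0.05, 0.5                         # hypothetical constants for illustration only
physical = np.exp(-a * N)                # intrinsic, exponential-in-N suppression
architectural = np.exp(-b * np.sqrt(N))  # explicit fault-tolerant architecture, subexponential

for n, p, q in zip(N, physical, architectural):
    print(f"N={int(n):>6}  exp(-aN)={p:.2e}  exp(-b sqrt(N))={q:.2e}")
```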

3.
Proc Natl Acad Sci U S A ; 105(37): 13724-9, 2008 Sep 16.
Article in English | MEDLINE | ID: mdl-18779560

ABSTRACT

We use sequential large-scale crawl data to empirically investigate and validate the dynamics that underlie the evolution of the structure of the web. We find that the overall structure of the web is defined by an intricate interplay between experience or entitlement of the pages (as measured by the number of inbound hyperlinks a page already has), inherent talent or fitness of the pages (as measured by the likelihood that someone visiting the page would give a hyperlink to it), and the continual high rates of birth and death of pages on the web. We find that the web is conservative in judging talent and the overall fitness distribution is exponential, showing low variability. The small variance in talent, however, is enough to lead to experience distributions with high variance: The preferential attachment mechanism amplifies these small biases and leads to heavy-tailed power-law (PL) inbound degree distributions over all pages, as well as over pages that are of the same age. The balancing act between experience and talent on the web allows newly introduced pages with novel and interesting content to grow quickly and surpass older pages. In this regard, it is much like what we observe in high-mobility and meritocratic societies: People with entitlement continue to have access to the best resources, but there is just enough screening for fitness that allows for talented winners to emerge and join the ranks of the leaders. Finally, we show that the fitness estimates have potential practical applications in ranking query results.


Subjects
Informatics/methods, Internet/trends, Computers
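
A minimal simulation sketch of the fitness-amplified preferential attachment described above, assuming exponentially distributed fitness and one new inbound link per page step; the sizes and the seeding of every page with one inbound link are hypothetical choices, not the paper's crawl-data methodology.

```python
# Sketch of fitness-biased preferential attachment: each new page links to an
# existing page with probability proportional to (inbound degree) x (fitness),
# with fitness drawn from an exponential (low-variance "talent") distribution.
import numpy as np

rng = np.random.default_rng(1)
n_pages = 5000
degree = np.ones(n_pages)                           # seed every page with one inbound link
fitness = rng.exponential(scale=1.0, size=n_pages)  # exponential fitness, low variability

for t in range(1, n_pages):
    weights = degree[:t] * fitness[:t]              # experience amplified by talent
    target = rng.choice(t, p=weights / weights.sum())
    degree[target] += 1                             # page t links to the chosen older page

print("max inbound degree:", int(degree.max()))     # heavy-tailed: a few big winners
```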
4.
Article in English | MEDLINE | ID: mdl-18245872

ABSTRACT

In this article, we introduce an exploratory framework for learning patterns of conditional co-expression in gene expression data. The main idea behind the proposed approach consists of estimating how the information content shared by a set of M nodes in a network (where each node is associated with an expression profile) varies upon conditioning on a set of L conditioning variables (in the simplest case represented by a separate set of expression profiles). The method is non-parametric and is based on the concept of statistical co-information, which, unlike conventional correlation-based techniques, is not restricted in scope to linear conditional dependency patterns. Moreover, such conditional co-expression relationships can potentially indicate regulatory interactions that do not manifest themselves when only pair-wise relationships are considered. A moment-based approximation of the co-information measure is derived that efficiently gets around the problem of estimating high-dimensional multivariate probability density functions from the data, a task usually not viable due to the intrinsic sample-size limitations that characterize expression-level measurements. By applying the proposed exploratory method, we analyzed a whole-genome microarray assay of the eukaryote Saccharomyces cerevisiae and were able to learn statistically significant patterns of conditional co-expression. A selection of such interactions that carry a meaningful biological interpretation is discussed.


Subjects
Gene Expression Profiling, Oligonucleotide Array Sequence Analysis/methods, Automated Pattern Recognition/methods, Saccharomyces cerevisiae Proteins/genetics, Algorithms, Artificial Intelligence, Computational Biology/methods, Fungal Gene Expression Regulation, Internet, Software, Nonparametric Statistics
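
As a rough illustration of a moment-based (Gaussian) approximation to co-information, the sketch below sums subset entropies computed from covariance determinants; the sign convention, the Gaussian-entropy shortcut, and the toy three-variable data are assumptions and need not match the authors' estimator.

```python
# Gaussian (moment-based) sketch of co-information for three expression profiles:
# alternating-sign sum of subset entropies, each estimated from the determinant of
# the corresponding covariance block. Sign conventions vary in the literature.
import numpy as np
from itertools import combinations

def gaussian_entropy(cov):
    cov = np.atleast_2d(cov)
    k = cov.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** k * np.linalg.det(cov))

def co_information(data):
    """data: (n_samples, n_vars); sum over non-empty subsets with alternating sign."""
    n_vars = data.shape[1]
    cov = np.cov(data, rowvar=False)
    total = 0.0
    for size in range(1, n_vars + 1):
        for subset in combinations(range(n_vars), size):
            idx = np.ix_(subset, subset)
            total += (-1) ** (size + 1) * gaussian_entropy(cov[idx])
    return total

rng = np.random.default_rng(2)
z = rng.normal(size=1000)
x = z + 0.3 * rng.normal(size=1000)          # x and y share information through z
y = z + 0.3 * rng.normal(size=1000)
print(co_information(np.column_stack([x, y, z])))
```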
5.
Phys Rev E Stat Nonlin Soft Matter Phys ; 73(1 Pt 2): 016117, 2006 Jan.
Article in English | MEDLINE | ID: mdl-16486226

ABSTRACT

Due to the ubiquity of time series with long-range correlation in many areas of science and engineering, analysis and modeling of such data is an important problem. While the field seems to be mature, three major issues have not been satisfactorily resolved. (i) Many methods have been proposed to assess long-range correlation in time series. Under what circumstances do they yield consistent results? (ii) The mathematical theory of long-range correlation concerns the behavior of the correlation of the time series for very large times. A measured time series is finite, however. How can we relate a break in fractal scaling at a specific time scale to important parameters of the data? (iii) An important technique in assessing long-range correlation in a time series is to construct a random walk process from the data, under the assumption that the data are like a stationary noise process. Due to the difficulty in determining whether a time series is stationary, however, one cannot be fully certain whether the data should be treated as a noise or a random walk process. Is there any penalty if the data are interpreted as a noise process while in fact they are a random walk process, and vice versa? In this paper, we seek to gain insight into these issues by examining three model systems, the autoregressive process of order 1, on-off intermittency, and Lévy motions, and by considering an important engineering problem, target detection within sea-clutter radar returns. We also provide a few rules of thumb to safeguard against misinterpretations of long-range correlation in a time series, and discuss the relevance of this study to pattern recognition.
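
The noise-versus-random-walk distinction in point (iii) can be illustrated with a small detrended-fluctuation-style estimate on the integrated series; the window sizes, linear detrending, and synthetic test signals below are illustrative choices rather than the paper's analysis.

```python
# Sketch: integrate the series into a "random walk", then estimate the scaling of
# detrended fluctuations with window size (a DFA-style exponent). White noise input
# gives an exponent near 0.5; random-walk input gives an exponent near 1.5.
import numpy as np

def dfa_exponent(x, scales):
    y = np.cumsum(x - np.mean(x))                # random-walk construction from the data
    flucts = []
    for s in scales:
        n_seg = len(y) // s
        f2 = []
        for i in range(n_seg):
            seg = y[i * s:(i + 1) * s]
            t = np.arange(s)
            trend = np.polyval(np.polyfit(t, seg, 1), t)   # linear detrending per window
            f2.append(np.mean((seg - trend) ** 2))
        flucts.append(np.sqrt(np.mean(f2)))
    slope, _ = np.polyfit(np.log(scales), np.log(flucts), 1)
    return slope

rng = np.random.default_rng(3)
scales = [16, 32, 64, 128, 256]
print("white noise     :", round(dfa_exponent(rng.normal(size=4096), scales), 2))
print("random walk data:", round(dfa_exponent(np.cumsum(rng.normal(size=4096)), scales), 2))
```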

6.
Phys Rev E Stat Nonlin Soft Matter Phys ; 71(4 Pt 2): 046133, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15903752

ABSTRACT

The maximum entropy principle from statistical mechanics states that a closed system attains an equilibrium distribution that maximizes its entropy. We first show that for graphs with a fixed number of edges one can define a stochastic edge dynamic that can serve as an effective thermalization scheme, and hence the underlying graphs are expected to attain their maximum-entropy states, which turn out to be Erdős-Rényi (ER) random graphs. We next show that (i) a rate-equation-based analysis of the node degree distribution does indeed confirm the maximum-entropy principle, and (ii) the edge dynamic can be effectively implemented using short random walks on the underlying graphs, leading to a local algorithm for the generation of ER random graphs. The resulting statistical mechanical system can be adapted to provide a distributed and local (i.e., without any centralized monitoring) mechanism for load balancing, which can have a significant impact in increasing the efficiency and utilization of both the Internet (e.g., efficient web mirroring) and large-scale computing infrastructure (e.g., cluster and grid computing).
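
A minimal sketch of a fixed-edge-count thermalization toward an ER graph follows; it uses the simple global rewiring rule (delete a random edge, insert one between a uniformly chosen pair) rather than the paper's local random-walk implementation, and the graph sizes and step counts are arbitrary.

```python
# Sketch of a stochastic edge dynamic with a fixed number of edges: repeatedly delete
# a random edge and insert an edge between a uniformly chosen non-adjacent pair.
# Starting from a heavy-tailed graph, the degree distribution relaxes toward ER.
import random
import networkx as nx

random.seed(4)
n = 300
G = nx.barabasi_albert_graph(n, 3)          # start far from maximum entropy (heavy-tailed)

for _ in range(20_000):
    u, v = random.choice(list(G.edges()))
    a, b = random.sample(range(n), 2)
    if not G.has_edge(a, b):
        G.remove_edge(u, v)
        G.add_edge(a, b)                    # edge count is conserved

degrees = [d for _, d in G.degree()]
print("max degree after mixing:", max(degrees))   # should look Erdős-Rényi (Poisson-like)
```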

7.
Article in English | MEDLINE | ID: mdl-17044167

ABSTRACT

The authors recently introduced a framework, named Network Component Analysis (NCA), for the reconstruction of the dynamics of transcriptional regulators' activities from gene expression assays. The original formulation had certain shortcomings that limited NCA's application to a wide class of network dynamics reconstruction problems, either because of limitations in the sample size or because of the stringent requirements imposed by the set of identifiability conditions. In addition, the performance characteristics of the method for various levels of data noise or in the presence of model inaccuracies were never investigated. In this article, the following aspects of NCA have been addressed, resulting in a set of extensions to the original framework: 1) The sufficient conditions on the a priori connectivity information (required for successful reconstructions via NCA) are made less stringent, allowing easier verification of whether a network topology is identifiable, as well as extending the class of identifiable systems. This result is accomplished by introducing a set of identifiability requirements that can be tested directly on the regulatory architecture, rather than on specific instances of the system matrix. 2) The two-stage least-squares iterative procedure used in NCA is proven to identify stationary points of the likelihood function, under a Gaussian noise assumption, thus reinforcing the statistical foundations of the method. 3) A framework for the simultaneous reconstruction of multiple regulatory subnetworks is introduced, thus overcoming one of the critical limitations of the original formulation of the decomposition, which arises, for example, with poorly sampled data (typical of microarray experiments). A set of Monte Carlo simulations conducted with synthetic data suggests that the approach is indeed capable of accurately reconstructing regulatory signals when these are the input of large-scale networks that satisfy the suggested identifiability criteria, even under fairly noisy conditions. The sensitivity of the reconstructed signals to inaccuracies in the hypothesized network topology is also investigated. We demonstrate the feasibility of our approach for the simultaneous reconstruction of multiple regulatory subnetworks from the same data set with a successful application of the technique to gene expression measurements of the bacterium Escherichia coli.


Subjects
Bacterial Gene Expression Regulation, Genetic Models, Oligonucleotide Array Sequence Analysis, Transcription Factors/metabolism, Algorithms, Computational Biology, Computer Simulation, Escherichia coli/genetics, Gene Expression Profiling, Monte Carlo Method, Gene Transcription
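
A toy sketch of the two-stage least-squares iteration mentioned in point 2) is given below, alternating between solving for the regulator signals and for the connectivity-constrained mixing matrix on synthetic data; the matrix shapes, topology density, and iteration count are assumptions, not the authors' implementation.

```python
# Sketch of a two-stage least-squares iteration for E ~ A @ P, where the zero pattern
# of A (the regulatory topology) is fixed a priori and only its nonzero entries are fit.
import numpy as np

rng = np.random.default_rng(5)
n_genes, n_tfs, n_samples = 50, 3, 20
support = rng.random((n_genes, n_tfs)) < 0.3          # hypothetical regulatory topology
A_true = support * rng.normal(size=(n_genes, n_tfs))
P_true = rng.normal(size=(n_tfs, n_samples))
E = A_true @ P_true + 0.05 * rng.normal(size=(n_genes, n_samples))

A = support * rng.normal(size=(n_genes, n_tfs))       # random start respecting the topology
for _ in range(200):
    P = np.linalg.lstsq(A, E, rcond=None)[0]          # stage 1: solve for regulator signals
    for i in range(n_genes):                          # stage 2: row-wise constrained fit of A
        idx = np.where(support[i])[0]
        if idx.size:
            A[i, idx] = np.linalg.lstsq(P[idx].T, E[i], rcond=None)[0]

print("relative reconstruction error:",
      np.linalg.norm(A @ P - E) / np.linalg.norm(E))
```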
8.
IEEE Trans Neural Netw ; 15(1): 55-65, 2004 Jan.
Article in English | MEDLINE | ID: mdl-15387247

ABSTRACT

In this paper, we introduce a novel independent component analysis (ICA) algorithm, which is truly blind to the particular underlying distribution of the mixed signals. Using a nonparametric kernel density estimation technique, the algorithm simultaneously estimates the unknown probability density functions of the source signals and the unmixing matrix. Following the proposed approach, the blind signal separation framework can be posed as a nonlinear optimization problem, where a closed-form expression of the cost function is available and only the elements of the unmixing matrix appear as unknowns. We conducted a series of Monte Carlo simulations involving linear mixtures of various source signals with different statistical characteristics and sample sizes. The new algorithm not only consistently outperformed all state-of-the-art ICA methods, but also demonstrated the following properties: 1) Only a flexible model, capable of learning the source statistics, can consistently achieve an accurate separation of all the mixed signals. 2) Adopting a suitably designed optimization framework, it is possible to derive a flexible ICA algorithm that matches the stability and convergence properties of conventional algorithms. 3) A nonparametric approach does not necessarily require large sample sizes in order to outperform methods with fixed or partially adaptive contrast functions.


Subjects
Principal Component Analysis/methods, Nonparametric Statistics, Algorithms
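
As a simplified illustration of the kernel-density idea (not the paper's algorithm), the sketch below whitens a two-signal mixture and scans rotation angles for the unmixing matrix that minimizes a kernel-density entropy estimate of the separated signals; the Laplace/uniform sources and the resubstitution entropy estimate are assumed for illustration.

```python
# Simplified kernel-density ICA sketch: after whitening, the unmixing matrix for two
# signals reduces to a rotation, so scan angles and keep the one minimizing the sum of
# marginal entropies estimated by kernel density (|det W| = 1 for rotations).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(6)
n = 1000
S = np.vstack([rng.laplace(size=n), rng.uniform(-1, 1, size=n)])   # two non-Gaussian sources
X = np.array([[1.0, 0.6], [0.4, 1.0]]) @ S                          # linear mixture

X -= X.mean(axis=1, keepdims=True)                                   # whiten the mixtures
d, U = np.linalg.eigh(np.cov(X))
Z = np.diag(d ** -0.5) @ U.T @ X

def kde_entropy(y):
    return -np.mean(np.log(gaussian_kde(y)(y)))                      # resubstitution estimate

thetas = np.linspace(0, np.pi / 2, 45)
costs = []
for th in thetas:
    W = np.array([[np.cos(th), np.sin(th)], [-np.sin(th), np.cos(th)]])
    Y = W @ Z
    costs.append(kde_entropy(Y[0]) + kde_entropy(Y[1]))
best = thetas[int(np.argmin(costs))]
print(f"best unmixing rotation: {np.degrees(best):.1f} degrees")
```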
9.
Proc Natl Acad Sci U S A ; 100(26): 15522-7, 2003 Dec 23.
Article in English | MEDLINE | ID: mdl-14673099

ABSTRACT

High-dimensional data sets generated by high-throughput technologies, such as DNA microarrays, are often the outputs of complex networked systems driven by hidden regulatory signals. Traditional statistical methods for computing low-dimensional or hidden representations of these data sets, such as principal component analysis and independent component analysis, ignore the underlying network structures and provide decompositions based purely on a priori statistical constraints on the computed component signals. The resulting decomposition thus provides a phenomenological model for the observed data and does not necessarily contain physically or biologically meaningful signals. Here, we develop a method, called network component analysis, for uncovering hidden regulatory signals from outputs of networked systems when only partial knowledge of the underlying network topology is available. The a priori network structure information is first tested for compliance with a set of identifiability criteria. For networks that satisfy the criteria, the signals from the regulatory nodes and their strengths of influence on each output node can be faithfully reconstructed. This method is first validated experimentally by using the absorbance spectra of a network of various hemoglobin species. The method is then applied to microarray data generated from the yeast Saccharomyces cerevisiae, and the activities of various transcription factors during the cell cycle are reconstructed by using recently discovered connectivity information for the underlying transcriptional regulatory networks.


Subjects
Biology/methods, Computer-Assisted Image Processing, Genetic Models, Nerve Net, Gene Expression Regulation, Hemoglobins/chemistry, Oligonucleotide Array Sequence Analysis/methods, Reproducibility of Results, Signal Transduction
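
The identifiability-testing step can be sketched as a structural check on the connectivity pattern, along the lines commonly stated for NCA: full column rank, plus full column rank of every submatrix obtained by removing one regulator together with the genes it controls. Using the rank of the 0/1 pattern as a proxy for generic rank, and the random topology, are simplifying assumptions.

```python
# Sketch of a structural identifiability check on an NCA-style connectivity pattern.
import numpy as np

def nca_identifiable(support):
    """support: boolean (n_genes, n_regulators) connectivity pattern."""
    A = support.astype(float)             # 0/1 pattern used as a proxy for generic rank
    L = A.shape[1]
    if np.linalg.matrix_rank(A) < L:
        return False
    for j in range(L):
        keep_rows = ~support[:, j]        # genes not regulated by regulator j
        keep_cols = [k for k in range(L) if k != j]
        sub = A[np.ix_(keep_rows, keep_cols)]
        if sub.size == 0 or np.linalg.matrix_rank(sub) < L - 1:
            return False
    return True

rng = np.random.default_rng(7)
pattern = rng.random((40, 4)) < 0.35      # hypothetical topology
print("identifiable:", nca_identifiable(pattern))
```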
10.
Neural Netw ; 10(4): 705-720, 1997 Jun.
Article in English | MEDLINE | ID: mdl-12662865

ABSTRACT

We developed a new method to relate the choice of system parameters to the outcomes of the unsupervised learning process in Linsker's multi-layer network model. The behavior of this model is determined by the underlying nonlinear dynamics that are parameterized by a set of parameters originating from the Hebb rule and the arbor density of the synapses. These parameters determine the presence or absence of a specific receptive field (or connection pattern) as a saturated fixed point attractor of the model. We derived a necessary and sufficient condition to test whether a given saturated weight vector is stable or not for any given set of system parameters, and used this condition to determine the whole regime in the parameter space over which the given connection pattern is stable. The parameter space approach allows us to investigate the relative stability of the major receptive fields reported in Linsker's simulation, and to demonstrate the crucial role played by the localized arbor density of synapses between adjacent layers. The method presented here can be employed to analyze other learning and retrieval models that use the limiter function as the constraint controlling the magnitude of the weight or state vectors. Copyright 1997 Elsevier Science Ltd.
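
A minimal sketch of the kind of componentwise sign test described, assuming a simplified one-layer Hebbian drive dw_i proportional to k1 + sum_j (Q_ij + k2) w_j with unit weight bounds; the toy covariance matrix and parameter values are hypothetical, and the multi-layer structure and arbor density of Linsker's model are not reproduced.

```python
# Sketch: under an assumed drive dw_i ~ k1 + sum_j (Q_ij + k2) w_j, a saturated weight
# vector w in {-1, +1}^n is a stable fixed point when every component is pushed
# further against its own bound (w_i and its drive share the same sign).
import numpy as np

def saturated_fixed_point_stable(w, Q, k1, k2):
    drive = k1 + (Q + k2) @ w              # growth rate of each weight at the candidate point
    return bool(np.all(w * drive > 0))     # each weight driven deeper into saturation

rng = np.random.default_rng(8)
n = 20
Q = np.exp(-np.abs(np.subtract.outer(np.arange(n), np.arange(n))) / 3.0)  # toy covariance
w_all_on = np.ones(n)                      # an all-excitatory connection pattern
print(saturated_fixed_point_stable(w_all_on, Q, k1=0.1, k2=0.0))
```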
