Results 1 - 3 of 3
1.
IEEE Trans Neural Netw Learn Syst ; 30(11): 3326-3337, 2019 11.
Article in English | MEDLINE | ID: mdl-30951479

ABSTRACT

Long short-term memory (LSTM) networks have recently shown remarkable performance in several natural language generation tasks, such as image captioning and poetry composition. Yet only a few works have analyzed LSTM-generated text to quantitatively evaluate to what extent such artificial texts resemble those written by humans. We compared the statistical structure of LSTM-generated language with that of written natural language and with that of texts produced by Markov models of various orders. In particular, we characterized the statistical structure of language by assessing word-frequency statistics, long-range correlations, and entropy measures. Our main finding is that, while both LSTM- and Markov-generated texts can exhibit features similar to real ones in their word-frequency statistics and entropy measures, LSTM-generated texts are shown to reproduce long-range correlations at scales comparable to those found in natural language. Moreover, for LSTM networks, a temperature-like parameter controlling the generation process has an optimal value, at which the produced texts are closest to real language, and this value is consistent across the different statistical features investigated.


Subject(s)
Markov Chains , Memory, Long-Term , Natural Language Processing , Neural Networks, Computer , Humans
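The abstract does not say how the temperature-like parameter enters the generation process. A common convention, assumed here rather than taken from the paper, divides the network's output scores (logits) by a temperature T before the softmax: T < 1 sharpens the distribution toward the most likely word, and T > 1 flattens it toward uniform. The following minimal Python sketch illustrates that convention; the function names, toy vocabulary, and scores are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities; T < 1 sharpens, T > 1 flattens (assumed convention)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(vocab: Sequence[str], logits: Sequence[float], temperature: float,
                      rng: Optional[random.Random] = None) -> str:
    """Draw one token from the temperature-scaled distribution (hypothetical helper)."""
    if len(vocab) != len(logits):
        raise ValueError("vocab and logits must have the same length")
    probs = softmax_with_temperature(logits, temperature)
    chooser = rng if rng is not None else random
    return chooser.choices(list(vocab), weights=probs, k=1)[0]


if __name__ == "__main__":
    vocab = ["the", "cat", "sat"]  # hypothetical toy vocabulary
    logits = [2.0, 1.0, 0.1]       # hypothetical model scores
    for t in (0.1, 1.0, 10.0):
        print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
    print(sample_next_token(vocab, logits, temperature=0.7, rng=random.Random(0)))
```

Scanning T over a grid and comparing each batch of generated text with a reference corpus (for example by word-frequency or entropy statistics) is one way to look for an optimal value of the kind the abstract reports; the particular grid and comparison statistics are left open here.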
2.
Sci Rep ; 8(1): 15817, 2018 10 25.
Article in English | MEDLINE | ID: mdl-30361485

ABSTRACT

Biologists have long sought a way to explain how the statistical properties of genetic sequences emerge and are maintained through evolution. On the one hand, non-random structures at different scales indicate a complex genome organisation. On the other hand, single-strand symmetry has been scrutinised using neutral models in which correlations are either ignored or deemed irrelevant, contrary to empirical evidence. Previous studies have investigated these two statistical features separately and, despite sustained efforts, have reached little consensus. Here we uncover previously unknown symmetries in genetic sequences that are organised hierarchically across the scales at which non-random structures are known to be present. These observations are confirmed through the statistical analysis of the human genome and explained through a simple domain model. These results suggest that domain models which account for the cumulative action of mobile elements can simultaneously explain both non-random structures and symmetries in genetic sequences.


Subject(s)
Base Sequence/genetics , Algorithms , Chromosomes, Human, Pair 1/genetics , Humans , Models, Genetic , Statistics as Topic
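The abstract does not define "single-strand symmetry". Assuming it refers, as it usually does in this literature, to Chargaff's second parity rule (on a single DNA strand, the count of each k-mer is close to the count of its reverse complement), the following Python sketch counts k-mers and pairs each with its reverse complement. It illustrates only the measured quantity, not the authors' domain model, and all function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Watson-Crick complement table; only the four canonical bases are handled.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = frozenset("ACGT")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping windows with non-ACGT symbols (e.g. N)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in windows if set(w) <= VALID)


def reverse_complement_pairs(seq: str, k: int) -> List[Tuple[str, str, int, int]]:
    """Return (kmer, reverse complement, count, reverse-complement count), each pair once."""
    counts = kmer_counts(seq, k)
    seen = set()
    pairs = []
    for kmer in sorted(counts):
        if kmer in seen:
            continue  # already reported as the partner of an earlier k-mer
        rc = reverse_complement(kmer)
        seen.update((kmer, rc))
        pairs.append((kmer, rc, counts[kmer], counts.get(rc, 0)))
    return pairs


def asymmetry(n: int, n_rc: int) -> float:
    """0 means perfect single-strand symmetry; 1 means the partner is absent."""
    total = n + n_rc
    return abs(n - n_rc) / total if total else 0.0


if __name__ == "__main__":
    toy = "ACGTTTAAACGNNACG"  # hypothetical toy sequence, not genomic data
    for kmer, rc, n, n_rc in reverse_complement_pairs(toy, 2):
        print(kmer, rc, n, n_rc, round(asymmetry(n, n_rc), 2))
```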
3.
Philos Trans A Math Phys Eng Sci ; 374(2063), 2016 03 13.
Article in English | MEDLINE | ID: mdl-26857665

ABSTRACT

We perform a statistical study of the distances between successive occurrences of a given dinucleotide in the DNA sequences of a number of organisms of different complexity. Our analysis highlights peculiar features of the CG dinucleotide distribution in mammalian DNA, pointing towards a connection with the role of this dinucleotide in DNA methylation. While the CG distributions of mammals exhibit exponential tails with comparable parameters, the picture for the other organisms studied (e.g. fish, insects, bacteria and viruses) is more heterogeneous, possibly because DNA methylation has different functional roles in these organisms. Our analysis suggests that the distribution of distances between CG dinucleotides provides useful insight for characterizing and classifying organisms in terms of methylation functionalities.


Subject(s)
DNA Methylation , Models, Genetic , Nucleotides/genetics , Animals , Humans
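As a rough illustration of the quantity studied, the following Python sketch records the start positions of a dinucleotide (CG by default), takes the differences between successive positions, and computes the empirical complementary cumulative distribution P(D >= d). On a log-linear plot an exponential tail appears as a straight line. The function names are hypothetical, and measuring distance between start positions is an assumption, not necessarily the paper's convention.

```python
from bisect import bisect_left
from typing import List, Tuple


def dinucleotide_distances(seq: str, dinuc: str = "CG") -> List[int]:
    """Distances between start positions of successive occurrences of `dinuc`."""
    seq, dinuc = seq.upper(), dinuc.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]
    return [b - a for a, b in zip(positions, positions[1:])]


def empirical_ccdf(distances: List[int]) -> List[Tuple[int, float]]:
    """(d, fraction of distances >= d) for each distinct observed distance d."""
    xs = sorted(distances)
    n = len(xs)
    # bisect_left gives the number of distances strictly smaller than d
    return [(d, (n - bisect_left(xs, d)) / n) for d in sorted(set(xs))]


if __name__ == "__main__":
    toy = "ACGTTCGACGCG"  # hypothetical toy sequence, not genomic data
    dists = dinucleotide_distances(toy)
    print(dists)
    print(empirical_ccdf(dists))
```

Fitting the slope of log P(D >= d) over the tail would give a decay-rate estimate that can be compared across organisms; where the tail is taken to begin is a choice left open here.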