1.
Neural Netw ; 84: 28-38, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27639721

ABSTRACT

Most machine learning approaches stem from the principle of minimizing the mean squared distance, which relies on computationally efficient quadratic optimization methods. However, when faced with high-dimensional and noisy data, quadratic error functionals demonstrate many weaknesses, including high sensitivity to contaminating factors and the curse of dimensionality. Therefore, many recent applications in machine learning have exploited the properties of non-quadratic error functionals based on the L1 norm or even sub-linear potentials corresponding to quasinorms Lp (0

Subject(s)
Machine Learning; Models, Theoretical; Algorithms; Databases, Factual; Time Factors
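
The sensitivity of quadratic error functionals to contaminated data, mentioned in the abstract above, can be illustrated with a small sketch. It compares how much of the total error a single outlier accounts for under the potential |r|^p for p = 2 (quadratic), p = 1 (L1) and a sub-linear case. The residual values, the choice p = 0.5 and the function name `outlier_share` are illustrative assumptions, not taken from the paper.

```python
def outlier_share(residuals, p):
    """Fraction of the total error sum(|r|^p) contributed by the largest residual."""
    contributions = [abs(r) ** p for r in residuals]
    return max(contributions) / sum(contributions)


# Hypothetical residuals: four well-fitted points and one contaminated point.
residuals = [1.0, 1.0, 1.0, 1.0, 100.0]

# p = 2 is the quadratic potential, p = 1 the L1 potential,
# p = 0.5 an assumed example of a sub-linear potential.
shares = {p: outlier_share(residuals, p) for p in (2.0, 1.0, 0.5)}

for p, share in shares.items():
    print(f"p = {p}: outlier carries {share:.1%} of the total error")
```

With these assumed values the outlier dominates the quadratic error almost completely, while its share shrinks as p decreases, which is the robustness argument the abstract refers to.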
2.
Comput Biol Med ; 75: 203-16, 2016 Aug 1.
Article in English | MEDLINE | ID: mdl-27318570

ABSTRACT

Handling missing data is one of the main tasks in data preprocessing, especially in large public service datasets. We analysed data from the Trauma Audit and Research Network (TARN) database, the largest trauma database in Europe. For the analysis we used 165,559 trauma cases. Among them, there are 19,289 cases (11.35%) with unknown outcome. We demonstrated that these outcomes are not missing 'completely at random' and, hence, these cases cannot simply be excluded from the analysis despite the large amount of available data. We developed a system of non-stationary Markov models for handling missing outcomes and validated these models on the data of 15,437 patients who arrived at TARN hospitals later than 24 h but within 30 days of injury. We used these Markov models for the analysis of mortality. In particular, we corrected the observed fraction of deaths. Two naïve approaches give 7.20% (available case study) or 6.36% (if we assume that all unknown outcomes are 'alive'). The corrected value is 6.78%. Following the seminal paper of Trunkey (1983 [15]), the multimodality of mortality curves has become a much discussed idea. For the whole analysed TARN dataset the coefficient of mortality decreases monotonically in time, but the stratified analysis of mortality gives a different result: for lower severities the coefficient of mortality is a non-monotonic function of the time after injury and may have maxima in the second and third weeks. The approach developed here can be applied to various healthcare datasets that suffer from the problem of lost patients and missing outcomes.


Subject(s)
Databases, Factual; Electronic Data Processing/methods; Wounds and Injuries/mortality; Europe/epidemiology; Female; Humans; Male; Markov Chains
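
The two naïve mortality estimates quoted in the abstract above differ only in their denominator: the available case estimate divides deaths by the cases with known outcome, while the "all unknown are alive" estimate divides the same deaths by all cases. The following sketch checks that the reported figures are consistent with each other. It uses only the case counts and percentages from the abstract; the number of deaths is not reported there, so it is eliminated algebraically rather than assumed. Variable names are illustrative.

```python
TOTAL_CASES = 165_559      # all trauma cases analysed (from the abstract)
UNKNOWN_OUTCOME = 19_289   # cases with unknown outcome (from the abstract)
KNOWN_OUTCOME = TOTAL_CASES - UNKNOWN_OUTCOME

# Reported available case estimate: deaths / KNOWN_OUTCOME.
available_case_rate = 0.0720

# If every unknown outcome is counted as 'alive', the same deaths are divided
# by TOTAL_CASES, so the rate scales by KNOWN_OUTCOME / TOTAL_CASES.
all_alive_rate = available_case_rate * KNOWN_OUTCOME / TOTAL_CASES

print(f"available case:    {available_case_rate:.2%}")
print(f"all unknown alive: {all_alive_rate:.2%}")  # 6.36%, as reported
```

The corrected value of 6.78% obtained with the Markov models lies between these two naïve bounds.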
3.
Comput Biol Med ; 53: 279-90, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25194257

ABSTRACT

The canine lymphoma blood test detects the levels of two biomarkers, the acute phase proteins C-Reactive Protein (CRP) and Haptoglobin. This test can be used for diagnostics, for screening, and for remission monitoring. We analyze clinical data, test various machine learning methods and select the best approach to these problems. Three families of methods are used for classification and risk estimation: decision trees, kNN (including advanced and adaptive kNN), and probability density evaluation with radial basis functions. Several pre-processing approaches were implemented and compared, and the best of them are used to create the diagnostic system. For differential diagnosis the best solution gives a sensitivity of 83.5% and a specificity of 77% (using three input features: CRP, Haptoglobin and a standard clinical symptom). For the screening task, the decision tree method provides the best result, with a sensitivity of 81.4% and a specificity of >99% (using the same input features). If the clinical symptom (lymphadenopathy) is considered unknown, then a decision tree with CRP and Hapt only provides a sensitivity of 69% and a specificity of 83.5%. The lymphoma risk evaluation problem is formulated and solved. The best models are selected as the system for computational lymphoma diagnosis and for evaluation of the risk of lymphoma. These methods are implemented in dedicated web-accessible software and are applied to the problem of monitoring dogs with lymphoma after treatment. The system detects recurrence of lymphoma up to two months prior to the appearance of clinical signs. The risk map visualization provides a user-friendly tool for exploratory data analysis.


Subject(s)
Diagnosis, Computer-Assisted/methods; Lymphoma/diagnosis; Algorithms; Animals; Data Mining; Decision Trees; Dogs; Female; Lymphoma/epidemiology; Lymphoma/veterinary; Male; Risk Assessment; Sensitivity and Specificity
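
The abstract above reports classifier quality as sensitivity and specificity. As a reminder of how these two figures are computed from a confusion matrix, here is a minimal sketch. The counts are hypothetical, chosen only so that the result matches the reported 83.5% and 77%; they are not data from the study, and the function names are illustrative.

```python
def sensitivity(true_pos, false_neg):
    """Share of diseased cases that the test correctly flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of healthy cases that the test correctly flags as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical confusion-matrix counts (NOT from the paper): 200 dogs with
# lymphoma and 200 without, picked to reproduce the reported percentages.
tp, fn = 167, 33
tn, fp = 154, 46

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

A screening test typically favours very high specificity, as in the reported >99% figure, because most screened animals are healthy and false positives would otherwise dominate.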
4.
Biofizika ; 38(5): 762-7, 1993.
Article in Russian | MEDLINE | ID: mdl-8241308

ABSTRACT

An approach to studying the properties of genetic texts is proposed. It is based on investigating the frequencies of all possible words (subsequences) in a text. The most important effect is that the original text can be reconstructed completely, without deletions or mistakes, from the set of words that occur in the text exactly once. The word length at which this effect occurs is a measure of the redundancy of the text. Some real genetic sequences were studied as well.


Subject(s)
Sequence Analysis, DNA/statistics & numerical data; Humans; Models, Statistical
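
The word-frequency idea in the abstract above can be sketched as follows: count every contiguous word of a given length in the text, then keep the words that occur exactly once. The sequence, the word length and the function names are hypothetical; the reconstruction procedure itself is not described in the abstract and is not attempted here.

```python
from collections import Counter


def word_frequencies(text, k):
    """Count every contiguous word (subsequence) of length k in the text."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def single_copy_words(text, k):
    """Words of length k that occur in the text exactly once."""
    return {word for word, count in word_frequencies(text, k).items() if count == 1}


# Hypothetical short sequence, used for illustration only.
text = "ACGTACGA"
freqs = word_frequencies(text, 3)
unique_words = single_copy_words(text, 3)

print(dict(freqs))
print(sorted(unique_words))
```

In this toy sequence the word `ACG` occurs twice at length 3, so not every word is a single copy yet; longer words are needed before all of them become unique.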
5.
Genetika ; 29(9): 1413-9, 1993 Sep.
Article in Russian | MEDLINE | ID: mdl-8276214

ABSTRACT

This paper is devoted to a comparative study of the redundancy of genetic texts of various organisms and viruses. To determine the redundancy of a gene, we have introduced a strict measure of it. The measure of a text's redundancy is the restriction length of the Frequency/Correlation Dictionary of the given genetic text. The Frequency/Correlation Dictionary is the set of all subsequences belonging to a given genetic text, accompanied by the frequencies of their occurrence. The restriction length is defined as the length at which all subsequences (of that length) are unique. We have found that the genes of human viruses are less redundant than human genes. Other aspects of comparative investigations of gene redundancy are discussed. The problem of determining the "true" intron, as well as the evolution of the genome, could also be treated with this methodology.


Subject(s)
Gene Frequency; Genes, Viral; Amino Acid Sequence; Base Sequence; Humans; Molecular Sequence Data; Restriction Mapping
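
The restriction length defined in the abstract above can be computed directly from its definition. The sketch below reads "the length at which all subsequences are unique" as the smallest such length, which is an interpretation of the abstract rather than a statement of the authors' exact procedure. The function name and the test sequences are hypothetical.

```python
def restriction_length(text):
    """Smallest word length k at which every contiguous word of length k
    in the text is unique (an assumed reading of the abstract's definition)."""
    for k in range(1, len(text) + 1):
        words = [text[i:i + k] for i in range(len(text) - k + 1)]
        if len(set(words)) == len(words):
            return k
    raise ValueError("restriction length is undefined for an empty text")


# Hypothetical sequences, for illustration only.
print(restriction_length("ACGT"))      # every letter is already unique
print(restriction_length("ACGTACGA"))  # the repeat 'ACG' forces longer words
print(restriction_length("AAAA"))      # a maximally repetitive text
```

Under this reading, a more repetitive (more redundant) text needs longer words before all of them become unique, so a larger restriction length indicates higher redundancy.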