1.
J Biomed Inform ; 34(1): 28-36, 2001 Feb.
Article in English | MEDLINE | ID: mdl-11376540

ABSTRACT

We analyze the discriminatory power of k-nearest neighbors, logistic regression, artificial neural networks (ANNs), decision trees, and support vector machines (SVMs) on the task of classifying pigmented skin lesions as common nevi, dysplastic nevi, or melanoma. Three different classification tasks were used as benchmarks: the dichotomous problem of distinguishing common nevi from dysplastic nevi and melanoma, the dichotomous problem of distinguishing melanoma from common and dysplastic nevi, and the trichotomous problem of correctly distinguishing all three classes. Using ROC analysis to measure the discriminatory power of the methods shows that excellent results for specific classification problems in the domain of pigmented skin lesions can be achieved with machine-learning methods. On both dichotomous and trichotomous tasks, logistic regression, ANNs, and SVMs performed at about the same level, with k-nearest neighbors and decision trees performing worse.


Subject(s)
Algorithms , Diagnosis, Computer-Assisted , Skin Diseases/diagnosis , Decision Trees , Humans , Logistic Models , Melanoma/diagnosis , Neural Networks, Computer , Nevus/diagnosis , Nevus, Pigmented/diagnosis , Skin Diseases/classification , Skin Neoplasms/diagnosis , Skin Pigmentation
2.
Methods Inf Med ; 40(1): 32-8, 2001 Mar.
Article in English | MEDLINE | ID: mdl-11310157

ABSTRACT

Constructing and updating prognostic models that learn from training cases is a time-consuming task. The more compact, and yet informative, the training sets are, the faster one can build and properly evaluate such models. We have compared different regression diagnostic methods for selection and removal of training cases in prognostic models. Univariate determinations were performed using classical regression diagnostic statistics. Multivariate determinations were performed using (1) a sequential "backward" selection of cases, and (2) a non-sequential genetic algorithm. The genetic algorithm produced final models that kept few cases and retained predictive capability. A genetic algorithm approach to case selection may be better suited for guiding removal of cases in training sets than a univariate or a sequential multivariate approach, possibly because of its ability to detect sets of cases that are influential en bloc but may not be sufficiently influential when considered in isolation.


Subject(s)
Artificial Intelligence , Models, Statistical , Prognosis , Algorithms , Humans , Models, Genetic , Myocardial Infarction/diagnosis , Wounds and Injuries/diagnosis
3.
Proc AMIA Symp ; : 144-8, 2001.
Article in English | MEDLINE | ID: mdl-11825171

ABSTRACT

Privacy protection is an important consideration when releasing medical databases to the research community. We show that while recent advances in anonymization algorithms provide increased levels of protection, it is still possible to calculate approximations to the original data set. In some cases, one can even uniquely reconstruct entries in a table before anonymization. In this paper, we demonstrate how knowledge of an anonymization algorithm based on ambiguating data cell entries can be used to undo the anonymization process. We investigate the effect of this algorithm and its reversal on data sets of varying sizes and distributions. It is shown that by using a computationally complex disambiguation process, information on individuals can be extracted from an anonymized data set.


Subject(s)
Algorithms , Medical Records Systems, Computerized/organization & administration , Privacy , Adult , Confidentiality , Demography , Female , Humans , Male , Middle Aged
4.
Proc AMIA Symp ; : 503-7, 2001.
Article in English | MEDLINE | ID: mdl-11825239

ABSTRACT

Protecting individual data in disclosed databases is essential. Data anonymization strategies can produce table ambiguation by suppression of selected cells. Using table ambiguation, different degrees of anonymization can be achieved, depending on the number of individuals that a particular case must become indistinguishable from. This number defines the level of anonymization. Anonymization by cell suppression does not necessarily prevent inferences from being made from the disclosed data. Preventing inferences may be important to preserve confidentiality. We show that anonymized data sets can preserve descriptive characteristics of the data, but might also be used for making inferences on particular individuals, which is a feature that may not be desirable. The degradation of predictive performance is directly proportional to the degree of anonymity. As an example, we report the effect of anonymization on the predictive performance of a model constructed to estimate the probability of disease given clinical findings.


Subject(s)
Confidentiality , Medical Records Systems, Computerized , Algorithms , Humans , Logistic Models , ROC Curve , Statistics as Topic
5.
Proc AMIA Symp ; : 726-30, 2001.
Article in English | MEDLINE | ID: mdl-11825281

ABSTRACT

Joining relational data can jeopardize patient confidentiality if disseminated data for research can be joined with publicly available data containing, for example, explicit identifiers. Ambiguity in data hinders the construction of primary keys that are of importance when joining data tables. We define two values to be indiscernible if they are the same or at least one of them is a special value. Two rows in a data table are indiscernible if their corresponding entries are indiscernible. We further define a table to be k-ambiguous if each row is indiscernible from at least k rows in the same table. We present two simple heuristics to make a table k-ambiguous by cell suppression, and compare them on example data.
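The definitions in this abstract can be sketched in a few lines of Python. This is only an illustration of the stated definitions, not the authors' heuristics: the use of `None` as the "special value", the function names, and the convention that a row counts as indiscernible from itself are all assumptions made here.

```python
from typing import Optional, Sequence

# Assumption: a suppressed cell (the abstract's "special value") is stored as None.
SUPPRESSED = None

Row = Sequence[Optional[str]]


def values_indiscernible(a: Optional[str], b: Optional[str]) -> bool:
    """Two values are indiscernible if they are equal or either one is suppressed."""
    return a == b or a is SUPPRESSED or b is SUPPRESSED


def rows_indiscernible(r1: Row, r2: Row) -> bool:
    """Two rows are indiscernible if all corresponding entries are indiscernible."""
    return len(r1) == len(r2) and all(
        values_indiscernible(a, b) for a, b in zip(r1, r2)
    )


def is_k_ambiguous(table: Sequence[Row], k: int) -> bool:
    """Check that every row is indiscernible from at least k rows of the table.

    Assumption: the count includes the row itself, since every row is
    trivially indiscernible from itself.
    """
    return all(
        sum(rows_indiscernible(row, other) for other in table) >= k
        for row in table
    )


# Hypothetical example data: suppressing two cells makes all three rows match.
suppressed_table = [("a", "x"), ("a", None), (None, "x")]
distinct_table = [("a", "x"), ("b", "y")]
```

With these toy tables, `is_k_ambiguous(suppressed_table, 3)` holds, while `distinct_table` is only 1-ambiguous under the counting convention assumed above.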


Subject(s)
Confidentiality , Medical Records Systems, Computerized , Algorithms
6.
Proc AMIA Symp ; : 305-9, 2000.
Article in English | MEDLINE | ID: mdl-11079894

ABSTRACT

Data mining methods were applied to a racially diverse sample (n = 19,970) of pregnant women and 1,622 variables that were collected in Duke's TMR electronic patient record over a 10-year period. Different statistical and data mining methods performed similarly when compared using receiver operating characteristic (ROC) curves. In the best results, seven demographic variables yielded an area under the curve (AUC) of .72, and the addition of hundreds of other clinical variables added only .03 to the AUC. Similar results across methods suggest that results were data-driven and not method-dependent, and that demographic variables may offer a small set of parsimonious variables with predictive accuracy in a racially diverse population. Work to determine relevant variables for improved predictive accuracy is ongoing.


Subject(s)
Decision Support Techniques , Infant, Premature , Risk Assessment/methods , Area Under Curve , Artificial Intelligence , Female , Humans , Infant, Newborn , Information Storage and Retrieval , Logistic Models , Neural Networks, Computer , Obstetric Labor, Premature/prevention & control , Pregnancy , ROC Curve , Statistics as Topic
7.
Proc AMIA Symp ; : 384-8, 2000.
Article in English | MEDLINE | ID: mdl-11079910

ABSTRACT

With the advent of the cDNA microarray and oligonucleotide array technologies it has become possible to study a large number of genes in a single experiment. While experiments with thousands of genes are routinely performed, searching for literature about several genes by traditional methods is time-consuming and error-prone. In addition to the inherent limitations of free text search, use of the conventional Boolean operators often results in either no hits (when AND'ing terms) or far too many (when OR'ing terms). We have created a two-step procedure as an approach to meeting the challenge of multi-gene queries. Our results so far show that the returned sets of articles score high on relevance.


Subject(s)
Algorithms , Genes , Information Storage and Retrieval/methods , Humans , Inflammation/genetics , MEDLINE , Neovascularization, Pathologic/genetics , Signal Transduction/genetics
8.
Artif Intell Med ; 18(2): 117-32, 2000 Feb.
Article in English | MEDLINE | ID: mdl-10648846

ABSTRACT

One of the common limitations of expert systems for medical diagnosis is that they make an implicit assumption that multiple disorders do not co-occur in a single patient. The need for this simplifying assumption stems from the fact that finding minimal sets of disorders that cover all symptoms for a given patient is generally computationally intractable (NP-hard). In this paper, we explain the need for performing multi-disorder diagnosis, review previous approaches, formulate the problem using set theory notation, and propose the use of a search method based on a genetic algorithm. We test the algorithm and compare it to another approach using a simple example. The genetic algorithm performs well independently of the order of symptoms, and has the potential to perform multi-disorder diagnosis using existing or newly developed knowledge bases.
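The covering problem described in this abstract can be made concrete with a small sketch. It uses exhaustive search rather than the genetic algorithm the paper proposes, so it is only practical for toy inputs. The knowledge base, the names, and the reading of "minimal" as "fewest disorders" are assumptions made here for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set


def minimum_covers(causes: Dict[str, Set[str]], symptoms: Set[str]) -> List[Set[str]]:
    """Return all smallest sets of disorders whose symptoms cover `symptoms`.

    `causes` maps each disorder to the symptoms it can explain.
    Exhaustive search: the number of candidate sets grows exponentially
    with the number of disorders, which is why heuristic search is
    attractive for realistic knowledge bases.
    """
    disorders = sorted(causes)
    for size in range(len(disorders) + 1):
        found = [
            set(combo)
            for combo in combinations(disorders, size)
            if symptoms <= set().union(*(causes[d] for d in combo))
        ]
        if found:
            return found  # stop at the first (smallest) size with a cover
    return []  # some symptom is explained by no disorder


# Hypothetical toy knowledge base.
toy_causes = {
    "d1": {"s1", "s2"},
    "d2": {"s2", "s3"},
    "d3": {"s3"},
}
toy_result = minimum_covers(toy_causes, {"s1", "s3"})
```

In this toy case no single disorder explains both `s1` and `s3`, so the search returns the two-disorder covers `{d1, d2}` and `{d1, d3}`, which is exactly the situation a single-disorder assumption cannot represent.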


Subject(s)
Algorithms , Genetic Diseases, Inborn/diagnosis , Genetics, Medical , Humans
9.
Proc AMIA Symp ; : 246-50, 1999.
Article in English | MEDLINE | ID: mdl-10566358

ABSTRACT

This paper evaluates the variable selection performed by several machine-learning techniques on a myocardial infarction data set. The focus of this work is to determine which of 43 input variables are considered relevant for prediction of myocardial infarction. The algorithms investigated were logistic regression (with stepwise, forward, and backward selection), backpropagation for multilayer perceptrons (input relevance determination), Bayesian neural networks (automatic relevance determination), and rough sets. An independent method (self-organizing maps) was then used to evaluate and visualize the different subsets of predictor variables. Results show good agreement on some predictors, but also variability among different methods; only one variable was selected by all models.


Subject(s)
Algorithms , Artificial Intelligence , Myocardial Infarction/diagnosis , Chest Pain/etiology , Diagnosis, Computer-Assisted , Evaluation Studies as Topic , Humans , Logistic Models , Mathematics , Neural Networks, Computer
10.
Proc AMIA Symp ; : 984-8, 1999.
Article in English | MEDLINE | ID: mdl-10566508

ABSTRACT

Actual use of regression models in clinical practice depends on model simplicity. Reducing the number of variables in a model contributes to this goal. The quality of a particular selection of variables for a logistic regression model can be defined in terms of the number of variables selected and the model's discriminatory performance, as measured by the area under the ROC curve. A genetic algorithm was applied to search for the best variable combinations for modeling presence of myocardial infarction in a data set of patients with chest pain. Using an external validation set, the resulting model was compared with models constructed with standard backward, forward and stepwise methods of variable selection. The improvement in discriminatory ability yielded by the genetic algorithm variable selection method was statistically significant (p < 0.02).
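As a reminder of what the discrimination measure in this abstract computes, the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below uses that pairwise definition directly. The function name and the example scores are hypothetical, and nothing here reproduces the paper's data or its genetic algorithm.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    `labels` holds 1 for cases with the condition and 0 for cases without.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # A positive outranking a negative scores 1, a tie scores 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs for four patients.
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Here three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75. Comparing this value across candidate variable subsets, alongside the number of variables each subset uses, is the kind of quality assessment the abstract describes.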


Subject(s)
Algorithms , Genetics , Logistic Models , Myocardial Infarction/diagnosis , Chest Pain/etiology , Evaluation Studies as Topic , Humans , Models, Biological , ROC Curve
11.
Article in English | MEDLINE | ID: mdl-9357617

ABSTRACT

Many medical studies deal with the assessment of the prognostic or diagnostic power of some particular test with respect to some particular medical condition. However, even though a test is deemed to be powerful in this respect, it may not be strictly necessary to perform the test on everyone. If the test is costly or invasive, this issue is of particular interest. This paper presents a methodology based on rough set theory and Boolean reasoning that can be used to identify those patients for whom performing the test is redundant or superfluous. Furthermore, the methodology enables one to automatically construct a set of descriptive and minimal if-then rules that model the patient group in need of the test. A reanalysis of a previously published real-world dataset of patients with chest pain is used as a case study.


Subject(s)
Coronary Disease/diagnosis , Decision Making, Computer-Assisted , Coronary Disease/complications , Coronary Disease/mortality , Decision Support Techniques , Humans , Information Theory , Myocardial Infarction/etiology , Probability , Prognosis , Sensitivity and Specificity , Tomography, Emission-Computed, Single-Photon