1.
AJOB Empir Bioeth ; 10(3): 201-213, 2019.
Article in English | MEDLINE | ID: mdl-31050604

ABSTRACT

Background: Molecular epidemiology (ME) is a technique used to study the dynamics of pathogen transmission through a population. When used to study HIV infections, ME generates powerful information about how HIV is transmitted, including epidemiologic patterns of linkage and, potentially, transmission direction. Thus, ME raises challenging questions about the most responsible way to protect individual privacy while acquiring and using these data to advance public health and inform HIV intervention strategies. Here, we report on stakeholders' expectations for how researchers and public health agencies might use HIV ME. Methods: We conducted in-depth semistructured interviews with 40 key stakeholders to find out how these individuals respond to the proposed risks and benefits of HIV ME. Transcripts were coded and analyzed using Atlas.ti. Expectations were assessed through analysis of responses to hypothetical scenarios designed to help interviewees think through the implications of this emerging technique in the contexts of research and public health. Results: Our analysis reveals a wide range of imagined responsibilities, capabilities, and trustworthiness of researchers and public health agencies. Specifically, many respondents expect researchers and public health agencies to use HIV ME carefully and maintain transparency about how data will be used. Informed consent was discussed as an important opportunity for notification of privacy risks. Furthermore, some respondents wished that public health agencies were held to the same form of oversight and accountability represented by informed consent in research. Conclusions: To prevent HIV ME from becoming a barrier to testing or a source of public mistrust, the sense of vulnerability expressed by some respondents must be addressed. In research, informed consent is an obvious opportunity for this. Without giving specimen donors a similar opportunity to opt out, public health agencies may find it difficult to adopt HIV ME without deterring testing and treatment.


Subjects
HIV Infections/epidemiology; HIV/genetics; Molecular Epidemiology; Motivation; Public Health Administration; Research Personnel; Trust; Adult; Aged; Confidentiality/ethics; Female; HIV Infections/transmission; Humans; Informed Consent/ethics; Interviews as Topic; Male; Middle Aged; Molecular Epidemiology/methods; Molecular Epidemiology/organization & administration; Research Personnel/psychology; Risk Assessment; Young Adult
2.
BMC Med Inform Decis Mak ; 19(1): 93, 2019 04 27.
Article in English | MEDLINE | ID: mdl-31029130

ABSTRACT

INTRODUCTION: While early diagnostic decision support systems were built around knowledge bases, more recent systems employ machine learning to consume large amounts of health data. We argue curated knowledge bases will remain an important component of future diagnostic decision support systems by providing ground truth and facilitating explainable human-computer interaction, but that prototype development is hampered by the lack of freely available computable knowledge bases. METHODS: We constructed an open access knowledge base and evaluated its potential in the context of a prototype decision support system. We developed a modified set-covering algorithm to benchmark the performance of our knowledge base compared to existing platforms. Testing was based on case reports from selected literature and medical student preparatory material. RESULTS: The knowledge base contains over 2000 ICD-10 coded diseases and 450 RX-Norm coded medications, with over 8000 unique observations encoded as SNOMED or LOINC semantic terms. Using 117 medical cases, we found the accuracy of the knowledge base and test algorithm to be comparable to established diagnostic tools such as Isabel and DXplain. Our prototype, as well as DXplain, showed the correct answer as "best suggestion" in 33% of the cases. While we identified shortcomings during development and evaluation, we found the knowledge base to be a promising platform for decision support systems. CONCLUSION: We built and successfully evaluated an open access knowledge base to facilitate the development of new medical diagnostic assistants. This knowledge base can be expanded and curated by users and serve as a starting point to facilitate new technology development and system improvement in many contexts.


Subjects
Access to Information; Decision Support Systems, Clinical; Knowledge Bases; Expert Systems; Humans; International Classification of Diseases; Machine Learning; Semantics; Software; Vocabulary, Controlled
3.
J Public Health Res ; 6(3): 992, 2017 Dec 13.
Article in English | MEDLINE | ID: mdl-29291190

ABSTRACT

Background: Advances in viral sequence analysis make it possible to track the spread of infectious pathogens, such as HIV, within a population. When used to study HIV, these analyses (i.e., molecular epidemiology) potentially allow inference of the identity of individual research subjects. Current privacy standards are likely insufficient for this type of public health research. To address this challenge, it will be important to understand how stakeholders feel about the benefits and risks of such research. Design and Methods: To better understand perceived benefits and risks of these research methods, in-depth qualitative interviews were conducted with HIV-infected individuals, individuals at high-risk for contracting HIV, and professionals in HIV care and prevention. To gather additional perspectives, attendees to a public lecture on molecular epidemiology were asked to complete an informal questionnaire. Results: Among those interviewed and polled, there was near unanimous support for using molecular epidemiology to study HIV. Questionnaires showed strong agreement about benefits of molecular epidemiology, but diverse attitudes regarding risks. Interviewees acknowledged several risks, including privacy breaches and provocation of anti-gay sentiment. The interviews also demonstrated a possibility that misunderstandings about molecular epidemiology may affect how risks and benefits are evaluated. Conclusions: While nearly all study participants agree that the benefits of HIV molecular epidemiology outweigh the risks, concerns about privacy must be addressed to ensure continued trust in research institutions and willingness to participate in research.

4.
Proc Natl Acad Sci U S A ; 111(45): 15981-6, 2014 Nov 11.
Article in English | MEDLINE | ID: mdl-25349383

ABSTRACT

All organisms have evolved mechanisms to manage the stalling of ribosomes upon translation of aberrant mRNA. In eukaryotes, the large ribosomal subunit-associated quality control complex (RQC), composed of the listerin/Ltn1 E3 ubiquitin ligase and cofactors, mediates the ubiquitylation and extraction of ribosome-stalled nascent polypeptide chains for proteasomal degradation. How RQC recognizes stalled ribosomes and performs its functions has not been understood. Using single-particle cryoelectron microscopy, we have determined the structure of the RQC complex bound to stalled 60S ribosomal subunits. The structure establishes how Ltn1 associates with the large ribosomal subunit and properly positions its E3-catalytic RING domain to mediate nascent chain ubiquitylation. The structure also reveals that a distinguishing feature of stalled 60S particles is an exposed, nascent chain-conjugated tRNA, and that the Tae2 subunit of RQC, which facilitates Ltn1 binding, is responsible for selective recognition of stalled 60S subunits. RQC components are engaged in interactions across a large span of the 60S subunit surface, connecting the tRNA in the peptidyl transferase center to the distally located nascent chain tunnel exit. This work provides insights into a mechanism linking translation and protein degradation that targets defective proteins immediately after synthesis, while ignoring nascent chains in normally translating ribosomes.


Subjects
Proteasome Endopeptidase Complex/metabolism; Protein Biosynthesis/physiology; Proteolysis; Ribosome Subunits, Large, Eukaryotic/metabolism; Saccharomyces cerevisiae/metabolism; Ubiquitination/physiology; Protein Structure, Tertiary; RNA, Transfer, Amino Acyl/genetics; RNA, Transfer, Amino Acyl/metabolism; RNA-Binding Proteins; Ribosome Subunits, Large, Eukaryotic/genetics; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae Proteins/metabolism; Structure-Activity Relationship; Ubiquitin-Protein Ligases/genetics; Ubiquitin-Protein Ligases/metabolism
5.
Lancet Infect Dis ; 14(8): 773-777, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24721230

ABSTRACT

Rapid growth in the genetic sequencing of pathogens in recent years has led to the creation of large sequence databases. This aggregated sequence data can be very useful for tracking and predicting epidemics of infectious diseases. However, the balance between the potential public health benefit and the risk to personal privacy for individuals whose genetic data (personal or pathogen) are included in such work has been difficult to delineate, because neither the true benefit nor the actual risk to participants has been adequately defined. Existing approaches to minimise the risk of privacy loss to participants are based on de-identification of data by removal of a predefined set of identifiers. These approaches neither guarantee privacy nor protect the usefulness of the data. We propose a new approach to privacy protection that will quantify the risk to participants, while still maximising the usefulness of the data to researchers. This emerging standard in privacy protection and disclosure control, which is known as differential privacy, uses a process-driven rather than data-centred approach to protecting privacy.


Subjects
Biomedical Research/legislation & jurisprudence; Communicable Diseases/epidemiology; Communicable Diseases/etiology; Confidentiality/standards; Databases, Factual/legislation & jurisprudence; Humans
6.
J Struct Biol ; 184(3): 417-26, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24161732

ABSTRACT

Single-particle cryo-electron microscopy is now well established as a technique for the structural characterization of large macromolecules and macromolecular complexes. The raw data is very noisy and consists of two-dimensional projections, from which the 3D biological object must be reconstructed. The 3D object depends upon knowledge of proper angular orientations assigned to the 2D projection images. Numerous algorithms have been developed for determining relative angular orientations between 2D images, but the transition from 2D to 3D remains challenging and can result in erroneous and conflicting results. Here we describe a general, automated procedure, called OptiMod, for reconstructing and optimizing 3D models using common-lines methodologies. OptiMod approximates orientation angles and reconstructs independent maps from 2D class averages. It then iterates the procedure, while considering each map as a raw solution that needs to be compared with other possible outcomes. We incorporate procedures for 3D alignment, clustering, and refinement to optimize each map, as well as standard scoring metrics to facilitate the selection of the optimal model. We also show that small angle tilt-pair data can be included as one of the scoring metrics to improve the selection of the optimal initial model, and also to provide a validation check. The overall approach is demonstrated using two experimental cryo-EM data sets--the 80S ribosome that represents a relatively straightforward case for ab initio reconstruction, and the Tf-TfR complex that represents a challenging case in that it has previously been shown to provide multiple equally plausible solutions to the initial model problem.


Subjects
Image Processing, Computer-Assisted/methods; Microscopy, Electron/methods; Algorithms; Cryoelectron Microscopy/methods; Imaging, Three-Dimensional/methods; Macromolecular Substances; Models, Theoretical; Receptors, Transferrin/chemistry; Ribosomes/ultrastructure; Transferrin/chemistry
7.
J Am Med Inform Assoc ; 19(5): 750-7, 2012.
Article in English | MEDLINE | ID: mdl-22511018

ABSTRACT

OBJECTIVE: Today's clinical research institutions provide tools for researchers to query their data warehouses for counts of patients. To protect patient privacy, counts are perturbed before reporting; this compromises their utility for increased privacy. The goal of this study is to extend current query answer systems to guarantee a quantifiable level of privacy and allow users to tailor perturbations to maximize the usefulness according to their needs. METHODS: A perturbation mechanism was designed in which users are given options with respect to scale and direction of the perturbation. The mechanism translates the true count, user preferences, and a privacy level within administrator-specified bounds into a probability distribution from which the perturbed count is drawn. RESULTS: Users can significantly impact the scale and direction of the count perturbation and can receive more accurate final cohort estimates. Strong and semantically meaningful differential privacy is guaranteed, providing for a unified privacy accounting system that can support role-based trust levels. This study provides an open source web-enabled tool to investigate visually and numerically the interaction between system parameters, including required privacy level and user preference settings. CONCLUSIONS: Quantifying privacy allows system administrators to provide users with a privacy budget and to monitor its expenditure, enabling users to control the inevitable loss of utility. While current measures of privacy are conservative, this system can take advantage of future advances in privacy measurement. The system provides new ways of trading off privacy and utility that are not provided in current study design systems.


Subjects
Biomedical Research; Confidentiality; Information Storage and Retrieval/methods; Medical Records Systems, Computerized/statistics & numerical data; Humans; Models, Statistical; Research Design; Software; User-Computer Interface
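
The abstract of entry 7 describes drawing a perturbed count from a probability distribution determined by the true count and a privacy level, and tracking a privacy budget. A minimal sketch of that general idea follows, assuming the standard Laplace mechanism for a counting query (sensitivity 1). This is not the paper's mechanism, which also accounts for user preferences on scale and direction; the names `perturbed_count` and `PrivacyBudget` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution."""
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturbed_count(true_count, epsilon, rng=None):
    """Return an epsilon-differentially-private version of a patient count.

    Assumption: adding or removing one patient changes the count by at
    most 1, so the Laplace scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    noisy = true_count + laplace_noise(1.0 / epsilon, rng)
    # Rounding and clamping are post-processing and do not weaken the guarantee.
    return max(0, round(noisy))


class PrivacyBudget:
    """Hypothetical per-user budget: each query spends part of epsilon."""

    def __init__(self, total_epsilon):
        self.remaining = total_epsilon

    def spend(self, epsilon):
        if epsilon > self.remaining:
            raise ValueError("privacy budget exhausted")
        self.remaining -= epsilon
```

A smaller `epsilon` gives stronger privacy and noisier counts; the budget object illustrates how an administrator could cap the cumulative privacy loss across queries.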
8.
J Am Med Inform Assoc ; 19(2): 196-201, 2012.
Article in English | MEDLINE | ID: mdl-22081224

ABSTRACT

iDASH (integrating data for analysis, anonymization, and sharing) is the newest National Center for Biomedical Computing funded by the NIH. It focuses on algorithms and tools for sharing data in a privacy-preserving manner. Foundational privacy technology research performed within iDASH is coupled with innovative engineering for collaborative tool development and data-sharing capabilities in a private Health Insurance Portability and Accountability Act (HIPAA)-certified cloud. Driving Biological Projects, which span different biological levels (from molecules to individuals to populations) and focus on various health conditions, help guide research and development within this Center. Furthermore, training and dissemination efforts connect the Center with its stakeholders and educate data owners and data consumers on how to share and use clinical and biological data. Through these various mechanisms, iDASH implements its goal of providing biomedical and behavioral researchers with access to data, software, and a high-performance computing environment, thus enabling them to generate and test new hypotheses.


Subjects
Algorithms; Confidentiality; Information Dissemination; Medical Informatics; Forecasting; Goals; Health Insurance Portability and Accountability Act; Information Storage and Retrieval; United States
9.
AMIA Annu Symp Proc ; 2011: 723-31, 2011.
Article in English | MEDLINE | ID: mdl-22195129

ABSTRACT

Our objective is to facilitate semi-automated detection of suspicious access to EHRs. Previously we have shown that a machine learning method can play a role in identifying potentially inappropriate access to EHRs. However, the problem of sampling informative instances to build a classifier still remained. We developed an integrated filtering method leveraging both anomaly detection based on symbolic clustering and signature detection, a rule-based technique. We applied the integrated filtering to 25.5 million access records in an intervention arm, and compared this with 8.6 million access records in a control arm where no filtering was applied. On the training set with cross-validation, the AUC was 0.960 in the control arm and 0.998 in the intervention arm. The difference in false negative rates on the independent test set was significant, P = 1.6 × 10⁻⁶. Our study suggests that utilization of integrated filtering strategies to facilitate the construction of classifiers can be helpful.


Subjects
Artificial Intelligence; Computer Security; Electronic Health Records; Humans; Logistic Models; Privacy; Sensitivity and Specificity
10.
Artif Intell Med ; 50(3): 175-80, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20466526

ABSTRACT

OBJECTIVE: To evaluate and compare the performance of different rule-ranking algorithms for rule-based classifiers on biomedical datasets. METHODOLOGY: Empirical evaluation of five rule ranking algorithms on two biomedical datasets, with performance evaluation based on ROC analysis and 5 × 2 cross-validation. RESULTS: On a lung cancer dataset, the area under the ROC curve (AUC) of, on average, 14267.1 rules was 0.862. Multi-rule ranking found 13.3 rules with an AUC of 0.852. Four single-rule ranking algorithms, using the same number of rules, achieved average AUC values of 0.830, 0.823, 0.823, and 0.822, respectively. On a prostate cancer dataset, an average of 339265.3 rules had an AUC of 0.934, while 9.4 rules obtained from multi-rule and single-rule rankings had average AUCs of 0.932, 0.926, 0.925, 0.902 and 0.902, respectively. CONCLUSION: Multi-variate rule ranking performs better than the single-rule ranking algorithms. Both single-rule and multi-rule methods are able to substantially reduce the number of rules while keeping classification performance at a level comparable to the full rule set.


Subjects
Algorithms; Artificial Intelligence; Area Under Curve; Breast Neoplasms/pathology; Female; Humans; Lung Neoplasms/pathology; Male; Prostatic Neoplasms/pathology
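
Entry 10 compares rule sets by the area under the ROC curve (AUC). As a reminder of what that figure measures, the sketch below computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, counting ties as one half. It is a generic illustration, not the evaluation code used in the study; the function name `auc` is an assumption.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `scores` are classifier outputs; `labels` are 1 (positive) or 0 (negative).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

The pairwise form is quadratic in the number of cases, which is fine for small data sets; a rank-based formulation would scale better.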
11.
IEEE Trans Knowl Data Eng ; 22(3): 437-446, 2010 Mar 01.
Article in English | MEDLINE | ID: mdl-21373375

ABSTRACT

The goal of data anonymization is to allow the release of scientifically useful data in a form that protects the privacy of its subjects. This requires more than simply removing personal identifiers from the data, because an attacker can still use auxiliary information to infer sensitive individual information. Additional perturbation is necessary to prevent these inferences, and the challenge is to perturb the data in a way that preserves its analytic utility. No existing anonymization algorithm provides both perfect privacy protection and perfect analytic utility. We make the new observation that anonymization algorithms are not required to operate in the original vector-space basis of the data, and many algorithms can be improved by operating in a judiciously chosen alternate basis. A spectral basis derived from the data's eigenvectors is one that can provide substantial improvement. We introduce the term spectral anonymization to refer to an algorithm that uses a spectral basis for anonymization, and we give two illustrative examples. We also propose new measures of privacy protection that are more general and more informative than existing measures, and a principled reference standard with which to define adequate privacy protection.

12.
J Am Med Inform Assoc ; 15(1): 44-53, 2008.
Article in English | MEDLINE | ID: mdl-17947629

ABSTRACT

Monitoring vital signs and locations of certain classes of ambulatory patients can be useful in overcrowded emergency departments and at disaster scenes, both on-site and during transportation. To be useful, such monitoring needs to be portable and low cost, and have minimal adverse impact on emergency personnel, e.g., by not raising an excessive number of alarms. The SMART (Scalable Medical Alert Response Technology) system integrates wireless patient monitoring (ECG, SpO₂), geo-positioning, signal processing, targeted alerting, and a wireless interface for caregivers. A prototype implementation of SMART was piloted in the waiting area of an emergency department and evaluated with 145 post-triage patients. System deployment aspects were also evaluated during a small-scale disaster-drill exercise.


Subjects
Computers, Handheld; Disaster Medicine/instrumentation; Monitoring, Ambulatory/instrumentation; Telemetry; Computer Communication Networks; Equipment Design; Humans; Monitoring, Ambulatory/methods; Pilot Projects; Systems Integration; Telecommunications
13.
AMIA Annu Symp Proc ; : 191-5, 2007 Oct 11.
Article in English | MEDLINE | ID: mdl-18693824

ABSTRACT

The work reported in this paper investigates the use of a decision-support tool for the diagnosis of pigmented skin lesions in a real-world clinical trial with 511 patients and 3827 lesion evaluations. We analyzed a number of outcomes of the trial, such as direct comparison of system performance in laboratory and clinical setting, the performance of physicians using the system compared to a control dermatologist without the system, and repeatability of system recommendations. The results show that system performance was significantly less in the real-world setting compared to the laboratory setting (c-index of 0.87 vs. 0.94, p = 0.01). Dermatologists using the system achieved a combined sensitivity of 85% and combined specificity of 95%. We also show that the process of acquiring lesion images using digital dermoscopy devices needs to be standardized before sufficiently high repeatability of measurements can be assured.


Subjects
Decision Support Systems, Clinical; Dermoscopy/methods; Diagnosis, Computer-Assisted; Melanoma/diagnosis; Skin Neoplasms/diagnosis; Humans; Sensitivity and Specificity
14.
BMC Bioinformatics ; 7: 8, 2006 Jan 09.
Article in English | MEDLINE | ID: mdl-16401341

ABSTRACT

BACKGROUND: Single nucleotide polymorphisms (SNPs) are locations at which the genomic sequences of population members differ. Since these differences are known to follow patterns, disease association studies are facilitated by identifying SNPs that allow the unique identification of such patterns. This process, known as haplotype tagging, is formulated as a combinatorial optimization problem and analyzed in terms of complexity and approximation properties. RESULTS: It is shown that the tagging problem is NP-hard but approximable within 1 + ln((n² - n)/2) for n haplotypes but not approximable within (1 - ε) ln(n/2) for any ε > 0 unless NP ⊂ DTIME(n^(log log n)). A simple, very easily implementable algorithm that exhibits the above upper bound on solution quality is presented. This algorithm has running time O((np/2)(2m - p + 1)) ≤ O(m(n² - n)/2) where p ≤ min(n, m) for n haplotypes of size m. As we show that the approximation bound is asymptotically tight, the algorithm presented is optimal with respect to this asymptotic bound. CONCLUSION: The haplotype tagging problem is hard, but approachable with a fast, practical, and surprisingly simple algorithm that cannot be significantly improved upon on a single processor machine. Hence, significant improvement in computational efforts expended can only be expected if the computational effort is distributed and done in parallel.


Subjects
Computational Biology/methods; Haplotypes; Algorithms; Animals; Chromosome Mapping; Genome, Human; Humans; Models, Genetic; Models, Statistical; Models, Theoretical; Polymorphism, Single Nucleotide; Reproducibility of Results; Sequence Alignment; Software
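
The approximation bound quoted in entry 14 has the shape of the classic greedy set-cover guarantee, where the elements to cover are the (n² - n)/2 pairs of haplotypes and each SNP "covers" the pairs it tells apart. The sketch below illustrates that greedy strategy under this assumption; it is not claimed to be the paper's exact algorithm, and the function name `greedy_tag_snps` is hypothetical.

```python
from itertools import combinations


def greedy_tag_snps(haplotypes):
    """Pick SNP positions that distinguish every pair of distinct haplotypes.

    `haplotypes` is a list of equal-length strings (one character per SNP).
    Returns the chosen positions in the order they were selected.
    """
    if not haplotypes:
        return []
    m = len(haplotypes[0])
    if any(len(h) != m for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")

    # Pairs still to be told apart; identical haplotypes can never be
    # distinguished, so they are left out.
    uncovered = {
        (i, j)
        for i, j in combinations(range(len(haplotypes)), 2)
        if haplotypes[i] != haplotypes[j]
    }
    chosen = []
    while uncovered:
        # Greedy step: take the unused SNP that separates the most remaining pairs.
        best = max(
            (s for s in range(m) if s not in chosen),
            key=lambda s: sum(haplotypes[i][s] != haplotypes[j][s] for i, j in uncovered),
        )
        chosen.append(best)
        # Keep only the pairs the new SNP fails to separate.
        uncovered = {
            (i, j) for i, j in uncovered if haplotypes[i][best] == haplotypes[j][best]
        }
    return chosen
```

For the four haplotypes `"000"`, `"011"`, `"101"`, `"110"`, any two positions suffice, and the sketch selects the first two.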
15.
Bioinformatics ; 21(9): 1964-70, 2005 May 01.
Article in English | MEDLINE | ID: mdl-15661797

ABSTRACT

MOTIVATION: Interpretation of classification models derived from gene-expression data is usually not simple, yet it is an important aspect in the analytical process. We investigate the performance of small rule-based classifiers based on fuzzy logic in five datasets that are different in size, laboratory origin and biomedical domain. RESULTS: The classifiers resulted in rules that can be readily examined by biomedical researchers. The fuzzy-logic-based classifiers compare favorably with logistic regression in all datasets. AVAILABILITY: Prototype available upon request.


Subjects
Algorithms; Biomarkers, Tumor/metabolism; Fuzzy Logic; Gene Expression Profiling/methods; Neoplasm Proteins/metabolism; Neoplasms/metabolism; Oligonucleotide Array Sequence Analysis/methods; Pattern Recognition, Automated/methods; Artificial Intelligence; Biomarkers, Tumor/genetics; Cluster Analysis; Humans; Neoplasm Proteins/genetics; Neoplasms/genetics; Software
16.
J Biomed Inform ; 37(4): 293-303, 2004 Aug.
Article in English | MEDLINE | ID: mdl-15465482

ABSTRACT

Data originating from biomedical experiments has provided machine learning researchers with an important source of motivation for developing and evaluating new algorithms. A new wave of algorithmic development has been initiated with the publication of gene expression data derived from microarrays. Microarray data analysis is particularly challenging given the large number of measurements (typically in the order of thousands) that are reported for relatively few samples (typically in the order of dozens). Many data sets are now available on the web. It is important that machine learning researchers understand how data are obtained and which assumptions are necessary in the analysis. Microarray data have the potential to cause significant impact in machine learning research, not just as a rich and realistic source of cases for testing new algorithms, as has been the UCI machine learning repository in the past decades, but also as a main motivation for their development. In this article, we briefly review the biology underlying microarrays, the process of obtaining gene expression measurements, and the rationale behind the common types of analyses involved in a microarray experiment. We outline the main challenges and reiterate critical considerations regarding the construction of supervised learning models that use this type of data. The goal of this article is to familiarize machine learning researchers with data originated from gene expression microarrays.


Subjects
Algorithms; Artificial Intelligence; Computational Biology/methods; Gene Expression Profiling/methods; Gene Expression Regulation/physiology; Models, Biological; Oligonucleotide Array Sequence Analysis/methods; Research Design; Humans; Pattern Recognition, Automated/methods
17.
Int J Med Inform ; 73(7-8): 599-606, 2004 Aug.
Article in English | MEDLINE | ID: mdl-15246040

ABSTRACT

One of the fundamental rights of patients is to have their privacy protected by health care organizations, so that information that can be used to identify a particular individual is not used to reveal sensitive patient data such as diagnoses, reasons for ordering tests, test results, etc. A common practice is to remove sensitive data from databases that are disseminated to the public, but this can make the disseminated database useless for important public health purposes. If the degree of anonymity of a disseminated data set could be measured, it would be possible to design algorithms that can assure that the desired level of confidentiality is achieved. Privacy protection in disseminated databases can be facilitated by the use of special ambiguation algorithms. Most of these algorithms are aimed at making one individual indistinguishable from one or more of his peers. However, even in databases considered "anonymous", it may still be possible to obtain sensitive information about some individuals or groups of individuals with the use of pattern recognition algorithms. In this article, we study the problem of determining the degree of ambiguation in disseminated databases and discuss its implications in the development and testing of "anonymization" algorithms.


Subjects
Algorithms; Anonymous Testing; Confidentiality; Databases as Topic; Disclosure; Medical Records Systems, Computerized; Humans; Privacy
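
Entry 17 discusses measuring how indistinguishable an individual is from peers in a disseminated database. One widely used stand-in for such a measure is k-anonymity: the size of the smallest group of records sharing the same quasi-identifier values. The sketch below computes it; the paper's own measure of ambiguation may well differ, and the function and field names are illustrative assumptions.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values on all the given quasi-identifier fields."""
    if not records:
        raise ValueError("no records to evaluate")
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return min(groups.values())


# Hypothetical toy data: ZIP prefix and age band act as quasi-identifiers.
records = [
    {"zip": "021", "age": "30-39", "dx": "flu"},
    {"zip": "021", "age": "30-39", "dx": "asthma"},
    {"zip": "946", "age": "40-49", "dx": "flu"},
    {"zip": "946", "age": "40-49", "dx": "diabetes"},
]
```

With these records, every ZIP/age combination is shared by two people, so the data are 2-anonymous on those fields; adding the diagnosis as a quasi-identifier makes every record unique.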
18.
Artif Intell Med ; 31(2): 155-67, 2004 Jun.
Article in English | MEDLINE | ID: mdl-15219292

ABSTRACT

Analysis of gene expression data obtained from microarrays presents a new set of challenges to machine learning modeling. In this domain, in which the number of variables far exceeds the number of cases, identifying relevant genes or groups of genes that are good markers for a particular classification is as important as achieving good classification performance. Although several machine learning algorithms have been proposed to address the latter, identification of gene markers has not been systematically pursued. In this article, we investigate several algorithms for selecting gene markers for classification. We test these algorithms using logistic regression, as this is a simple and efficient supervised learning algorithm. We demonstrate, using 10 different data sets, that a conditionally univariate algorithm constitutes a viable choice if a researcher is interested in quickly determining a set of gene expression levels that can serve as markers for disease. We show that the classification performance of logistic regression is not very different from that of more sophisticated algorithms that have been applied in previous studies, and that the gene selection in the logistic regression algorithm is reasonable in both cases. Furthermore, the algorithm is simple, its theoretical basis is well established, and our user-friendly implementation is now freely available on the internet, serving as a benchmarking tool for the development of new algorithms.


Subjects
Algorithms; Artificial Intelligence; Gene Expression Profiling; Genetic Markers; Multivariate Analysis; Oligonucleotide Array Sequence Analysis; Diagnosis; Diagnosis, Differential; Humans
19.
Artif Intell Med ; 28(1): 75-87, 2003 May.
Article in English | MEDLINE | ID: mdl-12850314

ABSTRACT

We investigate the use of perceptrons for classification of microarray data where we use two datasets that were published in [Nat. Med. 7 (6) (2001) 673] and [Science 286 (1999) 531]. The classification problem studied by Khan et al. is related to the diagnosis of small round blue cell tumours (SRBCT) of childhood which are difficult to classify both clinically and via routine histology. Golub et al. study acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). We used a simulated annealing-based method in learning a system of perceptrons, each obtained by resampling of the training set. Our results are comparable to those of Khan et al. and Golub et al., indicating that there is a role for perceptrons in the classification of tumours based on gene-expression data. We also show that it is critical to perform feature selection in this type of models, i.e. we propose a method for identifying genes that might be significant for the particular tumour types. For SRBCTs, zero error on test data has been obtained for only 13 out of 2308 genes; for the ALL/AML problem, we have zero error for 9 out of 7129 genes that are used for the classification procedure. Furthermore, we provide evidence that Epicurean-style learning and simulated annealing-based search are both essential for obtaining the best classification results.


Subjects
Algorithms; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Neural Networks, Computer; Oligonucleotide Array Sequence Analysis; Carcinoma, Small Cell/genetics; Child; Humans; Lung Neoplasms/genetics; Rhabdomyosarcoma/genetics; Sarcoma, Small Cell/genetics
20.
Proc AMIA Symp ; : 572-6, 2002.
Article in English | MEDLINE | ID: mdl-12463888

ABSTRACT

Several problems in medicine and biology involve the comparison of two measurements made on the same set of cases. The problem differs from a calibration problem because no gold standard can be identified. Testing the null hypothesis of no relationship using measures of association is not optimal since the measurements are made on the same cases, and therefore correlation coefficients will tend to be significant. The descriptive Bland-Altman method can be used in exploratory analysis of this problem, allowing the visualization of gross systematic differences between the two sets of measurements. We utilize the method on three sets of matched observations and demonstrate its usefulness in detecting systematic variations between two measurement technologies to assess gene expression.


Subjects
Computational Biology/methods; Data Interpretation, Statistical; Gene Expression; Oligonucleotide Array Sequence Analysis; Bias; RNA, Messenger
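
The Bland-Altman method mentioned in entry 20 compares two measurement techniques by examining the differences between paired measurements against their means. A minimal sketch of the usual summary statistics follows, assuming the conventional 1.96 standard-deviation limits of agreement; the function name `bland_altman` and the returned keys are illustrative choices, not taken from the paper.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Summarise agreement between two paired sets of measurements.

    Returns the per-case means and differences (the plot coordinates),
    the mean difference (bias), and the limits of agreement.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long series with at least 2 cases")
    diffs = [x - y for x, y in zip(a, b)]
    means = [(x + y) / 2 for x, y in zip(a, b)]
    bias = mean(diffs)   # systematic difference between the two methods
    sd = stdev(diffs)    # sample standard deviation of the differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
    }
```

Plotting `diffs` against `means` with horizontal lines at `bias`, `lower`, and `upper` gives the familiar Bland-Altman plot; a bias far from zero points to a gross systematic difference between the two technologies.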