Results 1 - 7 of 7
1.
NPJ Digit Med ; 6(1): 92, 2023 May 22.
Article in English | MEDLINE | ID: mdl-37217691

ABSTRACT

In machine learning (ML), association patterns in the data, paths in decision trees, and weights between layers of a neural network are often entangled due to multiple underlying causes, thus masking the pattern-to-source relation, weakening prediction, and defying explanation. This paper presents a new ML paradigm, pattern discovery and disentanglement (PDD), that disentangles associations and provides an all-in-one knowledge system capable of (a) disentangling patterns to associate them with distinct primary sources; (b) discovering rare/imbalanced groups, detecting anomalies, and rectifying discrepancies to improve class association, pattern clustering, and entity clustering; and (c) organizing knowledge for statistically supported interpretability for causal exploration. Results from case studies have validated these capabilities. The explainable knowledge reveals pattern-source relations on entities and underlying factors for causal inference, clinical study, and practice, thus addressing the major concerns of interpretability, trust, and reliability when applying ML to healthcare, which is a step towards closing the AI chasm.

2.
Sci Rep ; 11(1): 5688, 2021 Mar 11.
Article in English | MEDLINE | ID: mdl-33707478

ABSTRACT

Machine learning has made impressive advances in many applications akin to human cognition for discernment. However, success has been limited for relational datasets, particularly for data with low volume, imbalanced groups, and mislabeled cases, and the outputs typically lack transparency and interpretability. The difficulties arise from the subtle overlapping and entanglement of functional and statistical relations at the source level. Hence, we have developed the Pattern Discovery and Disentanglement System (PDD), which can discover explicit patterns from datasets of various sizes with imbalanced groups and screen out anomalies. We present four case studies on biomedical datasets to substantiate the efficacy of PDD. It improves prediction accuracy and facilitates transparent interpretation of discovered knowledge in an explicit representation framework, the PDD Knowledge Base, that links the sources, the patterns, and individual patients. Hence, PDD promises broad applications in genomic and biomedical machine learning.

3.
BMC Med Inform Decis Mak ; 21(1): 16, 2021 Jan 09.
Article in English | MEDLINE | ID: mdl-33422088

ABSTRACT

BACKGROUND: Statistical data analysis, especially advanced machine learning (ML) methods, has attracted considerable interest in clinical practice. We seek interpretability of diagnostic/prognostic results that will bring confidence to doctors, patients, and their relatives in therapeutics and clinical practice. When datasets are imbalanced in diagnostic categories, ordinary ML methods might produce results overwhelmed by the majority classes, diminishing prediction accuracy. Hence, methods are needed that can produce explicit, transparent, and interpretable results in decision-making without sacrificing accuracy, even for data with imbalanced groups.
METHODS: To interpret clinical patterns and predict patient diagnoses with high accuracy, we develop a novel method, Pattern Discovery and Disentanglement for Clinical Data Analysis (cPDD), which is able to discover patterns (correlated traits/indicants) and use them to classify clinical data even if the class distribution is imbalanced. In the most general setting, a relational dataset is a large table in which each column represents an attribute (trait/indicant) and each row contains the set of attribute values (AVs) of an entity (patient). Compared to existing pattern discovery approaches, cPDD can discover a small, succinct set of statistically significant high-order patterns from clinical data for interpreting and predicting the disease class of patients, even when groups are small and rare.
RESULTS: Experiments on synthetic and thoracic clinical datasets showed that cPDD can 1) discover a smaller set of succinct significant patterns than other existing pattern discovery methods; 2) allow users to interpret succinct sets of patterns coming from uncorrelated sources, even when the groups are rare/small; and 3) obtain better prediction performance than other interpretable classification approaches.
CONCLUSIONS: cPDD discovers fewer patterns with more comprehensive coverage, improving the interpretability of the patterns discovered. Experimental results on synthetic data validated that cPDD discovers all patterns implanted in the data and displays them precisely and succinctly with statistical support for interpretation and prediction, a capability that traditional ML methods lack. The success of cPDD as a novel interpretable method for the imbalanced class problem shows its potential for clinical data analysis.
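
The relational-table setting described in METHODS can be illustrated with a minimal Python sketch. It is not the cPDD algorithm: it only encodes each entity as a set of attribute values (AVs), counts how often a candidate pattern occurs per class, and shows why plain accuracy is misleading under class imbalance. The attribute names (`trait_a`, `trait_b`), the class labels, and the toy records are hypothetical.

```python
from collections import Counter

# Hypothetical toy dataset: 8 majority-class and 2 minority-class entities.
toy = (
    [({"trait_a": "low", "trait_b": "no"}, "common")] * 8
    + [({"trait_a": "high", "trait_b": "yes"}, "rare")] * 2
)


def attribute_values(record):
    """Encode one table row as a set of 'attribute=value' strings (AVs)."""
    return frozenset(f"{name}={value}" for name, value in record.items())


def pattern_support(pattern, dataset):
    """Count, per class, the entities whose AVs contain every AV in `pattern`."""
    counts = Counter()
    for record, label in dataset:
        if pattern <= attribute_values(record):
            counts[label] += 1
    return counts


def majority_baseline_accuracy(dataset):
    """Accuracy of always predicting the most frequent class."""
    labels = Counter(label for _, label in dataset)
    return max(labels.values()) / len(dataset)


pattern = frozenset({"trait_a=high", "trait_b=yes"})
print(pattern_support(pattern, toy))
print(majority_baseline_accuracy(toy))
```

On this toy table, a classifier that ignores the rare group entirely still scores well on accuracy, while the two-AV pattern occurs only in the rare class. That is the kind of explicit, class-associated evidence the abstract argues for.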


Subjects
Algorithms , Machine Learning , Data Interpretation, Statistical , Humans
4.
BMC Med Genomics ; 11(Suppl 5): 103, 2018 Nov 20.
Article in English | MEDLINE | ID: mdl-30453949

ABSTRACT

BACKGROUND: A protein family has similar and diverse functions locally conserved. An aligned pattern cluster (APC) can reflect the conserved functionality. Discovering aligned residue associations (ARAs) in APCs can reveal subtle inner working characteristics of conserved regions of protein families. However, ARAs corresponding to different functionalities/subgroups/classes could be entangled because of multiple subtle entwined factors.
METHODS: To discover and disentangle patterns from mixed-mode datasets, such as APCs in which the residues are replaced by a list of their fundamental biochemical properties, this paper presents a novel method, Extended Aligned Residual Association Discovery and Disentanglement (E-ARADD). E-ARADD discretizes the numerical data to transform the mixed-mode dataset into an event-value dataset, constructs an ARA Frequency Matrix, and then converts it into an adjusted Statistical Residual (SR) Vector Space (SRV) capturing statistical deviation from randomness. Applying Principal Component (PC) Decomposition to the SRV yields PCs ranked by their variance. Finally, the disentangled ARAs are discovered when the projections on a PC are re-projected to a vector space with the same basis vectors as the SRV.
RESULTS: Experiments on synthetic, cytochrome c, and class A scavenger receptor data have shown that E-ARADD can a) disentangle the entwined ARAs in APCs (with residues or biochemical properties) and b) reveal subtle AR clusters relating to classes, subtle subgroups, or specific functionalities.
CONCLUSIONS: E-ARADD can discover and disentangle ARs and ARAs entangled in functionality and location of protein families to reveal functional subgroups and subgroup characteristics of biologically conserved regions. Experimental results on synthetic data provide proof-of-concept validation of the disentanglement, which reveals class-associated ARAs with or without class labels as input. Experiments on cytochrome c data demonstrated the efficacy of E-ARADD in handling both types of residue data. The methodology is able not only to discover and disentangle ARs and ARAs in specific statistical/functional (PCs and RSRVs) spaces, but also to reveal their locations in the protein family functional domains. The success of E-ARADD shows its potential for proteomic research, drug discovery, and precision and personalized genetic medicine.
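
The first two steps in METHODS (discretizing numeric columns into an event-value dataset, then counting co-occurrences) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: equal-frequency binning is an assumed discretization scheme, and the column names (`pos1`, `hydropathy2`) and values are hypothetical.

```python
from collections import Counter
from itertools import combinations


def equal_frequency_bins(values, n_bins=3):
    """Assign each numeric value a bin index 0..n_bins-1 with roughly equal counts.

    Ties are split by sort order, which is good enough for a sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, index in enumerate(order):
        labels[index] = min(rank * n_bins // len(values), n_bins - 1)
    return labels


def to_events(rows, numeric_columns, n_bins=3):
    """Turn mixed-mode rows (dicts) into lists of 'column=value' events."""
    binned = {
        col: equal_frequency_bins([row[col] for row in rows], n_bins)
        for col in numeric_columns
    }
    events = []
    for i, row in enumerate(rows):
        events.append([
            f"{col}=bin{binned[col][i]}" if col in binned else f"{col}={value}"
            for col, value in row.items()
        ])
    return events


def cooccurrence_counts(events):
    """Count how often each unordered pair of events appears in the same row."""
    counts = Counter()
    for row_events in events:
        for pair in combinations(sorted(row_events), 2):
            counts[pair] += 1
    return counts


# Hypothetical aligned columns: a residue symbol and a numeric property.
rows = [
    {"pos1": "C", "hydropathy2": 2.5},
    {"pos1": "C", "hydropathy2": 1.8},
    {"pos1": "H", "hydropathy2": -3.2},
    {"pos1": "H", "hydropathy2": -3.5},
]
frequency = cooccurrence_counts(to_events(rows, ["hydropathy2"], n_bins=2))
for pair, count in sorted(frequency.items()):
    print(pair, count)
```

The resulting pair counts play the role of a frequency matrix. Turning them into statistical residuals and principal components is a separate step.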


Subjects
Computational Biology/methods , Algorithms , Cluster Analysis , Cytochromes c/chemistry , Cytochromes c/metabolism , Principal Component Analysis
5.
Proteomes ; 6(1), 2018 Feb 08.
Article in English | MEDLINE | ID: mdl-29419792

ABSTRACT

A protein family has similar and diverse functions locally conserved as aligned sequence segments. Further discovering their association patterns could reveal subtle family subgroup characteristics. Since aligned residue associations (ARAs) in Aligned Pattern Clusters (APCs) are complex and intertwined due to entangled functions, factors, and variance in the source environment, we have recently developed a novel method, Aligned Residue Association Discovery and Disentanglement (ARADD), to solve this problem. ARADD first obtains an ARA Frequency Matrix from an APC and converts it to an adjusted statistical residual vector space (SRV). It then disentangles the SRV into Principal Components (PCs) and re-projects their vectors to an SRV to reveal succinct orthogonal AR groups. In this study, we applied ARADD to class A scavenger receptors (SR-A), a subclass of a diverse protein family that binds to modified lipoproteins and whose diverse biological functionalities are not explicitly known. Our experimental results demonstrated that ARADD can unveil subtle subgroups in sequence segments with diverse functionality and highly variable sequence lengths. We also demonstrated that the ARAs captured in a Position Weight Matrix or an APC were entangled in biological function and domain location but were disentangled by ARADD to reveal different subclasses without knowing their actual occurrence positions.
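
The frequency-matrix, residual, and re-projection steps described above can be sketched in plain Python. This is a minimal sketch under assumptions, not the published ARADD code: the abstract does not give the residual formula, so Haberman's adjusted standardized residual for a contingency table is assumed; only the leading principal component is extracted (by power iteration); and "re-projection" is read as mapping the scores on that component back into the original residual coordinates. The example counts are hypothetical.

```python
import math


def adjusted_residuals(freq):
    """Adjusted standardized residuals (O - E) / sqrt(E (1 - r_i/N) (1 - c_j/N))."""
    n = sum(sum(row) for row in freq)
    row_tot = [sum(row) for row in freq]
    col_tot = [sum(col) for col in zip(*freq)]
    result = []
    for i, row in enumerate(freq):
        out = []
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            variance = expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            out.append((observed - expected) / math.sqrt(variance) if variance > 0 else 0.0)
        result.append(out)
    return result


def leading_component(vectors, iterations=200):
    """Leading principal axis of the row vectors, their scores, and the rank-one re-projection."""
    width = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / len(vectors) for j in range(width)]
    centered = [[v[j] - mean[j] for j in range(width)] for v in vectors]
    # Start from the longest centered row so the start vector lies in the row space.
    axis = max(centered, key=lambda v: sum(x * x for x in v))
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(v, axis)) for v in centered]
        axis = [sum(s * v[j] for s, v in zip(scores, centered)) for j in range(width)]
        norm = math.sqrt(sum(x * x for x in axis))
        if norm == 0:
            break
        axis = [x / norm for x in axis]
    scores = [sum(a * b for a, b in zip(v, axis)) for v in centered]
    # Each row's score times the axis gives its coordinates back in the residual basis.
    reprojected = [[s * a for a in axis] for s in scores]
    return axis, scores, reprojected


# Hypothetical 3 x 3 co-occurrence counts.
srv = adjusted_residuals([[30, 2, 1], [3, 25, 2], [1, 2, 20]])
axis, scores, reprojected = leading_component(srv)
for row in reprojected:
    print([round(x, 2) for x in row])
```

Large positive or negative entries in the re-projected rows mark the associations that this one component accounts for. Repeating the procedure on later components would separate associations arising from other sources.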

7.
Zhonghua Liu Xing Bing Xue Za Zhi ; 25(7): 582-5, 2004 Jul.
Article in Chinese | MEDLINE | ID: mdl-15308037

ABSTRACT

OBJECTIVE: To understand the determinants and epidemiology of drug-resistant tuberculosis (TB) in rural areas.
METHODS: All TB patients diagnosed in 2002 in a county with a directly observed treatment, short-course (DOTS) program, and a sample of patients in another county without a DOTS program, both located in northern Jiangsu province, were surveyed with questionnaires. Drug susceptibility testing (DST) of positive cultures was performed by the standardized proportion method. Univariable analysis and multivariate nonconditional logistic regression modeling were applied for data analysis.
RESULTS: Among the 152 patients with DST results, 32.9% showed resistance to at least one of the first-line anti-tuberculosis drugs, with 26.3% resistant to isoniazid, 18.4% to rifampin, and 17.1% to both isoniazid and rifampin. Previous treatment for TB and residence in the county without a DOTS program were independent risk factors for isoniazid and rifampin resistance. TB patients showing indifference to their health and those who delayed health seeking for more than 1 month were more likely to have rifampin resistance. Independent predictors of multidrug-resistant TB included delayed health seeking for more than 1 month (OR = 4.66, 95% CI: 1.26 - 17.24), residing in the county without a DOTS program (OR = 3.01, 95% CI: 1.10 - 8.22), indifference to their health condition (OR = 5.13, 95% CI: 1.06 - 24.90) and suffering from chronic diseases (OR = 0.22, 95% CI: 0.05 - 0.87).
CONCLUSION: Drug-resistant TB was quite serious in these rural areas, mainly associated with man-made factors but partly due to the availability of the transmission.
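
The reported odds ratios and confidence intervals can be sanity-checked with a short Python sketch. It assumes the intervals are Wald intervals computed on the log-odds scale with z = 1.96, which is the usual output of logistic regression but is not stated in the abstract. Under that assumption, the geometric mean of the bounds should reproduce the odds ratio, and the interval width gives the standard error of the log odds ratio. The dictionary keys are shorthand labels, not terms from the paper.

```python
import math

# (odds ratio, lower bound, upper bound) as reported in RESULTS.
reported = {
    "delayed health seeking > 1 month": (4.66, 1.26, 17.24),
    "county without DOTS program": (3.01, 1.10, 8.22),
    "indifference to health condition": (5.13, 1.06, 24.90),
    "chronic diseases": (0.22, 0.05, 0.87),
}


def log_scale_summary(lower, upper, z=1.96):
    """Return the interval's log-scale midpoint (as an OR) and the implied SE of log(OR)."""
    center = math.sqrt(lower * upper)  # exp of the midpoint of the log bounds
    standard_error = (math.log(upper) - math.log(lower)) / (2 * z)
    return center, standard_error


for name, (odds_ratio, lower, upper) in reported.items():
    center, se = log_scale_summary(lower, upper)
    print(f"{name}: reported OR {odds_ratio}, implied OR {center:.2f}, SE(log OR) {se:.2f}")
```

All four implied values agree with the reported odds ratios to within rounding of the published bounds. The wide intervals reflect the small sample (152 patients with DST results).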


Subjects
Antitubercular Agents/pharmacology , Drug Resistance, Multiple , Tuberculosis, Pulmonary/epidemiology , Adult , China/epidemiology , Drug Resistance, Microbial , Ethambutol/therapeutic use , Humans , Incidence , Isoniazid/therapeutic use , Logistic Models , Male , Microbial Sensitivity Tests , Middle Aged , Rifampin/therapeutic use , Rural Health , Streptomycin/therapeutic use , Surveys and Questionnaires , Tuberculosis, Pulmonary/microbiology
...