Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 12 de 12
Filtrar
Mais filtros











Base de dados
Intervalo de ano de publicação
1.
Entropy (Basel) ; 25(12)2023 Dec 14.
Artigo em Inglês | MEDLINE | ID: mdl-38136539

RESUMO

Humans are able to quickly adapt to new situations, learn effectively with limited data, and create unique combinations of basic concepts. In contrast, generalizing out-of-distribution (OOD) data and achieving combinatorial generalizations are fundamental challenges for machine learning models. Moreover, obtaining high-quality labeled examples can be very time-consuming and expensive, particularly when specialized skills are required for labeling. To address these issues, we propose BtVAE, a method that utilizes conditional VAE models to achieve combinatorial generalization in certain scenarios and consequently to generate out-of-distribution (OOD) data in a semi-supervised manner. Unlike previous approaches that use new factors of variation during testing, our method uses only existing attributes from the training data but in ways that were not seen during training (e.g., small objects of a specific shape during training and large objects of the same shape during testing).

2.
Mach Learn ; 111(4): 1431-1521, 2022.
Artigo em Inglês | MEDLINE | ID: mdl-35602587

RESUMO

Despite the recent success of reinforcement learning in various domains, these approaches remain, for the most part, deterringly sensitive to hyper-parameters and are often riddled with essential engineering feats allowing their success. We consider the case of off-policy generative adversarial imitation learning, and perform an in-depth review, qualitative and quantitative, of the method. We show that forcing the learned reward function to be local Lipschitz-continuous is a sine qua non condition for the method to perform well. We then study the effects of this necessary condition and provide several theoretical results involving the local Lipschitzness of the state-value function. We complement these guarantees with empirical evidence attesting to the strong positive effect that the consistent satisfaction of the Lipschitzness constraint on the reward has on imitation performance. Finally, we tackle a generic pessimistic reward preconditioning add-on spawning a large class of reward shaping methods, which makes the base method it is plugged into provably more robust, as shown in several additional theoretical guarantees. We then discuss these through a fine-grained lens and share our insights. Crucially, the guarantees derived and reported in this work are valid for any reward satisfying the Lipschitzness condition, nothing is specific to imitation. As such, these may be of independent interest.

3.
Sensors (Basel) ; 22(3)2022 Jan 30.
Artigo em Inglês | MEDLINE | ID: mdl-35161833

RESUMO

Despite the great attention that the research community has paid to the creation of novel indoor positioning methods, a rather limited volume of works has focused on the confidence that Indoor Positioning Systems (IPS) assign to the position estimates that they produce. The concept of estimating, dynamically, the accuracy of the position estimates provided by an IPS has been sporadically studied in the literature of the field. Recently, this concept has started being studied as well in the context of outdoor positioning systems of Internet of Things (IoT) based on Low-Power Wide-Area Networks (LPWANs). What is problematic is that the consistent comparison of the proposed methods is quasi nonexistent: new methods rarely use previous ones as baselines; often, a small number of evaluation metrics are reported while different metrics are reported among different relevant publications, the use of open data is rare, and the publication of open code is absent. In this work, we present an open-source, reproducible benchmarking framework for evaluating and consistently comparing various methods of Dynamic Accuracy Estimation (DAE). This work reviews the relevant literature, presenting in a consistent terminology commonalities and differences and discussing baselines and evaluation metrics. Moreover, it evaluates multiple methods of DAE using open data, open code, and a rich set of relevant evaluation metrics. This is the first work aiming to establish the state of the art of methods of DAE determination in IPS and in LPWAN positioning systems, through an open, transparent, holistic, reproducible, and consistent evaluation of the methods proposed in the relevant literature.


Assuntos
Benchmarking , Confiança , Coleta de Dados
4.
Entropy (Basel) ; 22(8)2020 Aug 13.
Artigo em Inglês | MEDLINE | ID: mdl-33286658

RESUMO

One of the major shortcomings of variational autoencoders is the inability to produce generations from the individual modalities of data originating from mixture distributions. This is primarily due to the use of a simple isotropic Gaussian as the prior for the latent code in the ancestral sampling procedure for data generations. In this paper, we propose a novel formulation of variational autoencoders, conditional prior VAE (CP-VAE), with a two-level generative process for the observed data where continuous z and a discrete c variables are introduced in addition to the observed variables x. By learning data-dependent conditional priors, the new variational objective naturally encourages a better match between the posterior and prior conditionals, and the learning of the latent categories encoding the major source of variation of the original data in an unsupervised manner. Through sampling continuous latent code from the data-dependent conditional priors, we are able to generate new samples from the individual mixture components corresponding, to the multimodal structure over the original data. Moreover, we unify and analyse our objective under different independence assumptions for the joint distribution of the continuous and discrete latent variables. We provide an empirical evaluation on one synthetic dataset and three image datasets, FashionMNIST, MNIST, and Omniglot, illustrating the generative performance of our new model comparing to multiple baselines.

5.
J Biomed Semantics ; 9(1): 21, 2018 08 15.
Artigo em Inglês | MEDLINE | ID: mdl-30111369

RESUMO

BACKGROUND: While representation learning techniques have shown great promise in application to a number of different NLP tasks, they have had little impact on the problem of ontology matching. Unlike past work that has focused on feature engineering, we present a novel representation learning approach that is tailored to the ontology matching task. Our approach is based on embedding ontological terms in a high-dimensional Euclidean space. This embedding is derived on the basis of a novel phrase retrofitting strategy through which semantic similarity information becomes inscribed onto fields of pre-trained word vectors. The resulting framework also incorporates a novel outlier detection mechanism based on a denoising autoencoder that is shown to improve performance. RESULTS: An ontology matching system derived using the proposed framework achieved an F-score of 94% on an alignment scenario involving the Adult Mouse Anatomical Dictionary and the Foundational Model of Anatomy ontology (FMA) as targets. This compares favorably with the best performing systems on the Ontology Alignment Evaluation Initiative anatomy challenge. We performed additional experiments on aligning FMA to NCI Thesaurus and to SNOMED CT based on a reference alignment extracted from the UMLS Metathesaurus. Our system obtained overall F-scores of 93.2% and 89.2% for these experiments, thus achieving state-of-the-art results. CONCLUSIONS: Our proposed representation learning approach leverages terminological embeddings to capture semantic similarity. Our results provide evidence that the approach produces embeddings that are especially well tailored to the ontology matching task, demonstrating a novel pathway for the problem.


Assuntos
Ontologias Biológicas , Aprendizado de Máquina
6.
PLoS One ; 10(12): e0144439, 2015.
Artigo em Inglês | MEDLINE | ID: mdl-26645087

RESUMO

Different studies have demonstrated the importance of comorbidities to better understand the origin and evolution of medical complications. This study focuses on improvement of the predictive model interpretability based on simple logical features representing comorbidities. We use group lasso based feature interaction discovery followed by a post-processing step, where simple logic terms are added. In the final step, we reduce the feature set by applying lasso logistic regression to obtain a compact set of non-zero coefficients that represent a more comprehensible predictive model. The effectiveness of the proposed approach was demonstrated on a pediatric hospital discharge dataset that was used to build a readmission risk estimation model. The evaluation of the proposed method demonstrates a reduction of the initial set of features in a regression model by 72%, with a slight improvement in the Area Under the ROC Curve metric from 0.763 (95% CI: 0.755-0.771) to 0.769 (95% CI: 0.761-0.777). Additionally, our results show improvement in comprehensibility of the final predictive model using simple comorbidity based terms for logistic regression.


Assuntos
Comorbidade , Modelos Teóricos
7.
BMC Bioinformatics ; 11: 594, 2010 Dec 10.
Artigo em Inglês | MEDLINE | ID: mdl-21208396

RESUMO

BACKGROUND: The purpose of this manuscript is to provide, based on an extensive analysis of a proteomic data set, suggestions for proper statistical analysis for the discovery of sets of clinically relevant biomarkers. As tractable example we define the measurable proteomic differences between apparently healthy adult males and females. We choose urine as body-fluid of interest and CE-MS, a thoroughly validated platform technology, allowing for routine analysis of a large number of samples. The second urine of the morning was collected from apparently healthy male and female volunteers (aged 21-40) in the course of the routine medical check-up before recruitment at the Hannover Medical School. RESULTS: We found that the Wilcoxon-test is best suited for the definition of potential biomarkers. Adjustment for multiple testing is necessary. Sample size estimation can be performed based on a small number of observations via resampling from pilot data. Machine learning algorithms appear ideally suited to generate classifiers. Assessment of any results in an independent test-set is essential. CONCLUSIONS: Valid proteomic biomarkers for diagnosis and prognosis only can be defined by applying proper statistical data mining procedures. In particular, a justification of the sample size should be part of the study design.


Assuntos
Biomarcadores/urina , Proteômica/métodos , Adulto , Algoritmos , Eletroforese Capilar , Feminino , Humanos , Masculino , Espectrometria de Massas , Valores de Referência , Tamanho da Amostra , Adulto Jovem
8.
Brief Bioinform ; 9(2): 102-18, 2008 Mar.
Artigo em Inglês | MEDLINE | ID: mdl-18310106

RESUMO

Mass-spectra based proteomic profiles have received widespread attention as potential tools for biomarker discovery and early disease diagnosis. A major data-analytical problem involved is the extremely high dimensionality (i.e. number of features or variables) of proteomic data, in particular when the sample size is small. This article reviews dimensionality reduction methods that have been used in proteomic biomarker studies. It then focuses on the problem of selecting the most appropriate method for a specific task or dataset, and proposes method combination as a potential alternative to single-method selection. Finally, it points out the potential of novel dimension reduction techniques, in particular those that incorporate domain knowledge through the use of informative priors or causal inference.


Assuntos
Biomarcadores/análise , Processamento Eletrônico de Dados , Proteômica/métodos , Pesquisa , Algoritmos , Processamento Eletrônico de Dados/instrumentação , Processamento Eletrônico de Dados/métodos , Espectrometria de Massas/métodos , Proteoma/análise , Pesquisa/instrumentação , Projetos de Pesquisa
9.
J Chromatogr B Analyt Technol Biomed Life Sci ; 853(1-2): 20-30, 2007 Jun 15.
Artigo em Inglês | MEDLINE | ID: mdl-17537688

RESUMO

One of the challenges of current proteomics research is identifying healthy and diseased mass spectrometric (MS) patterns within biological fluids. As a result, sample preparation methodologies, as well as the mathematical tools utilized for MS data analysis become pivotal. This study involves a thorough evaluation of the reproducibility and protein resolution that various urinary protein preparation methodologies provide in MALDI MS analysis. Several precipitation approaches, ultrafiltration, as well as direct dilution of urine in MALDI MS compatible buffers were applied in combination to a thorough bioinformatics analysis of the generated MS data. Our results indicate that ultrafiltration, as well as direct dilution of urine in TFA, can provide information rich and reproducible spectra for mass ranges up to 20 kDa. The importance of the presence of peak reproducibility filters when generating disease classification models is suggested.


Assuntos
Biologia Computacional/métodos , Proteínas/análise , Proteinúria/urina , Espectrometria de Massas por Ionização e Dessorção a Laser Assistida por Matriz/métodos , Humanos , Peso Molecular , Reprodutibilidade dos Testes
10.
Mass Spectrom Rev ; 25(3): 409-49, 2006.
Artigo em Inglês | MEDLINE | ID: mdl-16463283

RESUMO

Among the many applications of mass spectrometry, biomarker pattern discovery from protein mass spectra has aroused considerable interest in the past few years. While research efforts have raised hopes of early and less invasive diagnosis, they have also brought to light the many issues to be tackled before mass-spectra-based proteomic patterns become routine clinical tools. Known issues cover the entire pipeline leading from sample collection through mass spectrometry analytics to biomarker pattern extraction, validation, and interpretation. This study focuses on the data-analytical phase, which takes as input mass spectra of biological specimens and discovers patterns of peak masses and intensities that discriminate between different pathological states. We survey current work and investigate computational issues concerning the different stages of the knowledge discovery process: exploratory analysis, quality control, and diverse transforms of mass spectra, followed by further dimensionality reduction, classification, and model evaluation. We conclude after a brief discussion of the critical biomedical task of analyzing discovered discriminatory patterns to identify their component proteins as well as interpret and validate their biological implications.


Assuntos
Espectrometria de Massas/métodos , Proteínas/análise , Algoritmos , Animais , Biomarcadores , Biologia Computacional , Humanos , Espectrometria de Massas/classificação , Modelos Químicos , Mapeamento de Peptídeos , Proteômica
11.
Proteomics ; 4(8): 2320-32, 2004 Aug.
Artigo em Inglês | MEDLINE | ID: mdl-15274126

RESUMO

In this paper we try to identify potential biomarkers for early stroke diagnosis using surface-enhanced laser desorption/ionization mass spectrometry coupled with analysis tools from machine learning and data mining. Data consist of 42 specimen samples, i.e., mass spectra divided in two big categories, stroke and control specimens. Among the stroke specimens two further categories exist that correspond to ischemic and hemorrhagic stroke; in this paper we limit our data analysis to discriminating between control and stroke specimens. We performed two suites of experiments. In the first one we simply applied a number of different machine learning algorithms; in the second one we have chosen the best performing algorithm as it was determined from the first phase and coupled it with a number of different feature selection methods. The reason for this was 2-fold, first to establish whether feature selection can indeed improve performance, which in our case it did not seem to confirm, but more importantly to acquire a small list of potentially interesting biomarkers. Of the different methods explored the most promising one was support vector machines which gave us high levels of sensitivity and specificity. Finally, by analyzing the models constructed by support vector machines we produced a small set of 13 features that could be used as potential biomarkers, and which exhibited good performance both in terms of sensitivity, specificity and model stability.


Assuntos
Espectrometria de Massas/métodos , Acidente Vascular Cerebral/sangue , Acidente Vascular Cerebral/diagnóstico , Adulto , Idoso , Idoso de 80 Anos ou mais , Algoritmos , Biomarcadores , Feminino , Perfilação da Expressão Gênica , Humanos , Masculino , Pessoa de Meia-Idade , Análise Serial de Proteínas , Sensibilidade e Especificidade , Acidente Vascular Cerebral/classificação , Acidente Vascular Cerebral/patologia
12.
Proteomics ; 3(9): 1716-9, 2003 Sep.
Artigo em Inglês | MEDLINE | ID: mdl-12973731

RESUMO

We addressed the problem of discriminating between 24 diseased and 17 healthy specimens on the basis of protein mass spectra. To prepare the data, we performed mass to charge ratio (m/z) normalization, baseline elimination, and conversion of absolute peak height measures to height ratios. After preprocessing, the major difficulty encountered was the extremely large number of variables (1676 m/z values) versus the number of examples (41). Dimensionality reduction was treated as an integral part of the classification process; variable selection was coupled with model construction in a single ten-fold cross-validation loop. We explored different experimental setups involving two peak height representations, two variable selection methods, and six induction algorithms, all on both the original 1676-mass data set and on a prescreened 124-mass data set. Highest predictive accuracies (1-2 off-sample misclassifications) were achieved by a multilayer perceptron and Naïve Bayes, with the latter displaying more consistent performance (hence greater reliability) over varying experimental conditions. We attempted to identify the most discriminant peaks (proteins) on the basis of scores assigned by the two variable selection methods and by neural network based sensitivity analysis. These three scoring schemes consistently ranked four peaks as the most relevant discriminators: 11683, 1403, 17350 and 66107.


Assuntos
Inteligência Artificial , Neoplasias Pulmonares/diagnóstico , Espectrometria de Massas/métodos , Proteínas/química , Algoritmos , Biologia Computacional/métodos , Bases de Dados de Proteínas , Humanos , Espectrometria de Massas/estatística & dados numéricos
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA