Results 1 - 20 of 93
1.
Cancers (Basel) ; 13(20)2021 Oct 12.
Article in English | MEDLINE | ID: mdl-34680236

ABSTRACT

Prognostic biomarkers can have an important role in the clinical practice because they allow stratification of patients in terms of predicting the outcome of a disorder. Obstacles for developing such markers include lack of robustness when using different data sets and limited concordance among similar signatures. In this paper, we highlight a new problem that relates to the biological meaning of already established prognostic gene expression signatures. Specifically, it is commonly assumed that prognostic markers provide sensible biological information and molecular explanations about the underlying disorder. However, recent studies on prognostic biomarkers investigating 80 established signatures of breast and prostate cancer demonstrated that this is not the case. We will show that this surprising result is related to the distinction between causal models and predictive models and the obfuscating usage of these models in the biomedical literature. Furthermore, we suggest a falsification procedure for studies aiming to establish a prognostic signature to safeguard against false expectations with respect to biological utility.

2.
Front Genet ; 12: 649429, 2021.
Article in English | MEDLINE | ID: mdl-34367234

ABSTRACT

High-throughput technologies do not only provide novel means for basic biological research but also for clinical applications in hospitals. For instance, the usage of gene expression profiles as prognostic biomarkers for predicting cancer progression has found widespread interest. Aside from predicting the progression of patients, it is generally believed that such prognostic biomarkers also provide valuable information about disease mechanisms and the underlying molecular processes that are causal for a disorder. However, the latter assumption has been challenged. In this paper, we study this problem for prostate cancer. Specifically, we investigate a large number of previously published prognostic signatures of prostate cancer based on gene expression profiles and show that none of these can provide unique information about the underlying disease etiology of prostate cancer. Hence, our analysis reveals that none of the studied signatures has a sensible biological meaning. Overall, this shows that all studied prognostic signatures are merely black-box models allowing sensible predictions of prostate cancer outcome but are not capable of providing causal explanations to enhance the understanding of prostate cancer.

3.
Nonlinear Dyn ; 105(4): 3819-3833, 2021.
Article in English | MEDLINE | ID: mdl-34429568

ABSTRACT

We propose a new epidemic model that considers a partial mapping relationship in a two-layered time-varying network in order to study the influence of information diffusion on epidemic spreading. In the model, one layer represents the diffusion of epidemic-related information in social networks, while the other layer represents epidemic spreading in physical networks. Mapping relationships exist only between some pairs of nodes in the two-layered network, which characterizes the interaction between information diffusion and epidemic spreading. Meanwhile, information and epidemics can only spread within their own layers. Starting from the microscopic Markov chain (MMC) method, we establish the dynamic equation of epidemic spreading and then analytically deduce its epidemic threshold, which demonstrates that the ratio of correspondence between the two layers has a significant effect on the epidemic threshold of the proposed model. Finally, the MMC method is found to agree well with Monte Carlo (MC) simulations, and the results can help to understand the properties of epidemic spreading in depth.

4.
Front Artif Intell ; 4: 576892, 2021.
Article in English | MEDLINE | ID: mdl-34195608

ABSTRACT

The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a widely accepted framework in production and manufacturing. This data-driven knowledge discovery framework provides an orderly partition of the often complex data mining processes to ensure a practical implementation of data analytics and machine learning models. However, the practical application of robust industry-specific data-driven knowledge discovery models faces multiple data- and model development-related issues. These issues need to be carefully addressed by allowing a flexible, customized and industry-specific knowledge discovery framework. For this reason, extensions of CRISP-DM are needed. In this paper, we provide a detailed review of CRISP-DM and summarize extensions of this model into a novel framework we call Generalized Cross-Industry Standard Process for Data Science (GCRISP-DS). This framework is designed to allow dynamic interactions between different phases to adequately address data- and model-related issues for achieving robustness. Furthermore, it emphasizes also the need for a detailed business understanding and the interdependencies with the developed models and data quality for fulfilling higher business objectives. Overall, such a customizable GCRISP-DS framework provides an enhancement for model improvements and reusability by minimizing robustness-issues.

5.
Front Big Data ; 4: 591749, 2021.
Article in English | MEDLINE | ID: mdl-33969290

ABSTRACT

The ultimate goal of the social sciences is to find a general social theory encompassing all aspects of social and collective phenomena. The traditional approach to this is very stringent by trying to find causal explanations and models. However, this approach has been recently criticized for preventing progress due to neglecting prediction abilities of models that support more problem-oriented approaches. The latter models would be enabled by the surge of big Web-data currently available. Interestingly, this problem cannot be overcome with methods from computational social science (CSS) alone because this field is dominated by simulation-based approaches and descriptive models. In this article, we address this issue and argue that the combination of big social data with social networks is needed for creating prediction models. We will argue that this alliance has the potential for gradually establishing a causal social theory. In order to emphasize the importance of integrating big social data with social networks, we call this approach data-driven computational social network science (DD-CSNS).

6.
PLoS One ; 16(3): e0245728, 2021.
Article in English | MEDLINE | ID: mdl-33735225

ABSTRACT

At the beginning of 2020, the COVID-19 pandemic was able to spread quickly in Wuhan and in the province of Hubei due to a lack of experience with this novel virus. Additionally, authorities had no proven experience with applying insufficient medical, communication and crisis management tools. For a considerable period of time, the actual number of people infected was unknown. There were great uncertainties regarding the dynamics and spread of the COVID-19 virus infection. In this paper, we develop a system dynamics model for the three connected regions (Wuhan, Hubei excl. Wuhan, China excl. Hubei) to understand the infection and spread dynamics of the virus, to provide a more accurate estimate of the number of infected people in Wuhan, and to discuss the necessity and effectiveness of protective measures against this epidemic, such as the quarantines imposed throughout China. We use the statistics of confirmed cases of China excl. Hubei. Daily data on travel activity within China were also used to determine the actual numerical development of the infected people in Wuhan City and Hubei Province. We used a multivariate Monte Carlo optimization to parameterize the model to match the official statistics. In particular, we used the model to calculate the infections that had already broken out but were not diagnosed for various reasons.


Subjects
COVID-19/epidemiology, Algorithms, COVID-19/prevention & control, COVID-19/transmission, China/epidemiology, Humans, Statistical Models, Monte Carlo Method, Pandemics, Quarantine, SARS-CoV-2/isolation & purification, Travel
7.
Sci Rep ; 11(1): 156, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33420139

ABSTRACT

The identification of prognostic biomarkers for predicting cancer progression is an important problem for two reasons. First, such biomarkers find practical application in a clinical context for the treatment of patients. Second, interrogation of the biomarkers themselves is assumed to lead to novel insights of disease mechanisms and the underlying molecular processes that cause the pathological behavior. For breast cancer, many signatures based on gene expression values have been reported to be associated with overall survival. Consequently, such signatures have been used for suggesting biological explanations of breast cancer and drug mechanisms. In this paper, we demonstrate for a large number of breast cancer signatures that such an implication is not justified. Our approach eliminates systematically all traces of biological meaning of signature genes and shows that among the remaining genes, surrogate gene sets can be formed with indistinguishable prognostic prediction capabilities and opposite biological meaning. Hence, our results demonstrate that none of the studied signatures has a sensible biological interpretation or meaning with respect to disease etiology. Overall, this shows that prognostic signatures are black-box models with sensible predictions of breast cancer outcome but no value for revealing causal connections. Furthermore, we show that the number of such surrogate gene sets is not small but very large.


Subjects
Tumor Biomarkers/genetics, Breast Neoplasms/genetics, Tumor Biomarkers/metabolism, Breast Neoplasms/diagnosis, Breast Neoplasms/metabolism, Breast Neoplasms/mortality, Female, Gene Expression Profiling, Neoplastic Gene Expression Regulation, Humans, Prognosis, Transcriptome
8.
Sci Rep ; 10(1): 16672, 2020 10 07.
Article in English | MEDLINE | ID: mdl-33028846

ABSTRACT

Gene ontology (GO) is an eminent knowledge base frequently used for providing biological interpretations for the analysis of genes or gene sets from biological, medical and clinical problems. Unfortunately, the interpretation of such results is challenging due to the large number of GO terms, their hierarchical and connected organization as directed acyclic graphs (DAGs) and the lack of tools allowing to exploit this structural information explicitly. For this reason, we developed the R package GOxploreR. The main features of GOxploreR are (I) easy and direct access to structural features of GO, (II) structure-based ranking of GO-terms, (III) mapping to reduced GO-DAGs including visualization capabilities and (IV) prioritizing of GO-terms. The underlying idea of GOxploreR is to exploit a graph-theoretical perspective of GO as manifested by its DAG-structure and the containing hierarchy levels for cumulating semantic information. That means all these features enhance the utilization of structural information of GO and complement existing analysis tools. Overall, GOxploreR provides exploratory as well as confirmatory tools for complementing any kind of analysis resulting in a list of GO-terms, e.g., from differentially expressed genes or gene sets, GWAS or biomarkers. Our R package GOxploreR is freely available from CRAN.


Subjects
Genetic Databases, Gene Ontology, Software, Humans
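
The structure-based ranking of GO terms mentioned in the GOxploreR abstract can be illustrated with a minimal Python sketch. It is not the package's implementation (GOxploreR is an R package). The toy term names, the function names `dag_levels` and `rank_by_level`, and the definition of a term's level as the length of the longest path from the root are all assumptions made for illustration; the abstract does not state how the package defines hierarchy levels.

```python
from functools import lru_cache


def dag_levels(parents):
    """Return a level for every term in a DAG.

    `parents` maps each term to the list of its parent terms; the root
    has an empty list. The level is assumed here to be the length of
    the longest path from the root (one possible convention).
    """
    @lru_cache(maxsize=None)
    def level(term):
        term_parents = parents[term]
        if not term_parents:
            return 0  # the root sits at level 0
        return 1 + max(level(p) for p in term_parents)

    return {term: level(term) for term in parents}


def rank_by_level(terms, levels):
    """Order terms from deepest (most specific) to shallowest; ties by name."""
    return sorted(terms, key=lambda t: (-levels[t], t))


# Hypothetical toy DAG, not real GO identifiers.
toy_parents = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["C", "B"],
}
toy_levels = dag_levels(toy_parents)
toy_ranking = rank_by_level(["B", "D", "A"], toy_levels)
```

Under this convention a term with several parents, such as `D` in the toy graph, takes its level from its deepest parent, so more specific terms rank first.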
9.
Front Cell Dev Biol ; 8: 673, 2020.
Article in English | MEDLINE | ID: mdl-32984300

ABSTRACT

The number of scientific publications in the literature is steadily growing, containing our knowledge in the biomedical, health, and clinical sciences. Since there is currently no automatic archiving of the obtained results, much of this information remains buried in textual details not readily available for further usage or analysis. For this reason, natural language processing (NLP) and text mining methods are used for information extraction from such publications. In this paper, we review practices for Named Entity Recognition (NER) and Relation Detection (RD), allowing, e.g., to identify interactions between proteins and drugs or genes and diseases. This information can be integrated into networks to summarize large-scale details on a particular biomedical or clinical problem, which is then amenable for easy data management and further analysis. Furthermore, we survey novel deep learning methods that have recently been introduced for such tasks.

10.
BMC Genomics ; 21(1): 650, 2020 Sep 22.
Article in English | MEDLINE | ID: mdl-32962626

ABSTRACT

BACKGROUND: The small number of samples and the curse of dimensionality hamper the better application of deep learning techniques for disease classification. Additionally, the performance of clustering-based feature selection algorithms is still far from being satisfactory due to their limitation in using unsupervised learning methods. To enhance interpretability and overcome this problem, we developed a novel feature selection algorithm. In the meantime, complex genomic data brought great challenges for the identification of biomarkers and therapeutic targets. Some current feature selection methods suffer from low sensitivity and specificity in this field. RESULTS: In this article, we designed a multi-scale clustering-based feature selection algorithm named MCBFS which simultaneously performs feature selection and model learning for genomic data analysis. The experimental results demonstrated that MCBFS is robust and effective by comparing it with seven benchmark and six state-of-the-art supervised methods on eight data sets. The visualization results and the statistical test showed that MCBFS can capture the informative genes and improve the interpretability and visualization of tumor gene expression and single-cell sequencing data. Additionally, we developed a general framework named McbfsNW using gene expression data and protein interaction data to identify robust biomarkers and therapeutic targets for diagnosis and therapy of diseases. The framework incorporates the MCBFS algorithm, a network recognition ensemble algorithm and a feature selection wrapper. McbfsNW has been applied to the lung adenocarcinoma (LUAD) data sets. The preliminary results demonstrated that better prediction results can be attained by the identified biomarkers on the independent LUAD data set, and we also constructed a drug-target network which may be good for LUAD therapy.
CONCLUSIONS: The proposed novel feature selection method is robust and effective for gene selection, classification, and visualization. The framework McbfsNW is practical and helpful for the identification of biomarkers and targets on genomic data. It is believed that the same methods and principles are extensible and applicable to other different kinds of data sets.


Subjects
Lung Adenocarcinoma/genetics, Tumor Biomarkers/genetics, Genomics/methods, Lung Neoplasms/genetics, Supervised Machine Learning, Lung Adenocarcinoma/classification, Lung Adenocarcinoma/pathology, Tumor Biomarkers/metabolism, Cluster Analysis, Humans, Lung Neoplasms/classification, Lung Neoplasms/pathology, Software
11.
Front Genet ; 11: 18, 2020.
Article in English | MEDLINE | ID: mdl-32117437

ABSTRACT

The task of predicting protein-protein interactions (PPIs) has been essential in the context of understanding biological processes. This paper proposes a novel computational model namely FCTP-WSRC to predict PPIs effectively. Initially, combinations of the F-vector, composition (C) and transition (T) are used to map each protein sequence onto numeric feature vectors. Afterwards, an effective feature extraction method PCA (principal component analysis) is employed to reconstruct the most discriminative feature subspaces, which is subsequently used as input in weighted sparse representation based classification (WSRC) for prediction. The FCTP-WSRC model achieves accuracies of 96.67%, 99.82%, and 98.09% for H. pylori, Human and Yeast datasets respectively. Furthermore, the FCTP-WSRC model performs well when predicting three significant PPIs networks: the single-core network (CD9), the multiple-core network (Ras-Raf-Mek-Erk-Elk-Srf pathway), and the cross-connection network (Wnt-related Network). Consequently, the promising results show that the proposed method can be a powerful tool for PPIs prediction with excellent performance and less time.

12.
Sci Rep ; 10(1): 1432, 2020 01 29.
Article in English | MEDLINE | ID: mdl-31996705

ABSTRACT

Artificial intelligence provides the opportunity to reveal important information buried in large amounts of complex data. Electronic health records (eHRs) are a source of such big data that provide a multitude of health related clinical information about patients. However, text data from eHRs, e.g., discharge summary notes, are challenging in their analysis because these notes are free-form texts and the writing formats and styles vary considerably between different records. For this reason, in this paper we study deep learning neural networks in combination with natural language processing to analyze text data from clinical discharge summaries. We provide a detailed analysis of patient phenotyping, i.e., the automatic prediction of ten patient disorders, by investigating the influence of network architectures, sample sizes and information content of tokens. Importantly, for patients suffering from Chronic Pain, the disorder that is the most difficult one to classify, we find the largest performance gain for a combined word- and sentence-level input convolutional neural network (ws-CNN). As a general result, we find that the combination of data quality and data quantity of the text data plays a crucial role for using more complex network architectures that improve significantly beyond a word-level input CNN model. From our investigations of learning curves and token selection mechanisms, we conclude that for such a transition one requires larger sample sizes because the amount of information per sample is quite small and only carried by few tokens and token categories. Interestingly, we found that the token frequency in the eHRs follows a Zipf law and we utilized this behavior to investigate the information content of tokens by defining a token selection mechanism. The latter also addresses issues of explainable AI.


Subjects
Biostatistics/methods, Chronic Pain/diagnosis, Deep Learning, Electronic Health Records/standards, Machine Learning, Artificial Intelligence, Computational Biology, Humans, Computer Neural Networks, Phenotype
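
The Zipf-law observation in the abstract above can be checked on any token list with a short Python sketch. This is only an illustration, not the authors' procedure: the function name `zipf_slope` and the synthetic tokens are hypothetical, and the fit is an ordinary least-squares line through log(frequency) versus log(rank), where a slope near -1 is what an idealized Zipf law would give.

```python
from collections import Counter
from math import log


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    Tokens are ranked by decreasing frequency, with rank 1 being the
    most frequent token. An idealized Zipf law gives a slope near -1.
    """
    counts = sorted(Counter(tokens).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct tokens to fit a slope")
    xs = [log(rank) for rank in range(1, len(counts) + 1)]
    ys = [log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic tokens whose frequencies are exactly 120 / rank (hypothetical data).
synthetic_tokens = []
for rank, count in enumerate([120, 60, 40, 30, 24], start=1):
    synthetic_tokens.extend([f"token{rank}"] * count)

synthetic_slope = zipf_slope(synthetic_tokens)
```

On real clinical text the fitted slope will deviate from -1, and a log-log plot of the ranked frequencies is usually inspected alongside the single number.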
13.
Front Artif Intell ; 3: 4, 2020.
Article in English | MEDLINE | ID: mdl-33733124

ABSTRACT

Deep learning models stand for a new learning paradigm in artificial intelligence (AI) and machine learning. Recent breakthrough results in image analysis and speech recognition have generated a massive interest in this field because applications in many other domains providing big data also seem possible. On the downside, the mathematical and computational methodology underlying deep learning models is very challenging, especially for interdisciplinary scientists. For this reason, we present in this paper an introductory review of deep learning approaches including Deep Feedforward Neural Networks (D-FFNN), Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), Autoencoders (AEs), and Long Short-Term Memory (LSTM) networks. These models form the major core architectures of deep learning models currently used and should belong in any data scientist's toolbox. Importantly, those core architectural building blocks can be composed flexibly, in an almost Lego-like manner, to build new application-specific network architectures. Hence, a basic understanding of these network architectures is important to be prepared for future developments in AI.

14.
Front Artif Intell ; 3: 524339, 2020.
Article in English | MEDLINE | ID: mdl-33733197

ABSTRACT

The field of artificial intelligence (AI) was founded over 65 years ago. Starting with great hopes and ambitious goals, the field progressed through various stages of popularity and has recently undergone a revival through the introduction of deep neural networks. One problem of AI is that, so far, neither the "intelligence" nor the goals of AI are formally defined, which causes confusion when comparing AI to other fields. In this paper, we present a perspective on the desired and current status of AI in relation to machine learning and statistics and clarify common misconceptions and myths. Our discussion is intended to lift the veil of vagueness surrounding AI to reveal its true countenance.

15.
BMC Cancer ; 19(1): 1176, 2019 Dec 03.
Article in English | MEDLINE | ID: mdl-31796020

ABSTRACT

BACKGROUND: Deciphering the meaning of the human DNA is an outstanding goal which would revolutionize medicine and our way for treating diseases. In recent years, non-coding RNAs have attracted much attention and shown to be functional in part. Yet the importance of these RNAs especially for higher biological functions remains under investigation. METHODS: In this paper, we analyze RNA-seq data, including non-coding and protein coding RNAs, from lung adenocarcinoma patients, a histologic subtype of non-small-cell lung cancer, with deep learning neural networks and other state-of-the-art classification methods. The purpose of our paper is three-fold. First, we compare the classification performance of different versions of deep belief networks with SVMs, decision trees and random forests. Second, we compare the classification capabilities of protein coding and non-coding RNAs. Third, we study the influence of feature selection on the classification performance. RESULTS: As a result, we find that deep belief networks perform at least competitively to other state-of-the-art classifiers. Second, data from non-coding RNAs perform better than coding RNAs across a number of different classification methods. This demonstrates the equivalence of predictive information as captured by non-coding RNAs compared to protein coding RNAs, conventionally used in computational diagnostics tasks. Third, we find that feature selection has in general a negative effect on the classification performance which means that unfiltered data with all features give the best classification results. CONCLUSIONS: Our study is the first to use ncRNAs beyond miRNAs for the computational classification of cancer and for performing a direct comparison of the classification capabilities of protein coding RNAs and non-coding RNAs.


Subjects
Lung Neoplasms/classification, Lung Neoplasms/genetics, Messenger RNA/metabolism, Untranslated RNA/genetics, Computational Biology/methods, Decision Trees, Humans, Lung Neoplasms/pathology, Machine Learning, MicroRNAs/genetics, Computer Neural Networks, Messenger RNA/genetics, RNA Sequence Analysis/methods
16.
Front Psychol ; 10: 2596, 2019.
Article in English | MEDLINE | ID: mdl-31803123

ABSTRACT

Social media data, for instance from Twitter or Facebook, provide a new type of data that consist of a mixture of text, image and video information. From a scientific point of view, the capabilities of this type of data from such microblogs are not well explored and to date it is largely unknown what principal knowledge can be extracted from them. In this paper, we present a discussion of the capabilities of data from microblogs for performing a psychoanalysis. This could allow an analysis of the human personality of individual users. Such prospects raise serious concerns regarding the privacy of users of social media platforms.

17.
PLoS One ; 14(11): e0223745, 2019.
Article in English | MEDLINE | ID: mdl-31725742

ABSTRACT

In this paper, we define novel graph measures for directed networks. The measures are based on graph polynomials utilizing the out- and in-degrees of directed graphs. Based on these polynomials, we define another polynomial and use their positive zeros as graph measures. The measures have meaningful properties that we investigate based on analytical and numerical results. As the computational complexity to compute the measures is polynomial, our approach is efficient and can be applied to large networks. We emphasize that our approach clearly complements the literature in this field as, to the best of our knowledge, existing complexity measures for directed graphs have never been applied on a large scale.


Subjects
Computational Biology/statistics & numerical data, Computer Graphics/statistics & numerical data, Computer Simulation, Game Theory, Mathematical Concepts, Systems Biology/statistics & numerical data
18.
Article in English | MEDLINE | ID: mdl-31656552

ABSTRACT

Binary decision making is a topic of great interest for many fields, including biomedical science, economics, management, politics, medicine, natural science and social science, and much effort has been spent for developing novel computational methods to address problems arising in the aforementioned fields. However, in order to evaluate the effectiveness of any prediction method for binary decision making, the choice of the most appropriate error measures is of paramount importance. Due to the variety of error measures available, the evaluation process of binary decision making can be a complex task. The main objective of this study is to provide a comprehensive survey of error measures for evaluating the outcome of binary decision making applicable to many data-driven fields. This article is categorized under: Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining; Technologies > Prediction; Algorithmic Development > Statistics.
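
As an illustration of the kind of error measures such a survey covers, the following Python sketch derives several standard ones from a confusion matrix. The selection of measures and the function name `binary_error_measures` are assumptions made here; the abstract does not list which measures the article treats.

```python
from math import sqrt


def binary_error_measures(y_true, y_pred):
    """Compute common error measures from 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(numerator, denominator):
        # Undefined ratios (zero denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),       # also called sensitivity
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        # Matthews correlation coefficient
        "mcc": ratio(tp * tn - fp * fn,
                     sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


# Small made-up example: 8 samples, 4 positives and 4 negatives.
example = binary_error_measures([1, 1, 1, 0, 0, 0, 0, 1],
                                [1, 1, 0, 0, 0, 1, 0, 1])
```

Different measures can rank the same classifiers differently, particularly on imbalanced data, which is why the choice among them matters.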

19.
Front Genet ; 10: 557, 2019.
Article in English | MEDLINE | ID: mdl-31258549

ABSTRACT

The LINCS L1000 data repository contains almost two million gene expression profiles for thousands of small molecules and drugs. However, due to the complexity and the size of the data repository and a lack of an interoperable interface, the creation of pharmacologically meaningful workflows utilizing these data is severely hampered. In order to overcome this limitation, we developed the L1000 Viewer, a search engine and graphical web interface for the LINCS data repository. The web interface serves as an interactive platform allowing the user to select different forms of perturbation profiles, e.g., for specific cell lines, drugs, dosages, time points and combinations thereof. At its core, our method has a database we created from inferring and utilizing the intricate dependency graph structure among the data files. The L1000 Viewer is accessible via http://L1000viewer.bio-complexity.com/.

20.
ISA Trans ; 95: 185-193, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31151750

ABSTRACT

Towing is a critical process to deploy a cylindrical drilling platform. However, the towing process faces a great variety of risks, ranging from a complex nautical environment and the dynamics in towing and maneuvering to unexpected events. Therefore, safely navigating the towing system following a planned route to a target sea area is essential. To tackle the time-varying disturbances induced by wind, current and system parametric uncertainties, a path following control method for a towing system of a cylindrical drilling platform is designed based on linear active disturbance rejection control. By utilizing the Maneuvering Modeling Group model as well as a catenary model, we develop a three degree-of-freedom dynamic mathematical model of the towing system under external environmental disturbances and internal uncertainties. Furthermore, we design a linear active disturbance rejection control path following controller for real-time tracking error correction based on a guidance method combining cross-track error and parallax. Finally, the path following performance of the towing system is evaluated in a simulation environment under various disturbances and internal uncertainties, where the corresponding tracking error is analyzed. The results show that the linear active disturbance rejection control performs well under both the external disturbance and inherent uncertainties, and better satisfies the tracking performance criteria than a traditional proportional-integral-derivative controller.
