Search | VHL Regional Portal

Probabilistic Contextual and Structural Dependencies Learning in Grammar-Based Genetic Programming.

Wong, Pak-Kan; Wong, Man-Leung; Leung, Kwong-Sak.

Evol Comput ; 29(2): 239-268, 2021 Jun 01.

Article in English | MEDLINE | ID: mdl-33047611

ABSTRACT

Genetic Programming is a method to automatically create computer programs based on the principles of evolution. The problem of deceptiveness caused by complex dependencies among components of programs is challenging. It is important because it can misguide Genetic Programming into creating suboptimal programs. Moreover, a minor modification to a program may lead to a notable change in its behaviour and affect the final outputs. This article presents Grammar-Based Genetic Programming with Bayesian Classifiers (GBGPBC), in which the probabilistic dependencies among components of programs are captured using a set of Bayesian network classifiers. Our system was evaluated using a set of benchmark problems (the deceptive maximum problems, the royal tree problems, and the bipolar asymmetric royal tree problems). It was shown to be often more robust and more efficient in searching for the best programs than other related Genetic Programming approaches in terms of the total number of fitness evaluations. We studied what factors affect the performance of GBGPBC and discovered that robust variants of GBGPBC were consistently weakly correlated with some complexity measures. Furthermore, our approach has been applied to learn a ranking program on a set of customers in direct marketing. Our suggested solutions help companies to earn significantly more when compared with other solutions produced by several well-known machine learning algorithms, such as neural networks, logistic regression, and Bayesian networks.

Subject(s)

Algorithms , Neural Networks, Computer , Bayes Theorem , Machine Learning , Software
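
As a rough illustration of the grammar-based idea in the abstract above, the following sketch samples a program from a small grammar whose production rules carry probabilities. The grammar, the rule probabilities, and the names `GRAMMAR` and `derive` are all hypothetical. The sketch treats each rule choice independently, which is a simplification: according to the abstract, GBGPBC captures dependencies among program components with Bayesian network classifiers, and that is not modelled here.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to (right-hand side, probability) pairs.
GRAMMAR = {
    "Expr": [
        (["Expr", "+", "Expr"], 0.3),
        (["Expr", "*", "Expr"], 0.2),
        (["x"], 0.3),
        (["1"], 0.2),
    ],
}


def derive(symbol, rng, depth=0, max_depth=4):
    """Expand `symbol` into a list of terminal tokens by sampling rules."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal symbol: emit as-is
    rules = GRAMMAR[symbol]
    if depth >= max_depth:
        # Force termination: keep only rules whose right-hand side is all terminals.
        rules = [(rhs, p) for rhs, p in rules if all(s not in GRAMMAR for s in rhs)]
    rhs = rng.choices([r for r, _ in rules], weights=[p for _, p in rules])[0]
    tokens = []
    for s in rhs:
        tokens.extend(derive(s, rng, depth + 1, max_depth))
    return tokens


tokens = derive("Expr", random.Random(1))
program = " ".join(tokens)
```

In a probabilistic model-building loop, the rule probabilities would be re-estimated from the fitter programs of each generation; here they are fixed constants for illustration only.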

High-order dynamic Bayesian Network learning with hidden common causes for causal gene regulatory network.

Lo, Leung-Yau; Wong, Man-Leung; Lee, Kin-Hong; Leung, Kwong-Sak.

BMC Bioinformatics ; 16: 395, 2015 Nov 25.

Article in English | MEDLINE | ID: mdl-26608050

ABSTRACT

BACKGROUND: Inferring the gene regulatory network (GRN) has been an important topic in Bioinformatics. Many computational methods infer the GRN from high-throughput expression data. Due to the presence of time delays in the regulatory relationships, the High-Order Dynamic Bayesian Network (HO-DBN) is a good model of the GRN. However, previous GRN inference methods assume causal sufficiency, i.e. no unobserved common cause. This assumption is convenient but unrealistic, because it is possible that relevant factors have not even been conceived of and are therefore unmeasured. An inference method that also handles hidden common cause(s) is therefore highly desirable. Also, previous methods for discovering hidden common causes either do not handle multi-step time delays or require that the parents of hidden common causes not be observed genes. RESULTS: We have developed a discrete HO-DBN learning algorithm that can also infer hidden common cause(s) from discrete time series expression data, with some assumptions on the conditional distribution, but which is less restrictive than previous methods. We assume that each hidden variable has only observed variables as children and parents, with at least two children and possibly no parents. We also make the simplifying assumption that children of hidden variable(s) are not linked to each other. Moreover, our proposed algorithm can also utilize multiple short time series (not necessarily of the same length), as long time series are difficult to obtain. CONCLUSIONS: We have performed extensive experiments using synthetic data on GRNs of size up to 100, with up to 10 hidden nodes. Experimental results show that our proposed algorithm can recover the causal GRNs adequately given the incomplete data. Using the limited real expression data and small subnetworks of the YEASTRACT network, we have also demonstrated the potential of our algorithm on real data, though more time series expression data is needed.

Subject(s)

Algorithms , Bayes Theorem , Computational Biology/methods , Gene Expression Profiling , Gene Regulatory Networks , Gene Expression Regulation , Humans

Time Delayed Causal Gene Regulatory Network Inference with Hidden Common Causes.

Lo, Leung-Yau; Wong, Man-Leung; Lee, Kin-Hong; Leung, Kwong-Sak.

PLoS One ; 10(9): e0138596, 2015.

Article in English | MEDLINE | ID: mdl-26394325

ABSTRACT

Inferring the gene regulatory network (GRN) is crucial to understanding the working of the cell. Many computational methods attempt to infer the GRN from time series expression data, instead of through expensive and time-consuming experiments. However, existing methods make the convenient but unrealistic assumption of causal sufficiency, i.e. all the relevant factors in the causal network have been observed and there is no unobserved common cause. In principle, in the real world, it is impossible to be certain that all relevant factors or common causes have been observed, because some factors may not have been conceived of, and therefore are impossible to measure. In view of this, we have developed a novel algorithm named HCC-CLINDE to infer a GRN from time series data allowing the presence of hidden common cause(s). We assume there is a sparse causal graph (possibly with cycles) of interest, where the variables are continuous and each causal link has a delay (possibly more than one time step). A small but unknown number of variables are not observed. Each unobserved variable has only observed variables as children and parents, with at least two children, and the children are not linked to each other. Since it is difficult to obtain very long time series, our algorithm is also capable of utilizing multiple short time series, which is more realistic. To our knowledge, our algorithm is far less restrictive than previous works. We have performed extensive experiments using synthetic data on GRNs of size up to 100, with up to 10 hidden nodes. The results show that our algorithm can adequately recover the true causal GRN and is robust to slight deviations from a Gaussian distribution in the error terms. We have also demonstrated the potential of our algorithm on small YEASTRACT subnetworks using limited real data.

Subject(s)

Algorithms , Computational Biology/methods , Gene Regulatory Networks , Models, Genetic , Databases, Genetic/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation , Kinetics , Reproducibility of Results , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Time Factors
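
To make the notion of a causal link with a multi-step delay concrete, the sketch below builds a synthetic parent series, derives a child series that follows it two time steps later, and picks the delay with the strongest lagged Pearson correlation. This is not HCC-CLINDE: it ignores hidden common causes, sparsity, and significance testing, and the names `pearson`, `best_delay`, `true_delay`, and the coefficient 0.8 are assumptions made only for illustration.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def best_delay(parent, child, max_delay):
    """Return the delay d that maximizes |corr(parent[t - d], child[t])|."""
    scores = {}
    for d in range(1, max_delay + 1):
        # Pair child[t] with parent[t - d] for every t >= d.
        scores[d] = abs(pearson(parent[:-d], child[d:]))
    return max(scores, key=scores.get), scores


# Synthetic data: the child copies the parent (scaled) two steps later.
rng = random.Random(0)
parent = [rng.gauss(0.0, 1.0) for _ in range(200)]
true_delay = 2
child = [0.0] * true_delay + [0.8 * parent[t - true_delay] for t in range(true_delay, 200)]

delay, scores = best_delay(parent, child, max_delay=5)
```

With a hidden common cause, two observed children can show a strong lagged correlation even though neither regulates the other, which is the situation the abstract says methods assuming causal sufficiency cannot handle.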

Scalable model-based clustering for large databases based on data summarization.

Jin, Huidong; Wong, Man-Leung; Leung, K S.

IEEE Trans Pattern Anal Mach Intell ; 27(11): 1710-9, 2005 Nov.

Article in English | MEDLINE | ID: mdl-16285371

ABSTRACT

The scalability problem in data mining involves the development of methods for handling large databases with limited computational resources such as memory and computation time. In this paper, two scalable clustering algorithms, bEMADS and gEMADS, are presented based on the Gaussian mixture model. Both summarize data into subclusters and then generate Gaussian mixtures from their data summaries. Their core algorithm, EMADS, is defined on data summaries and approximates the aggregate behavior of each subcluster of data under the Gaussian mixture model. EMADS is provably convergent. Experimental results substantiate that both algorithms can run several orders of magnitude faster than expectation-maximization with little loss of accuracy.

Subject(s)

Algorithms , Artificial Intelligence , Cluster Analysis , Databases, Factual , Information Storage and Retrieval/methods , Models, Statistical , Pattern Recognition, Automated/methods , Computer Simulation , Data Interpretation, Statistical
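
The abstract above describes an algorithm that works on data summaries of subclusters in place of individual points. The sketch below shows why that can be cheap, for the simplest possible case: under a single one-dimensional Gaussian, the log-likelihood of a whole subcluster depends only on its count, sum, and sum of squares. This is not EMADS itself, which handles mixtures and approximates subcluster behaviour; the `Summary` class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Summary:
    """Sufficient statistics of a 1-D subcluster (hypothetical layout)."""
    n: int      # number of points
    s: float    # sum of values
    ss: float   # sum of squared values


def summarize(points):
    return Summary(len(points), sum(points), sum(x * x for x in points))


def log_likelihood_from_summary(summary, mean, var):
    """Gaussian log-likelihood of all summarized points, using only the summary."""
    # sum_i (x_i - mean)^2 expanded in terms of the stored statistics
    sq_dev = summary.ss - 2.0 * mean * summary.s + summary.n * mean * mean
    return -0.5 * summary.n * math.log(2.0 * math.pi * var) - sq_dev / (2.0 * var)


def log_likelihood_pointwise(points, mean, var):
    """Reference computation that visits every point."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)
        for x in points
    )


points = [1.0, 1.5, 2.0, 2.5]
summary = summarize(points)
from_summary = log_likelihood_from_summary(summary, mean=1.8, var=0.5)
pointwise = log_likelihood_pointwise(points, mean=1.8, var=0.5)
```

The two values agree up to floating-point error, while the summary-based version costs the same no matter how many points the subcluster holds.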

An efficient self-organizing map designed by genetic algorithms for the traveling salesman problem.

Jin, Hui-Dong; Leung, Kwong-Sak; Wong, Man-Leung; Xu, Z B.

IEEE Trans Syst Man Cybern B Cybern ; 33(6): 877-88, 2003.

Article in English | MEDLINE | ID: mdl-18238240

ABSTRACT

As a typical combinatorial optimization problem, the traveling salesman problem (TSP) has attracted extensive research interest. In this paper, we develop a self-organizing map (SOM) with a novel learning rule. It is called the integrated SOM (ISOM) since its learning rule integrates the three learning mechanisms in the SOM literature. Within a single learning step, the excited neuron is first dragged toward the input city, then pushed to the convex hull of the TSP, and finally drawn toward the middle point of its two neighboring neurons. A genetic algorithm is successfully specified to determine the elaborate coordination among the three learning mechanisms as well as the suitable parameter setting. The evolved ISOM (eISOM) is examined on three sets of TSPs to demonstrate its power and efficiency. The computational complexity of the eISOM is quadratic, which is comparable to other SOM-like neural networks. Moreover, the eISOM can generate more accurate solutions than several typical approaches for the TSP, including the SOM developed by Budinich, the expanding SOM, the convex elastic net, and the FLEXMAP algorithm. Though its solution accuracy is not yet comparable to some sophisticated heuristics, the eISOM is one of the most accurate neural networks for the TSP.
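
The three-part learning step described in the abstract above can be sketched as follows. The abstract gives no formulas, so everything numeric here is an assumption: the step sizes `alpha`, `beta`, and `gamma` are hypothetical, and the "push to the convex hull" is stood in for by a small expansion away from the center of the cities. In the paper the coordination of the three mechanisms is evolved by a genetic algorithm, which this sketch does not attempt.

```python
def isom_step(neuron, city, left, right, center, alpha=0.5, beta=0.05, gamma=0.1):
    """One illustrative update of the excited neuron; all points are (x, y) tuples.

    alpha, beta, gamma are hypothetical step sizes, not values from the paper.
    """
    # 1. Drag the neuron toward the input city.
    x = neuron[0] + alpha * (city[0] - neuron[0])
    y = neuron[1] + alpha * (city[1] - neuron[1])

    # 2. Push it outward, away from the center of the cities
    #    (an assumed stand-in for "pushed to the convex hull").
    x = center[0] + (1.0 + beta) * (x - center[0])
    y = center[1] + (1.0 + beta) * (y - center[1])

    # 3. Draw it toward the midpoint of its two neighbors on the ring.
    mid_x = (left[0] + right[0]) / 2.0
    mid_y = (left[1] + right[1]) / 2.0
    x += gamma * (mid_x - x)
    y += gamma * (mid_y - y)
    return (x, y)


moved = isom_step(
    neuron=(0.0, 0.0),
    city=(1.0, 1.0),
    left=(0.0, 4.0),
    right=(2.0, 0.0),
    center=(0.5, 0.5),
)
```

Setting two of the three step sizes to zero isolates a single mechanism, which is a convenient way to check each part of the update separately.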

Learning nonlinear multiregression networks based on evolutionary computation.

Leung, Kwong-Sak; Wong, Man-Leung; Lam, Wai; Wang, Zhenyuan; Xu, Kebin.

IEEE Trans Syst Man Cybern B Cybern ; 32(5): 630-44, 2002.

Article in English | MEDLINE | ID: mdl-18244867

ABSTRACT

This paper describes a novel knowledge discovery and data mining framework dealing with nonlinear interactions among domain attributes. Our network-based model provides an effective and efficient reasoning procedure to perform prediction and decision making. Unlike many existing paradigms based on linear models, the attribute relationship in our framework is represented by nonlinear nonnegative multiregressions based on the Choquet integral. This kind of multiregression is able to model a rich set of nonlinear interactions directly. Our framework involves two layers. The outer layer is a network structure consisting of network elements as its components, while the inner layer is concerned with a particular network element modeled by Choquet integrals. We develop a fast double optimization algorithm (FDOA) for learning the multiregression coefficients of a single network element. Using this local learning component and multiregression-residual-cost evolutionary programming (MRCEP), we propose a global learning algorithm, called MRCEP-FDOA, for discovering the network structures and their elements from databases. We have conducted a series of experiments to assess the effectiveness of our algorithm and investigate the performance under different parameter combinations, as well as sizes of the training data sets. The empirical results demonstrate that our framework can successfully discover the target network structure and the regression coefficients.
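
For readers unfamiliar with the Choquet integral mentioned in the abstract above, the sketch below computes the standard discrete form: sort the attribute values in ascending order and weight each increment by the set function evaluated on the attributes that are at least that large. It is a textbook definition, not the paper's multiregression model or its FDOA learning procedure; the function names and the example weights are hypothetical.

```python
from itertools import combinations


def choquet_integral(values, mu):
    """Discrete Choquet integral.

    values: dict mapping attribute name -> nonnegative number
    mu:     dict mapping frozenset of attribute names -> nonnegative weight
    """
    order = sorted(values, key=values.get)  # attribute names, ascending by value
    total = 0.0
    previous = 0.0
    for i, name in enumerate(order):
        upper_set = frozenset(order[i:])  # attributes whose value is >= this one
        total += (values[name] - previous) * mu[upper_set]
        previous = values[name]
    return total


def additive_measure(weights):
    """Build an additive set function from per-attribute weights."""
    names = list(weights)
    return {
        frozenset(subset): sum(weights[n] for n in subset)
        for r in range(1, len(names) + 1)
        for subset in combinations(names, r)
    }


values = {"a": 1.0, "b": 2.0, "c": 3.0}
mu = additive_measure({"a": 0.2, "b": 0.3, "c": 0.5})
linear_result = choquet_integral(values, mu)

# Make "b" and "c" interact: together they weigh more than the sum of their parts.
mu_interacting = dict(mu)
mu_interacting[frozenset({"b", "c"})] = 1.0
nonlinear_result = choquet_integral(values, mu_interacting)
```

With an additive set function the integral reduces to an ordinary weighted sum, so the difference between the two results comes entirely from the interaction term, which is the kind of nonlinearity a linear model cannot express.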
