Search | VHL Regional Portal

Results 1 - 10 of 10

Recent Progress in the Discovery and Design of Antimicrobial Peptides Using Traditional Machine Learning and Deep Learning.

Yan, Jielu; Cai, Jianxiu; Zhang, Bob; Wang, Yapeng; Wong, Derek F; Siu, Shirley W I.

Antibiotics (Basel) ; 11(10), 2022 Oct 21.

Article in English | MEDLINE | ID: mdl-36290108

ABSTRACT

Antimicrobial resistance has become a critical global health problem due to the abuse of conventional antibiotics and the rise of multi-drug-resistant microbes. Antimicrobial peptides (AMPs) are a group of natural peptides that show promise as next-generation antibiotics due to their low toxicity to the host, their broad spectrum of biological activity (antibacterial, antifungal, antiviral, and anti-parasitic), and their wider therapeutic potential, such as anticancer and anti-inflammatory effects. Most importantly, AMPs kill bacteria by damaging cell membranes through multiple mechanisms of action rather than by targeting a single molecule or pathway, which makes it difficult for bacterial drug resistance to develop. However, experimental approaches used to discover and design new AMPs are very expensive and time-consuming. In recent years, there has been considerable interest in applying in silico methods, including traditional machine learning (ML) and deep learning (DL) approaches, to drug discovery. While a few papers summarize computational AMP prediction methods, none of them focuses on DL methods. In this review, we aim to survey the latest AMP prediction methods based on DL approaches. First, the biological background of AMPs is introduced; then, various feature encoding methods used to represent peptide sequences are presented. We explain the most popular DL techniques and highlight recent works that use them to classify AMPs and design novel peptide sequences. Finally, we discuss the limitations and challenges of AMP prediction.

Time series for blind biosignal classification model.

Wong, Derek F; Chao, Lidia S; Zeng, Xiaodong; Vai, Mang-I; Lam, Heng-Leong.

Comput Biol Med ; 54: 32-6, 2014 Nov.

Article in English | MEDLINE | ID: mdl-25199847

ABSTRACT

Biosignals such as electrocardiograms (ECG), electroencephalograms (EEG), and electromyograms (EMG) are important noninvasive measurements useful for making diagnostic decisions. Recently, considerable research has been conducted in order to potentially automate signal classification for assisting in disease diagnosis. However, the biosignal type (ECG, EEG, EMG or other) needs to be known prior to the classification process. If the given biosignal is of an unknown type, none of the existing methodologies can be utilized. In this paper, a blind biosignal classification model (B²SC Model) is proposed in order to identify the source biosignal type automatically, and thus ultimately benefit the diagnostic decision. The approach employs time series algorithms for constructing the model. It uses a dynamic time warping (DTW) algorithm with clustering to discover the similarity between two biosignals, and consequently classifies disease without prior knowledge of the source signal type. The empirical experiments presented in this paper demonstrate the effectiveness of the method as well as the scalability of the approach.

Subject(s)

Algorithms , Artificial Intelligence , Diagnosis, Computer-Assisted/methods , Electrodiagnosis/methods , Models, Statistical , Pattern Recognition, Automated/methods , Signal Processing, Computer-Assisted , Animals , Computer Simulation , Data Interpretation, Statistical , Humans
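The abstract above names dynamic time warping (DTW) as the similarity measure between two biosignals. As a rough illustration only, the textbook DTW recurrence can be sketched as below; the function name `dtw_distance`, the absolute-difference local cost, and the absence of any warping window are assumptions made here, and the sketch does not reproduce the clustering step or any other detail of the B²SC model.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1-D sequences.

    Assumption: local cost is the absolute difference between samples.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


if __name__ == "__main__":
    # A time-stretched copy of the same shape aligns at zero cost
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the warping path may repeat samples, two signals with the same shape but different timing score as close, which is the property that makes DTW attractive for comparing recordings of unequal length.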

Unsupervised quality estimation model for English to German translation and its application in extensive supervised evaluation.

Han, Aaron L-F; Wong, Derek F; Chao, Lidia S; He, Liangye; Lu, Yi.

ScientificWorldJournal ; 2014: 760301, 2014.

Article in English | MEDLINE | ID: mdl-24892086

ABSTRACT

With the rapid development of machine translation (MT), MT evaluation has become very important for telling us in a timely way whether an MT system is making progress. Conventional MT evaluation methods tend to calculate the similarity between hypothesis translations produced by automatic translation systems and reference translations produced by professional translators. Existing evaluation metrics have several weaknesses. Firstly, the factors they are built on are not comprehensive, which results in a language-bias problem: they perform well on some language pairs but poorly on others. Secondly, they tend to use either no linguistic features or too many; using none draws a lot of criticism from linguists, while using too many weakens the model's repeatability. Thirdly, the reference translations they rely on are very expensive and sometimes not available in practice. In this paper, the authors propose an unsupervised MT evaluation metric that uses a universal part-of-speech tagset and does not rely on reference translations. The authors also explore the performance of the designed metric on traditional supervised evaluation tasks. Both the supervised and unsupervised experiments show that the designed methods yield higher correlation scores with human judgments.

Subject(s)

Models, Theoretical , Translating , England , Germany

iSentenizer-µ: multilingual sentence boundary detection model.

Wong, Derek F; Chao, Lidia S; Zeng, Xiaodong.

ScientificWorldJournal ; 2014: 196574, 2014.

Article in English | MEDLINE | ID: mdl-24883358

ABSTRACT

A sentence boundary detection (SBD) system is normally quite sensitive to the genres of the data it is trained on. Genre here often refers to shifts in text topic and to new language domains. Although new detection models can be retrained for different languages or new text genres, the previous model has to be thrown away and the creation process restarted from scratch. In this paper, we present a multilingual sentence boundary detection system (iSentenizer-µ) for Danish, German, English, Spanish, Dutch, French, Italian, Portuguese, Greek, Finnish, and Swedish. The proposed system is able to detect the sentence boundaries of a mixture of different text genres and languages with high accuracy. We employ the i+Learning algorithm, an incremental tree learning architecture, to construct the system. Under the incremental learning framework, iSentenizer-µ adapts to text of different topics and Roman-alphabet languages by merging new data into the existing model, learning the new knowledge incrementally by revision instead of retraining. The system has been extensively evaluated on different languages and text genres and compared against two state-of-the-art SBD systems, Punkt and MaxEnt. The experimental results show that the proposed system outperforms the other systems on all datasets.

Subject(s)

Natural Language Processing , Decision Trees , Language , Linguistics , Models, Theoretical , Translating

A relationship: word alignment, phrase table, and translation quality.

Tian, Liang; Wong, Derek F; Chao, Lidia S; Oliveira, Francisco.

ScientificWorldJournal ; 2014: 438106, 2014.

Article in English | MEDLINE | ID: mdl-24883402

ABSTRACT

In recent years, researchers have conducted several studies to evaluate machine translation quality based on the relationship between word alignments and the phrase table. However, existing methods usually employ ad hoc heuristics without theoretical support. So far, no work has provided a formula describing the relationship among word alignments, the phrase table, and machine translation performance. In this paper, on the one hand, we focus on formulating such a relationship for estimating the number of extracted phrase pairs given one or more word alignment points. On the other hand, a corpus-motivated pruning technique is proposed to prune the default large phrase table. Experiments show that the deduced formula is feasible: it can be used not only to predict the size of the phrase table but also as a valuable reference for investigating the relationship between translation performance and phrase tables based on different links of word alignment. The corpus-motivated pruning results show that nearly 98% of phrases can be removed without any significant loss in translation quality.

Subject(s)

Natural Language Processing , Translating , Language , Linguistics/methods

Chinese unknown word recognition for PCFG-LA parsing.

Huang, Qiuping; He, Liangye; Wong, Derek F; Chao, Lidia S.

ScientificWorldJournal ; 2014: 959328, 2014.

Article in English | MEDLINE | ID: mdl-24895681

ABSTRACT

This paper investigates the recognition of unknown words in Chinese parsing. Two methods are proposed to handle this problem. One is the modification of a character-based model. We model the emission probability of an unknown word using the first and last characters in the word. It aims to reduce the POS tag ambiguities of unknown words to improve the parsing performance. In addition, a novel method, using graph-based semisupervised learning (SSL), is proposed to improve the syntax parsing of unknown words. Its goal is to discover additional lexical knowledge from a large amount of unlabeled data to help the syntax parsing. The method is mainly to propagate lexical emission probabilities to unknown words by building the similarity graphs over the words of labeled and unlabeled data. The derived distributions are incorporated into the parsing process. The proposed methods are effective in dealing with the unknown words to improve the parsing. Empirical results for Penn Chinese Treebank and TCT Treebank revealed its effectiveness.

Subject(s)

Vocabulary , Asian People , Humans , Language , Recognition, Psychology

A systematic comparison of data selection criteria for SMT domain adaptation.

Wang, Longyue; Wong, Derek F; Chao, Lidia S; Lu, Yi; Xing, Junwen.

ScientificWorldJournal ; 2014: 745485, 2014.

Article in English | MEDLINE | ID: mdl-24683356

ABSTRACT

Data selection has shown significant improvements in the effective use of training data by extracting sentences from large general-domain corpora to adapt statistical machine translation (SMT) systems to in-domain data. This paper performs an in-depth analysis of three different sentence selection techniques. The first is cosine tf-idf, which comes from the realm of information retrieval (IR). The second is a perplexity-based approach, which comes from the field of language modeling. These two data selection techniques have already been applied to SMT in the literature. Edit distance, however, is proposed for this task for the first time in this paper. After investigating the individual models, a combination of all three techniques is proposed at both the corpus level and the model level. Comparative experiments are conducted on a Hong Kong law Chinese-English corpus, and the results indicate the following: (i) the constraint degree of similarity measuring is not monotonically related to domain-specific translation quality; (ii) the individual selection models fail to perform effectively and robustly; but (iii) bilingual resources and combination methods help to balance out-of-vocabulary (OOV) and irrelevant data; and (iv) our method consistently boosts the overall translation performance, which can ensure optimal quality of a real-life SMT system.

Subject(s)

Artificial Intelligence , Models, Theoretical
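To make the edit-distance criterion mentioned in the abstract above more concrete, here is a minimal sketch of how general-domain sentences might be ranked by their word-level Levenshtein distance to in-domain sentences. The helper names `edit_distance` and `select_sentences`, the length normalization, and the "distance to the closest in-domain sentence" scoring rule are all illustrative assumptions; the abstract does not specify how the paper scores or thresholds candidates.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def select_sentences(in_domain, general, k):
    """Keep the k general-domain sentences closest to any in-domain sentence.

    Assumption: the score is the word-level distance to the nearest in-domain
    sentence, normalized by the longer sentence length.
    """
    refs = [ref.split() for ref in in_domain]

    def score(sentence):
        toks = sentence.split()
        return min(edit_distance(toks, ref) / max(len(toks), len(ref), 1)
                   for ref in refs)

    return sorted(general, key=score)[:k]


if __name__ == "__main__":
    in_domain = ["the court shall decide"]
    general = ["i like green tea", "the court may decide"]
    print(select_sentences(in_domain, general, k=1))
```

Comparing every candidate with every in-domain sentence is quadratic in corpus size, so a real pipeline would likely need pre-filtering or indexing; the sketch ignores that for clarity.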

Unsupervised chunking based on graph propagation from bilingual corpus.

Zhu, Ling; Wong, Derek F; Chao, Lidia S.

ScientificWorldJournal ; 2014: 401943, 2014.

Article in English | MEDLINE | ID: mdl-24772017

ABSTRACT

This paper presents a novel approach to training an unsupervised shallow parsing model on the unannotated Chinese text of a parallel Chinese-English corpus. In this approach, no information of the Chinese side is applied. The exploitation of graph-based label propagation for bilingual knowledge transfer, along with the use of the projected labels as features in the unsupervised model, contributes to better performance. Experimental comparisons with state-of-the-art algorithms show that the proposed approach achieves impressively higher accuracy in terms of F-score.

Subject(s)

Artificial Intelligence , Language , Models, Theoretical , Algorithms

Constructing better classifier ensemble based on weighted accuracy and diversity measure.

Zeng, Xiaodong; Wong, Derek F; Chao, Lidia S.

ScientificWorldJournal ; 2014: 961747, 2014.

Article in English | MEDLINE | ID: mdl-24672402

ABSTRACT

A weighted accuracy and diversity (WAD) method is presented, a novel measure used to evaluate the quality of the classifier ensemble, assisting in the ensemble selection task. The proposed measure is motivated by a commonly accepted hypothesis; that is, a robust classifier ensemble should not only be accurate but also different from every other member. In fact, accuracy and diversity are mutual restraint factors; that is, an ensemble with high accuracy may have low diversity, and an overly diverse ensemble may negatively affect accuracy. This study proposes a method to find the balance between accuracy and diversity that enhances the predictive ability of an ensemble for unknown data. The quality assessment for an ensemble is performed such that the final score is achieved by computing the harmonic mean of accuracy and diversity, where two weight parameters are used to balance them. The measure is compared to two representative measures, Kappa-Error and GenDiv, and two threshold measures that consider only accuracy or diversity, with two heuristic search algorithms, genetic algorithm, and forward hill-climbing algorithm, in ensemble selection tasks performed on 15 UCI benchmark datasets. The empirical results demonstrate that the WAD measure is superior to others in most cases.

Subject(s)

Models, Theoretical , Algorithms , Humans
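The WAD abstract above describes the final score as a harmonic mean of accuracy and diversity balanced by two weight parameters. One common way to write a weighted harmonic mean is sketched below; the function name, the default weights, and the exact placement of the weights are assumptions for illustration, and the formula in the paper itself may differ.

```python
def weighted_harmonic_mean(accuracy, diversity, w_acc=1.0, w_div=1.0):
    """Weighted harmonic mean of two scores in (0, 1].

    Assumption: the weights enter as (w_acc + w_div) / (w_acc/acc + w_div/div),
    the standard weighted harmonic mean; this is not taken from the paper.
    """
    if accuracy <= 0 or diversity <= 0:
        # The harmonic mean collapses to zero when either component is zero
        return 0.0
    return (w_acc + w_div) / (w_acc / accuracy + w_div / diversity)


if __name__ == "__main__":
    # Equal weights: the score is pulled toward the weaker component
    print(weighted_harmonic_mean(0.9, 0.3))
    # Weighting accuracy more heavily raises the score of the same ensemble
    print(weighted_harmonic_mean(0.9, 0.3, w_acc=3.0, w_div=1.0))
```

A harmonic mean penalizes an ensemble that is strong on one criterion but weak on the other, which matches the abstract's point that accuracy and diversity restrain each other.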

A supportive attribute-assisted discretization model for medical classification.

Wong, Derek F; Chao, Lidia S; Zeng, Xiao Dong.

Biomed Mater Eng ; 24(1): 289-95, 2014.

Article in English | MEDLINE | ID: mdl-24211909

ABSTRACT

Discretization of a continuous-valued symptom (attribute) in a medical data set is a crucial preprocessing step for the medical classification task. This paper proposes a supportive attribute-assisted discretization (SAAD) model for medical diagnostic problems. The intent of this approach is to discover the best supportive symptom that correlates closely with the continuous-valued symptom being discretized and to conduct the discretization process using the significant supportive information provided by that symptom. We hypothesize that a good discretization scheme should rely heavily on the interaction between a continuous-valued attribute and both its supportive attribute and the class attribute. SAAD can consider each continuous-valued symptom differently and intelligently, which allows it to minimize information loss and data uncertainty. Hence, SAAD results in higher classification accuracy. Empirical experiments using ten real-life datasets from the UCI repository were conducted to compare the classification accuracy achieved by several well-known classifiers with SAAD and with other state-of-the-art discretization approaches. The experimental results demonstrate the effectiveness and usefulness of the proposed approach in enhancing diagnostic accuracy.

Subject(s)

Computational Biology/methods , Disease/classification , Software , Algorithms , Bayes Theorem , Data Mining , Databases, Factual , Diagnosis, Computer-Assisted , Humans , Models, Theoretical , Reproducibility of Results
