Support vector data description for finding non-coding RNA gene / Journal of Biomedical Engineering (生物医学工程学杂志)
Article in Chinese | WPRIM (Western Pacific) | ID: wpr-230785
Responsible library: WPRO
ABSTRACT
In computational molecular biology, detecting non-coding RNA genes among large numbers of unlabeled sequences remains a challenging problem. Machine learning and classification methods are generally used to address it. However, only a limited number of positive training samples, together with unlabeled samples, are available. Negative samples are difficult to define appropriately, yet they are required by the usual learn-then-classify approach. Most existing non-coding RNA gene-finding methods generate random sequences as negative samples, but these may share some characteristics of the positive sequences. This introduces an artificial source of uncertainty, and the performance of such methods suffers. In this paper, Support Vector Data Description (SVDD) is used for learning and classification to detect non-coding RNA genes among large numbers of unlabeled sequences, and the k-means clustering algorithm is applied before SVDD training to reduce the high false positive rate of SVDD. The training samples (target samples) are experimentally validated non-coding RNA genes. Appropriate features are constructed by Principal Component Analysis (PCA). The effectiveness and performance of the method are demonstrated by tests on cases from the NONCODE database and the E. coli genome.
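The idea of clustering the target samples before fitting a one-class description can be illustrated with a small sketch. It is not the authors' implementation. It assumes that one description is fitted per k-means cluster, which the abstract does not spell out, and it substitutes the simplest special case of SVDD: with a linear kernel and no slack variables, SVDD reduces to the minimum enclosing ball, approximated here with the Badoiu-Clarkson iteration. The 2-D toy points stand in for PCA-derived sequence features, and all function names (`kmeans`, `enclosing_ball`, `fit`, `is_target`) are hypothetical.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return [g for g in groups if g]

def enclosing_ball(points, iters=200):
    """Approximate minimum enclosing ball (hard-margin, linear-kernel SVDD)."""
    c = list(points[0])
    for i in range(1, iters + 1):
        far = max(points, key=lambda p: math.dist(p, c))
        # Badoiu-Clarkson step: move the centre a shrinking fraction toward the farthest point.
        c = [ci + (fi - ci) / (i + 1) for ci, fi in zip(c, far)]
    # Radius is chosen so that every training point lies inside the ball.
    radius = max(math.dist(p, c) for p in points)
    return tuple(c), radius

def fit(targets, k):
    """Assumed scheme: one ball per k-means cluster of the target samples."""
    return [enclosing_ball(g) for g in kmeans(targets, k)]

def is_target(x, balls):
    """Accept x if it falls inside any of the fitted balls."""
    return any(math.dist(x, c) <= r for c, r in balls)

# Toy target set: two well-separated groups of positive samples (illustrative data only).
group_a = [(float(x), float(y)) for x in (-1, 0, 1) for y in (-1, 0, 1)]
group_b = [(x + 10.0, y + 10.0) for x, y in group_a]
targets = group_a + group_b

single = fit(targets, k=1)     # one description for all targets
clustered = fit(targets, k=2)  # one description per cluster

probe = (5.0, 5.0)  # lies between the two groups, resembles neither
print(is_target(probe, single), is_target(probe, clustered))
```

In this toy setting the single ball has to span both groups and therefore also covers the empty region between them, while the two per-cluster balls do not. That is one plausible reading of how clustering before training could lower the false positive rate. A kernelised SVDD with slack variables, as in the original formulation, would give a more flexible boundary than the balls used here.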
Subject(s)
Full text: Available
Health context: Neglected Diseases
Health problem: Neglected Diseases / Zoonoses
Database: WPRIM (Western Pacific)
Main subject: Algorithms / Pattern Recognition, Automated / Cluster Analysis / RNA, Untranslated / Escherichia coli / Support Vector Machine / Genetics / Methods
Limits: Humans
Language: Chinese
Journal: Journal of Biomedical Engineering
Year: 2010
Document type: Article
...