Search | VHL Regional Portal

1.

Encoding histopathology whole slide images with location-aware graphs for diagnostically relevant regions retrieval.

Zheng, Yushan; Jiang, Zhiguo; Shi, Jun; Xie, Fengying; Zhang, Haopeng; Luo, Wei; Hu, Dingyi; Sun, Shujiao; Jiang, Zhongmin; Xue, Chenghai.

Med Image Anal ; 76: 102308, 2022 02.

Article in English | MEDLINE | ID: mdl-34856455

ABSTRACT

Content-based histopathological image retrieval (CBHIR) has become popular in recent years in histopathological image analysis. CBHIR systems provide auxiliary diagnostic information for pathologists by searching a pre-established database and returning regions whose content is similar to the region of interest (ROI). Retrieving diagnostically relevant regions from a database of histopathological whole slide images (WSIs) is challenging yet clinically significant. In this paper, we propose a novel framework for region retrieval from a WSI database based on location-aware graphs and deep hash techniques. Compared to existing CBHIR frameworks, both the structural information and the global location information of ROIs in the WSI are preserved by graph convolution and self-attention operations, which makes the retrieval framework more sensitive to regions that are similar in tissue distribution. Moreover, benefiting from the graph structure, the proposed framework scales well to variation in both the size and shape of ROIs, which allows the pathologist to define query regions using free curves according to the appearance of tissue. Finally, retrieval is based on hashing, which makes the framework efficient and adequate for practical large-scale WSI databases. The proposed method was evaluated on an in-house endometrium dataset with 2650 WSIs and the public ACDC-LungHP dataset. The experimental results demonstrate that the proposed method achieved a mean average precision above 0.667 on the endometrium dataset and above 0.869 on the ACDC-LungHP dataset in the task of irregular region retrieval, which is superior to the state-of-the-art methods. The average retrieval time from a database containing 1855 WSIs is 0.752 ms. The source code is available at https://github.com/zhengyushan/lagenet.
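The hash-based retrieval step mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes each region has already been encoded as a fixed-length binary code stored as an integer. The codes, region names, and the `hamming_rank` helper are hypothetical and are not taken from the authors' implementation.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two binary hash codes."""
    return bin(a ^ b).count("1")


def hamming_rank(query: int, database: dict, top_k: int = 3) -> list:
    """Return the top_k (name, distance) pairs closest to the query code."""
    scored = [(name, hamming_distance(query, code)) for name, code in database.items()]
    # Sort by distance, then by name so that ties are resolved deterministically.
    scored.sort(key=lambda item: (item[1], item[0]))
    return scored[:top_k]


# Hypothetical 8-bit codes, for illustration only.
database = {
    "region_a": 0b10110010,
    "region_b": 0b10110011,
    "region_c": 0b01001101,
}
results = hamming_rank(0b10110010, database, top_k=2)
```

Comparing short binary codes with XOR and a bit count is what makes hash-based retrieval cheap per database entry; how the codes are learned from the location-aware graphs is outside the scope of this sketch.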

Subject(s)

Image Processing, Computer-Assisted , Software , Databases, Factual , Female , Humans

2.

Stain Standardization Capsule for Application-Driven Histopathological Image Normalization.

Zheng, Yushan; Jiang, Zhiguo; Zhang, Haopeng; Xie, Fengying; Hu, Dingyi; Sun, Shujiao; Shi, Jun; Xue, Chenghai.

IEEE J Biomed Health Inform ; 25(2): 337-347, 2021 02.

Article in English | MEDLINE | ID: mdl-32248128

ABSTRACT

Color consistency is crucial to developing robust deep learning methods for histopathological image analysis. With the increasing use of digital histopathological slides, deep learning methods are likely to be developed on data from multiple medical centers. This makes normalizing the color variance of histopathological images from different medical centers a challenging task. In this paper, we propose a novel color standardization module named stain standardization capsule (SSC), based on the capsule network and the corresponding dynamic routing algorithm. The proposed module can learn and generate uniform stain separation outputs for histopathological images with varied color appearance without reference to manually selected template images. The proposed module is lightweight and can be jointly trained with the application-driven CNN model. The proposed method was validated on three histopathology datasets and a cytology dataset, and was compared with state-of-the-art methods. The experimental results demonstrate that the SSC module is effective in improving the performance of histopathological image analysis and achieved the best performance among the compared methods.

Subject(s)

Coloring Agents , Image Processing, Computer-Assisted , Algorithms , Humans , Reference Standards , Staining and Labeling

3.

SUV39H1 regulates the progression of MLL-AF9-induced acute myeloid leukemia.

Chu, Yajing; Chen, Yangpeng; Guo, Huidong; Li, Mengke; Wang, Bichen; Shi, Deyang; Cheng, Xuelian; Guan, Jinxia; Wang, Xiaomin; Xue, Chenghai; Cheng, Tao; Shi, Jun; Yuan, Weiping.

Oncogene ; 39(50): 7239-7252, 2020 12.

Article in English | MEDLINE | ID: mdl-33037410

ABSTRACT

Epigenetic regulation plays crucial roles in leukemogenesis and leukemia progression. SUV39H1 is the dominant H3K9 methyltransferase in the hematopoietic system, and its expression declines with aging. However, the role of SUV39H1, acting through the repressive H3K9me3 modification it mediates, in leukemogenesis and leukemia progression remains to be explored. We found that SUV39H1 was down-regulated in a variety of leukemias, including MLL-r AML, compared with normal individuals. Decreased levels of Suv39h1 expression and genomic H3K9me3 occupancy were observed in LSCs from MLL-r-induced AML mouse models compared with hematopoietic stem/progenitor cells. Suv39h1 overexpression increased leukemia latency and decreased the frequency of LSCs in MLL-r AML mouse models, while Suv39h1 knockdown accelerated disease progression and increased the number of LSCs. Increased Suv39h1 expression led to the inactivation of Hoxb13 and Six1, as well as reversion of Hoxa9/Meis1 downstream target genes, which in turn decelerated leukemia progression. Interestingly, Hoxb13 expression is up-regulated in MLL-AF9-induced AML cells, while knockdown of Hoxb13 in MLL-AF9 leukemic cells significantly prolonged the survival of leukemic mice with reduced LSC frequencies. Our data revealed that SUV39H1 functions as a tumor suppressor in MLL-AF9-induced AML progression. These findings provide a direct link between SUV39H1 and AML development and progression.

Subject(s)

Disease Progression , Leukemia, Myeloid, Acute/pathology , Methyltransferases/metabolism , Myeloid-Lymphoid Leukemia Protein/metabolism , Oncogene Proteins, Fusion/metabolism , Repressor Proteins/metabolism , Animals , Apoptosis , Cell Line, Tumor , Cell Transformation, Neoplastic , Female , Gene Expression Regulation, Neoplastic , Hematopoietic Stem Cells/cytology , Histones/metabolism , Humans , Leukemia, Myeloid, Acute/genetics , Lysine/metabolism , Methylation , Mice , Transcription, Genetic

4.

Effects of somatic alterations at pathway level are more mechanism-explanatory and clinically applicable to quantity of liver metastases of colorectal cancer.

Zhang, Zhong-Guo; Ma, Fei; Zhao, Shuang; Yang, Xiaoyu; Liu, Fang; Xue, Chenghai; Liu, Liren; Gu, Jin; Piao, Haozhe.

Cancer Med ; 8(10): 4732-4742, 2019 08.

Article in English | MEDLINE | ID: mdl-31219228

ABSTRACT

BACKGROUND: The number of metastatic lesions is an important reference for making informed treatment decisions for patients with colorectal cancer liver metastases. However, the molecular alterations in patients with different numbers of lesions have not been systematically studied. METHODS: We investigated somatic alterations and microsatellite instability (MSI) of liver metastases from patients with single, multiple or diffuse metastatic lesions. A new algorithm, "Pathway Damage Score", was developed to comprehensively assess the functional impact of somatic alterations at the pathway level. Pathogenic pathways of the different metastasis types were identified and their prognostic effects were evaluated. Furthermore, the subnetworks and affected phenotypes of the altered genes in each pathogenic pathway were analyzed. RESULTS: Somatic alterations and altered genes occurred sporadically as well as in MSI state in different metastasis types, although MSS patients had more metastatic lesions than MSI patients. Each metastasis group has its own pathogenic pathways, and a damaged "Cargo recognition for clathrin-mediated endocytosis" pathway is significantly associated with poor prognosis (P < 0.001). Further pathway subnetwork analysis showed that, in addition to conventional drivers, other genes could also contribute to metastasis formation. CONCLUSIONS: Progression of liver metastasis could be driven by the coefficient of all altered genes belonging to the pathways. Thus, compared to individual somatic alterations and genes, pathway-level analysis is more reasonable for functional interpretation of molecular alterations in clinical samples.

Subject(s)

Colorectal Neoplasms/pathology , Gene Regulatory Networks , Liver Neoplasms/secondary , Microsatellite Instability , Algorithms , Clinical Decision-Making , Colorectal Neoplasms/genetics , Disease Progression , Humans , Liver Neoplasms/genetics , Neoplasm Staging , Survival Analysis , Exome Sequencing

5.

The fusion landscape of hepatocellular carcinoma.

Zhu, Chengpei; Wu, Liangcai; Lv, Yanling; Guan, Jinxia; Bai, Xue; Lin, Jianzhen; Liu, Tingting; Yang, Xiaobo; Robson, Simon C; Sang, Xinting; Xue, Chenghai; Zhao, Haitao.

Mol Oncol ; 13(5): 1214-1225, 2019 05.

Article in English | MEDLINE | ID: mdl-30903738

ABSTRACT

Most cases of hepatocellular carcinoma (HCC) are already advanced at the time of diagnosis, which limits treatment options. Challenges in early-stage diagnosis may be due to the genetic complexity of HCC. Gene fusion plays a critical function in tumorigenesis and cancer progression in multiple cancers, yet the identities of fusion genes as potential diagnostic markers in HCC have not been investigated. Here, we employed STAR-Fusion and identified 43 recurrent fusion events in our own and four public RNA-seq datasets. We identified 2354 different gene fusions in two hepatitis B virus (HBV)-HCC patients. Validation analysis against the four RNA-seq datasets revealed that only 1.8% (43/2354) were recurrent fusions. Comparison with the four fusion databases demonstrated that 19 recurrent fusions were not previously annotated to diseases and three were annotated as disease-related fusion events. Finally, we validated six of the novel fusion events, including RP11-476K15.1-CTD-2015H3.2, by RT-PCR and Sanger sequencing of 14 pairs of HBV-related HCC samples. In summary, our study provides new insights into gene fusions in HCC and may contribute to the development of anti-HCC therapy.

Subject(s)

Carcinoma, Hepatocellular/genetics , Databases, Nucleic Acid , Liver Neoplasms/genetics , Oncogene Proteins, Fusion/genetics , Carcinoma, Hepatocellular/metabolism , Carcinoma, Hepatocellular/pathology , Carcinoma, Hepatocellular/virology , Female , Gene Expression Regulation, Neoplastic , Hepatitis B virus/genetics , Hepatitis B virus/metabolism , Humans , Liver Neoplasms/metabolism , Liver Neoplasms/pathology , Liver Neoplasms/virology , Male , Oncogene Proteins, Fusion/metabolism

6.

Adaptive color deconvolution for histological WSI normalization.

Zheng, Yushan; Jiang, Zhiguo; Zhang, Haopeng; Xie, Fengying; Shi, Jun; Xue, Chenghai.

Comput Methods Programs Biomed ; 170: 107-120, 2019 Mar.

Article in English | MEDLINE | ID: mdl-30712599

ABSTRACT

BACKGROUND AND OBJECTIVE: Color consistency of histological images is significant for developing reliable computer-aided diagnosis (CAD) systems. However, the color appearance of digital histological images varies across different specimen preparations, staining, and scanning situations. This variability affects the diagnosis and decreases the accuracy of CAD approaches. It is important and challenging to develop effective color normalization methods for digital histological images. METHODS: We proposed a novel adaptive color deconvolution (ACD) algorithm for stain separation and color normalization of hematoxylin-eosin-stained whole slide images (WSIs). To avoid artifacts and reduce the failure rate of normalization, multiple types of prior knowledge about staining are considered and embedded in the ACD model. To improve the capacity of color normalization for various WSIs, an integrated optimization is designed to simultaneously estimate the parameters of the stain separation and color normalization. Solving the ACD model and applying the proposed method involve only pixel-wise operations, which makes the method very efficient and applicable to WSIs. RESULTS: The proposed method was evaluated on four WSI datasets including breast, lung and cervix cancers and was compared with 6 state-of-the-art methods. The proposed method achieved the most consistent performance in color normalization according to the quantitative metrics. In a qualitative assessment of 500 WSIs, the failure rate of normalization was 0.4% and structure and color artifacts were effectively avoided. Applied to CAD methods, the area under the receiver operating characteristic curve for cancer image classification was improved from 0.842 to 0.914. The average time for solving the ACD model is 2.97 s. CONCLUSIONS: The proposed ACD model has proven effective for color normalization of hematoxylin-eosin-stained WSIs in various color appearances. The model is robust and can be applied to WSIs containing different lesions. The proposed model can be efficiently solved and is effective in improving the performance of cancer image recognition, which makes it adequate for developing automatic CAD programs and systems based on WSIs.

Subject(s)

Color , Diagnosis, Computer-Assisted/methods , Histological Techniques , Staining and Labeling , Algorithms , Humans , Neoplasms

7.

Genome-wide DNA methylation analysis identifies candidate epigenetic markers and drivers of hepatocellular carcinoma.

Zheng, Yongchang; Huang, Qianqian; Ding, Zijian; Liu, Tingting; Xue, Chenghai; Sang, Xinting; Gu, Jin.

Brief Bioinform ; 19(1): 101-108, 2018 01 01.

Article in English | MEDLINE | ID: mdl-27760737

ABSTRACT

The alteration of the DNA methylation landscape is a key epigenetic event in cancer. With the accumulation of large-scale genome-wide DNA methylation data from clinical samples, we are able to characterize the patterns of DNA methylation alterations for identifying candidate epigenetic markers and drivers. In this survey, we take hepatocellular carcinoma (HCC) as an example to show the basic steps of analyzing the DNA methylation patterns in cancer across multiple data sets. We collected three genome-wide DNA methylation data sets with ~800 clinical samples and the corresponding gene expression data sets. First, by quantitatively analyzing two global methylation alterations, we found that about 90% of tumors acquire either genome-wide DNA hypo-methylation or the CpG island methylator phenotype. Second, probe-level analysis identified 267, 228 and 197 hyper-methylated sites in promoter regions for the three data sets, respectively. These local hyper-methylated patterns are highly consistent: 84 sites (from 61 promoters) are hyper-methylated in all three studied data sets, including many previously reported genes, such as CDKL2, TBX15 and NKX6-2. Then, these hyper-methylated sites were used as candidate markers to classify tumor and non-tumor samples. Classifiers based on only 10 selected probes can achieve high discriminative ability across different data sets. Finally, by integratively analyzing DNA methylation and gene expression data, we identified 222 candidate epigenetic drivers, which are enriched in inflammatory response and multiple metabolic pathways. A set of high-confidence candidates, including SFN, SPP1 and TKT, are significantly associated with patients' overall survival. In summary, this study systematically characterized the DNA methylation alterations and their impacts on gene expression in HCCs based on multiple data sets.

Subject(s)

Carcinoma, Hepatocellular/genetics , DNA Methylation , Epigenesis, Genetic , Gene Expression Regulation, Neoplastic , Genome, Human , Liver Neoplasms/genetics , Biomarkers, Tumor/genetics , Carcinoma, Hepatocellular/pathology , CpG Islands , Humans , Liver Neoplasms/pathology , Promoter Regions, Genetic

8.

HITS-CLIP and integrative modeling define the Rbfox splicing-regulatory network linked to brain development and autism.

Weyn-Vanhentenryck, Sebastien M; Mele, Aldo; Yan, Qinghong; Sun, Shuying; Farny, Natalie; Zhang, Zuo; Xue, Chenghai; Herre, Margaret; Silver, Pamela A; Zhang, Michael Q; Krainer, Adrian R; Darnell, Robert B; Zhang, Chaolin.

Cell Rep ; 6(6): 1139-1152, 2014 Mar 27.

Article in English | MEDLINE | ID: mdl-24613350

ABSTRACT

The RNA binding proteins Rbfox1/2/3 regulate alternative splicing in the nervous system, and disruption of Rbfox1 has been implicated in autism. However, comprehensive identification of functional Rbfox targets has been challenging. Here, we perform HITS-CLIP for all three Rbfox family members in order to globally map, at a single-nucleotide resolution, their in vivo RNA interaction sites in the mouse brain. We find that the two guanines in the Rbfox binding motif UGCAUG are critical for protein-RNA interactions and crosslinking. Using integrative modeling, these interaction sites, combined with additional datasets, define 1,059 direct Rbfox target alternative splicing events. Over half of the quantifiable targets show dynamic changes during brain development. Of particular interest are 111 events from 48 candidate autism-susceptibility genes, including syndromic autism genes Shank3, Cacna1c, and Tsc2. Alteration of Rbfox targets in some autistic brains is correlated with downregulation of all three Rbfox proteins, supporting the potential clinical relevance of the splicing-regulatory network.

Subject(s)

Autistic Disorder/genetics , Brain/growth & development , Gene Regulatory Networks , RNA-Binding Proteins/genetics , RNA/genetics , Repressor Proteins/genetics , Alternative Splicing , Animals , Autistic Disorder/metabolism , Base Sequence , Brain/metabolism , Exons , Genetic Predisposition to Disease , Humans , Immunoprecipitation , Mice , Models, Genetic , Models, Molecular , Molecular Sequence Data , RNA/metabolism , RNA-Binding Proteins/metabolism , Repressor Proteins/metabolism

9.

Landscape of transcription in human cells.

Djebali, Sarah; Davis, Carrie A; Merkel, Angelika; Dobin, Alex; Lassmann, Timo; Mortazavi, Ali; Tanzer, Andrea; Lagarde, Julien; Lin, Wei; Schlesinger, Felix; Xue, Chenghai; Marinov, Georgi K; Khatun, Jainab; Williams, Brian A; Zaleski, Chris; Rozowsky, Joel; Röder, Maik; Kokocinski, Felix; Abdelhamid, Rehab F; Alioto, Tyler; Antoshechkin, Igor; Baer, Michael T; Bar, Nadav S; Batut, Philippe; Bell, Kimberly; Bell, Ian; Chakrabortty, Sudipto; Chen, Xian; Chrast, Jacqueline; Curado, Joao; Derrien, Thomas; Drenkow, Jorg; Dumais, Erica; Dumais, Jacqueline; Duttagupta, Radha; Falconnet, Emilie; Fastuca, Meagan; Fejes-Toth, Kata; Ferreira, Pedro; Foissac, Sylvain; Fullwood, Melissa J; Gao, Hui; Gonzalez, David; Gordon, Assaf; Gunawardena, Harsha; Howald, Cedric; Jha, Sonali; Johnson, Rory; Kapranov, Philipp; King, Brandon.

Nature ; 489(7414): 101-8, 2012 Sep 06.

Article in English | MEDLINE | ID: mdl-22955620

ABSTRACT

Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.

Subject(s)

DNA/genetics , Encyclopedias as Topic , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Transcription, Genetic/genetics , Transcriptome/genetics , Alleles , Cell Line , DNA, Intergenic/genetics , Enhancer Elements, Genetic , Exons/genetics , Gene Expression Profiling , Genes/genetics , Genomics , Humans , Polyadenylation/genetics , Protein Isoforms/genetics , RNA/biosynthesis , RNA/genetics , RNA Editing/genetics , RNA Splicing/genetics , Repetitive Sequences, Nucleic Acid/genetics , Sequence Analysis, RNA

10.

Finding noncoding RNA transcripts from low abundance expressed sequence tags.

Xue, Chenghai; Li, Fei; Li, Fei.

Cell Res ; 18(6): 695-700, 2008 Jun.

Article in English | MEDLINE | ID: mdl-18504459

ABSTRACT

It has been shown that noncoding RNA (ncRNA) genes are much more numerous than expected. However, it remains a difficult task to identify ncRNAs with either computational algorithms or biological experiments. Recent reports have suggested that ncRNAs may also appear in expressed sequence tag (EST) databases. Nevertheless, intergenic ESTs have received little attention and are poorly annotated owing to their low abundance. Here, we have developed a computational strategy for discovering ncRNA genes from human ESTs. We first collected ESTs that are located in intergenic regions and do not have detailed annotations. The intergenic regions were divided into non-overlapping 50-nt windows, and PhastCons scores obtained from the UCSC database were assigned to these windows. We kept conserved windows that had PhastCons scores of over 0.8 and at least three supporting ESTs to act as seeds. Each cluster of ESTs corresponding to the seeds was assembled into a long contig. We used two criteria to screen for ncRNA transcripts from these contigs: the first was that the longest predicted open reading frame was less than 300 nt, and the second was that likely Pol-II promoters exist within 2,000 nt upstream or downstream of the contigs. As a result, 118 novel ncRNA genes were identified from human low-abundance ESTs. Of seven randomly selected candidates, six were transcribed in human 2BS cells as shown by RT-PCR. Our work shows that the EST database is a 'hidden treasure' for detecting novel ncRNA genes.
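The seed-selection and ORF-length criteria stated in this abstract can be sketched in Python as follows. The `Window` record, the helper names, and the ORF definition used here (ATG to the first in-frame stop codon, forward strand only, stop codon included in the length) are assumptions made for illustration; the abstract does not specify how the longest ORF was predicted, and the Pol-II promoter criterion is not modeled.

```python
from dataclasses import dataclass

MIN_PHASTCONS = 0.8   # windows must score strictly above this value
MIN_ESTS = 3          # minimum number of supporting ESTs
MAX_ORF_NT = 300      # contigs are kept only if the longest ORF is shorter
STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class Window:
    chrom: str
    start: int        # start coordinate of the non-overlapping 50-nt window
    phastcons: float  # conservation score assigned to the window
    est_count: int    # number of intergenic ESTs supporting the window


def select_seeds(windows):
    """Keep conserved windows with enough EST support to act as seeds."""
    return [w for w in windows
            if w.phastcons > MIN_PHASTCONS and w.est_count >= MIN_ESTS]


def longest_orf_nt(seq: str) -> int:
    """Length (nt) of the longest ATG-to-stop ORF over the three forward frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == "ATG":
                orf_start = i
            elif orf_start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - orf_start)
                orf_start = None
    return best


def passes_orf_filter(contig: str) -> bool:
    """First screening criterion: longest predicted ORF under 300 nt."""
    return longest_orf_nt(contig) < MAX_ORF_NT
```

A contig that passes `passes_orf_filter` would still need to satisfy the promoter criterion described in the abstract before being called a candidate ncRNA transcript.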

Subject(s)

Expressed Sequence Tags , RNA, Untranslated/genetics , Transcription, Genetic/genetics , Chromosomes, Human, Pair 21/genetics , Conserved Sequence , DNA, Intergenic/metabolism , Genome, Human/genetics , Humans , Promoter Regions, Genetic/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Reproducibility of Results , Reverse Transcriptase Polymerase Chain Reaction , Software , Species Specificity

11.

Functional importance of different patterns of correlation between adjacent cassette exons in human and mouse.

Peng, Tao; Xue, Chenghai; Bi, Jianning; Li, Tingting; Wang, Xiaowo; Zhang, Xuegong; Li, Yanda.

BMC Genomics ; 9: 191, 2008 Apr 26.

Article in English | MEDLINE | ID: mdl-18439302

ABSTRACT

BACKGROUND: Alternative splicing expands transcriptome diversity and plays an important role in regulation of gene expression. Previous studies focus on the regulation of a single cassette exon, but recent experiments indicate that multiple cassette exons within a gene may interact with each other. This interaction can increase the potential to generate various transcripts and adds an extra layer of complexity to gene regulation. Several cases of exon interaction have been discovered. However, the extent to which the cassette exons coordinate with each other remains unknown. RESULTS: Based on EST data, we employed a metric of correlation coefficients to describe the interaction between two adjacent cassette exons and then categorized these exon pairs into three different groups by their interaction (correlation) patterns. Sequence analysis demonstrates that strongly-correlated groups are more conserved and contain a higher proportion of pairs with reading frame preservation in a combinatorial manner. Multiple genome comparison further indicates that different groups of correlated pairs have different evolutionary courses: (1) The vast majority of positively-correlated pairs are old, (2) most of the weakly-correlated pairs are relatively young, and (3) negatively-correlated pairs are a mixture of old and young events. CONCLUSION: We performed a large-scale analysis of interactions between adjacent cassette exons. Compared with weakly-correlated pairs, the strongly-correlated pairs, including both the positively and negatively correlated ones, show more evidence that they are under delicate splicing control and tend to be functionally important. Additionally, the positively-correlated pairs bear strong resemblance to constitutive exons, which suggests that they may evolve from ancient constitutive exons, while negatively and weakly correlated pairs are more likely to contain newly emerging exons.
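As a rough illustration of the correlation metric described in this abstract, the Python sketch below computes a Pearson correlation over binary inclusion indicators for two adjacent cassette exons and assigns the pair to one of three groups. The input format (one `(exon1, exon2)` pair of 0/1 flags per transcript covering both exons), the `threshold` value, and the handling of undefined correlations are all assumptions; the abstract does not give the exact formula or cut-offs.

```python
from math import sqrt


def inclusion_correlation(pairs):
    """Pearson correlation between the inclusion flags of two exons.

    pairs: list of (a, b) tuples, 1 = exon included, 0 = exon skipped.
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("at least one observation is required")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        # Correlation is undefined when one exon never varies;
        # this sketch treats that case as no evidence of interaction.
        return 0.0
    return cov / sqrt(var_a * var_b)


def categorize(r, threshold=0.5):
    """Assign a pair to a group; the threshold is a hypothetical cut-off."""
    if r >= threshold:
        return "positively-correlated"
    if r <= -threshold:
        return "negatively-correlated"
    return "weakly-correlated"
```

For example, transcripts that always include or skip both exons together give a correlation of 1, whereas mutually exclusive inclusion gives -1.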

Subject(s)

Exons/genetics , Alternative Splicing/genetics , Animals , Base Sequence , Conserved Sequence , DNA/genetics , Evolution, Molecular , Genome, Human/genetics , Genomics , Humans , Introns/genetics , Mice , RNA, Messenger/genetics , Reading Frames , Selection, Genetic

12.

RScan: fast searching structural similarities for structured RNAs in large databases.

Xue, Chenghai; Liu, Guo-Ping.

BMC Genomics ; 8: 257, 2007 Jul 31.

Article in English | MEDLINE | ID: mdl-17663795

ABSTRACT

BACKGROUND: Many RNAs have evolutionarily conserved secondary structures rather than conserved primary sequences. Recently, an increasing number of methods have been developed that focus on structural alignment for finding conserved secondary structures as well as common structural motifs in pairwise or multiple sequences. A challenging task is to search quickly for similar structures for structured RNA sequences in large genomic databases, since existing methods are too slow to be used on large databases. RESULTS: An implementation of a fast structural alignment algorithm, RScan, is proposed to fulfill this task. RScan is developed by leveraging the advantages of both hashing algorithms and local alignment algorithms. In our experiment, the average times for searching a tRNA and an rRNA in the randomized A. pernix genome are only 256 seconds and 832 seconds, respectively, using RScan, compared with 3,178 seconds and 8,951 seconds, respectively, using an existing method, RSEARCH. Remarkably, RScan can handle large database queries, taking less than 4 minutes to search for structures similar to a microRNA precursor in human chromosome 21. CONCLUSION: These results indicate that RScan is a preferable choice for real-life applications of searching structural similarities for structured RNAs in large databases. RScan software is freely available at http://bioinfo.au.tsinghua.edu.cn/member/cxue/rscan/RScan.htm.

Subject(s)

Databases, Nucleic Acid , Nucleic Acid Conformation , RNA/analysis , Sequence Analysis, RNA/methods , Sequence Homology, Nucleic Acid , Algorithms , Animals , Base Sequence , Chromosome Mapping , Chromosomes, Human/chemistry , Electronic Data Processing , Humans , MicroRNAs/analysis , MicroRNAs/chemistry , Molecular Sequence Data , RNA/chemistry , Sequence Alignment

13.

Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine.

Xue, Chenghai; Li, Fei; He, Tao; Liu, Guo-Ping; Li, Yanda; Zhang, Xuegong.

BMC Bioinformatics ; 6: 310, 2005 Dec 29.

Article in English | MEDLINE | ID: mdl-16381612

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) are a group of short (approximately 22 nt) non-coding RNAs that play important regulatory roles. MiRNA precursors (pre-miRNAs) are characterized by their hairpin structures. However, a large number of similar hairpins can be folded in many genomes. Almost all current methods for computational prediction of miRNAs use comparative genomic approaches to identify putative pre-miRNAs from candidate hairpins. An ab initio method for distinguishing pre-miRNAs from sequence segments with pre-miRNA-like hairpin structures is lacking. Being able to classify real vs. pseudo pre-miRNAs is important both for understanding the nature of miRNAs and for developing ab initio prediction methods that can discover new miRNAs without known homology. RESULTS: A set of novel features of local contiguous structure-sequence information is proposed for distinguishing the hairpins of real pre-miRNAs and pseudo pre-miRNAs. A support vector machine (SVM) is applied to these features to classify real vs. pseudo pre-miRNAs, achieving about 90% accuracy on human data. Remarkably, the SVM classifier built on human data can correctly identify up to 90% of the pre-miRNAs from other species, including plants and viruses, without utilizing any comparative genomics information. CONCLUSION: The local structure-sequence features reflect discriminative and conserved characteristics of miRNAs, and the successful ab initio classification of real and pseudo pre-miRNAs opens a new approach for discovering new miRNAs.
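One plausible reading of "local contiguous structure-sequence features" can be sketched in Python as follows. The encoding assumed here combines the nucleotide at each position with the paired or unpaired state of that position and its two neighbors, taken from a dot-bracket secondary structure, which yields 4 x 8 = 32 normalized frequencies. This encoding, the function name, and the example hairpin are assumptions for illustration; the abstract itself does not spell out the feature definition, and the SVM training step is not shown.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# One key per (middle nucleotide, three-position paired/unpaired pattern).
FEATURE_KEYS = [n + "".join(s)
                for n in NUCLEOTIDES
                for s in product("(.", repeat=3)]


def triplet_features(seq: str, structure: str) -> list:
    """Normalized frequencies of the 32 structure-sequence triplets."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    seq = seq.upper().replace("T", "U")
    # Treat both bracket directions as 'paired'.
    states = structure.replace(")", "(")
    counts = dict.fromkeys(FEATURE_KEYS, 0)
    for i in range(1, len(seq) - 1):
        key = seq[i] + states[i - 1:i + 2]
        if key in counts:  # skips ambiguous nucleotides such as N
            counts[key] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in FEATURE_KEYS]


# Hypothetical toy hairpin, for illustration only.
vector = triplet_features("GGGAAACCC", "(((...)))")
```

A vector like this, computed for each candidate hairpin, is the kind of fixed-length input an SVM classifier requires.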

Subject(s)

Computational Biology/methods , MicroRNAs/classification , MicroRNAs/genetics , Nucleic Acid Conformation , Animals , Base Sequence , Computer Simulation , False Positive Reactions , Genome , Genomics , Humans , Models, Statistical , Molecular Sequence Data , RNA, Messenger/metabolism , Software , Species Specificity
