Search | VHL Regional Portal

1.

Two-Part and Related Regression Models for Longitudinal Data.

Farewell, V T; Long, D L; Tom, B D M; Yiu, S; Su, L.

Annu Rev Stat Appl ; 4: 283-315, 2017 Mar.

Article in English | MEDLINE | ID: mdl-28890906

ABSTRACT

Statistical models that involve a two-part mixture distribution are applicable in a variety of situations. Frequently, the two parts are a model for the binary response variable and a model for the outcome variable that is conditioned on the binary response. Two common examples are zero-inflated or hurdle models for count data and two-part models for semicontinuous data. Recently, there has been particular interest in the use of these models for the analysis of repeated measures of an outcome variable over time. The aim of this review is to consider motivations for the use of such models in this context and to highlight the central issues that arise with their use. We examine two-part models for semicontinuous and zero-heavy count data, and we also consider models for count data with a two-part random effects distribution.

2.

rMFilter: acceleration of long read-based structure variation calling by chimeric read filtering.

Liu, Bo; Jiang, Tao; Yiu, S M; Li, Junyi; Wang, Yadong.

Bioinformatics ; 33(17): 2750-2752, 2017 Sep 01.

Article in English | MEDLINE | ID: mdl-28482046

ABSTRACT

MOTIVATION: Long read sequencing technologies provide new opportunities to investigate genome structural variations (SVs) more accurately. However, the state-of-the-art SV calling pipelines are computational intensive and the applications of long reads are restricted. RESULTS: We propose a local region match-based filter (rMFilter) to efficiently nail down chimeric noisy long reads based on short token matches within local genomic regions. rMFilter is able to substantially accelerate long read-based SV calling pipelines without loss of effectiveness. It can be easily integrated into current long read-based pipelines to facilitate SV studies. AVAILABILITY AND IMPLEMENTATION: The C ++ source code of rMFilter is available at https://github.com/hitbc/rMFilter . CONTACT: ydwang@hit.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Genomic Structural Variation , Genomics/methods , Sequence Analysis, DNA/methods , Software , Genome, Human , Humans

3.

An improved parameter estimation scheme for image modification detection based on DCT coefficient analysis.

Yu, Liyang; Han, Qi; Niu, Xiamu; Yiu, S M; Fang, Junbin; Zhang, Ye.

Forensic Sci Int ; 259: 200-9, 2016 Feb.

Article in English | MEDLINE | ID: mdl-26804669

ABSTRACT

Most of the existing image modification detection methods which are based on DCT coefficient analysis model the distribution of DCT coefficients as a mixture of a modified and an unchanged component. To separate the two components, two parameters, which are the primary quantization step, Q1, and the portion of the modified region, α, have to be estimated, and more accurate estimations of α and Q1 lead to better detection and localization results. Existing methods estimate α and Q1 in a completely blind manner, without considering the characteristics of the mixture model and the constraints to which α should conform. In this paper, we propose a more effective scheme for estimating α and Q1, based on the observations that, the curves on the surface of the likelihood function corresponding to the mixture model is largely smooth, and α can take values only in a discrete set. We conduct extensive experiments to evaluate the proposed method, and the experimental results confirm the efficacy of our method.

4.

Sequence assembly using next generation sequencing data--challenges and solutions.

Chin, Francis Y L; Leung, Henry C M; Yiu, S M.

Sci China Life Sci ; 57(11): 1140-8, 2014 Nov.

Article in English | MEDLINE | ID: mdl-25326069

ABSTRACT

Sequence assembling is an important step for bioinformatics study. With the help of next generation sequencing (NGS) technology, high throughput DNA fragment (reads) can be randomly sampled from DNA or RNA molecular sequence. However, as the positions of reads being sampled are unknown, assembling process is required for combining overlapped reads to reconstruct the original DNA or RNA sequence. Compared with traditional Sanger sequencing methods, although the throughput of NGS reads increases, the read length is shorter and the error rate is higher. It introduces several problems in assembling. Moreover, paired-end reads instead of single-end reads can be sampled which contain more information. The existing assemblers cannot fully utilize this information and fails to assemble longer contigs. In this article, we will revisit the major problems of assembling NGS reads on genomic, transcriptomic, metagenomic and metatranscriptomic data. We will also describe our IDBA package for solving these problems. IDBA package has adopted several novel ideas in assembling, including using multiple k, local assembling and progressive depth removal. Compared with existence assemblers, IDBA has better performance on many simulated and real sequencing datasets.

Subject(s)

Computational Biology/methods , DNA/chemistry , RNA/chemistry , Sequence Analysis, DNA/methods , Algorithms , Contig Mapping/methods , Escherichia coli/genetics , False Positive Reactions , Genome , Genome, Bacterial , Humans , Lactobacillus plantarum/genetics , Metagenomics , Software , Transcription, Genetic , Transcriptome

5.

Predictive factors and radiological features of radiation-induced cranial nerve palsy in patients with nasopharyngeal carcinoma following radical radiotherapy.

Luk, Yiu S; Shum, John S F; Sze, Henry C K; Chan, Lucy L K; Ng, W T; Lee, Anne W M.

Oral Oncol ; 49(1): 49-54, 2013 Jan.

Article in English | MEDLINE | ID: mdl-22892236

ABSTRACT

OBJECTIVES: To identify the key predictive factors of radiation-induced cranial nerve palsy in patients with nasopharyngeal carcinoma (NPC). METHOD AND MATERIALS: From November 1998 to December 2007, all consecutive patients with newly diagnosed NPC who were curatively treated with radiotherapy and subsequently developed radiation-induced cranial nerve palsy (RICNP) were included in our study. Patients with cranial nerve palsy due to disease recurrence were excluded. Their records were retrospectively reviewed. RESULTS: Amongst 965 patients with NPC treated with radical radiotherapy, 41 developed new cranial nerve palsy. After exclusion of 5 patients with cranial nerve palsy due to recurrence, 36 (3.7%) developed RICNP. The median follow-up was 8.9 years (range, 3.2-11.3 years). Ten of the 36 patients had cranial nerve palsy at presentation. Twenty-seven patients had single cranial nerve palsy and 9 patients had multiple cranial nerve palsy. The most commonly involved cranial nerve was cranial nerve XII, with 30 patients having palsy of cranial nerve XII and 6 of them having bilateral cranial nerve XII palsies. Magnetic resonance imaging features of radiation-induced hypoglossal nerve palsy were demonstrated in our study. Multivariate analysis revealed that cranial nerve palsy at presentation was an independent prognostic factor for the development of RICNP. Other factors including T staging, N staging, gender, age, radiotherapy technique and the use of chemotherapy have no significant relationship with the risk of developing RICNP. CONCLUSION: RICNP in patients with NPC is not a rare complication, and cranial nerve palsy at presentation is an important prognostic factor.

Subject(s)

Carcinoma/radiotherapy , Hypoglossal Nerve Injuries/etiology , Nasopharyngeal Neoplasms/radiotherapy , Paralysis/etiology , Radiation Injuries/etiology , Trigeminal Nerve Injuries/etiology , Age Factors , Brachytherapy , Chemotherapy, Adjuvant , Contrast Media , Female , Follow-Up Studies , Forecasting , Gadolinium , Humans , Magnetic Resonance Imaging/methods , Male , Middle Aged , Neoplasm Staging , Prognosis , Radiotherapy, Conformal , Radiotherapy, High-Energy , Radiotherapy, Intensity-Modulated , Retrospective Studies , Sex Factors

6.

MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample.

Wang, Yi; Leung, Henry C M; Yiu, S M; Chin, Francis Y L.

Bioinformatics ; 28(18): i356-i362, 2012 Sep 15.

Article in English | MEDLINE | ID: mdl-22962452

ABSTRACT

MOTIVATION: Metagenomic binning remains an important topic in metagenomic analysis. Existing unsupervised binning methods for next-generation sequencing (NGS) reads do not perform well on (i) samples with low-abundance species or (ii) samples (even with high abundance) when there are many extremely low-abundance species. These two problems are common for real metagenomic datasets. Binning methods that can solve these problems are desirable. RESULTS: We proposed a two-round binning method (MetaCluster 5.0) that aims at identifying both low-abundance and high-abundance species in the presence of a large amount of noise due to many extremely low-abundance species. In summary, MetaCluster 5.0 uses a filtering strategy to remove noise from the extremely low-abundance species. It separate reads of high-abundance species from those of low-abundance species in two different rounds. To overcome the issue of low coverage for low-abundance species, multiple w values are used to group reads with overlapping w-mers, whereas reads from high-abundance species are grouped with high confidence based on a large w and then binning expands to low-abundance species using a relaxed (shorter) w. Compared to the recent tools, TOSS and MetaCluster 4.0, MetaCluster 5.0 can find more species (especially those with low abundance of say 6× to 10×) and can achieve better sensitivity and specificity using less memory and running time. AVAILABILITY: http://i.cs.hku.hk/~alse/MetaCluster/ CONTACT: chin@cs.hku.hk.

Subject(s)

Metagenomics/methods , Software , Algorithms , Sensitivity and Specificity , Sequence Analysis, DNA/methods

7.

An efficient alignment algorithm for searching simple pseudoknots over long genomic sequence.

Ma, Christopher; Wong, Thomas K F; Lam, T W; Hon, W K; Sadakane, K; Yiu, S M.

IEEE/ACM Trans Comput Biol Bioinform ; 9(6): 1629-38, 2012.

Article in English | MEDLINE | ID: mdl-22848134

ABSTRACT

Structural alignment has been shown to be an effective computational method to identify structural noncoding RNA(ncRNA) candidates as ncRNAs are known to be conserved in secondary structures. However, the complexity of the structural alignment algorithms becomes higher when the structure has pseudoknots. Even for the simplest type of pseudoknots (simple pseudoknots), the fastest algorithm runs in O(mn3) time, where m, n are the length of the query ncRNA (with known structure) and the length of the target sequence (with unknown structure), respectively. In practice, we are usually given a long DNA sequence and we try to locate regions in the sequence for possible candidates of a particular ncRNA. Thus, we need to run the structural alignment algorithm on every possible region in the long sequence. For example, finding candidates for a known ncRNA of length 100 on a sequence of length 50,000, it takes more than one day. In this paper, we provide an efficient algorithm to solve the problem for simple pseudoknots and it is shown to be 10 times faster. The speedup stems from an effective pruning strategy consisting of the computation of a lower bound score for the optimal alignment and an estimation of the maximum score that a candidate can achieve to decide whether to prune the current candidate or not.

Subject(s)

Algorithms , Computational Biology/methods , Genome , Nucleic Acid Conformation , Sequence Analysis, DNA/methods , DNA/chemistry , DNA/genetics , Models, Genetic , RNA, Untranslated/chemistry , RNA, Untranslated/genetics , Software

8.

Structural alignment of RNA with triple helix structure.

Wong, Thomas K F; Yiu, S M.

J Comput Biol ; 19(4): 365-78, 2012 Apr.

Article in English | MEDLINE | ID: mdl-22468707

ABSTRACT

Structural alignment is useful in identifying members of ncRNAs. Existing tools are all based on the secondary structures of the molecules. There is evidence showing that tertiary interactions (the interaction between a single-stranded nucleotide and a base-pair) in triple helix structures are critical in some functions of ncRNAs. In this article, we address the problem of structural alignment of RNAs with the triple helix. We provide a formal definition to capture a simplified model of a triple helix structure, then develop an algorithm of O(mn(3)) time to align a query sequence (of length m) with known triple helix structure with a target sequence (of length n) with an unknown structure. The resulting algorithm is shown to be useful in identifying ncRNA members in a simulated genome.

Subject(s)

Algorithms , RNA, Untranslated/chemistry , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Base Sequence , Models, Molecular , Nucleic Acid Conformation

9.

IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth.

Peng, Yu; Leung, Henry C M; Yiu, S M; Chin, Francis Y L.

Bioinformatics ; 28(11): 1420-8, 2012 Jun 01.

Article in English | MEDLINE | ID: mdl-22495754

ABSTRACT

MOTIVATION: Next-generation sequencing allows us to sequence reads from a microbial environment using single-cell sequencing or metagenomic sequencing technologies. However, both technologies suffer from the problem that sequencing depth of different regions of a genome or genomes from different species are highly uneven. Most existing genome assemblers usually have an assumption that sequencing depths are even. These assemblers fail to construct correct long contigs. RESULTS: We introduce the IDBA-UD algorithm that is based on the de Bruijn graph approach for assembling reads from single-cell sequencing or metagenomic sequencing technologies with uneven sequencing depths. Several non-trivial techniques have been employed to tackle the problems. Instead of using a simple threshold, we use multiple depthrelative thresholds to remove erroneous k-mers in both low-depth and high-depth regions. The technique of local assembly with paired-end information is used to solve the branch problem of low-depth short repeat regions. To speed up the process, an error correction step is conducted to correct reads of high-depth regions that can be aligned to highconfident contigs. Comparison of the performances of IDBA-UD and existing assemblers (Velvet, Velvet-SC, SOAPdenovo and Meta-IDBA) for different datasets, shows that IDBA-UD can reconstruct longer contigs with higher accuracy. AVAILABILITY: The IDBA-UD toolkit is available at our website http://www.cs.hku.hk/~alse/idba_ud

Subject(s)

Algorithms , Metagenomics/methods , Sequence Analysis, DNA/methods , Single-Cell Analysis/methods , Bacteria/genetics , Genome , High-Throughput Nucleotide Sequencing

10.

MetaCluster 4.0: a novel binning algorithm for NGS reads and huge number of species.

Wang, Yi; Leung, Henry C M; Yiu, S M; Chin, Francis Y L.

J Comput Biol ; 19(2): 241-9, 2012 Feb.

Article in English | MEDLINE | ID: mdl-22300323

ABSTRACT

Next-generation sequencing (NGS) technologies allow the sequencing of microbial communities directly from the environment without prior culturing. The output of environmental DNA sequencing consists of many reads from genomes of different unknown species, making the clustering together reads from the same (or similar) species (also known as binning) a crucial step. The difficulties of the binning problem are due to the following four factors: (1) the lack of reference genomes; (2) uneven abundance ratio of species; (3) short NGS reads; and (4) a large number of species (can be more than a hundred). None of the existing binning tools can handle all four factors. No tools, including both AbundanceBin and MetaCluster 3.0, have demonstrated reasonable performance on a sample with more than 20 species. In this article, we introduce MetaCluster 4.0, an unsupervised binning algorithm that can accurately (with about 80% precision and sensitivity in all cases and at least 90% in some cases) and efficiently bin short reads with varying abundance ratios and is able to handle datasets with 100 species. The novelty of MetaCluster 4.0 stems from solving a few important problems: how to divide reads into groups by a probabilistic approach, how to estimate the 4-mer distribution of each group, how to estimate the number of species, and how to modify MetaCluster 3.0 to handle a large number of species. We show that Meta Cluster 4.0 is effective for both simulated and real datasets. Supplementary Material is available at www.liebertonline.com/cmb.

Subject(s)

Algorithms , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/methods , Software , Bacteria/genetics , Base Sequence , Cluster Analysis , Data Interpretation, Statistical , Genome, Bacterial , Models, Statistical

11.

Memory efficient algorithms for structural alignment of RNAs with pseudoknots.

Wong, Thomas K F; Chiu, Y S; Lam, T W; Yiu, S M.

IEEE/ACM Trans Comput Biol Bioinform ; 9(1): 161-8, 2012.

Article in English | MEDLINE | ID: mdl-21464506

ABSTRACT

In this paper, we consider the problem of structural alignment of a target RNA sequence of length n and a query RNA sequence of length m with known secondary structure that may contain simple pseudoknots or embedded simple pseudoknots. The best known algorithm for solving this problem runs in O(mn3) time for simple pseudoknot or O(mn4) time for embedded simple pseudoknot with space complexity of O(mn3) for both structures, which require too much memory making it infeasible for comparing noncoding RNAs (ncRNAs) with length several hundreds or more. We propose memory efficient algorithms to solve the same problem. We reduce the space complexity to O(n3) for simple pseudoknot and O(mn2 + n3) for embedded simple pseudoknot while maintaining the same time complexity. We also show how to modify our algorithm to handle a restricted class of recursive simple pseudoknot which is found abundant in real data with space complexity of O(mn2 + n3) and time complexity of O(mn4). Experimental results show that our algorithms are feasible for comparing ncRNAs of length more than 500.

Subject(s)

Algorithms , Nucleic Acid Conformation , RNA, Untranslated/chemistry , Sequence Alignment/methods , Sequence Analysis, RNA/methods

12.

Meta-IDBA: a de Novo assembler for metagenomic data.

Peng, Yu; Leung, Henry C M; Yiu, S M; Chin, Francis Y L.

Bioinformatics ; 27(13): i94-101, 2011 Jul 01.

Article in English | MEDLINE | ID: mdl-21685107

ABSTRACT

MOTIVATION: Next-generation sequencing techniques allow us to generate reads from a microbial environment in order to analyze the microbial community. However, assembling of a set of mixed reads from different species to form contigs is a bottleneck of metagenomic research. Although there are many assemblers for assembling reads from a single genome, there are no assemblers for assembling reads in metagenomic data without reference genome sequences. Moreover, the performances of these assemblers on metagenomic data are far from satisfactory, because of the existence of common regions in the genomes of subspecies and species, which make the assembly problem much more complicated. RESULTS: We introduce the Meta-IDBA algorithm for assembling reads in metagenomic data, which contain multiple genomes from different species. There are two core steps in Meta-IDBA. It first tries to partition the de Bruijn graph into isolated components of different species based on an important observation. Then, for each component, it captures the slight variants of the genomes of subspecies from the same species by multiple alignments and represents the genome of one species, using a consensus sequence. Comparison of the performances of Meta-IDBA and existing assemblers, such as Velvet and Abyss for different metagenomic datasets shows that Meta-IDBA can reconstruct longer contigs with similar accuracy. AVAILABILITY: Meta-IDBA toolkit is available at our website http://www.cs.hku.hk/~alse/metaidba. CONTACT: chin@cs.hku.hk.

Subject(s)

Algorithms , Metagenomics/methods , Software , Escherichia coli/classification , Escherichia coli/genetics , Escherichia coli/isolation & purification , Genome, Bacterial , Sequence Analysis, DNA/methods

13.

A robust and accurate binning algorithm for metagenomic sequences with arbitrary species abundance ratio.

Leung, Henry C M; Yiu, S M; Yang, Bin; Peng, Yu; Wang, Yi; Liu, Zhihua; Chen, Jingchi; Qin, Junjie; Li, Ruiqiang; Chin, Francis Y L.

Bioinformatics ; 27(11): 1489-95, 2011 Jun 01.

Article in English | MEDLINE | ID: mdl-21493653

ABSTRACT

MOTIVATION: With the rapid development of next-generation sequencing techniques, metagenomics, also known as environmental genomics, has emerged as an exciting research area that enables us to analyze the microbial environment in which we live. An important step for metagenomic data analysis is the identification and taxonomic characterization of DNA fragments (reads or contigs) resulting from sequencing a sample of mixed species. This step is referred to as 'binning'. Binning algorithms that are based on sequence similarity and sequence composition markers rely heavily on the reference genomes of known microorganisms or phylogenetic markers. Due to the limited availability of reference genomes and the bias and low availability of markers, these algorithms may not be applicable in all cases. Unsupervised binning algorithms which can handle fragments from unknown species provide an alternative approach. However, existing unsupervised binning algorithms only work on datasets either with balanced species abundance ratios or rather different abundance ratios, but not both. RESULTS: In this article, we present MetaCluster 3.0, an integrated binning method based on the unsupervised top--down separation and bottom--up merging strategy, which can bin metagenomic fragments of species with very balanced abundance ratios (say 1:1) to very different abundance ratios (e.g. 1:24) with consistently higher accuracy than existing methods. AVAILABILITY: MetaCluster 3.0 can be downloaded at http://i.cs.hku.hk/~alse/MetaCluster/.

Subject(s)

Algorithms , Metagenomics/methods , Sequence Analysis, DNA , Cluster Analysis

14.

Structural alignment of RNA with complex pseudoknot structure.

Wong, Thomas K F; Lam, T W; Sung, Wing-Kin; Cheung, Brenda W Y; Yiu, S M.

J Comput Biol ; 18(1): 97-108, 2011 Jan.

Article in English | MEDLINE | ID: mdl-21210732

ABSTRACT

The secondary structure of an ncRNA molecule is known to play an important role in its biological functions. Aligning a known ncRNA to a target candidate to determine the sequence and structural similarity helps in identifying de novo ncRNA molecules that are in the same family of the known ncRNA. However, existing algorithms cannot handle complex pseudoknot structures which are found in nature. In this article, we propose algorithms to handle two types of complex pseudoknots: simple non-standard pseudoknots and recursive pseudoknots. Although our methods are not designed for general pseudoknots, it already covers all known ncRNAs in both Rfam and PseudoBase databases. An evaluation of our algorithms shows that it is useful to identify ncRNA molecules in other species which are in the same family of a known ncRNA.

Subject(s)

Computer Simulation , Models, Molecular , RNA, Untranslated/chemistry , Algorithms , Base Sequence , Humans , Molecular Sequence Data , Nucleic Acid Conformation , RNA, Untranslated/classification , Sequence Alignment

15.

Improving the accuracy of signal transduction pathway construction using Level-2 neighbours.

Wong, Thomas K F; Lam, T W; Yiu, S M; Wong, Simon C K.

Int J Bioinform Res Appl ; 6(6): 542-55, 2010.

Article in English | MEDLINE | ID: mdl-21354961

ABSTRACT

In this paper, we consider the problem of reconstructing a pathway for a given set of proteins based on available genomics and proteomics information such as gene expression data. In all previous approaches, the scoring function for a candidate pathway usually only depends on adjacent proteins in the pathway. We propose to also consider proteins that are of distance two in the pathway (we call them Level-2 neighbours). We derive a scoring function based on both adjacent proteins and Level-2 neighbours in the pathway and show that our scoring function can increase the accuracy of the predicted pathways through a set of experiments. The problem of computing the pathway with optimal score, in general, is NP-hard. We thus extend a randomised algorithm to make it work on our scoring function to compute the optimal pathway with high probability.

Subject(s)

Algorithms , Intracellular Signaling Peptides and Proteins/metabolism , Signal Transduction , Computational Biology , Databases, Protein , Genomics , Intracellular Signaling Peptides and Proteins/chemistry , Intracellular Signaling Peptides and Proteins/genetics , Proteomics

16.

Correcting short reads with high error rates for improved sequencing result.

Wong, Thomas K F; Lam, T W; Chan, P Y; Yiu, S M.

Int J Bioinform Res Appl ; 5(2): 224-37, 2009.

Article in English | MEDLINE | ID: mdl-19324607

ABSTRACT

In the sequencing process, reads of the sequence are generated, then assembled to form contigs. New technologies can produce reads faster with lower cost and higher coverage. However, these reads are shorter. With errors, short reads make the assembly step more difficult. Chaisson et al. (2004) proposed an algorithm to correct the reads prior to the assembly step. The result is not satisfactory when the error rate is high (e.g., >or=3%). We improve their approach to handle reads of higher error rates. Experimental results show that our approach is much more effective in correcting errors, producing contigs of higher quality.

Subject(s)

Computational Biology/methods , Sequence Analysis, DNA/methods , Algorithms , DNA/chemistry , Databases, Genetic , Sequence Alignment

17.

Predicting protein complexes from PPI data: a core-attachment approach.

Leung, Henry C M; Xiang, Qian; Yiu, S M; Chin, Francis Y L.

J Comput Biol ; 16(2): 133-44, 2009 Feb.

Article in English | MEDLINE | ID: mdl-19193141

ABSTRACT

UNLABELLED: Protein complexes play a critical role in many biological processes. Identifying the component proteins in a protein complex is an important step in understanding the complex as well as the related biological activities. This paper addresses the problem of predicting protein complexes from the protein-protein interaction (PPI) network of one species using a computational approach. Most of the previous methods rely on the assumption that proteins within the same complex would have relatively more interactions. This translates into dense subgraphs in the PPI network. However, the existing software tools have limited success. Recently, Gavin et al. (2006) provided a detailed study on the organization of protein complexes and suggested that a complex consists of two parts: a core and an attachment. Based on this core-attachment concept, we developed a novel approach to identify complexes from the PPI network by identifying their cores and attachments separately. We evaluated the effectiveness of our proposed approach using three different datasets and compared the quality of our predicted complexes with three existing tools. The evaluation results show that we can predict many more complexes and with higher accuracy than these tools with an improvement of over 30%. To verify the cores we identified in each complex, we compared our cores with the mediators produced by Andreopoulos et al. (2007), which were claimed to be the cores, based on the benchmark result produced by Gavin et al. (2006). We found that the cores we produced are of much higher quality ranging from 10- to 30-fold more correctly predicted cores and with better accuracy. AVAILABILITY: (http://alse.cs.hku.hk/complexes/).

Subject(s)

Models, Theoretical , Multiprotein Complexes , Protein Interaction Mapping , Software , Markov Chains , Mathematics , Multiprotein Complexes/chemistry , Multiprotein Complexes/metabolism , Proteins/chemistry , Proteins/metabolism

18.

Filtering of false positive microRNA candidates by a clustering-based approach.

Leung, Wing-Sze; Lin, Marie C M; Cheung, David W; Yiu, S M.

BMC Bioinformatics ; 9 Suppl 12: S3, 2008 Dec 12.

Article in English | MEDLINE | ID: mdl-19091026

ABSTRACT

BACKGROUND: MicroRNAs are small non-coding RNA gene products that play diversified roles from species to species. The explosive growth of microRNA researches in recent years proves the importance of microRNAs in the biological system and it is believed that microRNAs have valuable therapeutic potentials in human diseases. Continual efforts are therefore required to locate and verify the unknown microRNAs in various genomes. As many miRNAs are found to be arranged in clusters, meaning that they are in close proximity with their neighboring miRNAs, we are interested in utilizing the concept of microRNA clustering and applying it in microRNA computational prediction. RESULTS: We first validate the microRNA clustering phenomenon in the human, mouse and rat genomes. There are 45.45%, 51.86% and 48.67% of the total miRNAs that are clustered in the three genomes, respectively. We then conduct sequence and secondary structure similarity analyses among clustered miRNAs, non-clustered miRNAs, neighboring sequences of clustered miRNAs and random sequences, and find that clustered miRNAs are structurally more similar to one another, and the RNAdistance score can be used to assess the structural similarity between two sequences. We therefore design a clustering-based approach which utilizes this observation to filter false positives from a list of candidates generated by a selected microRNA prediction program, and successfully raise the positive predictive value by a considerable amount ranging from 15.23% to 23.19% in the human, mouse and rat genomes, while keeping a reasonably high sensitivity. CONCLUSION: Our clustering-based approach is able to increase the effectiveness of currently available microRNA prediction program by raising the positive predictive value while maintaining a high sensitivity, and hence can serve as a filtering step. We believe that it is worthwhile to carry out further experiments and tests with our approach using data from other genomes and other prediction software tools. Better results may be achieved with fine-tuning of parameters.

Subject(s)

Computational Biology/methods , MicroRNAs/chemistry , Algorithms , Animals , Cluster Analysis , Computer Simulation , False Positive Reactions , Genome , Humans , Mice , MicroRNAs/genetics , Predictive Value of Tests , Rats , Software

19.

Prevalence and significance of white-coat hypertension and masked hypertension in type 2 diabetics.

Ng, C M; Yiu, S F; Choi, K L; Choi, C H; Ng, Y W; Tiu, S C.

Hong Kong Med J ; 14(6): 437-43, 2008 Dec.

Article in English | MEDLINE | ID: mdl-19060342

ABSTRACT

OBJECTIVES: To explore the prevalence of various categories of hypertension in diabetic patients, and assess any corresponding associations with end-organ complications. DESIGN: Cross-sectional study. SETTING: Tertiary centre of a regional hospital in Hong Kong. PATIENTS: All ambulatory type 2 diabetic patients attending our clinics from January 2002 to November 2004 were invited to participate in the protocol. RESULTS: A total of 133 diabetic patients were included; 82 had normal clinic blood pressures, 15 (18%) of whom had masked hypertension, the remaining 67 (82%) had 'normotension'. The remaining 51 patients had high clinic blood pressures, of whom 28 (55%) had white-coat hypertension and 23 (45%) had sustained hypertension. Urinary albumin excretion rate was higher in patients with masked hypertension (10 mg/day; range, 7-580 mg/day) and sustained hypertension (7 mg/day; 7-3360 mg/day) in comparison to those with white-coat hypertension (7 mg/day; 7-109 mg/day) or 'normotension' (7 mg/day; 7-181 mg/day) [P<0.01]. Likewise, the prevalence of albuminuria was significantly higher in patients with masked hypertension (40%) and sustained hypertension (26%) than in those with 'normotension' (6%) and white-coat hypertension (11%) [P<0.01]. The prevalence of left ventricular hypertrophy was significantly higher in subjects with masked hypertension (38%) and sustained hypertension (26%) compared to patients with 'normotension' (8%) or white-coat hypertension (11%) [P<0.01]. Left ventricular diastolic dysfunction was more prevalent in patients with masked hypertension (46%), sustained hypertension (48%), and white-coat hypertension (43%) in comparison to subjects with 'normotension' (18%) [P=0.01]. CONCLUSION: Masked hypertension is associated with a higher prevalence of albuminuria, left ventricular diastolic dysfunction, and hypertrophy. White-coat hypertension carries a more benign prognosis than sustained hypertension and masked hypertension. Our cross-sectional study supports the recommendation to performing ambulatory blood pressure measurements in type 2 diabetic patients.

Subject(s)

Diabetes Mellitus, Type 2/complications , Hypertension/complications , Albuminuria/complications , Cross-Sectional Studies , Female , Humans , Male , Middle Aged , Prevalence , Ventricular Dysfunction, Left/complications

20.

Finding alternative splicing patterns with strong support from expressed sequences on individual exons/introns.

Wong, Thomas K F; Lam, Tak-Wah; Yang, Wanling; Yiu, S M.

J Bioinform Comput Biol ; 6(5): 1021-33, 2008 Oct.

Article in English | MEDLINE | ID: mdl-18942164

ABSTRACT

We consider the problem of predicting alternative splicing patterns from a set of expressed sequences (cDNAs and ESTs). Some of these expressed sequences may be errorous, thus forming incorrect exons/introns. These incorrect exons/introns may cause a lot of false positives. For example, we examined a popular alternative splicing database, ECgene, which predicts alternate splicing patterns from expressed sequences. The result shows that about 81.3%-81.6% (sensitivity) of known patterns are found, but the specificity can be as low as 5.9%. Based on the idea that errorous sequences are usually not consistent with other sequences, in this paper we provide an alternative approach for finding alternative splicing patterns which ensures that individual exons/introns of the reported patterns have enough support from the expressed sequences. On the same dataset, our approach can achieve a much higher specificity and a slight increase in sensitivity (38.9% and 84.9%, respectively). Our approach also gives better results compared with popular alternative splicing databases (ASD, ECgene, SpliceNest) and the software ClusterMerge.

Subject(s)

Algorithms , Alternative Splicing/genetics , Exons/genetics , Gene Expression/genetics , Introns/genetics , RNA Splice Sites/genetics , Sequence Analysis, DNA/methods , Base Sequence , Molecular Sequence Data

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL