Search | VHL Regional Portal

1.

Mining Semantic Correlations Between Mispredictions and Corrections for Interactive Semantic Segmentation.

Gao, Yutong; Lang, Congyan; Liu, Fayao; Foo, Chuan-Sheng; Cao, Yuanzhouhan; Sun, Lijuan; Wei, Yunchao.

IEEE Trans Neural Netw Learn Syst ; PP, 2024 Apr 10.

Article in English | MEDLINE | ID: mdl-38598394

ABSTRACT

Interactive semantic segmentation pursues high-quality segmentation results at the cost of a small number of user clicks. It is attracting more and more research attention for its convenience in labeling semantic pixel-level data. Existing interactive segmentation methods often pursue higher interaction efficiency by mining the latent information of user clicks or exploring efficient interaction manners. However, these works neglect to explicitly exploit the semantic correlations between user corrections and model mispredictions, thus suffering from two flaws. First, similar prediction errors frequently occur in actual use, causing users to repeatedly correct them. Second, the interaction difficulty of different semantic classes varies across images, but existing models use monotonic parameters for all images which lack semantic pertinence. Therefore, in this article, we explore the semantic correlations existing in corrections and mispredictions by proposing a simple yet effective online learning solution to the above problems, named correction-misprediction correlation mining (CM2). Specifically, we leverage the correction-misprediction similarities to design a confusion memory module (CMM) for automatic correction when similar prediction errors reappear. Furthermore, we measure the semantic interaction difficulty by counting the correction-misprediction pairs and design a challenge adaptive convolutional layer (CACL), which can adaptively switch different parameters according to interaction difficulties to better segment the challenging classes. Our method requires no extra training besides the online learning process and can effectively improve interaction efficiency. Our proposed CM2 achieves state-of-the-art results on three public semantic segmentation benchmarks.

2.

COFT-AD: COntrastive Fine-Tuning for Few-Shot Anomaly Detection.

Liao, Jingyi; Xu, Xun; Nguyen, Manh Cuong; Goodge, Adam; Foo, Chuan Sheng.

IEEE Trans Image Process ; 33: 2090-2103, 2024.

Article in English | MEDLINE | ID: mdl-38470590

ABSTRACT

Existing approaches towards anomaly detection (AD) often rely on a substantial amount of anomaly-free data to train representation and density models. However, large anomaly-free datasets may not always be available before the inference stage; in which case an anomaly detection model must be trained with only a handful of normal samples, a.k.a. few-shot anomaly detection (FSAD). In this paper, we propose a novel methodology to address the challenge of FSAD which incorporates two important techniques. Firstly, we employ a model pre-trained on a large source dataset to initialize model weights. Secondly, to ameliorate the covariate shift between source and target domains, we adopt contrastive training to fine-tune on the few-shot target domain data. To learn suitable representations for the downstream AD task, we additionally incorporate cross-instance positive pairs to encourage a tight cluster of the normal samples, and negative pairs for better separation between normal and synthesized negative samples. We evaluate few-shot anomaly detection on 3 controlled AD tasks and 4 real-world AD tasks to demonstrate the effectiveness of the proposed method.

3.

On Representation Knowledge Distillation for Graph Neural Networks.

Joshi, Chaitanya K; Liu, Fayao; Xun, Xu; Lin, Jie; Foo, Chuan Sheng.

IEEE Trans Neural Netw Learn Syst ; PP, 2022 Dec 02.

Article in English | MEDLINE | ID: mdl-36459610

ABSTRACT

Knowledge distillation (KD) is a learning paradigm for boosting resource-efficient graph neural networks (GNNs) using more expressive yet cumbersome teacher models. Past work on distillation for GNNs proposed the local structure preserving (LSP) loss, which matches local structural relationships defined over edges across the student and teacher's node embeddings. This article studies whether preserving the global topology of how the teacher embeds graph data can be a more effective distillation objective for GNNs, as real-world graphs often contain latent interactions and noisy edges. We propose graph contrastive representation distillation (G-CRD), which uses contrastive learning to implicitly preserve global topology by aligning the student node embeddings to those of the teacher in a shared representation space. Additionally, we introduce an expanded set of benchmarks on large-scale real-world datasets where the performance gap between teacher and student GNNs is non-negligible. Experiments across four datasets and 14 heterogeneous GNN architectures show that G-CRD consistently boosts the performance and robustness of lightweight GNNs, outperforming LSP (and a global structure preserving (GSP) variant of LSP) as well as baselines from 2-D computer vision. An analysis of the representational similarity among teacher and student embedding spaces reveals that G-CRD balances preserving local and global relationships, while structure preserving approaches are best at preserving one or the other.

4.

SemiCurv: Semi-Supervised Curvilinear Structure Segmentation.

Xu, Xun; Nguyen, Manh Cuong; Yazici, Yasin; Lu, Kangkang; Min, Hlaing; Foo, Chuan-Sheng.

IEEE Trans Image Process ; 31: 5109-5120, 2022.

Article in English | MEDLINE | ID: mdl-35895645

ABSTRACT

Recent work on curvilinear structure segmentation has mostly focused on backbone network design and loss engineering. The challenge of collecting labelled data, an expensive and labor intensive process, has been overlooked. While labelled data is expensive to obtain, unlabelled data is often readily available. In this work, we propose SemiCurv, a semi-supervised learning (SSL) framework for curvilinear structure segmentation that is able to utilize such unlabelled data to reduce the labelling burden. Our framework addresses two key challenges in formulating curvilinear segmentation in a semi-supervised manner. First, to fully exploit the power of consistency based SSL, we introduce a geometric transformation as strong data augmentation and then align segmentation predictions via a differentiable inverse transformation to enable the computation of pixel-wise consistency. Second, the traditional mean square error (MSE) on unlabelled data is prone to collapsed predictions and this issue is exacerbated by severe class imbalance (significantly more background pixels). We propose an N-pair consistency loss to avoid trivial predictions on unlabelled data. We evaluate SemiCurv on six curvilinear segmentation datasets, and find that with no more than 5% of the labelled data, it achieves close to 95% of the performance relative to its fully supervised counterpart.

5.

High-Resolution Digital Phenotypes From Consumer Wearables and Their Applications in Machine Learning of Cardiometabolic Risk Markers: Cohort Study.

Zhou, Weizhuang; Chan, Yu En; Foo, Chuan Sheng; Zhang, Jingxian; Teo, Jing Xian; Davila, Sonia; Huang, Weiting; Yap, Jonathan; Cook, Stuart; Tan, Patrick; Chin, Calvin Woon-Loong; Yeo, Khung Keong; Lim, Weng Khong; Krishnaswamy, Pavitra.

J Med Internet Res ; 24(7): e34669, 2022 07 29.

Article in English | MEDLINE | ID: mdl-35904853

ABSTRACT

BACKGROUND: Consumer-grade wearable devices enable detailed recordings of heart rate and step counts in free-living conditions. Recent studies have shown that summary statistics from these wearable recordings have potential uses for longitudinal monitoring of health and disease states. However, the relationship between higher resolution physiological dynamics from wearables and known markers of health and disease remains largely uncharacterized. OBJECTIVE: We aimed to derive high-resolution digital phenotypes from observational wearable recordings and to examine their associations with modifiable and inherent markers of cardiometabolic disease risk. METHODS: We introduced a principled framework to extract interpretable high-resolution phenotypes from wearable data recorded in free-living conditions. The proposed framework standardizes the handling of data irregularities; encodes contextual information regarding the underlying physiological state at any given time; and generates a set of 66 minimally redundant features across active, sedentary, and sleep states. We applied our approach to a multimodal data set, from the SingHEART study (NCT02791152), which comprises heart rate and step count time series from wearables, clinical screening profiles, and whole genome sequences from 692 healthy volunteers. We used machine learning to model nonlinear relationships between the high-resolution phenotypes on the one hand and clinical or genomic risk markers for blood pressure, lipid, weight and sugar abnormalities on the other. For each risk type, we performed model comparisons based on Brier scores to assess the predictive value of high-resolution features over and beyond typical baselines. We also qualitatively characterized the wearable phenotypes for participants who had actualized clinical events. 
RESULTS: We found that the high-resolution features have higher predictive value than typical baselines for clinical markers of cardiometabolic disease risk: the best models based on high-resolution features had 17.9% and 7.36% improvement in Brier score over baselines based on age and gender and resting heart rate, respectively (P<.001 in each case). Furthermore, heart rate dynamics from different activity states contain distinct information (maximum absolute correlation coefficient of 0.15). Heart rate dynamics in sedentary states are most predictive of lipid abnormalities and obesity, whereas patterns in active states are most predictive of blood pressure abnormalities (P<.001). Moreover, in comparison with standard measures, higher resolution patterns in wearable heart rate recordings are better able to represent subtle physiological dynamics related to genomic risk for cardiometabolic disease (improvement of 11.9%-22.0% in Brier scores; P<.001). Finally, illustrative case studies reveal connections between these high-resolution phenotypes and actualized clinical events, even for borderline profiles lacking apparent cardiometabolic risk markers. CONCLUSIONS: High-resolution digital phenotypes recorded by consumer wearables in free-living states have the potential to enhance the prediction of cardiometabolic disease risk and could enable more proactive and personalized health management.
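The model comparisons above are reported as improvements in Brier score. As a minimal sketch of how such a comparison can be computed (the probabilities and outcomes below are made-up illustration values, not data from the study, and the function names are hypothetical), the Brier score is the mean squared difference between predicted probabilities and observed binary outcomes, and lower is better:

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def relative_improvement(baseline_score, model_score):
    """Fractional reduction in Brier score relative to a baseline (lower is better)."""
    return (baseline_score - model_score) / baseline_score


# Hypothetical illustration values (assumptions, not study data).
outcomes = [1, 0, 1, 0]
baseline_probs = [0.5, 0.5, 0.5, 0.5]   # uninformative baseline
model_probs = [0.8, 0.2, 0.6, 0.4]      # a better-calibrated model

baseline = brier_score(baseline_probs, outcomes)
model = brier_score(model_probs, outcomes)
print(f"baseline={baseline:.3f} model={model:.3f} "
      f"improvement={relative_improvement(baseline, model):.1%}")
```

Whether the study computes its percentage improvements in exactly this relative form is an assumption; the abstract only states that comparisons were based on Brier scores.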

Subject(s)

Cardiovascular Diseases , Wearable Electronic Devices , Cardiovascular Diseases/diagnosis , Clinical Studies as Topic , Cohort Studies , Humans , Lipids , Machine Learning , Phenotype

6.

MA-GANet: A Multi-Attention Generative Adversarial Network for Defocus Blur Detection.

Jiang, Zeyu; Xu, Xun; Zhang, Le; Zhang, Chao; Foo, Chuan Sheng; Zhu, Ce.

IEEE Trans Image Process ; 31: 3494-3508, 2022.

Article in English | MEDLINE | ID: mdl-35533163

ABSTRACT

Background clutters pose challenges to defocus blur detection. Existing approaches often produce artifact predictions in background areas with clutter and relatively low confident predictions in boundary areas. In this work, we tackle the above issues from two perspectives. Firstly, inspired by the recent success of self-attention mechanism, we introduce channel-wise and spatial-wise attention modules to attentively aggregate features at different channels and spatial locations to obtain more discriminative features. Secondly, we propose a generative adversarial training strategy to suppress spurious and low reliable predictions. This is achieved by utilizing a discriminator to identify predicted defocus map from ground-truth ones. As such, the defocus network (generator) needs to produce 'realistic' defocus map to minimize discriminator loss. We further demonstrate that the generative adversarial training allows exploiting additional unlabeled data to improve performance, a.k.a. semi-supervised learning, and we provide the first benchmark on semi-supervised defocus detection. Finally, we demonstrate that the existing evaluation metrics for defocus detection generally fail to quantify the robustness with respect to thresholding. For a fair and practical evaluation, we introduce an effective yet efficient AUFβ metric. Extensive experiments on three public datasets verify the superiority of the proposed methods compared against state-of-the-art approaches.

7.

Neural network modeling of differential binding between wild-type and mutant CTCF reveals putative binding preferences for zinc fingers 1-2.

Kaplow, Irene M; Banerjee, Abhimanyu; Foo, Chuan Sheng.

BMC Genomics ; 23(1): 295, 2022 Apr 12.

Article in English | MEDLINE | ID: mdl-35410161

ABSTRACT

BACKGROUND: Many transcription factors (TFs), such as multi zinc-finger (ZF) TFs, have multiple DNA binding domains (DBDs), and deciphering the DNA binding motifs of individual DBDs is a major challenge. One example of such a TF is CCCTC-binding factor (CTCF), a TF with eleven ZFs that plays a variety of roles in transcriptional regulation, most notably anchoring DNA loops. Previous studies found that CTCF ZFs 3-7 bind CTCF's core motif and ZFs 9-11 bind a specific upstream motif, but the motifs of ZFs 1-2 have yet to be identified. RESULTS: We developed a new approach to identifying the binding motifs of individual DBDs of a TF through analyzing chromatin immunoprecipitation sequencing (ChIP-seq) experiments in which a single DBD is mutated: we train a deep convolutional neural network to predict whether wild-type TF binding sites are preserved in the mutant TF dataset and interpret the model. We applied this approach to mouse CTCF ChIP-seq data and identified the known binding preferences of CTCF ZFs 3-11 as well as a putative GAG binding motif for ZF 1. We analyzed other CTCF datasets to provide additional evidence that ZF 1 is associated with binding at the motif we identified, and we found that the presence of the motif for ZF 1 is associated with CTCF ChIP-seq peak strength. CONCLUSIONS: Our approach can be applied to any TF for which in vivo binding data from both the wild-type and mutated versions of the TF are available, and our findings provide new potential insights into the binding preferences of CTCF's DBDs.

Subject(s)

Transcription Factors , Zinc , Animals , Binding Sites , CCCTC-Binding Factor/metabolism , DNA/metabolism , Mice , Neural Networks, Computer , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism , Zinc/metabolism , Zinc Fingers/genetics

8.

Scalable and Practical Natural Gradient for Large-Scale Deep Learning.

Osawa, Kazuki; Tsuji, Yohei; Ueno, Yuichiro; Naruse, Akira; Foo, Chuan-Sheng; Yokota, Rio.

IEEE Trans Pattern Anal Mach Intell ; 44(1): 404-415, 2022 01.

Article in English | MEDLINE | ID: mdl-32750792

ABSTRACT

Large-scale distributed training of deep neural networks results in models with worse generalization performance as a result of the increase in the effective mini-batch size. Previous approaches attempt to address this problem by varying the learning rate and batch size over epochs and layers, or ad hoc modifications of batch normalization. We propose scalable and practical natural gradient descent (SP-NGD), a principled approach for training models that allows them to attain similar generalization performance to models trained with first-order optimization methods, but with accelerated convergence. Furthermore, SP-NGD scales to large mini-batch sizes with a negligible computational overhead as compared to first-order methods. We evaluated SP-NGD on a benchmark task where highly optimized first-order methods are available as references: training a ResNet-50 model for image classification on ImageNet. We demonstrate convergence to a top-1 validation accuracy of 75.4 percent in 5.5 minutes using a mini-batch size of 32,768 with 1,024 GPUs, as well as an accuracy of 74.9 percent with an extremely large mini-batch size of 131,072 in 873 steps of SP-NGD.

Subject(s)

Deep Learning , Algorithms , Benchmarking , Neural Networks, Computer

9.

An Evaluation of Anomaly Detection and Diagnosis in Multivariate Time Series.

Garg, Astha; Zhang, Wenyu; Samaran, Jules; Savitha, Ramasamy; Foo, Chuan-Sheng.

IEEE Trans Neural Netw Learn Syst ; 33(6): 2508-2517, 2022 06.

Article in English | MEDLINE | ID: mdl-34464278

ABSTRACT

Several techniques for multivariate time series anomaly detection have been proposed recently, but a systematic comparison on a common set of datasets and metrics is lacking. This article presents a systematic and comprehensive evaluation of unsupervised and semisupervised deep-learning-based methods for anomaly detection and diagnosis on multivariate time series data from cyberphysical systems. Unlike previous works, we vary the model and the post-processing of model errors (i.e., the scoring functions) independently of each other, through a grid of ten models and four scoring functions, comparing these variants to state-of-the-art methods. In time-series anomaly detection, detecting anomalous events is more important than detecting individual anomalous time points. Through experiments, we find that the existing evaluation metrics either do not take events into account or cannot distinguish between a good detector and trivial detectors, such as a random or an all-positive detector. We propose a new metric to overcome these drawbacks, namely, the composite F-score (Fc1), for evaluating time-series anomaly detection. Our study highlights that dynamic scoring functions work much better than static ones for multivariate time series anomaly detection, and the choice of scoring functions often matters more than the choice of the underlying model. We also find that a simple, channel-wise model, the univariate fully connected auto-encoder with the dynamic Gaussian scoring function, emerges as a winning candidate for both anomaly detection and diagnosis, beating state-of-the-art algorithms.
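The abstract names the composite F-score but does not define it. The following is a hedged sketch of one event-aware score in that spirit, under the assumption that it combines time-point-wise precision with event-wise recall (an event counts as detected if any of its time points is flagged) through a harmonic mean. The function names and the example labels are hypothetical, and this should not be read as the authors' reference implementation.

```python
def events_from_labels(labels):
    """Return (start, end) index pairs, end exclusive, for each run of 1s."""
    events, start = [], None
    for i, value in enumerate(labels):
        if value and start is None:
            start = i
        elif not value and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(labels)))
    return events


def composite_f_score(y_true, y_pred):
    """Harmonic mean of point-wise precision and event-wise recall (assumed form)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    pred_pos = sum(1 for p in y_pred if p)
    precision_t = true_pos / pred_pos if pred_pos else 0.0

    events = events_from_labels(y_true)
    detected = sum(1 for start, end in events if any(y_pred[start:end]))
    recall_e = detected / len(events) if events else 0.0

    if precision_t + recall_e == 0:
        return 0.0
    return 2 * precision_t * recall_e / (precision_t + recall_e)


# Hypothetical labels: two true anomalous events, one of which is detected.
y_true = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
print(composite_f_score(y_true, y_pred))
```

Scoring recall per event rather than per time point reflects the abstract's point that detecting an anomalous event matters more than flagging every anomalous time point inside it.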

Subject(s)

Algorithms , Neural Networks, Computer , Supervised Machine Learning , Time Factors

10.

Exploring Spatial Diversity for Region-Based Active Learning.

Cai, Lile; Xu, Xun; Zhang, Lining; Foo, Chuan-Sheng.

IEEE Trans Image Process ; 30: 8702-8712, 2021.

Article in English | MEDLINE | ID: mdl-34665728

ABSTRACT

State-of-the-art methods for semantic segmentation are based on deep neural networks trained on large-scale labeled datasets. Acquiring such datasets would incur large annotation costs, especially for dense pixel-level prediction tasks like semantic segmentation. We consider region-based active learning as a strategy to reduce annotation costs while maintaining high performance. In this setting, batches of informative image regions instead of entire images are selected for labeling. Importantly, we propose that enforcing local spatial diversity is beneficial for active learning in this case, and to incorporate spatial diversity along with the traditional active selection criterion, e.g., data sample uncertainty, in a unified optimization framework for region-based active learning. We apply this framework to the Cityscapes and PASCAL VOC datasets and demonstrate that the inclusion of spatial diversity effectively improves the performance of uncertainty-based and feature diversity-based active learning methods. Our framework achieves 95% performance of fully supervised methods with only 5 - 9% of the labeled pixels, outperforming all state-of-the-art region-based active learning methods for semantic segmentation.

11.

Semi-supervised classification of radiology images with NoTeacher: A teacher that is not mean.

Unnikrishnan, Balagopal; Nguyen, Cuong; Balaram, Shafa; Li, Chao; Foo, Chuan Sheng; Krishnaswamy, Pavitra.

Med Image Anal ; 73: 102148, 2021 10.

Article in English | MEDLINE | ID: mdl-34274693

ABSTRACT

Deep learning models achieve strong performance for radiology image classification, but their practical application is bottlenecked by the need for large labeled training datasets. Semi-supervised learning (SSL) approaches leverage small labeled datasets alongside larger unlabeled datasets and offer potential for reducing labeling cost. In this work, we introduce NoTeacher, a novel consistency-based SSL framework which incorporates probabilistic graphical models. Unlike Mean Teacher which maintains a teacher network updated via a temporal ensemble, NoTeacher employs two independent networks, thereby eliminating the need for a teacher network. We demonstrate how NoTeacher can be customized to handle a range of challenges in radiology image classification. Specifically, we describe adaptations for scenarios with 2D and 3D inputs, with uni and multi-label classification, and with class distribution mismatch between labeled and unlabeled portions of the training data. In realistic empirical evaluations on three public benchmark datasets spanning the workhorse modalities of radiology (X-Ray, CT, MRI), we show that NoTeacher achieves over 90-95% of the fully supervised AUROC with less than 5-15% labeling budget. Further, NoTeacher outperforms established SSL methods with minimal hyperparameter tuning, and has implications as a principled and practical option for semi-supervised learning in radiology applications.
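For context on the contrast drawn above, Mean Teacher keeps a teacher network whose weights are an exponential moving average (EMA) of the student's weights over training steps. The sketch below shows only that EMA update on a plain list of weights; the decay value and the example weights are illustrative assumptions, and NoTeacher's own two-network, graphical-model-based training is not shown because the abstract does not specify it.

```python
def ema_update(teacher_weights, student_weights, decay=0.99):
    """Return new teacher weights as an exponential moving average of student weights."""
    if len(teacher_weights) != len(student_weights):
        raise ValueError("teacher and student must have the same number of weights")
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_weights, student_weights)]


# Hypothetical weights: the teacher drifts slowly toward the student.
teacher = [0.0, 1.0]
student = [1.0, 0.0]
for step in range(3):
    teacher = ema_update(teacher, student, decay=0.9)
    print(step, [round(w, 4) for w in teacher])
```

Because the teacher is a smoothed copy of the student, the two networks are tightly coupled; using two independently trained networks, as the abstract describes for NoTeacher, removes that coupling.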

Subject(s)

Radiology , Supervised Machine Learning , Humans , Radiography

12.

Global translation during early development depends on the essential transcription factor PRDM10.

Han, Brenda Y; Seah, Michelle K Y; Brooks, Imogen R; Quek, Delia H P; Huxley, Dominic R; Foo, Chuan-Sheng; Lee, Li Ting; Wollmann, Heike; Guo, Huili; Messerschmidt, Daniel M; Guccione, Ernesto.

Nat Commun ; 11(1): 3603, 2020 07 17.

Article in English | MEDLINE | ID: mdl-32681107

ABSTRACT

Members of the PR/SET domain-containing (PRDM) family of zinc finger transcriptional regulators play diverse developmental roles. PRDM10 is a yet uncharacterized family member, and its function in vivo is unknown. Here, we report an essential requirement for PRDM10 in pre-implantation embryos and embryonic stem cells (mESCs), where loss of PRDM10 results in severe cell growth inhibition. Detailed genomic and biochemical analyses reveal that PRDM10 functions as a sequence-specific transcription factor. We identify Eif3b, which encodes a core component of the eukaryotic translation initiation factor 3 (eIF3) complex, as a key downstream target, and demonstrate that growth inhibition in PRDM10-deficient mESCs is in part mediated through EIF3B-dependent effects on global translation. Our work elucidates the molecular function of PRDM10 in maintaining global translation, establishes its essential role in early embryonic development and mESC homeostasis, and offers insights into the functional repertoire of PRDMs as well as the transcriptional mechanisms regulating translation.

Subject(s)

Gene Expression Regulation, Developmental , Mice/metabolism , Transcription Factors/metabolism , Animals , Embryonic Development , Embryonic Stem Cells/metabolism , Eukaryotic Initiation Factors/genetics , Eukaryotic Initiation Factors/metabolism , Female , Intracellular Signaling Peptides and Proteins/genetics , Intracellular Signaling Peptides and Proteins/metabolism , Male , Mice/embryology , Mice/genetics , Protein Biosynthesis , Transcription Factors/genetics

13.

Holistic Multi-modal Memory Network for Movie Question Answering.

Wang, Anran; Luu, Anh Tuan; Foo, Chuan-Sheng; Zhu, Hongyuan; Tay, Yi; Chandrasekhar, Vijay.

IEEE Trans Image Process ; 2019 Aug 02.

Article in English | MEDLINE | ID: mdl-31395548

ABSTRACT

Answering questions using multi-modal context is a challenging problem as it requires a deep integration of diverse data sources. Existing approaches only consider a subset of all possible interactions among data sources during one attention hop. In this paper, we present a Holistic Multi-modal Memory Network (HMMN) framework that fully considers interactions between different input sources (multi-modal context, question) at each hop. In addition, to hone in on relevant information, our framework takes answer choices into consideration during the context retrieval stage. Our HMMN framework effectively integrates information from the multi-modal context, question, and answer choices, enabling more informative context to be retrieved for question answering. Experimental results on the MovieQA and TVQA datasets validate the effectiveness of our HMMN framework. Extensive ablation studies show the importance of holistic reasoning and reveal the contributions of different attention strategies to model performance.

14.

The C2H2-ZF transcription factor Zfp335 recognizes two consensus motifs using separate zinc finger arrays.

Han, Brenda Yuyuan; Foo, Chuan-Sheng; Wu, Shuang; Cyster, Jason G.

Genes Dev ; 30(13): 1509-14, 2016 07 01.

Article in English | MEDLINE | ID: mdl-27401554

ABSTRACT

The complexities of DNA recognition by transcription factors (TFs) with multiple Cys2-His2 zinc fingers (C2H2-ZFs) remain poorly studied. We previously reported a mutation (R1092W) in the C2H2-ZF TF Zfp335 that led to selective loss of binding at a subset of targets, although the basis for this effect was unclear. We show that Zfp335 binds DNA and drives transcription via recognition of two distinct consensus motifs by separate ZF clusters and identify the specific motif interaction disrupted by R1092W. Our work presents Zfp335 as a model for understanding how C2H2-ZF TFs may use multiple recognition motifs to control gene expression.

Subject(s)

Gene Expression Regulation/genetics , Transcription Factors/metabolism , Zinc Fingers/physiology , Animals , DNA-Binding Proteins , HEK293 Cells , Humans , Intracellular Signaling Peptides and Proteins , Jurkat Cells , Mice , Mutation , Nuclear Proteins , Protein Binding/genetics , Transcription Factors/chemistry , Transcription Factors/genetics , Zinc Fingers/genetics

15.

Zinc finger protein Zfp335 is required for the formation of the naïve T cell compartment.

Han, Brenda Y; Wu, Shuang; Foo, Chuan-Sheng; Horton, Robert M; Jenne, Craig N; Watson, Susan R; Whittle, Belinda; Goodnow, Chris C; Cyster, Jason G.

Elife ; 3, 2014 Oct 24.

Article in English | MEDLINE | ID: mdl-25343476

ABSTRACT

The generation of naïve T lymphocytes is critical for immune function yet the mechanisms governing their maturation remain incompletely understood. We have identified a mouse mutant, bloto, that harbors a hypomorphic mutation in the zinc finger protein Zfp335. Zfp335(bloto/bloto) mice exhibit a naïve T cell deficiency due to an intrinsic developmental defect that begins to manifest in the thymus and continues into the periphery, affecting T cells that have recently undergone thymic egress. The effects of Zfp335(bloto) are multigenic and cannot be attributed to altered thymic selection, proliferation or Bcl2-dependent survival. Zfp335 binds to promoter regions via a consensus motif, and its target genes are enriched in categories related to protein metabolism, mitochondrial function, and transcriptional regulation. Restoring the expression of one target, Ankle2, partially rescues T cell maturation. These findings identify Zfp335 as a transcription factor and essential regulator of late-stage intrathymic and post-thymic T cell maturation.

Subject(s)

Membrane Proteins/genetics , Mutation , Nuclear Proteins/genetics , T-Lymphocytes/metabolism , Thymus Gland/metabolism , Transcription Factors/genetics , Zinc Fingers/genetics , Amino Acid Sequence , Animals , Base Sequence , Cell Differentiation , Gene Expression Profiling , Gene Expression Regulation , Genetic Complementation Test , Immunity, Innate , Membrane Proteins/immunology , Membrane Proteins/metabolism , Mice , Mice, Inbred CBA , Mice, Transgenic , Molecular Sequence Data , Nuclear Proteins/immunology , Nuclear Proteins/metabolism , Promoter Regions, Genetic , Protein Binding , Proto-Oncogene Proteins c-bcl-2/genetics , Proto-Oncogene Proteins c-bcl-2/immunology , Proto-Oncogene Proteins c-bcl-2/metabolism , Sequence Alignment , Signal Transduction , T-Lymphocytes/immunology , T-Lymphocytes/pathology , Thymus Gland/immunology , Thymus Gland/pathology , Transcription Factors/immunology , Transcription Factors/metabolism , Transcription, Genetic , Zinc Fingers/immunology

16.

A max-margin model for efficient simultaneous alignment and folding of RNA sequences.

Do, Chuong B; Foo, Chuan-Sheng; Batzoglou, Serafim.

Bioinformatics ; 24(13): i68-76, 2008 Jul 01.

Article in English | MEDLINE | ID: mdl-18586747

ABSTRACT

MOTIVATION: The need for accurate and efficient tools for computational RNA structure analysis has become increasingly apparent over the last several years: RNA folding algorithms underlie numerous applications in bioinformatics, ranging from microarray probe selection to de novo non-coding RNA gene prediction. In this work, we present RAF (RNA Alignment and Folding), an efficient algorithm for simultaneous alignment and consensus folding of unaligned RNA sequences. Algorithmically, RAF exploits sparsity in the set of likely pairing and alignment candidates for each nucleotide (as identified by the CONTRAfold or CONTRAlign programs) to achieve an effectively quadratic running time for simultaneous pairwise alignment and folding. RAF's fast sparse dynamic programming, in turn, serves as the inference engine within a discriminative machine learning algorithm for parameter estimation. RESULTS: In cross-validated benchmark tests, RAF achieves accuracies equaling or surpassing the current best approaches for RNA multiple sequence secondary structure prediction. However, RAF requires nearly an order of magnitude less time than other simultaneous folding and alignment methods, thus making it especially appropriate for high-throughput studies. AVAILABILITY: Source code for RAF is available at: http://contra.stanford.edu/contrafold/.

Subject(s)

Algorithms , Consensus Sequence/genetics , RNA/genetics , RNA/ultrastructure , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Base Sequence , Computer Simulation , Models, Chemical , Models, Molecular , Molecular Sequence Data , Nucleic Acid Conformation

17.

Discovering protein complexes in dense reliable neighborhoods of protein interaction networks.

Li, Xiao-Li; Foo, Chuan-Sheng; Ng, See-Kiong.

Comput Syst Bioinformatics Conf ; 6: 157-68, 2007.

Article in English | MEDLINE | ID: mdl-17951821

ABSTRACT

Multiprotein complexes play central roles in many cellular pathways. Although many high-throughput experimental techniques have already enabled systematic screening of pairwise protein-protein interactions en masse, the amount of experimentally determined protein complex data has remained relatively lacking. As such, researchers have begun to exploit the vast amount of pairwise interaction data to help discover new protein complexes. However, mining for protein complexes in interaction networks is not an easy task because there are many data artefacts in the underlying protein-protein interaction data due to the limitations in the current high-throughput screening methods. We propose a novel DECAFF (Dense-neighborhood Extraction using Connectivity and conFidence Features) algorithm to mine for dense and reliable subgraphs in protein interaction networks. Our method is devised to address two major limitations in current high-throughput protein interaction data, namely, incompleteness and high data noise. Experimental results with yeast protein interaction data show that the interaction subgraphs discovered by DECAFF matched significantly better with actual protein complexes than other existing approaches. Our results demonstrate that pairwise protein interaction networks can be effectively mined to discover new protein complexes, provided that the data artefacts in the underlying interaction data are taken into account adequately.

Subject(s)

Algorithms , Models, Biological , Multiprotein Complexes/metabolism , Protein Interaction Mapping/methods , Proteome/metabolism , Signal Transduction/physiology , Computer Simulation

18.

Interaction graph mining for protein complexes using local clique merging.

Li, Xiao-Li; Tan, Soon-Heng; Foo, Chuan-Sheng; Ng, See-Kiong.

Genome Inform ; 16(2): 260-9, 2005.

Article in English | MEDLINE | ID: mdl-16901108

ABSTRACT

While recent technological advances have made available large datasets of experimentally-detected pairwise protein-protein interactions, there is still a lack of experimentally-determined protein complex data. To make up for this lack of protein complex data, we explore the mining of existing protein interaction graphs for protein complexes. This paper proposes a novel graph mining algorithm to detect the dense neighborhoods (highly connected regions) in an interaction graph which may correspond to protein complexes. Our algorithm first locates local cliques for each graph vertex (protein) and then merges the detected local cliques according to their affinity to form maximal dense regions. We present experimental results with yeast protein interaction data to demonstrate the effectiveness of our proposed method. Compared with other existing techniques, our predicted complexes can match or overlap significantly better with the known protein complexes in the MIPS benchmark database. Novel protein complexes were also predicted to help biologists in their search for new protein complexes.

Subject(s)

Algorithms , Multiprotein Complexes/physiology , Protein Interaction Mapping/statistics & numerical data , Multiprotein Complexes/chemistry , Predictive Value of Tests , Protein Interaction Mapping/methods , Saccharomyces cerevisiae Proteins/chemistry
