Results 1 - 12 of 12
1.
IEEE Trans Cybern ; 52(6): 5148-5160, 2022 Jun.
Article in English | MEDLINE | ID: mdl-33175686

ABSTRACT

Short bursts of repeating patterns, or intervals of recurrence (IoR), manifest themselves in many applications, such as time-series data captured from an athlete's movements by a wearable sensor during exercise. We present an efficient, online, one-pass, real-time algorithm for finding and tracking IoR in a time-series data stream. We provide a detailed theoretical analysis of the behavior of any IoR and derive fundamental properties that can be used on real-world data streams. We show why our method, unlike current state-of-the-art techniques, is robust to variations across adjacent repeats of the same pattern. To evaluate our algorithm, we built a wearable device that runs it and conducted a user study. Our results show that our algorithm can detect intervals of repeating activities on edge devices with high accuracy (over 70% F1-score) in a real-time environment with only a 1.5-s lag. Our experimental results on real-world datasets demonstrate that our approach outperforms state-of-the-art algorithms in both accuracy and robustness to variations in the recurring signal.
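The abstract stops short of the algorithm itself. As a rough, hypothetical illustration of the underlying task, the Python sketch below scores candidate periods in a buffered window by lag correlation and reports the best one; the paper's method is online and one-pass, which this deliberately is not.

```python
import numpy as np

def recurrence_score(x, period):
    """Correlation between the window and itself shifted by `period` samples."""
    a, b = x[:-period] - x[:-period].mean(), x[period:] - x[period:].mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def find_recurrence(window, min_period=5, max_period=50, threshold=0.8):
    """Return the best repeating period in `window`, or None if nothing repeats."""
    scores = {p: recurrence_score(window, p) for p in range(min_period, max_period)}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

# Example: a 3 Hz movement sampled at 50 Hz repeats every ~17 samples
t = np.arange(0, 4, 1 / 50)
signal = np.sin(2 * np.pi * 3 * t) + 0.1 * np.random.randn(t.size)
print(find_recurrence(signal))  # ~17, or a multiple of it
```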


Subjects
Algorithms, Wearable Electronic Devices, Exercise Therapy, Humans, Movement, Time Factors
2.
IEEE Trans Cybern ; 49(5): 1629-1641, 2019 May.
Article in English | MEDLINE | ID: mdl-29994745

ABSTRACT

Dunn's internal cluster validity index is used to assess partition quality and subsequently identify a "best" crisp partition of n objects. Computing Dunn's index (DI) for partitions of n p-dimensional feature vectors has quadratic time complexity O(pn²), so its computation is impractical for very large n. This note presents six methods for approximating DI. Four methods are based on Maximin sampling, which identifies a skeleton of the full partition that contains some boundary points in each cluster. Two additional methods estimate boundary points using unsupervised training of one-class support vector machines. Numerical examples compare approximations to DI based on all six methods. Four experiments on seven real and synthetic data sets support our assertion that computing approximations to DI with an incremental, neighborhood-based Maximin skeleton is both tractable and reliably accurate.
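Dunn's index itself has a standard closed form: the smallest between-cluster distance divided by the largest within-cluster diameter. A minimal exact implementation (the quantity being approximated, not one of the paper's six approximations) makes the O(pn²) bottleneck concrete, since both terms require all pairwise distances.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

def dunn_index(X, labels):
    """Exact Dunn's index: min inter-cluster distance / max cluster diameter.
    Dominated by the O(p n^2) pairwise distances the paper's skeletons avoid."""
    clusters = [X[labels == k] for k in np.unique(labels)]
    # largest within-cluster pairwise distance (cluster diameter)
    max_diam = max(pdist(c).max() for c in clusters if len(c) > 1)
    # smallest pairwise distance between points in different clusters
    min_sep = min(
        cdist(clusters[i], clusters[j]).min()
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return min_sep / max_diam

# Two well-separated blobs give a large index; overlapping blobs a small one.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2))])
print(dunn_index(X, np.repeat([0, 1], 100)))
```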

3.
IEEE Trans Neural Netw Learn Syst ; 29(10): 5057-5070, 2018 Oct.
Article in English | MEDLINE | ID: mdl-29994608

ABSTRACT

One-class support vector machines (OCSVMs) are very effective for semisupervised anomaly detection. However, their performance strongly depends on the settings of their hyperparameters, which have not been well studied. Moreover, the unavailability in many real-life problems of a clean training set comprising only normal data has given rise to the application of OCSVMs in an unsupervised manner. However, it has been shown that if the training set includes anomalies, the normal boundary created by OCSVMs is prone to skew toward the anomalies. This decreases the detection rate of anomalies and results in poor classifier performance. In this paper, we propose a new technique to set the hyperparameters and to clean suspected anomalies from unlabelled training sets. The proposed method removes suspected anomalies using a K-nearest-neighbors technique; the cleaned training set is then used to estimate the hyperparameters directly. We examine several benchmark data sets with diverse distributions and dimensionality. Our findings suggest that on the examined data sets, the proposed technique is roughly 70 times faster than supervised parameter estimation via grid search and cross-validation, and one to three orders of magnitude faster than broadly used semisupervised and unsupervised parameter estimation methods for OCSVMs. Moreover, our method statistically outperforms those semisupervised and unsupervised methods, and its accuracy is comparable to supervised grid search with cross-validation.
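A minimal sketch of the general recipe described here, using scikit-learn: rank training points by their mean distance to their k nearest neighbours, drop the most isolated ones as suspected anomalies, and fit the OCSVM on the remainder. The contamination level and k are illustrative assumptions, not the paper's estimators.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import OneClassSVM

def clean_and_fit(X, k=10, contamination=0.05, gamma="scale"):
    """Drop the points with the largest mean distance to their k nearest
    neighbours (suspected anomalies), then fit a one-class SVM on the rest."""
    dists, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    knn_score = dists[:, 1:].mean(axis=1)  # exclude the zero self-distance
    keep = knn_score <= np.quantile(knn_score, 1 - contamination)
    return OneClassSVM(nu=contamination, gamma=gamma).fit(X[keep])

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (500, 2)),    # normal data
               rng.uniform(-6, 6, (25, 2))])  # anomalies mixed in
model = clean_and_fit(X)
print((model.predict(X) == -1).sum(), "points flagged as anomalous")
```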

4.
IEEE Trans Cybern ; 46(10): 2372-2385, 2016 Oct.
Article in English | MEDLINE | ID: mdl-26441434

ABSTRACT

Clustering of big data has received much attention recently. In this paper, we present the new clusiVAT algorithm and compare it with four other popular data clustering algorithms. Three of the four comparison methods are based on the well-known classical batch k-means model; specifically, we use k-means, single-pass k-means, online k-means, and clustering using representatives (CURE) for numerical comparisons. clusiVAT is based on sampling the data, imaging the reordered distance matrix to estimate the number of clusters visually, clustering the samples using a relative of single linkage (SL), and then noniteratively extending the labels to the rest of the dataset using the nearest-prototype rule. Previous work has established that clusiVAT produces true SL clusters in compact-separated data. Our experiments show that k-means and its modified algorithms suffer from initialization issues that cause many failures. clusiVAT, on the other hand, needs no initialization and almost always finds partitions that accurately match ground-truth labels in labeled data. CURE also finds SL-type partitions but is much slower than the other four algorithms. In our experiments, clusiVAT proves to be the fastest and most accurate of the five algorithms; e.g., it recovers 97% of the ground-truth labels in the real-world KDD Cup '99 data (4,292,637 samples in 41 dimensions) in 76 s.
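The label-extension step is easy to illustrate. The sketch below substitutes plain random sampling and SciPy single linkage for clusiVAT's Maximin sampling and visual assessment of the reordered distance matrix, but the final noniterative nearest-prototype extension is the same idea.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import cdist

def sample_cluster_extend(X, n_samples=500, n_clusters=3, seed=0):
    """Cluster a small random sample with single linkage, then extend the
    sample labels to all remaining points by the nearest-prototype rule."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_samples, len(X)), replace=False)
    sample = X[idx]
    Z = linkage(sample, method="single")
    sample_labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    # noniterative extension: each point takes its nearest sample point's label
    nearest = cdist(X, sample).argmin(axis=1)
    return sample_labels[nearest]

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.5, (3000, 2)) for c in (0, 5, 10)])
labels = sample_cluster_extend(X, n_clusters=3)
print(np.bincount(labels)[1:])  # roughly 3000 points per cluster
```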

5.
Bioinformatics ; 30(19): 2832-3, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24930143

ABSTRACT

MOTIVATION: Recent advances in high-throughput lipid profiling by liquid chromatography electrospray ionization tandem mass spectrometry (LC-ESI-MS/MS) have made it possible to quantify hundreds of individual molecular lipid species (e.g. fatty acyls, glycerolipids, glycerophospholipids, sphingolipids) in a single experimental run for hundreds of samples. This enables the lipidome of large cohorts of subjects to be profiled to identify lipid biomarkers significantly associated with disease risk, progression and treatment response. Clinically, these lipid biomarkers can be used to construct classification models for the purpose of disease screening or diagnosis. However, the inclusion of a large number of highly correlated biomarkers within a model may reduce classification performance, unnecessarily inflate the costs of a diagnosis or a screen, and reduce the feasibility of clinical translation. An unsupervised feature reduction approach can reduce this redundancy by limiting the number of highly correlated lipids while retaining informative features, achieving good classification performance for various clinical outcomes. Predictive models based on a reduced number of biomarkers are also more cost effective and feasible from a clinical translation perspective. RESULTS: Applying our unsupervised feature reduction approach, LICRE, to various lipidomic datasets in diabetes and cardiovascular disease demonstrated superior discrimination in terms of the area under the receiver operating characteristic curve while using fewer lipid markers when predicting various clinical outcomes. AVAILABILITY AND IMPLEMENTATION: The MATLAB implementation of LICRE is available from http://ww2.cs.mu.oz.au/~gwong/LICRE
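The abstract does not specify LICRE's formulation; as a generic stand-in for unsupervised redundancy reduction, the sketch below greedily keeps a feature only if it is not strongly correlated with any feature already kept. The 0.9 threshold is an arbitrary assumption.

```python
import numpy as np

def drop_correlated(X, max_corr=0.9):
    """Greedy redundancy filter: scan features in order and keep one only if
    its absolute correlation with every already-kept feature is below max_corr."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < max_corr for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(3)
base = rng.normal(size=(200, 5))
# 15 features: 5 independent signals plus 10 noisy near-copies of them
copies = base[:, [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]] + 0.05 * rng.normal(size=(200, 10))
X = np.hstack([base, copies])
print(drop_correlated(X))  # keeps roughly the 5 distinct signals: [0..4]
```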


Subjects
Liquid Chromatography/methods, Lipids/chemistry, Tandem Mass Spectrometry/methods, Algorithms, Biomarkers/metabolism, Cardiovascular Diseases/metabolism, Cluster Analysis, Computational Biology/methods, Diabetes Mellitus/metabolism, Humans, ROC Curve
6.
Cell Microbiol ; 16(5): 734-50, 2014 May.
Article in English | MEDLINE | ID: mdl-24612056

ABSTRACT

Motility is a fundamental part of cellular life and survival, including for Plasmodium parasites, the single-celled protozoan pathogens responsible for human malaria. The motile life-cycle forms achieve motility, called gliding, via the activity of an internal actomyosin motor. Although gliding is based on the well-studied system of actin and myosin, its core biomechanics are not completely understood. Currently accepted models suggest it results from a specifically organized cellular motor that produces a rearward directional force. When linked to surface-bound adhesins, this force is transmitted to the cell posterior, propelling the parasite forwards. Gliding motility is observed in all three motile life-cycle stages of Plasmodium: sporozoites, merozoites and ookinetes. However, only the ookinetes, formed inside the midgut of infected mosquitoes, display continuous gliding without the necessity of host cell entry. This makes them ideal candidates for invasion-free biomechanical analysis. Here we apply a plate-based imaging approach to study ookinete motion in three-dimensional (3D) space, to understand Plasmodium cell motility and how movement facilitates midgut colonization. Using single-cell tracking and numerical analysis of parasite motion in 3D, we demonstrate that ookinetes move with a conserved left-handed helical trajectory. Investigation of cell morphology suggests this trajectory may be based on the ookinete subpellicular cytoskeleton, with complementary whole-cell and subcellular electron microscopy showing that, like their motion paths, ookinetes share a conserved left-handed corkscrew shape and underlying twisted microtubular architecture. Through comparisons of 3D movement between wild-type ookinetes and a cytoskeleton-knockout mutant, we demonstrate that perturbation of cell shape changes the motion from helical to broadly linear. Therefore, while the precise linkages between cellular architecture and actomyosin motor organization remain unknown, our analysis suggests that the molecular basis of cell shape may, in addition to motor force, be a key adaptive strategy for malaria parasite dissemination and, as such, transmission.
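Handedness of a tracked 3D path can be read off from the sign of its discrete torsion, i.e. the signed volume spanned by consecutive step vectors. This is a standard computation, shown here as an illustration rather than as the authors' analysis pipeline.

```python
import numpy as np

def handedness(points):
    """Mean sign of the discrete torsion along a 3-D track: the signed volume
    spanned by consecutive step vectors. > 0: right-handed, < 0: left-handed."""
    v = np.diff(points, axis=0)  # step vectors between tracked positions
    triple = np.einsum("ij,ij->i", v[:-2], np.cross(v[1:-1], v[2:]))
    return np.mean(np.sign(triple))

# Left-handed helix: x = cos t, y = sin t, z = -0.3 t (negative torsion)
t = np.linspace(0, 6 * np.pi, 300)
track = np.column_stack([np.cos(t), np.sin(t), -0.3 * t])
print(handedness(track))  # close to -1.0
```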


Subjects
Biomechanical Phenomena, Plasmodium/cytology, Plasmodium/physiology, Actins/metabolism, Three-Dimensional Imaging, Locomotion, Microscopy, Myosins/metabolism, Optical Imaging
7.
Environ Sci Technol ; 47(1): 485-92, 2013 Jan 02.
Article in English | MEDLINE | ID: mdl-23211093

ABSTRACT

Internet traffic has grown rapidly in recent years and is expected to continue expanding significantly over the next decade. Consequently, the greenhouse gas (GHG) emissions of the infrastructures supporting telecommunications services have become an important issue. In this study, we develop a set of models for assessing the use-phase power consumption and carbon dioxide emissions of telecom network services, to help telecom providers better understand the GHG emissions associated with the energy required for their networks and services. Because measuring the power consumption and traffic in a telecom network is challenging, these models utilize different granularities of available network information. As the granularity of the network measurement information decreases, the corresponding models can produce larger estimation errors. We therefore examine the accuracy of these models under various network scenarios using two approaches: (i) a sensitivity analysis through simulations and (ii) a case study of a deployed network. Both approaches show that the accuracy of the models depends on the network size, the total amount of network service traffic (i.e., for the service under assessment), and the number of network nodes used to process the service.
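As an illustration of the coarsest kind of model described, the sketch below attributes each node's power draw to a service in proportion to the service's share of the node's traffic. The figures and the proportional-allocation rule are assumptions made for the example, not the paper's calibrated models.

```python
def service_power(nodes, service_traffic):
    """Attribute each node's power draw to a service in proportion to the
    service's share of the node's total traffic, then sum over the route."""
    return sum(
        p_node * (service_traffic / total_traffic)
        for p_node, total_traffic in nodes
    )

# Hypothetical route: (node power in watts, total node traffic in Gb/s)
route = [(3000, 400), (1500, 120), (800, 40)]
print(f"{service_power(route, service_traffic=2.0):.1f} W")  # 2 Gb/s service -> 80.0 W
```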


Subjects
Air Pollutants/analysis, Carbon Dioxide/analysis, Computer Communication Networks, Theoretical Models, Computer Simulation, Greenhouse Effect
8.
Bioinformatics ; 28(2): 151-9, 2012 Jan 15.
Article in English | MEDLINE | ID: mdl-22110244

ABSTRACT

MOTIVATION: Feature selection is a key concept in machine learning for microarray datasets, where the number of features, represented by probesets, is typically several orders of magnitude larger than the available sample size. Computational tractability is a key challenge for feature selection algorithms on very high-dimensional datasets with beyond a hundred thousand features, such as those produced on single nucleotide polymorphism microarrays. In this article, we present a novel feature set reduction approach that enables scalable feature selection on datasets with hundreds of thousands of features and beyond. Our approach enables more efficient handling of higher-resolution datasets to achieve better disease subtype classification of samples, for potentially more accurate diagnosis and prognosis, allowing clinicians to make more informed decisions regarding patient treatment options. RESULTS: We applied our feature set reduction approach to several publicly available cancer single nucleotide polymorphism (SNP) array datasets and evaluated its performance in terms of multiclass predictive classification accuracy over different cancer subtypes, speedup in execution, and scalability with respect to sample size and array resolution. Feature Set Reduction (FSR) was able to reduce the dimensions of an SNP array dataset by more than two orders of magnitude while achieving at least equal, and in most cases superior, predictive classification performance relative to features selected by existing feature selection methods alone. An examination of the biological relevance of frequently selected features from FSR-reduced feature sets revealed strong enrichment in association with cancer. AVAILABILITY: FSR was implemented in MATLAB R2010b and is available at http://ww2.cs.mu.oz.au/~gwong/FSR.
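FSR itself is not reproducible from the abstract, but the reduce-then-select pattern it feeds into is easy to show. The sketch below runs a cheap variance filter, then univariate selection, then a classifier on a synthetic stand-in for a SNP matrix; all sizes and the planted signal are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 20_000))   # stand-in for a high-dimensional SNP matrix
y = rng.integers(0, 3, size=300)     # three hypothetical cancer subtypes
X[y == 1, :50] += 1.0                # plant a weak signal for one subtype

pipe = make_pipeline(
    VarianceThreshold(1e-4),               # cheap first-pass reduction
    SelectKBest(f_classif, k=100),         # keep the 100 most discriminative features
    LogisticRegression(max_iter=1000),
)
print(cross_val_score(pipe, X, y, cv=3).mean())
```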


Subjects
Algorithms, DNA Copy Number Variations, Neoplasms/genetics, Artificial Intelligence, Humans, Neoplasms/classification, Neoplasms/diagnosis
9.
BMC Bioinformatics ; 12: 84, 2011 Mar 24.
Article in English | MEDLINE | ID: mdl-21435268

ABSTRACT

BACKGROUND: Many different microarray experiments are publicly available today. It is natural to ask whether different experiments for the same phenotypic conditions can be combined using meta-analysis in order to increase the overall sample size. However, some genes are not measured in all experiments, so they cannot be included, or their statistical significance cannot be appropriately estimated, in traditional meta-analysis. Nonetheless, these genes, which we refer to as incomplete genes, may also be informative and useful. RESULTS: We propose a meta-analysis framework, called "Incomplete Gene Meta-analysis", which can include incomplete genes by imputing the significance of missing replicates and computing a meta-score for every gene across all datasets. We demonstrate that the incomplete genes are worth including and that our method appropriately estimates their significance in two groups of experiments. We first apply Incomplete Gene Meta-analysis and several comparable methods to five breast cancer datasets with an identical set of probes. We simulate incomplete genes by randomly removing a subset of probes from each dataset and demonstrate that our method consistently outperforms two other methods in terms of false discovery rate. We also apply the methods to three gastric cancer datasets for the purpose of discriminating diffuse and intestinal subtypes. CONCLUSIONS: Meta-analysis is an effective approach for identifying more robust sets of differentially expressed genes from multiple studies. Incomplete genes, which mainly arise from the use of different platforms, may also have statistical and biological importance but were ignored or not handled appropriately in previous studies. Our Incomplete Gene Meta-analysis can incorporate incomplete genes by estimating their significance. The results on both breast and gastric cancer datasets suggest that the highly ranked genes and associated GO terms produced by our method are more significant and more biologically meaningful according to the previous literature.
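A concrete baseline helps to see what "including incomplete genes" means. The sketch below combines per-study p-values with Fisher's method and imputes a missing study with a neutral p-value; the paper estimates the missing significance rather than using a fixed neutral value, so this is a simplification.

```python
import numpy as np
from scipy import stats

def fisher_meta(pvals, neutral=0.5):
    """Fisher's method across studies; a study that did not measure the gene
    (p = NaN) is imputed with a neutral p-value. This fixed imputation is a
    simplification of the paper's estimated significance."""
    p = np.asarray(pvals, dtype=float)
    p = np.where(np.isnan(p), neutral, p)
    chi2 = -2.0 * np.log(p).sum()
    return stats.chi2.sf(chi2, df=2 * p.size)

# Gene measured in 4 of 5 studies:
print(fisher_meta([0.01, 0.03, np.nan, 0.02, 0.20]))
```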


Subjects
Gene Expression Profiling/methods, Meta-Analysis as Topic, Linear Models, Oligonucleotide Array Sequence Analysis
10.
Opt Express ; 19(26): B260-9, 2011 Dec 12.
Article in English | MEDLINE | ID: mdl-22274028

ABSTRACT

Energy-efficient video distribution systems have become an important tool for coping with the rapid growth in Internet video traffic and for maintaining the environmental sustainability of the Internet. Given the energy-efficiency limitations of the conventional server-centric method of delivering video services to end users, storing video content closer to the end users could achieve significant improvements in energy efficiency. Because of dissimilarities in user behavior and limited cache sizes, caching systems should be designed around the behavior of user communities. In this paper, several energy consumption models are presented to evaluate the energy savings of single-level and multi-level caching systems that support varying degrees of similarity in user behavior. The results show that single-level caching systems can achieve high energy savings for communities with highly similar user behavior. In contrast, when user behavior is dissimilar, multi-level caching systems should be used to increase energy efficiency.
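The basic trade-off can be captured in one line: a cache hit is served locally, a miss also traverses the core network. The energy figures below are made-up illustrative values, not the paper's model parameters.

```python
def energy_per_bit(hit_rate, e_cache, e_core):
    """Expected transport energy per bit: hits are served from the nearby
    cache; misses pay for the cache lookup plus the core network traversal."""
    return hit_rate * e_cache + (1 - hit_rate) * (e_cache + e_core)

# Hypothetical figures (joules per bit): similar communities cache well,
# so their higher hit rates translate directly into energy savings.
for h in (0.2, 0.5, 0.9):
    print(h, energy_per_bit(h, e_cache=2e-9, e_core=20e-9))
```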


Subjects
Behavior, Internet, Webcasts as Topic, Humans, Theoretical Models, Television
11.
BMC Bioinformatics ; 11: 477, 2010 Sep 23.
Article in English | MEDLINE | ID: mdl-20860844

ABSTRACT

BACKGROUND: In the study of cancer genomics, gene expression microarrays, which measure thousands of genes in a single assay, provide abundant information for the investigation of interesting genes or biological pathways. However, in order to analyze the large number of noisy measurements in microarrays, effective and efficient bioinformatics techniques are needed to identify associations between genes and relevant phenotypes. Moreover, systematic tests are needed to validate the statistical and biological significance of those discoveries. RESULTS: In this paper, we develop a robust and efficient method for exploratory analysis of microarray data that produces a number of different orderings (rankings) of both genes and samples, reflecting correlation among those genes and samples. The core algorithm is closely related to biclustering, so we first compare its performance with several existing biclustering algorithms on two real datasets: gastric cancer and lymphoma. We then show on the gastric cancer data that the sample orderings generated by our method are highly statistically significant with respect to the histological classification of samples, using the Jonckheere trend test, while the gene modules are biologically significant with respect to biological processes from the Gene Ontology. In particular, some of the gene modules associated with biclusters are closely linked to gastric cancer tumorigenesis reported in previous literature, while others are potentially novel discoveries. CONCLUSION: We have developed an effective and efficient method, Bi-Ordering Analysis, to detect informative patterns in gene expression microarrays by ranking genes and samples. In addition, a number of evaluation metrics were applied to assess both the statistical and biological significance of the resulting bi-orderings. The methodology was validated on gastric cancer and lymphoma datasets.
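The abstract gives only the shape of the method. As a toy illustration of alternating gene and sample rankings (not the authors' Bi-Ordering Analysis), the sketch below reorders samples by the mean expression of the current top genes and rescores genes by correlation with that ordering.

```python
import numpy as np

def bi_ordering(expr, n_iter=10):
    """Alternate two rankings: order samples by mean expression over the
    current top genes, then re-score genes by how well they follow that
    sample ordering (correlation with the sample ranks)."""
    n_genes, n_samples = expr.shape
    sample_order = np.argsort(expr.mean(axis=0))   # initial sample ordering
    for _ in range(n_iter):
        ranks = np.empty(n_samples)
        ranks[sample_order] = np.arange(n_samples)
        gene_score = np.array([np.corrcoef(row, ranks)[0, 1] for row in expr])
        gene_order = np.argsort(-gene_score)
        top = expr[gene_order[: max(1, n_genes // 10)]]  # top decile of genes
        sample_order = np.argsort(top.mean(axis=0))
    return gene_order, sample_order

rng = np.random.default_rng(5)
expr = rng.normal(size=(200, 30))
expr[:20] += np.linspace(0, 2, 30)   # 20 genes trend across the samples
genes, samples = bi_ordering(expr)
print(sorted(genes[:20]))            # mostly the planted trending genes 0..19
```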


Subjects
Gene Expression Profiling/methods, Molecular Sequence Annotation/methods, Oligonucleotide Array Sequence Analysis/methods, Stomach Neoplasms/genetics, Algorithms, Cluster Analysis, Genetic Databases, Gene Regulatory Networks, Humans, Lymphoma/genetics, Automated Pattern Recognition/methods
12.
Bioinformatics ; 26(8): 1007-14, 2010 Apr 15.
Article in English | MEDLINE | ID: mdl-20189937

ABSTRACT

MOTIVATION: High-density single nucleotide polymorphism (SNP) genotyping arrays are efficient and cost-effective platforms for the detection of copy number variation (CNV). To ensure accuracy in probe synthesis and to minimize production costs, short oligonucleotide probe sequences are used, but short probe sequences limit the specificity of binding targets in the human genome. The specificity of these short probeset sequences has yet to be fully analysed against a normal reference human genome. Sequence similarity can artificially elevate or suppress copy number measurements and hence reduce the reliability of affected probe readings. For the purpose of reliably detecting narrow CNVs down to the width of a single probeset, sequence similarity is an important issue that needs to be addressed. RESULTS: We surveyed the Affymetrix Human Mapping SNP arrays for probeset sequence similarity against the reference human genome. Utilizing the sequence similarity results, we identified a collection of fine-scaled putative CNVs between genders from autosomal probesets whose sequences match various loci on the sex chromosomes. To detect these variations, we utilized our statistical approach, Detecting REcurrent Copy number change using rank-order Statistics (DRECS), and showed that its performance was superior to and more stable than the t-test in detecting CNVs. Through the application of DRECS to the HapMap population datasets with multi-matching probesets filtered out, we identified biologically relevant SNPs in aberrant regions across populations, covered by the span of a single probe, with known associations to physical traits such as height. This provided empirical confirmation of the existence of naturally occurring narrow CNVs as well as of the sensitivity of the Affymetrix SNP array technology in detecting them. AVAILABILITY: The MATLAB implementation of DRECS is available at http://ww2.cs.mu.oz.au/~gwong/DRECS/index.html.
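DRECS's rank-order statistic is not given in the abstract; a per-probe Wilcoxon rank-sum test is a standard stand-in that conveys the idea of detecting recurrent copy number differences between two groups of samples. The effect size, noise level and significance threshold below are illustrative.

```python
import numpy as np
from scipy.stats import ranksums

def recurrent_cnv_probes(intensity_a, intensity_b, alpha=1e-6):
    """Per-probe rank-sum test between two sample groups; probes with tiny
    p-values are candidate recurrent copy number differences."""
    pvals = np.array([
        ranksums(a, b).pvalue for a, b in zip(intensity_a, intensity_b)
    ])
    return np.flatnonzero(pvals < alpha)

rng = np.random.default_rng(6)
a = rng.normal(0.0, 0.2, size=(1000, 60))   # probes x samples, group A
b = rng.normal(0.0, 0.2, size=(1000, 60))   # group B
b[42] += 0.5                                # one probe with a copy number shift
print(recurrent_cnv_probes(a, b))           # -> [42]
```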


Subjects
DNA Copy Number Variations, Oligonucleotide Array Sequence Analysis/methods, Single Nucleotide Polymorphism, Algorithms, Human Genome, Humans, DNA Sequence Analysis/methods