Results 1 - 20 of 27
1.
IEEE Trans Neural Netw Learn Syst ; 35(4): 5014-5026, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37104113

ABSTRACT

The first step toward investigating the effectiveness of a treatment via a randomized trial is to split the population into control and treatment groups, then compare the average response of the treatment group receiving the treatment to that of the control group receiving the placebo. To ensure that the difference between the two groups is caused only by the treatment, the control and treatment groups must have similar statistics; indeed, the validity and reliability of a trial are determined by the similarity of the two groups' statistics. Covariate balancing methods increase the similarity between the distributions of the two groups' covariates. However, often in practice, there are not enough samples to accurately estimate the groups' covariate distributions. In this article, we empirically show that covariate balancing with the standardized means difference (SMD) covariate balancing measure, as well as Pocock and Simon's sequential treatment assignment method, is susceptible to worst-case treatment assignments, i.e., assignments admitted by the covariate balance measure that nonetheless yield the highest possible average treatment effect (ATE) estimation errors. We develop an adversarial attack that finds the adversarial treatment assignment for any given trial, and we provide an index to measure how close a given trial is to the worst case. To this end, we provide an optimization-based algorithm, adversarial treatment assignment in treatment effect trials (ATASTREET), to find adversarial treatment assignments.
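To make the failure mode concrete, the following sketch randomly searches for an assignment that passes an SMD balance check yet yields a large ATE error. It is a toy stand-in for ATASTREET (which is optimization-based, not random search); the covariate model, effect size, and 0.1 SMD threshold are all illustrative assumptions.

```python
# Toy search for "balanced but bad" assignments; not the authors' ATASTREET.
import numpy as np

rng = np.random.default_rng(0)
n, d = 40, 3
X = rng.normal(size=(n, d))                  # covariates
tau = 1.0                                    # true treatment effect (assumed)
y0 = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.1, size=n)
y1 = y0 + tau                                # potential outcomes

def smd(X, t):
    """Standardized mean difference per covariate for assignment t in {0,1}^n."""
    mu1, mu0 = X[t == 1].mean(axis=0), X[t == 0].mean(axis=0)
    s = np.sqrt((X[t == 1].var(axis=0) + X[t == 0].var(axis=0)) / 2)
    return np.abs(mu1 - mu0) / s

worst_err, worst_t = -np.inf, None
for _ in range(20000):
    t = rng.permutation(np.repeat([0, 1], n // 2))
    if smd(X, t).max() < 0.1:                # admitted by the SMD criterion
        ate_hat = y1[t == 1].mean() - y0[t == 0].mean()
        if abs(ate_hat - tau) > worst_err:
            worst_err, worst_t = abs(ate_hat - tau), t

print(f"worst admitted assignment: ATE error = {worst_err:.3f}")
```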


Subjects
Neural Networks, Computer; Research Design; Reproducibility of Results; Randomized Controlled Trials as Topic; Computer Simulation
2.
Anal Chem ; 95(48): 17458-17466, 2023 12 05.
Article in English | MEDLINE | ID: mdl-37971927

ABSTRACT

Microfluidics can split samples into thousands or millions of partitions, such as droplets or nanowells. Partitions capture analytes according to a Poisson distribution, and in diagnostics, the analyte concentration is commonly inferred with a closed-form solution via maximum likelihood estimation (MLE). Here, we present a new scalable approach to multiplexing analytes. We generalize MLE with microfluidic partitioning and extend our previously developed Sparse Poisson Recovery (SPoRe) inference algorithm. We also present the first in vitro demonstration of SPoRe with droplet digital PCR (ddPCR) toward infection diagnostics. Digital PCR is intrinsically highly sensitive, and SPoRe helps expand its multiplexing capacity by circumventing its channel limitations. We broadly amplify bacteria with 16S ddPCR and assign barcodes to nine pathogen genera by using five nonspecific probes. Given our two-channel ddPCR system, we measured two probes at a time in multiple groups of droplets. Although individual droplets are ambiguous in their bacterial contents, we recover the concentrations of bacteria in the sample from the pooled data. We achieve stable quantification down to approximately 200 total copies of the 16S gene per sample, enabling a suite of clinical applications given a robust upstream microbial DNA extraction procedure. We develop a new theory that generalizes the application of this framework to many realistic sensing modalities, and we prove scaling rules for system design to achieve further expanded multiplexing. The core principles demonstrated here could impact many biosensing applications with microfluidic partitioning.
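For reference, the closed-form MLE the abstract alludes to in the single-analyte case is the standard digital PCR estimator: with k positive partitions out of n, the Poisson rate is lambda = -ln(1 - k/n) copies per partition. A minimal sketch with illustrative numbers:

```python
# Standard single-plex digital PCR estimator; all numbers are illustrative.
import numpy as np

n_droplets = 20000          # total partitions
k_positive = 3500           # partitions showing amplification
droplet_volume_nl = 0.85    # assumed partition volume (nanoliters)

# Poisson MLE for mean copies per droplet, from P(negative) = exp(-lam)
lam = -np.log(1 - k_positive / n_droplets)
copies_per_ul = lam / (droplet_volume_nl * 1e-3)   # copies per microliter
print(f"lambda = {lam:.4f} copies/droplet, {copies_per_ul:.0f} copies/uL")
```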


Subjects
Bacteria; Microfluidics; Polymerase Chain Reaction/methods; Bacteria/genetics
3.
IEEE Trans Signal Process ; 70: 2388-2401, 2022.
Article in English | MEDLINE | ID: mdl-36082267

ABSTRACT

Compressed sensing (CS) is a signal processing technique that enables the efficient recovery of a sparse high-dimensional signal from low-dimensional measurements. In the multiple measurement vector (MMV) framework, a set of signals with the same support must be recovered from their corresponding measurements. Here, we present the first exploration of the MMV problem in which signals are independently drawn from a sparse, multivariate Poisson distribution. We are primarily motivated by a suite of microfluidic biosensing applications in which analytes (such as whole cells or biomarkers) are captured in small-volume partitions according to a Poisson distribution. We recover the sparse parameter vector of Poisson rates through maximum likelihood estimation with our novel Sparse Poisson Recovery (SPoRe) algorithm. SPoRe uses batch stochastic gradient ascent enabled by Monte Carlo approximations of otherwise intractable gradients. By uniquely leveraging the Poisson structure, SPoRe substantially outperforms a comprehensive set of existing and custom baseline CS algorithms. Notably, SPoRe can exhibit high performance even with one-dimensional measurements and high noise levels. This resource efficiency is not only unprecedented in the field of CS but is also particularly potent for applications in microfluidics, in which the number of resolvable measurements per partition is often severely limited. We prove the identifiability property of the Poisson model under such lax conditions, analytically develop insights into system performance, and confirm these insights in simulated experiments. Our findings encourage a new approach to biosensing and are generalizable to other applications featuring spatial and temporal Poisson signals.
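A heavily simplified caricature of the SPoRe idea (our own toy version under an assumed Gaussian noise model, not the published algorithm): propose counts x from Poisson(lambda), weight them by the measurement likelihood, and ascend the resulting Monte Carlo estimate of the log-likelihood gradient.

```python
# Toy Monte Carlo gradient ascent for y_i = Phi x_i + noise, x_i ~ Poisson(lam).
import numpy as np

rng = np.random.default_rng(1)
N, M, D = 6, 2, 500                 # signal dim, measurement dim, # of signals
sigma = 0.1
lam_true = np.zeros(N); lam_true[[1, 4]] = 0.8, 0.5     # sparse Poisson rates
Phi = rng.normal(size=(M, N)) / np.sqrt(M)
X = rng.poisson(lam_true, size=(D, N))
Y = X @ Phi.T + rng.normal(scale=sigma, size=(D, M))

lam = np.full(N, 0.5)               # initialization
S, step = 200, 0.05                 # Monte Carlo samples, step size
for _ in range(300):
    batch = rng.choice(D, size=20, replace=False)
    grad = np.zeros(N)
    for i in batch:
        xs = rng.poisson(lam, size=(S, N))              # proposals x ~ Poisson(lam)
        w = np.exp(-((Y[i] - xs @ Phi.T) ** 2).sum(axis=1) / (2 * sigma**2))
        w /= w.sum() + 1e-12                            # self-normalized weights
        grad += (w @ xs) / np.maximum(lam, 1e-6) - 1.0  # E_post[x]/lam - 1
    lam = np.maximum(lam + step * grad / len(batch), 1e-6)

print(np.round(lam, 2), "estimated vs", lam_true, "true")
```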

4.
IEEE Trans Pattern Anal Mach Intell ; 44(2): 1098-1107, 2022 02.
Article in English | MEDLINE | ID: mdl-33026983

ABSTRACT

Extracting useful information from large datasets has become increasingly important; in particular, identifying relationships among variables in these datasets has far-reaching impacts. In this article, we introduce the uniform information coefficient (UIC), which measures the amount of dependence between two multidimensional variables and is able to detect both linear and non-linear associations. Our proposed UIC is inspired by the maximal information coefficient (MIC) [1]; however, the MIC was originally designed to measure dependence between two one-dimensional variables. Unlike the MIC calculation, whose behavior depends on the type of association between the two variables, the UIC calculation is less computationally expensive and more robust to the type of association. The UIC achieves this by replacing the dynamic programming step in the MIC calculation with a simpler technique based on uniform partitioning of the data grid. This computational efficiency comes at the cost of not maximizing the information coefficient as done by the MIC algorithm. We present theoretical guarantees for the performance of the UIC and a variety of experiments to demonstrate its quality in detecting associations.
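A minimal sketch of the uniform-partitioning idea (the grid size and normalization below are our assumptions, not the exact UIC definition): estimate mutual information on an equal-width grid and normalize by the grid's maximum information.

```python
# Grid-based dependence score on a uniform partition (illustrative, 1-D case).
import numpy as np

def uniform_grid_score(x, y, bins=8):
    """Dependence score in [0, 1] from an equal-width 2-D histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    mi = np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))
    return mi / np.log(bins)        # normalize by the grid's max information

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 2000)
print(uniform_grid_score(x, x**2 + rng.normal(scale=0.05, size=2000)))  # nonlinear
print(uniform_grid_score(x, rng.uniform(-1, 1, 2000)))                  # independent
```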


Subjects
Algorithms
5.
Article in English | MEDLINE | ID: mdl-34746376

ABSTRACT

Ridge-like regularization often improves the generalization performance of machine learning models by mitigating overfitting. While ridge-regularized methods are widely used in many important applications, direct training via optimization can become challenging in huge-data scenarios with millions of examples and features. We tackle such challenges by proposing a general approach that achieves ridge-like regularization implicitly, named Minipatch Ridge (MPRidge). Our approach ensembles the coefficients of unregularized learners trained on many tiny, random subsamples of both the examples and features of the training data, which we call minipatches. We empirically demonstrate that MPRidge induces an implicit ridge-like regularizing effect and performs nearly the same as explicit ridge regularization for a general class of predictors, including logistic regression, SVM, and robust regression. Embarrassingly parallelizable, MPRidge offers a computationally appealing alternative for inducing ridge-like regularization and improving generalization performance in challenging big-data settings.
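A minimal sketch of minipatch ensembling for least-squares regression (our illustration; MPRidge applies to a broader class of predictors): average unregularized coefficients fit on tiny random subsamples of rows and columns.

```python
# Minipatch ensemble of unregularized OLS fits; sizes are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n, p = 5000, 200
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:10] = 1.0          # sparse ground truth
y = X @ beta + rng.normal(size=n)

n_sub, p_sub, B = 100, 20, 500               # minipatch sizes, ensemble size
coef, counts = np.zeros(p), np.zeros(p)
for _ in range(B):
    rows = rng.choice(n, n_sub, replace=False)
    cols = rng.choice(p, p_sub, replace=False)
    b, *_ = np.linalg.lstsq(X[np.ix_(rows, cols)], y[rows], rcond=None)
    coef[cols] += b; counts[cols] += 1
coef /= np.maximum(counts, 1)                # ensemble average per feature

print("top coefficients:", np.argsort(-np.abs(coef))[:10])
```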

6.
IEEE Trans Pattern Anal Mach Intell ; 43(7): 2233-2244, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33891546

ABSTRACT

We introduce a novel video-rate hyperspectral imager with high spatial, temporal, and spectral resolutions. Our key hypothesis is that the spectral profiles of pixels within each superpixel tend to be similar. Hence, a scene-adaptive spatial sampling of a hyperspectral scene, guided by its superpixel-segmented image, can yield high-quality reconstructions. To achieve this, we acquire an RGB image of the scene, compute its superpixels, and from these generate a spatial mask of locations where we measure the high-resolution spectrum. The hyperspectral image is then estimated by fusing the RGB image and the spectral measurements using a learnable guided filtering approach. Owing to the low computational complexity of the superpixel estimation step, our setup captures hyperspectral images with little overhead over traditional snapshot hyperspectral cameras, but with significantly higher spatial and spectral resolutions. We validate the proposed technique with extensive simulations as well as a lab prototype that measures hyperspectral video at a spatial resolution of 600 × 900 pixels, a spectral resolution of 10 nm over the visible wavebands, and a frame rate of 18 fps.
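A sketch of the scene-adaptive sampling step only (the learnable guided-filtering fusion is omitted, and skimage's SLIC stands in for whatever superpixel method the prototype actually uses):

```python
# One spectral-measurement site per superpixel; SLIC is an assumed stand-in.
import numpy as np
from skimage.segmentation import slic

def spectral_sampling_mask(rgb, n_segments=500):
    """Place one sampling site at the centroid of each superpixel."""
    labels = slic(rgb, n_segments=n_segments, compactness=10.0)
    mask = np.zeros(labels.shape, dtype=bool)
    for lab in np.unique(labels):
        ys, xs = np.nonzero(labels == lab)
        # Centroid may fall just outside a highly non-convex superpixel;
        # acceptable for a sketch.
        mask[int(ys.mean()), int(xs.mean())] = True
    return mask

rgb = np.random.rand(120, 160, 3)   # placeholder for the camera's RGB frame
mask = spectral_sampling_mask(rgb)
print(mask.sum(), "sampling sites for", mask.size, "pixels")
```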

7.
Nucleic Acids Res ; 48(10): 5217-5234, 2020 06 04.
Article in English | MEDLINE | ID: mdl-32338745

ABSTRACT

As computational biologists continue to be inundated by ever-increasing amounts of metagenomic data, developing data analysis approaches that keep up with the pace of sequence archives remains a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied by the application of a wide array of highly efficient approaches from other fields to metagenomics. For instance, sketching algorithms such as MinHash have seen rapid and widespread adoption. These techniques handle increasingly large datasets with minimal sacrifices in quality for tasks such as sequence similarity calculation. Here, we briefly review the fundamentals of the most impactful probabilistic and signal processing algorithms, and we highlight more recent advances to augment previous, broader reviews of these areas. We then explore the application of these techniques to metagenomics, discuss their pros and cons, and speculate on their future directions.
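As a flavor of the sketching techniques reviewed, a minimal MinHash comparison (illustrative parameters; production tools such as Mash use rolling hashes and canonical k-mers):

```python
# Bottom-k MinHash sketches and a Jaccard similarity estimate.
import hashlib

def minhash(seq, k=8, num_hashes=128):
    """Keep the num_hashes smallest k-mer hashes as the sequence's sketch."""
    hashes = sorted({int(hashlib.md5(seq[i:i+k].encode()).hexdigest(), 16)
                     for i in range(len(seq) - k + 1)})
    return set(hashes[:num_hashes])

def jaccard_estimate(s1, s2):
    """Estimate Jaccard similarity from the bottom-k of the merged sketches."""
    merged = sorted(s1 | s2)[:max(len(s1), len(s2))]
    return sum(h in s1 and h in s2 for h in merged) / len(merged)

a = "ACGTACGGTCAGCTAGCTAGGATCGATCGTACGATCG" * 3
b = a[:60] + "TTTTTTTT" + a[68:]               # a mutated copy of a
print(f"estimated Jaccard: {jaccard_estimate(minhash(a), minhash(b)):.2f}")
```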


Subjects
Algorithms; Metagenomics/methods; Probability; Signal Processing, Computer-Assisted; Humans; Metagenome/genetics
8.
PLoS One ; 14(3): e0212508, 2019.
Article in English | MEDLINE | ID: mdl-30840653

ABSTRACT

Open Educational Resources (OER) have been lauded for their ability to reduce student costs and improve equity in higher education. Research examining whether OER provide learning benefits has produced mixed results, with most studies showing null effects. We argue that the common methods used to examine OER efficacy are unlikely to detect positive effects, based on the predictions of the access hypothesis. The access hypothesis states that OER benefit learning by providing access to critical course materials; it therefore predicts that OER should only benefit students who would not otherwise have access to those materials. Through simulation analysis, we demonstrate that even if there is a learning benefit of OER, standard research methods are unlikely to detect it.
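A simulation in the spirit of the paper's argument (the access rate and effect size are our made-up numbers): if OER only help the minority of students who lacked access, a whole-class comparison has little power.

```python
# Power simulation: a subgroup-only effect diluted in a whole-class t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, access_rate, effect = 200, 0.85, 0.5   # class size, % with access, SD units
detections = 0
for _ in range(2000):
    control = rng.normal(size=n)
    lacked_access = rng.random(n) > access_rate
    oer = rng.normal(size=n) + effect * lacked_access   # benefit only if no access
    detections += stats.ttest_ind(oer, control).pvalue < 0.05
print(f"power of the whole-class test: {detections / 2000:.2f}")
```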


Subjects
Education, Distance; Learning; Students; Adolescent; Adult; Female; Humans; Male
9.
Sci Adv ; 3(12): e1701548, 2017 12.
Article in English | MEDLINE | ID: mdl-29226243

ABSTRACT

Modern biology increasingly relies on fluorescence microscopy, which is driving demand for smaller, lighter, and cheaper microscopes. However, traditional microscope architectures suffer from a fundamental trade-off: as lenses become smaller, they must either collect less light or image a smaller field of view. To break this fundamental trade-off between device size and performance, we present a new concept for three-dimensional (3D) fluorescence imaging that replaces lenses with an optimized amplitude mask placed a few hundred micrometers above the sensor and an efficient algorithm that can convert a single frame of captured sensor data into high-resolution 3D images. The result is FlatScope: perhaps the world's tiniest and lightest microscope. FlatScope is a lensless microscope that is scarcely larger than an image sensor (roughly 0.2 g in weight and less than 1 mm thick) and yet able to produce micrometer-resolution, high-frame-rate, 3D fluorescence movies covering a total volume of several cubic millimeters. The ability of FlatScope to reconstruct full 3D images from a single frame of captured sensor data allows us to image 3D volumes roughly 40,000 times faster than a laser scanning confocal microscope while providing comparable resolution. We envision that this new flat fluorescence microscopy paradigm will lead to implantable endoscopes that minimize tissue damage, arrays of imagers that cover large areas, and bendable, flexible microscopes that conform to complex topographies.

10.
Biometrics ; 73(1): 10-19, 2017 03.
Article in English | MEDLINE | ID: mdl-27163413

ABSTRACT

In the biclustering problem, we seek to simultaneously group observations and features. While biclustering has applications in a wide array of domains, ranging from text mining to collaborative filtering, this work is motivated by the problem of identifying structure in high-dimensional genomic data. In this context, biclustering enables us to identify subsets of genes that are co-expressed only within a subset of experimental conditions. We present a convex formulation of the biclustering problem that possesses a unique global minimizer and an iterative algorithm, COBRA, that is guaranteed to identify it. Our approach generates an entire solution path of possible biclusters as a single tuning parameter is varied. We also show how to reduce the problem of selecting this tuning parameter to solving a trivial modification of the convex biclustering problem. The key contributions of our work are its simplicity, interpretability, and algorithmic guarantees, features that arguably are lacking in current alternative algorithms. We demonstrate the advantages of our approach, which include stable and reproducible identification of biclusters, on simulated and real microarray data.
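A sketch of the convex objective at the heart of the approach, minimized here with a generic off-the-shelf optimizer as a stand-in (COBRA itself is a dedicated splitting algorithm with convergence guarantees, and the fusion weights below are simplified to 1):

```python
# Convex biclustering objective: fit term plus row- and column-fusion penalties.
import numpy as np
from scipy.optimize import minimize

def objective(U, X, gamma):
    fit = 0.5 * np.sum((X - U) ** 2)
    rows = sum(np.linalg.norm(U[i] - U[j])
               for i in range(U.shape[0]) for j in range(i + 1, U.shape[0]))
    cols = sum(np.linalg.norm(U[:, k] - U[:, l])
               for k in range(U.shape[1]) for l in range(k + 1, U.shape[1]))
    return fit + gamma * (rows + cols)

rng = np.random.default_rng(5)
X = np.vstack([np.zeros((3, 4)), np.ones((3, 4))]) + 0.1 * rng.normal(size=(6, 4))

# Generic solver on a tiny example; the objective is non-smooth, so this
# is only adequate at this scale.
res = minimize(lambda u: objective(u.reshape(X.shape), X, 0.1), X.ravel())
print(np.round(res.x.reshape(X.shape), 2))   # rows within each block pulled together
```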


Subjects
Cluster Analysis; Data Interpretation, Statistical; Gene Regulatory Networks; Algorithms; Computational Biology/methods; Databases, Genetic; Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis
11.
Sci Adv ; 2(9): e1600025, 2016 09.
Article in English | MEDLINE | ID: mdl-27704040

ABSTRACT

Early identification of pathogens is essential for limiting the development of therapy-resistant pathogens and mitigating infectious disease outbreaks. Most bacterial detection schemes use target-specific probes to differentiate pathogen species, creating time and cost inefficiencies in identifying newly discovered organisms. We present a novel universal microbial diagnostics (UMD) platform to screen for microbial organisms in an infectious sample, using a small number of random DNA probes that are agnostic to the target DNA sequences. Our platform leverages the theory of sparse signal recovery (compressive sensing) to identify the composition of a microbial sample that potentially contains novel or mutant species. We validated the UMD platform in vitro, using five random probes to recover 11 pathogenic bacteria. We further demonstrated in silico that UMD can be generalized to screen for common human pathogens at different taxonomic levels. UMD's unorthodox sensing approach opens the door to more efficient and universal molecular diagnostics.
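A toy version of the recovery step (our stand-in solver and a made-up affinity matrix; the paper's probe chemistry and sensing model are far richer): random probes give y = Phi x for a sparse nonnegative abundance vector x, recovered here with nonnegative least squares.

```python
# Sparse recovery of a bacterial abundance vector from random-probe responses.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(6)
n_probes, n_species = 5, 11
Phi = rng.uniform(size=(n_probes, n_species))   # assumed probe-species affinities
x_true = np.zeros(n_species); x_true[3] = 1.0   # one pathogen present
y = Phi @ x_true + rng.normal(scale=0.01, size=n_probes)

x_hat, _ = nnls(Phi, y)                         # nonnegativity promotes sparsity
print("recovered species index:", int(np.argmax(x_hat)))
```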


Subjects
Bacteria/genetics; DNA Probes/genetics; DNA, Bacterial/genetics; Infections/diagnosis; Bacteria/isolation & purification; Bacteria/pathogenicity; DNA, Bacterial/classification; Humans; Infections/genetics; Infections/microbiology; Polymerase Chain Reaction
12.
J Stat Plan Inference ; 166: 52-66, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26500388

ABSTRACT

We develop a modeling framework for joint factor and cluster analysis of datasets in which multiple categorical response items are collected on a heterogeneous population of individuals. We introduce a latent factor multinomial probit model and employ prior constructions that allow inference on the number of factors as well as clustering of the subjects into homogeneous groups according to their relevant factors. Clustering, in particular, allows us to borrow strength across subjects, thereby helping in the estimation of the model parameters, particularly when the number of observations is small. We employ Markov chain Monte Carlo techniques and obtain tractable posterior inference for our objectives, including the sampling of missing data. We demonstrate the effectiveness of our method on simulated data. We also analyze two real-world educational datasets and show that our method outperforms state-of-the-art methods. In the analysis of the real-world data, we uncover hidden relationships between the questions and the underlying educational concepts, while simultaneously partitioning the students into groups of similar educational mastery.

13.
IEEE Trans Image Process ; 21(2): 494-504, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21859622

ABSTRACT

Compressive sensing (CS) is an emerging approach for the acquisition of signals having a sparse or compressible representation in some basis. While the CS literature has mostly focused on problems involving 1-D signals and 2-D images, many important applications involve multidimensional signals; the construction of sparsifying bases and measurement systems for such signals is complicated by their higher dimensionality. In this paper, we propose the use of Kronecker product matrices in CS for two purposes. First, such matrices can act as sparsifying bases that jointly model the structure present in all of the signal dimensions. Second, such matrices can represent the measurement protocols used in distributed settings. Our formulation enables the derivation of analytical bounds for the sparse approximation of multidimensional signals and CS recovery performance, as well as a means of evaluating novel distributed measurement schemes.
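The identity underlying this construction is easy to check numerically: sensing each dimension separately with matrices A and B is equivalent to one global measurement matrix kron(A, B) acting on the (row-major) vectorized signal.

```python
# Numerical check of the Kronecker sensing identity for a 2-D signal.
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(3, 8))        # measurement matrix for dimension 1
B = rng.normal(size=(4, 10))       # measurement matrix for dimension 2
X = rng.normal(size=(8, 10))       # 2-D signal

separable = (A @ X @ B.T).ravel()          # per-dimension (distributed) sensing
global_ = np.kron(A, B) @ X.ravel()        # single Kronecker measurement matrix
print(np.allclose(separable, global_))     # True
```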

14.
Philos Trans A Math Phys Eng Sci ; 370(1958): 118-35, 2012 Jan 13.
Article in English | MEDLINE | ID: mdl-22124085

ABSTRACT

Signal compression is an important tool for reducing communication costs and increasing the lifetime of wireless sensor network deployments. In this paper, we review and classify an array of proposed compression methods, with an emphasis on illustrating the differences between the various approaches.

15.
Science ; 331(6018): 717-9, 2011 Feb 11.
Article in English | MEDLINE | ID: mdl-21311012

ABSTRACT

The data deluge is changing the operating environment of many sensing systems from data-poor to data-rich, so rich that we are in jeopardy of being overwhelmed. Managing and exploiting the data deluge require a reinvention of sensor system design and signal processing theory. The potential pay-offs are huge, as the resulting sensor systems will enable radically new information technologies and powerful new tools for scientific discovery.


Subjects
Electronic Data Processing; Informatics; Information Management; Information Storage and Retrieval; Signal Processing, Computer-Assisted
17.
IEEE Trans Pattern Anal Mach Intell ; 32(10): 1888-98, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20724764

ABSTRACT

This paper studies the training of support vector machine (SVM) classifiers with respect to the minimax and Neyman-Pearson criteria. In principle, these criteria can be optimized in a straightforward way using a cost-sensitive SVM. In practice, however, because these criteria require especially accurate error estimation, standard techniques for tuning SVM parameters, such as cross-validation, can lead to poor classifier performance. To address this issue, we first prove that the usual cost-sensitive SVM, here called the 2C-SVM, is equivalent to another formulation called the 2nu-SVM. We then exploit a characterization of the 2nu-SVM parameter space to develop a simple yet powerful approach to error estimation based on smoothing. In an extensive experimental study, we demonstrate that smoothing significantly improves the accuracy of cross-validation error estimates, leading to dramatic performance gains. Furthermore, we propose coordinate descent strategies that offer significant gains in computational efficiency, with little to no loss in performance.
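A sketch of the cost-sensitive SVM the paper builds on (illustrative data and weights): asymmetric class weights trade false positives against false negatives, which is the knob a minimax or Neyman-Pearson design must tune.

```python
# Cost-sensitive SVM: heavier penalties on class-0 errors lower the FPR.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(1.5, 1, (200, 2))])
y = np.repeat([0, 1], 200)

for w0 in (1, 5, 20):                       # increasing penalty on class-0 errors
    clf = SVC(kernel="rbf", class_weight={0: w0, 1: 1}).fit(X, y)
    pred = clf.predict(X)
    fpr = np.mean(pred[y == 0] == 1)        # false-positive rate on class 0
    print(f"weight {w0:2d}: training FPR = {fpr:.3f}")
```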

18.
IEEE Trans Image Process ; 19(10): 2580-94, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20550996

ABSTRACT

The emergence of low-cost sensing architectures for diverse modalities has made it possible to deploy sensor networks that capture a single event from a large number of vantage points using multiple modalities. In many scenarios, these networks acquire large amounts of very high-dimensional data. For example, even a relatively small network of cameras can generate massive amounts of high-dimensional image and video data. One way to cope with this data deluge is to exploit low-dimensional data models. Manifold models provide a particularly powerful theoretical and algorithmic framework for capturing the structure of data governed by a small number of parameters, as is often the case in a sensor network. However, these models do not typically take into account dependencies among multiple sensors. We thus propose a new joint manifold framework for data ensembles that exploits such dependencies. We show that joint manifold structure can lead to improved performance for a variety of signal processing algorithms for applications including classification and manifold learning. Additionally, recent results concerning random projections of manifolds enable us to formulate a scalable and universal dimensionality reduction scheme that efficiently fuses the data from all sensors.
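A minimal reading of the fusion idea in code (our illustration, with a synthetic one-parameter "event"): stack all sensors' views into one long vector on the joint manifold, then compress it with a single Gaussian random projection, which roughly preserves distances.

```python
# Random-projection fusion of multiple sensor views of a shared parameter.
import numpy as np

rng = np.random.default_rng(9)
n_sensors, dim, n_events = 10, 256, 100
theta = rng.uniform(0, 2 * np.pi, n_events)          # shared latent parameter
views = [np.cos(np.outer(theta, np.arange(1, dim + 1)) + s)  # per-sensor view
         for s in range(n_sensors)]
joint = np.hstack(views)                             # points on the joint manifold

m = 20                                               # compressed dimension
P = rng.normal(size=(m, joint.shape[1])) / np.sqrt(m)
compressed = joint @ P.T

# Pairwise distances are roughly preserved after projection:
d_full = np.linalg.norm(joint[0] - joint[1])
d_comp = np.linalg.norm(compressed[0] - compressed[1])
print(f"distance before {d_full:.1f}, after {d_comp:.1f}")
```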

19.
Article in English | MEDLINE | ID: mdl-19158952

ABSTRACT

Compressive sensing microarrays (CSMs) are DNA-based sensors that operate using group testing and compressive sensing (CS) principles. In contrast to conventional DNA microarrays, in which each genetic sensor is designed to respond to a single target, in a CSM, each sensor responds to a set of targets. We study the problem of designing CSMs that simultaneously account for both the constraints from CS theory and the biochemistry of probe-target DNA hybridization. An appropriate cross-hybridization model is proposed for CSMs, and several methods are developed for probe design and CS signal recovery based on the new model. Lab experiments suggest that in order to achieve accurate hybridization profiling, consensus probe sequences are required to have sequence homology of at least 80% with all targets to be detected. Furthermore, out-of-equilibrium datasets are usually as accurate as those obtained from equilibrium conditions. Consequently, one can use CSMs in applications in which only short hybridization times are allowed.

20.
IEEE Trans Image Process ; 17(7): 1069-82, 2008 Jul.
Article in English | MEDLINE | ID: mdl-18586616

ABSTRACT

The dual-tree quaternion wavelet transform (QWT) is a new multiscale analysis tool for geometric image features. The QWT is a near shift-invariant tight frame representation whose coefficients sport a magnitude and three phases: two phases encode local image shifts while the third contains image texture information. The QWT is based on an alternative theory for the 2-D Hilbert transform and can be computed using a dual-tree filter bank with linear computational complexity. To demonstrate the properties of the QWT's coherent magnitude/phase representation, we develop an efficient and accurate procedure for estimating the local geometrical structure of an image. We also develop a new multiscale algorithm for estimating the disparity between a pair of images that is promising for image registration and flow estimation applications. The algorithm features multiscale phase unwrapping, linear complexity, and sub-pixel estimation accuracy.
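A 1-D analog of the phase-based disparity idea (a simplification using the ordinary analytic signal in place of the QWT): the phase of the analytic signal turns a sub-sample shift into a measurable phase offset.

```python
# Sub-sample shift estimation from analytic-signal phase (narrowband toy case).
import numpy as np
from scipy.signal import hilbert

fs, f0, shift = 1000, 25.0, 0.0013          # sample rate, tone, true shift (s)
t = np.arange(0, 1, 1 / fs)
s1 = np.cos(2 * np.pi * f0 * t)
s2 = np.cos(2 * np.pi * f0 * (t - shift))

a1, a2 = hilbert(s1), hilbert(s2)
dphi = np.angle(np.mean(a1 * np.conj(a2)))  # mean local phase difference
print(f"estimated shift: {dphi / (2 * np.pi * f0):.4f} s (true {shift} s)")
```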


Subjects
Algorithms; Artificial Intelligence; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated/methods; Signal Processing, Computer-Assisted; Tomography, Optical Coherence/methods; Image Enhancement/methods; Reproducibility of Results; Sensitivity and Specificity