Results 1 - 20 of 47
1.
Biostatistics ; 23(2): 643-665, 2022 04 13.
Article in English | MEDLINE | ID: mdl-33417699

ABSTRACT

Personalized cancer treatments based on the molecular profile of a patient's tumor are an emerging and exciting class of treatments in oncology. As genomic tumor profiling is becoming more common, targeted treatments for specific molecular alterations are gaining traction. To discover new potential therapeutics that may apply to broad classes of tumors matching some molecular pattern, experimentalists and pharmacologists rely on high-throughput, in vitro screens of many compounds against many different cell lines. We propose a hierarchical Bayesian model of how cancer cell lines respond to drugs in these experiments and develop a method for fitting the model to real-world high-throughput screening data. Through a case study, the model is shown to capture nontrivial associations between molecular features and drug response, such as requiring both wild-type TP53 and overexpression of MDM2 to be sensitive to Nutlin-3(a). In quantitative benchmarks, the model outperforms a standard approach in biology, with $\approx 20\%$ lower predictive error on held-out data. When combined with a conditional randomization testing procedure, the model discovers markers of therapeutic response that recapitulate known biology and suggest new avenues for investigation. All code for the article is publicly available at https://github.com/tansey/deep-dose-response.


Subjects
Antineoplastic Agents; Neoplasms; Antineoplastic Agents/pharmacology; Bayes Theorem; Drug Evaluation, Preclinical/methods; Early Detection of Cancer; High-Throughput Screening Assays; Humans; Neoplasms/drug therapy; Neoplasms/genetics
2.
J Am Med Inform Assoc ; 29(1): 3-11, 2021 12 28.
Article in English | MEDLINE | ID: mdl-34534312

ABSTRACT

OBJECTIVE: The study sought to build predictive models of next menstrual cycle start date based on mobile health self-tracked cycle data. Because app users may skip tracking, disentangling physiological patterns of menstruation from tracking behaviors is necessary for the development of predictive models. MATERIALS AND METHODS: We use data from a popular menstrual tracker (186 000 menstruators with over 2 million tracked cycles) to learn a predictive model, which (1) accounts explicitly for self-tracking adherence; (2) updates predictions as a given cycle evolves, allowing for interpretable insight into how these predictions change over time; and (3) enables modeling of an individual's cycle length history while incorporating population-level information. RESULTS: Compared with 5 baselines (mean, median, convolutional neural network, recurrent neural network, and long short-term memory network), the model yields better predictions and consistently outperforms them as the cycle evolves. The model also provides predictions of skipped tracking probabilities. DISCUSSION: Mobile health apps such as menstrual trackers provide a rich source of self-tracked observations, but these data have questionable reliability, as they hinge on user adherence to the app. By taking a machine learning approach to modeling self-tracked cycle lengths, we can separate true cycle behavior from user adherence, allowing for more informed predictions and insights into the underlying observed data structure. CONCLUSIONS: Disentangling physiological patterns of menstruation from adherence allows for accurate and informative predictions of menstrual cycle start date and is necessary for mobile tracking apps. The proposed predictive model can support app users in being more aware of their self-tracking behavior and in better understanding their cycle dynamics.


Subjects
Mobile Applications; Telemedicine; Female; Humans; Menstrual Cycle/physiology; Menstruation; Reproducibility of Results
3.
Proc Mach Learn Res ; 149: 535-566, 2021 Aug.
Article in English | MEDLINE | ID: mdl-35072087

ABSTRACT

We explore how to quantify uncertainty when designing predictive models for healthcare to provide well-calibrated results. Uncertainty quantification and calibration are critical in medicine, as one must not only accommodate the variability of the underlying physiology, but adjust to the uncertain data collection and reporting process. This occurs not only in the context of electronic health records (i.e., the clinical documentation process), but in mobile health as well (i.e., user-specific self-tracking patterns must be accounted for). In this work, we show that accurate uncertainty estimation is directly relevant to an important health application: the prediction of menstrual cycle length, based on self-tracked information. We take advantage of a flexible generative model that accommodates under-dispersed distributions via two degrees of freedom to fit the mean and variance of the observed cycle lengths. From a machine learning perspective, our work showcases how flexible generative models can not only provide state-of-the-art predictive accuracy, but enable well-calibrated predictions. From a healthcare perspective, we demonstrate that with flexible generative models, not only can we accommodate the idiosyncrasies of mobile health data, but we can also adjust the predictive uncertainty to per-user cycle length patterns. We evaluate the proposed model on real-world cycle length data collected by one of the most popular menstrual trackers worldwide, and demonstrate how the proposed generative model provides accurate and well-calibrated cycle length predictions. Providing meaningful, less uncertain cycle length predictions is beneficial for menstrual health researchers, mobile health users and developers, as it may help design more usable mobile health solutions.

4.
NPJ Digit Med ; 3: 79, 2020.
Article in English | MEDLINE | ID: mdl-32509976

ABSTRACT

The menstrual cycle is a key indicator of overall health for women of reproductive age. Previously, menstruation was primarily studied through survey results; however, as menstrual tracking mobile apps become more widely adopted, they provide an increasingly large, content-rich source of menstrual health experiences and behaviors over time. By exploring a database of user-tracked observations from the Clue app by BioWink GmbH of over 378,000 users and 4.9 million natural cycles, we show that self-reported menstrual tracker data can reveal statistically significant relationships between per-person cycle length variability and self-reported qualitative symptoms. A concern for self-tracked data is that they reflect not only physiological behaviors, but also the engagement dynamics of app users. To mitigate such potential artifacts, we develop a procedure to exclude cycles lacking user engagement, thereby allowing us to better distinguish true menstrual patterns from tracking anomalies. We uncover that women located at different ends of the menstrual variability spectrum, based on the consistency of their cycle length statistics, exhibit statistically significant differences in their cycle characteristics and symptom tracking patterns. We also find that cycle and period length statistics are stationary over the app usage timeline across the variability spectrum. The symptoms that we identify as showing statistically significant association with timing data can be useful to clinicians and users for predicting cycle variability from symptoms, or as potential health indicators for conditions like endometriosis. Our findings showcase the potential of longitudinal, high-resolution self-tracked data to improve understanding of menstruation and women's health as a whole.

5.
Proc Natl Acad Sci U S A ; 117(2): 836-847, 2020 01 14.
Article in English | MEDLINE | ID: mdl-31882445

ABSTRACT

Predicting how interactions between transcription factors and regulatory DNA sequence dictate rates of transcription and, ultimately, drive developmental outcomes remains an open challenge in physical biology. Using stripe 2 of the even-skipped gene in Drosophila embryos as a case study, we dissect the regulatory forces underpinning a key step along the developmental decision-making cascade: the generation of cytoplasmic mRNA patterns via the control of transcription in individual cells. Using live imaging and computational approaches, we found that the transcriptional burst frequency is modulated across the stripe to control the mRNA production rate. However, we discovered that bursting alone cannot quantitatively recapitulate the formation of the stripe and that control of the window of time over which each nucleus transcribes even-skipped plays a critical role in stripe formation. Theoretical modeling revealed that these regulatory strategies (bursting and the time window) respond in different ways to input transcription factor concentrations, suggesting that the stripe is shaped by the interplay of 2 distinct underlying molecular processes.


Subjects
Drosophila/physiology; Embryo, Nonmammalian/physiology; Embryonic Development/physiology; Transcription Factors/metabolism; Animals; Cell Nucleus; Drosophila/embryology; Drosophila/genetics; Drosophila Proteins; Embryonic Development/genetics; Female; Gene Expression Regulation, Developmental; Genes, Insect; Male; Models, Biological; RNA, Messenger; Transcription, Genetic
6.
J Clin Monit Comput ; 33(1): 95-105, 2019 Feb.
Article in English | MEDLINE | ID: mdl-29556884

ABSTRACT

To develop and validate a prediction model for delayed cerebral ischemia (DCI) after subarachnoid hemorrhage (SAH) using a temporal unsupervised feature engineering approach, demonstrating improved precision over standard features. 488 consecutive SAH admissions from 2006 to 2014 to a tertiary care hospital were included. Models were trained on 80%, while 20% were set aside for validation testing. Baseline information and standard grading scales were evaluated: age, sex, Hunt Hess grade, modified Fisher Scale (mFS), and Glasgow Coma Scale (GCS). An unsupervised approach applying random kernels was used to extract features from physiological time series (systolic and diastolic blood pressure, heart rate, respiratory rate, and oxygen saturation). Classifiers (Partial Least Squares, linear and kernel Support Vector Machines) were trained on feature subsets of the derivation dataset. Models were applied to the validation dataset. The performances of the best classifiers on the validation dataset are reported by feature subset. Standard grading scale (mFS): AUC 0.58. Combined demographics and grading scales: AUC 0.60. Random kernel derived physiologic features: AUC 0.74. Combined baseline and physiologic features with redundant feature reduction: AUC 0.77. Current DCI prediction tools rely on admission imaging and are advantageously simple to employ. However, using an agnostic and computationally inexpensive learning approach for high-frequency physiologic time series data, we demonstrated that our models achieve higher classification accuracy.


Subjects
Brain Ischemia/diagnostic imaging; Diagnosis, Computer-Assisted/methods; Subarachnoid Hemorrhage/diagnostic imaging; Aged; Area Under Curve; Critical Care; False Positive Reactions; Female; Glasgow Coma Scale; Humans; Least-Squares Analysis; Male; Middle Aged; Patient Admission; Predictive Value of Tests; Reproducibility of Results; Risk Factors; Severity of Illness Index; Support Vector Machine; Tertiary Care Centers; Time Factors
7.
Cell Rep ; 22(2): 340-349, 2018 01 09.
Article in English | MEDLINE | ID: mdl-29320731

ABSTRACT

T cells engage in two modes of interaction with antigen-presenting surfaces: stable synapses and motile kinapses. Although it is surmised that durable interactions of T cells with antigen-presenting cells involve synapses, in situ 3D imaging cannot resolve the mode of interaction. We have established in vitro 2D platforms and quantitative metrics to determine cell-intrinsic modes of interaction when T cells are faced with spatially continuous or restricted stimulation. All major resting human T cell subsets, except memory CD8 T cells, spend more time in the kinapse mode on continuous stimulatory surfaces. Surprisingly, we did not observe any concordant relationship between the mode and durability of interaction on cell-sized stimulatory spots. Naive CD8 T cells maintain kinapses for more than 3 hr before leaving stimulatory spots, whereas their memory counterparts maintain synapses for only an hour before leaving. Thus, durable interactions do not require stable synapses.


Subjects
Immunological Synapses/immunology; Receptors, Antigen, T-Cell/immunology; Humans
8.
PLoS Comput Biol ; 12(3): e1004793, 2016 Mar.
Article in English | MEDLINE | ID: mdl-27003682

ABSTRACT

Gene regulatory circuits must contend with intrinsic noise that arises due to finite numbers of proteins. While some circuits act to reduce this noise, others appear to exploit it. A striking example is the competence circuit in Bacillus subtilis, which exhibits much larger noise in the duration of its competence events than a synthetically constructed analog that performs the same function. Here, using stochastic modeling and fluorescence microscopy, we show that this larger noise allows cells to exit terminal phenotypic states, which expands the range of stress levels to which cells are responsive and leads to phenotypic heterogeneity at the population level. This is an important example of how noise confers a functional benefit in a genetic decision-making circuit.


Subjects
Adaptation, Physiological/genetics; Bacillus subtilis/genetics; Bacterial Proteins/genetics; Gene Regulatory Networks/genetics; Genetic Fitness/genetics; Models, Genetic; Computer Simulation; Models, Statistical; Signal-To-Noise Ratio; Stress, Physiological/genetics
9.
J Biomed Inform ; 58: 156-165, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26464024

ABSTRACT

We present the Unsupervised Phenome Model (UPhenome), a probabilistic graphical model for large-scale discovery of computational models of disease, or phenotypes. We tackle this challenge through the joint modeling of a large set of diseases and a large set of clinical observations. The observations are drawn directly from heterogeneous patient record data (notes, laboratory tests, medications, and diagnosis codes), and the diseases are modeled in an unsupervised fashion. We apply UPhenome to two qualitatively different mixtures of patients and diseases: records of extremely sick patients in the intensive care unit with constant monitoring, and records of outpatients regularly followed by care providers over multiple years. We demonstrate that the UPhenome model can learn from these different care settings, without any additional adaptation. Our experiments show that (i) the learned phenotypes combine the heterogeneous data types more coherently than baseline LDA-based phenotypes; (ii) they each represent single diseases rather than a mix of diseases more often than the baseline ones; and (iii) when applied to unseen patient records, they are correlated with the patients' ground-truth disorders. Code for training, inference, and quantitative evaluation is made available to the research community.


Subjects
Electronic Health Records; Learning; Probability; Humans; Phenotype
10.
Biophys J ; 108(8): 1852-5, 2015 Apr 21.
Article in English | MEDLINE | ID: mdl-25902425

ABSTRACT

Nanopore sequencing promises long read-lengths and single-molecule resolution, but the stochastic motion of the DNA molecule inside the pore is, as of this writing, a barrier to high accuracy reads. We develop a method of statistical inference that explicitly accounts for this error, and demonstrate that high accuracy (>99%) sequence inference is feasible even under highly diffusive motion by using a hidden Markov model to jointly analyze multiple stochastic reads. Using this model, we place bounds on achievable inference accuracy under a range of experimental parameters.


Subjects
DNA/chemistry; Models, Statistical; Nanopores; Sequence Analysis, DNA/methods
11.
BMC Bioinformatics ; 16: 3, 2015 Jan 16.
Article in English | MEDLINE | ID: mdl-25591752

ABSTRACT

BACKGROUND: Single-molecule techniques have emerged as incisive approaches for addressing a wide range of questions arising in contemporary biological research [Trends Biochem Sci 38:30-37, 2013; Nat Rev Genet 14:9-22, 2013; Curr Opin Struct Biol 28C:112-121, 2014; Annu Rev Biophys 43:19-39, 2014]. The analysis and interpretation of raw single-molecule data benefits greatly from the ongoing development of sophisticated statistical analysis tools that enable accurate inference at the low signal-to-noise ratios frequently associated with these measurements. While a number of groups have released analysis toolkits as open source software [J Phys Chem B 114:5386-5403, 2010; Biophys J 79:1915-1927, 2000; Biophys J 91:1941-1951, 2006; Biophys J 79:1928-1944, 2000; Biophys J 86:4015-4029, 2004; Biophys J 97:3196-3205, 2009; PLoS One 7:e30024, 2012; BMC Bioinformatics 11(8):S2, 2010; Biophys J 106:1327-1337, 2014; Proc Int Conf Mach Learn 28:361-369, 2013], it remains difficult to compare analysis for experiments performed in different labs due to a lack of standardization. RESULTS: Here we propose a standardized single-molecule dataset (SMD) file format. SMD is designed to accommodate a wide variety of computer programming languages, single-molecule techniques, and analysis strategies. To facilitate adoption of this format we have made two existing data analysis packages that are used for single-molecule analysis compatible with this format. CONCLUSION: Adoption of a common, standard data file format for sharing raw single-molecule data and analysis outcomes is a critical step for the emerging and powerful single-molecule field, which will benefit both sophisticated users and non-specialists by allowing standardized, transparent, and reproducible analysis practices.


Subjects
Cell Physiological Phenomena; Computational Biology/methods; Software; Datasets as Topic; Humans; Kinetics; Microscopy
12.
J Immunol Methods ; 416: 84-93, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25445324

ABSTRACT

Integrative analytical approaches are needed to study and understand T cell motility as it is a highly coordinated and complex process. Several computational algorithms and tools are available to track motile cells in time-lapse microscopy images. In contrast, there has only been limited effort towards the development of tools that take advantage of multi-channel microscopy data and facilitate integrative analysis of cell motility. We have implemented algorithms for detecting, tracking, and analyzing cell motility from multi-channel time-lapse microscopy data. We have integrated these into a MATLAB-based toolset we call TIAM (Tool for Integrative Analysis of Motility). The cells are detected by a hybrid approach involving edge detection and Hough transforms from transmitted light images. Cells are tracked using a modified nearest-neighbor association followed by an optimization routine to join shorter segments. Cell positions are used to perform local segmentation for extracting features from transmitted light, reflection and fluorescence channels and associating them with cells and cell-tracks to facilitate integrative analysis. We found that TIAM accurately captures the motility behavior of T cells and performed better than DYNAMIK, Icy, Imaris, and Volocity in detecting and tracking motile T cells. Extraction of cell-associated features from reflection and fluorescence channels was also accurate with less than 10% median error in measurements. Finally, we obtained novel insights into T cell motility that were critically dependent on the unique capabilities of TIAM. We found that 1) the CD45RO subset of human CD8 T cells moved faster and exhibited an increased propensity to attach to the substratum during CCL21-driven chemokinesis when compared to the CD45RA subset; and 2) attachment area and arrest coefficient during antigen-induced motility of the CD45RA subset are correlated with surface density of integrin LFA1 at the contact.


Subjects
CD8-Positive T-Lymphocytes/physiology; Cell Movement/physiology; Microscopy, Confocal/methods; Algorithms; CD8-Positive T-Lymphocytes/immunology; Cell Movement/immunology; Chemokine CCL21/immunology; Humans; Image Processing, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Imaging, Three-Dimensional/mortality; Leukocyte Common Antigens/immunology; Software
13.
BMC Syst Biol ; 8: 97, 2014 Sep 04.
Article in English | MEDLINE | ID: mdl-25183062

ABSTRACT

BACKGROUND: The extraordinary success of imatinib in the treatment of BCR-ABL1 associated cancers underscores the need to identify novel functional gene fusions in cancer. RNA sequencing offers a genome-wide view of expressed transcripts, uncovering biologically functional gene fusions. Although several bioinformatics tools are already available for the detection of putative fusion transcripts, candidate event lists are plagued with non-functional read-through events, reverse transcriptase template switching events, incorrect mapping, and other systematic errors. Such lists lack any indication of oncogenic relevance, and they are too large for exhaustive experimental validation. RESULTS: We have designed and implemented a pipeline, Pegasus, for the annotation and prediction of biologically functional gene fusion candidates. Pegasus provides a common interface for various gene fusion detection tools, reconstruction of novel fusion proteins, reading-frame-aware annotation of preserved/lost functional domains, and data-driven classification of oncogenic potential. Pegasus dramatically streamlines the search for oncogenic gene fusions, bridging the gap between raw RNA-Seq data and a final, tractable list of candidates for experimental validation. CONCLUSION: We show the effectiveness of Pegasus in predicting new driver fusions in 176 RNA-Seq samples of glioblastoma multiforme (GBM) and 23 cases of anaplastic large cell lymphoma (ALCL).


Subjects
Computational Biology/methods; Gene Fusion/genetics; Molecular Sequence Annotation/methods; Neoplasms/genetics; Software; Databases, Genetic; Decision Trees; Humans
14.
Nucleic Acids Res ; 42(16): 10265-77, 2014.
Article in English | MEDLINE | ID: mdl-25120267

ABSTRACT

The bacterial transcription factor LacI loops DNA by binding to two separate locations on the DNA simultaneously. Despite being one of the best-studied model systems for transcriptional regulation, the number and conformations of loop structures accessible to LacI remain unclear, though the importance of multiple coexisting loops has been implicated in interactions between LacI and other cellular regulators of gene expression. To probe this issue, we have developed a new analysis method for tethered particle motion, a versatile and commonly used in vitro single-molecule technique. Our method, vbTPM, performs variational Bayesian inference in hidden Markov models. It learns the number of distinct states (i.e. DNA-protein conformations) directly from tethered particle motion data with better resolution than existing methods, while easily correcting for common experimental artifacts. Studying short (roughly 100 bp) LacI-mediated loops, we provide evidence for three distinct loop structures, more than previously reported in single-molecule studies. Moreover, our results confirm that changes in LacI conformation and DNA-binding topology both contribute to the repertoire of LacI-mediated loops formed in vitro, and provide qualitatively new input for models of looping and transcriptional regulation. We expect vbTPM to be broadly useful for probing complex protein-nucleic acid interactions.


Subjects
DNA/chemistry; Lac Repressors/metabolism; Artifacts; Bayes Theorem; Kinetics; Lac Repressors/chemistry; Markov Chains; Motion; Nucleic Acid Conformation
15.
Biophys J ; 106(6): 1327-37, 2014 Mar 18.
Article in English | MEDLINE | ID: mdl-24655508

ABSTRACT

Many single-molecule experiments aim to characterize biomolecular processes in terms of kinetic models that specify the rates of transition between conformational states of the biomolecule. Estimation of these rates often requires analysis of a population of molecules, in which the conformational trajectory of each molecule is represented by a noisy, time-dependent signal trajectory. Although hidden Markov models (HMMs) may be used to infer the conformational trajectories of individual molecules, estimating a consensus kinetic model from the population of inferred conformational trajectories remains a statistically difficult task, as inferred parameters vary widely within a population. Here, we demonstrate how a recently developed empirical Bayesian method for HMMs can be extended to enable a more automated and statistically principled approach to two widely occurring tasks in the analysis of single-molecule fluorescence resonance energy transfer (smFRET) experiments: 1), the characterization of changes in rates across a series of experiments performed under variable conditions; and 2), the detection of degenerate states that exhibit the same FRET efficiency but differ in their rates of transition. We apply this newly developed methodology to two studies of the bacterial ribosome, each exemplary of one of these two analysis tasks. We conclude with a discussion of model-selection techniques for determination of the appropriate number of conformational states. The code used to perform this analysis and a basic graphical user interface front end are available as open source software.


Subjects
Fluorescence Resonance Energy Transfer/methods; Bayes Theorem; Markov Chains; Ribosome Subunits, Small, Bacterial/chemistry
17.
JMLR Workshop Conf Proc ; 28(2): 361-369, 2013.
Article in English | MEDLINE | ID: mdl-26985282

ABSTRACT

We address the problem of analyzing sets of noisy time-varying signals that all report on the same process but confound straightforward analyses due to complex inter-signal heterogeneities and measurement artifacts. In particular, we consider single-molecule experiments which indirectly measure the distinct steps in a biomolecular process via observations of noisy time-dependent signals such as a fluorescence intensity or bead position. Straightforward hidden Markov model (HMM) analyses attempt to characterize such processes in terms of a set of conformational states, the transitions that can occur between these states, and the associated rates at which those transitions occur, but they require ad hoc post-processing steps to combine multiple signals. Here we develop a hierarchically coupled HMM that allows experimentalists to deal with inter-signal variability in a principled and automatic way. Our approach is a generalized expectation maximization hyperparameter point estimation procedure with variational Bayes at the level of individual time series that learns a single interpretable representation of the overall data generating process.

18.
Methods Mol Biol ; 880: 273-322, 2012.
Article in English | MEDLINE | ID: mdl-23361990

ABSTRACT

Recent single-cell experiments have revived interest in the unavoidable or intrinsic noise in biochemical and genetic networks arising from the small number of molecules of the participating species. That is, rather than modeling regulatory networks in terms of the deterministic dynamics of concentrations, we model the dynamics of the probability of a given copy number of the reactants in single cells. Most of the modeling activity of the last decade has centered on stochastic simulation, i.e., Monte Carlo methods for generating stochastic time series. Here we review the mathematical description in terms of probability distributions, introducing the relevant derivations and illustrating several cases for which analytic progress can be made either instead of or before turning to numerical computation. Analytic progress can be useful both for suggesting more efficient numerical methods and for obviating the computational expense of, for example, exploring parametric dependence.


Subjects
Models, Biological; Signal Transduction/physiology; Computer Simulation; Models, Statistical; Monte Carlo Method; Probability; Single-Cell Analysis/methods; Stochastic Processes
19.
Proc Natl Acad Sci U S A ; 108(2): 446-51, 2011 Jan 11.
Article in English | MEDLINE | ID: mdl-21183719

ABSTRACT

Over the past decade, a number of researchers in systems biology have sought to relate the function of biological systems to their network-level descriptions--lists of the most important players and the pairwise interactions between them. Both for large networks (in which statistical analysis is often framed in terms of the abundance of repeated small subgraphs) and for small networks which can be analyzed in greater detail (or even synthesized in vivo and subjected to experiment), revealing the relationship between the topology of small subgraphs and their biological function has been a central goal. We here seek to pose this revelation as a statistical task, illustrated using a particular setup which has been constructed experimentally and for which parameterized models of transcriptional regulation have been studied extensively. The question "how does function follow form" is here mathematized by identifying which topological attributes correlate with the diverse possible information-processing tasks which a transcriptional regulatory network can realize. The resulting method reveals one form-function relationship which had earlier been predicted based on analytic results, and reveals a second for which we can provide an analytic interpretation. Resulting source code is distributed via http://formfunction.sourceforge.net.


Subjects
Models, Statistical; Transcription, Genetic; Algorithms; Gene Expression Regulation; Models, Biological; Models, Theoretical; Systems Biology
20.
BMC Bioinformatics ; 11 Suppl 8: S2, 2010 Oct 26.
Article in English | MEDLINE | ID: mdl-21034427

ABSTRACT

BACKGROUND: The recent explosion of experimental techniques in single molecule biophysics has generated a variety of novel time series data requiring equally novel computational tools for analysis and inference. This article describes in general terms how graphical modeling may be used to learn from biophysical time series data using the variational Bayesian expectation maximization algorithm (VBEM). The discussion is illustrated by the example of single-molecule fluorescence resonance energy transfer (smFRET) versus time data, where the smFRET time series is modeled as a hidden Markov model (HMM) with Gaussian observables. A detailed description of smFRET is provided as well. RESULTS: The VBEM algorithm returns the model's evidence and an approximating posterior parameter distribution given the data. The former provides a metric for model selection via maximum evidence (ME), and the latter a description of the model's parameters learned from the data. ME/VBEM provide several advantages over the more commonly used approach of maximum likelihood (ML) optimized by the expectation maximization (EM) algorithm, the most important being a natural form of model selection and a well-posed (non-divergent) optimization problem. CONCLUSIONS: The results demonstrate the utility of graphical modeling for inference of dynamic processes in single molecule biophysics.


Subjects
Computer Graphics; DNA/chemistry; Fluorescence Resonance Energy Transfer/methods; Molecular Dynamics Simulation; Software; Algorithms; Bayes Theorem; Databases, Factual; Inverted Repeat Sequences; Markov Chains; Models, Theoretical