Search | VHL Regional Portal

1.

Robustness and reproducibility of simple and complex synthetic logic circuit designs using a DBTL loop.

Cummins, Breschine; Vrana, Justin; Moseley, Robert C; Eramian, Hamed; Deckard, Anastasia; Fontanarrosa, Pedro; Bryce, Daniel; Weston, Mark; Zheng, George; Nowak, Joshua; Motta, Francis C; Eslami, Mohammed; Johnson, Kara Layne; Goldman, Robert P; Myers, Chris J; Johnson, Tessa; Vaughn, Matthew W; Gaffney, Niall; Urrutia, Joshua; Gopaulakrishnan, Shweta; Biggers, Vanessa; Higa, Trissha R; Mosqueda, Lorraine A; Gameiro, Marcio; Gedeon, Tomás; Mischaikow, Konstantin; Beal, Jacob; Bartley, Bryan; Mitchell, Tom; Nguyen, Tramy T; Roehner, Nicholas; Haase, Steven B.

Synth Biol (Oxf) ; 8(1): ysad005, 2023.

Article in English | MEDLINE | ID: mdl-37073283

ABSTRACT

Computational tools addressing various components of design-build-test-learn (DBTL) loops for the construction of synthetic genetic networks exist but do not generally cover the entire DBTL loop. This manuscript introduces an end-to-end sequence of tools that together form a DBTL loop called Design Assemble Round Trip (DART). DART provides rational selection and refinement of genetic parts to construct and test a circuit. Computational support for experimental process, metadata management, standardized data collection and reproducible data analysis is provided via the previously published Round Trip (RT) test-learn loop. The primary focus of this work is on the Design Assemble (DA) part of the tool chain, which improves on previous techniques by screening up to thousands of network topologies for robust performance using a novel robustness score derived from dynamical behavior based on circuit topology only. In addition, novel experimental support software is introduced for the assembly of genetic circuits. A complete design-through-analysis sequence is presented using several OR and NOR circuit designs, with and without structural redundancy, that are implemented in budding yeast. The execution of DART tested the predictions of the design tools, specifically with regard to robust and reproducible performance under different experimental conditions. The data analysis depended on a novel application of machine learning techniques to segment bimodal flow cytometry distributions. Evidence is presented that, in some cases, a more complex build may impart more robustness and reproducibility across experimental conditions.

2.

Highly-automated, high-throughput replication of yeast-based logic circuit design assessments.

Goldman, Robert P; Moseley, Robert; Roehner, Nicholas; Cummins, Breschine; Vrana, Justin D; Clowers, Katie J; Bryce, Daniel; Beal, Jacob; DeHaven, Matthew; Nowak, Joshua; Higa, Trissha; Biggers, Vanessa; Lee, Peter; Hunt, Jeremy P; Mosqueda, Lorraine; Haase, Steven B; Weston, Mark; Zheng, George; Deckard, Anastasia; Gopaulakrishnan, Shweta; Stubbs, Joseph F; Gaffney, Niall I; Vaughn, Matthew W; Maheshri, Narendra; Mikhalev, Ekaterina; Bartley, Bryan; Markeloff, Richard; Mitchell, Tom; Nguyen, Tramy; Sumorok, Daniel; Walczak, Nicholas; Myers, Chris; Zundel, Zach; Hatch, Benjamin; Scholz, James; Colonna-Romano, John.

Synth Biol (Oxf) ; 7(1): ysac018, 2022.

Article in English | MEDLINE | ID: mdl-36285185

ABSTRACT

We describe an experimental campaign that replicated the performance assessment of logic gates engineered into cells of Saccharomyces cerevisiae by Gander et al. Our experimental campaign used a novel high-throughput experimentation framework developed under the Defense Advanced Research Projects Agency's Synergistic Discovery and Design program: a remote robotic lab at Strateos executed a parameterized experimental protocol. Using this protocol and robotic execution, we generated two orders of magnitude more flow cytometry data than the original experiments. We discuss our results, which largely, but not completely, agree with the original report, and make some remarks about lessons learned.

3.

A toolkit for enhanced reproducibility of RNASeq analysis for synthetic biologists.

Garcia, Benjamin J; Urrutia, Joshua; Zheng, George; Becker, Diveena; Corbet, Carolyn; Maschhoff, Paul; Cristofaro, Alexander; Gaffney, Niall; Vaughn, Matthew; Saxena, Uma; Chen, Yi-Pei; Gordon, D Benjamin; Eslami, Mohammed.

Synth Biol (Oxf) ; 7(1): ysac012, 2022.

Article in English | MEDLINE | ID: mdl-36035514

ABSTRACT

Sequencing technologies, in particular RNASeq, have become critical tools in the design, build, test and learn cycle of synthetic biology. They provide a better understanding of synthetic designs, and they help identify ways to improve and select designs. While these data are beneficial to design, their collection and analysis is a complex, multistep process that has implications on both discovery and reproducibility of experiments. Additionally, tool parameters, experimental metadata, normalization of data and standardization of file formats present challenges that are computationally intensive. This calls for high-throughput pipelines expressly designed to handle the combinatorial and longitudinal nature of synthetic biology. In this paper, we present a pipeline to maximize the analytical reproducibility of RNASeq for synthetic biologists. We also explore the impact of reproducibility on the validation of machine learning models. We present the design of a pipeline that combines traditional RNASeq data processing tools with structured metadata tracking to allow for the exploration of the combinatorial design in a high-throughput and reproducible manner. We then demonstrate utility via two different experiments: a control comparison experiment and a machine learning model experiment. The first experiment compares datasets collected from identical biological controls across multiple days for two different organisms. It shows that a reproducible experimental protocol for one organism does not guarantee reproducibility in another. The second experiment quantifies the differences in experimental runs from multiple perspectives. It shows that the lack of reproducibility from these different perspectives can place an upper bound on the validation of machine learning models trained on RNASeq data.

4.

Round Trip: An Automated Pipeline for Experimental Design, Execution, and Analysis.

Bryce, Daniel; Goldman, Robert P; DeHaven, Matthew; Beal, Jacob; Bartley, Bryan; Nguyen, Tramy T; Walczak, Nicholas; Weston, Mark; Zheng, George; Nowak, Josh; Lee, Peter; Stubbs, Joe; Gaffney, Niall; Vaughn, Matthew W; Myers, Chris John; Moseley, Robert C; Haase, Steven; Deckard, Anastasia; Cummins, Bree; Leiby, Nick.

ACS Synth Biol ; 11(2): 608-622, 2022 02 18.

Article in English | MEDLINE | ID: mdl-35099189

ABSTRACT

Synthetic biology is a complex discipline that involves creating detailed, purpose-built designs from genetic parts. This process is often phrased as a Design-Build-Test-Learn loop, where iterative design improvements can be made, implemented, measured, and analyzed. Automation can potentially improve both the end-to-end duration of the process and the utility of data produced by the process. One of the most important considerations for the development of effective automation and quality data is a rigorous description of implicit knowledge encoded as a formal knowledge representation. The development of knowledge representation for the process poses a number of challenges, including developing effective human-machine interfaces, protecting against and repairing user error, providing flexibility for terminological mismatches, and supporting extensibility to new experimental types. We address these challenges with the DARPA SD2 Round Trip software architecture. The Round Trip is an open architecture that automates many of the key steps in the Test and Learn phases of a Design-Build-Test-Learn loop for high-throughput laboratory science. The primary contribution of the Round Trip is to assist with and otherwise automate metadata creation, curation, standardization, and linkage with experimental data. The Round Trip's focus on metadata supports fast, automated, and replicable analysis of experiments as well as experimental situational awareness and experimental interpretability. We highlight the major software components and data representations that enable the Round Trip to speed up the design and analysis of experiments by 2 orders of magnitude over prior ad hoc methods. These contributions support a number of experimental protocols and experimental types, demonstrating the Round Trip's breadth and extensibility. We describe both an illustrative use case using the Round Trip for an on-the-loop experimental campaign and overall contributions to reducing experimental analysis time and increasing data product volume in the SD2 program.

Subject(s)

Research Design , Software , Automation/methods , Humans , Reference Standards , Synthetic Biology/methods

5.

Prediction of whole-cell transcriptional response with machine learning.

Eslami, Mohammed; Borujeni, Amin Espah; Eramian, Hamed; Weston, Mark; Zheng, George; Urrutia, Joshua; Corbet, Carolyn; Becker, Diveena; Maschhoff, Paul; Clowers, Katie; Cristofaro, Alexander; Hosseini, Hamid Doost; Gordon, D Benjamin; Dorfan, Yuval; Singer, Jedediah; Vaughn, Matthew; Gaffney, Niall; Fonner, John; Stubbs, Joe; Voigt, Christopher A; Yeung, Enoch.

Bioinformatics ; 38(2): 404-409, 2022 01 03.

Article in English | MEDLINE | ID: mdl-34570169

ABSTRACT

MOTIVATION: Applications in synthetic and systems biology can benefit from measuring whole-cell response to biochemical perturbations. Execution of experiments to cover all possible combinations of perturbations is infeasible. In this paper, we present the host response model (HRM), a machine learning approach that maps response of single perturbations to transcriptional response of the combination of perturbations. RESULTS: The HRM combines high-throughput sequencing with machine learning to infer links between experimental context, prior knowledge of cell regulatory networks, and RNASeq data to predict a gene's dysregulation. We find that the HRM can predict the directionality of dysregulation to a combination of inducers with an accuracy of >90% using data from single inducers. We further find that the use of prior, known cell regulatory networks doubles the predictive performance of the HRM (an R² from 0.3 to 0.65). The model was validated in two organisms, Escherichia coli and Bacillus subtilis, using new experiments conducted after training. Finally, while the HRM is trained with gene expression data, the direct prediction of differential expression makes it possible to also conduct enrichment analyses using its predictions. We show that the HRM can accurately classify >95% of the pathway regulations. The HRM reduces the number of RNASeq experiments needed as responses can be tested in silico prior to the experiment. AVAILABILITY AND IMPLEMENTATION: The HRM software and tutorial are available at https://github.com/sd2e/CDM and the configurable differential expression analysis tools and tutorials are available at https://github.com/SD2E/omics_tools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Machine Learning , Software , Systems Biology , Escherichia coli/genetics , High-Throughput Nucleotide Sequencing

6.

RNA Solutions: Synthesizing Information to Support Transcriptomics (RNASSIST).

Chen, Yi-Pei; Ferguson, Laura B; Salem, Nihal A; Zheng, George; Mayfield, R Dayne; Eslami, Mohammed.

Bioinformatics ; 38(2): 397-403, 2022 01 03.

Article in English | MEDLINE | ID: mdl-34570193

ABSTRACT

MOTIVATION: Transcriptomics is a common approach to identify changes in gene expression induced by a disease state. Standard transcriptomic analyses consider differentially expressed genes (DEGs) as indicative of disease states so only a few genes would be treated as signals when the effect size is small, such as in brain tissue. For tissue with small effect sizes, if the DEGs do not belong to a pathway known to be involved in the disease, there would be little left in the transcriptome for researchers to follow up with. RESULTS: We developed RNA Solutions: Synthesizing Information to Support Transcriptomics (RNASSIST), a new approach to identify hidden signals in transcriptomic data by linking differential expression and co-expression networks using machine learning. We applied our approach to RNA-seq data of post-mortem brains that compared the Alcohol Use Disorder (AUD) group with the control group. Many of the candidate genes are not differentially expressed so would likely be ignored by standard transcriptomic analysis pipelines. Through multiple validation strategies, we concluded that these RNASSIST-identified genes likely play a significant role in AUD. AVAILABILITY AND IMPLEMENTATION: The RNASSIST algorithm is available at https://github.com/netrias/rnassist and both the software and the data used in RNASSIST are available at https://figshare.com/articles/software/RNAssist_Software_and_Data/16617250. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

RNA , Transcriptome , RNA/genetics , Gene Expression Profiling , RNA-Seq , Software , Sequence Analysis, RNA

7.

Service-based analysis of biological pathways.

Zheng, George; Bouguettaya, Athman.

BMC Bioinformatics ; 10 Suppl 10: S6, 2009 Oct 01.

Article in English | MEDLINE | ID: mdl-19796403

ABSTRACT

BACKGROUND: Computer-based pathway discovery is concerned with two important objectives: pathway identification and analysis. Conventional mining and modeling approaches aimed at pathway discovery are often effective at achieving either objective, but not both. Such limitations can be effectively tackled by leveraging a Web service-based modeling and mining approach. RESULTS: Inspired by molecular recognition and drug discovery processes, we developed a Web service mining tool, named PathExplorer, to discover potentially interesting biological pathways linking service models of biological processes. The tool uses an innovative approach to identify useful pathways based on graph-based hints and service-based simulation that verifies the user's hypotheses. CONCLUSION: Web service modeling of biological processes allows easy access to and invocation of these processes on the Web. The Web service mining techniques described in this paper enable the discovery of biological pathways linking these process service models. The algorithms presented in this paper for automatically highlighting interesting subgraphs within an identified pathway network enable the user to formulate hypotheses, which can be tested using our simulation algorithm, which is also described in this paper.

Subject(s)

Computational Biology/methods , Algorithms , Databases, Factual , Internet , Proteome/metabolism , Software , User-Computer Interface
