Results 1 - 15 of 15
1.
J Am Med Inform Assoc ; 25(3): 267-274, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-29040639

ABSTRACT

OBJECTIVE: We describe a detailed solution for maintaining high-capacity, data-intensive network flows (eg, 10, 40, 100 Gbps+) in a scientific, medical context while still adhering to security and privacy laws and regulations. MATERIALS AND METHODS: High-end networking, packet-filter firewalls, network intrusion-detection systems. RESULTS: We describe a "Medical Science DMZ" concept as an option for secure, high-volume transport of large, sensitive datasets between research institutions over national research networks, and give 3 detailed descriptions of implemented Medical Science DMZs. DISCUSSION: The exponentially increasing amounts of "omics" data, high-quality imaging, and other rapidly growing clinical datasets have resulted in the rise of biomedical research "Big Data." The storage, analysis, and network resources required to process these data and integrate them into patient diagnoses and treatments have grown to scales that strain the capabilities of academic health centers. Some data are not generated locally and cannot be sustained locally, and shared data repositories such as those provided by the National Library of Medicine, the National Cancer Institute, and international partners such as the European Bioinformatics Institute are rapidly growing. The ability to store and compute using these data must therefore be addressed by a combination of local, national, and industry resources that exchange large datasets. Maintaining data-intensive flows that comply with the Health Insurance Portability and Accountability Act (HIPAA) and other regulations presents a new challenge for biomedical research. We describe a strategy that marries performance and security by borrowing from and redefining the concept of a Science DMZ, a framework that is used in physical sciences and engineering research to manage high-capacity data flows. 
CONCLUSION: By implementing a Medical Science DMZ architecture, biomedical researchers can leverage the scale provided by high-performance computer and cloud storage facilities and national high-speed research networks while preserving privacy and meeting regulatory requirements.

2.
J Med Chem ; 60(9): 3594-3605, 2017 05 11.
Article in English | MEDLINE | ID: mdl-28252959

ABSTRACT

Miniaturization and parallel processing play an important role in the evolution of many technologies. We demonstrate the application of miniaturized high-throughput experimentation methods to resolve synthetic chemistry challenges on the frontlines of a lead optimization effort to develop diacylglycerol acyltransferase (DGAT1) inhibitors. Reactions were performed on ∼1 mg scale using glass microvials providing a miniaturized high-throughput experimentation capability that was used to study a challenging SNAr reaction. The availability of robust synthetic chemistry conditions discovered in these miniaturized investigations enabled the development of structure-activity relationships that ultimately led to the discovery of soluble, selective, and potent inhibitors of DGAT1.


Subject(s)
Diacylglycerol O-Acyltransferase/antagonists & inhibitors , Enzyme Inhibitors/chemistry , Enzyme Inhibitors/pharmacology , Chromatography, Liquid , Mass Spectrometry , Proton Magnetic Resonance Spectroscopy
3.
J Am Med Inform Assoc ; 23(6): 1199-1201, 2016 11.
Article in English | MEDLINE | ID: mdl-27136944

ABSTRACT

OBJECTIVE: We describe use cases and an institutional reference architecture for maintaining high-capacity, data-intensive network flows (e.g., 10, 40, 100 Gbps+) in a scientific, medical context while still adhering to security and privacy laws and regulations. MATERIALS AND METHODS: High-end networking, packet filter firewalls, network intrusion detection systems. RESULTS: We describe a "Medical Science DMZ" concept as an option for secure, high-volume transport of large, sensitive data sets between research institutions over national research networks. DISCUSSION: The exponentially increasing amounts of "omics" data, the rapid increase of high-quality imaging, and other rapidly growing clinical data sets have resulted in the rise of biomedical research "big data." The storage, analysis, and network resources required to process these data and integrate them into patient diagnoses and treatments have grown to scales that strain the capabilities of academic health centers. Some data are not generated locally and cannot be sustained locally, and shared data repositories such as those provided by the National Library of Medicine, the National Cancer Institute, and international partners such as the European Bioinformatics Institute are rapidly growing. The ability to store and compute using these data must therefore be addressed by a combination of local, national, and industry resources that exchange large data sets. Maintaining data-intensive flows that comply with HIPAA and other regulations presents a new challenge for biomedical research. Recognizing this, we describe a strategy that marries performance and security by borrowing from and redefining the concept of a "Science DMZ," a framework that is used in physical sciences and engineering research to manage high-capacity data flows.
CONCLUSION: By implementing a Medical Science DMZ architecture, biomedical researchers can leverage the scale provided by high-performance computer and cloud storage facilities and national high-speed research networks while preserving privacy and meeting regulatory requirements.


Subject(s)
Computer Communication Networks , Computer Security , Computing Methodologies , Computer Security/legislation & jurisprudence , Confidentiality/legislation & jurisprudence , Government Regulation , Health Insurance Portability and Accountability Act , Medical Records Systems, Computerized/legislation & jurisprudence , United States
4.
J Med Chem ; 57(2): 477-94, 2014 Jan 23.
Article in English | MEDLINE | ID: mdl-24383452

ABSTRACT

Systematic methods that speed up the assignment of absolute configuration using vibrational circular dichroism (VCD) and simplify its usage will advance this technique into a robust platform technology. Applying VCD to pharmaceutically relevant compounds has been handled in an ad hoc fashion, relying on fragment analysis and technical shortcuts to reduce the computational time required. We leverage a large computational infrastructure to provide adequate conformational exploration which enables an accurate assignment of absolute configuration. We describe a systematic approach for rapid calculation of VCD/IR spectra and comparison with corresponding measured spectra and apply this approach to assign the correct stereochemistry of nine test cases. We suggest moving away from the fragment approach when making VCD assignments. In addition to enabling faster and more reliable VCD assignments of absolute configuration, the ability to rapidly explore conformational space and sample conformations of complex molecules will have applicability in other areas of drug discovery.


Subject(s)
Circular Dichroism/methods , Molecular Conformation , Pharmaceutical Preparations/chemistry , Alkynes , Aprepitant , Azetidines/chemistry , Benzoxazines/chemistry , Camphor/chemistry , Computational Biology , Cyclohexane Monoterpenes , Cyclopropanes , Drug Discovery/methods , Ezetimibe , Ibuprofen/chemistry , Monoterpenes/chemistry , Morpholines/chemistry , Quantum Theory , Simvastatin/chemistry , Statistical Distributions , Stereoisomerism
5.
Nature ; 478(7370): 476-82, 2011 Oct 12.
Article in English | MEDLINE | ID: mdl-21993624

ABSTRACT

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.


Subject(s)
Evolution, Molecular , Genome, Human/genetics , Genome/genetics , Mammals/genetics , Animals , Disease , Exons/genetics , Genomics , Health , Humans , Molecular Sequence Annotation , Phylogeny , RNA/classification , RNA/genetics , Selection, Genetic/genetics , Sequence Alignment , Sequence Analysis, DNA
6.
Org Lett ; 11(15): 3194-7, 2009 Aug 06.
Article in English | MEDLINE | ID: mdl-19572567

ABSTRACT

Treatment of omega-epoxynitriles with hydroxylamine affords cyclic aminonitrones in a single step and with high stereoselectivity. The scope of this novel transformation was explored in a series of examples. The aminonitrone products were shown to be useful substrates for further selective elaboration.


Subject(s)
HIV Integrase Inhibitors/chemistry , Pyrimidinones/chemistry , Crystallography, X-Ray , Cyclization , Drug Design , HIV Integrase Inhibitors/chemical synthesis , Molecular Structure , Pyrimidinones/chemical synthesis , Pyrrolidinones/chemistry , Raltegravir Potassium
7.
Proc Natl Acad Sci U S A ; 104(49): 19428-33, 2007 Dec 04.
Article in English | MEDLINE | ID: mdl-18040051

ABSTRACT

Although the Human Genome Project was completed 4 years ago, the catalog of human protein-coding genes remains a matter of controversy. Current catalogs list a total of approximately 24,500 putative protein-coding genes. It is broadly suspected that a large fraction of these entries are functionally meaningless ORFs present by chance in RNA transcripts, because they show no evidence of evolutionary conservation with mouse or dog. However, there is currently no scientific justification for excluding ORFs simply because they fail to show evolutionary conservation: the alternative hypothesis is that most of these ORFs are actually valid human genes that reflect gene innovation in the primate lineage or gene loss in the other lineages. Here, we reject this hypothesis by carefully analyzing the nonconserved ORFs, specifically their properties in other primates. We show that the vast majority of these ORFs are random occurrences. The analysis yields, as a by-product, a major revision of the current human catalogs, cutting the number of protein-coding genes to approximately 20,500. Specifically, it suggests that nonconserved ORFs should be added to the human gene catalog only if there is clear evidence of an encoded protein. It also provides a principled methodology for evaluating future proposed additions to the human gene catalog. Finally, the results indicate that there has been relatively little true innovation in mammalian protein-coding genes.


Subject(s)
Genetic Code , Genome, Human/genetics , Genomics , Open Reading Frames/genetics , Proteins/genetics , Animals , Base Sequence , DNA Transposable Elements/genetics , Dogs , Genes/genetics , Humans , Mice , Molecular Sequence Data , Pseudogenes/genetics , Sequence Analysis, DNA
8.
Genome Res ; 17(6): 760-74, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17567995

ABSTRACT

A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.


Subject(s)
Evolution, Molecular , Genome, Human , Mammals/genetics , Open Reading Frames , Phylogeny , Sequence Alignment , Animals , Human Genome Project , Humans
9.
Nature ; 447(7141): 167-77, 2007 May 10.
Article in English | MEDLINE | ID: mdl-17495919

ABSTRACT

We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.


Subject(s)
Evolution, Molecular , Genome/genetics , Genomics , Opossums/genetics , Animals , Base Composition , Conserved Sequence/genetics , DNA Transposable Elements/genetics , Humans , Polymorphism, Single Nucleotide/genetics , Protein Biosynthesis , Synteny/genetics , X Chromosome Inactivation/genetics
10.
Cell ; 125(2): 315-26, 2006 Apr 21.
Article in English | MEDLINE | ID: mdl-16630819

ABSTRACT

The most highly conserved noncoding elements (HCNEs) in mammalian genomes cluster within regions enriched for genes encoding developmentally important transcription factors (TFs). This suggests that HCNE-rich regions may contain key regulatory controls involved in development. We explored this by examining histone methylation in mouse embryonic stem (ES) cells across 56 large HCNE-rich loci. We identified a specific modification pattern, termed "bivalent domains," consisting of large regions of H3 lysine 27 methylation harboring smaller regions of H3 lysine 4 methylation. Bivalent domains tend to coincide with TF genes expressed at low levels. We propose that bivalent domains silence developmental genes in ES cells while keeping them poised for activation. We also found striking correspondences between genome sequence and histone methylation in ES cells, which become notably weaker in differentiated cells. These results highlight the importance of DNA sequence in defining the initial epigenetic landscape and suggest a novel chromatin-based mechanism for maintaining pluripotency.


Subject(s)
Chromatin/chemistry , Gene Expression Regulation, Developmental , Histones/metabolism , Nucleic Acid Conformation , Stem Cells/physiology , Animals , Cell Differentiation , Cells, Cultured , Chromatin/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Epigenesis, Genetic , Gene Expression Profiling , Histones/chemistry , Homeodomain Proteins/genetics , Homeodomain Proteins/metabolism , Male , Methylation , Mice , Mice, Inbred C57BL , Nanog Homeobox Protein , Octamer Transcription Factor-3/genetics , Octamer Transcription Factor-3/metabolism , Oligonucleotide Array Sequence Analysis , Stem Cells/cytology
11.
Nature ; 438(7069): 803-19, 2005 Dec 08.
Article in English | MEDLINE | ID: mdl-16341006

ABSTRACT

Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.


Subject(s)
Dogs/genetics , Evolution, Molecular , Genome/genetics , Genomics , Haplotypes/genetics , Animals , Conserved Sequence/genetics , Dog Diseases/genetics , Dogs/classification , Female , Humans , Hybridization, Genetic , Male , Mice , Mutagenesis/genetics , Polymorphism, Single Nucleotide/genetics , Rats , Short Interspersed Nucleotide Elements/genetics , Synteny/genetics
12.
Genome Res ; 14(5): 971-5, 2004 May.
Article in English | MEDLINE | ID: mdl-15123594

ABSTRACT

Ensembl is a software project to automatically annotate large eukaryotic genomes and release them freely into the public domain. The project currently automatically annotates 10 complete genomes. This makes very large demands on compute resources, due to the vast number of sequence comparisons that need to be executed. To circumvent the financial outlay often associated with classical supercomputing environments, farms of multiple, lower-cost machines have now become the norm and have been deployed successfully with this project. The architecture and design of farms containing hundreds of compute nodes is complex and nontrivial to implement. This study will define and explain some of the essential elements to consider when designing such systems. Server architecture and network infrastructure are discussed with a particular emphasis on solutions that worked and those that did not (often with fairly spectacular consequences). The aim of the study is to give the reader, who may be implementing a large-scale biocompute project, an insight into some of the pitfalls that may be waiting ahead.


Subject(s)
Computational Biology/methods , Software , Computer Systems , Database Management Systems , Databases, Genetic , Online Systems , Software Design
13.
Genome Res ; 14(5): 925-8, 2004 May.
Article in English | MEDLINE | ID: mdl-15078858

ABSTRACT

Ensembl (http://www.ensembl.org/) is a bioinformatics project to organize biological information around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of individual genomes, and of the synteny and orthology relationships between them. It is also a framework for integration of any biological data that can be mapped onto features derived from the genomic sequence. Ensembl is available as an interactive Web site, a set of flat files, and as a complete, portable open source software system for handling genomes. All data are provided without restriction, and code is freely available. Ensembl's aims are to continue to "widen" this biological integration to include other model organisms relevant to understanding human biology as they become available; to "deepen" this integration to provide an ever more seamless linkage between equivalent components in different species; and to provide further classification of functional elements in the genome that have been previously elusive.


Subject(s)
Computational Biology/trends
14.
Bioinformatics ; 20(3): 426-7, 2004 Feb 12.
Article in English | MEDLINE | ID: mdl-14960472

ABSTRACT

Multiple sequence alignment remains a crucial method for understanding the function of groups of related nucleic acid and protein sequences. However, it is known that automatic multiple sequence alignments can often be improved by manual editing. Therefore, tools are needed to view and edit multiple sequence alignments. Due to growth in the sequence databases, multiple sequence alignments can often be large and difficult to view efficiently. The Jalview Java alignment editor is presented here, which enables fast viewing and editing of large multiple sequence alignments.


Subject(s)
Documentation , Hypermedia , Information Storage and Retrieval/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , User-Computer Interface , Algorithms , Database Management Systems , Word Processing
15.
Nature ; 420(6915): 520-62, 2002 Dec 05.
Article in English | MEDLINE | ID: mdl-12466850

ABSTRACT

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.


Subject(s)
Chromosomes, Mammalian/genetics , Evolution, Molecular , Genome , Mice/genetics , Physical Chromosome Mapping , Animals , Base Composition , Conserved Sequence/genetics , CpG Islands/genetics , Gene Expression Regulation , Genes/genetics , Genetic Variation/genetics , Genome, Human , Genomics , Humans , Mice/classification , Mice, Knockout , Mice, Transgenic , Models, Animal , Multigene Family/genetics , Mutagenesis , Neoplasms/genetics , Proteome/genetics , Pseudogenes/genetics , Quantitative Trait Loci/genetics , RNA, Untranslated/genetics , Repetitive Sequences, Nucleic Acid/genetics , Selection, Genetic , Sequence Analysis, DNA , Sex Chromosomes/genetics , Species Specificity , Synteny