Results 1 - 16 of 16
1.
HLA ; 102(2): 206-212, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37286192

ABSTRACT

The Genotype List (GL) String grammar for reporting HLA and Killer-cell Immunoglobulin-like Receptor (KIR) genotypes in a text string was described in 2013. Since this initial description, GL Strings have been used to describe HLA and KIR genotypes for more than 40 million subjects, allowing these data to be recorded, stored and transmitted in an easily parsed, text-based format. After a decade of working with HLA and KIR data in GL String format, with advances in HLA and KIR genotyping technologies that have fostered the generation of full-gene sequence data, the need for an extension of the GL String system has become clear. Here, we introduce the new GL String delimiter "?", which addresses the need to describe ambiguity in assigning a gene sequence to gene paralogs. GL Strings that do not include a "?" delimiter continue to be interpreted as originally described. This extension represents version 1.1 of the GL String grammar.
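
A minimal sketch of what "easily parsed" could look like in practice. It assumes the five delimiters of the original 2013 grammar (`^`, `|`, `+`, `~`, `/`, from loosest to tightest binding); this list is not given in the abstract, and the new "?" delimiter is deliberately left unhandled because its precedence is not stated here. The function names are hypothetical.

```python
# Assumed GL String 1.0 delimiters, loosest-binding first:
# "^" locus, "|" genotype ambiguity, "+" gene copies,
# "~" haplotype (phase), "/" allele ambiguity.
DELIMITERS = ["^", "|", "+", "~", "/"]


def parse_gl_string(gl, level=0):
    """Split a GL String into nested lists, one nesting level per delimiter."""
    if level == len(DELIMITERS):
        return gl  # nothing left to split: a single allele name
    parts = gl.split(DELIMITERS[level])
    return [parse_gl_string(part, level + 1) for part in parts]


def alleles(gl):
    """Return the sorted, de-duplicated allele names found in a GL String."""
    names = {gl}
    for delimiter in DELIMITERS:
        names = {piece for name in names for piece in name.split(delimiter)}
    return sorted(names)


if __name__ == "__main__":
    example = "HLA-A*01:01/HLA-A*01:02+HLA-A*24:02"
    print(alleles(example))
    print(parse_gl_string(example))
```

Every level of nesting is always present in the parsed result, even when a delimiter does not occur, so downstream code can rely on a fixed depth.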


Subject(s)
Immunoglobulins , Receptors, KIR , Humans , Alleles , Genotype , Receptors, KIR/genetics , Immunoglobulins/genetics , Gene Frequency
2.
Molecules ; 26(4), 2021 Feb 09.
Article in English | MEDLINE | ID: mdl-33572207

ABSTRACT

The empirical Lewis picture of the chemical bond dominates the view chemists have of molecules, of their stability and reactivity. Within the mathematical framework of quantum mechanics, all this chemical information is hidden in the many-particle wave function Ψ. Thus, to reveal and understand it, there is great interest in enhancing the Lewis model and connecting it to computable quantities. As has previously been shown, the Lewis picture can often be recovered from the probability density |Ψ|² with probabilities in agreement with valence bond weights: the structures appear as most likely positions in the all-electron configuration space. Here, we systematically expand this topological probability density analysis to molecules with multiple bonds and lone pairs, employing correlated Slater-Jastrow wave functions. In contrast to earlier studies, non-Lewis structures are obtained that disagree with the prevalent picture and have a potentially better predictive capability. While functional groups are still recovered with these ab initio structures, the boundary between bonds and lone pairs is mostly blurred or non-existent. In order to understand the newly found structures, the Lewis electron pairs are replaced with spin-coupled electron motifs as the fundamental electronic fragment. These electron motifs, which coincide with Lewis' electron pairs for many single bonds, arise naturally from the generally applicable analysis presented. An attempt is made to rationalize the geometry of the newly found structures by considering the Coulomb force and the Pauli repulsion.


Subject(s)
Electrons , Models, Molecular , Quantum Theory , Thermodynamics , Hydrogen Bonding
3.
PLoS Comput Biol ; 15(2): e1006791, 2019 02.
Article in English | MEDLINE | ID: mdl-30735498

ABSTRACT

BioJava is an open-source project that provides a Java library for processing biological data. The project aims to simplify bioinformatic analyses by implementing parsers, data structures, and algorithms for common tasks in genomics, structural biology, ontologies, phylogenetics, and more. Since 2012, we have released two major versions of the library (4 and 5) that include many new features to tackle challenges with increasingly complex macromolecular structure data. BioJava requires Java 8 or higher and is freely available under the LGPL 2.1 license. The project is hosted on GitHub at https://github.com/biojava/biojava. More information and documentation can be found online on the BioJava website (http://www.biojava.org) and tutorial (https://github.com/biojava/biojava-tutorial). All inquiries should be directed to the GitHub page or the BioJava mailing list (http://lists.open-bio.org/mailman/listinfo/biojava-l).


Subject(s)
Computational Biology/methods , Access to Information , Algorithms , Gene Library , Genome/genetics , Genomics , Information Storage and Retrieval , Internet , Software
4.
F1000Res ; 8, 2019.
Article in English | MEDLINE | ID: mdl-32025286

ABSTRACT

The Bioinformatics Open Source Conference is a volunteer-organized meeting that covers open source software development and open science in bioinformatics. Launched in 2000, BOSC has been held every year since. BOSC 2019, the 20th annual BOSC, took place as one of the Communities of Special Interest (COSIs) at the Intelligent Systems for Molecular Biology meeting (ISMB/ECCB 2019). The two-day meeting included a total of 46 talks and 55 posters, as well as eight Birds of a Feather interest groups. The keynote speaker was University of Cape Town professor Dr. Nicola Mulder, who spoke on "Building infrastructure for responsible open science in Africa". Immediately after BOSC 2019, about 50 people participated in the two-day CollaborationFest (CoFest for short), an open and free community-driven event at which participants work together to contribute to bioinformatics software, documentation, training materials, and use cases.


Subject(s)
Computational Biology , Software , Congresses as Topic , Workflow
5.
Blood Adv ; 2(19): 2419-2429, 2018 10 09.
Article in English | MEDLINE | ID: mdl-30262602

ABSTRACT

Allogeneic hematopoietic stem cell transplantation (allo-HCT) is a curative option for blood cancers, but the coupled effects of graft-versus-tumor and graft-versus-host disease (GVHD) limit its broader application. Outcomes improve with matching at HLAs, but other factors are required to explain residual risk of GVHD. In an effort to identify genetic associations outside the major histocompatibility complex, we conducted a genome-wide clinical outcomes study on 205 acute myeloid leukemia patients and their fully HLA-A-, HLA-B-, HLA-C-, HLA-DRB1-, and HLA-DQB1-matched (10/10) unrelated donors. HLA-DPB1 T-cell epitope permissibility mismatches were observed in less than half (45%) of acute GVHD cases, motivating a broader search for genetic factors affecting clinical outcomes. A novel bioinformatics workflow adapted from neoantigen discovery found no associations between acute GVHD and known, HLA-restricted minor histocompatibility antigens (MiHAs). These results were confirmed with microarray data from an additional 988 samples. On the other hand, Y-chromosome-encoded single-nucleotide polymorphisms in 4 genes (PCDH11Y, USP9Y, UTY, and NLGN4Y) did associate with acute GVHD in male patients with female donors. Males in this category with acute GVHD had more Y-encoded variant peptides per patient with higher predicted HLA-binding affinity than males without GVHD who matched X-paralogous alleles in their female donors. Methods and results described here have an immediate impact for allo-HCT, warranting further development and larger genomic studies where MiHAs are clinically relevant, including cancer immunotherapy, solid organ transplant, and pregnancy.


Subject(s)
Antigens/genetics , Genes, Y-Linked , Genetic Association Studies , Genetic Predisposition to Disease , Graft vs Host Disease/etiology , Hematopoietic Stem Cell Transplantation/adverse effects , Acute Disease , Alleles , Amino Acid Sequence , Antigens/immunology , Chromosome Mapping , Female , HLA Antigens/chemistry , HLA Antigens/genetics , HLA Antigens/immunology , Histocompatibility Testing , Humans , Male , Minor Histocompatibility Antigens/chemistry , Minor Histocompatibility Antigens/genetics , Minor Histocompatibility Antigens/immunology , Transplantation Conditioning/methods , Transplantation, Homologous
6.
J Chem Theory Comput ; 14(4): 2052-2062, 2018 Apr 10.
Article in English | MEDLINE | ID: mdl-29518323

ABSTRACT

Sampled structure sequences obtained, for instance, from real-time reactivity explorations or first-principles molecular dynamics simulations contain valuable information about chemical reactivity. Eventually, such sequences allow for the construction of reaction networks that are required for the kinetic analysis of chemical systems. For this purpose, however, the sampled information must be processed to obtain stable chemical structures and associated transition states. The manual extraction of valuable information from such reaction paths is straightforward but unfeasible for large and complex reaction networks. For real-time quantum chemistry, this implies automatization of the extraction and relaxation process while maintaining immersion in the virtual chemical environment. Here, we describe an efficient path processing scheme for the on-the-fly construction of an exploration network by approximating the explored paths as continuous basis-spline curves.

7.
Genome Med ; 9(1): 113, 2017 Dec 18.
Article in English | MEDLINE | ID: mdl-29254494

ABSTRACT

The translation of personal genomics to precision medicine depends on the accurate interpretation of the multitude of genetic variants observed for each individual. However, even when genetic variants are predicted to modify a protein, their functional implications may be unclear. Many diseases are caused by genetic variants affecting important protein features, such as enzyme active sites or interaction interfaces. The scientific community has catalogued millions of genetic variants in genomic databases and thousands of protein structures in the Protein Data Bank. Mapping mutations onto three-dimensional (3D) structures enables atomic-level analyses of protein positions that may be important for the stability or formation of interactions; these may explain the effect of mutations and in some cases even open a path for targeted drug development. To accelerate progress in the integration of these data types, we held a two-day Gene Variation to 3D (GVto3D) workshop to report on the latest advances and to discuss unmet needs. The overarching goal of the workshop was to address the question: what can be done together as a community to advance the integration of genetic variants and 3D protein structures that could not be done by a single investigator or laboratory? Here we describe the workshop outcomes, review the state of the field, and propose the development of a framework with which to promote progress in this arena. The framework will include a set of standard formats, common ontologies, a common application programming interface to enable interoperation of the resources, and a Tool Registry to make it easy to find and apply the tools to specific analysis problems. Interoperability will enable integration of diverse data sources and tools and collaborative development of variant effect prediction methods.


Subject(s)
Genome-Wide Association Study/methods , Polymorphism, Genetic , Protein Conformation , Sequence Analysis, Protein/methods , Algorithms , Congresses as Topic , Genome-Wide Association Study/standards , Humans , Sequence Analysis, Protein/standards
8.
Hum Immunol ; 77(3): 249-256, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26621609

ABSTRACT

Genotype list (GL) Strings use a set of hierarchical character delimiters to represent allele and genotype ambiguity in HLA and KIR genotypes in a complete and accurate fashion. A RESTful web service called genotype list service was created to allow users to register a GL string and receive a unique identifier for that string in the form of a URI. By exchanging URIs and dereferencing them through the GL service, users can easily transmit HLA genotypes in a variety of useful formats. The GL service was developed to be secure, scalable, and persistent. An instance of the GL service is configured with a nomenclature and can be run in strict or non-strict modes. Strict mode requires alleles used in the GL string to be present in the allele database using the fully qualified nomenclature. Non-strict mode allows any GL string to be registered as long as it is syntactically correct. The GL service source code is free and open source software, distributed under the GNU Lesser General Public License (LGPL) version 3 or later.
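
The register/dereference cycle and the two modes described above can be sketched with a toy in-memory registry. This is not the GL service's API: the class name, the `https://gl.example.org/` base URI, the hash-based identifier, and the "no empty pieces" syntax check are all assumptions made for illustration, as is the delimiter set (`^ | + ~ /`).

```python
import hashlib

# Assumed GL String delimiters; not listed in the abstract.
DELIMITERS = "^|+~/"


class GlRegistry:
    """Toy stand-in for one GL service instance (hypothetical, in-memory)."""

    def __init__(self, known_alleles, strict=True,
                 base_uri="https://gl.example.org/genotype-list/"):
        self.known_alleles = set(known_alleles)  # the configured nomenclature
        self.strict = strict
        self.base_uri = base_uri
        self.store = {}

    @staticmethod
    def _alleles(gl):
        names = [gl]
        for delimiter in DELIMITERS:
            names = [p for name in names for p in name.split(delimiter)]
        return names

    def register(self, gl):
        """Store a GL String and return a URI-like identifier for it."""
        names = self._alleles(gl)
        # Simplified syntax check (assumption): no empty allele slots.
        if any(name == "" for name in names):
            raise ValueError("syntactically invalid GL String")
        if self.strict:
            unknown = [n for n in names if n not in self.known_alleles]
            if unknown:
                raise ValueError(f"alleles not in nomenclature: {unknown}")
        key = hashlib.sha1(gl.encode("utf-8")).hexdigest()[:10]
        uri = self.base_uri + key
        self.store[uri] = gl
        return uri

    def dereference(self, uri):
        """Return the GL String previously registered under this URI."""
        return self.store[uri]


if __name__ == "__main__":
    registry = GlRegistry({"HLA-A*01:01", "HLA-A*24:02"}, strict=True)
    uri = registry.register("HLA-A*01:01+HLA-A*24:02")
    print(uri, "->", registry.dereference(uri))
```

Deriving the identifier from the string's content means the same GL String always maps to the same URI, which is one plausible way to keep identifiers persistent; the abstract does not say how the service itself assigns them.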


Subject(s)
Databases, Genetic , Genotype , HLA-C Antigens/genetics , Receptors, KIR/genetics , Web Browser , Alleles , Computational Biology/methods , Genetic Loci , Humans , Software
9.
PeerJ ; 3: e1273, 2015.
Article in English | MEDLINE | ID: mdl-26421241

ABSTRACT

Genomic pipelines consist of several pieces of third-party software and, because of their experimental nature, frequent changes and updates are commonly necessary, thus raising serious deployment and reproducibility issues. Docker containers are emerging as a possible solution for many of these problems, as they allow the packaging of pipelines in an isolated and self-contained manner. This makes it easy to distribute and execute pipelines in a portable manner across a wide range of computing platforms. Thus, the question that arises is to what extent the use of Docker containers might affect the performance of these pipelines. Here we address this question and conclude that Docker containers have only a minor impact on the performance of common genomic pipelines, which is negligible when the executed jobs are long in terms of computational time.

10.
Hum Immunol ; 76(12): 954-62, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26407912

ABSTRACT

The development of next-generation sequencing (NGS) technologies for HLA and KIR genotyping is rapidly advancing knowledge of genetic variation of these highly polymorphic loci. NGS genotyping is poised to replace older methods for clinical use, but standard methods for reporting and exchanging these new, high quality genotype data are needed. The Immunogenomic NGS Consortium, a broad collaboration of histocompatibility and immunogenetics clinicians, researchers, instrument manufacturers and software developers, has developed the Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) reporting guidelines. MIRING is a checklist that specifies the content of NGS genotyping results as well as a set of messaging guidelines for reporting the results. A MIRING message includes five categories of structured information - message annotation, reference context, full genotype, consensus sequence and novel polymorphism - and references to three categories of accessory information - NGS platform documentation, read processing documentation and primary data. These eight categories of information ensure the long-term portability and broad application of this NGS data for all current histocompatibility and immunogenetics use cases. In addition, MIRING can be extended to allow the reporting of genotype data generated using pre-NGS technologies. Because genotyping results reported using MIRING are easily updated in accordance with reference and nomenclature databases, MIRING represents a bold departure from previous methods of reporting HLA and KIR genotyping results, which have provided static and less-portable data. More information about MIRING can be found online at miring.immunogenomics.org.


Subject(s)
Genotyping Techniques , HLA Antigens/genetics , High-Throughput Nucleotide Sequencing/methods , Histocompatibility Testing , Potassium Channels, Inwardly Rectifying/genetics , Research Report , Guidelines as Topic , High-Throughput Nucleotide Sequencing/standards , Humans , Research Report/standards
11.
Hum Immunol ; 76(12): 963-74, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26319908

ABSTRACT

We present an electronic format for exchanging data for HLA and KIR genotyping with extensions for next-generation sequencing (NGS). This format addresses NGS data exchange by refining the Histoimmunogenetics Markup Language (HML) to conform to the proposed Minimum Information for Reporting Immunogenomic NGS Genotyping (MIRING) reporting guidelines (miring.immunogenomics.org). Our refinements of HML include two major additions. First, NGS is supported by new XML structures to capture additional NGS data and metadata required to produce a genotyping result, including analysis-dependent (dynamic) and method-dependent (static) components. A full genotype, consensus sequence, and the surrounding metadata are included directly, while the raw sequence reads and platform documentation are externally referenced. Second, genotype ambiguity is fully represented by integrating Genotype List Strings, which use a hierarchical set of delimiters to represent allele and genotype ambiguity in a complete and accurate fashion. HML also continues to enable the transmission of legacy methods (e.g. site-specific oligonucleotide, sequence-specific priming, and Sequence Based Typing (SBT)), adding features such as allowing multiple group-specific sequencing primers, and fully leveraging techniques that combine multiple methods to obtain a single result, such as SBT integrated with NGS.


Subject(s)
Genotyping Techniques , HLA Antigens/genetics , High-Throughput Nucleotide Sequencing , Histocompatibility Testing , Potassium Channels, Inwardly Rectifying/genetics , Research Report/standards , Software , Alleles , Computational Biology/methods , Computational Biology/standards , Database Management Systems , Genotype , Humans , Reproducibility of Results , Sequence Analysis, DNA
12.
Bioinformatics ; 28(20): 2693-5, 2012 Oct 15.
Article in English | MEDLINE | ID: mdl-22877863

ABSTRACT

BioJava is an open-source project for processing of biological data in the Java programming language. We have recently released a new version (3.0.5), which is a major update to the code base that greatly extends its functionality. RESULTS: BioJava now consists of several independent modules that provide state-of-the-art tools for protein structure comparison, pairwise and multiple sequence alignments, working with DNA and protein sequences, analysis of amino acid properties, detection of protein modifications and prediction of disordered regions in proteins as well as parsers for common file formats using a biologically meaningful data model. AVAILABILITY: BioJava is an open-source project distributed under the Lesser GPL (LGPL). BioJava can be downloaded from the BioJava website (http://www.biojava.org). BioJava requires Java 1.6 or higher. All inquiries should be directed to the BioJava mailing lists. Details are available at http://biojava.org/wiki/BioJava:MailingLists.


Subject(s)
Proteins/chemistry , Sequence Analysis , Software , Amino Acids/chemistry , Computational Biology , Genomics , Protein Conformation , Protein Processing, Post-Translational , Sequence Alignment , Sequence Analysis, DNA , Sequence Analysis, Protein
13.
Cancer Discov ; 2(5): 401-4, 2012 May.
Article in English | MEDLINE | ID: mdl-22588877

ABSTRACT

The cBio Cancer Genomics Portal (http://cbioportal.org) is an open-access resource for interactive exploration of multidimensional cancer genomics data sets, currently providing access to data from more than 5,000 tumor samples from 20 cancer studies. The cBio Cancer Genomics Portal significantly lowers the barriers between complex genomic data and cancer researchers who want rapid, intuitive, and high-quality access to molecular profiles and clinical attributes from large-scale cancer genomics projects and empowers researchers to translate these rich data sets into biologic insights and clinical applications.


Subject(s)
Database Management Systems , Databases, Factual , Genomics , Neoplasms/genetics , Humans , Internet
14.
Nucleic Acids Res ; 38(6): 1767-71, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20015970

ABSTRACT

FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Being an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.
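
The difference between the variants comes down to the ASCII offset and the score scale. The sketch below assumes the commonly documented conventions (Sanger: PHRED scores at offset 33; Solexa: Solexa scores at offset 64; Illumina 1.3+: PHRED scores at offset 64) and the standard Solexa-to-PHRED mapping; none of these numbers appear in the abstract itself, and the function names are illustrative.

```python
import math


def sanger_to_phred(qual):
    """Decode a Sanger-style quality string (PHRED scores, ASCII offset 33)."""
    return [ord(char) - 33 for char in qual]


def illumina13_to_phred(qual):
    """Decode an Illumina 1.3+ quality string (PHRED scores, ASCII offset 64)."""
    return [ord(char) - 64 for char in qual]


def solexa_to_phred(q_solexa):
    """Map a Solexa score to the PHRED scale.

    Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1)
    """
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def solexa_string_to_sanger(qual):
    """Re-encode a Solexa quality string (offset 64) as Sanger (offset 33)."""
    phred = [round(solexa_to_phred(ord(char) - 64)) for char in qual]
    # Sanger strings use printable ASCII 33..126, i.e. PHRED 0..93.
    return "".join(chr(min(max(q, 0), 93) + 33) for q in phred)


if __name__ == "__main__":
    print(sanger_to_phred("II5+"))
    print(solexa_string_to_sanger("hhTB"))
```

The two scales agree closely for high-quality bases and diverge only at the low end, which is why mislabelled files often go unnoticed until low-quality reads are inspected.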


Subject(s)
Sequence Analysis, DNA , Software , Computational Biology/history , History, 20th Century , History, 21st Century , Sequence Analysis, DNA/history , Sequence Analysis, DNA/standards
15.
Plant Physiol ; 138(1): 38-46, 2005 May.
Article in English | MEDLINE | ID: mdl-15888676

ABSTRACT

An international consortium is sequencing the euchromatic genespace of Medicago truncatula. Extensive bioinformatic and database resources support the marker-anchored bacterial artificial chromosome (BAC) sequencing strategy. Existing physical and genetic maps and deep BAC-end sequencing help to guide the sequencing effort, while EST databases provide essential resources for genome annotation as well as transcriptome characterization and microarray design. Finished BAC sequences are joined into overlapping sequence assemblies and undergo an automated annotation process that integrates ab initio predictions with EST, protein, and other recognizable features. Because of the sequencing project's international and collaborative nature, data production, storage, and visualization tools are broadly distributed. This paper describes databases and Web resources for the project, which provide support for physical and genetic maps, genome sequence assembly, gene prediction, and integration of EST data. A central project Web site at medicago.org/genome provides access to genome viewers and other resources project-wide, including an Ensembl implementation at medicago.org, physical map and marker resources at mtgenome.ucdavis.edu, and genome viewers at the University of Oklahoma (www.genome.ou.edu), the Institute for Genomic Research (www.tigr.org), and Munich Information for Protein Sequences Center (mips.gsf.de).


Subject(s)
Databases, Genetic , Genome, Plant , Internet , Medicago truncatula/genetics , Transcription, Genetic , Base Sequence , Chromosomes, Artificial, Bacterial
16.
BMC Bioinformatics ; 5: 195, 2004 Dec 10.
Article in English | MEDLINE | ID: mdl-15588317

ABSTRACT

BACKGROUND: Gecko (Gene Expression: Computation and Knowledge Organization) is a complete, high-capacity centralized gene expression analysis system, developed in response to the needs of a distributed user community. RESULTS: Based on a client-server architecture, with a centralized repository of typically many tens of thousands of Affymetrix scans, Gecko includes automatic processing pipelines for uploading data from remote sites, a database, a computational engine implementing approximately 50 different analysis tools, and a client application. Among available analysis tools are clustering methods, principal component analysis, supervised classification including feature selection and cross-validation, multi-factorial ANOVA, statistical contrast calculations, and various post-processing tools for extracting data at given error rates or significance levels. On account of its open architecture, Gecko also allows for the integration of new algorithms. The Gecko framework is very general: non-Affymetrix and non-gene expression data can be analyzed as well. A unique feature of the Gecko architecture is the concept of the Analysis Tree (actually, a directed acyclic graph), in which all successive results in ongoing analyses are saved. This approach has proven invaluable in allowing a large (approximately 100 users) and distributed community to share results, and to repeatedly return over a span of years to older and potentially very complex analyses of gene expression data. CONCLUSIONS: The Gecko system is being made publicly available as free software at http://sourceforge.net/projects/geckoe. In totality or in parts, the Gecko framework should prove useful to users and system developers with a broad range of analysis needs.
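
The idea of an analysis graph in which every intermediate result is saved can be illustrated with a small sketch. This is not Gecko's implementation; the `AnalysisNode` class and its methods are hypothetical, and the "analyses" are plain Python functions chosen only to show a node with more than one parent (which is what makes the structure a directed acyclic graph and not a tree).

```python
class AnalysisNode:
    """One step in an analysis graph; its result is computed once and kept."""

    def __init__(self, name, func, parents=()):
        self.name = name
        self.func = func              # called with the parents' results
        self.parents = list(parents)  # several parents make this a DAG
        self.runs = 0                 # how often func was actually executed
        self._done = False
        self._result = None

    def result(self):
        """Return the saved result, computing it (and its ancestors) on demand."""
        if not self._done:
            inputs = [parent.result() for parent in self.parents]
            self._result = self.func(*inputs)
            self.runs += 1
            self._done = True
        return self._result


if __name__ == "__main__":
    raw = AnalysisNode("raw", lambda: [1, 2, 3, 4])
    doubled = AnalysisNode("doubled", lambda xs: [2 * x for x in xs], [raw])
    total = AnalysisNode("total", lambda a, b: sum(a) + sum(b), [raw, doubled])
    print(total.result())  # "raw" feeds two nodes but is computed only once
    print(raw.runs)
```

Because each node keeps its result, a later user can attach a new analysis to any existing node without recomputing what came before it.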


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Software , Carcinoma/classification , Carcinoma/genetics , Carcinoma/pathology , Cell Line, Tumor , Cluster Analysis , Computational Biology/statistics & numerical data , Gene Expression Profiling/classification , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation, Neoplastic/genetics , Genes, Neoplasm/genetics , Humans , Kidney Neoplasms/classification , Kidney Neoplasms/genetics , Kidney Neoplasms/pathology , Oligonucleotide Array Sequence Analysis/classification , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Software Design , User-Computer Interface