1.
Nucleic Acids Res ; 50(W1): W108-W114, 2022 Jul 05.
Article in English | MEDLINE | ID: mdl-35524558

ABSTRACT

Computational models have great potential to accelerate bioscience, bioengineering, and medicine. However, it remains challenging to reproduce and reuse simulations, in part, because the numerous formats and methods for simulating various subsystems and scales remain siloed by different software tools. For example, each tool must be executed through a distinct interface. To help investigators find and use simulation tools, we developed BioSimulators (https://biosimulators.org), a central registry of the capabilities of simulation tools and consistent Python, command-line and containerized interfaces to each version of each tool. The foundation of BioSimulators is standards, such as CellML, SBML, SED-ML and the COMBINE archive format, and validation tools for simulation projects and simulation tools that ensure these standards are used consistently. To help modelers find tools for particular projects, we have also used the registry to develop recommendation services. We anticipate that BioSimulators will help modelers exchange, reproduce, and combine simulations.


Subject(s)
Computer Simulation , Software , Humans , Bioengineering , Models, Biological , Registries , Research Personnel
2.
PLoS Biol ; 15(6): e2001414, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28662064

ABSTRACT

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.


Subject(s)
Biological Science Disciplines/methods , Computational Biology/methods , Data Mining/methods , Software Design , Software , Biological Science Disciplines/statistics & numerical data , Biological Science Disciplines/trends , Computational Biology/trends , Data Mining/statistics & numerical data , Data Mining/trends , Databases, Factual/statistics & numerical data , Databases, Factual/trends , Forecasting , Humans , Internet
3.
Proteins ; 61(4): 1075-88, 2005 Dec 01.
Article in English | MEDLINE | ID: mdl-16247798

ABSTRACT

Considering the limited success of the most sophisticated docking methods available and the amount of computation required for systematic docking, cataloging all the known interfaces may be an alternative basis for the prediction of protein tertiary and quaternary structures. We classify domain interfaces according to the geometry of domain-domain association. By applying a simple and efficient method called "interface tag clustering," more than 4,000 distinct types of domain interfaces are collected from Protein Quaternary Structure Server and Protein Data Bank. Given a pair of interacting domains, we define "face" as the set of interacting residues in each single domain and the pair of interacting faces as an "interface." We investigate how the geometry of interfaces relates to a network of interacting protein families, such as how many different binding orientations are possible between two families or whether a family uses distinct surfaces or the same surface when the family has diverse interaction partners from various families. We show there are, on average, 1.2-1.9 different types of interfaces between interacting domains and a significant number of family pairs associate in multiple orientations. In general, a family tends to use distinct faces for each partner when the family has diverse interaction partners. Each face is highly specific to its interaction partner and the binding orientation. The relative positions of interface residues are generally well conserved within the same type of interface even between remote homologs. The classification result is available at http://www.biotec.tu-dresden.de/~wkim/supplement.
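The "face" and "interface" definitions in this abstract can be illustrated with a minimal sketch. It is not the authors' interface tag clustering method. It assumes, hypothetically, that each residue is reduced to a single 3D point and that two residues interact when those points lie within a distance cutoff; the 5.0 Å cutoff, the function name `find_interface`, and the residue identifiers are all illustrative assumptions that do not come from the abstract.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def find_interface(domain_a, domain_b, cutoff=5.0):
    """Return (face_a, face_b) for two domains.

    Each domain maps a residue id to one (x, y, z) point.
    ASSUMPTION: residues "interact" when their points are within
    `cutoff` of each other; the abstract does not state the criterion.
    """
    face_a, face_b = set(), set()
    for (id_a, xyz_a), (id_b, xyz_b) in product(domain_a.items(), domain_b.items()):
        if dist(xyz_a, xyz_b) <= cutoff:
            face_a.add(id_a)  # residue of domain A that touches domain B
            face_b.add(id_b)  # residue of domain B that touches domain A
    return face_a, face_b


# Hypothetical toy coordinates, for illustration only.
domain_a = {"A10": (0.0, 0.0, 0.0), "A11": (2.0, 0.0, 0.0), "A50": (30.0, 0.0, 0.0)}
domain_b = {"B7": (0.0, 4.0, 0.0), "B90": (0.0, 40.0, 0.0)}

# The interface is the pair of interacting faces.
face_a, face_b = find_interface(domain_a, domain_b)
```

Here each face is a set of residue identifiers, so two interfaces between the same family pair can be compared by checking whether they reuse the same residues or distinct surfaces.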


Subject(s)
Proteins/chemistry , Amino Acid Sequence , Binding Sites , Crystallography, X-Ray , Markov Chains , Models, Theoretical , Protein Conformation , Sequence Alignment
4.
Appl Bioinformatics ; 4(2): 131-5, 2005.
Article in English | MEDLINE | ID: mdl-16128614

ABSTRACT

Receiver operating characteristic (ROC) analysis is a powerful and widely used technique for assessing predictive methods, yet there are no generic, open-source software tools for this that are freely available. Our ROCPLOT program performs ROC analysis on one or more files of search results (hits) and generates the following: (i) ROC values, giving a convenient numerical measure of method sensitivity and specificity; (ii) ROC plots graphically displaying sensitivity and specificity; (iii) classification plots to aid interpretation of the ROC plots and values; and (iv) a bar chart of the distribution of ROC values. ROCPLOT is generic and flexible: data in multiple hits files can be processed in series or parallel, allowing the results of multiple predictions to be viewed side-by-side or combined. AVAILABILITY: ROCPLOT is freely available for download as part of the European Molecular Biology Open Software Suite, EMBOSS (http://emboss.sourceforge.net/apps/rocplot.html).
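As a rough illustration of what a ROC analysis of a ranked hits list involves, the sketch below builds the points of a ROC plot and the area under it. It is not ROCPLOT's implementation or file format: the function names `roc_points` and `area_under_curve`, and the representation of a hits file as a list of booleans ordered from best to worst score, are assumptions made for this example, and tied scores are ignored.

```python
def roc_points(labels):
    """Return ROC plot points as (false positive rate, true positive rate).

    `labels` lists the hits from best score to worst; True marks a
    correct hit. ASSUMPTION: no tied scores, one point per hit.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one true and one false hit")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for is_positive in labels:
        if is_positive:
            tp += 1
        else:
            fp += 1
        # x is 1 - specificity, y is sensitivity
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical ranked hits: True = correct prediction.
hits = [True, True, False, True, False]
curve = roc_points(hits)
auc = area_under_curve(curve)
```

An area of 1.0 would mean every correct hit ranks above every incorrect one, while 0.5 corresponds to a random ordering.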


Subject(s)
Computer Graphics , Data Interpretation, Statistical , Models, Biological , ROC Curve , Software , User-Computer Interface , Models, Statistical
5.
Protein Sci ; 14(1): 13-23, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15608116

ABSTRACT

We identified key residues from the structural alignment of families of protein domains from SCOP which we represented in the form of sparse protein signatures. A signature-generating algorithm (SigGen) was developed and used to automatically identify key residues based on several structural and sequence-based criteria. The capacity of the signatures to detect related sequences from the SWISSPROT database was assessed by receiver operator characteristic (ROC) analysis and jack-knife testing. Test signatures for families from each of the main SCOP classes are described in relation to the quality of the structural alignments, the SigGen parameters used, and their diagnostic performance. We show that automatically generated signatures are potently diagnostic for their family (ROC50 scores typically >0.8), consistently outperform random signatures, and can identify sequence relationships in the "twilight zone" of protein sequence similarity (<40%). Signatures based on 15%-30% of alignment positions occurred most frequently among the best-performing signatures. When alignment quality is poor, sparser signatures perform better, whereas signatures generated from higher-quality alignments of fewer structures require more positions to be diagnostic. Our validation of signatures from the Globin family shows that when sequences from the structural alignment are removed and new signatures generated, the omitted sequences are still detected. The positions highlighted by the signature often correspond (alignment specificity >0.7) to the key positions in the original (non-jack-knifed) alignment. We discuss potential applications of sparse signatures in sequence annotation and homology modeling.
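The abstract reports ROC50 scores without defining them. The sketch below assumes the commonly used ROC_n definition: the ROC curve is truncated at the first n false positives, and the number of true positives ranked above each of those false positives is summed and normalized by n times the total number of positives. The function name `roc_n`, the boolean hit list, and the handling of lists with fewer than n false positives are assumptions made for illustration, not details taken from the paper.

```python
def roc_n(labels, n, total_positives):
    """Truncated ROC score in [0, 1] (ROC50 when n = 50).

    `labels` lists hits from best score to worst; True marks a true
    family member. `total_positives` is the number of true members in
    the searched database, which may exceed the number in the list.
    """
    tp = 0    # true positives seen so far
    fp = 0    # false positives seen so far
    area = 0  # sum of true positives ranked above each false positive
    for is_positive in labels:
        if is_positive:
            tp += 1
        else:
            fp += 1
            area += tp
            if fp == n:
                break
    # ASSUMPTION: if the list ends before n false positives, the missing
    # false positives are treated as ranking below every hit seen.
    area += (n - fp) * tp
    return area / (n * total_positives)


# Hypothetical ranked hits, scored with n = 2 to keep the example small.
hits = [True, False, True, True, False, False]
score = roc_n(hits, n=2, total_positives=3)
```

Under this definition, a score above 0.8 means that most true family members rank ahead of the first n false positives.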


Subject(s)
Protein Structure, Tertiary , Proteins/chemistry , Proteins/classification , Algorithms , Amino Acid Sequence , Databases, Protein , Evaluation Studies as Topic , Globins/chemistry , Globins/classification , Molecular Sequence Data , Sequence Alignment/methods