Search | Portal Regional da BVS

1.

Quickest Detection of COVID-19 Pandemic Onset.

Braca, P; Gaglione, D; Marano, S; Millefiori, L M; Willett, P; Pattipati, K.

IEEE Signal Process Lett ; 28: 683-687, 2021.

Article in English | MEDLINE | ID: mdl-34163125

ABSTRACT

This paper develops an easily-implementable version of Page's CUSUM quickest-detection test, designed to work in certain composite hypothesis scenarios with time-varying data statistics. The decision statistic can be cast in a recursive form and is particularly suited for on-line analysis. By back-testing our approach on publicly-available COVID-19 data we find reliable early warning of infection flare-ups, in fact sufficiently early that the tool may be of use to decision-makers on the timing of restrictive measures that may in the future need to be taken.
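The recursive form mentioned in the abstract can be illustrated with the textbook version of Page's CUSUM test. The sketch below is not the paper's composite-hypothesis variant: it assumes a known shift in the mean of Gaussian data from `mu0` to `mu1` with known `sigma`, and the function name, parameters, and threshold are illustrative choices.

```python
def cusum_alarm(samples, mu0, mu1, sigma, threshold):
    """Classic Page CUSUM for a Gaussian mean shift (illustrative assumption).

    Returns the index of the first sample at which the statistic exceeds
    `threshold`, or None if no alarm is raised.
    """
    stat = 0.0
    for n, x in enumerate(samples):
        # Log-likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2).
        llr = (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2.0)
        # Recursive update: accumulate evidence, but never drop below zero.
        stat = max(0.0, stat + llr)
        if stat > threshold:
            return n
    return None
```

Because each update needs only the previous value of the statistic and the newest sample, the test can run on-line as data arrive.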

2.

A study on the use of Gumbel approximation with the Bernoulli spatial scan statistic.

Read, S; Bath, P A; Willett, P; Maheswaran, R.

Stat Med ; 32(19): 3300-13, 2013 Aug 30.

Article in English | MEDLINE | ID: mdl-23348825

ABSTRACT

The Bernoulli version of the spatial scan statistic is a well established method of detecting localised spatial clusters in binary labelled point data, a typical application being the epidemiological case-control study. A recent study suggests the inferential accuracy of several versions of the spatial scan statistic (principally the Poisson version) can be improved, at little computational cost, by using the Gumbel distribution, a method now available in SaTScan(TM) (www.satscan.org). We study in detail the effect of this technique when applied to the Bernoulli version and demonstrate that it is highly effective, albeit with some increase in false alarm rates at certain significance thresholds. We explain how this increase is due to the discrete nature of the Bernoulli spatial scan statistic and demonstrate that it can affect even small p-values. Despite this, we argue that the Gumbel method is actually preferable for very small p-values. Furthermore, we extend previous research by running benchmark trials on 12 000 synthetic datasets, thus demonstrating that the overall detection capability of the Bernoulli version (i.e. ratio of power to false alarm rate) is not noticeably affected by the use of the Gumbel method. We also provide an example application of the Gumbel method using data on hospital admissions for chronic obstructive pulmonary disease.

Subjects

Cluster Analysis; Data Interpretation, Statistical; Aged; Aged, 80 and over; Computer Simulation; False Positive Reactions; Female; Hospitalization/statistics & numerical data; Humans; Male; Middle Aged; Pulmonary Disease, Chronic Obstructive/epidemiology

3.

Similarity-based approaches to virtual screening.

Willett, P.

Biochem Soc Trans ; 31(Pt 3): 603-6, 2003 Jun.

Article in English | MEDLINE | ID: mdl-12773164

ABSTRACT

Current similarity measures for virtual screening are based on the use of molecular fingerprints and the Tanimoto coefficient. This paper describes two ways in which one can increase the effectiveness of similarity-based virtual screening: using similarity coefficients other than the Tanimoto coefficient for the comparison of molecular fingerprints; and using a graph-theoretic similarity measure based on the largest substructure common to a pair of molecules.

Subjects

Drug Design; User-Computer Interface; Algorithms; Computer Simulation; Databases, Factual; Drug Evaluation, Preclinical/methods
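As a minimal sketch of the fingerprint comparison described in this abstract, the Tanimoto coefficient and one alternative (the cosine coefficient) can be computed as follows. It assumes each fingerprint is represented as the set of its "on" bit positions; the function names are illustrative and do not come from the paper.

```python
import math


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by on-bits in either."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(fp_a, fp_b):
    """Cosine coefficient: shared on-bits over the geometric mean of sizes."""
    a, b = set(fp_a), set(fp_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))
```

For two fingerprints with on-bits {1, 2, 3} and {2, 3, 4}, the Tanimoto coefficient is 2/4 and the cosine coefficient is 2/3, so the two measures can score, and therefore rank, the same pair differently.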

4.

Protein structures and information extraction from biological texts: the PASTA system.

Gaizauskas, R; Demetriou, G; Artymiuk, P J; Willett, P.

Bioinformatics ; 19(1): 135-43, 2003 Jan.

Article in English | MEDLINE | ID: mdl-12499303

ABSTRACT

MOTIVATION: The rapid increase in volume of protein structure literature means useful information may be hidden or lost in the published literature and the process of finding relevant material, sometimes the rate-determining factor in new research, may be arduous and slow. RESULTS: We describe the Protein Active Site Template Acquisition (PASTA) system, which addresses these problems by performing automatic extraction of information relating to the roles of specific amino acid residues in protein molecules from online scientific articles and abstracts. Both the terminology recognition and extraction capabilities of the system have been extensively evaluated against manually annotated data and the results compare favourably with state-of-the-art results obtained in less challenging domains. PASTA is the first information extraction (IE) system developed for the protein structure domain and one of the most thoroughly evaluated IE systems operating on biological scientific text to date. AVAILABILITY: PASTA makes its extraction results available via a browser-based front end: http://www.dcs.shef.ac.uk/nlp/pasta/. The evaluation resources (manually annotated corpora) are also available through the website: http://www.dcs.shef.ac.uk/nlp/pasta/results.html.

Subjects

Databases, Bibliographic; Information Storage and Retrieval/methods; Natural Language Processing; Proteins/chemistry; Abstracting and Indexing/methods; Algorithms; Databases, Protein; MEDLINE; Periodicals as Topic; Protein Conformation; Proteins/classification; Proteins/genetics; Publications; Sequence Alignment/methods; Structure-Activity Relationship

5.

Bayesian classification and feature reduction using uniform Dirichlet priors.

Lynch, R R; Willett, P K.

IEEE Trans Syst Man Cybern B Cybern ; 33(3): 448-64, 2003.

Article in English | MEDLINE | ID: mdl-18238191

ABSTRACT

In this paper, a method of classification referred to as the Bayesian data reduction algorithm (BDRA) is developed. The algorithm is based on the assumption that the discrete symbol probabilities of each class are a priori uniformly Dirichlet distributed, and it employs a "greedy" approach (which is similar to a backward sequential feature search) for reducing irrelevant features from the training data of each class. Notice that reducing irrelevant features is synonymous here with selecting those features that provide best classification performance; the metric for making data-reducing decisions is an analytic for the probability of error conditioned on the training data. To illustrate its performance, the BDRA is applied both to simulated and to real data, and it is also compared to other classification methods. Further, the algorithm is extended to deal with the problem of missing features in the data. Results demonstrate that the BDRA performs well despite its relative simplicity. This is significant because the BDRA differs from many other classifiers; as opposed to adjusting the model to obtain a "best fit" for the data, the data, through its quantization, is itself adjusted.

6.

Grouping of coefficients for the calculation of inter-molecular similarity and dissimilarity using 2D fragment bit-strings.

Holliday, J D; Hu, C-Y; Willett, P.

Comb Chem High Throughput Screen ; 5(2): 155-66, 2002 Mar.

Article in English | MEDLINE | ID: mdl-11966424

ABSTRACT

This paper compares 22 different similarity coefficients when they are used for searching databases of 2D fragment bit-strings. Experiments with the National Cancer Institute's AIDS and IDAlert databases show that the coefficients fall into several well-marked clusters, in which the members of a cluster will produce comparable rankings of a set of molecules. These clusters provide a basis for selecting combinations of coefficients for use in data fusion experiments. The results of these experiments provide a simple way of increasing the effectiveness of fragment-based similarity searching systems.

Subjects

Database Management Systems; Information Storage and Retrieval; Molecular Structure; Acquired Immunodeficiency Syndrome; Humans

7.

Recent developments in chemoinformatics education.

Schofield, H; Wiggins, G; Willett, P.

Drug Discov Today ; 6(18): 931-934, 2001 Sep 15.

Article in English | MEDLINE | ID: mdl-11546604

8.

Visual and computational analysis of structure-activity relationships in high-throughput screening data.

Gedeck, P; Willett, P.

Curr Opin Chem Biol ; 5(4): 389-95, 2001 Aug.

Article in English | MEDLINE | ID: mdl-11470601

ABSTRACT

Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. Recent work in visualisation and data mining has been used to develop structure-activity relationships from such chemical-biological datasets.

Subjects

Computational Biology; Structure-Activity Relationship

9.

Protein docking using a genetic algorithm.

Gardiner, E J; Willett, P; Artymiuk, P J.

Proteins ; 44(1): 44-56, 2001 Jul 01.

Article in English | MEDLINE | ID: mdl-11354005

ABSTRACT

A genetic algorithm (GA) for protein-protein docking is described, in which the proteins are represented by dot surfaces calculated using the Connolly program. The GA is used to move the surface of one protein relative to the other to locate the area of greatest surface complementarity between the two. Surface dots are deemed complementary if their normals are opposed, their Connolly shape type is complementary, and their hydrogen bonding or hydrophobic potential is fulfilled. Overlap of the protein interiors is penalized. The GA is tested on 34 large protein-protein complexes where one or both proteins has been crystallized separately. Parameters are established for which 30 of the complexes have at least one near-native solution ranked in the top 100. We have also successfully reassembled a 1,400-residue heptamer based on the top-ranking GA solution obtained when docking two bound subunits.

Subjects

Algorithms; Antigen-Antibody Complex/chemistry; Enzyme Inhibitors/chemistry; Enzymes/chemistry; Antigen-Antibody Complex/immunology; Enzyme Inhibitors/metabolism; Enzymes/metabolism; Macromolecular Substances; Membrane Proteins/chemistry; Models, Molecular; Protein Subunits; Surface Properties

10.

SuperStar: improved knowledge-based interaction fields for protein binding sites.

Verdonk, M L; Cole, J C; Watson, P; Gillet, V; Willett, P.

J Mol Biol ; 307(3): 841-59, 2001 Mar 30.

Article in English | MEDLINE | ID: mdl-11273705

ABSTRACT

SuperStar is an empirical method for identifying interaction sites in proteins, based entirely on experimental information about non-bonded interactions occurring in small-molecule crystal structures, taken from the IsoStar database. We describe recent modifications and additions to SuperStar, validating the results on a test set of 122 X-ray structures of protein-ligand complexes. In this validation, propensity maps are generated for all the binding sites of these proteins, using four different probes: a charged NH3(+) nitrogen atom, a carbonyl oxygen atom, a hydroxyl oxygen atom and a methyl carbon atom. Next, the maps are compared with the experimentally observed positions of ligand atoms of these types. A peak-searching algorithm is introduced that highlights potential interaction hot spots. For the three hydrogen-bonding probes - NH3(+) nitrogen atom, carbonyl oxygen atom and hydroxyl oxygen atom - the average distance from the ligand atom to the nearest SuperStar peak is 1.0-1.2 Å (0.8-1.0 Å for solvent-inaccessible ligand atoms). For the methyl carbon atom probe, this distance is about 1.5 Å, probably because interactions to methyl groups are much less directional. The most important addition to SuperStar is the enabling of propensity maps around metal centres - Ca(2+), Mg(2+) and Zn(2+) - in protein binding sites. The results are validated on a test set of 24 protein-ligand complexes that have a metal ion in their binding site. Coordination geometries are derived automatically, using only the protein atoms that coordinate to the metal ion. The correct coordination geometry is derived in approximately 75% of the cases. If the derived geometry is assumed during the SuperStar calculation, the average distance from a ligand atom coordinating to the metal ion to the nearest peak in the propensity map for an oxygen probe is 0.87(7) Å. If the correct coordination geometry is imposed, this distance reduces to 0.59(7) Å. This indicates that the SuperStar predictions around metal-binding sites are at least as good as those around other protein groups. Using clustering techniques, a non-redundant set of probes is selected from the set of probes available in the IsoStar database. The performance in SuperStar of all these probes is tested on the test set of protein-ligand complexes. With the exception of the "ether oxygen" probe and the "any NH(+)" probe, all new probes perform as well as the four probes introduced first.

Subjects

Computer Simulation; Metals/metabolism; Proteins/chemistry; Proteins/metabolism; Algorithms; Binding Sites; Carbon/metabolism; Cluster Analysis; Crystallography, X-Ray; Databases as Topic; Hydrogen/metabolism; Ligands; Models, Molecular; Nitrogen/metabolism; Oxygen/metabolism; Pliability; Protein Binding; Protein Conformation; Reproducibility of Results; Water/chemistry; Water/metabolism

11.

Calculating the knowledge-based similarity of functional groups using crystallographic data.

Watson, P; Willett, P; Gillet, V J; Verdonk, M L.

J Comput Aided Mol Des ; 15(9): 835-57, 2001 Sep.

Article in English | MEDLINE | ID: mdl-11776294

ABSTRACT

A knowledge-based method for calculating the similarity of functional groups is described and validated. The method is based on experimental information derived from small molecule crystal structures. These data are used in the form of scatterplots that show the likelihood of a non-bonded interaction being formed between functional group A (the 'central group') and functional group B (the 'contact group' or 'probe'). The scatterplots are converted into three-dimensional maps that show the propensity of the probe at different positions around the central group. Here we describe how to calculate the similarity of a pair of central groups based on these maps. The similarity method is validated using bioisosteric functional group pairs identified in the Bioster database and Relibase. The Bioster database is a critical compilation of thousands of bioisosteric molecule pairs, including drugs, enzyme inhibitors and agrochemicals. Relibase is an object-oriented database containing structural data about protein-ligand interactions. The distributions of the similarities of the bioisosteric functional group pairs are compared with similarities for all the possible pairs in IsoStar, and are found to be significantly different. Enrichment factors are also calculated showing the similarity method is statistically significantly better than random in predicting bioisosteric functional group pairs.

Subjects

Artificial Intelligence; Crystallography; Binding Sites; Computer Simulation; Models, Chemical; Models, Molecular

12.

Automatic generation of alignments for 3D QSAR analyses.

Jewell, N E; Turner, D B; Willett, P; Sexton, G J.

J Mol Graph Model ; 20(2): 111-21, 2001.

Article in English | MEDLINE | ID: mdl-11774998

ABSTRACT

Many 3D QSAR methods require the alignment of the molecules in a dataset, which can require a fair amount of manual effort in deciding upon a rational basis for the superposition. This paper describes the use of FBSS, a program for field-based similarity searching in chemical databases, for generating such alignments automatically. The CoMFA and CoMSIA experiments with several literature datasets show that the QSAR models resulting from the FBSS alignments are broadly comparable in predictive performance with the models resulting from manual alignments.

Subjects

Computer Simulation; Drug Design; Quantitative Structure-Activity Relationship; Databases, Protein; Models, Molecular; Sequence Alignment/statistics & numerical data; Software

13.

The EVA spectral descriptor.

Turner, D B; Willett, P.

Eur J Med Chem ; 35(4): 367-75, 2000 Apr.

Article in English | MEDLINE | ID: mdl-10858598

ABSTRACT

The EVA descriptor is derived from fundamental IR and Raman range molecular vibrational frequencies. EVA is sensitive to 3-D structure, but has an advantage over field-based 3-D QSAR methods inasmuch as it is invariant to both translation and rotation of the structures concerned and thus structural superposition is not required. The latter property and the demonstration of the effectiveness of the descriptor for QSAR means that EVA has been the subject of a great deal of interest from the modelling community. This review describes the derivation of the descriptor, details its main parameters and how to apply them, and provides an overview of the validation that has been done with the descriptor. A recent enhancement to the technique is described which involves the localised adjustment of variance in such a way that enhanced internal and external predictability may be obtained. Despite the statistical quality of EVA QSAR models, the main draw-back to the descriptor at present is the difficulty associated with back-tracking from a PLS model to an EVA pharmacophore. Brief comment is made on the use of the EVA descriptor for diversity studies and the similarity searching of chemical structure databases.

Subjects

Spectrum Analysis/methods; Spectrum Analysis/standards; Structure-Activity Relationship

14.

Bit-string methods for selective compound acquisition.

Rhodes N; Willett P; Dunbar JB; Humblet C.

J Chem Inf Comput Sci ; 40(2): 210-4, 2000 Mar.

Article in English | MEDLINE | ID: mdl-10761120

ABSTRACT

Selective compound acquisition programs need to ensure that the compounds that are chosen do not contain undesirable functionality. This is easy to achieve if a supplier is prepared to provide unambiguous structure representations for the compounds that they have available: this paper discusses selection techniques that can be used when a supplier is prepared to make available only fragment bit-string representations for the compounds in their catalog. Experiments with three databases and three types of bit-string show that a simple k-nearest-neighbor searching method provides a surprisingly effective, although far from perfect, way of selecting compounds when only bit-string representations are available. A second approach, based on the use of a fragment weighting scheme analogous to those used in substructural analysis studies, proved to be noticeably less effective in operation.

15.

Graph-theoretic techniques for macromolecular docking.

Gardiner EJ; Willett P; Artymiuk PJ.

J Chem Inf Comput Sci ; 40(2): 273-9, 2000 Mar.

Article in English | MEDLINE | ID: mdl-10761128

ABSTRACT

We propose a solution to the problem of docking two macromolecules. We represent each of two proteins as a set of potential hydrogen bond donors and acceptors and use a clique-detection algorithm to find maximally complementary sets of donor/acceptor pairs. Preliminary results are presented which demonstrate the feasibility of the method.

16.

Similarity searching in files of three-dimensional chemical structures: analysis of the BIOSTER database using two-dimensional fingerprints and molecular field descriptors.

Schuffenhauer A; Gillet VJ; Willett P.

J Chem Inf Comput Sci ; 40(2): 295-307, 2000 Mar.

Article in English | MEDLINE | ID: mdl-10761131

ABSTRACT

This paper compares the effectiveness of similarity measures based on two-dimensional fingerprints and on molecular fields for identifying pairs of bioisosteric molecules in the BIOSTER database. The results suggest that the two types of descriptor are complementary in nature, each finding some bioisosteric pairs that are not found by the other. This conclusion is confirmed by studies of groups of BIOSTER molecules that share the same activity characteristics, and by experiments that involve combining the two types of similarity measure.

17.

Evaluation of the EVA descriptor for QSAR studies: 3. The use of a genetic algorithm to search for models with enhanced predictive properties (EVA_GA).

Turner, D B; Willett, P.

J Comput Aided Mol Des ; 14(1): 1-21, 2000 Jan.

Article in English | MEDLINE | ID: mdl-10702922

ABSTRACT

The EVA structural descriptor, based upon calculated fundamental molecular vibrational frequencies, has proved to be an effective descriptor for both QSAR and database similarity calculations. The descriptor is sensitive to 3D structure but has an advantage over field-based 3D-QSAR methods inasmuch as structural superposition is not required. The original technique involves a standardisation method wherein uniform Gaussians of fixed standard deviation (sigma) are used to smear out frequencies projected onto a linear scale. The smearing function permits the overlap of proximal frequencies and thence the extraction of a fixed dimensional descriptor regardless of the number and precise values of the frequencies. It is proposed here that there exist optimal localised values of sigma in different spectral regions; that is, the overlap of frequencies using uniform Gaussians may, at certain points in the spectrum, either be insufficient to pick up relationships where they exist or mix up information to such an extent that significant correlations are obscured by noise. A genetic algorithm is used to search for optimal localised sigma values using crossvalidated PLS regression scores as the fitness score to be optimised. The resultant models were then validated against a previously unseen test set of compounds and through data scrambling. The performance of EVA_GA is compared to that of EVA and analogous CoMFA studies; in the latter case a brief evaluation is made of the effect of grid resolution upon the stability of CoMFA PLS scores particularly in relation to test set predictions.

Subjects

Algorithms; Structure-Activity Relationship; Databases, Factual; Drug Design; Ligands; Models, Genetic; Receptors, Cell Surface/metabolism; Receptors, Cytoplasmic and Nuclear/metabolism; Receptors, Melatonin; Software; Transcortin/metabolism
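The Gaussian smearing step described in this abstract can be sketched as follows. This is a rough illustration rather than the authors' implementation: the spectral range, sampling step, and `sigma` defaults are assumed values chosen for the example, and the function name is hypothetical.

```python
import math


def eva_like_descriptor(frequencies, sigma=10.0, step=5.0, max_freq=4000.0):
    """Smear vibrational frequencies with uniform Gaussians and sample the sum.

    All default values are illustrative assumptions. The output length depends
    only on `step` and `max_freq`, never on how many frequencies are supplied.
    """
    n_bins = int(max_freq / step)
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    descriptor = []
    for i in range(n_bins):
        x = (i + 1) * step  # sampling point on the linear frequency scale
        # Sum of Gaussian kernels centred on each frequency, evaluated at x.
        value = sum(
            norm * math.exp(-((x - f) ** 2) / (2.0 * sigma ** 2))
            for f in frequencies
        )
        descriptor.append(value)
    return descriptor
```

A molecule with one frequency and a molecule with thirty both yield a vector of the same length, which is what makes the descriptor usable in a regression model. The localised variant proposed in the abstract would replace the single `sigma` with values that differ by spectral region.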

18.

Chemoinformatics - similarity and diversity in chemical libraries.

Willett, P.

Curr Opin Biotechnol ; 11(1): 85-8, 2000 Feb.

Article in English | MEDLINE | ID: mdl-10679335

ABSTRACT

Molecular similarity and molecular diversity techniques lie at the heart of attempts to design structurally diverse combinatorial libraries for the identification of novel bioactive compounds. Recent advances include the development of new types of selection algorithm, the validation of such algorithms, the use of filtering systems to screen out undesirable molecules prior to the design of a library, and the integration of similarity and diversity analysis with other methods for computer-aided molecular design.

Subjects

Combinatorial Chemistry Techniques/methods; Computer-Aided Design; Drug Design; Algorithms; Reproducibility of Results; Structure-Activity Relationship

19.

Effectiveness of retrieval in similarity searches of chemical databases: a review of performance measures.

Edgar, S J; Holliday, J D; Willett, P.

J Mol Graph Model ; 18(4-5): 343-57, 2000.

Article in English | MEDLINE | ID: mdl-11143554

ABSTRACT

This article reviews measures for evaluating the effectiveness of similarity searches in chemical databases, drawing principally upon the many measures that have been described previously for evaluating the performance of text search engines. The use of the various measures is exemplified by fragment-based 2D similarity searches on several databases for which both structural and bioactivity data are available. It is concluded that the cumulative recall and G-H score measures are the most useful of those tested.

Subjects

Chemistry; Databases, Factual; Information Systems; Chemical Phenomena; Models, Chemical; Molecular Structure

20.

Dissimilarity-based algorithms for selecting structurally diverse sets of compounds.

Willett, P.

J Comput Biol ; 6(3-4): 447-57, 1999.

Article in English | MEDLINE | ID: mdl-10582578

ABSTRACT

This paper commences with a brief introduction to modern techniques for the computational analysis of molecular diversity and the design of combinatorial libraries. It then reviews dissimilarity-based algorithms for the selection of structurally diverse sets of compounds in chemical databases. Procedures are described for selecting a diverse subset of an entire database, and for selecting diverse combinatorial libraries using both reagent-based and product-based selection.

Subjects

Algorithms; Combinatorial Chemistry Techniques; Molecular Structure; Cluster Analysis; Databases, Factual; Drug Design; Indicators and Reagents
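One common member of the dissimilarity-based family is a greedy "MaxMin" selection, sketched below. The abstract does not name specific algorithms, so treating MaxMin as representative is an assumption, as are the set-of-on-bits fingerprint representation, the Tanimoto-based distance, and the choice to seed with the first compound.

```python
def tanimoto_distance(fp_a, fp_b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return 1.0 - (len(a & b) / union) if union else 0.0


def maxmin_select(fingerprints, k, distance=tanimoto_distance):
    """Greedily pick up to k indices so that each new pick is as far as
    possible from its nearest already-selected compound."""
    if k <= 0 or not fingerprints:
        return []
    selected = [0]  # assumption: seed with the first compound
    target = min(k, len(fingerprints))
    while len(selected) < target:
        best_idx, best_score = None, -1.0
        for i, fp in enumerate(fingerprints):
            if i in selected:
                continue
            # Distance from this candidate to its closest selected compound.
            score = min(distance(fp, fingerprints[j]) for j in selected)
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected
```

Given the fingerprints `[{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]` and `k=2`, the sketch keeps the seed and then picks the third compound, which shares no bits with it.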
