
Drug discovery using very large numbers of patents: general strategy with extensive use of match and edit operations.

Robson, Barry; Li, Jin; Dettinger, Richard; Peters, Amanda; Boyer, Stephen K.

J Comput Aided Mol Des ; 25(5): 427-41, 2011 May.

Article in English | MEDLINE | ID: mdl-21538091

ABSTRACT

A patent data base of 6.7 million compounds generated by a very high performance computer (Blue Gene) requires new techniques for exploitation when extensive use of chemical similarity is involved. Such exploitation includes the taxonomic classification of chemical themes, and data mining to assess mutual information between themes and companies. Importantly, we also launch candidates that evolve by "natural selection" as failure of partial match against the patent data base and their ability to bind to the protein target appropriately, by simulation on Blue Gene. An unusual feature of our method is that algorithms and workflows rely on dynamic interaction between match-and-edit instructions, which in practice are regular expressions. Similarity testing by these uses SMILES strings and, less frequently, graph or connectivity representations. Examining how this performs in high throughput, we note that chemical similarity and novelty are human concepts that largely have meaning by utility in specific contexts. For some purposes, mutual information involving chemical themes might be a better concept.

Subject(s)

Artificial Intelligence , Computer Simulation , Drug Discovery , Information Storage and Retrieval/methods , Patents as Topic/statistics & numerical data , Pattern Recognition, Automated/methods , Algorithms , Data Interpretation, Statistical , Databases, Factual , Humans , Image Interpretation, Computer-Assisted/instrumentation , Small Molecule Libraries
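As a rough illustration of the match-and-edit idea in the abstract above, the sketch below applies a regular expression to a SMILES string and then discards candidates that match a known set. The rule, the theme pattern, the function names, and the example strings are all hypothetical and are not taken from the paper. Plain textual comparison stands in for the paper's partial matching and is much cruder, since one compound can be written as many different SMILES strings.

```python
import re

# Hypothetical edit rule (an assumption for illustration):
# replace the first chlorine atom with fluorine.
CHLORO_TO_FLUORO = (re.compile(r"Cl"), "F")

# Hypothetical "theme": a benzene ring written exactly as c1ccccc1.
BENZENE_THEME = re.compile(r"c1ccccc1")


def apply_edit(smiles, rule):
    """Apply one match-and-edit rule to a SMILES string (first match only)."""
    pattern, replacement = rule
    return pattern.sub(replacement, smiles, count=1)


def has_theme(smiles, theme):
    """Report whether the theme pattern occurs anywhere in the SMILES text."""
    return theme.search(smiles) is not None


def unmatched(candidates, known):
    """Keep candidates whose exact text is absent from the known set."""
    return [s for s in candidates if s not in known]


# Toy stand-in for a patent compound collection.
known = {"Clc1ccccc1"}
seed = "Clc1ccccc1"

edited = apply_edit(seed, CHLORO_TO_FLUORO)
survivors = unmatched([seed, edited], known)

print(edited)
print(survivors)
```

A realistic workflow would presumably canonicalize SMILES or compare molecular graphs before testing for a match; that step is omitted here.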

Rapid analysis of pharmacology for infectious diseases.

Hopkins, Andrew L; Bickerton, G Richard; Carruthers, Ian M; Boyer, Stephen K; Rubin, Harvey; Overington, John P.

Curr Top Med Chem ; 11(10): 1292-300, 2011.

Article in English | MEDLINE | ID: mdl-21401504

ABSTRACT

Pandemic, epidemic and endemic infectious diseases are united by a common problem: how do we rapidly and cost-effectively identify potential pharmacological interventions to treat infections? Given the large number of emerging and neglected infectious diseases and the fact that they disproportionately afflict the poorest members of the global society, new ways of thinking are required to develop high productivity discovery systems that can be applied to a larger number of pathogens. The growing availability of parasite genome data provides the basis for developing methods to prioritize, a priori, the potential drug target and pharmacological landscape of an infectious disease. Thus the overall objective of infectious disease informatics is to enable the rapid generation of plausible, novel medical hypotheses of testable pharmacological experiments, by uncovering undiscovered relationships in the wealth of biomedical literature and databases that were collected for other purposes. In particular our goal is to identify potential drug targets present in a pathogen genome and prioritize which pharmacological experiments are most likely to discover drug-like lead compounds rapidly against a pathogen (i.e. which specific compounds and drug targets should be screened, in which assays and where they can be sourced). An integral part of the challenge is the development and integration of methods to predict druggability, essentiality, synthetic lethality and polypharmacology in pathogen genomes, while simultaneously integrating the inevitable issues of chemical tractability and the potential for acquired drug resistance from the start.

Subject(s)

Communicable Diseases/drug therapy , Animals , Communicable Diseases/epidemiology , Drug Design , Epidemics , Genome , Genomics/trends , Humans

ChemBrowser: a flexible framework for mining chemical documents.

Wu, Xian; Zhang, Li; Chen, Ying; Rhodes, James; Griffin, Thomas D; Boyer, Stephen K; Alba, Alfredo; Cai, Keke.

Adv Exp Med Biol ; 680: 57-64, 2010.

Article in English | MEDLINE | ID: mdl-20865486

ABSTRACT

The ability to extract chemical and biological entities and relations from text documents automatically has great value to biochemical research and development activities. The growing maturity of text mining and artificial intelligence technologies shows promise in enabling such automatic chemical entity extraction capabilities (called "Chemical Annotation" in this paper). Many techniques have been reported in the literature, ranging from dictionary and rule-based techniques to machine learning approaches. In practice, we found that no single technique works well in all cases. A combinatorial approach that allows one to quickly compose different annotation techniques together for a given situation is most effective. In this paper, we describe the key challenges we face in real-world chemical annotation scenarios. We then present a solution called ChemBrowser which has a flexible framework for chemical annotation. ChemBrowser includes a suite of customizable processing units that might be utilized in a chemical annotator, a high-level language that describes the composition of various processing units that would form a chemical annotator, and an execution engine that translates the composition language to an actual annotator that can generate annotation results for a given set of documents. We demonstrate the impact of this approach by tailoring an annotator for extracting chemical names from patent documents and show how this annotator can be easily modified with simple configuration alone.

Subject(s)

Chemistry/statistics & numerical data , Data Mining , Search Engine , Algorithms , Artificial Intelligence , Computational Biology , Databases, Factual , Natural Language Processing , Patents as Topic , Terminology as Topic

Annotating patents with Medline MeSH codes via citation mapping.

Griffin, Thomas D; Boyer, Stephen K; Councill, Isaac G.

Adv Exp Med Biol ; 680: 737-44, 2010.

Article in English | MEDLINE | ID: mdl-20865561

ABSTRACT

Both patents and Medline are important document collections for discovering new relationships between chemicals and biology, searching for prior art for patent applications and retrieving background knowledge for current research activities. Finding relevance to a topic within patents is often made difficult by poor categorization, badly written descriptions, and even intentional obfuscation. Unlike patents, the Medline corpus has Medical Subject Heading (MeSH) keywords manually added to their articles, giving a medically relevant taxonomy to the 18 million article abstracts. Our work attempts to accurately recognize the citations made in patents to Medline-indexed articles, linking them to their corresponding PubMed ID and exploiting the associated MeSH to enhance patent search by annotating the referencing patents with their Medline citations' MeSH codes. The techniques, system features, and benefits are explained.

Subject(s)

MEDLINE/statistics & numerical data , Medical Subject Headings , Patents as Topic/statistics & numerical data , Biotechnology/statistics & numerical data , Computational Biology , Humans , Search Engine , United States
