Search | VHL Regional Portal

1.

Mining chemical information from open patents.

Jessop, David M; Adams, Sam E; Murray-Rust, Peter.

J Cheminform ; 3(1): 40, 2011 Oct 14.

Article in English | MEDLINE | ID: mdl-21999425

ABSTRACT

Linked Open Data presents an opportunity to vastly improve the quality of science in all fields by increasing the availability and usability of the data upon which it is based. In the chemical field, there is a huge amount of information available in the published literature, the vast majority of which is not available in machine-understandable formats. PatentEye, a prototype system for the extraction and semantification of chemical reactions from the patent literature, has been implemented and is discussed. A total of 4444 reactions were extracted from 667 patent documents that comprised 10 weeks' worth of publications from the European Patent Office (EPO), with a precision of 78% and recall of 64% with regard to determining the identity and amount of reactants employed and an accuracy of 92% with regard to product identification. NMR spectra reported as product characterisation data are additionally captured.

2.

OSCAR4: a flexible architecture for chemical text-mining.

Jessop, David M; Adams, Sam E; Willighagen, Egon L; Hawizy, Lezan; Murray-Rust, Peter.

J Cheminform ; 3(1): 41, 2011 Oct 14.

Article in English | MEDLINE | ID: mdl-21999457

ABSTRACT

The Open-Source Chemistry Analysis Routines (OSCAR) software, a toolkit for the recognition of named entities and data in chemistry publications, has been developed since 2002. Recent work has resulted in the separation of the core OSCAR functionality and its release as the OSCAR4 library. This library features a modular API (based on reduction of surface coupling) that permits client programmers to easily incorporate it into external applications. OSCAR4 offers a domain-independent architecture upon which chemistry-specific text-mining tools can be built, and its development and usage are discussed.

3.

The semantic architecture of the World-Wide Molecular Matrix (WWMM).

Murray-Rust, Peter; Adams, Sam E; Downing, Jim; Townsend, Joe A; Zhang, Yong.

J Cheminform ; 3(1): 42, 2011 Oct 14.

Article in English | MEDLINE | ID: mdl-21999475

ABSTRACT

The World-Wide Molecular Matrix (WWMM) is a ten-year project to create a peer-to-peer (P2P) system for the publication and collection of chemical objects, including over 250,000 molecules. It has now been instantiated in a number of repositories which include data encoded in Chemical Markup Language (CML) and linked by URIs and RDF. The technical specification and implementation is now complete. We discuss the types of architecture required to implement nodes in the WWMM and consider the social issues involved in adoption.

4.

The semantics of Chemical Markup Language (CML): dictionaries and conventions.

Murray-Rust, Peter; Townsend, Joe A; Adams, Sam E; Phadungsukanan, Weerapong; Thomas, Jens.

J Cheminform ; 3: 43, 2011 Oct 14.

Article in English | MEDLINE | ID: mdl-21999509

ABSTRACT

The semantic architecture of CML consists of conventions, dictionaries and units. The conventions conform to a top-level specification and each convention can constrain compliant documents through machine-processing (validation). Dictionaries conform to a dictionary specification which also imposes machine validation on the dictionaries. Each dictionary can also be used to validate data in a CML document, and provide human-readable descriptions. An additional set of conventions and dictionaries is used to support scientific units. All conventions, dictionaries and dictionary elements are identifiable and addressable through unique URIs.

5.

Ami - The chemist's amanuensis.

Brooks, Brian J; Thorn, Adam L; Smith, Matthew; Matthews, Peter; Chen, Shaoming; O'Steen, Ben; Adams, Sam E; Townsend, Joe A; Murray-Rust, Peter.

J Cheminform ; 3: 45, 2011 Oct 14.

Article in English | MEDLINE | ID: mdl-21999587

ABSTRACT

The Ami project was a six-month Rapid Innovation project sponsored by JISC to explore the Virtual Research Environment space. The project brainstormed with chemists and decided to investigate ways to facilitate monitoring and collection of experimental data. A frequently encountered use-case was identified of how the chemist reaches the end of an experiment, but finds an unexpected result. The ability to replay events can significantly help make sense of how things progressed. The project therefore concentrated on collecting a variety of dimensions of ancillary data - data that would not normally be collected due to practicality constraints. There were three main areas of investigation: 1) Development of a monitoring tool using infrared and ultrasonic sensors; 2) Time-lapse motion video capture (for example, videoing 5 seconds in every 60); and 3) Activity-driven video monitoring of the fume cupboard environs. The Ami client application was developed to control these separate logging functions. The application builds up a timeline of the events in the experiment and around the fume cupboard. The videos and data logs can then be reviewed after the experiment in order to help the chemist determine the exact timings and conditions used. The project experimented with ways in which a Microsoft Kinect could be used in a laboratory setting. Investigations suggest that it would not be an ideal device for controlling a mouse, but it shows promise for usages such as manipulating virtual molecules.

6.

Chapter 9 Molecular Similarity: Advances in Methods, Applications and Validations in Virtual Screening and QSAR.

Bender, Andreas; Jenkins, Jeremy L; Li, Qingliang; Adams, Sam E; Cannon, Edward O; Glen, Robert C.

Annu Rep Comput Chem ; 2: 141-168, 2006.

Article in English | MEDLINE | ID: mdl-32362803

ABSTRACT

This chapter discusses recent developments in some of the areas that exploit the molecular similarity principle, novel approaches to capture molecular properties by the use of novel descriptors, focuses on a crucial aspect of computational models, namely their validity, and discusses additional ways to examine data available, such as those from high-throughput screening (HTS) campaigns, and to gain more knowledge from these data. The chapter also presents some of the recent applications of the methods discussed, focusing on the successes of virtual screening applications, database clustering and comparisons (such as drug- and in-house-likeness), and the recent large-scale validations of docking and scoring programs. While a great number of descriptors and modeling methods have been proposed to date, the recent trend toward proper model validation is very much appreciated. Although some of their limitations are surely because of underlying principles and limitations of fundamental concepts, others will certainly be eliminated in the future.

7.

Chemical documents: machine understanding and automated information extraction.

Townsend, Joe A; Adams, Sam E; Waudby, Christopher A; de Souza, Vanessa K; Goodman, Jonathan M; Murray-Rust, Peter.

Org Biomol Chem ; 2(22): 3294-300, 2004 Nov 21.

Article in English | MEDLINE | ID: mdl-15534707

ABSTRACT

Automatically extracting chemical information from documents is a challenging task, but an essential one for dealing with the vast quantity of data that is available. The task is least difficult for structured documents, such as chemistry department web pages or the output of computational chemistry programs, but requires increasingly sophisticated approaches for less structured documents, such as chemical papers. The identification of key units of information, such as chemical names, makes the extraction of useful information from unstructured documents possible.

Subject(s)

Chemistry/methods , Electronic Data Processing/methods , Software , Internet , Terminology as Topic
