1.
Concurr Comput ; 23(17): 2258-2268, 2011 Dec 10.
Article in English | MEDLINE | ID: mdl-23335858

ABSTRACT

The statistical language R and its Bioconductor package are favoured by many biostatisticians for processing microarray data. The amount of data produced by some analyses has reached the limits of many common bioinformatics computing infrastructures. High Performance Computing systems offer a solution to this issue. The Simple Parallel R Interface (SPRINT) is a package that provides biostatisticians with easy access to High Performance Computing systems and allows the addition of parallelized functions to R. Previous work has established that the SPRINT implementation of an R permutation testing function has close to optimal scaling on up to 512 processors on a supercomputer. Access to supercomputers, however, is not always possible, and so the work presented here compares the performance of the SPRINT implementation on a supercomputer with benchmarks on a range of platforms including cloud resources and a common desktop machine with multiprocessing capabilities. Copyright © 2011 John Wiley & Sons, Ltd.
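
As a rough illustration of why permutation testing parallelizes well, the sketch below splits the permutations of a two-sample test into independent chunks that can be mapped across workers. It is written in Python with the standard library only and is not SPRINT's R interface; the function names, the mean-difference statistic, and the `map_fn` parameter are assumptions made for this example.

```python
import random
from statistics import mean


def count_extreme(task):
    """Count permutations in one chunk whose statistic is at least as extreme as observed."""
    group_a, group_b, n_perms, seed, observed = task
    rng = random.Random(seed)  # each chunk gets its own seed so chunks are independent
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perms):
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed:
            hits += 1
    return hits


def permutation_p_value(group_a, group_b, n_perms=2000, n_workers=4, map_fn=map):
    """Two-sided permutation p-value for a difference in means.

    map_fn is a hypothetical hook: the built-in map runs chunks serially,
    while an executor's map method would run them on separate workers.
    """
    observed = abs(mean(group_a) - mean(group_b))
    base, extra = divmod(n_perms, n_workers)
    chunk_sizes = [base + (1 if i < extra else 0) for i in range(n_workers)]
    tasks = [
        (group_a, group_b, size, seed, observed)
        for seed, size in enumerate(chunk_sizes)
    ]
    hits = sum(map_fn(count_extreme, tasks))
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perms + 1)
```

Because each chunk only returns a count, the work needs almost no communication between workers. Passing the `map` method of a `concurrent.futures.ProcessPoolExecutor` as `map_fn` would distribute the chunks across processes on a multicore desktop.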

2.
J Bioinform Comput Biol ; 8(6): 945-65, 2010 Dec.
Article in English | MEDLINE | ID: mdl-21121020

ABSTRACT

Machine learning and statistical model-based classifiers have increasingly been used with more complex and high dimensional biological data obtained from high-throughput technologies. Understanding the impact of various factors associated with large and complex microarray datasets on the predictive performance of classifiers is computationally intensive, under-investigated, yet vital in determining the optimal number of biomarkers for various classification purposes aimed towards improved detection, diagnosis, and therapeutic monitoring of diseases. We investigate the impact of microarray-based data characteristics on the predictive performance for various classification rules using simulation studies. Our investigation using Random Forest, Support Vector Machines, Linear Discriminant Analysis and k-Nearest Neighbour shows that the predictive performance of classifiers is strongly influenced by training set size, biological and technical variability, replication, fold change and correlation between biomarkers. The optimal number of biomarkers for a classification problem should therefore be estimated taking account of the impact of all these factors. A database of average generalization errors is built for various combinations of these factors. The database of generalization errors can be used for estimating the optimal number of biomarkers for given levels of predictive accuracy as a function of these factors. Examples show that curves from actual biological data resemble those from simulated data with corresponding levels of data characteristics. An R package optBiomarker implementing the method is freely available for academic use from the Comprehensive R Archive Network (http://www.cran.r-project.org/web/packages/optBiomarker/).
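
The idea of tabulating average generalization error over combinations of factors can be sketched as follows. This is a simplified Python illustration, not the optBiomarker package: it swaps in a nearest-centroid classifier for the four classifiers named in the abstract, treats fold change as a mean shift with unit variance, and ignores correlation and replication. All function names and parameter values are assumptions.

```python
import random
from statistics import mean


def simulate(n_per_class, n_markers, fold_change, rng):
    """Draw two classes; class 1 is shifted by fold_change on every marker (assumed model)."""
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            x = [rng.gauss(label * fold_change, 1.0) for _ in range(n_markers)]
            data.append((x, label))
    return data


def nearest_centroid_error(train, test):
    """Fit class centroids on train and return the misclassification rate on test."""
    n_markers = len(train[0][0])
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [mean(r[j] for r in rows) for j in range(n_markers)]

    def predict(x):
        # Assign the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    wrong = sum(predict(x) != y for x, y in test)
    return wrong / len(test)


def average_error(n_train, n_markers, fold_change, n_reps=20, seed=0):
    """Average test error over repeated simulated training and test sets."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_reps):
        train = simulate(n_train, n_markers, fold_change, rng)
        test = simulate(50, n_markers, fold_change, rng)
        errors.append(nearest_centroid_error(train, test))
    return mean(errors)


# A tiny lookup table of average errors, keyed by (training size per class, markers).
error_table = {
    (n, m): average_error(n, m, fold_change=0.5)
    for n in (5, 20)
    for m in (1, 10)
}
```

Given such a table, one could pick the smallest number of markers whose average error falls below a target for a given training set size, which is the kind of lookup the abstract describes.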


Subject(s)
Biomarkers, Computational Biology, Artificial Intelligence, Biomarkers/blood, Classification/methods, Databases, Factual, Gene Expression Profiling/statistics & numerical data, Humans, Microarray Analysis/statistics & numerical data, Models, Statistical, Oligonucleotide Array Sequence Analysis/statistics & numerical data

3.
Philos Trans A Math Phys Eng Sci ; 368(1926): 4133-45, 2010 Sep 13.
Article in English | MEDLINE | ID: mdl-20679127

ABSTRACT

OGSA-DAI (Open Grid Services Architecture Data Access and Integration) is a framework for building distributed data access and integration systems. Until recently, it lacked the built-in functionality that would allow easy creation of federations of distributed data sources. The latest release of the OGSA-DAI framework introduced the OGSA-DAI DQP (Distributed Query Processing) resource. The new resource encapsulates a distributed query processor that is able to orchestrate distributed data sources when answering declarative user queries. The query processor has many extensibility points, making it easy to customize. We have also introduced a new OGSA-DAI Views resource that provides a flexible method for defining views over relational data. The interoperability of the two new resources, together with the flexibility of the OGSA-DAI framework, allows the building of highly customized data integration solutions.

4.
Philos Trans A Math Phys Eng Sci ; 367(1897): 2495-505, 2009 Jun 28.
Article in English | MEDLINE | ID: mdl-19451105

ABSTRACT

As large grid infrastructures, such as Enabling Grids for E-sciencE, mature, they are being used by scientists around the world in their daily work, running thousands of concurrent computational jobs and transferring large amounts of data. The successful and sustainable operation of such grid infrastructures is only possible through the use of monitoring tools. The underlying networks upon which grid infrastructures are built are critical to their operation; therefore, network monitoring becomes an important part of the overall grid monitoring strategy. In this paper, the design and implementation of a set of tools for providing access to federated network monitoring data are presented, based on standards developed within the Open Grid Forum Network Measurements Working Group (NM-WG). These tools give access to data collected by heterogeneous, NM-WG compliant network monitoring tools.
