ABSTRACT
The performance database (PDB) stores performance-related data gathered during workflow enactment. We argue that, by carefully understanding and manipulating these data, we can improve efficiency when enacting workflows. This paper describes the rationale behind the PDB, and proposes a systematic way to implement it. The prototype is built as part of the Advanced Data Mining and Integration Research for Europe project. We use workflows from real-world experiments to demonstrate the usage of PDB.
ABSTRACT
We provide an insight into the challenge of building and supporting a scientific data infrastructure with reference to our experience working with scientists from computational particle physics and molecular biology. We illustrate how, with modern high-performance computing resources, even small scientific groups can generate huge volumes (petabytes) of valuable scientific data and explain how grid technology can be used to manage, publish, share and curate these data. We describe the DiGS software application, which we have developed to meet the needs of smaller communities and we have highlighted the key elements of its functionality.