Results 1 - 7 of 7
1.
PLoS One ; 12(11): e0188428, 2017.
Article in English | MEDLINE | ID: mdl-29161317

ABSTRACT

As energy consumption has been surging in an unsustainable way, it is important to understand the impact of existing architecture designs from an energy efficiency perspective, which is especially valuable for High Performance Computing (HPC) and datacenter environments hosting tens of thousands of servers. One obstacle hindering comprehensive evaluation of energy efficiency is the lack of an adequate power measurement approach: most energy studies rely on either external power meters or power models, both of which have intrinsic drawbacks in practical adoption and measurement accuracy. Fortunately, the advent of the Intel Running Average Power Limit (RAPL) interfaces has raised power measurement capability to the next level, with higher accuracy and finer time resolution. We therefore argue that now is the right time to conduct an in-depth evaluation of existing architecture designs to understand their impact on system energy efficiency. In this paper, we leverage representative benchmark suites, including serial and parallel workloads from diverse domains, to evaluate architecture features such as Non-Uniform Memory Access (NUMA), Simultaneous Multithreading (SMT), and Turbo Boost. Energy is tracked at the subcomponent level, covering Central Processing Unit (CPU) cores, uncore components, and Dynamic Random-Access Memory (DRAM), by exploiting the power measurement capability exposed by RAPL. The experiments reveal non-intuitive results: 1) the mismatch between the local compute node and a remote memory node caused by the NUMA effect not only generates a dramatic power and energy surge but also significantly deteriorates energy efficiency; 2) for multithreaded applications such as those in the Princeton Application Repository for Shared-Memory Computers (PARSEC), most workloads gain a notable increase in energy efficiency with SMT, with more than a 40% decline in average power consumption; 3) Turbo Boost is effective at accelerating workload execution and thereby saving energy, but it may not be applicable on systems with a tight power budget. A minimal sketch of subcomponent energy measurement via RAPL follows this record.


Subject(s)
Computing Methodologies , Efficiency , Electric Power Supplies , Algorithms , Architecture , Physical Phenomena
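
A minimal sketch of reading per-domain energy through the Linux powercap/RAPL sysfs interface, in the spirit of the measurement methodology above. The domain paths, the "package-0"/"dram-0" labels, and the toy workload are assumptions about a typical Intel server rather than details taken from the paper; the actual domain layout varies by machine.

    import time

    # Hypothetical RAPL domain paths; the actual layout depends on the machine.
    DOMAINS = {
        "package-0": "/sys/class/powercap/intel-rapl:0/energy_uj",
        "dram-0":    "/sys/class/powercap/intel-rapl:0:0/energy_uj",
    }

    def read_energy_uj(path):
        with open(path) as f:
            return int(f.read())

    def measure(workload):
        """Return (joules per domain, elapsed seconds) for one run of workload()."""
        before = {name: read_energy_uj(p) for name, p in DOMAINS.items()}
        start = time.time()
        workload()
        elapsed = time.time() - start
        after = {name: read_energy_uj(p) for name, p in DOMAINS.items()}
        # Counters wrap around; a robust tool would handle overflow via max_energy_range_uj.
        joules = {name: (after[name] - before[name]) / 1e6 for name in DOMAINS}
        return joules, elapsed

    if __name__ == "__main__":
        energy, secs = measure(lambda: sum(i * i for i in range(10_000_000)))
        for name, j in energy.items():
            print(f"{name}: {j:.3f} J over {secs:.2f} s ({j / secs:.2f} W average)")

Dividing the per-domain joules by the elapsed time gives average power, which is the basis for comparing configurations such as SMT on versus off or NUMA-local versus NUMA-remote placement.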
2.
PLoS One ; 12(4): e0175861, 2017.
Article in English | MEDLINE | ID: mdl-28448575

ABSTRACT

Workload consolidation is a common method to increase resource utilization of clusters or data centers while still trying to ensure the performance of the workloads. To get the maximum benefit from workload consolidation, the task scheduler has to understand the runtime characteristics of each individual program and schedule programs with less resource conflict onto the same server. We propose a set of metrics to comprehensively describe the runtime characteristics of programs. The metric set consists of two types of metrics: resource usage and resource sensitivity, where resource sensitivity refers to the performance degradation caused by insufficient resources. The resource usage of a program is easy to obtain with common performance analysis tools, but the resource sensitivity cannot be obtained directly. The simplest and most intuitive way to obtain the resource sensitivity of a program is to run it in an environment with controllable resources and record the performance achieved under all possible resource conditions. However, such a process is very time consuming when multiple resources are involved and each resource is controlled at fine granularity. To obtain the resource sensitivity of a program quickly, we propose a method to speed up the resource sensitivity profiling process. Our method is based on two levels of profiling acceleration. First, taking advantage of the resource usage information, we set the program's maximum resource usage as the upper bound of the controlled resource; in this way, the range of controlled resource levels is narrowed and the number of experiments is significantly reduced. Second, using a prediction model based on interpolation, we reduce the profiling time even further, because the resource sensitivity under most resource conditions is obtained by interpolation instead of real program execution. These two acceleration strategies have been implemented and applied to profiling program runtime characteristics. Our experimental results show that the proposed two-level profiling acceleration strategy not only shortens the profiling process but also preserves the accuracy of the resource sensitivity: with the fast profiling method, the average absolute error of the resource sensitivity can be kept within 0.05. A minimal sketch of the two acceleration levels follows this record.


Subject(s)
Computers , Software/standards , Electronic Data Processing , Time Factors
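
The following sketch illustrates the two acceleration levels described above, under stated assumptions: the controlled resource is memory bandwidth, run_with_limit is a synthetic stand-in for executing the program under a throttled resource (a real harness would use cgroups, resctrl, or similar), and the grid sizes are arbitrary. It is not the authors' implementation.

    import numpy as np

    def run_with_limit(program, mem_bw_limit_mbps, baseline_runtime=10.0, demand_mbps=8000.0):
        """Synthetic stand-in: pretend runtime grows once the cap drops below the
        program's bandwidth demand. A real harness would throttle the resource
        and actually execute `program`."""
        shortage = max(0.0, demand_mbps - mem_bw_limit_mbps) / demand_mbps
        return baseline_runtime * (1.0 + 2.0 * shortage)

    def profile_sensitivity(program, max_usage_mbps, baseline_runtime, samples=5):
        # Level 1: the program's own maximum usage bounds the explored range,
        # so resource levels above it never need to be measured.
        coarse_levels = np.linspace(0.2 * max_usage_mbps, max_usage_mbps, samples)
        degradations = [
            run_with_limit(program, level, baseline_runtime) / baseline_runtime - 1.0
            for level in coarse_levels
        ]
        # Level 2: estimate sensitivity at fine-grained levels by interpolation
        # instead of additional real executions.
        fine_levels = np.linspace(coarse_levels[0], coarse_levels[-1], 50)
        fine_sensitivity = np.interp(fine_levels, coarse_levels, degradations)
        return fine_levels, fine_sensitivity

    if __name__ == "__main__":
        levels, sens = profile_sensitivity("some_program", max_usage_mbps=6000.0,
                                           baseline_runtime=10.0)
        print(f"sensitivity at {levels[0]:.0f} MB/s: {sens[0]:.2f} (slowdown fraction)")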
3.
PLoS One ; 12(3): e0173038, 2017.
Article in English | MEDLINE | ID: mdl-28278236

ABSTRACT

With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, the I/O subsystem can easily become a bottleneck when too many such threads exist; conversely, too few threads cause insufficient resource utilization and hurt performance. Therefore, programmers must pay close attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that adjusts the parallelism of applications to match the I/O capability of a system, balancing computing resources and I/O bandwidth. A programming interface for IOPA is also provided to simplify parallel programming. IOPA is evaluated using multiple applications on both solid-state and hard disk drives. The results show that parallel applications using IOPA achieve higher efficiency than those with a fixed number of threads. A minimal sketch of adaptive I/O parallelism control follows this record.


Subject(s)
Software , Algorithms , Computers
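
A minimal sketch of adaptive I/O parallelism control in the spirit of IOPA: a simple hill-climbing loop grows the I/O thread pool while measured throughput keeps improving and backs off once the I/O subsystem saturates. The function names, thresholds, and batch structure are hypothetical; this is not the IOPA interface.

    import time
    from concurrent.futures import ThreadPoolExecutor

    def run_batch(tasks, num_threads):
        """Run one batch of I/O-bound callables with num_threads workers; return tasks/s."""
        start = time.time()
        with ThreadPoolExecutor(max_workers=num_threads) as pool:
            list(pool.map(lambda task: task(), tasks))
        return len(tasks) / (time.time() - start)

    def adaptive_parallelism(batches, start_threads=2, max_threads=64):
        """Hill-climb the I/O thread count across successive batches of tasks."""
        threads, best = start_threads, 0.0
        for tasks in batches:
            throughput = run_batch(tasks, threads)
            if throughput > best * 1.05:            # still scaling: add I/O threads
                best = throughput
                threads = min(threads * 2, max_threads)
            else:                                   # I/O subsystem saturated: back off
                threads = max(threads // 2, start_threads)
        return threads

    if __name__ == "__main__":
        io_task = lambda: time.sleep(0.01)          # stand-in for reading/writing one file
        batches = [[io_task] * 200 for _ in range(6)]
        print("settled thread count:", adaptive_parallelism(batches))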
5.
PLoS One ; 10(11): e0141755, 2015.
Article in English | MEDLINE | ID: mdl-26551649

ABSTRACT

Community networks, whose distinguishing feature is membership admittance, appear in P2P networks, social networks, and conventional Web networks. Joining the network costs money, time, or network bandwidth, but in return individuals gain access to special resources owned by the community. The prosperity and stability of the community are determined by both the admittance policy and the attraction of the privileges gained by joining. However, some misbehaving users can obtain the dedicated resources through illicit, low-cost approaches, which introduces instability into the community and can ultimately undermine the membership policy. In this paper, we analyze the stability of such communities using game theory. We propose a game-theoretic model of stability in community networks and derive conditions for a stable community. We then extend the model to analyze the effectiveness of different incentive policies, which can be used when the community cannot retain its members, and verify these models through simulation. Finally, we discuss several ways to promote a community network's stability by adjusting the network's properties and offer proposals for the design of such networks from the perspectives of game theory and stability. A toy illustration of a stability condition follows this record.


Subject(s)
Community Networks , Cooperative Behavior , Game Theory , Interpersonal Relations , Social Networking , Algorithms , Humans , Internet , Models, Theoretical , Motivation , Social Support
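
A toy illustration, not the paper's model: a prospective member compares the payoff of legitimate admittance against an illicit low-cost entry, and the community is "stable" in this simplified sense only when honest joining is the better strategy for a rational member. All parameter names and the payoff form are assumptions made for illustration.

    def payoff_honest(privilege_value, admittance_cost):
        return privilege_value - admittance_cost

    def payoff_cheat(privilege_value, cheat_cost, detection_prob, penalty):
        # Expected payoff of illicit entry: the privilege minus the (low) cheating
        # cost and the expected penalty if caught.
        return privilege_value - cheat_cost - detection_prob * penalty

    def community_is_stable(privilege_value, admittance_cost,
                            cheat_cost, detection_prob, penalty):
        # Toy stability condition: honest joining must be at least as attractive
        # as cheating.
        return payoff_honest(privilege_value, admittance_cost) >= \
               payoff_cheat(privilege_value, cheat_cost, detection_prob, penalty)

    if __name__ == "__main__":
        # Raising the detection probability (one possible incentive-policy lever)
        # flips the community from unstable to stable in this toy setting.
        for p in (0.1, 0.5, 0.9):
            print(p, community_is_stable(privilege_value=10, admittance_cost=4,
                                         cheat_cost=1, detection_prob=p, penalty=8))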
6.
PLoS One ; 10(7): e0131964, 2015.
Article in English | MEDLINE | ID: mdl-26158524

ABSTRACT

As DRAM faces scaling difficulties in terms of energy cost and reliability, several nonvolatile storage materials have been proposed as substitutes for, or supplements to, main memory. Phase Change Memory (PCM) is one of the most promising nonvolatile memories that could be put into use in the near future. However, before becoming a qualified main memory technology, PCM must be designed reliably so that it can keep the computer system running stably even when errors occur. The typical wear-out errors in PCM have been well studied, but transient errors, caused by high-energy particles striking the complementary metal-oxide semiconductor (CMOS) circuitry of PCM chips or by resistance drift in multi-level cell PCM, have attracted little attention. In this paper, we propose a mechanism, Local-ECC-Global-ECPs (LEGE), which addresses both soft errors and hard (wear-out) errors in PCM memory systems. Our idea is to deploy a local error correction code (ECC) section for every data line, which can detect and correct one-bit errors immediately, and a global buffer of error-correcting pointers (ECPs) for the whole memory chip, which can be reloaded to correct additional hard-error bits. The local ECC detects and corrects unknown one-bit errors, while the global ECP buffer stores the corrected values of known hard errors. Compared to ECP-6, our method provides an almost identical lifetime but reduces storage overhead by approximately 50%. Moreover, compared to PAYG, a hard-error-only solution, our structure reduces access latency overhead by approximately 3.55% at the cost of 1.61% additional storage overhead. A minimal sketch of the LEGE-style read path follows this record.


Subject(s)
Semiconductors , Computer Storage Devices , Metals/chemistry , Oxides/chemistry
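
A minimal sketch of the read path described above: a per-line local ECC handles an unknown one-bit error, and a global table of error-correcting pointers (ECPs) patches bits already known to be stuck. The data layout, the dictionary-based ECP buffer, and the helper names are illustrative assumptions, not the LEGE hardware design.

    # Global ECP buffer: maps (line_address, bit_index) -> correct bit value.
    global_ecps = {}

    def local_ecc_correct(data_bits, stored_ecc):
        """Placeholder for the per-line ECC: a real design would use a SEC(-DED)
        code able to locate and flip one unknown erroneous bit. Kept as a no-op
        here so the sketch stays self-contained."""
        return data_bits

    def read_line(line_address, raw_bits, stored_ecc):
        # Step 1: the local ECC handles an unknown one-bit (soft or fresh hard) error.
        bits = local_ecc_correct(list(raw_bits), stored_ecc)
        # Step 2: global ECPs patch bits already known to be worn out in this line.
        for (addr, bit_index), value in global_ecps.items():
            if addr == line_address:
                bits[bit_index] = value
        return bits

    def record_hard_error(line_address, bit_index, correct_value):
        """Called when write-verify finds a stuck cell: remember its correct value globally."""
        global_ecps[(line_address, bit_index)] = correct_value

    if __name__ == "__main__":
        record_hard_error(0x40, 3, 1)               # bit 3 of line 0x40 is stuck at the wrong value
        print(read_line(0x40, [0, 1, 0, 0, 1, 1, 0, 1], stored_ecc=None))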
7.
Adv Exp Med Biol ; 680: 497-511, 2010.
Article in English | MEDLINE | ID: mdl-20865535

ABSTRACT

Addressing the problem of virtual screening is a long-term goal in the drug discovery field; if properly solved, it can significantly shorten the R&D cycle of new drugs. The scoring functionality that evaluates the fitness of a docking result is one of the major challenges in virtual screening. In general, scoring in docking requires a large number of floating-point calculations and usually takes several weeks or even months to finish. Such a time-consuming procedure is unacceptable, especially when a highly fatal and infectious virus such as SARS or H1N1 emerges, forcing the scoring task to be completed within a limited time. This paper presents how to leverage the computational power of the GPU to accelerate the Amber (J. Comput. Chem. 25: 1157-1174, 2004) scoring of Dock6 (http://dock.compbio.ucsf.edu/DOCK_6/) with NVIDIA's Compute Unified Device Architecture (CUDA) platform (NVIDIA Corporation Technical Staff, Compute Unified Device Architecture - Programming Guide, NVIDIA Corporation, 2008). We also discuss several factors that greatly influence performance after porting the Amber scoring to the GPU, including thread management, data transfer, and divergence hiding. Our experiments show that the GPU-accelerated Amber scoring achieves a 6.5× speedup over the original version running on an AMD dual-core CPU for the same problem size. This acceleration makes Amber scoring more competitive and efficient for large-scale virtual screening. A minimal GPU scoring sketch follows this record.


Subject(s)
Drug Discovery/statistics & numerical data , Drug Evaluation, Preclinical/statistics & numerical data , User-Computer Interface , Algorithms , Computational Biology , Computer Simulation , Humans , In Vitro Techniques , Ligands , Molecular Dynamics Simulation/statistics & numerical data , Software , Software Design
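
A minimal sketch of offloading a pairwise scoring loop to the GPU, here using Numba's CUDA backend rather than the C/CUDA code described in the chapter: one thread per ligand atom accumulates a simplified 12-6 interaction over all receptor atoms. The potential, array shapes, and launch configuration are illustrative assumptions and this is not Dock6's Amber scoring function.

    import numpy as np
    from numba import cuda

    @cuda.jit
    def score_kernel(lig_xyz, rec_xyz, partial):
        i = cuda.grid(1)                      # one thread per ligand atom
        if i >= lig_xyz.shape[0]:
            return
        acc = 0.0
        for j in range(rec_xyz.shape[0]):
            dx = lig_xyz[i, 0] - rec_xyz[j, 0]
            dy = lig_xyz[i, 1] - rec_xyz[j, 1]
            dz = lig_xyz[i, 2] - rec_xyz[j, 2]
            r2 = dx * dx + dy * dy + dz * dz + 1e-9
            inv6 = 1.0 / (r2 * r2 * r2)
            acc += inv6 * inv6 - inv6         # toy 12-6 potential with unit parameters
        partial[i] = acc

    def gpu_score(lig_xyz, rec_xyz, threads_per_block=128):
        d_lig = cuda.to_device(lig_xyz)       # host-to-device copies are a real cost,
        d_rec = cuda.to_device(rec_xyz)       # so batching poses per transfer pays off
        d_out = cuda.device_array(lig_xyz.shape[0], dtype=np.float64)
        blocks = (lig_xyz.shape[0] + threads_per_block - 1) // threads_per_block
        score_kernel[blocks, threads_per_block](d_lig, d_rec, d_out)
        return d_out.copy_to_host().sum()     # per-atom partial sums reduced on the host

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        print(gpu_score(rng.random((512, 3)), rng.random((4096, 3))))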