An Exascale Programming, Multi-Objective Optimisation and Resilience Management Environment based on Nested Recursive Parallelism
The DPS group is coordinating the H2020 EU AllScale project, which started on October 1, 2015 and will run for three years. Total funding for the project amounts to 3.3 million euro.
AllScale focuses its research on six critical areas of difficulty:
- Isolated parallelization that hampers global optimization
- Flat parallelism unfit for large-scale HPC
- Optimisation limited to single objectives
- Manual coordination to exploit all levels of parallelism
- Increased probability of errors in Exascale computing
- Post-mortem analysis of non-functional system behavior
AllScale will achieve a core set of objectives within the lifetime of the project:
- Single-source-to-anyscale application development
- Exploit the potential of nested recursive parallelism for HPC
- Multi-objective optimization for execution time, energy and resource usage
- Unified runtime system
- Mitigating the increased risk of hardware failures
- Scalable online analysis of non-functional system behavior
AllScale follows three design principles:
- Use a single parallel programming model to target all the levels of hardware parallelism available in extreme scale computing systems.
- Leverage the inherent advantages of nested recursive parallelism for adaptive parallelisation, automatic resilience management and auto-tuning for multiple optimisation objectives.
- Provide a programming interface that will be fully compatible with widely used industry standards and existing toolchains.
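The idea of nested recursive parallelism can be illustrated with a small divide-and-conquer sketch. This is not the AllScale API: the function `recursive_sum` and its parameters `max_depth` and `grain` are hypothetical names chosen for illustration. Each call splits its range in two, runs one half as a child task and the other half in the current task, and stops spawning once a depth limit is reached. In CPython, threads do not speed up CPU-bound work, so the sketch shows only the structure of the decomposition.

```python
import threading

def recursive_sum(data, lo, hi, depth=0, max_depth=3, grain=1024):
    """Sum data[lo:hi] by recursive splitting (illustrative, not AllScale code)."""
    # Base case: the range is small, or we are deep enough that
    # spawning further child tasks would not pay off.
    if hi - lo <= grain or depth >= max_depth:
        return sum(data[lo:hi])

    mid = (lo + hi) // 2
    left_result = []

    def run_left():
        left_result.append(
            recursive_sum(data, lo, mid, depth + 1, max_depth, grain)
        )

    # The left half runs as a child task; the right half runs in this task.
    child = threading.Thread(target=run_left)
    child.start()
    right = recursive_sum(data, mid, hi, depth + 1, max_depth, grain)
    child.join()
    return left_result[0] + right

values = list(range(10_000))
total = recursive_sum(values, 0, len(values))
```

Because the same function describes the work at every level, a runtime is free to decide per level whether to run a split in parallel or sequentially, which is the kind of adaptivity the design principles refer to.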
The DPS Group had a project proposal approved under the H2020 call on Cloud computing. DPS leads the 2.7 million euro project entitled EntICE: dEcentralized repositories for traNsparent and efficienT vIrtual maChine opErations. Entice aims to provide a VM repository and operational environment for federated Cloud infrastructures which: (i) simplifies the creation of lightweight and highly optimised VM images tuned to functional descriptions of customer applications; (ii) automatically decomposes and distributes VM images based on multi-objective optimisation (performance, economic cost, storage size, and QoS needs); (iii) facilitates the elastic auto-scaling of applications across Cloud resources. With Entice, customers will benefit from the federated Cloud infrastructure by deploying their applications across sites quickly and without provider lock-in, fulfilling promises that virtualization technology has so far failed to deliver.
WORK PACKAGES
Entice is structured into seven Work Packages, five of which correspond to the project’s research objectives:
- WP-1 (Objective 1): Creation of lightweight VM images. The Entice environment will create and optimise VM images on the basis of a functional description provided by the user. This frees the user from the burden of manual optimisation, which requires expertise in operating systems and VM image creation.
- WP-2 (Objective 2): Distributed lightweight VM image storage. Entice will enable the decomposition of VM images and their storage in a repository distributed across Cloud sites. Images will no longer be monolithic blocks but compositions of smaller parts, which reduces their storage cost and improves their reusability.
- WP-3 (Objective 3): Autonomous multi-objective repository optimisation. The DPS team will research heuristics for the multi-objective distribution and placement of VM images across the decentralised repository, optimising multiple conflicting objectives including performance-related goals (e.g. VM deployment and instantiation overheads, data communication, application QoS metrics), operational costs and storage space.
- WP-4 (Objective 4): Elastic resource provisioning. Lightweight VM creation and the optimised repository will improve elasticity for the on-demand scaling of industrial and business applications in Clouds in response to their fluctuating compute and storage requirements. The DPS research team focuses on defining an elasticity metric that measures the ability of the system to respond to application demand.
- WP-5 (Objective 5): Information infrastructure for strategic and dynamic reasoning. Entice will develop a knowledge model of all entities and relationships concerning Cloud applications, including functional and non-functional properties of their underlying software components, QoS metrics, OS, VM type, and federated Cloud (e.g. SLAs), to support the optimised VM creation in the repository.
More information: http://www.entice-project.eu
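The multi-objective placement problem of WP-3 has conflicting objectives, so there is usually no single best answer but a set of non-dominated trade-offs (a Pareto front). The following is a minimal sketch of that concept, not the heuristics researched in Entice. The `Placement` type, its three objectives, and all numeric values are invented for illustration; every objective is assumed to be minimised.

```python
from typing import NamedTuple

class Placement(NamedTuple):
    """A hypothetical candidate placement of a VM image (all values invented)."""
    name: str
    deploy_seconds: float  # VM deployment overhead, lower is better
    cost_eur: float        # operational cost, lower is better
    storage_gb: float      # storage space, lower is better

def dominates(a: Placement, b: Placement) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    a_obj, b_obj = a[1:], b[1:]
    no_worse = all(x <= y for x, y in zip(a_obj, b_obj))
    better = any(x < y for x, y in zip(a_obj, b_obj))
    return no_worse and better

def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

candidates = [
    Placement("A", 30.0, 5.0, 10.0),
    Placement("B", 20.0, 8.0, 12.0),
    Placement("C", 35.0, 6.0, 11.0),  # worse than A in every objective
    Placement("D", 25.0, 9.0, 13.0),  # worse than B in every objective
]
front = pareto_front(candidates)
front_names = [p.name for p in front]  # ["A", "B"]
```

A and B both survive because A is cheaper and smaller while B deploys faster; choosing between them requires a preference that the optimiser alone cannot supply.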
Programming model and runtime systems for Exascale computing
Large-scale compute clusters of heterogeneous nodes equipped with multi-core CPUs and GPUs are becoming increasingly popular in the scientific community. However, such systems require a combination of different programming paradigms, which makes application development very challenging. We have developed libWater, a library-based extension of the OpenCL programming model that simplifies the development of heterogeneous distributed applications. libWater consists of a simple interface, which is a transparent abstraction of the underlying distributed architecture and offers advanced features such as inter-context and inter-node device synchronization. It provides a runtime system that tracks the dependency information enforced by event synchronization to dynamically build a DAG of commands, on which we automatically apply two optimizations: collective communication pattern detection and device-host-device copy removal.
This project has two scientific objectives:
- First, we plan to exploit libWater’s DAG representation to support new optimizations focusing on latency hiding, optimized data movement and critical path-based DAG transformation.
- Second, we want to integrate into libWater our automatic task partitioning approach for heterogeneous nodes, which is already implemented in the Insieme Compiler Framework.
We plan to evaluate the scalability of our system on the large-scale compute clusters available from the PRACE platform.
Project team: Biagio Cosenza, Ivan Grasso, Klaus Kofler, Thomas Fahringer
Acknowledgement: We acknowledge PRACE (http://www.prace-ri.eu) for awarding us access to computational resources based in Spain at BSC and in France at CEA.
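The way event dependencies give rise to a command DAG, and how a device-host-device copy could be collapsed on it, can be sketched as follows. This is a simplified illustration and not libWater's implementation or API: the `Command` class, the kind labels (`"kernel"`, `"d2h"`, `"h2d"`, `"d2d"`) and the fusion rule in `fuse_copies` are assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Command:
    """A hypothetical enqueued command; wait_for lists the commands it waits on."""
    name: str
    kind: str                      # "kernel", "d2h", "h2d" or "d2d"
    buffer: str = ""
    wait_for: list = field(default_factory=list)

def build_dag(commands):
    """Map each command name to the set of command names it depends on."""
    return {c.name: set(c.wait_for) for c in commands}

def topological_order(dag):
    """Kahn's algorithm: return an execution order that respects all dependencies."""
    remaining = {name: set(deps) for name, deps in dag.items()}
    ready = deque(sorted(n for n, deps in remaining.items() if not deps))
    order = []
    while ready:
        done = ready.popleft()
        order.append(done)
        for name, deps in remaining.items():
            if done in deps:
                deps.remove(done)
                if not deps:
                    ready.append(name)
    if len(order) != len(remaining):
        raise ValueError("dependency cycle detected")
    return order

def fuse_copies(commands):
    """Turn a d2h copy whose only consumer is an h2d copy of the same buffer
    into a single d2d copy (assumed rule, for illustration only)."""
    by_name = {c.name: c for c in commands}
    fused = set()     # names of d2h commands that become d2d
    replaced = {}     # name of dropped h2d command -> name of its fused d2h
    for c in commands:
        if c.kind != "h2d" or len(c.wait_for) != 1:
            continue
        producer = by_name[c.wait_for[0]]
        consumers = [u for u in commands if producer.name in u.wait_for]
        if (producer.kind == "d2h" and producer.buffer == c.buffer
                and len(consumers) == 1):
            fused.add(producer.name)
            replaced[c.name] = producer.name
    result = []
    for c in commands:
        if c.name in replaced:
            continue  # the h2d half of the pair is dropped
        kind = "d2d" if c.name in fused else c.kind
        deps = [replaced.get(w, w) for w in c.wait_for]
        result.append(Command(c.name, kind, c.buffer, deps))
    return result

commands = [
    Command("k1", "kernel", "A"),
    Command("c1", "d2h", "A", ["k1"]),
    Command("c2", "h2d", "A", ["c1"]),
    Command("k2", "kernel", "A", ["c2"]),
]
optimised = fuse_copies(commands)
order = topological_order(build_dag(optimised))
```

In the example, the copy pair `c1`/`c2` becomes one device-to-device copy, and `k2` is rewired to wait on it, so the resulting order is `k1`, `c1`, `k2`.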
On-Demand Resource Provisioning for Online Games
Online entertainment, including gaming, is a strongly growing sector worldwide. Massively Multiplayer Online Games (MMOGs) grew from 10 thousand subscribers in 1997 to 6.7 million in 2003, and growth is accelerating, with an estimated 60 million players by 2011. Today’s MMOGs operate as client-server architectures, with server hosters providing a large static infrastructure of hundreds to thousands of computers for each game in order to deliver the required Quality of Service (QoS) to all players. Since the demand of an MMOG is highly dynamic, a large portion of the resources is unused most of the time, even for large providers that operate several game titles in parallel. This inefficient resource utilisation has negative economic impacts: it prevents all but the largest hosting centres from joining the market and dramatically increases prices.
Today, a new research direction known as Cloud computing proposes a cheaper hosting alternative: leasing virtualised resources from large specialised data centres only when and for as long as they are needed, instead of buying one's own expensive hardware with costly maintenance and fast depreciation. Despite the existence of many vendors that aggregate a potentially unbounded number of resources, Cloud computing remains a domain dominated by Web hosting and data-intensive applications, and its suitability for computationally-intensive applications remains largely unexplored.
In this project we propose to conduct basic research that investigates new generic Cloud computing techniques to support QoS-enabled resource provisioning for computationally-intensive real-time applications by investigating:
- Performance models for virtualised Cloud resources, including characterisation of the Cloud virtualisation and software deployment overheads;
- Proactive dynamic scheduling strategies based on QoS negotiation, monitoring, and enforcement techniques;
- SLA provisioning models based on an optimised balance between risks, rewards, and penalties;
- Resource provisioning methods based on time/space/cost renting policies, including a comparative analysis between Cloud resource renting and conventional parallel/Grid resource operation.
Our ultimate goal is to apply and validate our generic methods on MMOGs, a novel class of socially important applications with severe real-time computational requirements, in order to achieve three innovative objectives:
- Improved scalability of a game session hosted on a distributed Cloud infrastructure to a larger number of online users than the current state-of-the-art (i.e. 64 – 128 for FPS action games);
- Cheaper on-demand provisioning of Cloud resources to game sessions based on exhibited load;
- QoS enforcement with seamless load balancing and transparent migration of players from overloaded servers to underutilised ones within and across different Cloud provider resources.
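Provisioning based on exhibited load can be illustrated with a very small sizing rule. This is a sketch under stated assumptions, not a method from the project: the functions `servers_needed` and `scaling_action`, the `target_utilisation` headroom parameter, and the example figures are all hypothetical.

```python
import math

def servers_needed(players, players_per_server,
                   target_utilisation=0.8, min_servers=1):
    """Servers required so that each one stays below the target utilisation."""
    if players_per_server <= 0 or not 0 < target_utilisation <= 1:
        raise ValueError("invalid capacity or utilisation")
    usable_capacity = players_per_server * target_utilisation
    return max(min_servers, math.ceil(players / usable_capacity))

def scaling_action(current_servers, players, players_per_server,
                   target_utilisation=0.8):
    """Positive result: lease that many servers. Negative: release that many."""
    wanted = servers_needed(players, players_per_server, target_utilisation)
    return wanted - current_servers

# Example with invented load: 1000 players, 128 players per server.
needed = servers_needed(1000, 128)          # 10
release = scaling_action(12, 1000, 128)     # -2, two servers can be released
lease = scaling_action(8, 1000, 128)        # 2, two more servers are required
```

Keeping some headroom below full capacity leaves room for load spikes while new resources are being leased, which matters for real-time applications where a late server is as bad as a missing one.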
The Cloud Computing paradigm holds good promise for the performance-hungry scientific community. Clouds promise to be a cheap alternative to supercomputers and specialized clusters, a much more reliable platform than grids, and a much more scalable platform than the largest of commodity clusters or resource pools.
HiPEAC is a European Network of Excellence on High Performance and Embedded Architecture and Compilation, funded by the Computing Systems research objective of the European FP7-ICT programme.
This network coordinates nine research clusters that will explore on-chip multi-core technology and customisation, leading to heterogeneous multi-core systems. HiPEAC’s main objective is to harmonise European research efforts in the area of computer systems by developing a common research vision, organising meetings at regular intervals and stimulating European co-operation.
The research clusters focus on the following areas:
- multi-core architecture;
- programming models and operating systems;
- adaptive compilation;
- reconfigurable computing;
- design methodology and tools;
- binary translation and virtualisation;
- simulation platform;
- compilation platform.
Research in computer systems finds itself at a turning point, the project partners state: the supercomputer market, the commodity market (including laptops and other consumer electronics such as mobile phones, PDAs and navigation systems) and the embedded market are more and more interconnected. Increasingly, the same components are used in all of these systems, creating new business opportunities for the European computer industry.
Yet innovation in the field of high-performance processors is subject to physical limitations. As a result, these processors shifted towards parallelism, using multi-core systems, so that many instructions can be carried out simultaneously. While in theory this shift to parallel computing will increase performance and cut energy consumption at the same time, multi-core structures create new problems for computer architects.
The activities envisioned in the network ‘will lead to the permanent creation of a solid and integrated virtual centre of excellence consisting of several highly visible departments, and this virtual centre of excellence will have the critical mass to really make a difference for the future of computing systems,’ the project partners believe.
Between 2008 and 2011, the HiPEAC consortium will receive €4.8 million under the Seventh Framework Programme (FP7). The group of Prof. Thomas Fahringer, Institute of Computer Science, University of Innsbruck, participates as a HiPEAC member (as of May 2009) in the research areas of programming models, compilers and tools, and contributes to the ROADMAP and the task force on applications.
More information can be found at http://www.hipeac.net.
ASKALON is a programming environment for Cloud Computing. ASKALON provides tools for:
- automatic performance bottleneck analysis
- performance modeling and prediction
- performance instrumentation, measurement, and analysis
- a Java-based coordination and visualization system