An Exascale Programming, Multi-Objective Optimisation and Resilience Management Environment based on Nested Recursive Parallelism
The DPS group is coordinating the H2020 EU AllScale project, which started on Oct 1, 2015 and will run for three years. The total funding for this project amounts to 3.3 million Euro.
AllScale will focus its research on the following key areas of difficulty:
- Isolated parallelization that hampers global optimization
- Flat parallelism unfit for large-scale HPC
- Optimisation limited to single objectives
- Manual coordination to exploit all levels of parallelism
- Increased probability of errors in Exascale computing
- Post-mortem analysis of non-functional system behavior
AllScale will achieve a core set of objectives within the lifetime of the project:
- Single-source-to-anyscale application development
- Exploit the potential of nested recursive parallelism for HPC
- Multi-objective optimization for execution time, energy and resource usage
- Unified runtime system
- Mitigating the increased risk of HW failures
- Scalable online analysis of non-functional system behavior
AllScale follows three design principles:
- Use a single parallel programming model to target all the levels of hardware parallelism available in extreme scale computing systems.
- Leverage the inherent advantages of nested recursive parallelism for adaptive parallelisation, automatic resilience management and auto-tuning for multiple optimisation objectives.
- Provide a programming interface that will be fully compatible with widely used industry standards and existing toolchains.
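The nested recursive parallelism that AllScale builds on can be illustrated with a small divide-and-conquer sketch. This is only an illustrative Python analogue, not AllScale's actual API (which targets C++); the cutoff depth and pool size are arbitrary choices:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, pool, depth=0, max_depth=2):
    """Recursively split the input; spawn the left half as a task."""
    if depth >= max_depth or len(data) <= 1:
        return sum(data)  # sequential base case below the cutoff
    mid = len(data) // 2
    # The left half runs as an asynchronous task while the right half
    # recurses inline; every level of the recursion exposes additional
    # parallelism, which is the essence of nested recursive parallelism.
    left = pool.submit(parallel_sum, data[:mid], pool, depth + 1, max_depth)
    right = parallel_sum(data[mid:], pool, depth + 1, max_depth)
    return left.result() + right

with ThreadPoolExecutor(max_workers=4) as pool:
    total = parallel_sum(list(range(1000)), pool)
```

The cutoff depth is exactly the kind of knob a runtime such as AllScale's can auto-tune: deeper recursion exposes more parallelism at the cost of more task-management overhead.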
The DPS Group got a project proposal approved for the H2020 call on Cloud computing. DPS is the leader of the 2.7 million Euro project entitled EntICE: dEcentralized repositories for traNsparent and efficienT vIrtual maChine opErations. Entice aims to provide a VM repository and operational environment for federated cloud infrastructures which: (i) simplifies the creation of lightweight and highly optimised VM images tuned to functional descriptions of customer applications; (ii) automatically decomposes and distributes VM images based on multi-objective optimisation (performance, economic cost, storage size, and QoS needs); (iii) facilitates the elastic auto-scaling of applications across Cloud resources. Using Entice, customers will benefit from the federated Cloud infrastructure by deploying their applications across sites quickly and without provider lock-in, finally fulfilling the promises that virtualization technology has so far failed to deliver.
WORK PACKAGES
Entice is structured into seven Work Packages, five of which correspond to the following research objectives of the project:
- WP-1 (Objective 1): Creation of lightweight VM images. The Entice environment will create and optimize VM images based on a functional description provided by the user, freeing users from the burden of manual optimization, which requires expertise in OS and VM image creation.
- WP-2 (Objective 2): Distributed lightweight VM image storage. Entice will enable the decomposition of VM images and their storage in a repository distributed across cloud sites. Images will no longer be monolithic blocks, but compositions of smaller parts, reducing their storage cost and improving their reusability.
- WP-3 (Objective 3): Autonomous multi-objective repository optimisation. The DPS team will research heuristics for the multi-objective distribution and placement of VM images across the decentralised repository, optimising multiple conflicting objectives including performance-related goals (e.g. VM deployment and instantiation overheads, data communication, application QoS metrics), operational costs, and storage space.
- WP-4 (Objective 4): Elastic resource provisioning. Lightweight VM creation and the optimised repository will improve the elasticity of on-demand scaling of industrial and business applications in Clouds in response to their fluctuating compute and storage requirements. The DPS research team focuses on defining an elasticity metric that measures the ability of the system to respond to the applications' demand.
- WP-5 (Objective 5): Information infrastructure for strategic and dynamic reasoning. Entice will develop a knowledge model of all entities and relationships concerning Cloud applications, including functional and non-functional properties of their underlying software components, QoS metrics, OS, VM type, and federated Cloud aspects (e.g. SLAs), to support optimised VM creation in the repository.
Project weblink: http://www.entice-project.eu
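The multi-objective placement problem of WP-3 can be sketched with a simple weighted-sum heuristic. The site metrics, weights, and normalization budgets below are invented for illustration and are not Entice's actual algorithm:

```python
# Hypothetical repository sites with per-site metrics (all numbers invented).
sites = {
    "site_a": {"deploy_s": 12.0, "cost_eur": 0.020, "free_gb": 500},
    "site_b": {"deploy_s": 30.0, "cost_eur": 0.008, "free_gb": 80},
    "site_c": {"deploy_s": 18.0, "cost_eur": 0.015, "free_gb": 250},
}

def score(metrics, w_deploy=0.5, w_cost=0.3, w_storage=0.2):
    """Weighted sum over normalized objectives; lower is better."""
    deploy = metrics["deploy_s"] / 60.0         # relative to a 60 s budget
    cost = metrics["cost_eur"] / 0.05           # relative to a price cap
    storage = 1.0 - min(metrics["free_gb"], 1000) / 1000.0  # prefer free space
    return w_deploy * deploy + w_cost * cost + w_storage * storage

# Pick the placement with the best combined score for an image fragment.
best = min(sites, key=lambda s: score(sites[s]))
```

A weighted sum collapses conflicting objectives into one scalar; research heuristics like those in WP-3 typically go further, e.g. by exploring the Pareto front instead of fixing the weights up front.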
Programming model and runtime systems for Exascale computing
Large-scale compute clusters of heterogeneous nodes equipped with multi-core CPUs and GPUs are becoming increasingly popular in the scientific community. However, such systems require a combination of different programming paradigms, making application development very challenging. We have developed libWater, a library-based extension of the OpenCL programming model that simplifies the development of heterogeneous distributed applications. libWater consists of a simple interface, a transparent abstraction of the underlying distributed architecture offering advanced features such as inter-context and inter-node device synchronization. Its runtime system tracks the dependency information enforced by event synchronization to dynamically build a DAG of commands, on which two optimizations are applied automatically: collective communication pattern detection and device-host-device copy removal.
This project has two scientific objectives:
- First, we plan to exploit libWater's DAG representation to support new optimizations focusing on latency hiding, optimized data movement, and critical-path-based DAG transformation.
- Secondly, we want to integrate into libWater our automatic task partitioning approach for heterogeneous nodes, already implemented in the Insieme Compiler Framework.
We plan to evaluate the scalability of our system on the large-scale compute clusters available through the PRACE platform.
Project team: Biagio Cosenza, Ivan Grasso, Klaus Kofler, Thomas Fahringer
Acknowledgement: We acknowledge PRACE (http://www.prace-ri.eu) for awarding us access to computational resources based in Spain at BSC and in France at CEA.
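The DAG-of-commands idea behind the libWater runtime can be sketched minimally as follows. The `Command` class and `topological_order` helper are invented for illustration; libWater's real runtime operates on OpenCL commands and events:

```python
class Command:
    """A runtime command plus the commands whose events it waits on."""
    def __init__(self, name, deps=()):
        self.name = name
        self.deps = list(deps)

def topological_order(commands):
    """Return an execution order that respects all event dependencies."""
    order, visited = [], set()
    def visit(cmd):
        if cmd.name in visited:
            return
        visited.add(cmd.name)
        for dep in cmd.deps:   # schedule dependencies first
            visit(dep)
        order.append(cmd.name)
    for cmd in commands:
        visit(cmd)
    return order

# Event dependencies implicitly define the DAG: the kernel waits on the
# host-to-device copy, and the device-to-host copy waits on the kernel.
h2d = Command("copy_host_to_dev")
kernel = Command("run_kernel", deps=[h2d])
d2h = Command("copy_dev_to_host", deps=[kernel])
schedule = topological_order([d2h])
```

Once such a DAG is explicit, optimizations like the ones mentioned above become graph rewrites, e.g. removing a device-to-host copy that is immediately followed by a host-to-device copy of the same buffer.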
Energy Aware Auto-Tuning for Scientific Applications
EASE (Energy Aware Auto-Tuning for Scientific Applications) is an energy-aware framework that enacts energy-aware auto-tuning mechanisms for HPC applications. The framework covers performance prediction and energy modeling of HPC applications with the support of the Insieme compiler and the EnergyAnalyzer tool.
This bilateral collaborative project is funded by the FWF (Austria) and the DST (India) to the tune of approximately Rs. 3.5 crores. The project is sanctioned for 3 years from May 2014 and has 5 work packages: WP1 – Management (both Indian and Austrian side); WP2 – Compiler support (Austrian side); WP3 – Runtime system (Austrian side); WP4 – Energy Modeling (Indian side); WP5 – Online Performance Analysis (Indian side).
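The core idea of energy-aware auto-tuning is to search the configuration space for the setting that minimizes energy (power x time) rather than time alone. The sketch below is illustrative only; the candidate settings and the `measure` function are stand-ins, not EASE's actual compiler/runtime interface:

```python
def measure(threads, freq_ghz):
    # Stand-in for a real measured run: time shrinks with threads
    # (imperfectly, exponent 0.8) and frequency, while power grows
    # with both. A real framework would measure these on hardware.
    time_s = 100.0 / (threads ** 0.8) / freq_ghz
    power_w = 20.0 + 5.0 * threads * freq_ghz
    return time_s, power_w

def autotune(thread_opts, freq_opts):
    """Exhaustively search configurations for minimal energy."""
    best_cfg, best_energy = None, float("inf")
    for t in thread_opts:
        for f in freq_opts:
            time_s, power_w = measure(t, f)
            energy_j = time_s * power_w  # energy = power x time
            if energy_j < best_energy:
                best_cfg, best_energy = (t, f), energy_j
    return best_cfg, best_energy

cfg, joules = autotune([1, 2, 4, 8], [1.2, 2.4])
```

Note that the energy-optimal configuration need not be the fastest one; multi-objective auto-tuning must weigh execution time against power draw.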
Principal Investigators: Prof. Dr. Thomas Fahringer (Austrian side) and Dr. Shajulin Benedict, Ph.D., PostDoc (Germany) (Indian side)
Project Members (Indian Side): Mrs. Rejitha R.S. and Mrs. Preethi M.
Supporting Project Members (Indian Side): Mrs. Suja A. Alex
Project Members (Austrian Side): Dr. Radu Prodan (Assoc. Prof.) and Mr. Philipp Gschwandtner
Project weblink here.
RainCloud: Scientific Computing in the Cloud, Standortagentur Tirol
Despite the existence of many vendors that, similar to Grid computing, aggregate a potentially unbounded number of compute resources, Cloud computing remains a domain dominated by business applications (e.g. Web hosting, database servers) and whose suitability for scientific computing remains largely unexplored. In this project we plan to research basic methods that investigate the potential of Cloud infrastructures for high-performance scientific computing and apply them for operational daily use in a domain with high computational and QoS demands: meteorological weather forecast.
- Investigate the performance of the computational resources offered by commercial Cloud providers and devise models that assess whether their performance is sufficient for scientific computing;
- Research resource management and scheduling methods for scientific workflows on Cloud platforms by extending an existing Grid application development and computing environment;
- Research economically-viable SLAs that encapsulate a balance between the QoS offered by the resource providers and the cost of resource use;
- Quantify the benefits of using leased Cloud resources for scientific applications with respect to performance, reliability, and price, compared to traditionally owned supercomputers, clusters, and Grids;
- Validate the research methods for two real scientific applications from the meteorological and astrophysics domains;
- Use the researched methods and infrastructure in operational daily use at the Avalanche Service Tyrol and the Tyrolean Hydrographical Service for obtaining precipitation forecasts in mountainous terrains with a spatial resolution of 0.5 km and with extensive information about the uncertainty of the forecasts.
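The first objective above, assessing whether leased Cloud resources are sufficient, can be sketched as a deadline check against measured Cloud performance. All names and numbers below are hypothetical illustrations, not the project's actual models:

```python
def forecast_fits_deadline(cpu_hours_needed, n_instances,
                           relative_speed, start_hour, deadline_hour):
    """Can the leased instances finish the run before the deadline?

    relative_speed: measured cloud throughput relative to a reference
    cluster (e.g. 0.8 = the cloud instance runs at 80% of reference speed).
    """
    # Optimistic assumption: near-linear scaling across instances.
    wall_hours = cpu_hours_needed / (n_instances * relative_speed)
    return start_hour + wall_hours <= deadline_hour

# A 96 CPU-hour forecast started at 01:00 on 32 instances running at
# 80% of reference speed takes 96 / (32 * 0.8) = 3.75 h of wall time,
# comfortably before a 07:30 delivery deadline.
ok = forecast_fits_deadline(96, 32, 0.8, start_hour=1.0, deadline_hour=7.5)
```

A real assessment model would also account for VM startup overheads, imperfect scaling, and variability of leased resources, which is precisely what the project's performance models set out to quantify.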
Benefits of this project:
For University Innsbruck
The results from this project can represent essential input for the University of Innsbruck in shifting its business model from operating an expensive self-owned data/supercomputing centre towards renting on-demand resources from specialised companies in the right amount, only when and for as long as they are needed. Through this ability, the University of Innsbruck can avoid capital expenditure on hardware (operation, maintenance, and over-provisioning), software, and services, instead paying a provider only for what it uses. Consumption is billed on a utility basis (e.g. resources consumed, like electricity) or a subscription basis (e.g. time-based, like a newspaper) with little or no upfront cost. By combining a professionally run data centre with an economy of scale, Clouds promise to become a cheap alternative to supercomputers and specialised clusters, a much more reliable platform than Grids, and a much more scalable platform than the largest of commodity clusters or resource pools. This new paradigm can produce significant budget savings, which the University of Innsbruck can redirect towards the real science (e.g. hiring additional research staff) rather than investing in non-profitable infrastructure hardware.
For Tirol Region
Two public services, which provide a user platform for the meteorological application of this project, will already test the product during its development phase:
- The daily avalanche bulletin of the Avalanche Service Tyrol (Lawinenwarndienst, LWD) has a huge potential impact on day-to-day operations of ski areas, tourism centres, and everyday life throughout Tirol in winter, e.g. through road blocks, necessary avalanche blasting, etc. The LWD product is based on automatic and human observations, as well as numerical weather forecasts. Providing additional support with probability information helps to make the decisions and forecasts more precise. The LWD needs a twice-daily updated, user-friendly visualization of the precipitation fields as well as the computed and simulated certainties/uncertainties. All this information has to be available before a specific time each day, as the LWD issues its avalanche bulletin at 7:30 AM;
- The Tyrolean Hydrographical Service (Hydrographischer Dienst) has, among many duties, the task of warning of risks of flooding and landslides. Especially for extreme precipitation events with low recurrence periods, having additional fine-scale probability information about the expected amounts instead of just one (or a few) precipitation sums is very helpful. It can support emergency services in prevention measures and planning in case of such an event.
On-Demand Resource Provisioning for Online Games
Online entertainment, including gaming, is a strongly growing sector worldwide. Massively Multiplayer Online Games (MMOGs) grew from 10 thousand subscribers in 1997 to 6.7 million in 2003, and growth is accelerating, with an estimated 60 million subscribers by 2011. Today's MMOGs operate as client-server architectures, with server Hosters providing a large static infrastructure with hundreds to thousands of computers for each game in order to deliver the required Quality of Service (QoS) to all the players. Since the demand of an MMOG is highly dynamic, a large portion of the resources is unused most of the time, even for large providers that operate several game titles in parallel. This inefficient resource utilisation has negative economic impacts, preventing any but the largest hosting centres from joining the market and dramatically increasing prices.
Today, a new research direction coined by the term Cloud computing proposes a cheaper hosting alternative by leasing virtualised resources from large specialised data centres only when and for as long as they are needed, instead of buying expensive own hardware with costly maintenance and fast depreciation. Despite the existence of many vendors that aggregate a potentially unbounded number of resources, Cloud computing remains a domain dominated by Web hosting or data-intensive applications, and its suitability for computationally-intensive applications remains largely unexplored.
In this project we propose to conduct basic research that investigates new generic Cloud computing techniques to support QoS-enabled resource provisioning for computationally-intensive real-time applications by investigating:
- Performance models for virtualised Cloud resources, including characterisation of the Cloud virtualisation and software deployment overheads;
- Proactive dynamic scheduling strategies based on QoS negotiation, monitoring, and enforcement techniques;
- SLA provisioning models based on an optimised balance between risks, rewards, and penalties;
- Resource provisioning methods based on time/space/cost renting policies, including a comparative analysis between Cloud resource renting and conventional parallel/Grid resource operation.
Our ultimate goal is to apply and validate our generic methods to MMOGs, as a novel class of socially important applications with severe real-time computational requirements to achieve three innovative objectives:
- Improved scalability of a game session hosted on a distributed Cloud infrastructure to a larger number of online users than the current state of the art (i.e. 64 – 128 for FPS action games);
- Cheaper on-demand provisioning of Cloud resources to game sessions based on exhibited load;
- QoS enforcement with seamless load balancing and transparent migration of players from overloaded servers to underutilised ones within and across different Cloud provider resources.
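A toy version of the on-demand provisioning objective above can be sketched as a policy that leases servers against predicted load. The capacity per server, headroom factor, and function names are invented for illustration; real MMOG hosting additionally involves QoS monitoring and player migration:

```python
import math

def servers_needed(predicted_players, players_per_server=100, headroom=0.2):
    """Lease enough servers for the predicted load plus a safety margin."""
    target = predicted_players * (1.0 + headroom)
    return max(1, math.ceil(target / players_per_server))

def rescale(current_servers, predicted_players):
    """How many servers to lease (+) or release (-) for the next interval."""
    return servers_needed(predicted_players) - current_servers

# 850 predicted players with 20% headroom -> 1020 player slots -> 11 servers.
delta = rescale(current_servers=8, predicted_players=850)
```

The proactive element in the project's scheduling strategies lies in the load prediction feeding such a policy: scaling on predicted rather than observed demand hides the provisioning latency of leased Cloud resources from the players.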
The Cloud computing paradigm holds great promise for the performance-hungry scientific community. Clouds promise to be a cheap alternative to supercomputers and specialized clusters, a much more reliable platform than Grids, and a much more scalable platform than the largest of commodity clusters or resource pools. More information here…
HiPEAC is a European Network of Excellence on High Performance and Embedded Architecture and Compilation, funded by the Computing Systems research objective of the European FP7-ICT programme.
This network coordinates nine research clusters that will explore on-chip multi-core technology and customisation, leading to heterogeneous multi-core systems. HiPEAC's main objective is to harmonise European research efforts in the area of computer systems by developing a common research vision, as well as organising meetings at regular intervals and stimulating European co-operation.
The research clusters focus on the following areas:
- multi-core architecture;
- programming models and operating systems;
- adaptive compilation;
- reconfigurable computing;
- design methodology and tools;
- binary translation and virtualisation;
- simulation platform;
- compilation platform.
Research in computer systems finds itself at a turning point, the project partners state: more and more, the supercomputer market, the commodity market (including laptops and other consumer electronics such as mobile phones, PDAs, and navigation systems), and the embedded market are interconnected. Increasingly, the same components are used in all of these systems, creating new business opportunities for the European computer industry.
Yet innovation in the field of high-performance processors is subject to physical limitations. As a result, these processors shifted towards parallelism, using multi-core systems, so that many instructions can be carried out simultaneously. While in theory this shift to parallel computing will increase performance and cut energy consumption at the same time, multi-core structures create new problems for computer architects.
The activities envisioned in the network ‘will lead to the permanent creation of a solid and integrated virtual centre of excellence consisting of several highly visible departments, and this virtual centre of excellence will have the critical mass to really make a difference for the future of computing systems,’ the project partners believe.
Between 2008 and 2011, the HiPEAC consortium will receive €4.8 million under the Seventh Framework Programme (FP7). The group of Prof. Thomas Fahringer, Inst. of Computer Science, University of Innsbruck participates as a HiPEAC member (as of May 2009) in the research areas: programming models, compilers, tools as well as contributes to the ROADMAP and the task force on applications.
More information can be found at http://www.hipeac.net.
ASKALON is a programming environment for Cluster and Grid Computing. ASKALON provides tools for:
- automatic performance bottleneck analysis
- performance modeling and prediction
- performance instrumentation, measurement, and analysis
- a Java-based coordination and visualization system