Current Theses





Title | Student(s) | Supervisor(s)
Distributed GPGPU on Cloud GPU Clusters | Martin Schuchardt | Juan J. Durillo
Bringing ESS on a diet | Ivan Hell | Simon Ostermann
WebCL@home: Volunteer Computing with WebCL | Michael Gasser | Biagio Cosenza
Scalable Lighting for Global Illumination | Michael Walch | Biagio Cosenza
Compiler für eine Mini-Pseudocodesprache | Rainer Breuss | Radu Prodan
Code Region Instrumentation for Energy Consumption Measurements | Thomas Eiter | Radu Prodan
Scientific computing in the Cloud with Apache Hadoop | Martin Illecker | Radu Prodan
Finite element method on GPU using OpenCL | Manfred Gratt | Radu Prodan, Klaus Kofler
OpenCL: Matrix Library | Adi Schütz, Richard Weinberger | Herbert Jordan, Radu Prodan
Parallel sorting algorithms in OpenCL | Martin Thaler | Radu Prodan
Workflow design provenance system for ASKALON | Friedrich Wachter | Radu Prodan


Title Distributed GPGPU on Cloud GPU Clusters
Language English
Supervisors Juan J. Durillo
Student Martin Schuchardt
Description Cloud providers have recently started to offer GPU instances.
Using GPGPU for problems that can be solved with massively parallel algorithms may lead to performance gains on appropriate hardware.
The large number of instances that can be rented from a cloud provides an interesting basis for powerful distributed systems.
Combining both technologies, by distributing chunks of a problem to many instances and using the GPU of each instance for the computation, could provide immense computing power if the problem scales well.
  • Become familiar with the cloud infrastructure and GPU instances
  • Write some benchmarks for GPUs
  • Perform benchmarks and compare the cloud instance with stand-alone hardware
  • Write code to simplify and automate creation and configuration of cloud instances
  • Write library/broker to distribute a computation over n instances with m GPUs
  • Evaluate and benchmark using an existing deep learning algorithm provided by IIS/LFU
  • Evaluate and benchmark using another algorithm provided by JAIST, optimize and verify with results from the deep learning algorithm
Theoretical skills
  • Interest in parallelizable algorithms
  • Distributed Systems
  • Parallel Programming (OpenCL)
Practical skills
  • Good C and OpenCL knowledge
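The step of distributing a computation over n instances with m GPUs each can be illustrated by a small partitioning routine. This is only a minimal sketch: the function name `split_work` and the choice of equal-sized contiguous chunks are assumptions made for illustration, not part of the thesis description, and a real broker would also have to ship the data and collect the results.

```python
def split_work(num_items, num_instances, gpus_per_instance):
    """Assign contiguous index ranges of a problem to every (instance, gpu) pair.

    Returns a dict mapping (instance, gpu) -> (start, stop), with stop exclusive.
    Chunk sizes differ by at most one item.
    """
    if num_instances < 1 or gpus_per_instance < 1:
        raise ValueError("need at least one instance and one GPU per instance")
    workers = [(i, g) for i in range(num_instances) for g in range(gpus_per_instance)]
    base, extra = divmod(num_items, len(workers))
    plan, start = {}, 0
    for rank, worker in enumerate(workers):
        size = base + (1 if rank < extra else 0)  # the first `extra` workers get one more item
        plan[worker] = (start, start + size)
        start += size
    return plan


if __name__ == "__main__":
    for worker, (lo, hi) in split_work(10, 2, 2).items():
        print(worker, lo, hi)
```

Equal chunks only balance the load if all GPUs are equally fast; with heterogeneous instances, the benchmark results could be used as weights instead.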


Title Bringing ESS on a diet
Language English
Supervisors Simon Ostermann
Student Ivan Hell
Description ESS is software developed in-house by “Elektrisola Athesina” and used for general materials management tasks such as procurement, logistics, planning and production. Its main architecture is based on a fat-client approach implemented in Java. Since the company is planning to make increasing use of mobile devices in the future, a more server-dominant, distributed approach is needed.
The aim of this project is to redesign ESS so that most of the data processing is done on the server side. To achieve this, as much code as possible should be reused from the old system. Besides high availability, the new architecture must allow platform-independent use of ESS (mobile, desktop, …).
In addition, the new clients have to be much more resource-efficient than the current ones. It should also be possible to deploy minor updates to the system without restarting all connected clients.
To guarantee high availability as well as high throughput, the server should end up working in a cluster that is able to distribute the incoming workload evenly between all running instances.
  • Become familiar with ESS
  • Find an appropriate architecture
  • Prepare the code for division
  • Implement the server backend
  • Bring the server to a cluster
  • Speed up data processing by caching
  • Adapt the current native client
  • Implement a mobile client
Theoretical skills Distributed and parallel systems
Practical skills
  • Java
  • JavaScript
Additional information


Title WebCL@home: Volunteer Computing with WebCL
Language English
Supervisors Biagio Cosenza
Student Michael Gasser
Description Volunteer computing is a type of distributed computing in which computer owners donate their computing resources (such as processing power and storage) to one or more “projects”. Famous volunteer computing projects include SETI@home and Folding@home, in which the computation of large, challenging problems was distributed across a computational grid.
OpenCL is a parallel programming language that takes advantage of today’s multiprocessors. WebCL is an OpenCL extension targeting web applications. With WebCL, it is possible to harness GPU and multi-core CPU parallel processing from within a web browser, enabling significant acceleration of applications such as image and video processing and advanced physics for WebGL games.
Nokia Research released an open-source Firefox extension that implements the WebCL standard. The goal of this project is to implement a web-based “Volunteer Computing” infrastructure in which each user donates computing resources simply by visiting a web site with a WebCL-enabled browser. The project includes the implementation of at least two computationally intensive problems taken from an already available collection of codes (e.g. astrophysics, chemistry, graphics and math).
This work is carried out in collaboration with Tomi Aarnio (NRC Tampere, Finland), who currently leads the WebCL development at Nokia Research.
  • Learn and become familiar with OpenCL and WebCL
  • Test the currently available WebCL applications
  • Implement a web infrastructure which schedules OpenCL tasks over distributed web-based clients
  • Test the final infrastructure with at least two different test codes
Theoretical skills Parallel Computing
Practical skills
  • JavaScript
  • OpenCL (attending or having attended the course “Einführung in das Parallelrechnen und parallele Algorithmen” is a plus)
Additional information Nokia WebCL implementation
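Scheduling tasks over web-based volunteers has to cope with clients that disappear at any moment, for example when a browser tab is closed. The following is a minimal server-side sketch of one possible approach, written in Python for illustration only. The class `WorkUnitScheduler`, its method names and the lease-based re-issue policy are assumptions, not the project's actual design, and no WebCL API is involved.

```python
import time


class WorkUnitScheduler:
    """Hands out work units to volunteers and re-issues units whose lease expired."""

    def __init__(self, work_units, lease_seconds=60.0, clock=time.monotonic):
        self._pending = list(work_units)   # units not handed out yet (must be hashable)
        self._leased = {}                  # unit -> lease expiry time
        self._results = {}                 # unit -> reported result
        self._lease_seconds = lease_seconds
        self._clock = clock

    def next_unit(self):
        """Return a unit for a volunteer, or None if nothing is available right now."""
        now = self._clock()
        # Reclaim units whose volunteer did not answer in time.
        for unit, expiry in list(self._leased.items()):
            if expiry <= now:
                del self._leased[unit]
                self._pending.append(unit)
        if not self._pending:
            return None
        unit = self._pending.pop(0)
        self._leased[unit] = now + self._lease_seconds
        return unit

    def report(self, unit, result):
        """Store a result; reports for already finished units are ignored."""
        if unit in self._results:
            return False
        self._leased.pop(unit, None)
        if unit in self._pending:          # a late report for a reclaimed unit
            self._pending.remove(unit)
        self._results[unit] = result
        return True

    def done(self):
        return not self._pending and not self._leased

    def results(self):
        return dict(self._results)
```

A production system would additionally validate results, for instance by issuing each unit to several volunteers and comparing their answers.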


Scalable Lighting for Global Illumination
Student Michael Walch
Language English
Supervisor Biagio Cosenza and Vlad Nae
Description Global Illumination (GI) algorithms provide a way to produce very realistic images of three-dimensional scenes. They take into account not just direct lighting from light sources, but also the indirect light contributed by object surfaces. The resulting 3D rendering is very realistic, substantially better than today’s most eye-catching computer games.

An interesting rendering technique is Instant Radiosity (Keller 1997). IR is a method that approximates the indirect lighting, as part of global illumination, by creating additional light sources (called Virtual Point Lights). The algorithm is fairly simple: for each light source, it casts N photon rays into the scene. At each intersection, the photon either bounces, in which case another ray is cast, or is killed through Russian roulette. At each of the intersections, the algorithm creates a Virtual Point Light (VPL) that has the same radiance value as the photon.
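The VPL generation loop described above can be sketched in a few lines. This is a toy illustration under strong assumptions, not the thesis implementation: the "scene" is just the inside of a unit sphere, bounce directions are sampled uniformly over the hemisphere, and the survival probability of the Russian roulette is set equal to a single surface albedo so that the photon power stays unchanged along the path. All function names are hypothetical.

```python
import math
import random


def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def _random_direction(rng, normal=None):
    """Uniform random unit vector; if `normal` is given, restricted to its hemisphere."""
    while True:
        v = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
        if any(abs(c) > 1e-12 for c in v):
            break
    d = _normalize(v)
    if normal is not None and sum(a * b for a, b in zip(d, normal)) < 0:
        d = tuple(-c for c in d)
    return d


def _hit_unit_sphere(origin, direction):
    """Toy scene: intersect a ray that starts inside the unit sphere with that sphere."""
    b = sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - 1.0
    t = -b + math.sqrt(max(b * b - c, 0.0))
    point = tuple(o + t * d for o, d in zip(origin, direction))
    normal = _normalize(tuple(-p for p in point))  # points back into the scene
    return point, normal


def generate_vpls(light_pos, light_power, num_photons, albedo, rng):
    """Cast photons from one light and return a list of (position, power) VPLs."""
    vpls = []
    for _ in range(num_photons):
        origin, direction = light_pos, _random_direction(rng)
        power = light_power / num_photons
        while True:
            point, normal = _hit_unit_sphere(origin, direction)
            vpls.append((point, power))      # one VPL per intersection
            if rng.random() >= albedo:       # Russian roulette: the photon is killed
                break
            origin, direction = point, _random_direction(rng, normal)
    return vpls
```

With survival probability `albedo`, each photon creates 1 / (1 - albedo) VPLs on average, so brighter scenes produce longer paths and more virtual lights.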

Although IR can produce extremely realistic results, these can only be achieved by exponentially increasing the number of VPLs. A solution for scaling the computation needed for the huge number of VPLs is provided by Lightcuts (Walter et al. 2005). The idea is to group lights in a binary light tree and then to use a perceptual metric to adaptively partition the lights into groups that can be rendered at the desired quality level. This algorithm offers a user-controllable tradeoff between the desired quality and the computational cost (i.e. execution time).

The goal of this thesis is to develop a parallel version of Lightcuts and to evaluate its performance on different scenes.

The project should use an optimized in-house implementation of the ray tracing algorithm.

  • Familiarize yourself with the provided ray tracing implementation
  • Study and implement Instant Radiosity
  • Study and implement Lightcuts
    • Light tree building
    • Perceptual cuts
    • Reconstruction of cuts
  • Test the system with different configurations on at least three scenes
Theoretical skills
  • Rendering algorithms
  • Concurrent Programming
Practical skills
  • C/C++
  • OpenGL (basic knowledge)
  • OpenCL (basic knowledge)
Additional information  

Global Illumination with the Instant Radiosity technique.
Instant Radiosity, by Alexander Keller
Lightcuts, by Walter et al.


Title Compiler für eine Mini-Pseudocodesprache (Compiler for a Mini Pseudocode Language)
Language German/English
Supervisors Radu Prodan
Student Rainer Breuss
Description As part of the STEOP course “Einführung in der Praktische Informatik”, a simple pseudocode programming language was introduced with which students develop and practise their first programs on paper. Unfortunately, neither the students nor the teachers have a way to check these programs for correctness or to execute them. The goal of this master’s thesis is to develop a mini-compiler for this pseudocode language.
  • Specification of the pseudocode grammar
  • Implementation of a scanner and a parser
  • Design and generation of an abstract syntax tree and a symbol table
  • Implementation of a semantic analysis
  • Small program transformations and optimizations
  • Code generation in the C programming language


Theoretical skills Lex, Yacc
Practical skills
  • C or C++
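The pipeline of scanner, parser, abstract syntax tree and C code generation can be illustrated on a very small scale. The sketch below is written in Python for brevity and assumes a made-up mini grammar (a single assignment `name := expression` with `+ - * /` and parentheses), because the actual pseudocode language is not specified here. It omits the symbol table, the semantic analysis and the optimizations.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(:=|[-+*/()]))")


def scan(text):
    """Scanner: turn source text into a list of (kind, value) tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        number, name, op = m.groups()
        if number is not None:
            tokens.append(("num", number))
        elif name is not None:
            tokens.append(("id", name))
        else:
            tokens.append(("op", op))
        pos = m.end()
    return tokens


class Parser:
    """Recursive-descent parser that builds the AST as nested tuples."""

    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else ("eof", "")

    def take(self, kind, value=None):
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {value or kind}, got {tok[1] or tok[0]}")
        self.i += 1
        return tok[1]

    def assignment(self):              # assignment := id ':=' expr
        target = self.take("id")
        self.take("op", ":=")
        tree = ("assign", target, self.expr())
        self.take("eof")
        return tree

    def expr(self):                    # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.take("op")
            node = (op, node, self.term())
        return node

    def term(self):                    # term := factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            op = self.take("op")
            node = (op, node, self.factor())
        return node

    def factor(self):                  # factor := num | id | '(' expr ')'
        kind, value = self.peek()
        if kind in ("num", "id"):
            self.i += 1
            return (kind, value)
        self.take("op", "(")
        node = self.expr()
        self.take("op", ")")
        return node


def to_c(node):
    """Code generation: emit a C expression or statement for an AST node."""
    if node[0] in ("num", "id"):
        return node[1]
    if node[0] == "assign":
        return f"{node[1]} = {to_c(node[2])};"
    return f"({to_c(node[1])} {node[0]} {to_c(node[2])})"


def compile_line(text):
    return to_c(Parser(scan(text)).assignment())
```

For example, `compile_line("x := 1 + 2 * y")` returns the C statement `x = (1 + (2 * y));`. The thesis itself targets Lex and Yacc, which generate the scanner and parser from a grammar instead of hand-written code like this.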


Title Code Region Instrumentation for Energy Consumption Measurements
Student Thomas Eiter
Language English
Supervisors Radu Prodan, Vlad Nae
Description The goal of this thesis is the design, implementation and evaluation of a library enabling energy consumption measurements for program code regions.
  • study state-of-the-art research in the fields of program code instrumentation and energy efficiency;
  • familiarize yourself with the power measuring device and its data collection interfaces;
  • implement a utility library for connecting to the power measuring device;
  • design and implement the program code instrumentation library for power measurements;
  • implement a test suite for accurate evaluation of power consumption for small program code regions.
Theoretical skills Basic knowledge of energy consumption in computing systems
Practical skills C programming
Additional information The provided power measuring device is an accurate, professional-grade instrument that offers multiple methods of collecting data, which makes it easy to work with.
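To make the idea of code region instrumentation concrete, the following sketch marks a region, samples the power at its boundaries (and optionally inside it), and integrates the samples over time to obtain energy. It is written in Python for illustration, although the thesis asks for C. The callback `read_power_watts` is a hypothetical stand-in for the device interface, which is not described here, and the names `RegionMeter` and `energy_joules` are assumptions as well.

```python
import time
from contextlib import contextmanager


def energy_joules(samples):
    """Integrate (timestamp_s, power_w) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


class RegionMeter:
    """Collects power samples per named code region."""

    def __init__(self, read_power_watts, clock=time.perf_counter):
        self._read = read_power_watts   # hypothetical device callback returning watts
        self._clock = clock
        self.regions = {}               # region name -> list of sample lists

    @contextmanager
    def region(self, name):
        samples = [(self._clock(), self._read())]   # sample at region entry

        def sample():
            samples.append((self._clock(), self._read()))

        try:
            yield sample                # the region may take extra samples itself
        finally:
            sample()                    # sample at region exit
            self.regions.setdefault(name, []).append(samples)

    def energy(self, name):
        """Total energy in joules over all executions of the named region."""
        return sum(energy_joules(s) for s in self.regions.get(name, []))
```

For small code regions the sampling rate of the device limits the accuracy: a region shorter than the sampling interval only sees the two boundary samples, which is one reason why a dedicated test suite for small regions is part of the task list.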


Scientific computing in the Cloud with Apache Hadoop
Student Martin Illecker
Language English or German
Supervisor Radu Prodan
Description The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. Hadoop includes a MapReduce software framework for distributed processing of large data sets on compute clusters.
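The MapReduce programming model can be illustrated without a cluster. The sketch below is a single-process Python illustration of the map, shuffle and reduce phases and does not use the Hadoop API. The function names and the word-count example are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a map phase, group values by key (shuffle), then reduce each group."""
    groups = defaultdict(list)
    for record in records:                 # map phase: independent per record
        for key, value in mapper(record):
            groups[key].append(value)      # shuffle: collect values per key
    # reduce phase: independent per key
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_count_mapper(line):
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    return sum(values)


if __name__ == "__main__":
    print(map_reduce(["a b a", "b c"], word_count_mapper, sum_reducer))
```

Because every mapper call depends only on its own record and every reducer call only on its own key, a framework such as Hadoop can run both phases on many machines at once.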

The goal of this thesis is to develop a framework for building scientific applications in Cloud computing infrastructures based on Apache Hadoop technology.

Theoretical skills  Distributed and parallel systems
Practical skills  Java
Additional information


Finite element method on GPU using OpenCL
Student Manfred Gratt
Language German
Supervisor Radu Prodan, Klaus Kofler
Description The finite element method (FEM) is a standard tool in modern engineering. It is a numerical method for solving partial differential equations, in which very large systems of equations usually have to be solved repeatedly. Even assembling these systems of equations is computationally very expensive. This step is to be accelerated by a massively parallel implementation on GPUs. The goal of this master’s thesis is to use GPUs for assembling FEM systems of equations and to analyse the parallelisation potential.
  • Become familiar with OpenCL
  • Become familiar with the fundamentals of FEM
  • Parallel assembly of the stiffness matrix
  • Performance analysis
  • Implementation of various element types in OpenCL
Practical skills
  • C++
  • OpenCL
  • Fundamentals of matrices
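Assembling the stiffness matrix splits naturally into a per-element computation and a scatter-add into the global matrix. The sketch below shows this for the simplest possible case, linear 1D elements for the model problem -u'' = f, in plain Python. The element type, the function names and the dense matrix storage are assumptions made for illustration; the thesis targets OpenCL and more general element types.

```python
def element_stiffness(x_left, x_right):
    """Local 2x2 stiffness matrix of a linear 1D element for -u'' = f."""
    h = x_right - x_left
    return [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]


def assemble_stiffness(nodes, elements):
    """Assemble the global stiffness matrix from per-element contributions.

    nodes:    list of node coordinates
    elements: list of (left_node_index, right_node_index) pairs
    """
    n = len(nodes)
    # Phase 1: local matrices are independent of each other, so each element
    # could be handled by its own GPU work-item.
    local = [element_stiffness(nodes[a], nodes[b]) for a, b in elements]
    # Phase 2: scatter-add into the global matrix. Neighbouring elements share
    # nodes, so a parallel version needs atomic adds or element colouring.
    K = [[0.0] * n for _ in range(n)]
    for (a, b), ke in zip(elements, local):
        dofs = (a, b)
        for i in range(2):
            for j in range(2):
                K[dofs[i]][dofs[j]] += ke[i][j]
    return K
```

For three equidistant nodes on [0, 1] this yields the familiar tridiagonal pattern with 4 on the inner diagonal entry and -2 next to it.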



OpenCL: Matrix Library
Students Adi Schütz, Richard Weinberger
Language English, German
Supervisor Heiko Studt, Herbert Jordan, Radu Prodan
Description OpenCL, a brand-new industry standard for programming CPUs, GPUs and other accelerators (e.g. Cell), does not yet have a library for operations on large matrices. The library interface should resemble that of GNU MP: calls build up a workflow, which is then executed with OpenCL in a way that minimizes memory transfers.
The operators to implement are “simple” ones such as add/sub/mul/inv/… and are of secondary importance; the main focus is on memory optimization and on the OpenCL implementation of these algorithms.
The “new” part of this work is that OpenCL is used as the back-end and that the memory flow is optimized for the memory hierarchy wherever possible.
  • Study OpenCL
  • Study matrix problems and GNU MP
  • Write some test programs using GNU MP
  • Model the workflow data structures
  • Build “simple” implementation
  • Benchmark
  • Optimize workflow on memory layers (data transfers)
  • (Optimize/Parallelize with known algorithms)
  • Benchmark
  • Unit-test the library extensively
Theoretical skills
  • Interest in optimized algorithms
  • Interest in workflow-based systems
  • (Grid Workflows)
  • (Parallel Systems)
Practical skills
  • Good C/C++
  • (OpenCL)
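The idea of first recording matrix operations as a workflow and only then executing them can be sketched as a small deferred-evaluation graph. The sketch is in Python for illustration and is an assumption about how such a workflow might look, not the library's design: the `Node` class, the operator overloads and the `uploads` counter (standing in for host-to-device copies) are all hypothetical, and the arithmetic runs on the host instead of in OpenCL.

```python
class Node:
    """One step of the deferred workflow: a leaf matrix or an operation on nodes."""

    def __init__(self, op, inputs=(), data=None):
        self.op, self.inputs, self.data = op, tuple(inputs), data

    def __add__(self, other):
        return Node("add", (self, other))

    def __sub__(self, other):
        return Node("sub", (self, other))

    def __matmul__(self, other):
        return Node("mul", (self, other))


def matrix(rows):
    return Node("leaf", data=[list(r) for r in rows])


def _apply(op, a, b):
    if op == "add":
        return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    if op == "sub":
        return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    if op == "mul":
        cols = list(zip(*b))
        return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]
    raise ValueError(op)


def evaluate(root):
    """Run the workflow; every node is computed once even if it is used twice."""
    cache, uploads = {}, 0

    def visit(node):
        nonlocal uploads
        if id(node) not in cache:
            if node.op == "leaf":
                uploads += 1            # stands in for one host-to-device copy
                cache[id(node)] = node.data
            else:
                a, b = (visit(n) for n in node.inputs)
                cache[id(node)] = _apply(node.op, a, b)
        return cache[id(node)]

    return visit(root), uploads
```

Because the whole expression is known before anything runs, an operand that appears several times, such as `A` in `(A @ B) + A`, needs to be transferred only once. This is the kind of saving the workflow approach is meant to enable.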


Parallel sorting algorithms in OpenCL
Student Martin Thaler
Supervisor Radu Prodan
Language German, English
Description Open Computing Language (OpenCL) is a framework for writing task-parallel and data-parallel programs that execute across heterogeneous platforms consisting of multicore devices such as CPUs, Graphics Processing Unit (GPU) accelerators, and Cell Broadband Engine co-processors. OpenCL includes a language (based on C99) for writing so-called kernels, which are SIMD functions that execute on OpenCL devices, plus APIs that are used to define and then control the platforms.

The objective of this thesis is to investigate the potential of using hybrid multicore architectures consisting of CPUs, GPUs, and Cell processors for parallelising list sorting algorithms such as bubble sort, quick sort, rank sort, bucket sort, selection sort, merge sort, etc.

  • Implementation of several (3-4) parallel sorting algorithms in OpenCL
  • Scheduling and optimisation of the parallel sorting algorithms on hybrid multiprocessor platforms
  • Speedup and efficiency analysis
  • Overhead analysis
Theoretical skills Compiler construction, Parallel systems, Computer architecture
Practical skills C, C++
Additional information OpenCL
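Of the algorithms listed above, rank sort maps most directly onto the kernel model, because the final position of every element can be computed independently of all others. The following sketch emulates this in Python: `rank_kernel` plays the role of one work-item, and the loop in `rank_sort` stands in for launching one work-item per element. The names and the index-based tie-break are assumptions made for illustration.

```python
def rank_kernel(gid, data):
    """Work-item `gid`: count the elements that must precede data[gid]."""
    x = data[gid]
    rank = 0
    for j, y in enumerate(data):
        if y < x or (y == x and j < gid):   # index tie-break keeps ranks unique
            rank += 1
    return rank


def rank_sort(data):
    """Place every element at its rank; each iteration is independent."""
    out = [None] * len(data)
    for gid in range(len(data)):            # on a device, all gids run in parallel
        out[rank_kernel(gid, data)] = data[gid]
    return out
```

Rank sort performs a quadratic number of comparisons in total, so it trades extra work for full independence between work-items. Weighing such tradeoffs is part of the speedup, efficiency and overhead analysis listed above.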