Available Theses

Title | Student(s) | Supervisor
Automatic Data Dependence Analysis for Simple C Programs | 1 | Thomas Fahringer
Deployment of genomic pipeline and databases for cloud computing | 1 | Thomas Fahringer
Agile development of serverless functions with portable function templates | 1 | Sashko Ristov
Cross-layered resource management in Cloud continuum | 1 | Sashko Ristov
Experiments and data analysis for serverless computing | 1 | Sashko Ristov
Experimente und Datenanalyse für Clouds | 1 | Thomas Fahringer

Title Automatic Data Dependence Analysis for Simple C Programs
Number of students 1
Language German or English
Supervisor Thomas Fahringer
Description We have developed a simple dependence testing tool for simple C programs. The goal of this project is to find and fix errors in the tool and to extend its data dependence analysis capabilities. Among other things, the compiler of this tool should be extended for countable dependence testing: it inserts instrumentation code that monitors array subscript expressions and writes them to a trace file. At the end of the execution of such programs, the trace file is analyzed and dependences are determined from its contents.
A further step is to integrate a new dependence tester, such as a polyhedral library, and to replace the existing dependence tester in the above-mentioned tool with it, with the objective of improving the accuracy of dependence testing.
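For illustration, a minimal sketch (in Java, with a hypothetical trace format and class names that are not part of the existing tool) of how recorded array accesses could be scanned for loop-carried dependences:

    import java.util.*;

    // Hypothetical trace record: one array access logged by the instrumented program.
    // Assumed trace entry: <iteration> <array name> <element index> <read or write>
    record Access(int iteration, String array, int index, boolean isWrite) {}

    public class TraceDependenceSketch {

        // Scans the trace in order and compares each access with the previous access
        // to the same array element; a pair in different iterations where at least
        // one access is a write indicates a loop-carried dependence.
        static List<String> findDependences(List<Access> trace) {
            List<String> deps = new ArrayList<>();
            Map<String, Access> lastAccess = new HashMap<>();   // key: array element, e.g. "a[1]"
            for (Access a : trace) {
                String cell = a.array() + "[" + a.index() + "]";
                Access prev = lastAccess.get(cell);
                if (prev != null && (prev.isWrite() || a.isWrite()) && prev.iteration() != a.iteration()) {
                    String kind = prev.isWrite() ? (a.isWrite() ? "output" : "flow") : "anti";
                    deps.add(kind + " dependence on " + cell
                            + " (iterations " + prev.iteration() + " -> " + a.iteration() + ")");
                }
                lastAccess.put(cell, a);
            }
            return deps;
        }

        public static void main(String[] args) {
            // Trace of: for (i = 1; i < 4; i++) a[i] = a[i-1] + 1;
            List<Access> trace = List.of(
                    new Access(1, "a", 0, false), new Access(1, "a", 1, true),
                    new Access(2, "a", 1, false), new Access(2, "a", 2, true),
                    new Access(3, "a", 2, false), new Access(3, "a", 3, true));
            findDependences(trace).forEach(System.out::println);
        }
    }

Running the example prints the two loop-carried flow dependences of the loop in main(); the real tool would of course read the trace file produced by the instrumented C program instead of an in-memory list.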
Tasks
  • Understand the internals of the existing tool, test and debug where necessary.
  • Update the tool for countable dependence testing based on compiler technology.
  • Add the polyhedral dependence tester as a new dependence test to improve the accuracy of the dependence testing.
  • Visualization of results.
  • Development of a test suite and extensive testing.
Theoretical Skills Data dependence analysis, compiler technology such as flex and bison
Practical Skills Scripting language, C or C++
Additional Information

Title Deployment of genomic pipeline and databases for cloud computing
Number of students 1
Language English
Supervisor Thomas Fahringer
Description The field of microbial genomics has made significant advances during the past decade with important clinical, environmental, and industrial applications. However, tools and data remain mostly confined to rigid systems within local servers or High-Performance Computing (HPC) centers. We have recently developed the Microbial Genomes Atlas (MiGA), a computational framework that allows the automated processing, quality evaluation, comparison, and classification of complete genomes. MiGA features flexible scheduling and data management systems, and the objective of the current master thesis will be to deploy the MiGA system (including the pipeline and the databases) for cloud computing.
Tasks
  • Setup of the MiGA processing system in the Google Cloud through the CloudyCluster technology
  • Deployment of the Rails web app MiGA Web in Google Cloud and configuration to interact with the processing system
  • Evaluation of the system for cost-performance and usability with real genomic datasets (provided)
  • Depending on the student’s interests, development of new features in the system and/or analysis of genomic data using the system
Theoretical Skills Distributed Systems, Cloud Computing, High Performance Computing, Interest in genomics (no previous knowledge necessary)
Practical Skills HPC Scheduler Systems, Basic knowledge of Bash is preferred
Additional Information This master thesis will be supervised by Thomas Fahringer (IFI/DPS) and Rodriguez Rojas Miguel (DiSC).
The student will be given the opportunity and the cloud computing resources to develop new features in the system and/or to engage in genomic data analysis using the system, depending on their interests. The student will also have access to CloudyCluster consultants and technical support dedicated to the project. We offer the successful applicant experience in the fields of Big Data management, Genomic Data processing, and Cloud Computing. The student will be exposed to commercial cloud systems as well as HPC technologies.

Title Cross-layered resource management in Cloud continuum
Number of students 1
Language English
Supervisor Sashko Ristov
Description The cloud continuum offers a variety of heterogeneous computing resources, each with specific properties in terms of scalability, latency, performance, capacity, provisioning delay, economic cost, flexibility, portability, etc. For example, VMs are cheaper and more flexible, but have a much higher provisioning delay compared to serverless functions. Many of the existing computing engines use resources of a single cloud provider or of a single resource type, which locks the user into that type's specific pros and cons. This thesis will research methods to develop CrossFlow, a scalable and portable platform to run complex applications across various types of cloud continuum resources, allowing the user to exploit the best features of each resource type. The applications are built with the existing AFCL language developed by the DPS group and run with the existing enactment engine for serverless applications, which should be extended for cross-layered resources.
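To make the scope concrete, a minimal sketch of what a provider-independent, cross-layered resource abstraction and resource manager interface could look like (all type and method names below are hypothetical illustrations; they are not part of AFCL or the existing enactment engine):

    import java.time.Duration;

    // Hypothetical types only: a sketch of a cross-layered resource abstraction
    // for the proposed CrossFlow platform.
    public class CrossLayeredSketch {

        enum ResourceType { SERVERLESS_FUNCTION, VM, EDGE_DEVICE }

        // Abstract description of a resource request, independent of a concrete provider.
        record ResourceRequest(ResourceType type, String region, int memoryMB,
                               Duration expectedRuntime, double maxCostPerHour) {}

        // Handle to a provisioned resource returned by a provider-specific backend.
        record ProvisionedResource(String providerId, ResourceType type, String endpoint) {}

        // The cross-layered resource manager the thesis would develop: it recommends
        // and provisions a suitable resource and tracks its lifecycle.
        interface ResourceManager {
            ProvisionedResource provision(ResourceRequest request);   // recommendation + provisioning
            double currentCost(ProvisionedResource resource);         // accounting / billing
            boolean isHealthy(ProvisionedResource resource);          // monitoring / fault tolerance
            void terminate(ProvisionedResource resource);             // release the resource
        }

        public static void main(String[] args) {
            // A short-running task that cannot tolerate a long provisioning delay, so a
            // serverless function would typically be recommended over a VM.
            ResourceRequest req = new ResourceRequest(
                    ResourceType.SERVERLESS_FUNCTION, "eu-central-1", 512,
                    Duration.ofSeconds(30), 0.05);
            System.out.println("Abstract request: " + req);
        }
    }

A fault-tolerant implementation of such an interface per provider, plus a recommender that maps requests to resource types, would then form the core of the platform.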
Tasks
  • Analyze the existing languages for “infrastructure as code”
  • Develop an abstract and cross-layered language for cloud resources (which will support the master thesis “Agile development of serverless functions with portable function templates”)
  • Develop a fault-tolerant resource manager for recommendation, provisioning, monitoring, accounting, billing, and terminating of cross-layered cloud continuum resources
  • Evaluate the system (cost-performance trade-off) with real life applications
Theoretical Skills Distributed Systems, Cloud Computing, Functions as a Service, Fault tolerance.
Practical Skills Java, Cloud provider APIs.
Additional Information The following material / tools are useful for this thesis:

  1. S. Ristov, S. Pedratscher, T. Fahringer, “AFCL: An Abstract Function Choreography Language for serverless workflow specification,” Future Generation Computer Systems, Volume 114, 2021, Pages 368-382, ISSN 0167-739X, https://doi.org/10.1016/j.future.2020.08.012
  2. Enactment engine to run serverless workflows https://github.com/sashkoristov/enactmentengine
  3. A multi-FaaS toolkit to facilitate development of portable applications https://github.com/sashkoristov/jFaaS

Title Agile development of serverless functions with portable function templates
Number of students 1
Language English
Supervisor Sashko Ristov
Description Porting a serverless function from one cloud provider to another is a complex task, as it may require a huge development effort to rewrite the use of all cloud services that the function relies on (e.g. S3, RDS, …). For example, after migrating a function from IBM to AWS, a user may prefer it to use S3 rather than IBM Cloud Storage in order to reduce latency. This requires the function developer to rewrite the function to use S3 instead of IBM storage. The goal of this master thesis is to research methods to simplify the portability of serverless functions by developing a dependency-aware faasifier that allows developers to write “function templates” with annotations, independently of any cloud provider. Students will explore and learn how to model the cloud service types that a function template uses in order to abstract them from a concrete cloud provider. Once a function template is developed, the faasifier will adapt the code of the function template into function implementations for the cloud FaaS providers where the function implementations should run.
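To illustrate the idea, a minimal sketch of what an annotated function template could look like (the annotation, interface, and class names below are hypothetical examples, not the annotation schema to be developed in this thesis):

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    // Hypothetical annotation marking an abstract cloud service type used by a
    // function template; the faasifier would replace it with provider-specific code.
    @Retention(RetentionPolicy.SOURCE)
    @Target(ElementType.FIELD)
    @interface CloudService {
        String type();   // e.g. "object-storage", "relational-db"
    }

    // Provider-independent view of an object storage service. A generated AWS
    // implementation would back it with S3, an IBM implementation with IBM storage.
    interface ObjectStorage {
        byte[] get(String bucket, String key);
        void put(String bucket, String key, byte[] data);
    }

    // The function template itself: written once, without provider-specific SDK calls.
    public class ResizeImageTemplate {

        @CloudService(type = "object-storage")
        private ObjectStorage storage;   // bound by the generated function implementation

        public String handle(String inputBucket, String inputKey, String outputBucket) {
            byte[] original = storage.get(inputBucket, inputKey);
            byte[] resized = resize(original);                    // provider-independent logic
            storage.put(outputBucket, inputKey + "-small", resized);
            return "stored " + inputKey + "-small in " + outputBucket;
        }

        private byte[] resize(byte[] image) {
            return image;   // placeholder for the actual image processing
        }
    }

A provider-specific function implementation generated by the faasifier would then bind ObjectStorage to S3 or to IBM storage and wrap handle() in the provider's handler signature.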
Tasks
  • Create an “annotation schema” for function templates and cloud service types
  • Develop a converter from a function template into a function implementation for various cloud providers (e.g. AWS, Google, IBM)
  • Develop a deployer for function implementations for the cloud providers
  • Develop function templates and multiple function implementations and build a function choreography (FC) for a real-life application
  • Evaluate the system with real-life applications and the optimal selection of the proper function implementation
Theoretical Skills Distributed Systems, Cloud Computing, Functions as a Service.
Practical Skills Java, Node.js, Cloud provider APIs.
Additional Information Tools / material that can help with this master thesis:

  1. S. Ristov, S. Pedratscher, J. Wallnöfer and T. Fahringer, “DAF: Dependency-Aware FaaSifier for Node.js Monolithic Applications,” in IEEE Software, doi: 10.1109/MS.2020.3018334.
  2. Middleware services to support workflow execution in a Multi-FaaS environment (student: Jakob Wallnöfer, supervisor: Sashko Ristov), https://github.com/qngapparat/js2faas

Title Experiments and data analysis for serverless computing
Number of students 1
Language English
Supervisor Sashko Ristov
Description The aim of this master thesis is to conduct a series of experiments to evaluate the properties and constraints of multiple regions of widely-known FaaS systems (e.g. AWS Lambda, IBM Cloud Functions, Google Cloud Functions, Alibaba Function Compute, etc.). Numerous function implementations of serverless applications represented as function choreographies (FCs) will be tested with various configurations (concurrency, assigned memory, latency, region, programming language, etc.). The times for the functions and FCs are measured and then evaluated. The measured times include: the time until a function request is submitted and the function is started, the time for the execution of the function and of the whole FC (with measurement of memory and CPU consumption), the time to receive the response from the FaaS system, and more. A large number of experiments will be run, and the measured data must be stored in a database and then statistically evaluated and visualized. A special feature is the consideration of highly scalable FCs, which run e.g. tens of thousands of functions. The aim of this work is a better understanding of serverless computing for different FCs and FaaS systems. The trade-off between performance and costs will be examined more closely. The applications are built with the existing AFCL language developed by the DPS group and run with the existing enactment engine.
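As an illustration of what a common data model for such measurements could look like, a minimal sketch in Java (field and class names are hypothetical, not a prescribed schema):

    import java.time.Instant;

    // Hypothetical record for one function invocation, unifying measurements from
    // different FaaS providers; a database table with these columns could back it.
    public record FunctionMeasurement(
            String workflowId,        // ID of the function choreography (FC) run
            String functionName,
            String provider,          // e.g. "AWS Lambda", "IBM Cloud Functions"
            String region,
            int assignedMemoryMB,
            Instant requestSubmitted, // when the enactment engine sent the request
            Instant executionStart,   // when the function actually started
            Instant executionEnd,
            Instant responseReceived, // when the engine received the response
            long maxMemoryUsedMB,
            double costUSD) {

        // Derived metrics used in the evaluation.
        public long startupDelayMs() {
            return executionStart.toEpochMilli() - requestSubmitted.toEpochMilli();
        }

        public long roundTripTimeMs() {
            return responseReceived.toEpochMilli() - requestSubmitted.toEpochMilli();
        }
    }

Such a record could be persisted in a relational table or a document store; the derived metrics keep the later statistical evaluation independent of provider-specific log formats.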
Tasks
  • Design a common data model for measurement data from various FaaS systems
  • Develop scripts for parsing logs from FaaS systems and saving measurement data in a database
  • Execution of experiments on multiple widely-known FaaS systems
  • Evaluation and visualization of the measured data
  • Develop interfaces for some existing tracing system (e.g. https://workflowhub.org/ or https://wta.atlarge-research.com/)
Theoretical Skills Distributed Systems, Cloud Computing, Functions as a Service.
Practical Skills Java, Cloud provider APIs.
Additional Information The following material / tools are useful for this thesis:

  1. S. Ristov, S. Pedratscher, T. Fahringer, “AFCL: An Abstract Function Choreography Language for serverless workflow specification,” Future Generation Computer Systems, Volume 114, 2021, Pages 368-382, ISSN 0167-739X, https://doi.org/10.1016/j.future.2020.08.012.
  2. Enactment engine to run serverless workflows https://github.com/sashkoristov/enactmentengine
  3. A multi-FaaS toolkit to facilitate development of portable applications https://github.com/sashkoristov/jFaaS

Title Experimente und Datenanalyse für Clouds
Number of students 1
Language German
Supervisor Thomas Fahringer
Description The aim of this thesis is to conduct a series of experiments to evaluate the properties and capabilities of cloud infrastructures (e.g. Amazon EC2). Numerous virtual machine instances (VMs) will be tested with smaller programs. The times for the VMs and the programs are measured and then evaluated. The measured times include: the time until a VM is allocated and started, the time for the execution of the programs (with measurement of memory and CPU consumption), the time to release the VM again, and more. A large number of experiments will be started (via a script program), and VMs and programs must be instrumented beforehand. The measured data must be stored in a database and then statistically evaluated and visualized. A special feature is the consideration of spot instances, which are particularly cheap but can be withdrawn by the cloud provider at any time. To obtain such spot VMs, a so-called bidding procedure must be implemented. The aim of this work is a better understanding of cloud resources for different programs. The trade-off between performance and costs will be examined more closely.
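A minimal sketch of the kind of measurement script this thesis would develop (the CloudClient interface, its methods, the spot bid value, and the database schema are hypothetical placeholders for a real provider SDK and a concrete schema):

    import java.sql.Connection;
    import java.sql.PreparedStatement;

    // Hypothetical provider-independent client; in practice this would wrap a real
    // SDK (e.g. the AWS SDK for EC2 and spot instance requests).
    interface CloudClient {
        String requestVm(String instanceType, double spotBidUsdPerHour); // returns VM id
        void runProgram(String vmId, String command);                    // blocks until finished
        void releaseVm(String vmId);
    }

    public class VmTimingExperiment {

        // Runs one experiment and stores the measured times in a database table
        // (assumed schema: measurements(instance_type, startup_ms, run_ms, release_ms)).
        static void runExperiment(CloudClient cloud, Connection db,
                                  String instanceType, String command) throws Exception {
            long t0 = System.nanoTime();
            String vmId = cloud.requestVm(instanceType, 0.02);   // placeholder spot bid
            long t1 = System.nanoTime();
            cloud.runProgram(vmId, command);
            long t2 = System.nanoTime();
            cloud.releaseVm(vmId);
            long t3 = System.nanoTime();

            try (PreparedStatement stmt = db.prepareStatement(
                    "INSERT INTO measurements (instance_type, startup_ms, run_ms, release_ms) VALUES (?, ?, ?, ?)")) {
                stmt.setString(1, instanceType);
                stmt.setLong(2, (t1 - t0) / 1_000_000);
                stmt.setLong(3, (t2 - t1) / 1_000_000);
                stmt.setLong(4, (t3 - t2) / 1_000_000);
                stmt.executeUpdate();
            }
        }
    }

The actual thesis would replace the hypothetical CloudClient with calls to a real provider SDK and add the bidding logic for spot instances.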
Tasks
  • Script program to instrument VMs and programs
  • Script program to read measurement data and store it in a database
  • Implementation of a bidding procedure for cloud spot instances
  • Execution of experiments on a real cloud infrastructure
  • Evaluation and visualization of the measured data
Theoretical Skills Basic knowledge of statistics
Practical Skills Scripting language, databases, data visualization
Additional Information