Available Theses

The following theses are currently available; detailed descriptions follow below.

• Serverless Architectures for Scalable Stream Processing (1 or 2 students; Thomas Fahringer, Abolfazl Younesi)
• Streaming Anomaly Detection and Fault Tolerance in Stream Processing (1 or 2 students; Thomas Fahringer, Abolfazl Younesi)
• Optimizing Data Partitioning and Parallelism for Scalable Stream Processing (1 or 2 students; Thomas Fahringer, Abolfazl Younesi)
• Resource-Aware Scaling and Auto-Tuning in Distributed Stream Processing Systems (1 or 2 students; Thomas Fahringer, Abolfazl Younesi)
• Resilient State Management and Fault Recovery in Stream Processing (1 or 2 students; Thomas Fahringer, Abolfazl Younesi)
• Self-Similarity-Aware Task Partitioning for Multi-DAG Systems (1 student; Thomas Fahringer, Abolfazl Younesi)
• Optimizing Multi-Objective Distributed Workflow Scheduling (1 or 2 students; Thomas Fahringer, Abolfazl Younesi)
• Extension of a novel programming language for the Cloud-Edge-IoT continuum (1 or 2 students; Juan Aznar, Marlon Etheredge)
• Distributing High-Impact Scientific Workflows with Apollo (1 student; Juan Aznar)
• Event-based Invocation of Workflow Applications on the Edge (1 student; Juan Aznar)
• Detecting critical events on smart buildings using edge-cloud resources (1 student; Juan Aznar)
• Python Frontend for Serverless Workflows (1 student; Juan Aznar)
• Additional List of Bachelor Theses offered by Peter Thoman (Peter Thoman)

Title Serverless Architectures for Scalable Stream Processing
Number of students 1 – 2 (preferred)
Language English
Supervisors Thomas Fahringer, Abolfazl Younesi
Focus Investigating how serverless computing models (e.g., AWS Lambda, Azure Functions) can be leveraged to build scalable, cost-efficient, and event-driven stream processing solutions. Emphasis on elasticity, resource management, and fault tolerance in serverless environments.
Description This project explores the intersection of serverless computing and stream processing frameworks to handle large-scale data streams with minimal operational overhead. Students will experiment with various platforms and orchestration strategies to ensure efficient scaling, low latency, and robust fault tolerance without relying on dedicated servers.
Tasks Serverless Integration: Evaluate and integrate stream processing frameworks (e.g., Apache Flink, Kafka Streams) with serverless platforms (a minimal handler sketch is given at the end of this entry).
Elastic Scaling: Design and test auto-scaling policies that adapt to fluctuating workloads in real time.
Cost Optimization: Investigate cost-performance trade-offs in serverless deployments for continuous data processing.
Fault Tolerance and State Management: Explore strategies for stateful stream processing in stateless serverless functions.
Theoretical skills Distributed Systems Concepts, Cloud Computing and Serverless Paradigms, Performance Modeling and Cost Analysis
Practical skills Experience with AWS Lambda, Azure Functions, or Google Cloud Functions, Familiarity with Apache Flink/Kafka Streams, Scripting and DevOps (CI/CD, Infrastructure as Code)
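As a concrete starting point for the serverless-integration task above, the following is a minimal sketch of a Lambda-style stream handler, assuming an AWS Lambda function triggered by a Kinesis stream; the payload schema and the flagging threshold are illustrative assumptions, not part of the project specification.

```python
import base64
import json

def handler(event, context):
    """Decode Kinesis records delivered by the Lambda trigger and apply a
    per-record transformation."""
    results = []
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded inside the event envelope.
        payload = base64.b64decode(record["kinesis"]["data"])
        measurement = json.loads(payload)
        # Placeholder processing step: tag records above an assumed threshold.
        measurement["flagged"] = measurement.get("value", 0) > 100
        results.append(measurement)
    return {"processed": len(results)}
```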

Title Streaming Anomaly Detection and Fault Tolerance in Stream Processing
Number of students 1 – 2 (preferred)
Language English
Supervisors Thomas Fahringer, Abolfazl Younesi
Focus Developing robust methods for detecting anomalies in real time while ensuring fault tolerance within distributed stream processing systems. Integration of advanced anomaly detection algorithms with self-healing and recovery strategies to maintain system integrity and performance amid data irregularities and system failures.
Description This project investigates the challenges of processing continuous data streams where unexpected anomalies or failures may occur. Students will design and implement algorithms that not only detect unusual patterns or outliers in streaming data but also trigger corrective actions (e.g., checkpointing, reconfiguration, or alerting) to ensure overall system reliability.
Tasks Anomaly Detection Algorithms: Develop and evaluate both statistical and machine learning methods tailored for streaming data (a minimal statistical baseline is sketched at the end of this entry).
Fault Tolerance Mechanisms: Design self-healing and recovery strategies to maintain consistency during system failures.
Integration of Detection and Recovery: Create mechanisms that trigger automated fault-tolerance procedures upon anomaly detection.
Benchmarking and Evaluation: Set up real-world scenarios and synthetic benchmarks to assess performance, latency, and reliability under various fault conditions.
Theoretical skills Statistical Analysis, Time-Series Modeling, Machine Learning for Streaming Data, Distributed Systems and Fault Tolerance Theories
Practical skills Programming in Python/Java/Scala, Experience with Streaming Frameworks (e.g., Apache Flink, Spark Streaming), Familiarity with Containerization (Docker, Kubernetes)
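As a baseline for the anomaly-detection task, here is a minimal sketch of a purely statistical detector using Welford's online mean/variance update; the threshold k and the warm-up length are illustrative assumptions.

```python
import math

class StreamingZScoreDetector:
    """Flags points whose z-score against the running distribution exceeds k."""

    def __init__(self, k=3.0):
        self.k = k
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford)

    def update(self, x):
        # Welford's online update keeps mean/variance in O(1) memory.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def is_anomaly(self, x):
        if self.n < 30:  # warm-up: too few samples to judge reliably
            self.update(x)
            return False
        std = math.sqrt(self.m2 / (self.n - 1))
        anomalous = std > 0 and abs(x - self.mean) > self.k * std
        self.update(x)
        return anomalous
```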

Title Optimizing Data Partitioning and Parallelism for Scalable Stream Processing
Number of students 1 – 2 (preferred)
Language English
Supervisors Thomas Fahringer, Abolfazl Younesi
Focus Investigate and develop novel data partitioning strategies and parallel processing techniques to enhance scalability and performance in distributed stream processing systems.
Description This project centers on improving how streaming data is partitioned and parallelized across distributed nodes to maximize throughput and minimize latency. Students will analyze current partitioning methods, identify bottlenecks, and propose advanced algorithms that adapt dynamically to workload variations and data skew.
Tasks Algorithm Design: Develop adaptive partitioning and parallelism algorithms tailored for real-time streaming environments (a skew-mitigating baseline is sketched at the end of this entry).
System Integration: Implement the proposed algorithms within existing frameworks (e.g., Apache Flink, Spark Streaming).
Performance Evaluation: Benchmark the new strategies against traditional partitioning techniques under various load conditions.
Case Studies: Evaluate effectiveness with both synthetic and real-world streaming datasets.
Theoretical skills Distributed Algorithms, Parallel Computing, Load Balancing, and Data Partitioning Theory, Performance Modeling
Practical skills Programming in Java/Scala/Python, Experience with Distributed Stream Processing Frameworks, Data Analysis and Benchmarking
Additional Info This project can generate multiple research outputs, including innovative partitioning algorithms, comparative studies on parallelism strategies, and comprehensive performance benchmarks.
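One possible baseline against which adaptive strategies could be compared is a "power of two choices" partitioner, which mitigates key skew by routing each key to the less-loaded of two candidate workers (downstream aggregation must then merge partial results per key). The sketch below is a simplification under these assumptions, not a framework integration.

```python
import hashlib

def _h(key, seed):
    """Deterministic hash of a key under a given seed."""
    return int(hashlib.md5(f"{seed}:{key}".encode()).hexdigest(), 16)

class TwoChoicesPartitioner:
    """Skew-mitigating partitioner: each key has two candidate workers and
    is sent to whichever is currently less loaded."""

    def __init__(self, num_workers):
        self.load = [0] * num_workers  # records routed to each worker

    def assign(self, key):
        a = _h(key, 1) % len(self.load)
        b = _h(key, 2) % len(self.load)
        target = a if self.load[a] <= self.load[b] else b
        self.load[target] += 1
        return target
```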

Title Resource-Aware Scaling and Auto-Tuning in Distributed Stream Processing Systems
Number of students 1 – 2 (preferred)
Language English
Supervisors Thomas Fahringer, Abolfazl Younesi
Focus Develop dynamic, resource-aware scaling and auto-tuning mechanisms for distributed stream processing systems to optimize resource utilization, reduce costs, and maintain low latency under varying workload conditions.
Description The project aims to create systems that automatically adjust resource allocation and system parameters based on real-time workload monitoring. Students will design models and algorithms that predict workload patterns, manage resource provisioning in cloud or hybrid environments, and auto-tune configurations for optimal performance.
Tasks Workload Prediction: Implement machine learning models or statistical methods to forecast incoming data rates and resource demands.
Dynamic Resource Management: Develop strategies for auto-scaling compute and memory resources based on predictions (a simple reactive rule is sketched at the end of this entry).
Auto-Tuning Mechanisms: Create algorithms that continuously adjust system parameters (e.g., buffer sizes, parallelism levels) to optimize throughput and latency.
Evaluation and Benchmarking: Test the proposed solutions under various real-time scenarios and compare with static configurations.
Theoretical skills Cloud and Distributed Systems Concepts, Predictive Modeling, Machine Learning, Optimization Theory, and Control Systems
Practical skills Familiarity with Cloud Platforms (AWS, Azure, etc.), Experience with Apache Flink/Spark Streaming, Scripting and Automation (DevOps tools, CI/CD pipelines)
Additional Info Research outputs may include novel auto-scaling algorithms, case studies on resource optimization, and guidelines for building resource-aware distributed stream processing architectures.
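As a point of reference for the dynamic-resource-management task, a purely reactive scaling rule (the kind a predictive approach would aim to beat) can be sketched in a few lines; the headroom factor and capacity figures are illustrative assumptions.

```python
import math

def desired_parallelism(observed_rate, per_task_capacity, max_parallelism, headroom=1.2):
    """Provision enough parallel instances to absorb the observed input
    rate plus a safety headroom, clamped to the allowed range."""
    target = math.ceil(observed_rate * headroom / per_task_capacity)
    return max(1, min(max_parallelism, target))

# Example: 45,000 events/s observed, 4,000 events/s per task, cap of 32
# -> ceil(45000 * 1.2 / 4000) = 14 parallel instances.
print(desired_parallelism(45_000, 4_000, 32))
```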

Title Resilient State Management and Fault Recovery in Stream Processing
Number of students 1 – 2 (preferred)
Language English
Supervisors Thomas Fahringer, Abolfazl Younesi
Focus Designing robust state management strategies and fault recovery mechanisms that ensure data consistency and minimal processing disruption in the event of system failures. Emphasis on efficient checkpointing, state replication, and dynamic recovery techniques.
Description This project investigates the critical role of state management in fault-tolerant stream processing systems. Students will research and implement novel approaches for maintaining and recovering system state, such as incremental checkpointing and distributed state replication, to address challenges posed by network failures, node crashes, and data inconsistencies.
Tasks Efficient Checkpointing: Develop lightweight and incremental checkpointing mechanisms for real-time state capture (a delta-based sketch is given at the end of this entry).
State Replication and Consistency: Design strategies for distributed state replication ensuring strong or eventual consistency.
Dynamic Recovery Techniques: Implement adaptive fault recovery methods that minimize downtime and data loss.
Experimental Evaluation: Benchmark the proposed solutions against existing state management frameworks using synthetic and real-world datasets.
Theoretical skills Distributed Systems, Consistency Models, Fault Tolerance Theories, and State Management Algorithms
Practical skills Proficiency in Java/Scala/Python, Experience with stream processing frameworks (e.g., Apache Flink, Spark Streaming), Familiarity with distributed storage and replication protocols
Additional Info Potential research outputs include novel checkpointing algorithms, enhanced state replication protocols, and comprehensive performance evaluations under various failure scenarios.
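To make the incremental-checkpointing task concrete, here is a minimal single-node sketch that persists only the keys modified since the last checkpoint, so checkpoint cost scales with the update rate rather than total state size. The JSON-file backend and naming scheme are illustrative assumptions; a real system would add atomic writes and delta compaction.

```python
import json
import os

class IncrementalCheckpointer:
    """Tracks dirty keys between checkpoints and persists only the delta."""

    def __init__(self, directory):
        self.directory = directory
        self.state = {}
        self.dirty = set()
        self.epoch = 0
        os.makedirs(directory, exist_ok=True)

    def put(self, key, value):
        self.state[key] = value
        self.dirty.add(key)

    def checkpoint(self):
        # Persist only the keys modified since the previous checkpoint.
        delta = {k: self.state[k] for k in self.dirty}
        path = os.path.join(self.directory, f"delta-{self.epoch:06d}.json")
        with open(path, "w") as f:
            json.dump(delta, f)
        self.dirty.clear()
        self.epoch += 1
        return path

    def recover(self):
        # Replay deltas in epoch order to rebuild the latest state.
        self.state = {}
        for name in sorted(os.listdir(self.directory)):
            with open(os.path.join(self.directory, name)) as f:
                self.state.update(json.load(f))
```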

Title Self-Similarity-Aware Task Partitioning for Multi-DAG Systems
Number of students 1
Language English
Supervisors Thomas Fahringer, Abolfazl Younesi
Focus Develop a scheduling and partitioning algorithm that leverages self-similarity in Directed Acyclic Graphs (DAGs) to optimize task grouping and reduce scheduling complexity.
Description This project investigates recurring patterns in DAG structures to identify self-similarity. By applying clustering and pattern recognition techniques, the system groups similar tasks across different DAGs, thus reducing execution time and simplifying dependency management. The framework integrates entropy-based partitioning with self-similarity detection to enhance overall scheduling efficiency.
Tasks • Identify self-similar patterns in DAG structures using hierarchical clustering or pattern recognition (a structural-hashing sketch is given at the end of this entry).
• Design a scheduling algorithm that groups similar tasks to reduce scheduling steps.
• Integrate the self-similarity-aware approach with entropy-based partitioning.
• Evaluate the impact on resource usage and task execution time using example DAGs and real-world benchmarks.
Theoretical skills Graph Theory, Clustering Algorithms, Entropy-Based Partitioning, Scheduling Algorithms
Practical skills Programming in Python/Java, Data Visualization, Experience with DAG-Based Systems, Simulation and Benchmarking
Additional Info The project may include diagrams and pseudocode to illustrate the algorithm, along with experimental evaluations that demonstrate the benefits of self-similarity-aware task grouping.
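One way to detect self-similar substructures, sketched below under simplifying assumptions, is Merkle-style structural hashing: each node's signature combines its operator label with its children's signatures, so isomorphic subgraphs collide into the same group. The "label#instance" node-naming convention is hypothetical.

```python
import hashlib
from collections import defaultdict

def similarity_groups(dag):
    """dag: {node: [children]}; node names follow a hypothetical
    'label#instance' convention, where the label identifies the operator.
    Returns groups of nodes whose downstream substructure is identical."""
    sig = {}

    def visit(node):
        if node not in sig:
            # A node's signature hashes its label plus its children's signatures.
            child_sigs = sorted(visit(c) for c in dag.get(node, []))
            raw = node.split("#")[0] + "|" + ",".join(child_sigs)
            sig[node] = hashlib.sha1(raw.encode()).hexdigest()[:12]
        return sig[node]

    groups = defaultdict(list)
    for node in dag:
        groups[visit(node)].append(node)
    return groups

# Two structurally identical load->filter chains collapse into matching groups:
dag = {"load#1": ["filter#1"], "filter#1": [], "load#2": ["filter#2"], "filter#2": []}
print(similarity_groups(dag))  # load#1/load#2 share a signature, as do the filters
```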

Title Optimizing Multi-Objective Distributed Workflow Scheduling
Number of students 1 – 2 (preferred)
Language English
Supervisors Thomas Fahringer, Abolfazl Younesi
Focus Propose a novel scheduling framework that simultaneously optimizes multiple objectives such as latency, cost, and energy consumption in hybrid cloud-edge environments.
Description This project will develop and evaluate advanced scheduling algorithms for distributed workflows. The goal is to balance competing objectives by utilizing both cloud and edge resources. The work will include modeling, algorithm design, and extensive simulation/experimentation.
Tasks • Develop a multi-objective optimization model for workflow scheduling (a weighted-sum scalarization sketch is given at the end of this entry).
• Design and implement a novel scheduling framework.
• Evaluate performance using simulation and real-world benchmarks.
• Analyze trade-offs between latency, cost, and energy consumption.
Theoretical skills Distributed Algorithms, Optimization Theory, Multi-Objective Optimization
Practical skills Programming in Java/Python, Cloud and Edge Computing Platforms, Simulation and Benchmarking Tools
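The simplest multi-objective model is weighted-sum scalarization, sketched below with assumed weights and normalization constants; Pareto-based methods (e.g., NSGA-II) are the natural refinement when a single weighting is too restrictive.

```python
def scalarized_cost(latency, cost, energy,
                    weights=(0.5, 0.3, 0.2), norms=(1.0, 1.0, 1.0)):
    """Normalize each objective to a comparable scale, then combine with
    user-chosen weights (both weights and norms are assumptions to tune)."""
    terms = (latency / norms[0], cost / norms[1], energy / norms[2])
    return sum(w * t for w, t in zip(weights, terms))

def best_placement(candidates):
    """candidates: iterable of (name, latency, cost, energy) tuples."""
    return min(candidates, key=lambda c: scalarized_cost(*c[1:]))

# Example: compare a hypothetical edge-heavy and cloud-heavy placement.
print(best_placement([("edge", 0.4, 0.9, 0.3), ("cloud", 0.7, 0.4, 0.6)]))
```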

Title Extension of a novel programming language for the Cloud-Edge-IoT continuum
Number of students 1 or 2, 2 preferred
Language English
Supervisors Juan Aznar, Marlon Etheredge
Description For a novel programming model for the Cloud-Edge-IoT continuum, we require an extension of our system, focusing on developer tools that ease the development of applications. In this project, the topics listed under Tasks are explored and researched.
Tasks
  • Visual programming languages, to provide a visual counterpart to an existing language.
  • Validation/verification of applications written in the language (a structural-check sketch is given at the end of this entry).
  • Live visualization and performance analysis of deployed applications.
  • Development of novel use cases/applications using the programming model and comparison against other well-known programming models.
Theoretical skills
  • Cloud Computing
  • Visual Programming Languages
  • Validation/Verification
Practical skills
  • Java Programming
  • General Software Development Skills
Additional information The scope of the project can encompass multiple topics. We prefer two students working on the same project.
Inspiration can be derived from Node-RED, Simulink, and Ballerina.
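As one illustration of the validation/verification topic, a structural check on an application's component graph might look like the following sketch; the dict-of-adjacency representation is an assumption, since the actual intermediate representation of the language is not specified here.

```python
def has_cycle(graph):
    """graph: {component: [downstream components]}. Returns True if the
    wiring contains a cycle, which would be invalid for a dataflow program."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def dfs(node):
        color[node] = GRAY                    # node is on the current DFS path
        for child in graph.get(node, []):
            state = color.get(child, WHITE)
            if state == GRAY:                 # back edge: cycle found
                return True
            if state == WHITE and dfs(child):
                return True
        color[node] = BLACK                   # fully explored
        return False

    return any(color.get(n, WHITE) == WHITE and dfs(n) for n in graph)

print(has_cycle({"sensor": ["filter"], "filter": ["sink"], "sink": []}))  # False
print(has_cycle({"a": ["b"], "b": ["a"]}))                                # True
```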

Title Distributing High-Impact Scientific Workflows with Apollo
Number of students 1
Language English
Supervisors Juan Aznar
Description In this thesis, you will execute two to three real scientific workflows (WFs) using the FaaS paradigm through the Apollo runtime system [1] and conduct research with real biological and experimental input datasets. For instance, the 1000genome WF [2] enables identifying genome mutations according to numerous population features for the later study of associated diseases. Another example is Cycles (CW) [3], one of the most environmentally friendly workflows: CW simulates agricultural experiments that enable scientists to evaluate the behavior of crops under different environmental conditions, protecting nature from unnecessary and damaging tests and promoting sustainable agriculture while saving vast amounts of time and resources.
[1] https://apollowf.github.io/learn.html
[2] https://github.com/wfcommons/pegasus-instances/tree/master/1000genome
[3] https://github.com/wfcommons/pegasus-instances/tree/master/cycles
Tasks
  • Port the tasks of different (two or three) scientific WFs onto an edge/cloud infrastructure and orchestrate them with Apollo in a distributed fashion.
  • Process and prepare experimental datasets to be used as WF input (a simple sharding sketch is given at the end of this entry).
  • Study the performance (e.g., time, cost, memory and energy consumption) and scalability of the executed WF for different hardware settings (e.g., edge, cloud, both) and data input sizes.
  • Optimize the performance of the WF execution and explain performance behavior.
Theoretical skills Cloud computing, FaaS, Serverless
Practical skills Java, Python (Biopython, Pandas, Numpy), git, GitHub
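For the dataset-preparation task, a simple way to shard one experimental input file into independent per-task inputs is sketched below with pandas; the CSV format and naming scheme are assumptions, since the real WF inputs (e.g., the 1000genome data) have their own formats.

```python
import pandas as pd

def split_for_parallel_tasks(csv_path, num_tasks, out_prefix="wf-input"):
    """Split one input file into up to num_tasks roughly equal CSV shards,
    one per parallel workflow task."""
    df = pd.read_csv(csv_path)
    rows_per_task = -(-len(df) // num_tasks)   # ceiling division
    paths = []
    for i in range(num_tasks):
        chunk = df.iloc[i * rows_per_task:(i + 1) * rows_per_task]
        if chunk.empty:
            break
        path = f"{out_prefix}-{i:03d}.csv"
        chunk.to_csv(path, index=False)
        paths.append(path)
    return paths
```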

Title Event-based Invocation of Workflow Applications on the Edge
Number of students 1
Language English
Supervisors Juan Aznar
Description In this thesis, you will execute realistic complex tasks and data processing as workflow applications [1] on an edge-cloud infrastructure. To this end, you should trigger the execution of workflows in Apollo [2] using event data in a common format [3] (i.e., with name, source, type, kind, correlation, dataOnly, and metadata fields), thus providing interoperability across services, platforms, and systems. There are numerous event frameworks; in this thesis, you should systematically compare them and then select one based on the following requirements:
(i) runs on IoT, edge and cloud,
(ii) can be configured for arbitrary events,
(iii) scales for large events,
(iv) builds on cloud events standard [3], and
(v) is open-source.
[1] https://github.com/serverlessworkflow/
[2] https://apollowf.github.io/learn.html
[3] https://github.com/cloudevents/spec
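For orientation, the CloudEvents standard [3] mandates the attributes id, source, specversion, and type; the sketch below builds such an envelope in Python. The event type, source URI, and payload are made-up examples, and how the event reaches Apollo is left open.

```python
import json
import uuid
from datetime import datetime, timezone

def make_cloudevent(event_type, source, data):
    """Build a CloudEvents 1.0 JSON envelope; id, source, specversion,
    and type are the required attributes."""
    return json.dumps({
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": data,
    })

# Hypothetical sensor event that could trigger a workflow invocation:
print(make_cloudevent("org.example.sensor.threshold", "/sensors/room-42", {"value": 87}))
```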
Tasks
  • Rigorously study different event frameworks or platforms.
  • Create one or more workflows whose tasks will be orchestrated by Apollo in a distributed fashion. Optionally, you can propose your own workflow application.
  • Integrate Apollo with your selected event infrastructure.
  • Define a set of important events and invoke the above-mentioned workflows.
  • Stress the system with different numbers of events and analyze the resulting performance.
Theoretical skills Cloud computing, Serverless, Docker
Practical skills Java, Python, git, GitHub, Raspberry Pi, Arduino, or any other IoT/Edge hardware

Title Detecting critical events on smart buildings using edge-cloud resources
Number of students 1
Language English
Supervisors Juan Aznar
Description The goal of this bachelor thesis is to develop fully operational edge devices used to recognize critical events in smart buildings (SB), such as fire, smoke, water leakages, inadequate social distance, or unmasked people. Edge devices should be implemented using commercial, low-cost devices (e.g., Raspberry Pi [1]), (thermal) cameras, and open-source machine learning (ML) libraries [2]. The smart building should react immediately and act accordingly when undesired events occur. To this end, edge devices will be orchestrated by the Apollo system [3] to exploit parallelism, scalability, and load balancing.
[1] https://www.raspberrypi.com/
[2] https://opencv.org/
[3] https://apollowf.github.io/learn.html
Tasks
  • Use Raspberry Pi as an edge device and execute serverless functions.
  • Detect and recognize different critical events using a camera and open-source ML libraries (a crude color-threshold baseline is sketched at the end of this entry).
  • Create one or more simple workflows (WF) whose tasks will be orchestrated by Apollo in a distributed fashion.
  • Deploy the designed WF to the edge cloud infrastructure mentioned above (Raspberry Pis and (thermal) cameras).
  • Study the performance and scalability of the proposed solution under different levels of stress on the SB (e.g., multiple fires, smoke and crowding at a time).
  • Extensively test with multiple data sets on various hardware settings.
Theoretical skills Cloud computing, Serverless, Machine Learning, Electronics
Practical skills Python, git, GitHub
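As a first, deliberately crude baseline for on-device detection, the sketch below flags frames with a high fraction of fire-like colors using OpenCV; the HSV bounds and the 5% threshold are rough assumptions, and the thesis would replace this heuristic with proper ML models.

```python
import cv2
import numpy as np

def fire_like_ratio(frame_bgr):
    """Fraction of pixels falling into a (roughly assumed) fire-like
    hue/saturation/value range."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array([0, 120, 180]), np.array([35, 255, 255]))
    return cv2.countNonZero(mask) / mask.size

cap = cv2.VideoCapture(0)              # Raspberry Pi camera or USB webcam
ok, frame = cap.read()
if ok and fire_like_ratio(frame) > 0.05:
    print("possible fire event -> trigger Apollo workflow")
cap.release()
```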

Title Python Frontend for Serverless Workflows
Number of students 1
Language English
Supervisors Juan Aznar
Description Apollo (https://apollowf.github.io/) is the DPS research orchestration and runtime system for edge-cloud infrastructures. We use AFCL (https://apollowf.github.io/learn.html) to describe serverless workflows for distributed applications. As part of this thesis, you will create a Python version of AFCL, so that application developers can build workflows in Python programs instead of writing AFCL directly. Furthermore, you will create a transformation system that automatically converts these Python programs into AFCL, which is the input to Apollo.
Tasks
  • Create a Python specification that fully represents the AFCL language constructs, so that every AFCL program also has a Python representation (a hypothetical fluent-API sketch is given at the end of this entry).
  • There are multiple solution paths to this problem, for instance, building a parser or a transformation system that converts the Python representation into AFCL. Other solutions may be possible as well.
  • Your solution should be modular and easy to extend in case of any changes to AFCL.
  • Convert at least 3 AFCL use cases into the Python representation.
Practical skills Advanced Python programmer, git and GitHub, JSON or YAML
Additional information It is not mandatory but of great help if you have passed the lecture and PS on Verteilte Systeme (Distributed Systems) in the computer science bachelor program. This Bachelor Thesis will be supervised by Juan Aznar (IFI/DPS). The student will have the opportunity to work with a state-of-the-art Apollo edge-cloud infrastructure. The developed Python frontend will be reused for international projects and published as open source. Collaborative work in an international project is possible if the student is interested. In the best case, this work can also be published, and the student can travel to a conference to present the work.
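To illustrate the intended developer experience, here is a hypothetical sketch of what such a Python frontend could look like; the fluent API, the field names, and the emitted YAML structure are invented for illustration and would have to be mapped onto the real AFCL schema (PyYAML is assumed for serialization).

```python
import yaml  # PyYAML, assumed as the serialization backend

class Workflow:
    """Hypothetical fluent frontend; the emitted structure is illustrative
    and does not claim to match the real AFCL schema."""

    def __init__(self, name):
        self.name = name
        self.functions = []

    def function(self, name, resource, data_ins=None, data_outs=None):
        self.functions.append({
            "name": name,
            "resource": resource,
            "dataIns": data_ins or [],
            "dataOuts": data_outs or [],
        })
        return self  # enable chaining

    def to_yaml(self):
        return yaml.safe_dump(
            {"name": self.name, "workflowBody": self.functions},
            sort_keys=False,
        )

wf = (Workflow("genome-demo")
      .function("split", "func-split", data_outs=["chunks"])
      .function("analyze", "func-analyze", data_ins=["chunks"]))
print(wf.to_yaml())
```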