Available Theses

The following theses are currently available; detailed descriptions follow below.

• Serverless Architectures for Scalable Stream Processing (1 or 2 students; Thomas Fahringer, Abolfazl Younesi)
• Streaming Anomaly Detection and Fault Tolerance in Stream Processing (1 or 2 students; Thomas Fahringer, Abolfazl Younesi)
• Optimizing Data Partitioning and Parallelism for Scalable Stream Processing (1 or 2 students; Thomas Fahringer, Abolfazl Younesi)
• Resource-Aware Scaling and Auto-Tuning in Distributed Stream Processing Systems (1 or 2 students; Thomas Fahringer, Abolfazl Younesi)
• Resilient State Management and Fault Recovery in Stream Processing (1 or 2 students; Thomas Fahringer, Abolfazl Younesi)
• Self-Similarity-Aware Task Partitioning for Multi-DAG Systems (1 student; Thomas Fahringer, Abolfazl Younesi)
• Optimizing Multi-Objective Distributed Workflow Scheduling (1 or 2 students; Thomas Fahringer, Abolfazl Younesi)
• Extension of a novel programming language for the Cloud-Edge-IoT continuum (1 or 2 students; Juan Aznar, Marlon Etheredge)
• Distributing High-Impact Scientific Workflows with Apollo (1 student; Juan Aznar)
• Event-based Invocation of Workflow Applications on the Edge (1 student; Juan Aznar)
• Detecting critical events on smart buildings using edge-cloud resources (1 student; Juan Aznar)
• Python Frontend for Serverless Workflows (1 student; Juan Aznar)
• Additional List of Bachelor Theses offered by Peter Thoman (Peter Thoman)

Title Serverless Architectures for Scalable Stream Processing
Number of students 1 – 2 (preferred)
Language English
Supervisors Thomas Fahringer, Abolfazl Younesi
Focus Investigating how serverless computing models (e.g., AWS Lambda, Azure Functions) can be leveraged to build scalable, cost-efficient, and event-driven stream processing solutions. Emphasis on elasticity, resource management, and fault tolerance in serverless environments.
Description This project explores the intersection of serverless computing and stream processing frameworks to handle large-scale data streams with minimal operational overhead. Students will experiment with various platforms and orchestration strategies to ensure efficient scaling, low latency, and robust fault tolerance without relying on dedicated servers.
Tasks Serverless Integration: Evaluate and integrate stream processing frameworks (e.g., Apache Flink, Kafka Streams) with serverless platforms (a minimal handler sketch is given at the end of this entry).
Elastic Scaling: Design and test auto-scaling policies that adapt to fluctuating workloads in real time.
Cost Optimization: Investigate cost-performance trade-offs in serverless deployments for continuous data processing.
Fault Tolerance and State Management: Explore strategies for stateful stream processing in stateless serverless functions.
Theoretical skills Distributed Systems Concepts, Cloud Computing and Serverless Paradigms, Performance Modeling and Cost Analysis
Practical skills Experience with AWS Lambda, Azure Functions, or Google Cloud Functions, Familiarity with Apache Flink/Kafka Streams, Scripting and DevOps (CI/CD, Infrastructure as Code)
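As a concrete starting point for the serverless-integration task above, the following is a minimal sketch of a Lambda-style stream handler, assuming an AWS Lambda function triggered by a Kinesis stream; the payload schema and the flagging threshold are illustrative assumptions, not part of the project specification.

```python
import base64
import json

def handler(event, context):
    """Decode Kinesis records delivered by the Lambda trigger and apply a
    per-record transformation."""
    results = []
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded inside the event envelope.
        payload = base64.b64decode(record["kinesis"]["data"])
        measurement = json.loads(payload)
        # Placeholder processing step: tag records above an assumed threshold.
        measurement["flagged"] = measurement.get("value", 0) > 100
        results.append(measurement)
    return {"processed": len(results)}
```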

Title Streaming Anomaly Detection and Fault Tolerance in Stream Processing
Number of students 1 – 2 (preferred)
Language English
Supervisors Thomas Fahringer, Abolfazl Younesi
Focus Developing robust methods for detecting anomalies in real time while ensuring fault tolerance within distributed stream processing systems. Integration of advanced anomaly detection algorithms with self-healing and recovery strategies to maintain system integrity and performance amid data irregularities and system failures.
Description This project investigates the challenges of processing continuous data streams where unexpected anomalies or failures may occur. Students will design and implement algorithms that not only detect unusual patterns or outliers in streaming data but also trigger corrective actions (e.g., checkpointing, reconfiguration, or alerting) to ensure overall system reliability.
Tasks Anomaly Detection Algorithms: Develop and evaluate both statistical and machine learning methods tailored for streaming data (a minimal statistical baseline is sketched at the end of this entry).
Fault Tolerance Mechanisms: Design self-healing and recovery strategies to maintain consistency during system failures.
Integration of Detection and Recovery: Create mechanisms that trigger automated fault-tolerance procedures upon anomaly detection.
Benchmarking and Evaluation: Set up real-world scenarios and synthetic benchmarks to assess performance, latency, and reliability under various fault conditions.
Theoretical skills Statistical Analysis, Time-Series Modeling, Machine Learning for Streaming Data, Distributed Systems and Fault Tolerance Theories
Practical skills Programming in Python/Java/Scala, Experience with Streaming Frameworks (e.g., Apache Flink, Spark Streaming), Familiarity with Containerization (Docker, Kubernetes)
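As a baseline for the anomaly-detection task, here is a minimal sketch of a purely statistical detector using Welford's online mean/variance update; the threshold k and the warm-up length are illustrative assumptions.

```python
import math

class StreamingZScoreDetector:
    """Flags points whose z-score against the running distribution exceeds k."""

    def __init__(self, k=3.0):
        self.k = k
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford)

    def update(self, x):
        # Welford's online update keeps mean/variance in O(1) memory.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def is_anomaly(self, x):
        if self.n < 30:  # warm-up: too few samples to judge reliably
            self.update(x)
            return False
        std = math.sqrt(self.m2 / (self.n - 1))
        anomalous = std > 0 and abs(x - self.mean) > self.k * std
        self.update(x)
        return anomalous
```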

Title Optimizing Data Partitioning and Parallelism for Scalable Stream Processing
Number of students 1 – 2 (preferred)
Language English
Supervisors Thomas Fahringer, Abolfazl Younesi
Focus Investigate and develop novel data partitioning strategies and parallel processing techniques to enhance scalability and performance in distributed stream processing systems.
Description This project centers on improving how streaming data is partitioned and parallelized across distributed nodes to maximize throughput and minimize latency. Students will analyze current partitioning methods, identify bottlenecks, and propose advanced algorithms that adapt dynamically to workload variations and data skew.
Tasks Algorithm Design: Develop adaptive partitioning and parallelism algorithms tailored for real-time streaming environments (a skew-mitigating baseline is sketched at the end of this entry).
System Integration: Implement the proposed algorithms within existing frameworks (e.g., Apache Flink, Spark Streaming).
Performance Evaluation: Benchmark the new strategies against traditional partitioning techniques under various load conditions.
Case Studies: Evaluate effectiveness with both synthetic and real-world streaming datasets.
Theoretical skills Distributed Algorithms, Parallel Computing, Load Balancing, and Data Partitioning Theory, Performance Modeling
Practical skills Programming in Java/Scala/Python, Experience with Distributed Stream Processing Frameworks, Data Analysis and Benchmarking
Additional Info This project can generate multiple research outputs, including innovative partitioning algorithms, comparative studies on parallelism strategies, and comprehensive performance benchmarks.
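One possible baseline against which adaptive strategies could be compared is a "power of two choices" partitioner, which mitigates key skew by routing each key to the less-loaded of two candidate workers (downstream aggregation must then merge partial results per key). The sketch below is a simplification under these assumptions, not a framework integration.

```python
import hashlib

def _h(key, seed):
    """Deterministic hash of a key under a given seed."""
    return int(hashlib.md5(f"{seed}:{key}".encode()).hexdigest(), 16)

class TwoChoicesPartitioner:
    """Skew-mitigating partitioner: each key has two candidate workers and
    is sent to whichever is currently less loaded."""

    def __init__(self, num_workers):
        self.load = [0] * num_workers  # records routed to each worker

    def assign(self, key):
        a = _h(key, 1) % len(self.load)
        b = _h(key, 2) % len(self.load)
        target = a if self.load[a] <= self.load[b] else b
        self.load[target] += 1
        return target
```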

Title Resource-Aware Scaling and Auto-Tuning in Distributed Stream Processing Systems
Number of students 1 – 2 (preferred)
Language English
Supervisors Thomas Fahringer, Abolfazl Younesi
Focus Develop dynamic, resource-aware scaling and auto-tuning mechanisms for distributed stream processing systems to optimize resource utilization, reduce costs, and maintain low latency under varying workload conditions.
Description The project aims to create systems that automatically adjust resource allocation and system parameters based on real-time workload monitoring. Students will design models and algorithms that predict workload patterns, manage resource provisioning in cloud or hybrid environments, and auto-tune configurations for optimal performance.
Tasks Workload Prediction: Implement machine learning models or statistical methods to forecast incoming data rates and resource demands.
Dynamic Resource Management: Develop strategies for auto-scaling compute and memory resources based on predictions (a simple reactive rule is sketched at the end of this entry).
Auto-Tuning Mechanisms: Create algorithms that continuously adjust system parameters (e.g., buffer sizes, parallelism levels) to optimize throughput and latency.
Evaluation and Benchmarking: Test the proposed solutions under various real-time scenarios and compare with static configurations.
Theoretical skills Cloud and Distributed Systems Concepts, Predictive Modeling, Machine Learning, Optimization Theory, and Control Systems
Practical skills Familiarity with Cloud Platforms (AWS, Azure, etc.), Experience with Apache Flink/Spark Streaming, Scripting and Automation (DevOps tools, CI/CD pipelines)
Additional Info Research outputs may include novel auto-scaling algorithms, case studies on resource optimization, and guidelines for building resource-aware distributed stream processing architectures.
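As a point of reference for the dynamic-resource-management task, a purely reactive scaling rule (the kind a predictive approach would aim to beat) can be sketched in a few lines; the headroom factor and capacity figures are illustrative assumptions.

```python
import math

def desired_parallelism(observed_rate, per_task_capacity, max_parallelism, headroom=1.2):
    """Provision enough parallel instances to absorb the observed input
    rate plus a safety headroom, clamped to the allowed range."""
    target = math.ceil(observed_rate * headroom / per_task_capacity)
    return max(1, min(max_parallelism, target))

# Example: 45,000 events/s observed, 4,000 events/s per task, cap of 32
# -> ceil(45000 * 1.2 / 4000) = 14 parallel instances.
print(desired_parallelism(45_000, 4_000, 32))
```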

Title Resilient State Management and Fault Recovery in Stream Processing
Number of students 1 – 2 (preferred)
Language English
Supervisors Thomas Fahringer, Abolfazl Younesi
Focus Designing robust state management strategies and fault recovery mechanisms that ensure data consistency and minimal processing disruption in the event of system failures. Emphasis on efficient checkpointing, state replication, and dynamic recovery techniques.
Description This project investigates the critical role of state management in fault-tolerant stream processing systems. Students will research and implement novel approaches for maintaining and recovering system state, such as incremental checkpointing and distributed state replication, to address challenges posed by network failures, node crashes, and data inconsistencies.
Tasks Efficient Checkpointing: Develop lightweight and incremental checkpointing mechanisms for real-time state capture (a delta-based sketch is given at the end of this entry).
State Replication and Consistency: Design strategies for distributed state replication ensuring strong or eventual consistency.
Dynamic Recovery Techniques: Implement adaptive fault recovery methods that minimize downtime and data loss.
Experimental Evaluation: Benchmark the proposed solutions against existing state management frameworks using synthetic and real-world datasets.
Theoretical skills Distributed Systems, Consistency Models, Fault Tolerance Theories, and State Management Algorithms
Practical skills Proficiency in Java/Scala/Python, Experience with stream processing frameworks (e.g., Apache Flink, Spark Streaming), Familiarity with distributed storage and replication protocols
Additional Info Potential research outputs include novel checkpointing algorithms, enhanced state replication protocols, and comprehensive performance evaluations under various failure scenarios.
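To make the incremental-checkpointing task concrete, here is a minimal single-node sketch that persists only the keys modified since the last checkpoint, so checkpoint cost scales with the update rate rather than total state size. The JSON-file backend and naming scheme are illustrative assumptions; a real system would add atomic writes and delta compaction.

```python
import json
import os

class IncrementalCheckpointer:
    """Tracks dirty keys between checkpoints and persists only the delta."""

    def __init__(self, directory):
        self.directory = directory
        self.state = {}
        self.dirty = set()
        self.epoch = 0
        os.makedirs(directory, exist_ok=True)

    def put(self, key, value):
        self.state[key] = value
        self.dirty.add(key)

    def checkpoint(self):
        # Persist only the keys modified since the previous checkpoint.
        delta = {k: self.state[k] for k in self.dirty}
        path = os.path.join(self.directory, f"delta-{self.epoch:06d}.json")
        with open(path, "w") as f:
            json.dump(delta, f)
        self.dirty.clear()
        self.epoch += 1
        return path

    def recover(self):
        # Replay deltas in epoch order to rebuild the latest state.
        self.state = {}
        for name in sorted(os.listdir(self.directory)):
            with open(os.path.join(self.directory, name)) as f:
                self.state.update(json.load(f))
```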

Title Self-Similarity-Aware Task Partitioning for Multi-DAG Systems
Number of students 1
Language English
Supervisors Thomas Fahringer, Abolfazl Younesi
Focus Develop a scheduling and partitioning algorithm that leverages self-similarity in Directed Acyclic Graphs (DAGs) to optimize task grouping and reduce scheduling complexity.
Description This project investigates recurring patterns in DAG structures to identify self-similarity. By applying clustering and pattern recognition techniques, the system groups similar tasks across different DAGs, thus reducing execution time and simplifying dependency management. The framework integrates entropy-based partitioning with self-similarity detection to enhance overall scheduling efficiency.
Tasks • Identify self-similar patterns in DAG structures using hierarchical clustering or pattern recognition (a structural-hashing sketch is given at the end of this entry).
• Design a scheduling algorithm that groups similar tasks to reduce scheduling steps.
• Integrate the self-similarity-aware approach with entropy-based partitioning.
• Evaluate the impact on resource usage and task execution time using example DAGs and real-world benchmarks.
Theoretical skills Graph Theory, Clustering Algorithms, Entropy-Based Partitioning, Scheduling Algorithms
Practical skills Programming in Python/Java, Data Visualization, Experience with DAG-Based Systems, Simulation and Benchmarking
Additional Info The project may include diagrams and pseudocode to illustrate the algorithm, along with experimental evaluations that demonstrate the benefits of self-similarity-aware task grouping.
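One way to detect self-similar substructures, sketched below under simplifying assumptions, is Merkle-style structural hashing: each node's signature combines its operator label with its children's signatures, so isomorphic subgraphs collide into the same group. The "label#instance" node-naming convention is hypothetical.

```python
import hashlib
from collections import defaultdict

def similarity_groups(dag):
    """dag: {node: [children]}; node names follow a hypothetical
    'label#instance' convention, where the label identifies the operator.
    Returns groups of nodes whose downstream substructure is identical."""
    sig = {}

    def visit(node):
        if node not in sig:
            # A node's signature hashes its label plus its children's signatures.
            child_sigs = sorted(visit(c) for c in dag.get(node, []))
            raw = node.split("#")[0] + "|" + ",".join(child_sigs)
            sig[node] = hashlib.sha1(raw.encode()).hexdigest()[:12]
        return sig[node]

    groups = defaultdict(list)
    for node in dag:
        groups[visit(node)].append(node)
    return groups

# Two structurally identical load->filter chains collapse into matching groups:
dag = {"load#1": ["filter#1"], "filter#1": [], "load#2": ["filter#2"], "filter#2": []}
print(similarity_groups(dag))  # load#1/load#2 share a signature, as do the filters
```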

Title Optimizing Multi-Objective Distributed Workflow Scheduling
Number of students 1 – 2 (preferred)
Language English
Supervisors Thomas Fahringer, Abolfazl Younesi
Focus Propose a novel scheduling framework that simultaneously optimizes multiple objectives such as latency, cost, and energy consumption in hybrid cloud-edge environments.
Description This project will develop and evaluate advanced scheduling algorithms for distributed workflows. The goal is to balance competing objectives by utilizing both cloud and edge resources. The work will include modeling, algorithm design, and extensive simulation/experimentation.
Tasks • Develop a multi-objective optimization model for workflow scheduling (a weighted-sum scalarization sketch is given at the end of this entry).
• Design and implement a novel scheduling framework.
• Evaluate performance using simulation and real-world benchmarks.
• Analyze trade-offs between latency, cost, and energy consumption.
Theoretical skills Distributed Algorithms, Optimization Theory, Multi-Objective Optimization
Practical skills Programming in Java/Python, Cloud and Edge Computing Platforms, Simulation and Benchmarking Tools
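The simplest multi-objective model is weighted-sum scalarization, sketched below with assumed weights and normalization constants; Pareto-based methods (e.g., NSGA-II) are the natural refinement when a single weighting is too restrictive.

```python
def scalarized_cost(latency, cost, energy,
                    weights=(0.5, 0.3, 0.2), norms=(1.0, 1.0, 1.0)):
    """Normalize each objective to a comparable scale, then combine with
    user-chosen weights (both weights and norms are assumptions to tune)."""
    terms = (latency / norms[0], cost / norms[1], energy / norms[2])
    return sum(w * t for w, t in zip(weights, terms))

def best_placement(candidates):
    """candidates: iterable of (name, latency, cost, energy) tuples."""
    return min(candidates, key=lambda c: scalarized_cost(*c[1:]))

# Example: compare a hypothetical edge-heavy and cloud-heavy placement.
print(best_placement([("edge", 0.4, 0.9, 0.3), ("cloud", 0.7, 0.4, 0.6)]))
```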

Title Extension of a novel programming language for the Cloud-Edge-IoT continuum
Number of students 1 or 2, 2 preferred
Language English
Supervisors Juan Aznar, Marlon Etheredge
Description For a novel programming model for the Cloud-Edge-IoT continuum, we require an extension of our system, focusing on developer tools that ease the development of applications. In this project, the topics listed under Tasks are explored and researched.
Tasks
  • Visual programming languages, to provide a visual counterpart to an existing language.
  • Validation/verification of applications written in the language (a structural-check sketch is given at the end of this entry).
  • Live visualization and performance analysis of deployed applications.
  • Development of novel use cases/applications using the programming model and comparison against other well-known programming models.
Theoretical skills
  • Cloud Computing
  • Visual Programming Languages
  • Validation/Verification
Practical skills
  • Java Programming
  • General Software Development Skills
Additional information The scope of the project can encompass multiple topics. We prefer two students working on the same project.
Inspiration can be derived from Node-RED, Simulink, and Ballerina.
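As one illustration of the validation/verification topic, a structural check on an application's component graph might look like the following sketch; the dict-of-adjacency representation is an assumption, since the actual intermediate representation of the language is not specified here.

```python
def has_cycle(graph):
    """graph: {component: [downstream components]}. Returns True if the
    wiring contains a cycle, which would be invalid for a dataflow program."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def dfs(node):
        color[node] = GRAY                    # node is on the current DFS path
        for child in graph.get(node, []):
            state = color.get(child, WHITE)
            if state == GRAY:                 # back edge: cycle found
                return True
            if state == WHITE and dfs(child):
                return True
        color[node] = BLACK                   # fully explored
        return False

    return any(color.get(n, WHITE) == WHITE and dfs(n) for n in graph)

print(has_cycle({"sensor": ["filter"], "filter": ["sink"], "sink": []}))  # False
print(has_cycle({"a": ["b"], "b": ["a"]}))                                # True
```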

Title Distributing High-Impact Scientific Workflows with Apollo
Number of students 1
Language English
Supervisors Juan Aznar
Description In this thesis, you will execute two to three real scientific workflows (WFs) using the FaaS paradigm through the Apollo runtime system [1] and conduct research with real biological and experimental input datasets. For instance, the 1000genome WF [2] enables identifying genome mutations according to numerous population features for the later study of associated diseases. Another example is Cycles (CW) [3], one of the most environmentally friendly workflows: CW simulates agricultural experiments that enable scientists to evaluate the behavior of crops under different environmental conditions, protecting nature from unnecessary and damaging tests and promoting sustainable agriculture while saving vast amounts of time and resources.
[1] https://apollowf.github.io/learn.html
[2] https://github.com/wfcommons/pegasus-instances/tree/master/1000genome
[3] https://github.com/wfcommons/pegasus-instances/tree/master/cycles
Tasks
  • Port the tasks of different (two or three) scientific WFs onto an edge/cloud infrastructure and orchestrate them with Apollo in a distributed fashion.
  • Process and prepare experimental datasets to be used as WF input (a simple sharding sketch is given at the end of this entry).
  • Study the performance (e.g., time, cost, memory and energy consumption) and scalability of the executed WF for different hardware settings (e.g., edge, cloud, both) and data input sizes.
  • Optimize the performance of the WF execution and explain performance behavior.
Theoretical skills Cloud computing, FaaS, Serverless
Practical skills Java, Python (Biopython, Pandas, Numpy), git, GitHub
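For the dataset-preparation task, a simple way to shard one experimental input file into independent per-task inputs is sketched below with pandas; the CSV format and naming scheme are assumptions, since the real WF inputs (e.g., the 1000genome data) have their own formats.

```python
import pandas as pd

def split_for_parallel_tasks(csv_path, num_tasks, out_prefix="wf-input"):
    """Split one input file into up to num_tasks roughly equal CSV shards,
    one per parallel workflow task."""
    df = pd.read_csv(csv_path)
    rows_per_task = -(-len(df) // num_tasks)   # ceiling division
    paths = []
    for i in range(num_tasks):
        chunk = df.iloc[i * rows_per_task:(i + 1) * rows_per_task]
        if chunk.empty:
            break
        path = f"{out_prefix}-{i:03d}.csv"
        chunk.to_csv(path, index=False)
        paths.append(path)
    return paths
```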

Title Event-based Invocation of Workflow Applications on the Edge
Number of students 1
Language English
Supervisors Juan Aznar
Description In this thesis, you will execute realistic complex tasks and data processing as workflow applications [1] on an edge-cloud infrastructure. To this end, you should trigger the execution of workflows in Apollo [2] using event data in a common format [3] (i.e., with name, source, type, kind, correlation, dataOnly, and metadata fields), thus providing interoperability across services, platforms, and systems. There are numerous event frameworks; in this thesis, you should systematically compare them and then select one based on the following requirements:
(i) runs on IoT, edge and cloud,
(ii) can be configured for arbitrary events,
(iii) scales for large events,
(iv) builds on cloud events standard [3], and
(v) is open-source.
[1] https://github.com/serverlessworkflow/
[2] https://apollowf.github.io/learn.html
[3] https://github.com/cloudevents/spec
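For orientation, the CloudEvents standard [3] mandates the attributes id, source, specversion, and type; the sketch below builds such an envelope in Python. The event type, source URI, and payload are made-up examples, and how the event reaches Apollo is left open.

```python
import json
import uuid
from datetime import datetime, timezone

def make_cloudevent(event_type, source, data):
    """Build a CloudEvents 1.0 JSON envelope; id, source, specversion,
    and type are the required attributes."""
    return json.dumps({
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": data,
    })

# Hypothetical sensor event that could trigger a workflow invocation:
print(make_cloudevent("org.example.sensor.threshold", "/sensors/room-42", {"value": 87}))
```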
Tasks
  • Rigorously study different event frameworks or platforms.
  • Create one or more workflows whose tasks will be orchestrated by Apollo in a distributed fashion. Optionally, you can propose your own workflow application.
  • Integrate Apollo with your selected event infrastructure.
  • Define a set of important events and invoke the above-mentioned workflows.
  • Stress the system with different numbers of events and analyze the resulting performance.
Theoretical skills Cloud computing, Serverless, Docker
Practical skills Java, Python, git, GitHub, Raspberry Pi, Arduino, or any other IoT/Edge hardware

Title Detecting critical events on smart buildings using edge-cloud resources
Number of students 1
Language English
Supervisors Juan Aznar
Description The goal of this bachelor thesis is to develop fully operational edge devices used to recognize critical events in smart buildings (SB), such as fire, smoke, water leakages, inadequate social distance, or unmasked people. Edge devices should be implemented using commercial, low-cost devices (e.g., Raspberry Pi [1]), (thermal) cameras, and open-source machine learning (ML) libraries [2]. The smart building should react immediately and act accordingly when undesired events occur. To this end, edge devices will be orchestrated by the Apollo system [3] to exploit parallelism, scalability, and load balancing.
[1] https://www.raspberrypi.com/
[2] https://opencv.org/
[3] https://apollowf.github.io/learn.html
Tasks
  • Use Raspberry Pi as an edge device and execute serverless functions.
  • Detect and recognize different critical events using a camera and open-source ML libraries (a crude color-threshold baseline is sketched at the end of this entry).
  • Create one or more simple workflows (WF) whose tasks will be orchestrated by Apollo in a distributed fashion.
  • Deploy the designed WF to the edge cloud infrastructure mentioned above (Raspberry Pis and (thermal) cameras).
  • Study the performance and scalability of the proposed solution under different levels of stress on the SB (e.g., multiple fires, smoke and crowding at a time).
  • Extensively test with multiple data sets on various hardware settings.
Theoretical skills Cloud computing, Serverless, Machine Learning, Electronics
Practical skills Python, git, GitHub
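As a first, deliberately crude baseline for on-device detection, the sketch below flags frames with a high fraction of fire-like colors using OpenCV; the HSV bounds and the 5% threshold are rough assumptions, and the thesis would replace this heuristic with proper ML models.

```python
import cv2
import numpy as np

def fire_like_ratio(frame_bgr):
    """Fraction of pixels falling into a (roughly assumed) fire-like
    hue/saturation/value range."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array([0, 120, 180]), np.array([35, 255, 255]))
    return cv2.countNonZero(mask) / mask.size

cap = cv2.VideoCapture(0)              # Raspberry Pi camera or USB webcam
ok, frame = cap.read()
if ok and fire_like_ratio(frame) > 0.05:
    print("possible fire event -> trigger Apollo workflow")
cap.release()
```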

Title Python Frontend for Serverless Workflows
Number of students 1
Language English
Supervisors Juan Aznar
Description Apollo (https://apollowf.github.io/) is the DPS research orchestration and runtime system for edge-cloud infrastructures. We use AFCL (https://apollowf.github.io/learn.html) to describe serverless workflows for distributed applications. As part of this thesis, you will create a Python version of AFCL, so that application developers can build workflows in Python programs instead of writing AFCL directly. Furthermore, you will create a transformation system that automatically converts these Python programs into AFCL, which is the input to Apollo.
Tasks
  • Create a Python specification that fully represents the AFCL language constructs, so that every AFCL program also has a Python representation (a hypothetical fluent-API sketch is given at the end of this entry).
  • There are multiple solution paths to this problem, for instance, building a parser or a transformation system that converts the Python representation into AFCL. Other solutions may be possible as well.
  • Your solution should be modular and easy to extend in case of any changes to AFCL.
  • Convert at least 3 AFCL use cases into the Python representation.
Practical skills Advanced Python programmer, git and GitHub, JSON or YAML
Additional information It is not mandatory but of great help if you have passed the lecture and PS on Verteilte Systeme (Distributed Systems) in the computer science bachelor program. This Bachelor Thesis will be supervised by Juan Aznar (IFI/DPS). The student will have the opportunity to work with a state-of-the-art Apollo edge-cloud infrastructure. The developed Python frontend will be reused for international projects and published as open source. Collaborative work in an international project is possible if the student is interested. In the best case, this work can also be published, and the student can travel to a conference to present the work.
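To illustrate the intended developer experience, here is a hypothetical sketch of what such a Python frontend could look like; the fluent API, the field names, and the emitted YAML structure are invented for illustration and would have to be mapped onto the real AFCL schema (PyYAML is assumed for serialization).

```python
import yaml  # PyYAML, assumed as the serialization backend

class Workflow:
    """Hypothetical fluent frontend; the emitted structure is illustrative
    and does not claim to match the real AFCL schema."""

    def __init__(self, name):
        self.name = name
        self.functions = []

    def function(self, name, resource, data_ins=None, data_outs=None):
        self.functions.append({
            "name": name,
            "resource": resource,
            "dataIns": data_ins or [],
            "dataOuts": data_outs or [],
        })
        return self  # enable chaining

    def to_yaml(self):
        return yaml.safe_dump(
            {"name": self.name, "workflowBody": self.functions},
            sort_keys=False,
        )

wf = (Workflow("genome-demo")
      .function("split", "func-split", data_outs=["chunks"])
      .function("analyze", "func-analyze", data_ins=["chunks"]))
print(wf.to_yaml())
```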