Title | Student(s) | Supervisor | Description |
Serverless Architectures for Scalable Stream Processing | 1 or 2 | Thomas Fahringer, Abolfazl Younesi | details |
Streaming Anomaly Detection and Fault Tolerance in Stream Processing | 1 or 2 | Thomas Fahringer, Abolfazl Younesi | details |
Optimizing Data Partitioning and Parallelism for Scalable Stream Processing | 1 or 2 | Thomas Fahringer, Abolfazl Younesi | details |
Resource-Aware Scaling and Auto-Tuning in Distributed Stream Processing Systems | 1 or 2 | Thomas Fahringer, Abolfazl Younesi | details |
Resilient State Management and Fault Recovery in Stream Processing | 1 or 2 | Thomas Fahringer, Abolfazl Younesi | details |
Self-Similarity-Aware Task Partitioning for Multi-DAG Systems | 1 | Thomas Fahringer, Abolfazl Younesi | details |
Optimizing Multi-Objective Distributed Workflow Scheduling | 1 or 2 | Thomas Fahringer, Abolfazl Younesi | details |
Extension of a novel programming language for the Cloud-Edge-IoT continuum | 1 or 2 | Juan Aznar, Marlon Etheredge | details |
Distributing High-Impact Scientific Workflows with Apollo | 1 | Juan Aznar | details |
Event-based Invocation of Workflow Applications on the Edge | 1 | Juan Aznar | details |
Detecting critical events on smart buildings using edge-cloud resources | 1 | Juan Aznar | details |
Python Frontend for Serverless Workflows | 1 | Juan Aznar | details |
Additional List of Bachelor Theses offered by Peter Thoman | | Peter Thoman | details |
Title | Serverless Architectures for Scalable Stream Processing |
Number of students | 1 – 2 (preferred) |
Language | English |
Supervisors | Thomas Fahringer, Abolfazl Younesi |
Focus | Investigating how serverless computing models (e.g., AWS Lambda, Azure Functions) can be leveraged to build scalable, cost-efficient, and event-driven stream processing solutions. Emphasis on elasticity, resource management, and fault tolerance in serverless environments. |
Description | This project explores the intersection of serverless computing and stream processing frameworks to handle large-scale data streams with minimal operational overhead. Students will experiment with various platforms and orchestration strategies to ensure efficient scaling, low latency, and robust fault tolerance without relying on dedicated servers. |
Tasks | • Serverless Integration: Evaluate and integrate stream processing frameworks (e.g., Apache Flink, Kafka Streams) with serverless platforms. • Elastic Scaling: Design and test auto-scaling policies that adapt to fluctuating workloads in real time. • Cost Optimization: Investigate cost-performance trade-offs in serverless deployments for continuous data processing. • Fault Tolerance and State Management: Explore strategies for stateful stream processing in stateless serverless functions. |
Theoretical skills | Distributed Systems Concepts, Cloud Computing and Serverless Paradigms, Performance Modeling and Cost Analysis |
Practical skills | Experience with AWS Lambda, Azure Functions, or Google Cloud Functions, Familiarity with Apache Flink/Kafka Streams, Scripting and DevOps (CI/CD, Infrastructure as Code) |
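The core constraint of this topic, stateful processing on stateless functions, can be sketched as follows: a minimal Python handler that processes a micro-batch of stream records while keeping all state outside the function. The `Records` event shape is a Kinesis-like illustrative assumption, and the plain dict stands in for an external store such as Redis or DynamoDB.

```python
import json

# Stand-in for an external state store (e.g., Redis or DynamoDB);
# serverless functions are stateless, so state must live outside the function.
STATE_STORE = {}

def handler(event, context=None):
    """Process a micro-batch of stream records, keeping a running sum per
    key in external state. The Kinesis-like 'Records' shape is an
    illustrative assumption, not a fixed platform API."""
    records = event.get("Records", [])
    for record in records:
        payload = json.loads(record["data"])
        key = payload["sensor_id"]
        STATE_STORE[key] = STATE_STORE.get(key, 0) + payload["value"]
    return {"processed": len(records)}

if __name__ == "__main__":
    batch = {"Records": [
        {"data": json.dumps({"sensor_id": "s1", "value": 3})},
        {"data": json.dumps({"sensor_id": "s1", "value": 4})},
    ]}
    print(handler(batch))     # {'processed': 2}
    print(STATE_STORE["s1"])  # 7
```

In a real deployment, the read-modify-write against the external store would additionally need idempotency or transactional guarantees, which is exactly the fault-tolerance question the topic raises.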
Title | Streaming Anomaly Detection and Fault Tolerance in Stream Processing |
Number of students | 1 – 2 (preferred) |
Language | English |
Supervisors | Thomas Fahringer, Abolfazl Younesi |
Focus | Developing robust methods for detecting anomalies in real time while ensuring fault tolerance within distributed stream processing systems. Integration of advanced anomaly detection algorithms with self-healing and recovery strategies to maintain system integrity and performance amid data irregularities and system failures. |
Description | This project investigates the challenges of processing continuous data streams where unexpected anomalies or failures may occur. Students will design and implement algorithms that not only detect unusual patterns or outliers in streaming data but also trigger corrective actions (e.g., checkpointing, reconfiguration, or alerting) to ensure overall system reliability. |
Tasks | • Anomaly Detection Algorithms: Develop and evaluate both statistical and machine learning methods tailored for streaming data. • Fault Tolerance Mechanisms: Design self-healing and recovery strategies to maintain consistency during system failures. • Integration of Detection and Recovery: Create mechanisms that trigger automated fault-tolerance procedures upon anomaly detection. • Benchmarking and Evaluation: Set up real-world scenarios and synthetic benchmarks to assess performance, latency, and reliability under various fault conditions. |
Theoretical skills | Statistical Analysis, Time-Series Modeling, Machine Learning for Streaming Data, Distributed Systems and Fault Tolerance Theories |
Practical skills | Programming in Python/Java/Scala, Experience with Streaming Frameworks (e.g., Apache Flink, Spark Streaming), Familiarity with Containerization (Docker, Kubernetes) |
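As a minimal starting point for the statistical side of this topic, the sketch below implements an online z-score detector using Welford's algorithm for running mean and variance; the threshold value is an assumption to be tuned per data stream.

```python
import math

class StreamingZScoreDetector:
    """Online anomaly detector: maintains running mean/variance with
    Welford's algorithm and flags values whose z-score exceeds a threshold."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def update(self, x):
        # Score x against the statistics seen so far, then fold it in.
        anomalous = False
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                anomalous = True
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

if __name__ == "__main__":
    det = StreamingZScoreDetector(threshold=3.0)
    flags = [det.update(v) for v in [10, 11, 9, 10, 11, 9, 10, 100]]
    print(flags)  # only the final spike is flagged
```

In the thesis, such a detector would be one baseline among several; the integration task would wire its output to recovery actions (checkpointing, reconfiguration, alerting).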
Title | Optimizing Data Partitioning and Parallelism for Scalable Stream Processing |
Number of students | 1 – 2 (preferred) |
Language | English |
Supervisors | Thomas Fahringer, Abolfazl Younesi |
Focus | Investigate and develop novel data partitioning strategies and parallel processing techniques to enhance scalability and performance in distributed stream processing systems. |
Description | This project centers on improving how streaming data is partitioned and parallelized across distributed nodes to maximize throughput and minimize latency. Students will analyze current partitioning methods, identify bottlenecks, and propose advanced algorithms that adapt dynamically to workload variations and data skew. |
Tasks | • Algorithm Design: Develop adaptive partitioning and parallelism algorithms tailored for real-time streaming environments. • System Integration: Implement the proposed algorithms within existing frameworks (e.g., Apache Flink, Spark Streaming). • Performance Evaluation: Benchmark the new strategies against traditional partitioning techniques under various load conditions. • Case Studies: Evaluate effectiveness with both synthetic and real-world streaming datasets. |
Theoretical skills | Distributed Algorithms, Parallel Computing, Load Balancing, and Data Partitioning Theory, Performance Modeling |
Practical skills | Programming in Java/Scala/Python, Experience with Distributed Stream Processing Frameworks, Data Analysis and Benchmarking |
Additional Info | This project can generate multiple research outputs, including innovative partitioning algorithms, comparative studies on parallelism strategies, and comprehensive performance benchmarks. |
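To illustrate the data-skew problem this topic targets, here is a small Python sketch of skew-aware assignment via key salting: keys that dominate a batch are spread round-robin across all partitions instead of being pinned to one by hashing. The hot-key fraction is an assumed tuning knob, and real systems would detect hot keys incrementally rather than per batch.

```python
import hashlib
from collections import Counter

def hash_partition(key, num_partitions):
    """Baseline: stable hash partitioning (md5 for determinism)."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_partitions

def skew_aware_assign(records, num_partitions, hot_fraction=0.2):
    """Assign (key, value) records to partitions; keys whose frequency
    exceeds hot_fraction of the batch are salted round-robin across all
    partitions to mitigate skew."""
    freq = Counter(k for k, _ in records)
    hot = {k for k, c in freq.items() if c / len(records) > hot_fraction}
    rr = 0
    assignment = []
    for key, value in records:
        if key in hot:
            part = rr % num_partitions
            rr += 1
        else:
            part = hash_partition(key, num_partitions)
        assignment.append((key, value, part))
    return assignment
```

Note the trade-off the thesis would quantify: salting balances load but breaks key locality, so downstream per-key aggregation needs an extra merge step.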
Title | Resource-Aware Scaling and Auto-Tuning in Distributed Stream Processing Systems |
Number of students | 1 – 2 (preferred) |
Language | English |
Supervisors | Thomas Fahringer, Abolfazl Younesi |
Focus | Develop dynamic, resource-aware scaling and auto-tuning mechanisms for distributed stream processing systems to optimize resource utilization, reduce costs, and maintain low latency under varying workload conditions. |
Description | The project aims to create systems that automatically adjust resource allocation and system parameters based on real-time workload monitoring. Students will design models and algorithms that predict workload patterns, manage resource provisioning in cloud or hybrid environments, and auto-tune configurations for optimal performance. |
Tasks | • Workload Prediction: Implement machine learning models or statistical methods to forecast incoming data rates and resource demands. • Dynamic Resource Management: Develop strategies for auto-scaling compute and memory resources based on predictions. • Auto-Tuning Mechanisms: Create algorithms that continuously adjust system parameters (e.g., buffer sizes, parallelism levels) to optimize throughput and latency. • Evaluation and Benchmarking: Test the proposed solutions under various real-time scenarios and compare with static configurations. |
Theoretical skills | Cloud and Distributed Systems Concepts, Predictive Modeling, Machine Learning, Optimization Theory, and Control Systems |
Practical skills | Familiarity with Cloud Platforms (AWS, Azure, etc.), Experience with Apache Flink/Spark Streaming, Scripting and Automation (DevOps tools, CI/CD pipelines) |
Additional Info | Research outputs may include novel auto-scaling algorithms, case studies on resource optimization, and guidelines for building resource-aware distributed stream processing architectures. |
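The prediction-then-provision loop described above can be sketched minimally in Python: a moving-average forecast of the input rate drives the target worker count. The per-worker capacity, window size, and bounds are assumed tuning parameters; a real system would swap the moving average for a proper time-series model.

```python
import math
from collections import deque

class PredictiveScaler:
    """Sketch of a resource-aware scaling policy: forecast the next-window
    input rate with a simple moving average, then provision enough workers
    to absorb it, clamped to [min_workers, max_workers]."""

    def __init__(self, per_worker_capacity, window=3,
                 min_workers=1, max_workers=32):
        self.capacity = per_worker_capacity  # events/sec one worker handles
        self.history = deque(maxlen=window)
        self.min_workers = min_workers
        self.max_workers = max_workers

    def observe(self, events_per_sec):
        """Record a measured input rate for the forecast window."""
        self.history.append(events_per_sec)

    def target_workers(self):
        """Return the worker count the forecast calls for."""
        if not self.history:
            return self.min_workers
        forecast = sum(self.history) / len(self.history)
        workers = math.ceil(forecast / self.capacity)
        return max(self.min_workers, min(self.max_workers, workers))

if __name__ == "__main__":
    scaler = PredictiveScaler(per_worker_capacity=100)
    for rate in [250, 350, 300]:
        scaler.observe(rate)
    print(scaler.target_workers())  # 3
```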
Title | Resilient State Management and Fault Recovery in Stream Processing |
Number of students | 1 – 2 (preferred) |
Language | English |
Supervisors | Thomas Fahringer, Abolfazl Younesi |
Focus | Designing robust state management strategies and fault recovery mechanisms that ensure data consistency and minimal processing disruption in the event of system failures. Emphasis on efficient checkpointing, state replication, and dynamic recovery techniques. |
Description | This project investigates the critical role of state management in fault-tolerant stream processing systems. Students will research and implement novel approaches for maintaining and recovering system state, such as incremental checkpointing and distributed state replication, to address challenges posed by network failures, node crashes, and data inconsistencies. |
Tasks | • Efficient Checkpointing: Develop lightweight and incremental checkpointing mechanisms for real-time state capture. • State Replication and Consistency: Design strategies for distributed state replication ensuring strong or eventual consistency. • Dynamic Recovery Techniques: Implement adaptive fault recovery methods that minimize downtime and data loss. • Experimental Evaluation: Benchmark the proposed solutions against existing state management frameworks using synthetic and real-world datasets. |
Theoretical skills | Distributed Systems, Consistency Models, Fault Tolerance Theories, and State Management Algorithms |
Practical skills | Proficiency in Java/Scala/Python, Experience with stream processing frameworks (e.g., Apache Flink, Spark Streaming), Familiarity with distributed storage and replication protocols |
Additional Info | Potential research outputs include novel checkpointing algorithms, enhanced state replication protocols, and comprehensive performance evaluations under various failure scenarios. |
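The incremental-checkpointing idea at the core of this topic can be sketched in a few lines of Python: only keys mutated since the last checkpoint are captured, and recovery replays the chain of deltas. This is a single-node toy, assuming in-memory storage; the thesis would address durable storage, distribution, and exactly-once semantics.

```python
class IncrementalCheckpointStore:
    """Sketch of incremental checkpointing: track dirty keys between
    checkpoints, write only the delta, and recover by replaying deltas."""

    def __init__(self):
        self.state = {}
        self.dirty = set()
        self.checkpoints = []  # ordered list of {key: value} deltas

    def put(self, key, value):
        self.state[key] = value
        self.dirty.add(key)

    def checkpoint(self):
        """Capture only keys changed since the last checkpoint;
        returns the number of keys written."""
        delta = {k: self.state[k] for k in self.dirty}
        self.checkpoints.append(delta)
        self.dirty.clear()
        return len(delta)

    def recover(self):
        """Rebuild state from scratch by replaying deltas in order
        (simulates a restart after a crash)."""
        rebuilt = {}
        for delta in self.checkpoints:
            rebuilt.update(delta)
        return rebuilt
```

A natural evaluation axis for the thesis falls out directly: delta size versus recovery time, since long delta chains shrink checkpoints but lengthen replay.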
Title | Self-Similarity-Aware Task Partitioning for Multi-DAG Systems |
Number of students | 1 |
Language | English |
Supervisors | Thomas Fahringer, Abolfazl Younesi |
Focus | Develop a scheduling and partitioning algorithm that leverages self-similarity in Directed Acyclic Graphs (DAGs) to optimize task grouping and reduce scheduling complexity. |
Description | This project investigates the recurring patterns in DAG structures to identify self-similarity. By applying clustering and pattern recognition techniques, the system groups similar tasks across different DAGs, thus reducing execution time and simplifying dependency management. The framework integrates entropy-based partitioning with self-similarity detection to enhance overall scheduling efficiency. |
Tasks | • Identify self-similar patterns in DAG structures using hierarchical clustering or pattern recognition. • Design a scheduling algorithm that groups similar tasks to reduce scheduling steps. • Integrate the self-similarity-aware approach with entropy-based partitioning. • Evaluate the impact on resource usage and task execution time using example DAGs and real-world benchmarks. |
Theoretical skills | Graph Theory, Clustering Algorithms, Entropy-Based Partitioning, Scheduling Algorithms |
Practical skills | Programming in Python/Java, Data Visualization, Experience with DAG-Based Systems, Simulation and Benchmarking |
Additional Info | The project may include diagrams and pseudocode to illustrate the algorithm, along with experimental evaluations that demonstrate the benefits of self-similarity-aware task grouping. |
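One simple way to detect the self-similar patterns mentioned above is canonical sub-DAG hashing: each node's signature combines its label with the sorted signatures of its children, so nodes with equal signatures root structurally identical sub-DAGs. This sketch assumes a dict-based DAG encoding and treats signature groups as candidates for shared scheduling decisions; it is one possible detection technique, not the project's prescribed method.

```python
import hashlib
from collections import defaultdict

def subdag_signature(node, children, labels, memo):
    """Canonical signature of the sub-DAG rooted at node: the node label
    combined with the sorted signatures of its children."""
    if node in memo:
        return memo[node]
    child_sigs = sorted(subdag_signature(c, children, labels, memo)
                        for c in children.get(node, []))
    raw = labels[node] + "(" + ",".join(child_sigs) + ")"
    sig = hashlib.sha1(raw.encode()).hexdigest()
    memo[node] = sig
    return sig

def group_self_similar(children, labels):
    """Group nodes whose rooted sub-DAGs share a signature; groups of
    size > 1 are self-similar and can share one scheduling decision."""
    memo = {}
    for node in labels:
        subdag_signature(node, children, labels, memo)
    groups = defaultdict(list)
    for node, sig in memo.items():
        groups[sig].append(node)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

For example, two structurally identical `join(map, filter)` subtrees in different DAGs would land in the same group and need only one partitioning decision between them.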
Title | Optimizing Multi-Objective Distributed Workflow Scheduling |
Number of students | 1 – 2 (preferred) |
Language | English |
Supervisors | Thomas Fahringer, Abolfazl Younesi |
Focus | Propose a novel scheduling framework that simultaneously optimizes multiple objectives such as latency, cost, and energy consumption in hybrid cloud-edge environments. |
Description | This project will develop and evaluate advanced scheduling algorithms for distributed workflows. The goal is to balance competing objectives by utilizing both cloud and edge resources. The work will include modeling, algorithm design, and extensive simulation/experimentation. |
Tasks | • Develop a multi-objective optimization model for workflow scheduling. • Design and implement a novel scheduling framework. • Evaluate performance using simulation and real-world benchmarks. • Analyze trade-offs between latency, cost, and energy consumption. |
Theoretical skills | Distributed Algorithms, Optimization Theory, Multi-Objective Optimization |
Practical skills | Programming in Java/Python, Cloud and Edge Computing Platforms, Simulation and Benchmarking Tools |
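The trade-off analysis between latency, cost, and energy typically starts from Pareto dominance: a schedule is only worth considering if no other schedule is at least as good in every objective and strictly better in one. A minimal Python sketch (candidate schedules as (latency, cost, energy) tuples, all minimized; an illustrative simplification of the full model):

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(candidates):
    """Return the non-dominated schedules among (latency, cost, energy)
    tuples; a scheduler would then pick from this front by policy,
    e.g., a weighted sum or lexicographic preference."""
    front = []
    for c in candidates:
        if not any(dominates(o, c) for o in candidates if o != c):
            front.append(c)
    return front

if __name__ == "__main__":
    schedules = [(10, 5, 3), (8, 6, 3), (12, 7, 4), (8, 5, 3)]
    print(pareto_front(schedules))  # [(8, 5, 3)]
```

The quadratic comparison is fine for small candidate sets; a thesis-scale evaluation over many schedules would use a dedicated multi-objective solver or NSGA-II-style search instead.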
Title | Extension of a novel programming language for the Cloud-Edge-IoT continuum |
Number of students | 1 or 2, 2 preferred |
Language | English |
Supervisors | Juan Aznar, Marlon Etheredge |
Description | For a novel programming model for the Cloud-Edge-IoT continuum, we require an extension of our system, focusing on developer tools that ease the development of applications. In this project, the topics listed under Tasks are explored and researched. |
Tasks |
|
Theoretical skills |
|
Practical skills |
|
Additional information | The scope of the project can encompass multiple topics. We prefer two students working on the same project. Inspiration can be derived from NodeRED, Simulink, Ballerina. |
Title | Distributing High-Impact Scientific Workflows with Apollo |
Number of students | 1 |
Language | English |
Supervisors | Juan Aznar |
Description | In this thesis, you will execute two to three real scientific workflows (WFs) using the FaaS paradigm through the Apollo runtime system [1] and conduct research with real biological and experimental input datasets. For instance, the 1000genome WF [2] enables identifying genome mutations according to numerous population features for the later study of associated diseases. Another example is Cycles (CW) [3], one of the most environmentally friendly workflows: CW simulates agricultural experiments that enable scientists to evaluate the behavior of crops under different environmental conditions, protecting nature from unnecessary and damaging tests and promoting sustainable agriculture while saving vast amounts of time and resources. [1] https://apollowf.github.io/learn.html [2] https://github.com/wfcommons/pegasus-instances/tree/master/1000genome [3] https://github.com/wfcommons/pegasus-instances/tree/master/cycles |
Tasks |
|
Theoretical skills | Cloud computing, FaaS, Serverless |
Practical skills | Java, Python (Biopython, Pandas, Numpy), git, GitHub |
Additional information |
Title | Event-based Invocation of Workflow Applications on the Edge |
Number of students | 1 |
Language | English |
Supervisors | Juan Aznar |
Description | In this thesis, you will execute realistic complex tasks and data processing as workflow applications [1] on an edge-cloud infrastructure. To this end, you should trigger the execution of workflows in Apollo [2] using event data in a common format [3] (i.e., name, source, type, kind, correlation, dataOnly, and metadata fields), thus providing interoperability across services, platforms, and systems. There are numerous event frameworks; in this thesis, you will systematically compare them and select one based on the following requirements: (i) runs on IoT, edge, and cloud, (ii) can be configured for arbitrary events, (iii) scales for large events, (iv) builds on the CloudEvents standard [3], and (v) is open source. [1] https://github.com/serverlessworkflow/ [2] https://apollowf.github.io/learn.html [3] https://github.com/cloudevents/spec |
Tasks |
|
Theoretical skills | Cloud computing, Serverless, Docker |
Practical skills | Java, Python, git, GitHub, Raspberry Pi, Arduino, or any other IoT/Edge hardware |
Additional information |
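A first practical step for this thesis is checking incoming events against the common format listed in the description. The sketch below validates the field set enumerated there; the exact field semantics are defined by reference [3], and the field set here is taken verbatim from the description rather than from any framework's API.

```python
import json

# Fields enumerated in the thesis description for the common event format;
# semantics per the CloudEvents-based specification referenced as [3].
REQUIRED_FIELDS = {"name", "source", "type", "kind",
                   "correlation", "dataOnly", "metadata"}

def validate_event(raw):
    """Parse a JSON event and report which expected fields are missing.
    A workflow trigger would reject or enrich incomplete events before
    handing them to the runtime."""
    event = json.loads(raw)
    missing = sorted(REQUIRED_FIELDS - event.keys())
    return event, missing

if __name__ == "__main__":
    ok = json.dumps({"name": "door-open", "source": "sensor-7",
                     "type": "building.event", "kind": "info",
                     "correlation": "c-123", "dataOnly": True,
                     "metadata": {}})
    print(validate_event(ok)[1])                    # []
    print(validate_event('{"name": "x"}')[1][:2])   # first missing fields
```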
Title | Detecting critical events on smart buildings using edge-cloud resources |
Number of students | 1 |
Language | English |
Supervisors | Juan Aznar |
Description | The goal of this bachelor thesis is to develop fully operational edge devices that recognize critical events in smart buildings (SB), such as fire, smoke, water leakage, insufficient social distancing, and unmasked people, among others. Edge devices should be implemented using commercial, low-cost hardware (e.g., Raspberry Pi [1]), (thermal) cameras, and open-source machine learning (ML) libraries [2]. The smart building should react immediately and respond accordingly when undesired events occur. To this end, edge devices will be orchestrated by the Apollo system [3] to exploit parallelism, scalability, and load balancing. [1] https://www.raspberrypi.com/ [2] https://opencv.org/ [3] https://apollowf.github.io/learn.html |
Tasks |
|
Theoretical skills | Cloud computing, Serverless, Machine Learning, Electronics |
Practical skills | Python, git, GitHub |
Additional information |
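To give a flavour of the detection logic an edge device would run, here is a deliberately simple pure-Python sketch: a thermal frame (2D list of temperatures in degrees Celsius) is scanned for pixels above a fire threshold. The threshold and minimum-pixel values are illustrative assumptions, not calibrated figures; real devices would use OpenCV and trained models as the description suggests.

```python
def detect_hotspots(frame, threshold=60.0, min_pixels=3):
    """Flag a potential fire if enough pixels in a thermal frame exceed
    the temperature threshold. Returns the alert flag and the hot pixel
    coordinates so downstream services can localize the event."""
    hot = [(r, c)
           for r, row in enumerate(frame)
           for c, temp in enumerate(row)
           if temp >= threshold]
    return {"alert": len(hot) >= min_pixels, "hot_pixels": hot}

if __name__ == "__main__":
    frame = [[20, 20, 20],
             [20, 80, 80],
             [80, 80, 20]]
    print(detect_hotspots(frame))  # alert is True, 4 hot pixels
```

On the real system, each such check would run as a function orchestrated by Apollo, so multiple detectors (fire, smoke, leakage) can execute in parallel across edge devices.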
Title | Python Frontend for Serverless Workflows |
Number of students | 1 |
Language | English |
Supervisors | Juan Aznar |
Description | Apollo (https://apollowf.github.io/) is the DPS research orchestration and runtime system for Edge-Cloud infrastructures. We use AFCL (https://apollowf.github.io/learn.html) to describe serverless workflows for distributed applications. As part of this thesis, you will create a Python version of AFCL so that application developers can write Python programs to build workflows instead of using AFCL directly. Furthermore, you will create a transformation system that automatically converts these Python programs into AFCL, which serves as input to Apollo. |
Tasks |
|
Theoretical skills | |
Practical skills | Advanced Python programmer, git and GitHub, JSON or YAML |
Additional information | It is not mandatory, but of great help, if you have passed the lecture and PS on Verteilte Systeme in the computer science bachelor program. This bachelor thesis will be supervised by Juan Aznar (IFI/DPS). The student will have the opportunity to work with a state-of-the-art Apollo Edge-Cloud infrastructure. The developed Python frontend will be reused in international projects and published as open source. Collaborative work in an international project is possible if the student is interested. In the best case, this work can also be published, and the student can travel to a conference to present the work. |
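The kind of Python frontend this thesis asks for could start from a small builder pattern: workflow structure is recorded through Python calls, then serialized to a machine-readable document that a converter would translate to AFCL. All names and fields below (`Workflow`, `function`, the JSON layout) are hypothetical illustrations, not the actual AFCL schema.

```python
import json

class Workflow:
    """Hypothetical sketch of a Python workflow builder: functions and
    their data flow are recorded in Python, then serialized to JSON as
    an intermediate form a converter could map to AFCL. Field names are
    illustrative assumptions, not the real AFCL schema."""

    def __init__(self, name):
        self.name = name
        self.functions = []

    def function(self, name, inputs, outputs):
        """Append a serverless function with its input/output data items;
        returns self to allow fluent chaining."""
        self.functions.append({"name": name,
                               "inputs": inputs,
                               "outputs": outputs})
        return self

    def to_json(self):
        """Serialize the recorded workflow to a JSON document."""
        return json.dumps({"workflow": self.name,
                           "body": self.functions}, indent=2)

if __name__ == "__main__":
    wf = (Workflow("genome")
          .function("split", ["input.txt"], ["chunks"])
          .function("analyze", ["chunks"], ["result"]))
    print(wf.to_json())
```

A fluent API of this shape keeps the Python program close to the workflow's dataflow, which should simplify the automatic transformation into AFCL that the thesis requires.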