Projects

Below are some projects that our group is actively working on. If you are a Ph.D. student, email Sanidhya.

Our group also has some specific semester and optional projects for students at EPFL.

Transient Operating System Design

The main goal of this broad project is to dynamically modify various OS subsystems to cater to heterogeneous hardware and varying application requirements. Most prior work focuses on IO, while our focus is mostly on the concurrency aspect. In particular, we are exploring how applications can fine-tune concurrency control mechanisms and the underlying stack to improve their performance. Some of the projects are as follows:

  1. A concurrency control runtime to efficiently switch between locks at various granularities.
  2. A new low-level language to support lock design while ensuring lock properties, such as mutual exclusion, starvation avoidance, and fairness.
  3. A lightweight hypervisor that caters to various forms of virtualization, from bare-metal to serverless.
  4. Re-architecting the OS for microsecond-scale IO.

We will further extend this project to reason about the concurrency and consistency of data structures.

Scalable Storage Stack

IO devices are now blazingly fast, and saturating them has become a difficult task. Unfortunately, the current OS stack, still designed for the hardware of the 2000s, is the major bottleneck. As part of this broad project, we are looking at ways to redesign the OS stack to support fast storage devices, including new file system designs. Some of the projects are as follows:

  1. Designing new techniques to saturate and scale operations for various storage media.
  2. Understanding the implications of storage-class memory compared with traditional storage media, such as SSDs.
  3. Designing new storage engines for upcoming storage media, such as ZNS SSDs.
  4. Offloading file system stack to computational SSDs.

Concurrency Primitives and Frameworks

Given our particular interest in designing new synchronization primitives and concurrency frameworks, we are designing primitives that squeeze more performance out of hardware in two scenarios: heterogeneous hardware (such as big.LITTLE architectures and high-bandwidth memory) and rack-scale systems. We are also revisiting some existing primitives and trying to reason about their practicality. Some of the ongoing projects are as follows:

  1. Revisiting the design of locking primitives for very large multicore machines.
  2. Redesigning concurrency primitives for microsecond-scale applications in a rack-scale environment.
  3. Reasoning about various bugs in a concurrent environment.

Projects for Bachelor's and Master's students

[Performance/eBPF] Performance Modeling of eBPF Programs (Tao Lyu)

eBPF enables 1) customizing the policies of kernel subsystems (e.g., scheduling policy, page eviction policy) and 2) offloading application logic to the kernel as a fast path. In both scenarios, eBPF programs (also known as kernel extensions) reside at the performance-critical paths. Therefore, they should be as performant as possible. Otherwise, eBPF programs can be counterproductive: instead of improving kernel performance, they actually degrade it.

Unfortunately, performance issues are common because (1) eBPF applications are updated frequently, often without rigorous performance review, making them more error-prone; and (2) many eBPF developers aren’t kernel experts and lack familiarity with the performance characteristics and micro-optimizations of hot kernel paths. Unlike safety, which the eBPF verifier strictly guarantees, performance is an equally important concern that the ecosystem overlooks, overshadowed by the focus on safety.

With this project, we call on the community to also focus on the performance clarity of eBPF programs. Concretely, we aim to design a scalable and accurate performance model for eBPF programs. Acting as a building block, this model can then be used in various scenarios, including:

  • Performance quantification: quantify the performance of each line of a program, with explanations that help developers understand its cost.
  • Performance debugging: based on the quantification, adjust programs manually or semi-automatically to achieve better performance.
  • Performance checking: check a new program’s performance against functionally equivalent baselines that serve as oracles. Unlike safety verification, this check is optional for developers rather than mandatory.

This project targets students who want to do a thesis or long-term research.

[Memory Management] Memory allocators in tiered memory (Musa Unal)

Memory management is a critical component of operating systems. Modern data centers rely on different types of memory (with different latencies and bandwidths) to manage data, which is also referred to as tiered memory. This project aims to understand how profiling can help identify optimal allocation sites in tiered memory.

In this project, you will:

  • Learn how PGO (profile-guided optimization) can help an application improve its performance.
  • Learn how memory allocators work and manage data.
  • Understand how CXL (Compute Express Link) works.

Prerequisites:

  • A basic understanding of how operating systems work.
  • Proficiency in C/C++.

[Memory Management] Energy consumption in tiered memory (Musa Unal)

Memory management is a critical component of operating systems. Modern data centers rely on different types of memory (with different latencies and bandwidths) to manage data, which is also referred to as tiered memory. This project aims to understand the trade-offs between energy consumption and performance in tiered memory.

In this project, you will:

  • Understand how Linux's memory tiering system works.
  • Understand how to measure energy consumption in memory tiering.
  • Understand how CXL (Compute Express Link) works.

Prerequisites:

  • A basic understanding of how operating systems work.
  • Proficiency in C/C++.

[Scalable OS] Linux Kernel Data Structure Switching (Vishal Gupta)

The Linux kernel uses multiple data structures across different subsystems to store data, including linked lists, hash tables, red-black trees, and maple trees. However, depending on the access pattern, a given data structure may not be optimal.

In this project, you will:

  • Implement mechanisms to switch between different kernel data structures.
  • Implement policies to decide when to switch.

Prerequisite: an understanding of kernel data structures.

[Scalable OS] Faster Uprobes using User Mode eBPF (Kumar Kartikeya Dwivedi)

BPF-based tracing executes programs and collects data when certain functions are triggered in the kernel. The same is possible in user space via the ‘user probes’ (uprobes) feature, where programs are executed when USDT probes are triggered within user-space applications. However, the current uprobe implementation traps into the kernel whenever an event occurs, slowing applications down; a uprobe hit can be up to 2x slower than a system-call context switch. This project explores whether introducing a new ‘user mode eBPF’ program type, and making uprobes execute such programs in user space, can be faster while offering the same level of usability. The ideal end goal is transparent, 100% compatibility with the current uprobe mechanism.

In this project, you will:

  • Develop a deep understanding of the eBPF verifier’s static analysis process.
  • Create a new ‘user mode eBPF’ program type for eBPF.
  • Measure and benchmark usability and performance differences between the current uprobe mechanism and the one based on user mode eBPF.

Prerequisites:

  • A basic level of understanding of eBPF.
  • Proficiency in C and Python.

[Systems for ML] Improving the performance of ML workloads (Yueyang Pan)

ML workloads have taken center stage in 21st-century computing. However, the software that runs them is not entirely efficient. Thus, to efficiently utilize current hardware, whether for inference or training, within a single machine or across machines, we need to understand and redesign the current software stack.

In this project, you will:

  • Analyze and understand the overhead of the current software stack.
  • Produce a complete breakdown of the costs within a single machine and across machines, for both inference and training.
  • Propose a set of optimizations that improve the performance of such systems.

Prerequisites:

  • Experience using existing ML software.
  • Basic knowledge of ML algorithms.

You will learn:

  • How to systematically understand the performance of software systems.

In case you have project ideas that are not mentioned above but fall under the purview of our group's interests, feel free to contact us.