Flow Computing Podcast Series: Prof. Dr. Jörg Keller on PPU Latency and Real-World Performance (Episode 5)
News – 23/10/24
In Episode 5 of our Flow Computing podcast series, Prof. Dr. Jörg Keller delves into the critical issue of latency in computing and explores how the PPU addresses this challenge.
Professor Keller explains that in traditional CPUs, slow memory access directly impacts application reaction time, creating performance bottlenecks. In contrast, the PPU's architecture is designed to hide these latencies by running multiple fibers. As a result, even long memory access times do not necessarily lead to slow application responses.
"When we talk about latency, we really mean two things. For a whole application, we mean whether its reaction time is real-time or not. For a single memory access, by latency, we mean how long it takes. In a classic CPU core, these two are connected—a slow memory access that cannot overlap with other computations slows down the application, so the reaction time goes up. Flow’s architecture is built to hide latencies through multiple fibers. Thus, a long memory access can be tolerated because its latency is hidden and does not translate into a long application reaction time." - Prof. Dr. Jörg Keller
Professor Keller highlights the PPU's efficiency in handling parallel tasks. Although switching control from CPU cores to PPU cores involves some overhead, it is significantly smaller than the overhead of starting new threads on a traditional multi-core CPU.
"When we switch control from CPU cores to PPU cores by running fibers, there is some overhead. However, this overhead is much smaller than the overhead incurred when starting additional threads on a classic multi-core CPU." - Prof. Dr. Jörg Keller
This efficiency makes the PPU well-suited for parallelizing smaller code sections that would typically be impractical to parallelize on traditional CPUs, where the thread-startup overhead would outweigh the gain.
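A rough break-even sketch illustrates the point (the overhead figures below are hypothetical placeholders, not measured CPU or PPU numbers): a region containing W microseconds of work is worth splitting across P workers only if W/P plus the startup cost is less than W, so a lower startup cost pushes the minimum profitable region size down.

```cpp
// Illustrative break-even estimate. NOT Flow Computing's published numbers;
// the startup costs below are placeholder assumptions.
// Parallelizing a region of W microseconds of work across P workers pays off
// only if W/P + startup_overhead < W, i.e. W > startup_overhead * P / (P - 1).
#include <cstdio>

int main() {
    const int    P              = 8;      // parallel workers (assumed)
    const double thread_startup = 50.0;   // illustrative thread-spawn cost, microseconds
    const double fiber_startup  = 1.0;    // illustrative fiber hand-off cost, microseconds

    double breakeven_threads = thread_startup * P / (P - 1);
    double breakeven_fibers  = fiber_startup  * P / (P - 1);

    std::printf("Threads: regions above ~%.1f us of work are worth parallelizing\n",
                breakeven_threads);
    std::printf("Fibers:  regions above ~%.1f us of work are worth parallelizing\n",
                breakeven_fibers);
    return 0;
}
```

In this model, a 50 µs thread-spawn cost means a region needs roughly 57 µs of work before parallelizing helps, while a 1 µs hand-off makes regions of only a few microseconds profitable; the exact figures are invented, but the shape of the trade-off matches the point made above.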
Transcript
JÖRG KELLER: When we talk about latency, we really mean two things. For a whole application, we mean whether its reaction time is real-time or not. For a single memory access, by latency, we mean how long it takes. In a classic CPU core, these two are connected—a slow memory access that cannot overlap with other computations slows down the application, so the reaction time goes up. Flow’s architecture is built to hide latencies through multiple fibers. Thus, a long memory access can be tolerated because its latency is hidden and does not translate into a long application reaction time.
JK: For latency, PPUs decouple memory access latency from the application’s reaction latency. I think this is a good thing. When we switch control from CPU cores to PPU cores by running fibers, there is some overhead. However, this overhead is much smaller than the overhead incurred when starting additional threads on a classic multi-core CPU. PPU fibers share a large part of their state, and they all execute the same code.
JK: The startup time for fibers is much smaller than for CPU threads, and thus going parallel and handing over control to the PPUs is much faster. Because of this, it pays off for much smaller parts of the code that can be parallelized with PPUs but would not be worth parallelizing on CPU threads, where the overhead would be too large for such a small workload.
Curious to dive deeper into Flow Computing's PPU technology?
Explore the technical details and insights from Professor Keller's analysis in our full report. To request access, contact us at info@flow-computing.com, and we’ll be delighted to share it with you!