Flow Computing Podcast Series: Prof. Dr. Jörg Keller on PPU Latency and Real-World Performance (Episode 5)
News – 23/10/24
In Episode 5 of our Flow Computing podcast series, Prof. Dr. Jörg Keller tackles the critical topic of latency in computing and how the PPU addresses this challenge.
Professor Keller explains that in traditional CPUs, slow memory access can directly impact application reaction time, leading to performance bottlenecks. However, the PPU's architecture is designed to hide these latencies through the use of multiple fibers. This means that even long memory access times do not necessarily translate into slow application responses.
"When we talk about latency, we really mean two things. For a whole application, we mean if its reaction time is real-time or not. For a single memory access, we mean by latency, how long it takes. In a classic CPU core, these two things are connected. A slow memory access which cannot be overlapped with other computation slows down the application. So the reaction time goes up. Flow's architecture is built to hide latencies through multiple fibers. So a long memory access can be tolerated as its latency is hidden and does not translate into long application reaction time." - Prof. Dr. Jörg Keller
Professor Keller highlights the efficiency of the PPU in handling parallel tasks. While there is an overhead associated with switching control from CPU cores to PPU cores, this overhead is significantly smaller compared to starting new threads on a traditional multi-core CPU.
"So when we switch control from CPU cores to PPU cores by running fibers, there is a certain overhead. However, this overhead is much smaller than the overhead when we start further threads on a classic CPU with multiple cores." - Prof. Dr. Jörg Keller
This efficiency makes the PPU suitable for parallelizing even smaller code sections that would not be worthwhile to parallelize on traditional CPUs due to the high overhead.
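As a rough illustration of why a smaller hand-over overhead widens the range of code worth parallelizing, the sketch below runs a break-even calculation for two assumed overheads. All figures (64 parallel workers, 20,000 cycles to launch threads, 200 cycles to hand work to fibers) are made-up assumptions for the model, not measured CPU or Flow PPU numbers.

```cpp
// Break-even sketch for parallelizing a code section (all numbers are
// assumptions for illustration, not measured CPU or PPU figures).
//
// A section of `work` cycles split across `lanes` parallel workers takes
// roughly work / lanes + overhead cycles, so parallelizing only helps
// when that is below the serial time `work`.
#include <cstdio>

int main() {
    const long lanes           = 64;     // parallel workers (assumed)
    const long thread_overhead = 20000;  // cycles to launch CPU threads (assumed)
    const long fiber_overhead  = 200;    // cycles to hand work to fibers (assumed)

    const long sections[] = {1000L, 10000L, 100000L, 1000000L};
    for (long work : sections) {
        double t_threads = (double)work / lanes + thread_overhead;
        double t_fibers  = (double)work / lanes + fiber_overhead;
        printf("section of %7ld cycles: threads %.2fx speedup, fibers %.2fx speedup\n",
               work, work / t_threads, work / t_fibers);
    }
    return 0;
}
```

With the large overhead, only the biggest sections see any speedup at all; with the small one, even a section of a few thousand cycles already benefits, which is the point about fine-grained parallelization that fibers make possible.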
Transcript
JÖRG KELLER: When we talk about latency, we really mean two things. For a whole application we mean if its reaction time is real time or not. For a single memory access, we mean by latency, how long it takes. In a classic CPU core these two things are connected. A slow memory access which cannot be overlapped with other computation slows down the application. So the reaction time goes up. Flow’s architecture is built to hide latencies through multiple fibers. So a long memory access can be tolerated as its latency is hidden and does not translate into long application reaction time.
JK: So for latency, PPUs have decoupled memory access latency and latency in the application's reaction. And this is, I think, a good thing. So when we switch control from CPU cores to PPU cores by running fibers, there is a certain overhead. However, this overhead is much smaller than the overhead when we start further threads on a classic CPU with multiple cores. The PPU fibers share a large part of their state and they all execute the same code.
JK: So the start-up time for fibers is much smaller than for CPU threads, and thus going parallel and handing over control to the PPUs will be much faster. As it is much faster, it pays off already for much smaller parts of the code that can be parallelized with PPUs, but are not worthwhile to be parallelized on CPU threads because there, the overhead would be too large given the small workload.
Interested in a deeper dive into Flow Computing's PPU technology?
If you'd like to explore the technical details and insights from Professor Keller's due diligence, you can request access to the full report. Simply contact us at info@flow-computing.com, and we'll be happy to send it your way!