2025 - M. Łukasik - Monolith to Microservices at 20M+ requests per second: Latency and Scale Challenges

youtube.com, 12 hours ago


This presentation explores the challenges of migrating a monolithic application to microservices under extreme scale and strict latency requirements. As a leading ad re-targeting company operating in over 70 countries, RTB House processes exabytes of data and handles over 20 million requests per second. We will detail how we deconstructed a core application, extracting its memory-intensive functionalities into independent, gRPC-accessible microservices while maintaining millisecond-level response times.

We will share our key technical solutions for the critical hurdles we faced:

Scalable Load Balancing: We moved from a cumbersome, 300+ instance centralized load balancer to a nimble client-side gRPC model using DNS-based service discovery, resulting in significant resource optimization.
Mitigating Microservice Overhead: Because our business logic is exceptionally lightweight, the cumulative overhead of the microservice binding layer (TCP/HTTP handling, serialization, and thread pool dispatch) became a primary bottleneck. We engineered a custom batching mechanism that let us reduce the number of required microservice instances globally from around 900 to 300.
Controlling Tail Latency: To eliminate high tail latencies caused by 'stop-the-world' garbage collection pauses, we transitioned from G1 to the Z Garbage Collector (ZGC), specifically designed for low-latency workloads.
Ensuring High Resiliency: The introduction of batching increased the blast radius of failures. We adopted the hedging technique with a throttling policy, available in the Java gRPC client, to guarantee robust system operation.
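
To make the client-side load balancing and hedging points above more concrete, here is a minimal sketch of a gRPC service config, built as the nested map that the Java gRPC client accepts. It is not the configuration used at RTB House: the class name, the service name, and every numeric value (attempt count, hedging delay, throttling tokens) are assumptions chosen for illustration.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper that builds a gRPC service config as a plain nested Map. */
final class ServiceConfigSketch {

    /** serviceName is the fully qualified gRPC service the hedging policy applies to. */
    static Map<String, Object> build(String serviceName) {
        // Hedging: send up to maxAttempts copies of a call, spaced by hedgingDelay,
        // and use the first successful response. All values here are assumed.
        Map<String, Object> hedgingPolicy = Map.of(
                "maxAttempts", 3.0,                       // grpc-java expects numbers as Double
                "hedgingDelay", "0.005s",
                "nonFatalStatusCodes", List.of("UNAVAILABLE"));

        Map<String, Object> methodConfig = Map.of(
                "name", List.of(Map.of("service", serviceName)),
                "hedgingPolicy", hedgingPolicy);

        // Throttling: a token bucket that suppresses hedged attempts when too many
        // calls fail, so hedging cannot amplify an overload.
        Map<String, Object> retryThrottling = Map.of(
                "maxTokens", 10.0,
                "tokenRatio", 0.1);

        return Map.of(
                // Client-side balancing across all addresses the resolver returns.
                "loadBalancingConfig", List.of(Map.of("round_robin", Map.of())),
                "methodConfig", List.of(methodConfig),
                "retryThrottling", retryThrottling);
    }
}
```

With grpc-java on the classpath, a map like this would typically be passed to `ManagedChannelBuilder.defaultServiceConfig(...)` on a channel created with a `dns:///` target, so that the client resolves the backend addresses itself and spreads calls across them without a central load balancer.
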
This session offers a comprehensive look at the architectural decisions and technical solutions essential for migrating high-throughput monoliths, focusing on principles applicable to any demanding, high-performance system.
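
The batching idea mentioned among the solutions can be sketched in a few lines. The abstract does not describe the custom mechanism itself, so the following is only a generic size-triggered micro-batcher under stated assumptions: the class name `MicroBatcher`, the `batchCall` function standing in for a single remote call, and the flush policy are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical client-side micro-batcher: groups single requests into one backend call. */
final class MicroBatcher<Q, R> {
    private final int maxBatchSize;
    // Stand-in for one remote call carrying many requests.
    // It must return one response per request, in the same order.
    private final Function<List<Q>, List<R>> batchCall;
    private final List<Q> pendingRequests = new ArrayList<>();
    private final List<CompletableFuture<R>> pendingFutures = new ArrayList<>();

    MicroBatcher(int maxBatchSize, Function<List<Q>, List<R>> batchCall) {
        this.maxBatchSize = maxBatchSize;
        this.batchCall = batchCall;
    }

    /** Queues a request and flushes as soon as the batch is full. */
    synchronized CompletableFuture<R> submit(Q request) {
        CompletableFuture<R> future = new CompletableFuture<>();
        pendingRequests.add(request);
        pendingFutures.add(future);
        if (pendingRequests.size() >= maxBatchSize) {
            flush();
        }
        return future;
    }

    /** Sends whatever is queued; a real system would also trigger this from a short timer. */
    synchronized void flush() {
        if (pendingRequests.isEmpty()) {
            return;
        }
        List<Q> requests = new ArrayList<>(pendingRequests);
        List<CompletableFuture<R>> futures = new ArrayList<>(pendingFutures);
        pendingRequests.clear();
        pendingFutures.clear();
        try {
            // The per-call overhead (network, serialization, dispatch) is paid once per batch.
            List<R> responses = batchCall.apply(requests);
            for (int i = 0; i < futures.size(); i++) {
                futures.get(i).complete(responses.get(i));
            }
        } catch (RuntimeException e) {
            // One failed call fails every request in the batch: the larger blast radius.
            for (CompletableFuture<R> future : futures) {
                future.completeExceptionally(e);
            }
        }
    }
}
```

A caller would create, for example, `new MicroBatcher<>(64, requests -> stub.lookupAll(requests))`, where `stub.lookupAll` is a hypothetical batched RPC, and call `submit` for each item. The sketch invokes the backend while holding the lock for simplicity; a production version would hand the batch to an asynchronous call instead.
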