Running Apache Kafka Efficiently on the Cloud ft. Adithya Chandra

Streaming Audio: Apache Kafka® & Real-Time Data

محتوای ارائه شده توسط Confluent, founded by the original creators of Apache Kafka® and Founded by the original creators of Apache Kafka®. تمام محتوای پادکست شامل قسمت‌ها، گرافیک‌ها و توضیحات پادکست مستقیماً توسط Confluent, founded by the original creators of Apache Kafka® and Founded by the original creators of Apache Kafka® یا شریک پلتفرم پادکست آن‌ها آپلود و ارائه می‌شوند. اگر فکر می‌کنید شخصی بدون اجازه شما از اثر دارای حق نسخه‌برداری شما استفاده می‌کند، می‌توانید روندی که در اینجا شرح داده شده است را دنبال کنید.https://fa.player.fm/legal

4y ago 38:35

MP3•خانه قسمت

Focused on optimizing Apache Kafka® performance with maximized efficiency, Confluent’s Product Infrastructure team has been actively exploring opportunities for scaling out Kafka clusters. They are able to run Kafka workloads with half the typical memory usage while saving infrastructure costs, which they have tested and now safely rolled out across Confluent Cloud.

After spending seven years at Amazon Web Services (AWS) working on search services and Amazon Aurora as a software engineer, Adithya Chandra decided to apply his expertise in cluster management, load balancing, elasticity, and performance of search and storage clusters to the Confluent team.

Last year, Confluent shipped Tiered Storage, which moves eligible data to remote storage from a Kafka broker. As most of the data moves to remote storage, we can upgrade to better storage volumes backed by solid-state drives (SSDs). SSDs are capable of higher throughput compared to hard disk drives (HDDs), capable of fast, random IO, yet more expensive per provisioned gigabyte. Given that SSDs are useful at random IO and can support higher throughput, Confluent started investigating whether it was possible to run Kafka with lesser RAM, which is comparatively much more expensive per gigabyte compared to SSD. Instance types in the cloud had the same CPU but half the memory was 20% cheaper.

In this episode, Adithya covers how to run Kafka more efficiently on Confluent Cloud and dives into the following:

Memory allocation on an instance running Kafka
What is a JVM heap? Why should it be sized? How much is enough? What are the downsides of a small heap?
Memory usage of Datadog, Kubernetes, and other processes, and allocating memory correctly
What is the ideal page cache size? What is a page cache used for? Are there any parameters that can be tuned? How does Kafka use the page cache?
Testing via the simulation of a variety of workloads using Trogdor
High-throughput, high-connection, and high-partition tests and their results
Available cloud hardware and finding the best fit, including choosing the number of instance types, migrating from one instance to another, and using nodepools to migrate brokers safely, one by one
What do you do when your preferred hardware is not available? Can you run hybrid Kafka clusters if the preferred instance is not widely available?
Building infrastructure that allows you to perform testing easily and that can support newer hardware faster (ARM processors, SSDs, etc.)

EPISODE LINKS

265 قسمت

#Tech #Tech News #News #Confluent #Event Stream Processing #Data #Event Driven Architecture #Open Source #Data In Motion #Kafka Cloud Native #Data Mesh #Data Pipeline #Serverless Kafka #Podcasting Education #Confluent, original creators of Apache Kafka® #original creators of Apache Kafka® #Apache Kafka® #Cloud IT #Real Time