Scaling an Apache Kafka Based Architecture at Therapie Clinic

Streaming Audio: Apache Kafka® & Real-Time Data

Content provided by Confluent, founded by the original creators of Apache Kafka®. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Confluent or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://fa.player.fm/legal

3 years ago, 1:10:56

Scaling Apache Kafka® can be tricky, let alone scaling a team. When he was first hired, Domenico Fioravanti of Therapie Clinic was given the challenging task of assembling a sizable tech team from scratch, while simultaneously building a scalable and decoupled architecture from the ground up. In addition, he wanted to deliver value to the company from day one. One way that Domenico ultimately accomplished these goals was by focusing on managed solutions in order to avoid large investments in engineering know-how. Another way was to deliver quickly to production by using the existing knowledge of his team.

Domenico's biggest initial priority was to make a real-time reporting dashboard that collated data generated by third-party systems, such as call centers and front-of-house software solutions that managed bookings and transactions. (Before Domenico's arrival, all reporting had been done by aggregating data from different sources through an expensive, manual, error-prone, and slow process—which tended to result in late and incomplete insights.)

Establishing an initial stack with AWS and a BI/analytics tool only took a month and required minimal DevOps resources, but Domenico's team ended up wanting to leverage their efforts to free up third-party data for more than just the reporting/data insights use case.

So they began considering Apache Kafka® as a central repository for their data. For Kafka itself, they compared Amazon MSK with Confluent, carefully weighing setup and time costs, maintenance costs, limitations, security, availability, risks, migration costs, frequency of Kafka updates, observability, and error-handling and troubleshooting needs.

Domenico's team settled on Confluent Cloud and built the following stack:

AWS AppSync, a managed GraphQL layer to interact with and abstract third-party APIs (data sources)
AWS Lambdas for extracting data and producing to Kafka topics
Kafka topics for the raw as well as transformed data
Kafka Streams for data transformation
Kafka Redshift sink connector for loading data
AWS Redshift as the destination cloud data warehouse
Looker for business intelligence and big data analytics
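
As a rough illustration of the "extract and produce" step in this stack, here is a minimal Python sketch of what a Lambda-side function might look like. It is not the team's actual code: the topic name, the booking field names, and the function names are all hypothetical. The producer is passed in as a parameter so the sketch stays self-contained; in a real deployment it could be a confluent_kafka.Producer configured for Confluent Cloud, which exposes the same produce() and flush() calls used here.

```python
import json
from datetime import datetime, timezone
from typing import Tuple

# Hypothetical topic name for raw, untransformed third-party data.
RAW_TOPIC = "bookings.raw"


def to_kafka_record(booking: dict) -> Tuple[bytes, bytes]:
    """Serialize one third-party booking into a (key, value) pair of bytes.

    The field names are assumptions made for this sketch. Keying by clinic
    keeps all events for one clinic on the same partition.
    """
    key = str(booking["clinic_id"]).encode("utf-8")
    value = json.dumps(
        {
            "booking_id": booking["booking_id"],
            "clinic_id": booking["clinic_id"],
            "amount": booking["amount"],
            "extracted_at": datetime.now(timezone.utc).isoformat(),
        },
        sort_keys=True,
    ).encode("utf-8")
    return key, value


def produce_bookings(event: dict, producer) -> dict:
    """Produce every booking found in the event to the raw topic.

    `producer` is any object with produce(topic, key=..., value=...) and
    flush() methods. A Lambda handler(event, context) would call this
    function with a producer created once at module load.
    """
    bookings = event.get("bookings", [])
    for booking in bookings:
        key, value = to_kafka_record(booking)
        producer.produce(RAW_TOPIC, key=key, value=value)
    # Block until buffered messages are delivered before the function returns.
    producer.flush()
    return {"produced": len(bookings)}
```

Writing raw events to their own topic, as in the stack above, leaves the transformation step free to read them and publish cleaned records to a separate topic for the Redshift sink connector to load.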

This stack allowed the company's data to be consumed by multiple teams in a scalable way. Eventually, DynamoDB was added. By the end of a year, in addition to building a scalable architecture, Domenico had grown his staff to 45 members across six teams.
Chapters

1. Intro (00:00:00)

2. Build an internal tech team from scratch (00:01:21)

3. Get to know Therapie Clinic (00:05:12)

4. Reduce manual data processes (00:07:50)

5. Building a data pipeline with Apache Kafka (00:10:52)

6. Kafka Connect (00:15:28)

7. Build a dedicated pipeline in Python on serverless (00:21:33)

8. From custom-built to Kafka-based data pipelines (00:29:03)

9. Cloud providers (00:33:32)

10. Managed cloud services (00:37:35)

11. Event sourcing (00:53:32)

12. It's a wrap! (01:08:25)
