GeeCON Prague 2023: Vojtěch Juránek - Feeding the ML model with data from the database

In today's fast-paced business environment, organizations are constantly looking for ways to derive actionable insights from their data as quickly as possible. ML models can do that. However, implementing a complete ML pipeline can be challenging, especially if you want to process newly arrived data immediately or you have a legacy system that is hard to connect to your modern infrastructure. Change Data Capture (CDC) has emerged as a technology for delivering real-time data changes from various sources, especially databases. In this talk we will introduce Debezium (https://debezium.io), a leading open source framework for CDC. We will discuss how it can be used to ingest data from various databases into ML frameworks like TensorFlow, and what the pitfalls are if you go this route. Attendees will gain a deeper understanding of how Debezium CDC works, how it can help them ingest data from the source database into the ML framework in real time, and what the possible challenges of this approach are.