
Hosted by Hubert Dulay · EN

In this podcast, Ralph and I interview Kai, a former colleague of mine with extensive experience in the data streaming and real-time events space. Kai highlights the top five trends for data streaming with Kafka and Flink: data sharing, data contracts for governance, serverless stream processing, multi-cloud adoption, and the use of generative AI in real-time contexts. We discuss the role of generative AI in providing accurate answers and the importance of real-time data integration for contextual recommendations, using travel and flight cancellations as an example. We also examine Flink's role as a stream processor in keeping data accurate and fresh for semantic search and generative AI applications. We then explore the idea of streaming databases and whether the market is ready to embrace them. We discuss the need for data contracts and data governance to understand how data flows through systems, as well as the data engineering team's responsibility for creating embeddings. We also cover integrating large language models with other applications using technologies like Kafka, with examples of how generative AI can be built into existing business processes. The interview touches on the concept of a "lakehouse" and the separation of compute and storage for real-time analytics. Kai also highlights Confluent's approach to building Kafka in a cloud-native way and its focus on the streaming side, while emphasizing the need for stream processing solutions that ordinary database users can access. Get full access to SUP! Hubert’s Substack at hubertdulay.substack.com/subscribe

In this podcast, we interview Arjun Narayan, Frank McSherry, and Nikhil Benesch from Materialize. Ralph and I are writing a book on streaming databases and are seeking expert insights from Materialize on topics rarely discussed in the field. We begin by exploring the distinction between operational and analytical workloads, highlighting the importance of real-time or near-real-time results for operational tasks. We then delve into the significance of consistency in operational workloads and the challenges of using eventually consistent systems. The guests caution against relying on eventually consistent stores and databases, stressing the value of consistency in domains like payments. We also focus on the concept of time in differential dataflow, explaining how revisions provide a better way to understand time in this context. Consistency is highlighted as crucial in temporal joins, especially for mathematical operations and data enrichment. Overall, the conversation emphasizes the importance of real-time workloads, consistency, and integration in operational systems. SUP! Hubert’s Substack is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Continuing the Filipinos in Tech series, in this episode I interview Marlo Carrillo and Ron Guerrero, both currently at Databricks and previously at Cloudera. We reflect on the significance of the Balikbayan box as a symbol of resilience and of the importance of remembering our roots. We share personal and emotional stories of our families' journeys to America, the struggles they faced, and the sacrifices they made for a better life. We also discuss the challenges of growing up Filipino in different communities, feeling different, and trying to find connections. We highlight how Filipinos assimilate into new cultures while holding onto their heritage, and how language can be a marker of both identity and assimilation. The episode explores the immigrant experience and the complexities of belonging to multiple worlds. Beyond our immigrant experiences, we focus on the impact of technology on the Filipino community. We speculate that more Filipinos, including members of our own families, will join the technology field in the future. We discuss the preference Filipinos may have for social and personal interactions, which could help explain the underrepresentation of Filipinos in the tech industry. We express gratitude toward America and its opportunities while acknowledging the unique charm of the Philippines. We also talk about retirement plans and the possibility of returning to the Philippines, with some of us expressing a desire to visit rather than relocate permanently.

The founders of Deephaven created the company to monetize technology from their previous company and diversify their capabilities in the capital markets. Finding a gap in the market for a data system that met their needs, they developed Deephaven to provide a live data stack that integrates with Kafka and other data sources. Deephaven Community Core is an open-source, real-time, time-series, column-oriented analytics engine with relational database features. Its queries can operate seamlessly on both historical and real-time data. Deephaven also includes an intuitive user experience and visualization tools. It can ingest data from a variety of sources, apply computation and analysis algorithms to that data, and build rich queries, dashboards, and representations from the results.

Another episode of “Filipinos in Tech.” This time I interview Al Domingo, senior director of solutions engineering, Americas Strategic at Confluent (and a longtime friend). Al and I share a love of real-time data and music (guitars specifically), and we also share pride in our Filipino heritage. We reflect on our experiences as Filipinos in the tech industry, discussing the cultural expectations placed on Filipinos to pursue careers in healthcare and the challenges Al faced as one of the few Filipinos in his workplace. Al also shares his fascination with open-source technology and his time at companies like Confluent and Cloudera. The episode highlights the importance of pursuing one's passions and the influence of culture on career choices. Looking back on our time at Cloudera, we emphasize its supportive learning environment and the unique perspective of working for an open-source company. We also discuss the strong Filipino community we found at Cloudera and the impact the company had on our careers and personal connections. The episode concludes with a plan to further explore the experiences of younger Filipinos in the tech industry and to encourage more recognition and representation in the field.

This is the first in a series of podcasts about Filipino Americans in the tech industry, with our guest Keith Oliver Rull sharing his immigration experience and career journey. We discuss the stress and struggles faced by Filipino immigrants, their hard work and sacrifices, and the lack of Filipino representation in the tech industry. The conversation also touches on the impact of colonial mentality on Filipino culture.

In this podcast interview, I discuss federated systems with Peter Corliss, Director of Product Marketing at StarTree. Peter will be presenting at a meetup next Tuesday. Peter explains how federated systems emerged from the evolution of web development and the need to define the backend workings behind front-end websites. We also explore how terms like stack, platform, and cluster are defined in today's environment. The conversation highlights the shift from traditional stacks to clusters of systems and distinguishes federated systems from federated data. We then delve into the challenges and limitations of federated systems and databases, emphasizing the trade-off between moving the data and moving the processing. We touch on federated learning in AI and ML and the importance of optimizing data for queries, and we close the first part by discussing the need for new language and grammar to describe these complex architectures and the importance of collaboration between data science and data engineering teams. In the second part of the podcast, the conversation turns to the interoperability and limitations of cloud computing platforms, specifically AWS, Google Cloud, and Azure. Peter notes that although efforts have been made to make these platforms interoperable, users still have to choose between the distinct ecosystems each provider offers. We then shift to the importance of replication in data systems and the concept of a data divide, emphasizing the need to choose the best database or system for each aspect of an application architecture. We also discuss the potential for a stack to span cloud regions and continents, allowing for global consistency and the ability to query data from different locations. Finally, we discuss Apache Pinot, describing it as a complex system that can act as a cluster of clusters. We highlight its ability to assimilate more components and scale out, as well as its powerful tools for organizing and storing data, and conclude by discussing the expectation of clusters of clusters in modern systems.

In this podcast interview, Aklivity founders John and Leonid discuss their journey from working on WebSocket at Kaazing to starting Aklivity, which aims to support event-driven architectures, particularly those built on Kafka. They also highlight the lessons they learned at Kaazing, emphasizing the importance of meeting clients where they are and using familiar tools and APIs. We delve into the features and capabilities of Zilla, an open-source project developed by Aklivity. Zilla acts as a proxy for Apache Kafka and supports both source and sink APIs. It can extract data from various sources, place it into an asynchronous system, and expose it as an external API. The integration with Kafka ensures reliable event-driven architectures, while Zilla’s Kafka cache provides advanced features such as indexing, filtering, and message sharding.

In this podcast interview, Sai, the CEO and co-founder of PeerDB, discusses his background and his motivation for creating the company. He noticed that customers using existing ETL tools for data movement with Postgres often ran into problems and ended up building in-house solutions. This inspired him to start PeerDB, a data movement tool optimized for Postgres. PeerDB's initial use case is real-time streaming of data from Postgres to data warehouses, queues, and storage engines. Sai explains the benefits of this real-time streaming, including minimal lag and the ability to easily stream data across different namespaces, topics, and subscriptions. We also explore the differences between PeerDB's real-time CDC replication and streaming query replication features.

In this podcast, Ralph and I interview Micah Wylde, the founder and creator of Arroyo, a stream processing platform. Micah talks about how current stream processing tools are too difficult for end users, a challenge that motivated him to create Arroyo and make stream processing accessible to everyone. He also explains the importance of Google's Dataflow paper and how Arroyo focuses on timely data processing, using watermarks to handle potentially delayed and out-of-order data. This approach is how Arroyo differentiates itself from other stream processing solutions.