Transcript
A (0:00)
A common challenge in data-rich organizations is that critical context about the data is often hard to capture and even harder to keep up to date. As more people across the organization use data and data models get more complex, simply finding the right data set can be slow and create bottlenecks. Select Star is a data discovery and metadata platform that builds a continuously updated knowledge graph of an organization's data by analyzing both its structure and how it's actually used. It enriches data with context such as popularity, lineage and semantic models, making it easier for AI and teams to discover, trust and use the right data. These enriched metadata layers are also highly valuable for large language models, significantly improving the accuracy of generated SQL queries. Shinji Kim is the founder and CEO of Select Star, and she joined Sean Falconer to discuss solving metadata curation challenges, managing data context at scale, using LLMs for SQL generation, emerging trends in metadata management, and more. This episode is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him.
B (1:29)
Shinji, welcome to the show.
A (1:31)
Thanks Sean. Great to be here.
B (1:33)
Yeah, I probably should have said welcome back since you've been here before, although it's been a couple years.
A (1:37)
Yeah, more than three years ago, to introduce Select Star. But I am really excited to be back, and Software Engineering Daily has also been morphing and changing a lot.
B (1:49)
Yeah, well, it's been three years, so why don't you catch us up? Three years, especially in the world of tech, the world of startups, and now what's increasingly becoming the world of AI, is a lot of time. A lot can happen in three years. So what's happening with Select Star today? Maybe go back even to the beginning. What's the story behind where you guys started, and where are you today?
A (2:10)
Amazing. Sure. Yes, so much has changed. I started Select Star five years ago after noticing time and time again that a lot of enterprises collect, store and process data, but when they try to use it, it takes days or weeks to find the right data and actually use it properly. You have to rely on outdated documentation. Usually you need to just find somebody else and rely on tribal knowledge to understand how to use the data. I mean, this is something that I saw firsthand at Akamai when I was running the product for their IoT data processing, partnering with consumer electronics and automotive enterprises building their next consumer applications. They were looking to pull a lot more telematics data, and especially from an enterprise perspective this was an issue, and hence there are solutions like traditional enterprise data catalogs that are trying to solve it. At the same time, I noticed that there was a lot more demand around this as more companies adopted the modern data stack of cloud data warehouses and built their data lakes on the cloud with Snowflake and Databricks. Data discovery, finding and understanding data, has become a much wider issue in organizations. So that's what Select Star is really focused on. We provide a very easy-to-use UI and now an MCP server, APIs, Chrome extensions, a Slack app, all the different places where end users work. So whether you are a data scientist, data analyst, software engineer or product manager, whenever you have to touch or see data or data products, you can easily access the context about that data and the documentation about it. Where did the data come from? Who else is using this inside the company? What other data assets or analyses are already attached to it or have been built on top of it? So I would say we are almost drawing a knowledge graph for you in terms of how your data assets are connected and utilized inside the organization today. So that's the core of what we do.
