AI Developer Tools at Google with Paige Bailey - Software Engineering Daily


Podcast Summary: Software Engineering Daily - AI Developer Tools at Google with Paige Bailey

Release Date: January 9, 2025

In this episode of Software Engineering Daily, host Jordi Mon Companys sits down with Paige Bailey, the Uber Technical Lead of the Developer Relations team at Google ML Developer Tools. They discuss Google's suite of machine learning (ML) and artificial intelligence (AI) developer tools: how the tools evolved, what they do, and where they are headed.

1. Introduction and Speaker Background

[00:00 - 02:07]

The episode kicks off with Jordi Mon Companys introducing Paige Bailey and her role at Google. Paige oversees a range of ML developer tools, including the Gemini APIs, Gemma, AI Studio, Kaggle, Colab, and JAX. She has been building models since around 2009, which gives her long experience in developing and managing AI tools for developers.

Notable Quote:

Paige Bailey [00:54]: "I am so excited to be here and really excited to have the opportunity to talk to you and also loved the questions that you are asking before we hit record. I think this is going to be a fun conversation."

2. Evolution of AI and Multimodal Models

[02:07 - 04:47]

Paige discusses the transformative journey of AI, emphasizing the shift from single-task models to multimodal models that handle text, code, video, audio, and images. She notes that early transformer models such as GPT-2 and GPT-3 focused on text and code, while newer multimodal models perform well out of the box on tasks that once required months of work on a separate single-task model.

Notable Quote:

Paige Bailey [03:09]: "We’re getting into this really brave new world of multimodal models. So not just this underpinning language backbone, but also really interesting capabilities in terms of video understanding, audio understanding and transcription image understanding kind of coupled with text and code as well."

3. Google's AI Developer Tools: Gemini API and Gemma

[04:47 - 17:25]

Paige provides an overview of Google's AI developer ecosystem. She distinguishes between Gemini APIs, which are proprietary and accessible via REST APIs, and Gemma, an open-source family of models available for customization and local deployment. Gemini APIs offer high performance and are designed for scalability, while Gemma caters to developers needing flexibility and control over their models.

Notable Quote:

Paige Bailey [17:25]: "One of the nice things about open source models is that if you're running them locally, that's kind of free, you're just using your onboard compute. You might want to customize in ways that you would not be able to with a proprietary model..."

4. Target Personas and Use Cases

[08:29 - 11:57]

The discussion moves to the different user personas for Google's AI tools:

JAX Users: Experts building large-scale models from scratch, such as the research teams at DeepMind.
Gemma Users: Developers fine-tuning models for specific applications or deploying them on various platforms.
Gemini API Users: A broader audience utilizing REST APIs for easier integration without deep ML expertise.

Paige emphasizes that calling the Gemini APIs simplifies the MLOps process, since data versioning and model maintenance are no longer the developer's concern, making advanced AI accessible to a wider range of developers.

Notable Quote:

Paige Bailey [09:06]: "The beautiful thing about the Gemini APIs is that if you can make a REST API call, then you can call the Gemini model."
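
The REST call Paige describes can be sketched with nothing but the Python standard library. This is a minimal sketch, not an official client: the endpoint path, the `x-goog-api-key` header, the model name `gemini-1.5-flash`, and the response shape are assumptions based on the public Gemini REST API and should be checked against the current API reference. The helper names `build_generate_request` and `generate` are made up for illustration.

```python
import json
import urllib.request

# Assumed base URL for the public Gemini REST API; verify the version and
# model name against the current API reference before relying on them.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_generate_request(prompt, api_key, model="gemini-1.5-flash"):
    """Build (but do not send) a generateContent POST request."""
    url = f"{BASE_URL}/{model}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def generate(prompt, api_key):
    """Send the request and return the text of the first candidate."""
    request = build_generate_request(prompt, api_key)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumed response shape: candidates -> content -> parts -> text.
    return payload["candidates"][0]["content"]["parts"][0]["text"]
```

Calling `generate("Explain JAX in one sentence.", api_key)` with a key created in AI Studio would perform the network call. Building the request separately from sending it keeps the payload easy to inspect and test offline.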

5. Kaggle Workshop on Generative AI

[11:57 - 13:48]

Paige recounts a recent five-day generative AI intensive course hosted on Kaggle, Google's platform for data science competitions and learning. The course attracted approximately 150,000 students, offering hands-on experience with prompting models, retrieval embeddings, fine-tuning, and implementing evaluations. The curriculum was designed to cater to a diverse audience, from beginners to advanced researchers.

Notable Quote:

Paige Bailey [12:26]: "We recently did a five day generative AI intensive course on Kaggle... But the curriculum we designed was focused not just on the model calls, but also on all of the additional features that you need to have around the models in order to make these systems production ready."

6. AI Studio: Features and Capabilities

[23:37 - 25:00]

AI Studio is introduced as Google's interactive platform for experimenting with Gemini models. It allows users to:

Experiment with various Gemini models.
Engage in image generation.
Utilize features like function calling and code execution.
Compare different models.
Fine-tune models and generate API keys.

AI Studio aims to provide an intuitive interface, especially beneficial for junior developers navigating complex ML workflows.

Notable Quote:

Paige Bailey [23:44]: "AI Studio is a place where you can go, you can kind of experiment with the different Gemini models... all without having to kind of wrangle with the Google Cloud console."

7. Advanced Features: Code Execution and Function Calling

[26:14 - 30:17]

Paige highlights advanced functionalities within Gemini APIs:

Code Execution: Allows models to write and execute Python code to solve tasks, enhancing problem-solving capabilities.
Function Calling: Enables models to interact with specific tools or APIs, such as databases or weather services, to perform complex operations.

These features introduce a degree of agency to AI models, enabling them to perform iterative tasks and utilize external resources dynamically.

Notable Quote:

Paige Bailey [26:19]: "I really, really love code execution and function calling just because... it's a one liner change. All you have to do is say like tools equals code execution and you're off to the races."
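
The function-calling loop described in this section can be sketched without any SDK. In this minimal sketch the model's side is not shown: the dictionary passed to `handle_function_call` stands in for the structured request a model emits when it decides to use a tool, and its shape, the `get_weather` stub, and the `TOOLS` registry are all hypothetical names chosen for illustration, not the Gemini API's own types.

```python
# Hypothetical tool the application exposes to the model. A real tool would
# query a weather service or database instead of a hard-coded table.
def get_weather(city):
    fake_readings = {"Zurich": "12C, cloudy", "Austin": "29C, sunny"}
    return fake_readings.get(city, "unknown")


# Registry mapping tool names (as declared to the model) to Python callables.
TOOLS = {"get_weather": get_weather}


def handle_function_call(function_call):
    """Run the tool the model asked for and package the result.

    `function_call` is assumed to look like {"name": ..., "args": {...}}.
    """
    name = function_call["name"]
    if name not in TOOLS:
        # Refuse anything that was never registered as a tool.
        return {"name": name, "error": f"unknown tool: {name}"}
    result = TOOLS[name](**function_call["args"])
    # In a full loop this response is sent back to the model, which then
    # writes the final natural-language answer.
    return {"name": name, "response": result}
```

For example, `handle_function_call({"name": "get_weather", "args": {"city": "Zurich"}})` runs the stub and returns its reading. Restricting execution to an explicit registry is one simple way to keep the oversight over tool use that the conversation later touches on.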

8. Integration into Developer Tools and Devices

[30:17 - 32:17]

Google has integrated Gemini models into popular developer tools and devices:

Android Studio: Enhances code completion and generation capabilities.
Chrome Browser: Embedded within the Chrome Canary release for on-device AI processing.
Pixel Devices: Gemini Nano models run directly on the operating system, enabling on-device AI functionalities.

These integrations aim to streamline the development process, offering AI assistance directly within the tools developers use daily.

Notable Quote:

Paige Bailey [30:25]: "Gemini models have been baked into Android Studio as well for code completion as well as code generation."

9. Retrieval Techniques and Use Cases

[19:14 - 23:37]

Paige explains retrieval techniques like RAG (Retrieval-Augmented Generation), which enhance model outputs by grounding them in specific data sources. This approach improves accuracy and reduces hallucinations by sourcing information from, for example, a company's internal documents.

Use Case Example: A CTO can feed company guidelines into the retrieval system, ensuring that junior developers receive responses aligned with the company's coding styles and policies.

Notable Quote:

Paige Bailey [19:28]: "Retrieval is really kind of doing this kind of extraction from sources that might be relevant, giving that to the model and then having the model summarize those insights as outputs."
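
The retrieve-then-summarize flow in this section can be illustrated with a toy example. This sketch swaps in plain word overlap for the embedding search a production system would use, and every name in it (`tokenize`, `retrieve`, `build_grounded_prompt`) is hypothetical. It only shows the idea of choosing which passages to place in the context window ahead of the question.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=2):
    """Return up to k documents that share the most words with the question."""
    query = tokenize(question)
    scored = [(len(query & tokenize(doc)), doc) for doc in documents]
    # Keep only documents with some overlap, highest score first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [doc for _, doc in ranked[:k]]


def build_grounded_prompt(question, documents, k=2):
    """Place the retrieved passages ahead of the question in the prompt."""
    sources = retrieve(question, documents, k)
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(sources))
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"Sources:\n{context}\n"
        f"Question: {question}"
    )
```

Given a handful of internal guideline snippets, `build_grounded_prompt` returns a prompt that contains only the relevant ones, numbered so the model can point back to its sources. As Paige notes in the conversation, with a context window of one to two million tokens it may be simpler to start by placing all the guidelines in the prompt and add retrieval only if that proves insufficient.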

10. Future of AI Models and Agentic Properties

[30:21 - 33:51]

The conversation touches on the agentic properties of AI models, where models can perform tasks autonomously by executing code and utilizing tools. Paige envisions a future where models like Gemini can manage schedules, perform complex queries, and execute multi-step tasks with oversight to ensure compliance and accuracy.

Notable Quote:

Paige Bailey [32:12]: "I would love to say, hey Gemini, please look on my calendar and find the next best time for me to go and like do yoga... those are all things that could be done today."

11. Resources and Upcoming Features

[33:51 - 37:49]

Paige encourages listeners to explore AI Studio and participate in educational courses on Kaggle. She also highlights upcoming features like multimodal paradigms and a new video model called Veo, which allows for video description and generation. For the latest updates, she recommends following the Google Devs Twitter handle and other team members.

Notable Quote:

Paige Bailey [37:30]: "Just the takeaway for everybody should be if you haven't tried out aistudio.google.com, go explore it like test it out on your own data."

Conclusion

Paige Bailey provides a comprehensive overview of Google's AI developer tools, emphasizing their versatility, scalability, and integration capabilities. She underscores Google's commitment to making advanced AI accessible to a broad spectrum of developers, from novices to experts, through platforms like AI Studio and Gemini APIs. The episode concludes with an encouraging note for listeners to engage with these tools and participate in upcoming educational initiatives.

Final Quote:

Paige Bailey [37:30]: "If you haven't tried out aistudio.google.com, go explore it like test it out on your own data. And you know we have a very generous free tier, so I strongly, strongly encourage you to take advantage of it."

Stay Connected:

AI Studio: aistudio.google.com
Google Devs Twitter: @GoogleDevs
Kaggle Generative AI Course: Accessible through the Kaggle platform.

For more insights and updates, follow Paige Bailey and her team on various social media platforms as mentioned during the podcast.


Transcript

[00:00]
Jordi Mon Companys
Over the years, Google has released a variety of ML, data science, and AI developer tools and platforms. Prominent examples include Colab, Kaggle, AI Studio, and the Gemini API. Paige Bailey is the Uber Technical Lead of the Developer Relations team at Google ML Developer Tools, working on Gemini APIs, Gemma, AI Studio, Kaggle, Colab, and JAX. She joins the podcast to talk about the specialized task of creating developer tools for ML and AI. This episode of Software Engineering Daily is hosted by Jordi Mon Companys. Check the show notes for more information on Jordi's work and where to find him.
[00:52]
Paige, welcome to Software Engineering Daily.
[00:54]
Paige Bailey
Excellent. I am so excited to be here and really excited to have the opportunity to talk to you and also loved the questions that you are asking before we hit record. I think this is going to be a fun conversation.
[01:06]
Jordi Mon Companys
I do have a point to make at the beginning, because you're one of the owners of one of the funnest social media handles: you are @DynamicWebPaige. But I do have a question about it. Apart from being fun, have you ever done any dynamic web page design, web page loading, that credits you with the honor of being the owner of such a handle?
[01:26]
Paige Bailey
So I am not gifted in the web design space or the web app creation space. For that I look to all of my dear friends who are working on things like Next.js and all of the JavaScript and TypeScript libraries. I will say that I did have the pleasure and the honor really of working with the VS Code team for quite some time when I was at Microsoft, and that's not really web design, but it is very much kind of like the JavaScript, TypeScript contingent. And I love and adore creating VS Code extensions just because they're super easy to create, if folks haven't experimented with them previously. And they're also very, very useful in the sense that you can have VS Code extensions do a broad spectrum and variety of...
[02:08]
Jordi Mon Companys
Yeah, we were chatting about the fact that I had been following you for years now and that, in my vision of the industry, you've always been in this AI space. Probably we would have called it ML or any other terms in the past, but I was thinking about my own career and I've always been in developer tools and DevOps platforms, stuff like that. But I did have a short stint way back when, like in 2013-14 if I'm not wrong, in what I at the time would call, and probably it's still called, the langtech industry. So companies and products that participated in the development of products such as machine translation, but also sentiment analysis and so forth. And I bring this up not only to point out that I'm certainly not the expert in this field, that's why you're here, but also because it feels, from my experience in that and following this field a bit from afar, but now quite close, that it all stems, this AI revolution that LLMs have put out there, it all feels like it stems from language, from written language and spoken language, but written language. Right? What are your thoughts on that statement?
[03:09]
Paige Bailey
I will say I'm glad to meet another kind of machine learning veteran. You know, I started building models I think around 2009, 2010. So it's been a wild and crazy ride since then. I will say that kind of the transformer models and things like GPT-2 and GPT-3, they originally started focused just on text and code, kind of the written word. But now we're getting into this really brave new world of multimodal models. So not just this underpinning language backbone, but also really interesting capabilities in terms of video understanding, audio understanding and transcription, image understanding, kind of coupled with text and code as well. You can get a lot more out of the models, even apart and aside from text and from code understanding, which is very exciting. I'm sure you remember like back in the day, even to just get a model to be adept at doing a single task, it took months of getting the right data in order and trying to experiment with different model types and trying to do hyperparameter tuning, and then even just to get the smallest percentages of improvements. And now all of these models can do relatively well for all of the tasks that we had been using single task models for, out of the box.
[04:25]
Jordi Mon Companys
So are we experiencing a step function evolution of Word2Vec? Sort of like that technologies that powered NLP. That would be I guess the way in which I would classify that previous stage of text based AI ML. Are we experiencing just sort of like a natural evolution or does the underpinnings of what's happening with video with multimodal models have a different nature?
[04:48]
Paige Bailey
Yeah, it's a great question. I think everything kind of started with the transformer paper around 2017, and then we were building a whole bunch of models building on the concepts expressed in those papers. But one of the coolest things I think now is that people are building these AI systems that kind of couple together different model types. Like as an example, if you're using Gemini, you're using kind of a mixture of experts model that is really, really good at multimodal use cases. But if you want things like audio as output or video as output or images as output, the model is not yet capable of that. It can generate text, but it is not going to be giving you kind of the image or video outputs that you would get from something like Imagen or Veo. So when we start kind of seeing these really novel new approaches, I'm sure you've experimented with NotebookLM. Yeah, yep. Where you can, for folks who might not be familiar, I encourage you to go try it out. But at notebooklm.google.com you can input a PDF or kind of a GitHub repo or anything, and it suddenly generates a podcast recording of two people discussing it in great detail.
[06:01]
Jordi Mon Companys
You know, I should point out at this point that this is not a NotebookLM conversation. This is fun.
[06:06]
Paige Bailey
That would actually be very hilarious.
[06:08]
Jordi Mon Companys
Yes.
[06:08]
Paige Bailey
To like give the kind of transcript and to see how well the NotebookLM folds.
[06:13]
Jordi Mon Companys
I was actually fiddling with some ideas to see if we could do something like that, maybe in the next iteration of this conversation, this interview. But this is a real one. It's happening today, the 19th of November.
[06:23]
Paige Bailey
Yeah, I love that these kind of AI systems that are increasingly multimodal systems give you the ability to create not only text and code as output, but also images and video. And it really kind of resonates with, you know, I'm thinking in particular of my cousins. They love watching videos. It's a stretch to ask them to, you know, read something a little bit more long form. So I think to really be able to engage with audiences and to help people learn and to understand and to really hit every single learning style, we're going to need to experiment with different modalities of outputs, not just inputs.
[06:58]
Jordi Mon Companys
Yeah, correct. So you work at Google. Google, in a very Google fashion, has joined, if anything, I mean, you've mentioned the papers that revolutionized this field. These mostly, if not all, came from Google. But it has sort of like joined the release of models in an abrupt way, in the sense that it's put so many things out there. So give us a sense of what Google is doing with AI, specifically with this new generation of AI, and what kind of products and models do you focus on?
[07:27]
Paige Bailey
Yep. So my particular role is I'm the Uber TL for our ML Developer Tools, which is a new org that was created at Google just a few months ago. The products on this team are the Gemini APIs, AI Studio, Kaggle, Colab, JAX and the open source stack for JAX, and also Gemma, our open source model family. So basically everything that you can imagine from like a 3P-facing ML developer perspective kind of lives in this ML developer org. The things that are top of mind for me for these tools are really kind of growing the number of students, researchers and also early stage startups that are incorporating AI into their products. I think for our enterprise customers there are a whole bunch of other great tools that exist within Google Cloud, like the Vertex AI product offerings, but to really be able to move quickly and to experiment with the latest models, the Gemini APIs and AI Studio are the place where you should go to try that out, and really the only place where you can get access to the latest Gemini models.
[08:29]
Jordi Mon Companys
Before we dive into the products, what is the typical persona that you're engaging with? Because I find fascinating the fact that we're talking about ML developers. Like, are there real people, and not one, five, six, but dozens, potentially hundreds and thousands of people, that are able to not only train models like the ones you mentioned in the Gemma family, but others, and able to also deploy them in a fashion that software engineers and developers without any first name or surname, are they able to do CI/CD with those things? Does such a figure, such a persona, exist?
[09:07]
Paige Bailey
It's a really different way of building software, I would say, and the personas for each of the tools would be slightly different. So as an example for JAX. JAX, for folks who might not be familiar, it's a machine learning framework that Google uses to build all of our models, and it took off like gangbusters. I think all of the papers being produced by DeepMind over the last few years are using JAX. It feels very similar to another kind of numerical library that you might be familiar with called NumPy, but it gives you the ability to build models or to build kind of physical systems and to dynamically scale them in a very straightforward way. So you can build a JAX model and with zero code changes it can run on CPUs, GPUs, TPUs and any arbitrary hardware backend, as a result of it using this thing called XLA, which is a machine learning compiler that was originally created for Google to be able to interact or to deploy models very efficiently on TPUs. So JAX, like when I think about the canonical JAX user, my brain is just like, oh my God. People who are building large language models or multimodal models or who are doing like highly complex dynamic physical modeling, like that is the group, and that cohort is quite small. Like the number of people who are building models from scratch with JAX is quite small. Then when I think of the Gemma audience, the Gemma audience is slightly different, right? Like Gemma is a model that's already been created. You can either fine tune it or you could do continued pre-training on it, but you're probably using a high level Python API to do that. You could also just take the model checkpoints and deploy them on mobile devices or deploy them in browsers. And the user groups for both of those aspects are a little bit different, right? Like the people who might be fine tuning Gemma, perhaps they're, you know, wanting to create evals, or perhaps they're wanting to do some sort of research on it, perhaps they want to use it as part of their product. But that's a different cohort than maybe the building-models-from-scratch JAX humans. But the beautiful thing about the Gemini APIs is that if you can make a REST API call, then you can call the Gemini model. And it's the same with OpenAI, with Anthropic. We just recently released OpenAI library compatibility. So if people have already been preferring the OpenAI models, it's just a three line code change to get the Gemini models being used instead. But the MLOps process in all honesty feels a lot simpler than it did when, you know, you were having to worry about data versioning, model versioning, et cetera. If you're just making a REST API call, you do have to worry about which model you're calling. You have to worry about the format for your prompts. But there is a whole bunch of other like machine learning maintenance work that's just taken out of the equation. So it actually simplifies the DevOps process in a number of ways, as opposed to building your own models from scratch, deploying them and maintaining them.
[11:58]
Jordi Mon Companys
So I presume those three personas, even the first cohort that you mentioned, they're probably very acquainted with low level programming, despite that sort of like target architecture agnosticity of JAX that you mentioned. They must have, this small cohort must have that knowledge. But have all of these three cohorts been present in the recent Kaggle workshop that you actually come from finalizing right now? Give us a sense of what's happened there and how people can know about future upcoming, if there's going to be another edition of that.
[12:27]
Paige Bailey
Yeah, thank you for the question. We recently did a five day generative AI intensive course on Kaggle, which is a platform at Google that originally was for competitions but is now more of like a model hosting, dataset hosting, learning platform. I think we weren't expecting so many folks to be interested in learning about the programming, but we ended up having, I think around 150,000 students register, and everybody was kind of forking the notebooks, running them, asking great questions on Discord. The content was really around prompting models, retrieval embeddings, fine tuning models, and then also implementing evals and sort of these MLOps behaviors. And we had students that were really the spectrum from just getting started with, like the Gemini APIs to just getting started with Gemma. So lots of variation in terms of skill sets and backgrounds. But I think everybody, from what I can see, really enjoyed it. And I especially loved that the curriculum we designed was focused not just on the model calls, but also on all of the additional features that you need to have around the models in order to make these systems production ready, like setting up retrieval or prompt management, or really designing strong evals. All of these things are very important to get the right outcomes from the models.
[13:48]
Jordi Mon Companys
Let's actually focus on that. Let's double click on that. So this is Gemma exclusively related, right?
[13:54]
Paige Bailey
No, Gemma and Gemini. But the course was predominantly focused on the Gemini APIs.
[13:59]
Jordi Mon Companys
Okay, so what about those? Can you give us a broad overview of what the APIs are capable of?
[14:06]
Paige Bailey
Yep. So the Gemini APIs are kind of the recommended way to interface with our Gemini models. They support video, audio, text, code, et cetera. So all of those modalities that I was just describing as inputs. The Gemini 1.5 Pro model has on the order of a 2 million token context window, which means that you can send to the model a whole bunch of information right at inference time. That means that you can analyze full videos, multiple code bases simultaneously, all of the above, all at once, and be able to get sense out of it without having to go through the process of standing up a vector database or fine tuning. Gemini 1.5 Flash is our smaller version of Gemini. It has a 1 million token context window, which is still a lot, but it's also much, much faster and much, much cheaper than most other models out on the market. I think it's seven and a half cents per million tokens. And we also have a Gemini 1.5 8B version, which is around 2-ish cents per million tokens, which means that you can record, as an example, everything that you're doing on your laptop screen 365 days a year, you know, 24 hours a day, and it would still cost less than like a cup of fancy coffee to analyze all of the videos and to be able to make sense of all of the things that you're doing. So Google has really invested a lot in making sure that our models are performant, efficient, but still very capable and also not really breaking price points for anyone. If you look at ArtificialAnalysis.ai, the Gemini 1.5 Flash and 1.5 Pro models are always kind of the most cost effective frontier models on the board.
[15:47]
Jordi Mon Companys
Indeed, yeah, very affordable. Where does DataGemma fall into this picture that you're describing?
[15:54]
Paige Bailey
Yep. So our Gemini APIs, they're all proprietary models, which means that we haven't released the source code or the data used to train or the checkpoints or anything of that nature. They're just available via these REST APIs. Gemma is a family of open source models that we've kind of released all the things for. So you can look at the code on GitHub, you can kind of download them from Hugging Face, you can experiment with them, you can fine tune them. Our latest version of Gemma is Gemma 2, which comes in a variety of sizes. So 2 billion parameters, 9 billion parameters, and I believe 27 billion parameters. The smaller models are small enough that you can embed them within a browser. So embed them within Chrome or embed them on a mobile device like a Pixel, and they give you the ability to do a lot of interesting kind of text only large language model work. So you can generate code, you can generate text, and then you can also fine tune these Gemma models to do a broad spectrum of things. So like, as an example, you mentioned DataGemma. There's also PaliGemma, which helps with multimodal understanding so you can understand images. There's ShieldGemma for security use cases, and I think the last time I looked there were tens of thousands of Gemma fine tuned variants on Hugging Face. So lots and lots of people kind of stretching them, fine tuning them, kind of making them great for specific use cases.
[17:17]
Jordi Mon Companys
So what is the rationale behind releasing Gemma as open source, Gemini as closed source? What is Google's stance on this rationale?
[17:26]
Paige Bailey
Well, I obviously can't speak for Google, but from my perspective, I think it's really nice to have both options: to be able to call to a performant kind of proprietary model, send your data to a server, and then for other use cases you might have different constraints, like you might be under different cost constraints. One of the nice things about open source models is that if you're running them locally, that's kind of free, you're just using your onboard compute. You might want to customize in ways that you would not be able to with a proprietary model, or you might be operating in an area where perhaps you don't have Wi-Fi connectivity, in which case having an open source model that's on board for your mobile device or for your laptop is kind of mission critical. You can't be sending your data elsewhere. There are some companies that also have data privacy constraints and so they don't want to be sending their data off site, which means that REST APIs are kind of out of the question. And so having a version of Gemini that's not a mixture of experts approach, but is a much lighter weight, kind of very efficient model that's also open source so people can kind of tweak it, customize it to their delight, is really powerful.
[18:39]
Bitwarden Sponsor
Are your software deployments secure by design? Lately, secure by design and shifting left principles have been hot topics in the software industry, pushing development teams to make security a foundational part of software development. Today's sponsor, Bitwarden, supports developers in securing every phase of the development lifecycle with end to end encrypted credential management. This ensures software is built on secure principles to prevent data leaks and unauthorized access. Try Bitwarden Secrets Manager, built specifically for developers to safeguard infrastructure and machine secrets, or Bitwarden Password Manager for everyday logins and other sensitive information. Start a free trial today at bitwarden.com.
[19:15]
Jordi Mon Companys
Of the techniques that are more popular these days, like RAG, can you explain the differences between them? Like RAG, RIG I believe, or Wrike, I'm not sure how you pronounce that. Which ones are the most popular and what are the use cases, what do people use them for?
[19:29]
Paige Bailey
So I think for folks who might not have experience with these different approaches towards retrieval, just think of them as kind of ways that you can get better performance out of your model's outputs and then also ground your model's outputs in data sources, which helps mitigate hallucinations and helps with accuracy of the model outputs as well. So as an example for retrieval, you might want to... One example that I hear quite often from customers is: I would really like to ground the model's outputs based on my own company's internal data. So if somebody asks a question about, you know, HR benefits, or they ask a question about a specific club that is just internal to the company, I want to be able to source the outputs to not just use information that it might have learned from the Internet somewhere, but to have it extract insights from my company's data sources and use those to guide the outputs. This is nothing really new. I think internal corporate search is something that everybody has been interested in for quite some time. But the retrieval phase is really kind of doing this kind of extraction from sources that might be relevant, giving that to the model and then having the model summarize those insights as outputs. If you haven't experimented with... There are a couple of approaches for this that have been kind of baked in wholesale for the Gemini APIs out of the box. One is grounding with Google Search. So you can turn on grounding with Google Search, and if you ask the model a question, it will first kind of use the top 10 or however many results from Google and use those to kind of summarize and ground its answers, which gives you a higher confidence in the accuracy of the outputs. And then there's another feature that's only available through Vertex where you can say, I want the model's responses to be grounded in the data that I have located in this particular GCS bucket. And so you could say like, hey, here's a pointer to all of my company's data. Hey, model, if you're going to be giving outputs, like, use these data sources to help with your summarization and then have pointers back to those sources. So just think of retrieval as a way to figure out what information to stick in the context window to help the model with more accurate summarization and its outputs.
[22:01]
Jordymon Companies
Would this technique work for the following use case? I'm a CTO or a senior developer hiring junior developers, and I want them to be constrained by, influenced by, and hopefully learn the company guidelines. So can I feed those assets into this retrieval technique, so that any junior developer is provided with answers that are tuned to the coding style of the company, the policies that need to be followed, etc.? Would it work in the same way?
[22:35]
Paige Bailey
I think you could attempt it with retrieval, but depending on how many guidelines you have at your company and also your stylistic guidelines for code bases, it might be worthwhile to first experiment with just putting that information into the context window. As an example with Gemini, I had mentioned before that you can have 1 million tokens, 2 million tokens, just given to the model. If you do that with a repo and you say, hey, here's my company's code base, now please generate outputs aligned with the conventions in this code base as well as any style guide or guidelines that you might have, Gemini should be able to do that out of the box. And oftentimes, if you have stylistic constraints, you can just add them as a preamble in your prompt. You know, like: if you're giving me code recommendations or doing completions, make sure to follow these stylistic conventions. Usually the model pays pretty close attention without even needing to set up something like retrieval or fine-tuning.
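To make the "just put it in the context window" suggestion concrete, here is a rough sketch under stated assumptions. The four-characters-per-token estimate is only a common rule of thumb, not how Gemini counts tokens, and the function names, file contents, and style guide are hypothetical.

```python
# Context size mentioned in the conversation (1 million tokens).
CONTEXT_BUDGET_TOKENS = 1_000_000


def estimate_tokens(text):
    """Rough heuristic (assumption): about 4 characters per token."""
    return len(text) // 4


def build_prompt(style_guide, files, request):
    """Prepend the style guide as a preamble, then the code base, then the request."""
    preamble = (
        "When giving code recommendations or completions, "
        "follow these stylistic conventions:\n" + style_guide
    )
    repo = "\n\n".join(f"# File: {path}\n{body}" for path, body in files.items())
    return f"{preamble}\n\nCode base:\n{repo}\n\nRequest: {request}"


def fits_in_context(prompt, budget=CONTEXT_BUDGET_TOKENS):
    """Decide whether the whole prompt can be sent as-is."""
    return estimate_tokens(prompt) <= budget


# Hypothetical inputs for illustration.
style_guide = "Use snake_case for functions. Every public function needs a docstring."
files = {"billing/invoice.py": "def total(items):\n    return sum(items)\n"}
prompt = build_prompt(style_guide, files, "Add a function that applies a discount.")

print(fits_in_context(prompt))
```

If the check fails for a very large repository, that is the point at which retrieval, as discussed earlier, becomes the more natural option.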
[23:38]
Jordymon Companies
What about AI Studio? I haven't used it, but what I get from the name is that this is a playground where I can use all of this?
[23:44]
Paige Bailey
Yeah, absolutely. And every time I mention it, I feel like I need to open up a browser tab and start showing things. But aistudio.google.com is a place where you can go and experiment with the different Gemini models: the Gemini 1.5 Pro family that I had mentioned before, as well as Flash and some of our newer model versions. You can also experiment with image generation within AI Studio. You can turn on features like function calling if you want to do tool use. You can turn on code execution and search grounding. You can compare models against each other. You can also fine-tune models, generate API keys, and track usage over time, all without having to wrangle with the Google Cloud console, which I think can sometimes be quite overwhelming for junior developers.
[24:38]
Jordymon Companies
So then I presume AI Studio is open to all the cohorts that we've mentioned before, right? From those that already have extreme expertise in fine-tuning and actually developing models themselves to those new people that are just starting. Have you seen the most junior people start getting acquainted with AI Studio? What is the main use case they go about solving?
[25:01]
Paige Bailey
Well, it's pretty much everything, right? Given that there are connectors to Drive, and that you can upload files and record yourself speaking or record videos, you can basically use it for any kind of model question that you might have. One example that I always like to show is to upload a video and then ask for extracting all the logos along with the timestamps where they occur, transcribing all of the audio, identifying all of the different speakers, describing or summarizing the events with timestamps, dividing it into chapters, and identifying any electronic equipment. All of these are just things that you can ask in natural language within AI Studio. It kind of makes me laugh because I'm sure you remember in the before times there were all of these dedicated single-task models that were sometimes available as things like cognitive services or other specialized video intelligence APIs. Now for pretty much all of those you can just use Gemini, and it's just a prompt, as opposed to trying to figure out which API you should be calling and doing API key management for all of them.
[26:15]
Jordymon Companies
Of the latest features of the APIs, which ones are your favorite and why?
[26:19]
Paige Bailey
I really, really love code execution and function calling. Code execution, for folks who might not be familiar, gives you the ability to say: Gemini, I'm going to ask you a question or ask you to help me with a task, and you have the ability to write and execute arbitrary Python code in order to solve it. So it's setting up a sandboxed environment with the Python standard library as well as a few additional libraries, and then giving the model the ability to write and execute code for you. And if it gets it wrong the first time, it will just keep going and going until it gets the correct answer. And this is just available out of the box. It's a one-liner change. All you have to do is say tools equals code execution to turn it on, and you're off to the races. Function calling is likewise very cool. It gives you the ability to identify tools that the model can call. So it might be like: hey Gemini, you have access to this database, so you can write SQL code against it. You have access to this weather API. You have access to this model that can do satellite image segmentation, so you could use that as a tool. And then you can ask highly complex questions and get Gemini to select which tools it needs to use in order to answer them, as well as to execute any arbitrary code for those tools. So it's giving the model a lot of flexibility that it would not otherwise have.
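The function calling pattern described above can be sketched without any SDK. The example below is an assumption-laden illustration, not the Gemini API: the tool functions are hypothetical stand-ins, and the "model decision" is a hard-coded dictionary playing the role of the structured tool call a model would return.

```python
def get_weather(city):
    """Hypothetical stand-in for a call to a weather API."""
    return {"city": city, "forecast": "sunny"}


def run_sql(query):
    """Hypothetical stand-in for running SQL against a database."""
    return [("alice", 3), ("bob", 5)]


# Registry of tools the model is told it may call.
TOOLS = {"get_weather": get_weather, "run_sql": run_sql}


def dispatch(tool_call):
    """Execute the tool the model selected, with the arguments it supplied."""
    name = tool_call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**tool_call["args"])


# In a real application this dictionary would come back from the model;
# here it is hard-coded to keep the sketch self-contained.
model_decision = {"name": "get_weather", "args": {"city": "Paris"}}
tool_result = dispatch(model_decision)
print(tool_result)
```

In a full loop, the tool result would be sent back to the model so it can compose the final answer, and the registry is a natural place to restrict which actions the model is allowed to take.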
[27:50]
Jordymon Companies
So this feels like it's going into the fascinating field of agentic properties. But before we dive into that: in the code execution example that you just gave, how would the model know that it's achieved the right answer? Should the tests be provided in the prompt?
[28:08]
Paige Bailey
You don't have to provide the tests. I think for most code execution it's just looking for a specific output. I'm going to break the rules for podcast folks, so I will describe what I'm showing on my screen. I've just pulled up AI Studio and I'm selecting one of our smallest Flash models, Gemini Flash 8B. I'm turning on code execution, which is just a little toggle button. First I'm going to show what it looks like without code execution turned on, and then I'll show what it looks like with code execution. You can ask questions like, please give me the dates of every single Monday in the year 2026. If I hit run, the model will give me a really troubling response. It'll say, unfortunately, I can't provide a complete list of every Monday because this would require a calendar program or something similar. If I turn on code execution and then rerun that same prompt, the model recognizes that it needs to write Python code, and then it runs it for me until it gets the correct response. So you can see here that the first iteration of Python code it ran didn't get the correct response, and then it just kept going. It said, I saw there was a bug in the previous code, and then it was able to get the correct response.
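For readers without the video, the kind of program the model needs for the Mondays question is short. The following is a sketch using only the Python standard library; it is not the code the model produced in the demo, which the transcript does not show, and the function name is hypothetical.

```python
from datetime import date, timedelta


def mondays_in_year(year):
    """Return every Monday in the given year as a list of date objects."""
    current = date(year, 1, 1)
    # date.weekday() returns 0 for Monday, so this steps forward to the first Monday.
    current += timedelta(days=(7 - current.weekday()) % 7)
    mondays = []
    while current.year == year:
        mondays.append(current)
        current += timedelta(days=7)
    return mondays


mondays = mondays_in_year(2026)
print(len(mondays), mondays[0].isoformat(), mondays[-1].isoformat())
# prints: 52 2026-01-05 2026-12-28
```

A deterministic calculation like this is exactly where executing code beats asking the model to recall the dates from memory.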
[29:32]
Jordymon Companies
Fascinating. Just for the record, we do have a YouTube channel, so this might actually be uploaded there. So for those of you intrigued about the interface of AI Studio, you'll find it there, or otherwise at the URL that Paige mentioned. And it's quite intuitive.
[29:45]
Paige Bailey
I mean, excellent, obvious, amazing. And one other thing that I adore about AI Studio is that after you do all of these really interesting explorations in the UI, if you hit this Get code button, it gives you the exact code that you would need in order to rerun the experiment that you just did. And for tool use or for code execution, it's a one-liner that just says tools equals code execution, to give Gemini the ability to write and debug code over and over again.
[30:18]
Jordymon Companies
What's supported? That was my next question. So Go, Kotlin.
[30:20]
Paige Bailey
Yeah, yeah.
[30:21]
Jordymon Companies
Didn't you announce something about Android Studio very recently?
[30:25]
Paige Bailey
Yes. So Gemini models have been baked into Android Studio as well, for code completion as well as code generation. So if you want to use AI assistance within Android Studio, Colab, or some of our other coding IDEs, that already exists and is powered by Gemini.
[30:43]
Jordymon Companies
But yeah, we saw it on screen a minute ago. For those that are not watching: curl, Python, and plenty more languages. Obviously there's a myriad of them in the world, but there is a wide range of supported languages at the minute.
[30:56]
Paige Bailey
Yeah.
[30:56]
Jordymon Companies
Following on the questions of agenticness, or agency, rather: how do you feel? What's your personal view? I know I'm asking now about the future. I don't want to get you to talk about the roadmap or, you know, stuff that is not shareable, but where do you see these things going, like models being able to act by themselves? This is a very broad way of describing it, but yes. How do you see that?
[31:23]
Paige Bailey
Well, I think we're already getting into this world where the most interesting use cases for models, at least from my perspective, are these kinds of write code, run it, and do it in a while loop until it works sort of scenarios. Though I think that in order to help people have confidence in these use cases, there has to be transparency over every action that the model is taking, as well as a kind of overseer "yes, go ahead" stage for folks if there are any changes to the system that are going to be made. I had mentioned before that we're baking models into the Chrome browser. So if you want to try out Gemini Nano within the Chrome Canary release, that's available for you to test today. Gemini Nano is also embedded within Pixel devices. But what that means is that...
[32:12]
Jordymon Companies
Within the Pixel device itself and the hardware, or rather in Chrome running on a Pixel device, or both?
[32:17]
Paige Bailey
It's not embedded within the hardware, but the model is baked into the operating system, so it's running on device. But if you do have these models that are running on board, then suddenly you can start imagining really interesting step-by-step behaviors that the models might be able to take on your behalf. As an example, I would love to be able to say, hey Gemini, please look at my calendar and find the next best time for me to go and do yoga or something, and have it be able to look at the calendar for my yoga class, look at my work calendar, and then try to figure it out for me and schedule time. Those are all things that could be done today. In theory, it just takes someone setting up those step-by-step calls with the model.
[33:08]
Jordymon Companies
And I wonder, from a compliance perspective, whether the model will eventually, after performing the tasks (hopefully correctly), be able to deliver a sort of chain-of-thought proof of what the process has been, so that someone can verify not only the tests but also that the process has been logical and compliant. That would probably be something that an enterprise user would be thinking of.
[33:34]
Paige Bailey
Or before the model takes action it could say, hey Paige, it looks like you have some free time around, you know, next Friday at 12. Do you want me to go ahead and book it? In which case I could say yes or no.
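The calendar scenario and the "ask before acting" step can be sketched together. This is an illustrative example only: it assumes calendars are already available as lists of busy intervals, and the function names, dates, and times are hypothetical.

```python
from datetime import datetime, timedelta


def first_free_slot(busy, window_start, window_end, duration):
    """Return the earliest start time in the window with `duration` free, or None."""
    cursor = window_start
    for busy_start, busy_end in sorted(busy):
        if busy_start >= window_end:
            break
        if busy_start - cursor >= duration:
            return cursor
        cursor = max(cursor, busy_end)
    return cursor if window_end - cursor >= duration else None


def book_if_confirmed(slot, confirm):
    """Only act when the user approves; `confirm` is a callable returning True or False."""
    if slot is not None and confirm(slot):
        return slot  # A real agent would create the calendar event here.
    return None


# Hypothetical busy intervals taken from two calendars on the same day.
day = datetime(2025, 1, 10)
work = [(day.replace(hour=9), day.replace(hour=12)),
        (day.replace(hour=13), day.replace(hour=17))]
personal = [(day.replace(hour=12), day.replace(hour=12, minute=30))]

slot = first_free_slot(work + personal, day.replace(hour=9),
                       day.replace(hour=18), timedelta(hours=1))
print(slot)  # 2025-01-10 17:00:00
```

Passing the confirmation step in as a callable keeps the human in the loop: the same search can run unattended, but nothing is booked unless `confirm` returns True.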
[33:48]
Jordymon Companies
What else has you really excited about what's coming up?
[33:51]
Paige Bailey
I'm also very excited about these multimodal paradigms. You know, NotebookLM was really enchanting for a number of reasons, but I think partially because not everybody is a text learner. I love to read, I adore it. I probably read too much, but many people prefer video content or they enjoy listening to books as opposed to reading them. So giving folks the flexibility of being able to learn in the way that is most effective for them I think is really exciting. And then also just from the perspective of, you know, I could write books all day or tell stories to my nieces and nephews, but I am not gifted in drawing. It's a very challenging skill to learn. And so being able to generate videos or images is also pretty magical. We have a video model that is getting released through the API as well, called Veo, which gives you the ability to describe a video and have it displayed in six-second chunks, or to seed the first frame of the video with just a static image. And that's been really cool to see.
[35:03]
Jordymon Companies
Where else can everyone find you? Where can people learn about the releases that pertain to Gemini, the APIs, Google AI Studio, and all the things that we've been talking about?
[35:16]
Paige Bailey
Yeah, so as I mentioned, I'm DynamicWebPaige pretty much everywhere, but I strongly recommend following our Google Devs Twitter handle. That should get you insight into all of the latest new features that are coming for Google developers. And then I would also recommend following some of the other folks on the team: Logan Kilpatrick, Chris Perry, who's the PM for Colab, and Mat Velloso, who's kind of the lead from Microsoft for the full team. So, the Google Developers Twitter handle. My assumption is that everybody is going to have similar properties around Threads or Bluesky or LinkedIn. But I will send the Google Devs Twitter link via chat right now so you'll have handy access to it.
[36:02]
Jordymon Companies
Is there any point in doing the Kaggle course that we've talked about, and that I'll include a link to in the show notes?
[36:08]
Paige Bailey
Yep, I definitely think so. So the Kaggle course is a five-day generative AI intensive course. Around one hour a day is the expected time for the coursework, and then a follow-up hour to listen to the live stream. Each day includes Colab notebooks so you can walk through code examples. It includes podcasts summarizing a whole bunch of white papers, or just the white papers themselves if you would prefer to read, and it includes a live stream. And then there's also a lot of great Discord discussion about the course itself. So if you're interested in generative AI, like function calling, building agents, prompting models, doing retrieval, interacting with embeddings, and writing evals, this should be a really great crash course for you to try.
[36:50]
Jordymon Companies
Yeah, the curriculum looks fantastic and I really look forward to actually glancing over it and being able to understand it. I'd be happy with at least 30% of it.
[36:59]
Paige Bailey
Awesome.
[37:00]
Jordymon Companies
Because I should point out that I'm not a developer.
[37:03]
Paige Bailey
I think everybody can be a developer these days with these generative AI tools, to be honest.
[37:07]
Jordymon Companies
That's true. That's very true. And actually these tools have helped me understand code bases, C code bases, that are way, way beyond my understanding. And I'm really happy that it's opened the gates of my understanding to, in this particular case, arcane and low-level programming languages and code bases. Anything else that we didn't touch upon that you would like to mention before we conclude?
[37:30]
Paige Bailey
Nope. Just the takeaway for everybody should be: if you haven't tried out aistudio.google.com, go explore it and test it out on your own data. And you know, we have a very generous free tier, so I strongly, strongly encourage you to take advantage of it.
[37:46]
Jordymon Companies
Well, thanks so much Paige, and take care. Have a splendid rest of your week.
[37:50]
Paige Bailey
Excellent. You too.