
Build & deploy AI-powered apps — Paige Bailey, Google DeepMind

TL;DR

  • Google DeepMind is rapidly advancing its AI model ecosystem, notably with the Gemini 3.1 series, which offers powerful multimodal capabilities across various data types like video, image, audio, text, and code.
  • AI Studio serves as a free, accessible platform for developers to experiment with and leverage these cutting-edge models, providing tools for enhanced performance, cost efficiency, and specialized tasks.
  • The platform's features, including code execution, grounding with external data, and real-time interaction through Gemini Live, empower users to build dynamic, context-aware, and multilingual AI applications.

Takeaways

  • The Gemini 3.1 models (Flash, Pro, Flash Lite) provide a range of performance and cost profiles, with smaller models often proving highly effective when combined with built-in tools.
  • AI Studio (ai.dev) offers a free, web-based environment to select and configure DeepMind models, integrating tools such as Google Search, URL context, code execution, and function calling.
  • Gemini models are unique in their comprehensive multimodal capabilities, processing and generating content across video, images, audio, text, and code, unlike many text- or static-image-limited alternatives.
  • The Code Execution feature within AI Studio provides a secure, sandboxed Python environment that allows models to directly perform complex data science tasks and generate visualizations.
  • Gemini Live enables dynamic, real-time, and multimodal conversations, supporting screen/video/audio sharing, multi-language responses, and custom accents for highly interactive applications.
  • Leveraging "system instructions" is crucial for developers to explicitly control an LLM's persona, language, style, and tone, ensuring consistent and tailored AI interactions.
  • The "Build" feature in AI Studio allows users to generate and deploy full-stack AI applications from natural language prompts, including database and authentication integration.
  • Project Genie 3 is a DeepMind model capable of dynamically generating unique, explorable virtual worlds pixel by pixel from textual descriptions, offering novel interactive experiences without traditional game engines.

Vocabulary

  • Multimodal: AI systems capable of processing and generating information across multiple data types like text, images, audio, and video.
  • Embedding model: An AI model that transforms diverse data inputs (text, images, audio, etc.) into a unified numerical vector space, enabling comparison and retrieval across modalities.
  • AI Studio: A web-based development environment by Google DeepMind for experimenting with, configuring, and deploying applications using Gemini and other AI models.
  • Code execution: A feature where an AI model is given access to a sandboxed programming environment (e.g., Python) to write and execute code, solving tasks or performing calculations.
  • Function calling: The ability of an LLM to identify when a user's request can be fulfilled by an external tool or API and then to generate the correct function call to invoke it.
  • Grounding: The process of providing an AI model with external, real-world information (e.g., from search results, specified URLs, or databases) to inform and constrain its responses, reducing hallucination.
  • Tokens: The basic units of text or data that large language models process, influencing the cost of usage and the maximum context window of an interaction.
  • Retrieval: The act of accessing specific external information, often from a knowledge base or documents, to augment an AI model's understanding and response generation.
  • System instructions: Specific directives given to an AI model at the start of a conversation to define its persona, constraints, or desired behavior for subsequent interactions.
  • World model: An AI model designed to generate and simulate dynamic virtual environments, often in real time and pixel by pixel, based on user input or descriptions.

Transcript

Can everyone hear me? Excellent. Awesome. Greetings, valiant few. I'm not sure how many folks have heard, but there were some electrical issues in the rest of the building. So y'all were the ones who showed up early, which means that y'all are part of the few that get to hear the talks. So if you don't feel lucky, just wait. You are definitely experiencing something special this morning. And for anybody who wants to come back and hear more, if you missed the Gen Media session a little bit earlier today, we're going to be doing a whistle-stop tour of all of the DeepMind presentations this afternoon. So you can come back and meet more of the team and hear more about all of the talks and all of the technologies. I also really, really love first sessions to be interactive. So I'm going to show you some demos. This is going to be very demo heavy as opposed to slide heavy. And if you have any questions along the way, please feel free to shout them out. It's always much more interesting if this is more of a conversation than just me showing stuff. So I don't think it's a secret. And also, I guess, introductions. Hi, everybody. My name is Paige. I'm one of the eng leads for developer relations at Google DeepMind. I've been doing machine learning for a really long time. I started in 2009 and was contributing to some of the early days of open source scientific computing libraries, things like NumPy, SciPy, and scikit-learn. And then I did product for a couple of years, and now I'm back on the engineering ladder. And I really, really love that now it's really hard to nitpick what's product, what's engineering, what's design, and what's DevRel. All of the roles seem to be conflated a little bit. So I don't think it's a secret that Google has been a little bit busy over the course of the last while. Over the last month and a half, we've been releasing models so fast that I feel like everybody's got a little bit of whiplash.
Gemini 3.1 Flash Live, which we'll take a look at in a second. Gemini 3.1 Pro and Flash Lite, so respectively our largest and smallest models, which are performant, efficient, and able to do a lot of things very, very quickly and at low cost profiles. We actually just had Augment Code, if anybody is familiar with Augment Code, replatform their entire agent system to default to Gemini 3.1 Pro, specifically for performance plus cost related reasons. Nano Banana 2 for image generation and editing. Our embedding model, which supports video and images and audio and text and code all in the same embedding space. So you can say, show me all of the content related to cats. And it will show you not just video of cats, not just images, but also things like audio of a cat purring or meowing, or books about cats, all sorts of stuff. Lyria 3 for music generation, which you saw if you were in the Gen Media session just a little while ago. Genie 3 for world model building, so being able to dynamically generate new worlds based on user input. Our full stack runtime for AI Studio, which includes things like databases and OAuth. Gemma 4, which is part of our open model family. We're lucky enough to have a member of the Gemma team here at AIE this week. So definitely, Ian, raise your hand. Yep. Greetings. So the Gemma team, if you're interested in open models, would be excellent to talk to. And then also Veo 3.1 Lite for video generation at a cost profile that's pretty compelling. So lots of different stuff. Just a show of hands, how many folks have heard of all of these models before? Excellent. And the DeepMinders in the back row, hopefully, that was, yep. But if you haven't heard of any of these, by the end of the session you'll know all about them, and hopefully know which ones you could use or consider for your projects. So I don't think it's a secret that Gemini is kind of special in the industry.
One of the reasons that it's very special is that it's multimodal both for inputs and for outputs. So it supports video, images, audio, text, and code for inputs. But it can also output multiple modalities. It can output text and code, but also audio and images. It can output images and text interleaved. And most of the other models on the market are only capable of handling text and code as outputs and only things like static images as inputs. So it's pretty compelling to see what you're able to do. And via our APIs, you're also able to handle flexible input formats. So you can have PDFs with embedded images. You can have different types of video and different types of audio that you can serve as tokens for inference. But again, it's a lot cooler to see it rather than to just have me talking about it and waxing poetic. So I am going to go ahead and go to AI Studio real quick and pull up my personal instance of AI Studio. I always say it, but if you see anything embarrassing, please don't judge me. So how many folks have used AI Studio before? Cool. Excellent. For folks who have never used it, you can access it at ai.dev or ai.studio or aistudio.google.com. It works just with your personal Gmail account, so you can get started for free. You can select different models here off to the right. You can see that there are different pills here for the kinds of modalities that you might want to work with. So for video generation, Veo 3.1, 3.1 Fast, and 3.1 Lite are all in this tier section. You can also select the different Gemini models. So Gemini 3 Flash Preview, Flash Lite Preview. I'm going to select that one just in the interest of time. And you can also toggle on and configure many of these different tools here off to the right. So you can specify things like structured outputs, code execution, which we'll take a look at in a second, and function calling.
You can turn on things like grounding with Google Search, so just automatically incorporate that as a tool, grounding with Google Maps, and also even things like URL context, which gives you kind of a poor man's retrieval. You can have a list of URLs and then incorporate that into the model's context window so it can use that to ground some of its outputs. And as I'm sure all of you know, models are limited by the data that they have as part of their pre-training and post-training mixtures. So if they're only trained on data up to a specific point, that's all of the insight that they have out of the box. If you want a model to be able to answer questions about things that happened after that date, you're going to have to give it access to tools, either through search or through retrieval, in order to do that work. And again, if anybody has questions as I'm rambling along, feel free to raise your hand and shout them out. This is a small enough group that it should be pretty fun. Cool. So I've turned on grounding with Google Search. You can also add media, so you can connect to Drive, you can upload files, you can record audio, add camera footage, link a YouTube video, or link sample media. And YouTube just works via URL. So you can paste in a YouTube URL and have that be used for inference with the Gemini models. So as an example, I haven't tried this, so we'll see if it works. But we can take a look to see if we can find a dinosaur video. I love T-Rexes. So this past weekend in the Bay Area, we have this thing called Bay Area Big Wheels, which... And it defaults to one frame per second. You can also specify different start and end times. So just in the interest of speed, I might specify a start time of 0 seconds, and then maybe an end time of 300 seconds. And you can see that this ends up being around 27,600 tokens for five minutes of content.
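The token figure quoted here lends itself to a quick back-of-the-envelope estimate. The sketch below uses only the numbers from the talk (about 27,600 tokens for a 300-second clip at the default one frame per second) and assumes, as a simplification not stated in the talk, that token usage scales linearly with clip length at the same sampling rate. The function names are illustrative.

```python
# Figures quoted in the talk: ~27,600 tokens for a 300-second clip
# sampled at the default rate of one frame per second.
CLIP_SECONDS = 300
CLIP_TOKENS = 27_600


def tokens_per_second(clip_tokens: int = CLIP_TOKENS,
                      clip_seconds: int = CLIP_SECONDS) -> float:
    """Average number of tokens consumed per second of video."""
    return clip_tokens / clip_seconds


def estimate_video_tokens(duration_seconds: float) -> int:
    """Linear extrapolation (an assumption) from the quoted clip."""
    return round(duration_seconds * tokens_per_second())


def max_seconds_for_budget(token_budget: int) -> int:
    """Longest clip, in whole seconds, that fits a given token budget."""
    return int(token_budget // tokens_per_second())


if __name__ == "__main__":
    print(tokens_per_second())                # 92.0
    print(estimate_video_tokens(60))          # 5520
    print(max_seconds_for_budget(1_000_000))  # 10869
```

Under that linearity assumption, trimming a clip with start and end times is the most direct lever on token usage for video prompts.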
But I could say create a table with timestamps for all of the kinds of dinosaurs that... come on in... you see in this video. No worries, it's all good. Make sure to include a fun fact about each dinosaur type. And then hit run. But what I was saying about Bay Area Big Wheels is that there's a big bendy hill in San Francisco with a whole bunch of very whiplash sort of turns. And everybody gets a little tricycle and rides down it. And I did that this past weekend. It's an Easter Sunday tradition, but I was dressed as a dinosaur and was handing out dinosaur Easter eggs. So this is very on brand. And what's happening behind the scenes is we've turned on grounding with Google Search, so we have search as a tool which can help inform some of our fun facts. We've got the video that's being pulled in. So for the first five minutes' worth of content, we can see the different dinosaur types. Rexie and his parents obviously have a lot of appearances in this first episode, as well as a Brachiosaurus, a Velociraptor, and a Pteranodon, which is a flying reptile. And I love that it's calling out the true fact that Pteranodons are pterosaurs, not dinosaurs. And you can also see the different citations along the way from the URLs that are informing all of these fun facts. You can also click get code to see all of the code that you would need in order to replicate the experiment that you just did in AI Studio. So it selects the appropriate model. It shows you how you would handle the URI for YouTube. And then it also gives you insight into the prompt that you can use for the video in order to do the work in Python and TypeScript and Java, whatever your favorite language might be. And if you didn't want to use a YouTube URL, if you wanted to use your own video, you would be able to pass that to the model too. It's just really, really handy to be able to pull in a YouTube URL as opposed to having to do the process of downloading it and then sending it off yourself.
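The "get code" output itself is not shown in the talk, so what follows is only a sketch of what an equivalent call might look like with the `google-genai` Python SDK: a YouTube URL passed as file data, a start and end offset, and Google Search enabled as a tool. The helper names, the default 0 to 300 second window, and the need to pass a model ID are assumptions for illustration; the SDK reads the API key from the environment.

```python
def clip_offsets(start_seconds: int, end_seconds: int) -> tuple[str, str]:
    """Format clip bounds as "<n>s" strings for the video start/end offsets."""
    if start_seconds < 0 or end_seconds <= start_seconds:
        raise ValueError("expected 0 <= start_seconds < end_seconds")
    return f"{start_seconds}s", f"{end_seconds}s"


def describe_youtube_clip(video_url: str, prompt: str, model: str,
                          start_seconds: int = 0,
                          end_seconds: int = 300) -> str:
    """Send a YouTube URL plus a text prompt, with search grounding enabled.

    `model` is whatever model ID AI Studio shows for the model you selected;
    no specific ID is assumed here.
    """
    # Imported lazily so clip_offsets() works without the SDK installed.
    from google import genai
    from google.genai import types

    start, end = clip_offsets(start_seconds, end_seconds)
    client = genai.Client()  # picks up the API key from the environment
    response = client.models.generate_content(
        model=model,
        contents=types.Content(parts=[
            types.Part(
                file_data=types.FileData(file_uri=video_url),
                video_metadata=types.VideoMetadata(
                    start_offset=start, end_offset=end),
            ),
            types.Part(text=prompt),
        ]),
        config=types.GenerateContentConfig(
            # Grounding with Google Search, exposed to the model as a tool.
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )
    return response.text
```

The code that AI Studio generates for a given session is the authoritative version; this sketch only illustrates the overall shape of the request.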
And now I also want to watch this episode of Rexie, the Little T-Rex. This looks very cool. If you hadn't seen as well, in the thinking config you have different thinking settings for all of our Gemini 3.1 series: minimal, low, medium, and high. If you want the model to spend more tokens thinking, you can turn on high thinking. But I often just keep it on minimal or low just for time's sake. For Gemini 3.1 Flash Lite, you get a really nice price-performance and speed profile, so you're not having to make big trade-offs between them. So that is how you would interact with Gemini 3.1 Flash Lite within AI Studio for video analysis. One of the other slept-on features, I think, in our APIs as well as in AI Studio in general, is compare mode, and also code execution, which we see here off to the right. So if I turn on code execution, what we do is we give Gemini a sandboxed environment with Python and a whole bunch of data science libraries pre-installed, where it can pull in those libraries as tools to help solve arbitrary data science tasks. And since this is giving the model access to it in a sandboxed environment, you don't run the risk of having any of this impact your local environment, which is quite nice. So as an example, if I select Gemini 3.1 Flash Lite Preview, turn on code execution, and go into compare mode, I might try to compare it against Gemini 3 Flash Preview, also with code execution. And then one of the things that you can do, and we'll see if this works, is you can select a picture, so this is just some Lego bricks. What I could do is paste this in. We can make sure that it's secure and safe for the corporate overlords. This image itself is around 1,000 tokens, but I could say something like draw bounding boxes around all of the green Lego bricks using Python, and then maybe display the image with bounding boxes, and hit run.
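The code the model wrote in the sandbox is not reproduced in the talk. As a rough illustration of one step such code usually needs, the sketch below converts a bounding box into pixel coordinates. It assumes the convention described in the Gemini API documentation, where boxes are given as `[ymin, xmin, ymax, xmax]` on a 0 to 1000 scale; the sample box and image size are made up.

```python
from typing import NamedTuple, Sequence


class PixelBox(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def to_pixel_box(box_2d: Sequence[float], image_width: int,
                 image_height: int, scale: int = 1000) -> PixelBox:
    """Convert a [ymin, xmin, ymax, xmax] box on a 0..scale grid to pixels.

    The box order and the 0..1000 scale are assumptions about the model's
    output format; adjust them if your prompt asks for a different format.
    """
    ymin, xmin, ymax, xmax = box_2d
    return PixelBox(
        left=round(xmin * image_width / scale),
        top=round(ymin * image_height / scale),
        right=round(xmax * image_width / scale),
        bottom=round(ymax * image_height / scale),
    )


if __name__ == "__main__":
    # Hypothetical detection for one green brick in an 800x600 image.
    print(to_pixel_box([100, 250, 500, 750], image_width=800, image_height=600))
    # PixelBox(left=200, top=60, right=600, bottom=300)
```

The resulting pixel box can be handed to any drawing library to render the rectangle on the original image.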
What should happen is that we see a head-to-head comparison of the two different models. Gemini 3.1 Flash Lite was able to get it right out of the gate, which is pretty wild. So the super, super tiny model worked really, really fast. So I have the Python code to pull in the image, to analyze it, and to define the bounding boxes. And then if you hover over the token consumption, the amount of dollars required to do this work is pretty wild, right? So being able to pull in an image and do this kind of analysis. You could have also asked for things like segmentation masks. You could have asked to count specific kinds of entities in the photo, again using bounding boxes or something similar. And all of this was done at well under a penny. So I strongly, strongly recommend experimenting with the smaller models, especially turning on these tools to help them do their work more effectively. And you can also see that Gemini 3 Flash Preview got to the same answer. It just took a little while longer. And then the cost is slightly more, but still well under a penny. So that's compare mode, again using Gemini 3.1 Flash Lite, just with the addition of code execution along the way. For folks who might be interested in URL context, just because I know that this is something that we've heard quite a bit about from folks that are using the Gemini APIs pretty regularly: if you turn on URL context, you can do things like add a URL. So I'm going to pull in a URL for a blog post about Gemma 4, which was released just last week, after the model's training data cutoff. I'm going to pull in a blog post about Genie 3, also after the model's training data cutoff. And I could say something to the effect of compare and contrast Genie 3 and Gemma 4, tell me how they're similar, different, or completely unrelated. And they're mostly completely unrelated, but we'll see what the model thinks. Maybe turn on medium for the thinking level and then hit run.
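The "well under a penny" remark from the compare-mode demo can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the roughly 1,000 image tokens come from the talk, while the prompt and output token counts and both per-million-token prices are placeholders, not real pricing.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_million: float,
                     output_price_per_million: float) -> float:
    """Cost of one request given token counts and per-million-token prices."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


if __name__ == "__main__":
    # ~1,000 image tokens (from the talk) plus a short prompt (assumed 50).
    input_tokens = 1_000 + 50
    output_tokens = 500  # assumed size of the generated code and answer
    # PLACEHOLDER prices in USD per million tokens; check current pricing.
    cost = request_cost_usd(input_tokens, output_tokens,
                            input_price_per_million=0.10,
                            output_price_per_million=0.40)
    print(f"${cost:.6f} per request")
    print(f"{0.01 / cost:.0f} such requests per penny")
```

With any prices in that general range, a single image-analysis request of this size costs a small fraction of a cent, which is consistent with what the token counter in the demo showed.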
And what we should also see is that the model is able to give its output but cite each one of the sources that it's using in order to make its assumptions. So you can see the different sources down at the bottom, the two URLs that I had used. You can use many, many more than just two, but it cites each one of the sources inline as it's making assessments along the way. And so you can use publicly available information, and then there are also tools within Vertex that allow you to do retrieval on custom documents that are internal only, without necessarily having to set up a vector database for retrieval. And again, if you click get code, it gives you all of the code that you would need to replicate what you're doing in the AI Studio interface. So we've talked about the Gemini 3.1 series of models. You can also use 3.1 Pro Preview, which is our largest model series. It's also our most expensive, kind of by virtue of being larger. It's also a little bit slower. Gemini 3 Flash is kind of the workhorse. It's the model that we see most commonly used in production. And then Gemini 3.1 Flash Lite is also gaining traction, especially for people who are using Gemini 2.0 Flash or 2.5 Flash. And then for our Gemini Live series, how many people have seen or heard of Gemini Live before? Cool. A couple of hands. Thank you, thank you, DeepMinders, for paying attention and also for using the models. Gemini Live gives you the ability to share your screen, share a video feed, or share audio with the model and have a conversation with it dynamically, including things like adding custom function calls, generating automatic function responses, and then also doing things like grounding with Google Search. So I can turn on grounding with Google Search again, just as a tool that the model can call.
And then I can give the model access to my screen to help me navigate an app or to help me answer questions about what it sees, and to do that in a broad variety of languages, accents, etc. So as an example, I can start screen sharing. Hey there, Gemini, can you tell me what you see on the screen? I see a Google Search for Lego bricks and pieces. There are tons of images of colorful Lego bricks, sets and bulk collections from various websites. On the right there's a larger image featuring closeups of red, yellow, blue, and green bricks. Are you looking for something specific on this page? And does anybody in the room speak a language that is not English? Yeah. Guillaume, you would not be a reliable narrator. But at least last night we tried this and one of our colleagues yelled out, yes, very confidently, a language that they do not speak. And so we did not have anybody to verify. But I saw a couple of hands at the front. What languages do you all speak? Italian. Okay, so are you okay with verifying if the model is saying true things in Italian? Okay, cool, cool. Thank you for being our fact checker. Hey Gemini, could you tell me what you said again, just tell me in Italian, and then also could you tell me what the weather is like in London today, also in Italian? Sure, so how much does your research look like? I see a lot of pictures of colored mattons, seven and blue collets. There is also a bigger picture with red, yellow, blue and green. So how much does the weather look like? London today has given a new time, with the possibility of rain. Do you need any other information? It was good? Excellent, very cool. And so you can ask for different languages. You can also say, hey Gemini, can you tell me a poem about what you see on this screen, but could you tell me it using a Texan accent? Well, howdy there! Let me take a gander at this here screen.
I see piles of Lego bricks, a colorful commotion, like a tiny town fixing to rise from a plastic ocean. Red and yellow, blue and green, in batches big and small, ready to build a castle tall or maybe just a wall. Folks want to buy them, sell them, trade them from all over the place, to put a smile, I reckon, on a little builder's face. Anything else you need help with, partner? That's giving me flashbacks to my hometown, but you can change all of this dynamically just by asking naturally within the flow of conversation. So you could imagine, practically, a scenario where perhaps you have an entryway in a bank and there's some sort of a screen. Somebody comes in and starts speaking in Spanish, or starts speaking in the language that they feel most confident in, and the model's able to dynamically respond and answer their questions in a language that's familiar to them. Or you could specify within system instructions what language, dialect, accent, or style you might want the model to adopt. So if you only want the model to respond in a specific language or a specific style or with a specific tone, I strongly, strongly recommend modifying the system instructions. And same as always, if I click get code, you see all of the code that you would need to use to replicate the experiment that you did within the UI. So you can see the media resolution settings, the settings for compression, and all of that incorporated in naturally. I can also do things like share video feeds. So hey there, Gemini, how many fingers am I holding up? You're holding up two fingers. What about now? That's a thumbs up. Yeah, cool. And so there's a big, big spectrum of things that you can accomplish with Gemini Live. And again, at just a very, very low price point compared to other solutions that make you stitch together the speech-to-text, LLM understanding, and text-to-speech pipeline, with all of the video content inputs and outputs, by yourself.
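The advice about system instructions can be made concrete with a small helper that assembles the instruction text from a few explicit fields. This is a minimal sketch: the `VoiceProfile` fields, the wording of each line, and the bank-lobby example are all illustrative assumptions, and the resulting string would be supplied as the system instruction in whatever client code "get code" produces for your session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceProfile:
    """Illustrative bundle of the traits the talk suggests pinning down."""
    persona: str
    language: str
    tone: str
    accent: str = ""  # optional; leave empty for no accent instruction


def build_system_instruction(profile: VoiceProfile) -> str:
    """Turn a profile into plain-text system instructions, one rule per line."""
    lines = [
        f"You are {profile.persona}.",
        f"Always respond in {profile.language}, "
        "even if the user speaks another language.",
        f"Keep your tone {profile.tone}.",
    ]
    if profile.accent:
        lines.append(f"Speak with a {profile.accent} accent.")
    return "\n".join(lines)


if __name__ == "__main__":
    greeter = VoiceProfile(
        persona="a greeter at a bank branch entrance",
        language="Spanish",
        tone="warm and concise",
    )
    print(build_system_instruction(greeter))
```

Keeping these traits in one structured place makes it easier to hold the persona constant across sessions, rather than relying on users to request a language or accent mid-conversation.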
We have another feature. I always feel like whenever I'm describing AI Studio, I'm just like, and also, and also, and also you can do all these other things. We have another feature called Build, which, if you've played with v0.dev or Lovable, feels very similar. It gives you the option to create and deploy and share a whole spectrum of apps. And now we've even added support for things like databases and authentication. So you can add a database, you can add login with Google, you can add custom API keys that are all kept secure for you. And you can also of course create and edit existing apps. So you saw a little while ago some examples using music from Lyria 3, which is exciting. Guillaume, who created the Lyria Studio app, is in the back today. And these are all really, really fascinating to play with if you haven't had a chance to experiment with some of the generative media models. You can see some of the examples with Nano Banana 2 as well, and also with MediaPipe. So as an example, if I click on this app, you can see that it's requesting camera access. This is a game that's taking in the location of my hand. So I can grab, grab, and we can all find out that I play this game really badly. But you can play the game and then also inspect all of the code that's used to create the app itself. But for the purposes of this, I'm going to just show you how you can get started with creating an app from scratch, just based on anything that you could possibly imagine. So I'm going to use database and authentication. So we can add a Firestore and authorize it, sort of the Google login with Firebase. And I will click this little speech-to-text microphone that we have here, so it's easier than me typing out all of the details. But as an example, create an app that allows me to upload a picture of a bookshelf. The bookshelf should have a lot of books and kind of profile views.
So we can see all of the spines and maybe some information about the titles of the books, the authors' names, etc. But the app should use Google Search grounding to add more information. So what we should get is the author name, the title, a description of the book, and what the category of the book might be. And the app should ask the user to log in with their Google login. It should save all of that information for the user to a database. And we should be able to have that persist. So it's basically like you take a picture of your bookshelf and it automatically catalogs all of your books for you. Which is a lot. That in theory would have been a startup three or four years ago. But this looks reasonably correct. So I'm going to go ahead and click Build. And what's happening behind the scenes is you can see Gemini 3.1 Pro Preview kicks in. It starts thinking and planning about what would be needed in order to create this app. Since it's doing a lot, standing up a database, thinking about authentication, it's going to take a while. And while it is, we're going to go and show another couple of demos. So we can let the model cook in the background. And then if it needs me to take any actions, there will also be a little ping, so we can hear it in the background just in case. So I am going to minimize this a little bit. And I'm going to pull up my other browser window. And we're going to take a look at Project Genie. So how many people have heard of Genie before? Yep. Excellent. So all of the hands in the back row, thank you. And then also a few folks here in the audience as well. Genie 3 is DeepMind's model for generating new worlds. So you can describe a scene, describe a character, and then actively experience it with each frame generated dynamically. No physics engine behind the scenes, no Unity, no Unreal Engine.
Just each frame generated dynamically, pixel by pixel. You can navigate it using the keys off to the left, so the WASD keys within the Genie app. And you can also change the video perspective using the arrow keys, or do things like click the space bar. But it's everything from this volcanic landscape where you're navigating with a rover, to things like navigating a watery landscape on a jet ski. And you can see that if you hit one of these lights, it actually responds as if there was some sort of a physics engine, based on its training data and other information that it's seen along the way. It also sounds like AI Studio might have done something, so we'll take a look at that in just a second too. And then you can also see things like hurricanes and what it would be like to experience a hurricane in Florida, jellyfish, and these thermal underwater situations. Just really wild and very magical sorts of experiences. Anything you could create. So let's take a look at what AI Studio is asking me for. So it wants me to enable the Firebase database. And it looks like it's setting that up. So that seems good. It's on track. And I'm going to head over to Project Genie. We're going to explore and create a world, if I could sign in. We're very big on security, for good reason. So we have the option to create an environment and to create a character. And since I am feeling homesick after hearing that Texas twang in the poem about Lego bricks, I'm going to say Big Bend National Park in Texas in the middle of the summer. Sun shining in the sky. But all of the rock formations are made out of Lego bricks. And the sky has a rainbow, a quadruple rainbow. So why not? And that, I can guarantee you, is not a situation that exists in actual Texas. Ground is sandy and dusty. And then maybe the character is, what would be a good idea for a character? An ostrich with a rocket blaster and goggles. Maybe make it pink.
So pink, I don't think Texas has ever had that. So we'll see what gets created behind the scenes. Genie 3 is actually a composition of models. So it's not just one model. It's Nano Banana, Veo, and Gemini to help with prompting, all stitched together along with some really, really interesting approaches towards distributed systems and compute. Oh my gosh, this is amazing. I immediately want a YouTube video about this guy. Also we see some Lego brick rock formations. So let's create this world. And then what we should be able to do is navigate through it, again using the arrow keys to change the visualization and the views, and the WASD keys to navigate the little dude around the world. So we've got a couple of little options for the ostriches, each one moving. You can see that it also seems to have given him very, very muscular arms. Maybe it wants the ostrich to be a military-grade fighter. But you can see it walking around, navigating the Lego bricks. And then if I turn around, let me see if I can find my way out of this rock formation. You can even have it investigate some of the scenes. So we've got the rainbows. If I'm remembering correctly, if you walk towards this canyon, there should be a river at the bottom. So we can try to make him jump into the canyon. But all of this is captured, again, just dynamically by the Genie 3 model harness itself. So come on. Oh no, I didn't make it in time. But it's really interesting to see some of the things that you can build. One of our colleagues, fofr on Twitter, so F-O-F-R, created a game where you're a fish and you have to escape the kitchen. And you're just bouncing along as a fish, trying to get out before it's dinner time. So it's really, really cool to be able to see some of these things in action. Genie 3 is not available as an API just yet.
But the team is actively thinking about a trusted tester program. And today you can access Genie 3 through an Ultra subscription, though the Ultra subscription is only available with Genie in a select number of countries. So I strongly recommend taking a look at that. Yep, question. No, no, no, you would not be able to create the 3D game meshes or pull this ostrich dude in as an asset for a game. It is just the pixels. We have seen people couple together things like the images that are generated with Nano Banana and use additional techniques to turn them into 3D assets. But that does require additional work. This isn't automatically creating the 3D assets for the game themselves. But it's a really, really good question. There are some other companies that are taking different approaches for world model building. So Fei-Fei Li's company, World Labs, as an example, is taking a different approach towards building out these environments that does incorporate more of the Unity and Unreal Engine style asset generation. But I think longer term, as all of the models seem to converge on many input modalities and many output modalities, we'll probably see all of that converge as well. And so I wouldn't be surprised if in the future there would be an opportunity to have video as an ingested thing for a model and then a 3D world, or the code for it, produced externally. Yeah, especially given that with Gemini today you can already give it an image and then say please create an SVG of this image, and it can do it pretty well, which actually might be a fun demo. But one I've never tried before. So let's see if it actually works, and hopefully it's not just me pretending that it does. But what you can do is, if we make sure I'm still sharing my screen, cool. Well, but that benchmark has gotten saturated, right?
So I'll take the Lego bricks photo that we just used and say something like "create an SVG representation of this image," which is a very simplistic prompt. I could probably get much better results by asking Gemini to expand on this prompt, as opposed to me just spitballing a really simple one. So if we don't get great results, we'll ask Gemini to rewrite our prompt to improve it. And we can see the thinking kick in. One of my favorite hackathon projects ever actually used Nano Banana to take an input image and show step by step how you would draw it, with the different stroke marks along the way. But we can see that it's thinking through the perspective. It's defining the bricks, thinking about their dimensions, calculating a grid, and defining some colors. Since I set the thinking level to high for the Gemini 3.1 Pro model, it's doing an awful lot of thinking about simulating the rotations and the transformations, and we can see that happen along the way. It also sounds like AI Studio has an update for the bookshelf cataloger, so let's take a look at that while the SVG is generated. Firebase terms accepted. Let's retry to see what it's doing. It looks like it was able to create some code: the TypeScript, the CSS, etc. I wonder if, because I started using Gemini 3.1 Pro in a different tab, maybe it got a little bit tired. But we'll see. I also really love that you can experiment with the generative media models in AI Studio. If you were here for the earlier session, you saw Guillem share a lot about Lyria, about our Nano Banana models, and about Veo 3.1 Lite. As an example, with Nano Banana 2 you also have the option to do things like image search grounding.
So you can turn on image search, and it will do a reverse image search and bring back things that are tightly aligned with what you're asking for. As an example, I could add sample media for this cute little dog. Let's see, maybe sample media for this very nature-friendly outdoor location. And then say something like "show me the dog in the middle of the natural park with a can of Celsius." If you have never had Celsius, bless your heart, that seems like a great life. Celsius is a caffeinated beverage that is very popular at hackathons and notoriously disgusting, or at least from my perspective it's pretty disgusting: it tastes a little bit like battery acid. At least to me; I'm sure it tastes delicious to many other folks. It's also very low calorie, so it's a little bit like a Red Bull alternative. But I've given it a picture of a dog and a picture of this natural scene. I've turned on reverse image search, so it should be able to pull in details about what a Celsius can might look like, and it's thinking through the assignment. It's got my little dog in the natural scene with a can of Celsius. If you hover over the token consumption, you can also see that, compared to the Nano Banana Pro-tier model, it's much more cost-effective than previous iterations. So if you're interested in using the Nano Banana series, Nano Banana 2 is a good one to get started with. And as always, if you click Get code, it gives you the code that you would need to replicate whatever you just did in the AI Studio UI, in TypeScript or Python or whatever it might be. This is true.
As always, if you change the thinking settings to minimal or low, the model will give you a response much more quickly, whereas if you ask it to think, it will spend a lot of time generating tokens for planning and for reasoning about the task you've described. So let's go back to this SVG representation. It looks like we've got a first pass, so I'm going to copy it, go to an online SVG visualizer, and paste it in. It looks like we've got our Lego bricks. They're a little bit distorted, but they look pretty reasonable, honestly. As a reminder, the picture that we were trying to replicate is this one, and it was able to get all of the different kinds of Lego bricks, just not in the right configuration. So it's really cool to see that you can pull in an image, and while this was a very simple prompt, with a much more detailed prompt you would probably get a much better representation. I'm also curious: if I turn on code execution, I wonder if it would be able to invoke code execution as a tool call in order to do that more effectively. We'll see that in a second. It does look like it was able to pull in an appropriate library to think through the process of generating an SVG for the image. And it's even doing the segmentation. This is very cool. For folks who came in a little bit later, code execution is a tool, automatically invocable via the API, that gives Gemini the option to create a sandboxed Python environment with a whole bunch of data science libraries pre-installed, and it can invoke those as sub-tools within the environment. Awesome. Very cool. We're also still building out the bookshelf visualizer.
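The SVG step described above ends with a manual check: paste the model's output into a viewer and look at it. A quick programmatic sanity check can complement that. What follows is a minimal sketch, assuming the model's reply is already held in a Python string. The function name `summarize_svg` and the sample markup are hypothetical, and the code uses only the Python standard library, not any Gemini API.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace prefix that ElementTree attaches to tags in namespaced SVG documents.
SVG_NS = "{http://www.w3.org/2000/svg}"


def summarize_svg(svg_text: str) -> Counter:
    """Parse SVG markup and count its child elements by tag name.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML,
    and ValueError if the root element is not <svg>.
    """
    root = ET.fromstring(svg_text)
    if root.tag not in (SVG_NS + "svg", "svg"):
        raise ValueError(f"root element is {root.tag!r}, expected <svg>")

    counts: Counter = Counter()
    for element in root.iter():
        tag = element.tag.removeprefix(SVG_NS)  # strip the namespace, if present
        if tag != "svg":
            counts[tag] += 1
    return counts


# Hypothetical stand-in for model output: one brick body with two studs.
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 60">
  <rect x="10" y="20" width="40" height="20" fill="#d32f2f"/>
  <circle cx="20" cy="20" r="4" fill="#b71c1c"/>
  <circle cx="40" cy="20" r="4" fill="#b71c1c"/>
</svg>"""

if __name__ == "__main__":
    print(summarize_svg(SAMPLE))
```

A check like this only confirms that the markup parses and roughly how many shapes it contains. It says nothing about whether the bricks are in the right configuration, which still needs a visual comparison against the source image.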
It looks like it's creating the Firebase blueprint as well as some of the rules. If we go back to Code, we can see all of this getting generated along the way. Another thing that I strongly recommend folks take a look at, if you have interest, is our video generation series. We have a new model called Veo 3.1 Lite that gives you the option to create really nice stock footage backed with audio, as well as basically anything that you would be using the larger-tier Veo series to do. As an example, let's go to Gemini and ask it to help us generate a prompt. I'm going to set thinking to low and say something like "create a prompt for a video generation model to generate stock footage for a vegan basketball-themed food truck; make sure that the food options are Warriors-themed," which is a San Francisco team. Then hit Run. Then we're going to take this output prompt, put it in Veo 3.1 Lite, and hit Run. You can see that the output resolution is set to 720p. You have a couple of different options for output resolution, but not 4K, which is something you would need a higher-tier video generation model for. You can also specify different aspect ratios, 16:9 or 9:16 if you want more of a mobile app experience, and you can configure the video duration: 8 seconds, or something a little more concise like 4 or 6 seconds. This is a paid-tier model, so you would have to attach an API key in order to use it.
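The settings mentioned above (720p output with no 4K on this tier, a 16:9 or 9:16 aspect ratio, and clips of 4, 6, or 8 seconds) can be captured as a small validated config before any request is sent. This is a minimal sketch: the class `VideoRequest`, its field names, and the allowed-value sets are assumptions made for illustration, taken only from what is said in the talk. They are not the Gemini API's actual parameter names.

```python
from dataclasses import dataclass

# Allowed values as mentioned in the talk. The talk says there are "a couple"
# of resolution options but names only 720p, so only that one is listed here.
ALLOWED_RESOLUTIONS = frozenset({"720p"})
ALLOWED_ASPECT_RATIOS = frozenset({"16:9", "9:16"})
ALLOWED_DURATIONS_S = frozenset({4, 6, 8})


@dataclass(frozen=True)
class VideoRequest:
    """Hypothetical container for video generation settings (not an SDK type)."""

    prompt: str
    resolution: str = "720p"
    aspect_ratio: str = "16:9"
    duration_s: int = 8

    def __post_init__(self) -> None:
        # Fail early, before any paid-tier request would be made.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {self.resolution!r}")
        if self.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio!r}")
        if self.duration_s not in ALLOWED_DURATIONS_S:
            raise ValueError(f"unsupported duration: {self.duration_s!r} seconds")


if __name__ == "__main__":
    # A vertical, six-second clip for a mobile-style experience.
    request = VideoRequest(
        prompt="Stock footage of a vegan basketball-themed food truck",
        aspect_ratio="9:16",
        duration_s=6,
    )
    print(request)
```

Validating locally like this is a design choice rather than a requirement: it surfaces an unsupported combination, such as a 4K request, before a call that would count against a paid tier.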
Another handy thing about AI Studio is that if you expand the settings off to the left, you can see a section called Get API key, and if you click Get API key, you can create one that's acceptable for free-tier use out of the box without having... oh my gosh, Chef Curry! This is amazing. Chef Curry, Splash Brothers, and that does look like tofu, like tofu barbacoa with kale and avocado and edamame. Oh my gosh, I love this. Yeah, no kidding. I am absolutely going to send this to somebody I know, because their dream is to start a vegan basketball food truck as well as a custom vegan nut butter business. Apparently nut butters have a 50 to 60% margin, so if any of us need a hobby plan, maybe cultivating some of these culinary hobbies is a good one to take. Another thing that I want to make sure to mention, which we talked about a little bit, and we have Ian from the Gemma team available in the back, who will be coming back later this afternoon to discuss it as well: we just recently released our Gemma 4 series of models, which are extremely powerful. They're able to punch far above their weight in terms of parameter size and compute footprint, but you can use them via the APIs and AI Studio as well, for free. So if you want to test out the Gemma series of models, you can have this try-before-you-buy experience within AI Studio before downloading them to your own infrastructure. If you don't necessarily have a spare GPU hiding out in your closet at home, you can just ping it through the AI Studio interface as well. I'm going to do another prompt and pull in just an example image. The Gemma 4 models also support multimodal understanding, so they can
analyze audio or video or images. You can say something like "generate a brief description of this image" and turn the thinking level to minimal, and the Gemma models are pretty fast as well. So if you need a lighter-weight model accessible via an API that you can work with for free, or a model that you can download, use on your own infrastructure, fine-tune, and run for free under the Apache 2 license, the Gemma 4 models are an incredible option for you to try. The smallest versions also run on mobile devices, so you can have one downloaded locally to your Pixel. The next series of Pixels, like Pixel 10, should have Gemma already added to it, and Chrome as a browser is also incorporating the Gemma models. Cool. So we've seen the vegan Warriors food truck, we've seen Genie 3, we've seen our open model family, and some Lego bricks and pieces. It looks like the AI Studio app is still cooking a little bit. One of the other things that was mentioned was the Lyria model, which is also available via AI Studio. If we go to Audio, you can see a couple of different models that are available to try via API: Lyria 3 Pro preview and Lyria 3 Clip preview. As an example, if I click on this guy, you can see some of the automatic templates that you can use with it: acoustic folk, 90s alt rock, etc. But I really love this app that Guillem built, which you can find in the gallery and can also remix to your heart's content. It incorporates different sound configurations. If we preview this guy, you can see an option to create your own sound. So a clip, maybe electronic, danceable, vegan basketball food truck, and Legos. And then, we talked about Italian. What language do you speak, sir, in the front row? Yep, Spanish, excellent. Lyrics in Spanish. And then
create, and we should see the clip start synthesizing. That does look pretty Spanish. And we'll see what it means for electronic and... oh my gosh, that's amazing. That is so cool. "In the food truck vegan, piezas del Lego de todos los colores, construyendo un mundo de esa buena energía" (Lego pieces of every color, building a world of that good energy). So we've got a video for it, we've got a theme song for it. Clearly our post-ASI plan is now that we're going to start a vegan food truck that's basketball-themed, with Legos. But this is our Lyria 3 model. We had a great session about generative media just before this, led by Guillem, so if you missed it, it should be recorded and you can watch it afterwards. We'll also be talking a little bit about it in the workshop later this afternoon. But again, all of the code is available for the app, so you can experiment with it and test it. And then we'll also take a look at the... oh cool, it looks like our bookshelf cataloger is done. I'm going to go ahead and sign in with Google, so it should ask me to log in with my personal Gmail account. We're going to continue. It's signed in as me, which is great. We're going to upload a photo, so I'm going to find a bookshelf with books on it, a smaller one to make it a little bit easier. Let's go with... we'll see what this one looks like. Yep, so this one has some handwritten-style text that I want to see if the model will be able to pick up on, and you also can't really see some of the author names, so I want to see if it'll be able to figure out what the book title is even though I can't see everything on the spines. I'm going to upload this photo that we just downloaded, and it shows the latest upload. It's figuring out the book details and adding all of them. So it's figured
out the different kinds of books, the names of the authors, and the descriptions of the books. If I sign out and sign back in again, it should also persist. Yep, so it persists all of the books that I had on my shelf. And if I wanted to share it with all of y'all, I can copy the link and make it public, so anybody can access it. QR code generator. Yeah, if anybody wanted to try out this bookshelf app themselves, you can access it by using the QR code there and going to it, which is pretty wild, right? It's also a one-button-click deploy to Cloud Run, though in the interest of not burning up my quota too awful much, I will refrain from doing it for this app in particular. But those are most of the things that I wanted to show. So let me go back to the slides again. I hate slides; I'm pretty allergic to them. We'll see how this works. So we've talked about Lyria. Another place you can use the Gemini Live model, that real-time interaction model that we were just playing around with, is in robotics. This is a robot called Pupper. It is completely open source: you can 3D print all of the parts, it runs on a Raspberry Pi, and all of the software is open source, but it's using the Gemini models behind the scenes for things like object detection and to be able to respond to its environment. You can also run Gemini Live with the Pupper and use it to flexibly tell the robot what to do. The way to orchestrate this isn't having Gemini Live control the robotic actions directly; you would have it build the plan and then invoke models that might be local on the robot to do things like pick up specific items. But you can use Gemini to build the plan to accomplish those tasks. And then also things like augmented reality: Gemini Live
is great at giving directions, at responding to things that it sees, at describing how to do math that might be on a whiteboard, and even at enabling things like real-time transcription: if somebody is speaking to you in Chinese, it can transcribe in English what the person is saying. So lots of really cool things are possible with these multimodal systems. And with that, it feels like a good place to stop and ask for questions. I know I'm the only thing standing between all of us and lunch, so hopefully we can get us all to the cafeteria, or the session with the food, a little bit early. Does anybody have any questions? Did anybody learn anything new? Cool, cool. Yep. So yes, not so much a Codex, but there is a plan to have an AI Studio app, which Logan has alluded to at least a few times on Twitter. So stay tuned. It should be interesting to see, and the team's very excited. You can also use the Gemini APIs with all of the things that you know and love, like OpenClaw. We have a colleague, Gali, who is very emotionally invested in his Telegram plus Gemini setup and uses it all the time to invoke Workspace actions, coupled with Google Search. So definitely, especially given the free tier for some of our Gemini models, Gemini plus OpenClaw is a good path forward. Excellent. Well, thank you all for coming, thank you all for being early as well, and hope to see you tomorrow and later this afternoon.
