OpenAI Codex Masterclass — Vaibhav Srivastav & Katia Gil Guzman

Hi everyone, thank you for being here. Today we're going to talk about Codex. My name is Katia, Katia Gil Guzman, and I'm here with VB. We both work on the developer experience team at OpenAI, based in London, and our role is really to help developers build with and get the most out of our products, including Codex. We're going to start with a quick Codex overview. Just so we know: how many of you here are using Codex? Can you raise your hands? Yeah. Okay, cool. So we won't stay too long on the overview. Then we're going to do some demos: we'll show you plugins and automations, and VB is going to talk about subagents and then about the bleeding edge. So hopefully those of you who already know and use Codex will learn something new. We'll have some time at the end for Q&A, so feel free to ask anything. This is also a workshop format, so if you have a pressing question, feel free to ask it during the session. I see you all have your laptops, so feel free to follow along: we're going to show you how to do some things, and you can try them at the same time. The Q&A is also the perfect time to try things on your side. Okay, so to start, for those who don't know Codex, or who know it but maybe not that well: Codex is OpenAI's software engineering agent. It's not just a coding agent, an agent that writes code; it can do much more than that. It can run commands, run tests, and explore codebases; it can really do everything a software engineer would do. It's built on our models as a foundation. For example, GPT-5.3-Codex was our previous one. We also have the Spark version, which is our super fast model. The state-of-the-art model right now is GPT-5.4, and we also have a mini version that came out last week.
Every time we make improvements, every time we have better models, Codex benefits from it. But it's not just the models. On top of that foundation, we have what we call a unified agent harness, which manages the agent's behavior. It's a wrapper for tool execution, environment setup, and everything else that lets the agent do its work and run smoothly, and it has safety built in. All of that together is Codex. You can then interact with it through different surfaces: the Codex app, which we'll talk about in a few minutes, your IDE through the extension, and the CLI. There are also other surfaces like Slack; at OpenAI we ping Codex in Slack all the time and ask it to fix things, and the same goes for GitHub. On top of all that, you can integrate it with your preferred tools, like Figma, Linear, or Notion, so that it really works with everything you're already using. Combined, all of this lets Codex do everything a software engineer colleague would do. As I mentioned, this is based on our models, so I'll let VB tell you a little more about that.

All right. Good morning, everyone. We've been talking about the Codex app, the IDE extension, the CLI, and so on. None of these harnesses and surfaces would be nearly as good without the models powering them. To take a step back: when I joined OpenAI, which wasn't that long ago, in December, our leading model was GPT-5.2. From there we released GPT-5.2-Codex, a specialized Codex variant of GPT-5.2, where we pushed how far you can take the model on long-running tasks, how long you can let it keep chugging along.
Shortly after, we followed up with GPT-5.3-Codex, and shortly after that, in partnership with Cerebras, with GPT-5.3-Codex-Spark. Most recently, we released GPT-5.4. You can already see how we're pushing this model-and-harness flywheel as fast as possible, trying to bring the next frontier to you as quickly as we can. Something that's not on the screen: while we were pushing on larger models, which are really good for long-running and really complex tasks, we also released GPT-5.4 mini and GPT-5.4 nano, which you can use for short-running tasks and for subagents, which we'll talk about in a bit. There's one more thing we haven't really emphasized here. As we made these models better, we also worked quite a bit on serving them to you as fast as possible. In practice, that meant introducing WebSockets, which keep a persistent connection open between your device and the API, giving you roughly 1.75x faster tokens without passing the cost on to you. We also released a fast mode, which gives you about 2x faster tokens on top of that 1.75x. The team keeps hammering on this, and there are lots more speed improvements coming. To bring all of this together, at the start of this year we launched the Codex app. How many of you have used the Codex app? All right, that's a good chunk of people. To be honest, back in December and before, I was a hardcore CLI user. But at some point around the app launch, while we were beta testing and dogfooding it, the Codex app became a really important part of my workflow.
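To make the speed claims concrete, here is a back-of-the-envelope sketch. It assumes, which the talk does not spell out, that the fast-mode 2x compounds multiplicatively with the 1.75x WebSocket gain, and it uses a made-up baseline throughput; real numbers depend on model, load, and plan.

```python
# Back-of-the-envelope: how long does a fixed-size response take to stream?
# ASSUMPTIONS: the baseline throughput is a hypothetical number, and the two
# speedups quoted in the talk are assumed to compound multiplicatively.
BASELINE_TOKENS_PER_SEC = 50.0   # hypothetical baseline, not a published figure
WEBSOCKET_SPEEDUP = 1.75         # "roughly 1.75x faster tokens"
FAST_MODE_SPEEDUP = 2.0          # "2x faster on top of the 1.75x"


def seconds_to_stream(tokens: int, *speedups: float) -> float:
    """Time to stream `tokens` at the baseline rate scaled by each speedup."""
    rate = BASELINE_TOKENS_PER_SEC
    for s in speedups:
        rate *= s
    return tokens / rate


if __name__ == "__main__":
    n = 7_000
    for label, ups in [
        ("baseline", ()),
        ("websockets", (WEBSOCKET_SPEEDUP,)),
        ("websockets + fast mode", (WEBSOCKET_SPEEDUP, FAST_MODE_SPEEDUP)),
    ]:
        print(f"{label:>24}: {seconds_to_stream(n, *ups):6.1f} s")
```

Under that assumption the combined gain is 1.75 × 2 = 3.5x; if the two effects overlap rather than compound, the real figure would be lower.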
The reason is that it brings together a really nice way to, number one, work across projects, and number two, work on multiple features within the same project at the same time. You can have individual projects, which you can see on the left side: the Codex project, ChatGPT, Sora, and so on. Within those, you can use worktrees to work on individual feature requests, bug fixes, or QA, all at the same time, without the tasks interfering with each other. This is something we're quite proud of: native worktree support lets you run multiple tasks at once without having to context switch as much. Since the launch, we've also been trying to increase the value you get out of the Codex app. Some of those features include better automation support. We'll talk about automations in a bit, but the short summary is that you can define a process you want Codex to run regularly: say, every day at 9 a.m., or every evening, or having Codex look through your calendar and create a briefing for you. All of that is possible natively in the Codex app with automations. And with worktrees and more native Git support, you can work across projects and push the changes you want with whichever Git persona you want. Last but not least, on platforms: at the start of the year we released the app just on macOS, but now we have native Windows support, which comes with a native Windows sandbox. Is there anyone here using Windows today? I feel for you, man. So for the one gentleman over there: we have native sandbox support on Windows, and we're the first of our kind.
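The app manages worktrees for you, but the underlying Git feature is plain `git worktree`: one extra checkout directory per branch, all sharing the same repository. A minimal sketch, assuming a hypothetical repo path and branch names, that builds the commands for one isolated checkout per task; it only constructs argument lists, so nothing runs until you pass them to `run`.

```python
import subprocess
from pathlib import Path


def worktree_add_cmd(repo: Path, branch: str, base: str = "main") -> list[str]:
    """Command that creates branch `branch` from `base` in a sibling directory."""
    # One directory per task, e.g. /work/myrepo-fix-login next to /work/myrepo.
    target = repo.parent / f"{repo.name}-{branch.replace('/', '-')}"
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(target), base]


def worktree_remove_cmd(repo: Path, target: Path) -> list[str]:
    """Command that removes a finished worktree (the branch itself is kept)."""
    return ["git", "-C", str(repo), "worktree", "remove", str(target)]


def run(cmd: list[str]) -> None:
    """Execute a command, raising if Git reports an error."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    repo = Path("/work/myrepo")  # hypothetical path
    for branch in ["feature/search", "fix/login-bug", "qa/release-check"]:
        print(" ".join(worktree_add_cmd(repo, branch)))
```

Because each task gets its own directory and branch, edits in one checkout never touch files in another, which is what lets parallel tasks avoid interfering.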
There is no other competing harness that supports a native sandbox on Windows. So I've been talking on and on about the Codex app and all the models we've been shipping, but what's new in terms of features? This list is, if I'm not wrong, in reverse chronological order. Most recently we launched plugins. Plugins are a way to bring together skills, MCP servers, prompts, and really anything else in one bundle, which lets the model do more nuanced matching while it's building. We also recently released the mini models, which tie in quite well with subagents: they let you parallelize a feature, a bug fix, QA, whatever it may be, at a faster rate, while making sure you don't pay as much for those models. Then we have a bunch of other things we'll talk about as we go, like why Codex is so good at code review and at security. While we talk about all of this, I want to emphasize that at OpenAI we're quite lucky the community has really embraced Codex. In fact, just last night we crossed 3 million weekly active users. This is a pretty big deal for us, and we want to continue supporting the developer community, the enterprises, and the startups building on Codex. So throughout this session, if you have any questions, please feel free to throw them at Katia or me, now or afterwards, or just ping us. With this, I'll pass it over to Katia.

Thank you. And yeah, the 3 million weekly active users milestone is really cool to see, and it's crazy to think that it has more than tripled since January. In just a few months we've seen huge adoption, and it's really, really cool to see. Okay, so plugins.
I don't know if you've heard about it, but native plugin support is quite new in Codex. I'll show you what plugins look like in practice and how to use them, but the idea is that they bundle several things, like skills, apps (integrations), and MCP servers, into reusable workflows. I'll demo skills, apps, and MCP servers too, but to introduce them briefly: skills are essentially reusable instructions packaged for a specific process. If there's something you do quite a bit, you can create a skill for it so that Codex knows about it. You can give it instructions, scripts, and resources, and all of that saves you from repeating yourself over and over. So any time you have a neat workflow that's always the same, you can package it into a skill, and you can even ask Codex to create the skill for you. Apps are connections to other services: the tools you use every day, like Notion or Linear, you can let Codex connect to them, and we'll see a quick demo. MCP servers, which you might already be familiar with, expose tools from external systems to extend Codex's capabilities further. All three of these are already very useful on their own. What plugins do is bundle them, so you don't have to set everything up manually: instead of installing multiple skills, connecting multiple apps, and connecting multiple MCP servers, you just add a plugin. The other thing in the Codex app I wanted to talk about, and that we'll show a quick demo of, is automations.
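Skills are commonly packaged as a folder containing a `SKILL.md` file: YAML front matter with a `name` and a `description`, followed by the instructions themselves, with any scripts or resources alongside. A minimal sketch, assuming that layout; the skill name, description, and steps are hypothetical examples, and the exact fields Codex reads may differ, so check the skills documentation.

```python
from pathlib import Path


def render_skill_md(name: str, description: str, steps: list[str]) -> str:
    """Render a SKILL.md: YAML front matter, then numbered instructions."""
    front_matter = f"---\nname: {name}\ndescription: {description}\n---\n"
    body = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return f"{front_matter}\n# {name}\n\n{body}\n"


def write_skill(root: Path, name: str, description: str, steps: list[str]) -> Path:
    """Write <root>/<name>/SKILL.md and return its path."""
    skill_dir = root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    path = skill_dir / "SKILL.md"
    path.write_text(render_skill_md(name, description, steps), encoding="utf-8")
    return path


if __name__ == "__main__":
    print(render_skill_md(
        "release-notes",  # hypothetical skill
        "Draft release notes from merged PRs since the last tag.",
        ["Find the latest git tag.",
         "List merged PRs since that tag.",
         "Group them by label and write RELEASE_NOTES.md."],
    ))
```

The description is what lets the agent decide when a skill applies, so it's worth writing it as a trigger ("use this when..."), not a title.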
Personally, this is one of my favorite things to do with Codex, because you can set up automations that run in the background, like a cron job. You can connect apps, use plugins there too, and just set them to run at a scheduled time. For example, you can set an automation to run every day at a certain time; it's just an instruction that Codex runs in the background. The last thing I wanted to show you in the demo right after is specific skills for web app and game development. We've heard from a lot of developers who want to use Codex to build apps and games, and every time they kind of repeat themselves and use the same skills, so we packaged those into specific plugins. There are two skills I want to highlight that are super useful, and honestly a game changer when you're developing something visual. The first is Playwright Interactive. If you don't know Playwright, it's a browser automation tool that can drive a headless, sandboxed browser, which Codex can run and use to see what it's doing. It can open your app in the browser, and with the interactive version it can actually click things, navigate your app, take screenshots, and analyze those screenshots. The second is ImageGen, which is a great way to generate visual assets for your apps and games. So enough talking; let me show you a demo.
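"Every day at 9 a.m." is just a schedule, the same thing the cron expression `0 9 * * *` encodes. In the app you pick the frequency in the UI; the sketch below only illustrates the scheduling logic (it is not how Codex implements automations), computing the next run of a daily schedule with the standard library. The `weekdays_only` option is an illustrative extra, not an app setting.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, at: time = time(9, 0), weekdays_only: bool = False) -> datetime:
    """Next occurrence of `at` strictly after `now` (naive local time)."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)
    # Optionally skip Saturday (5) and Sunday (6).
    while weekdays_only and candidate.weekday() >= 5:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_daily_run(datetime(2025, 3, 3, 8, 30)))
```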
I'm going to start by running this one, because it's pretty long: when I ran it yesterday, it took about an hour to build. I also have the final version, but I wanted to show you the prompt and how Codex works through it. What I'm doing here is using the Game Studio plugin, which again is a bundle of skills that are helpful for game development, and I'm asking it to use ImageGen to create visual assets, so sprites for the game, and to use Playwright Interactive to debug the game and make sure it works well. We'll let that run and talk about plugins for a bit. Let me switch to another project here, the developer website one. Okay, this is the repo for our developer website, which is here (sorry, let me put that in full screen). On our developer website we have a page listing all of our Codex meetups. There are a lot, and all of them live in our codebase as YAML files. What I've done is add this Google Drive plugin. We have a lot of featured plugins built by us that you can choose from, and you can of course add your own, but I connected this Google Drive plugin, which lets Codex access my Google Drive. I prepared a spreadsheet called "codex events" with the event name, date, and city, and I'm going to ask Codex to update this sheet with the current Codex meetups listed in the codebase. I'll start this; again, it's going to take a little while. Let's check on the game task: it's still running, so I'll show you when it's doing something more interesting.
But the last thing I mentioned is automations. Automations are again something you can set up using apps: you can ask Codex anything, but instead of it being interactive, where you're actively using the Codex app, you set it up to run in the background. For example, some of the automations I've set up that honestly help me a ton day to day: one is for Slack messages. I connected Codex to Slack, and I'm asking, "Hey Codex, every day at 9 a.m., can you check the messages I should reply to and flag whether they're time-sensitive or waiting on an urgent response? Can you also summarize everything that has happened on Slack since yesterday?" I ask it to bucket that summary by topic, and then to list important information to be aware of. We have important channels with company information, the kind of thing that can slip by in a single day, and I just want to make sure I don't miss anything there, so that's the kind of thing I ask Codex to summarize for me. Another pretty cool one connects Gmail. Same thing: I receive, honestly, an ungodly amount of email per day, so I ask Codex to check whether there are emails I should actually reply to, and whether they're time-sensitive or look legit, because I get a lot of requests that aren't necessarily something I'd reply to. This is saving me hours per day. To create an automation, you can do it from here, or you can just say something like, "Hey Codex, can you create an automation that looks at Slack for anything that mentions Codex use cases, and then lists all the important use cases I should put on our website?" I'll let Codex think about this for a second.
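The Slack prompt above asks for two outputs: messages that need a reply, with time-sensitive ones flagged, and a per-topic summary of everything else. In the real automation the model makes those judgments. The sketch below is only a deterministic stand-in showing the shape of that output; the `Message` fields and the keyword heuristic are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Crude keyword stand-in for the model's judgment of "time-sensitive".
URGENT_HINTS = ("asap", "urgent", "today", "eod", "deadline")


@dataclass
class Message:
    channel: str          # used as the "topic" bucket in this sketch
    author: str
    text: str
    mentions_me: bool = False


def is_time_sensitive(m: Message) -> bool:
    text = m.text.lower()
    return any(hint in text for hint in URGENT_HINTS)


def triage(messages: list[Message]) -> tuple[list[Message], dict[str, list[str]]]:
    """Return (needs_reply, summary_by_topic); time-sensitive replies come first."""
    needs_reply = [m for m in messages if m.mentions_me or is_time_sensitive(m)]
    needs_reply.sort(key=lambda m: not is_time_sensitive(m))  # stable sort keeps arrival order
    by_topic: dict[str, list[str]] = defaultdict(list)
    for m in messages:
        by_topic[m.channel].append(f"{m.author}: {m.text}")
    return needs_reply, dict(by_topic)
```

Spelling out the output structure this explicitly in the automation prompt (what counts as "needs a reply", how to bucket) tends to make the daily summaries more consistent from run to run.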
I should have used Spark. It's going to come up with this, basically creating the automation for me. I didn't specify when I wanted it to run, but... oh, interesting, it's doing something different. Because this is live, obviously it wouldn't... okay. Normally it should show a little pop-up that I can just click on. Oh, it's doing it, perfect. It was just very chatty this morning. Okay, interesting. Okay: "Please create the automation." It should show a little pop-up if everything goes well, but if not, you can still create it manually. Let's see if it does it. Okay, I don't know what's going on, so let's just do it manually. You can also create it from here: all you have to do is call the plugins you want to use, like Slack, then choose the frequency at which the automation should run, which project it should run in, and so on. Okay, let's check on our other tasks. This one is still running. It generated some pretty cool sprites; we'll look at those after. And let's check on our task to update the spreadsheet. Here, Codex took two minutes to analyze the codebase. It found the source for all of the Codex events, where we keep our YAML files, and then it wrote 57 event rows, since we currently have 57 events listed on the website. Let's look at our spreadsheet, and yes, we can see that it was updated. Nice. This is a simple example, but any time you have something that's very time-consuming, especially anything to do with data, data review for example, you can ask Codex to do it for you. It has access to everything in your codebase, you can also feed it other inputs, like other CSV files, and then just ask Codex to do that type of work for you.
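The spreadsheet task boils down to flattening event records into `(name, date, city)` rows. A rough sketch of that transformation, assuming the YAML files have already been loaded into dicts (for example with PyYAML's `yaml.safe_load`) and assuming hypothetical keys `name`, `date`, and `city`. Writing to Google Sheets is what the plugin handles, so here the rows just go to CSV.

```python
import csv
import io

HEADER = ["Event name", "Date", "City"]


def to_rows(events: list[dict]) -> list[list[str]]:
    """Flatten event dicts into sheet rows sorted by date, skipping incomplete ones."""
    rows = [
        # safe_load turns unquoted dates into datetime.date; str() gives ISO format.
        [str(e["name"]), str(e["date"]), str(e["city"])]
        for e in events
        if all(e.get(k) for k in ("name", "date", "city"))
    ]
    return sorted(rows, key=lambda r: r[1])  # ISO dates sort correctly as strings


def to_csv(rows: list[list[str]]) -> str:
    """Serialize rows, with a header, as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    events = [  # hypothetical records
        {"name": "Meetup A", "date": "2025-05-02", "city": "London"},
        {"name": "Meetup B", "date": "2025-04-10", "city": "Paris"},
    ]
    print(to_csv(to_rows(events)))
```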
Okay, last thing: let's check on our game. As you can see, Codex is using ImageGen to generate... let me zoom out a little bit. Oh, nice. It's generating all the sprites, all the game assets I asked for, and they look pretty nice. This is going to take a while, so I'll show you the final result instead, but as you can see, Codex is generating all of these assets, and then it's going to use the Playwright skill to see what they look like in the app. Unfortunately, we don't have an hour to wait for the final result, so let me show you the one it built yesterday. This is untouched; I haven't changed anything. It's literally just Codex that built this. I gave zero input beyond "make a platformer game with platforms made of bricks", and that's it. It generated everything. Granted, the overall UI isn't perfect, and I'd probably iterate on that, but I think the platformer itself is pretty cool. What's really cool here is all the sprites: as I move around, there are at least five different sprites of the little character, and I didn't have to make any of them. You could also make a custom game with your face as input and have ImageGen create a 2D version of you. So that's how you can leverage the ImageGen skill, the Playwright Interactive skill, and the Game Studio plugin. To show you what's inside: it's a bundle of all of these skills together, and we have the same thing for web apps. So yeah, that's it for me. I'm going to pass it back to VB. Thank you.

Thank you, Katia. All right, perfect. Just to do a very quick checkpoint and recap what we've covered so far.
We went through all the models that power the Codex ecosystem, then all the surfaces you can use Codex from, and then plugins: how to use them and some of the plugins available. You can also create your own plugins using the plugin creator. Then we talked about automations, ImageGen, and so on. Now, something to note: as we keep delegating more and more work to agents, whether that's Codex or any other agent you like, you want to be sure that whatever your agent produces is of the utmost quality. As we start working on multiple features and multiple processes at the same time, it becomes practically impossible for you to look through each and every line of code. That means that, at least for the first pass, you want a way to review your code that you can rely on, and this is where code review comes in. I'm by no means bragging, but in my own biased view, Codex code review is one of the best in the industry right now. People on Twitter, on LinkedIn, and on our own platforms like Discord keep raving about how good Codex code review is. So I wanted to spend a quick minute on what it does. First of all, it's available on the surfaces you already work on. Number one, you can use Codex code review on GitHub: you connect your ChatGPT account to GitHub, and you can set it up so that Codex automatically reviews every pull request you create.
It typically leaves a callout like this on the pull request itself, saying: this is missing; P0, fix this; P1, fix that; P2, this would be nice to have; and so on. You can also use /review in the Codex CLI or the Codex app, and Codex will spin up a larger review process. And very recently, last week, with my colleague Dom, we shipped a Codex plugin for Claude Code, which lets you invoke Codex from within your Claude Code sessions, so you get the same state-of-the-art code review there too. So let's say I'm working on a project like this. By the way, this is my actual working setup. Everything you see here, all of these threads and projects, is something I work on day to day, so if you see something you shouldn't, just close your eyes. Typically I'd go through a feature request or some ask from someone, and let's say that here I've asked it to do a bunch of things. I'm just going to ask it to review its changes. You then get an option to choose a base branch: if you have multiple branches in the Git repo, you can review against a feature branch, an eval branch, whatever it may be. In this case, I'm just going to ask it to review uncommitted changes. And if you look at what it does, it spins off a totally new thread.
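The two review scopes offered here map onto ordinary Git diffs. A sketch of that mapping, not Codex's implementation: reviewing against a base branch roughly means `git diff <base>...HEAD` (changes since the merge base), while "uncommitted changes" means the working tree against `HEAD`, plus untracked files, which `git diff` alone does not show.

```python
from typing import Optional


def review_scope_cmds(base: Optional[str] = None) -> list[list[str]]:
    """Git commands that gather the changes a reviewer would look at.

    base=None   -> uncommitted changes (tracked edits + untracked files)
    base="main" -> everything on this branch since it diverged from main
    """
    if base is None:
        return [
            ["git", "diff", "HEAD"],  # staged and unstaged edits to tracked files
            ["git", "ls-files", "--others", "--exclude-standard"],  # new, untracked files
        ]
    # Three dots: diff from the merge base, so new commits on `base` are ignored.
    return [["git", "diff", f"{base}...HEAD"]]
```

The three-dot form matters for a branch review: with two dots, commits that landed on the base branch after you branched would show up as spurious reversals in the diff.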
That thread spins up a totally new Codex process with its own review system prompt, and it looks not just at the diff, the list of changes, but also contextualizes it with everything else in the repo. So a lot of the time, Codex code review finds issues with second-order effects that aren't limited to the diff or the changes you made, but show up in other modules you haven't even touched in the pull request. This is so effective that 100% of pull requests across all OpenAI repos, made by all employees, including Greg, are reviewed by Codex code review by default. That's the first pass. Cool. As you can see here, Codex worked for a minute and came back with these findings: P1, localize the revenue detail; P2, translate this to that; and so on. After this, you can ask Codex either to take a pass at fixing them or to open another PR on whichever branch you're on, and go from there. Now we get to subagents, which I'm personally quite excited about. First and foremost, what are subagents? Subagents are essentially the ability to break a main task into decomposable, independent tasks that run in parallel. You hand them off to agents that work independently and, at the end of their run, come back and give you their response. And honestly, the sky is literally the limit: you can spin up as many agents as you want.
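Review output is a list of findings tagged P0 to P2. If you pipe findings into your own tooling, a small triage step like the sketch below sorts them most-severe first and decides whether a change should be blocked. The `Finding` type and the blocking threshold are this example's assumptions, not a Codex format or default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    priority: int  # 0 = must fix, 1 = should fix, 2 = nice to have
    file: str
    message: str


def triage_findings(findings: list[Finding], block_at: int = 1) -> tuple[list[Finding], bool]:
    """Sort most-severe first; block if any finding is P0..P<block_at>."""
    ordered = sorted(findings, key=lambda f: (f.priority, f.file))
    blocked = any(f.priority <= block_at for f in findings)
    return ordered, blocked


def format_finding(f: Finding) -> str:
    return f"[P{f.priority}] {f.file}: {f.message}"
```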
Of course, as long as your API key, or whichever ChatGPT Pro, Plus, or Go subscription you're on, can take it. You can do a lot of interesting things with subagents. For example, in the screenshot on the left, I have a Codex agents repo, which we'll look at in a sec. It's not public yet, but I hope we'll be able to make it public very soon. It has a lot of personas for subagents that you can use, so it's kind of meta: subagent personas like doc reviewers, test case creators, test case runners, and so on. Every now and then we change the spec; this is from before, when we wanted to change the spec of how subagents work. What I wanted was for Codex to go through all 40 or 50 subagent personas, review them, and make sure they're up to spec. Doing that without subagents would have meant Codex opening each file, reviewing it, giving me a summary, and repeating that for 50 different subagents. In this case, it created review slices: it decided, say, that these two files should be reviewed by subagent Poly or subagent Plato. Those subagents spin up in a new Codex environment, review their files, and at the end Codex collates everything and gives me back a response. So let's give this a shot. The repo in question is this one, the Codex agents repo, which has a bunch of personas. You can see we have quite a few here: an accessibility reviewer, an architect, and so on. This is something you can create yourself, and we'll touch on that in just a minute: you can define your own custom subagents. But think of this repo as a collection of these subagents.
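The "review slices" idea is a classic fan-out/fan-in: partition the files, give each slice to an independent reviewer, then collate. A sketch of just the partitioning step, with hypothetical file names; Codex decides its own slicing, and this only shows one balanced way to do it.

```python
def make_slices(files: list[str], n_reviewers: int) -> list[list[str]]:
    """Split files into at most n_reviewers non-empty, near-equal slices (round-robin)."""
    if n_reviewers < 1:
        raise ValueError("need at least one reviewer")
    # Never create more slices than there are files, so no reviewer sits idle.
    slices: list[list[str]] = [[] for _ in range(min(n_reviewers, len(files)))]
    for i, path in enumerate(sorted(files)):
        slices[i % len(slices)].append(path)
    return slices


if __name__ == "__main__":
    personas = [f"agents/persona_{i:02d}.toml" for i in range(45)]  # hypothetical names
    for n, s in enumerate(make_slices(personas, 20)):
        print(n, s)
```

With 45 files and 20 reviewers, round-robin gives five slices of three files and fifteen of two, so no single reviewer becomes the bottleneck.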
This is typically what you'd have for each subagent: a name, a description, a sandbox mode (whether you want it to be able to write, or to be read-only), and then some instructions. Now I'm going to go over to my Codex agents repo. I'm going to switch to, let's do medium here. Let's close this. Can I make this full screen? All right. Let's give it a task: "Spin up 20 subagents to review all the subagents." This is a very simple task. All I'm asking Codex to do is the same task I showed before: review all the different subagent personas in this repo. You can see it has already figured out that there are agents and skills, and it's looking into them. There are 45 curated persona files, and it's going to create 20 reviewers, hand them all of those TOML files, and have them review them. There are a couple of things that are quite interesting here. First, Codex automatically decided that this is potentially a complex task, so it kicked off plan mode on its own, which is what's active here. You can see it came up with five tasks to solve this problem. You can explicitly invoke plan mode as well, but in this case it decided to do it on its own. It's now partitioning all of these persona files, and it's going to spawn 20 subagents very soon. I swear it's usually faster. Second, for some reason, on my particular setup, I have a cap of six concurrent agent threads that can run at the same time. We can fix that. But going back up, we can see that it actually spun up six agents, which is my limit for now.
I can see all of those agents working here: I can quickly check what Jason, the agent here, is doing, or Hume, and so on. Something to note is that the main Codex model essentially created a persona for each one. Not only that, it doubled down and gave each subagent the exact files it should review. It also gave it some context: there's repo guidance in README.md, in CONTRIBUTING.md, in the skills, and so on. It will keep going down this route for all the different subagents. Toward the end, it tears down each subagent once it has finished going through all of its TOML files. If I go back to my main thread, you can see that two of the agents are still working, but eventually it will collate all the feedback it got from the individual subagents and proceed. Now, this is a very simple explorer use case. But think of it from, for example, a cybersecurity perspective: you have a Git commit or a Git repo, and you want Codex to spin up multiple vulnerability analyses from different points of view or different hypotheses, all tackling the same diff or the same repo, and come up with a vulnerability map. This is also something I personally use quite a bit when I'm brainstorming a feature: I spin up multiple Codex subagents to look at how I might approach a problem.
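At the end of a fan-out run, the parent agent collates what the subagents report. A sketch of that fan-in step, assuming each subagent returns a list of `(location, issue)` pairs (a hypothetical format): findings reported independently by several subagents are merged and ranked by how many agreed, which is one simple way to assemble the kind of vulnerability map described above.

```python
from collections import Counter


def collate(reports: dict[str, list[tuple[str, str]]]) -> list[tuple[str, str, int, list[str]]]:
    """Merge per-subagent findings into (location, issue, votes, reporters), most agreed first."""
    votes: Counter = Counter()
    reporters: dict[tuple[str, str], list[str]] = {}
    for agent, findings in reports.items():
        for finding in set(findings):  # count each agent at most once per finding
            votes[finding] += 1
            reporters.setdefault(finding, []).append(agent)
    return sorted(
        ((loc, issue, n, sorted(reporters[(loc, issue)])) for (loc, issue), n in votes.items()),
        key=lambda row: (-row[2], row[0], row[1]),
    )
```

Agreement across independent hypotheses is a cheap prioritization signal, but it's not proof: a finding only one subagent reported can still be the real issue.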
Say I want to add a feature. I'd ask Codex to create a plan listing five, six, or ten different ways that feature could be implemented, and then I'd double down and ask Codex to create multiple subagents to give me some understanding of each approach. Sorry, my watch was constantly vibrating. So that's a quick high-level overview of how subagents work. By default, we ship three subagent personas. Let me open them. One is a default, general-purpose fallback agent. Another is a worker, which is execution-focused: you'd use it when you want Codex to implement a particular feature request or work on a particular feature. Then there's the explorer, which is the one we used before. On top of these, you can create your own Codex subagent personas, like we saw before, and we'll create one right now. For each of these subagents, you can define which model to use, which reasoning effort, which sandbox mode, and so on. This matters because for a review agent, you almost always want read-only mode; you never want your review agent to execute anything. For the same reason, for a cybersecurity vulnerability assessment, you'd want your subagent to always be in read-only mode. But for a docs writer, something that writes docs for a feature you've built or writes up a bug report, you do want to give it write access so that it can execute things and create the bug report.
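The rule of thumb here (reviewers and security analysts read-only, doc writers with write access) is easy to check mechanically. A sketch with hypothetical persona dicts: the sandbox values `read-only`, `workspace-write`, and `danger-full-access` follow Codex's `sandbox_mode` naming, but the `role` key and the role-to-mode policy are assumptions made for this example.

```python
# Policy assumed for this sketch: which roles may write to the workspace.
READ_ONLY_ROLES = {"reviewer", "security", "explorer"}
WRITE_ROLES = {"worker", "docs-writer"}


def check_persona(persona: dict) -> list[str]:
    """Return problems with a persona's sandbox setting (empty list means OK)."""
    name, role = persona["name"], persona["role"]
    mode = persona.get("sandbox_mode", "read-only")
    problems = []
    if role in READ_ONLY_ROLES and mode != "read-only":
        problems.append(f"{name}: overprivileged ({mode}) for role {role!r}")
    if role in WRITE_ROLES and mode == "read-only":
        problems.append(f"{name}: sandbox mismatch, role {role!r} needs write access")
    if mode == "danger-full-access":
        problems.append(f"{name}: full access should be a deliberate exception")
    return problems
```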
You can also give these subagents more capabilities through MCP access. For example, you can give one subagent MCP access to Sentry so that it can look through all of your reports there, or give another access to your Linear backlog so that it can read through all the issues assigned to you, triage them, and so on. You can also give them skills. So if you really want to, you can heavily customize this entire setup for your own use case.

Let's open our Codex app again. You can see that it went through all of these subagents, created a bunch of other subagents to go through everything, and came up with these findings: based on the README and CONTRIBUTING, the performance investigator is overprivileged, the verifier has a sandbox mismatch, the same goes for the writer, and so on. So this is already quite useful, and it saves you quite a bit of time compared with going through all of these individually or sequentially.

Now let's go back and look a bit more at custom subagents. As I mentioned, we ship three subagent personas, but you can also create your own. In fact, we recommend creating your own subagents, or just asking Codex to look through your past sessions and create subagents for you. Both approaches work quite well. In this particular case, you can see that we have a PR explorer subagent that reads your codebase. It uses GPT-5.3 Codex Spark, our text-only research preview model deployed on Cerebras, which is blazingly fast and a good fit for this use case. We set its sandbox to read-only, because we don't want the model to execute anything, and we give it certain instructions.
In this case we tell it to stay in exploration mode, trace the execution path, not propose any fixes, and just search through the code to figure out what exactly you're asking for.

Now let's try to create a subagent. Let's say we want a docs researcher. What I typically do is just ask Codex: "Hey, can you create this subagent for me? Here's its persona." Codex knows how subagents work and where all of these files are supposed to go, so it's going to create a TOML file for this docs reviewer. In this particular case it uses the docs MCP server that we built on the DX team, which packages all the API references, docs, guides, toolkits, and so on, and it adds that as an MCP server. So every time we ask it something like "What's the best way to use GPT-5.4 with WebSockets?" or "What's the best way to use GPT Realtime with (pick your favorite setup), and can you create a React plugin for this?", it will be able to reference all of that.

I'm going to let it do its thing and head back over to the slides in the meantime. To invoke a particular subagent, you can say something like "Hey, can you use a subagent to review based on the..."; in other words, you can reuse the same subagent and ask it to do the particular task you want. Now, what are some interesting ways to use this? Imagine you have a long build process or a test process. You can have a subagent that runs your test cases locally, you can have a subagent that always makes sure to... I'm being told that I don't have as much time.
You can have a subagent that pulls the latest changes from GitHub, or a subagent that quickly pulls all of the context from a Linear issue, and so on. So you can leverage this for a lot of things, and what I like to do best is to just ask Codex to look through my past sessions and recommend automations and subagents that I can use.

Cool, so now we're at the bleeding edge. This is a bunch of stuff we've shipped in the past but haven't really made much of a splash about, so we're going to quickly go around and see what each of these does and how you can leverage it. First and foremost is Guardian approvals. This is an experimental feature, and you can activate it today with the /experimental slash command: open Codex (hopefully it works), look at experimental, and in my case I already use Guardian approvals, but this is how you activate it. Why Guardian approvals? All of us, including myself, have at some point been guilty of running in YOLO mode all the time, which means you give your coding agent unfettered access to do literally whatever it wants, and that is by no means safe. Hence we came up with Guardian approvals, which steps in every time Codex needs to run a privileged task.
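The approve-or-escalate decision at the heart of Guardian approvals can be illustrated with a small reviewer function. To be clear, this is a swapped-in technique: the feature as described spins up a subagent that judges each privileged request from a prompt, whereas this sketch uses hypothetical hard-coded rules (deletions outside the workspace and binding to 0.0.0.0 go to a human, everything else is approved) purely to show the shape of the decision. It assumes an absolute POSIX workspace path.

```python
import posixpath
import shlex
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    ASK_HUMAN = "ask_human"


def _inside_workspace(target: str, workspace: str) -> bool:
    """True if target resolves strictly inside the (absolute) workspace."""
    # Unexpanded shell shorthand (~, $HOME) can't be checked, so treat it as outside
    if target.startswith("~") or "$" in target:
        return False
    ws = posixpath.normpath(workspace)
    path = posixpath.normpath(posixpath.join(ws, target))
    return path != ws and posixpath.commonpath([ws, path]) == ws


def review_privileged_command(command: str, workspace: str) -> Decision:
    """Toy, rule-based stand-in for the reviewer that approves or escalates.

    The real feature asks a model to judge each request; these rules are
    hypothetical and only illustrate the approve-or-escalate decision.
    """
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes: don't guess, escalate
        return Decision.ASK_HUMAN
    if not argv:
        return Decision.ASK_HUMAN
    if argv[0] == "rm":
        targets = [arg for arg in argv[1:] if not arg.startswith("-")]
        if targets and all(_inside_workspace(t, workspace) for t in targets):
            return Decision.APPROVE
        return Decision.ASK_HUMAN
    # Binding to every interface exposes the process beyond this machine
    if "0.0.0.0" in argv or any(arg.endswith("=0.0.0.0") for arg in argv):
        return Decision.ASK_HUMAN
    return Decision.APPROVE
```

Most everyday requests (running a dev server, deleting a build folder inside the repo) come back as APPROVE, which is the point: the human only sees the rare risky request, reducing approval fatigue.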
Let's say it's "can I remove this particular directory?", "can I run a server?", or "can I expose a particular file to the internet?". Whenever one of these comes up, Codex spins up a new subagent which, based on a particular prompt, tries to verify whether or not the action needs human intervention. In most cases it doesn't, so it will just say "go on, run this privileged tool or task". What we hope to do this way is reduce the approval fatigue that comes from always having to approve "do this task", "run this command", and so on. In principle, how would that look? Okay, it doesn't show it to me right now, but in the interest of time I'm going to ask "Hey, can you run the dev server?", and instead of full access (which, for some reason, I'm again not able to click on), let's see if it invokes Guardian approvals.

While this works, I'm going to head over to the next step, which is hooks. Hooks are also experimental right now, and we're working around the clock to make this a better experience. Currently Codex supports three hooks: one after each tool use, one at the start of a session, and a third when you stop a session. Hooks let you programmatically ask Codex to do thing X based on a particular event. Let's say that when you start your Codex session you want Codex to pull the latest from your GitHub repo; in that case you would set up a session-start hook. If you want Codex to do something after each tool use, for example if you're a researcher who wants to document every tool call, you might have a per-tool-use hook that documents what Codex has done in each session.

Last but not least, the one I personally use is the stop hook. When I'm running long-running tasks, at the end of each Codex turn I ask it to keep going, so that it just continues working on a particular task. In theory, how this would look is... sorry, one second, I was really prepared for having more time, I have to say. In theory, you have some sort of Python script, and you define a hooks.json. In this particular case you can see that you have a pre-tool-use hook, you have a matcher, and you say something like "on startup or resume, run this session_start.py script", and so on. For the sales dashboard example that I've been showing you so far, I created a stop hook that runs a Python script, keep_going.py. Every time it hits the stop event, it asks Codex to keep going: do one more pass, run one solid validating command, tighten one more thing, and then stop and give the result. So for really long-running tasks, you can just set it up and let it keep doing its own thing.

Last but not least, we have personality changes. You can go into Codex, look at personalization, and set up different personalities, for example a more friendly personality or a pragmatic one, depending on what you want to do. You can also add custom instructions, for example asking it to always cite whatever it is doing.

Then the last two things. We released something called Codex Security. This is our state-of-the-art model that lets you find and fix vulnerabilities in your GitHub projects. Essentially, it goes through commit by commit, creates a vulnerability patch, and uses Codex to patch those changes as well. Lastly, as I mentioned before, we released a Claude Code plugin that lets you use Codex in Claude Code. This has been surprisingly popular with the community. It lets you ask Codex to review whatever you've done so far, run an adversarial review, or ask Codex to rescue whatever changes you've made so far. That's it, thank you so much for joining us, and feel free to ask any questions you might have.

Hi. So we don't have a lot of time for Q&A, we should have started a little bit earlier, but we're happy to take a couple of questions in the room, and we'll stay here anyway, so if you have questions that we don't get to, you can come to us.

Yeah. So what you typically do: all of your Codex sessions are stored in a sessions folder inside the same .codex folder, and Codex can just scan through all your sessions and then do things with them. You can use the Codex app, the Codex CLI, anything; you just have to ask it to look through the sessions and do whatever you want.

Thank you.

Thanks. There's another one. Oh, okay, maybe a couple more. Yeah, in the back here.

Hi. Is there a way to hand off a task to a cloud agent? Let's say I'm working on a task here and I have to close my laptop.

Yes, definitely. We didn't touch on that, but you can actually do that from the Codex app directly (maybe you can show your screen). You can work locally, and, as you mentioned, we support Git worktrees as well, but you can also just select cloud here and choose the number of times the task should run in parallel. We call that best of N, so you can run it four times in the cloud and then just pick the best output. That's built into the Codex app and the IDE extension, and you can also access it directly from the web interface. And there's more cool stuff coming on that very soon.

More what?

There's more cool stuff coming on that very, very soon. I think there was one right here.

Yeah, thank you so much. My question was actually about the cloud UI as well, because today subagents aren't supported there, if I'm not wrong, and the thing that especially bothers me is that it doesn't use the skills that are in the repo. Is that coming soon?

So, at the risk of talking about the whole roadmap, we definitely have a lot more changes coming on that front. I'm not sure skills in the cloud will arrive as soon as I'd like to say, but it's definitely top of mind. We want to give you the ability to run your own trusted MCP servers or CLIs there, and also to have agents on a VM that you can hand off a particular task to. So lots of work on that.

It can use skills in the repo, right, if they're checked in? It's not on cloud tasks. Yeah, but it reads instructions and so on, so it can still find and see them since they're in the codebase; it's more the skills that you have locally. The reason we don't allow them in the cloud is that there's no way for the sandbox to know whether or not a skill is trusted, right? That's why we don't. A skill can package something like a Python script, or an executable. It won't execute things, but if you have things like resources, it can technically access them because they're in the repo. It's just not as good.

So I have to request it. Yeah, thank you.

Thank you. Were there any other questions? Cool. Have a great day, enjoy the day, and if you have any other questions, we're going to be around today, tomorrow, and maybe also on Friday, so feel free to reach out or just drop a DM. Enjoy, thank you.
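The keep-going stop hook described in the talk (a keep_going.py script registered in hooks.json) can be sketched in Python as below. This is an assumption-heavy sketch, not Codex's documented hook protocol: it assumes the hook runner reads a JSON decision from stdout, and the "continue"/"prompt" keys, the pass cap, and the counter file are all hypothetical. The cap is there so a keep-going loop cannot run forever.

```python
import json
import sys
import tempfile
from pathlib import Path

# Continuation prompt modelled on the one described in the talk
KEEP_GOING_PROMPT = (
    "Keep going: do one more pass, run one solid validating command, "
    "tighten one more thing, then stop and report the result."
)

# Hypothetical cap so the keep-going loop cannot run forever
MAX_PASSES = 5

# Hypothetical counter file that survives between separate hook invocations
STATE_FILE = Path(tempfile.gettempdir()) / "codex_keep_going_passes"


def stop_hook_response(passes_so_far: int, max_passes: int = MAX_PASSES) -> dict:
    """Let the session stop, or ask for one more pass.

    The "continue"/"prompt" keys are assumptions for this sketch,
    not a documented Codex hook schema.
    """
    if passes_so_far >= max_passes:
        return {"continue": False}
    return {"continue": True, "prompt": KEEP_GOING_PROMPT}


def read_passes(path: Path) -> int:
    """Read the pass counter, treating a missing or corrupt file as zero."""
    try:
        return int(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return 0


def main() -> None:
    # A real hook would likely also inspect the event payload it receives;
    # this sketch ignores it and only tracks how many passes it has requested.
    passes = read_passes(STATE_FILE)
    response = stop_hook_response(passes)
    # Count this pass, or reset once we let the session stop
    STATE_FILE.write_text(str(passes + 1) if response["continue"] else "0")
    # Assumption: the hook runner reads the decision as JSON on stdout
    json.dump(response, sys.stdout)


if __name__ == "__main__":
    main()
```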

Transcript