Building AI-native: Inside the stacks powering Cognition, Gamma, and Harvey

Building at the frontier of AI requires companies to constantly re-architect their systems, as rapidly improving model capabilities and user expectations necessitate fundamental shifts in product design.
A key challenge for AI-first companies is adapting from individual productivity gains to organizational impact, focusing on observability, infrastructure, and flexible architectures that balance speed and quality.
The future of AI products lies in proactive, agent-mediated intelligence, enabling human-agent and multi-agent collaboration, and transforming workflows from self-driving codebases to integrated communication platforms.

Rapid Iteration and Re-architecture: AI companies must be prepared to fundamentally re-architect their platforms with each new wave of AI progress (e.g., foundation models, reasoning models, coding agents), treating architectural decisions as temporary.
Prioritize Observability and Internal Tooling: Invest heavily in logging, observability, and internal tools to understand agent decisions and system behavior. This enables quick adaptation to new model capabilities and informed re-architecture.
Leverage Autonomous Agents: Bet on autonomous agents that can plan, execute, test, and fix issues independently, especially with improvements in long-horizon autonomy and native file system usage for memory and planning.
Shift from Speed to Quality (with user choice): While early AI breakthroughs emphasized speed, user expectations now prioritize accuracy and efficacy. Design flexible generation architectures that allow users to choose between speed and quality based on their specific workflow needs.
Embrace Model Context Protocol (MCP) Connectors: Expose your AI product as an agent inside other widely used platforms and tools through MCP connectors. This broadens distribution, opens a new user acquisition channel, and lets users keep working without leaving their existing environments.
Focus on Infrastructure for Enterprise Adoption: When dealing with sensitive data, prioritize robust infrastructure that addresses ethical walls, data boundaries, and security before fully offloading complex tasks to agents.
Prepare for Agent-Mediated Workflows: Design products and services to be delightful and useful for both human and AI agents, anticipating a future where communication, collaboration, and tasks are increasingly mediated by intelligent agents.
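
The observability takeaway above (log each agent decision, replay it, and check whether a new model decides better) can be sketched roughly as follows. This is a minimal illustration, not any panelist's actual system: `DecisionRecord`, `DecisionLog`, and the `candidate` callable are hypothetical names, and a real setup would persist the log and score the replayed outputs with evals rather than a plain equality check.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List


@dataclass
class DecisionRecord:
    """One agent decision, stored with enough context to replay it later."""
    step: str
    model: str
    prompt: str
    output: str
    timestamp: float = field(default_factory=time.time)


class DecisionLog:
    """Append-only, in-memory log of agent decisions (hypothetical helper)."""

    def __init__(self) -> None:
        self.records: List[DecisionRecord] = []

    def record(self, step: str, model: str, prompt: str, output: str) -> DecisionRecord:
        rec = DecisionRecord(step=step, model=model, prompt=prompt, output=output)
        self.records.append(rec)
        return rec

    def to_jsonl(self) -> str:
        """Serialize the log as one JSON object per line."""
        return "\n".join(json.dumps(asdict(r)) for r in self.records)

    def replay(self, candidate: Callable[[str], str], candidate_name: str) -> List[Dict]:
        """Re-run every logged prompt through a candidate model and pair the outputs."""
        results = []
        for rec in self.records:
            new_output = candidate(rec.prompt)
            results.append({
                "step": rec.step,
                "old_model": rec.model,
                "old_output": rec.output,
                "new_model": candidate_name,
                "new_output": new_output,
                "changed": new_output != rec.output,
            })
        return results


# Usage: log a decision, then replay it against a stand-in for a newer model.
log = DecisionLog()
log.record("plan", "model-a", "Fix the failing test", "edit utils.py")
diffs = log.replay(lambda prompt: "edit tests.py", "model-b")
```

The point of the design is that every prompt and output is captured at decision time, so a model upgrade becomes a replay over real history instead of a guess about whether a prompt tweak helped.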

Frontier AI Companies: Businesses that are developing or leveraging the most advanced artificial intelligence technologies.
Application Layer Companies: Companies that build software applications on top of existing AI models or infrastructure, often tailored for specific industries or use cases.
Autonomous Agents: AI systems capable of perceiving their environment, making decisions, and taking actions to achieve specific goals without constant human intervention.
LLM Instruction Tuning: The process of fine-tuning a Large Language Model to better understand and follow instructions or prompts, improving its performance on given tasks.
LLM Tool Calling: The ability of a Large Language Model to determine when and how to use external tools (like APIs or functions) to retrieve information or perform actions.
Agent Orchestration: The process of coordinating and managing multiple AI agents, or an agent's internal steps, to work together towards a complex goal.
MCP (Model Context Protocol): An open protocol for connecting AI assistants to external tools and data sources. In the talk, MCP connectors let a product run as an agent inside platforms where target users are already active, enabling new acquisition channels and workflows.
Ethical Walls: Internal policies and technical controls used to prevent information sharing or conflicts of interest within an organization, especially in fields like law.
Observability: The ability to understand the internal state of a system by examining its external outputs, logs, and metrics, crucial for debugging and improving AI systems.
Jevons Paradox: The economic observation that as technology makes the use of a resource more efficient, total consumption of that resource tends to increase rather than decrease.
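
The ethical-walls entry above, together with the panel's point that data boundaries should be hard limits rather than something an agent is trusted to manage, can be illustrated with a small sketch. Everything here is hypothetical (`Document`, `Principal`, `fetch_for`, and the matter names are made up for illustration); the only idea taken from the talk is that the check runs in ordinary code before any data reaches the agent.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Document:
    doc_id: str
    matter: str  # the client matter this document belongs to


@dataclass(frozen=True)
class Principal:
    """A human or agent identity with an explicit allow-list of matters."""
    name: str
    allowed_matters: FrozenSet[str]


class EthicalWallError(PermissionError):
    """Raised when a principal requests data from behind an ethical wall."""


def fetch_for(principal: Principal, docs: Iterable[Document], matter: str) -> List[Document]:
    """Return documents for one matter, refusing before any data is read.

    The boundary is enforced here, in deterministic code, instead of by
    instructing the model to ignore documents it should not see.
    """
    if matter not in principal.allowed_matters:
        raise EthicalWallError(f"{principal.name} is walled off from matter {matter!r}")
    return [d for d in docs if d.matter == matter]


# Usage: an agent staffed on the Pepsi matter cannot retrieve Coca-Cola documents.
corpus = [Document("d1", "pepsi"), Document("d2", "coca-cola")]
pepsi_agent = Principal("agent-1", frozenset({"pepsi"}))
visible = fetch_for(pepsi_agent, corpus, "pepsi")
```

Requesting `"coca-cola"` with `pepsi_agent` raises `EthicalWallError`, so the retrieval layer fails closed no matter what the agent was prompted to do.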

and venture partnerships of Anthropic, Beth Robertson.

Hello everyone, my name is Beth Robertson, and I lead the startup team here at Anthropic, and I am so excited to have all of you in the room with us today. Now, everyone in this room is working through the same architectural questions that come with building frontier AI companies, and these three humans next to me are absolutely no strangers to them. So I'm excited today to take us on a tour of the bets that they've made and some of the nuances that they've had to navigate as they build at the frontier. So, before we kick things off, I'd love to invite us to just go down the line. Who are you? Introduce yourself, please. What's your day job? What does your business do? And what is one core bet that you took when founding your company?

Core bet, all right. Well, first of all, Beth and Anthropic, thanks for having us at an incredible event. I'm Nico Gruppen. I lead applied research at Harvey. Harvey is the generative AI platform for legal and professional services. And, yeah, I mean, I think Harvey, and I would argue this is the case for most application layer companies, is really a large bet that model capabilities are going to improve really rapidly, and that those capabilities are going to generalize well to the legal vertical. Right? A lot of people don't know this: when I joined Harvey, we were living and working out of an Airbnb. Gabe and Winston were using what I think we would call small models now to essentially answer personal legal questions on Reddit. Right. And so it's really this wave of exponential progress at the model layer that's allowed us to raise the ceiling on our ambition as a company.

Amazing.

Yeah. Again, also, thank you for having me. I'm Walden. I'm one of the co-founders of Cognition, and we build coding products like Devin and Windsurf. And I think the key bet that I think of when it comes to Devin was really this bet on autonomous agents.
And this was even before we had agents to start with. So in many ways, our product didn't work when we first came up with it. But the vision of having something that didn't just write the code for you, but actually had its own computer, and would run the code, and then actually pull up a desktop, and test it, and tell you when it isn't working, and fix its issues, and give you a finally working PR at the end of it, very much was not possible with the set of models we had two years ago. But lots of incremental things changed and improved since then. I think around this time last year, you saw a lot of natively agentic models, like Sonnet 3.5/3.6, come out. That helped a lot. Computer use and testing capabilities are becoming more common nowadays. And I think especially with the recent models we're seeing, there's a long-horizon autonomy that's becoming newly possible, where you can have it run for hours on end. And you start to feel bottlenecked by trying to run too many agents locally. And so we've seen an explosion in cloud agent usage this year. I think one crazy stat we've seen as a result of the new model capabilities is that, compared with our best week of 2025, at the end of 2025, the amount of agent usage has grown five to seven X among our customers just so far in 2026. So it looks like it's going to be an explosive year for cloud agents. And the model capabilities are only going to keep growing from here.

Amazing. Well, we'll get back to the current bets that you're taking in a minute. But let's go down to you, Jimmy.

Excited to be here. First of all, especially excited to be here with all of you folks. We're big Devin users, we're big Claude users, we're big Gamma users. And Nico, we're going to have to figure out how to become big Harvey users.

I'm going to get you on Harvey. I'm going to get you on.

Hey, everyone, I'm Dini Fatihah. I'm head of AI product at Gamma. Gamma is a visual communication platform for professionals. We have 70 million users and growing.
Gamma started with the simple observation that when people like us are communicating our highest-stakes ideas, we're usually doing it through a visual artifact: presentations to your investors, proposals to your customers, marketing sites, social posts to spread your ideas. And what do we do when we have to communicate those ideas? We spend like 10% of our time thinking about the core insight and 90% of the time on the design and the formatting and the futzing with the details. Gamma's big bet was that with AI we could take away that 90% of futzing that people spend their time on. You, as a professional, bring the core insight that only you are capable of bringing. Gamma will take it, flesh out the idea, structure the narrative, design it, and make it look beautiful so that your ideas can be cast in the best light possible.

Yes, and it's a beautiful experience. I thank you all for building what you built. We love it. I want to just go back to the origin story before we get to the present. There was presumably a window where you just had an idea and it was finally made possible. Talk to me about what was happening in the ecosystem, what shifted that made it possible for the first time for your product or service to exist. Was there a moment? What was the moment?

I'm happy to take the first stab at this. To be honest with you, I do think there was a first moment, but I think honestly with every big wave in AI progress, Gamma's had tailwinds that have also evolved our product. Gamma started in 2020, before the recent AI wave had really hit and landed. When image models started to come out and get good, and when LLM instruction tuning started changing how we interact with LLMs and what we can get out of them, that's when Gamma really had an aha moment. That's the inflection point that birthed Gamma as we know it today. Over time, again, like I said, tides of AI progress have evolved how we see and build our product.
The next big thing that I think we did was when LLM tool calling started getting really good, we jumped on it. We built our first agentic experiences. To this day, editing using our agent is one of our biggest differentiators, and that happened because of the big wave in LLM tool calling and agent orchestration. A big one for us as well was the MCP wave. Again, we leaned into that early, and we leaned into it heavily. It allowed us to build connectors into several other platforms. Actually, I think we built our first connector into Claude. And it really changed how we think about distribution, not just our product but our GTM, because all of a sudden we had Gamma as an agent in other surfaces and tools that our target users loved and were already living in. And so what we started seeing is that our existing users were now able to use Gamma much more frequently and regularly, because they didn't have to break their workflow and go to Gamma to continue their work. They were just doing it already in Claude, and just hitting, like, yes, please make a presentation out of this. Not just that, but for new users it was also a huge acquisition channel for us. Users started discovering us through platforms like Claude because that's where they were already working and living. So that was another, I would say, evolution that happened because of MCP. And honestly, we're thinking about the next evolution of our product already, now with communication being agent-mediated. We're asking ourselves, what does Gamma look like in this next evolution? And that's going to be something that we spend time on this year as well.

So is the MCP acquisition channel your primary way that you're getting customers today? Sorry. Is the MCP acquisition channel your primary product-led growth motion today?

Oh, it's a huge one. I don't know if it's primary.
And because of MCP, we have connectors now not just in Claude, but in several other B2B tools. So it's becoming a really good acquisition channel.

Yeah. Fascinating. How about you, Walden, for Cognition?

Yeah. One of the things I think you learn when you're building agents starting in, like, 2024 is that you build a lot of things that you go delete. And so one of the funniest examples of this: models back in the day didn't know how to edit code. They could only spit out entirely new files of code. And so you had to go to really creative measures to get these things to edit code. And if you look into it, it's actually really interesting tech that companies like Cognition worked on to make this happen, like a combination of speculative decoding and inference techniques. But it all changed once models were actually natively RL'd to be able to edit code and grep code on their own. And so that was a huge leap for us. I think another one that people underestimate is just how natively these things use file systems now. Before, we had a whole custom planning system for the ability to create these long-horizon trajectories and follow the steps cleanly. And now you can kind of just tell the agent, write your plan down and follow it. And it'll know how to look at the file system and figure it out. Same with memory. I think there's probably a lot of you in the audience right now thinking about how to make memory for your agents, and we built a lot of very custom systems before. Now you can use your file system in a much deeper way. People are moving from RAG to file systems. So I think that's probably a pretty actionable thing that we found about how you should build your models and agents now that those capabilities have improved. And it's unlocked a lot of long-horizon work you can do with Devin.

I love that. So, autonomous agentic work. Yeah. Let's go over to you, Nico.

Sure.
No, I think actually I love the phrase Walden used during the intros, which is that model capabilities make things that were impossible possible. So for Harvey, I think there were really three inflection points. The first is just the emergence of foundation models, the scale leading to kind of emergent reasoning capabilities that brought us from this, like, Reddit legal Q&A world to solving BigLaw legal work. Second, of course, reasoning models, kind of late 2024, early 2025. That was really where we started working on what I would call workflow automation. So with enough elbow grease, enough predetermined model calls, retrieval steps, and search steps, you can really solve any individual task. And then this past winter, I think Opus 4.5 was really the tip of the spear here with coding agents. You just offload planning and orchestration to these models. And I think the coolest part about it for us at Harvey is not just what happens at the individual agent level, but with multi-agent coordination. So work in law firms is very hierarchical. You have a partner who's working with Dario or your GC on a months-long project. They're going to decompose that into a number of tasks they give to their senior associates, who are going to complete those tasks over weeks. Those associates are going to break them down into tasks they can give junior associates to complete on the order of days. And now with models like Opus 4.7 and increasingly infrastructure to orchestrate agents like CMA, I don't know if Jess and Tim are in the audience right now, but it's been awesome working with them to onboard to this managed agents infrastructure. You can model agentic systems in the same kind of hierarchical fashion, which unlocks massive potential for our work.

So you're able to outsource a bunch of tasks.

Yeah, exactly.

That's beautiful. Thank you for sharing the early bets.
I think now, just looking backwards at those bets that you made, for better or worse, I think we all in hindsight may have changed some things. What would this have been for your early bets, and what did you learn the hard way that makes that true for you today? As a word of caution to anybody setting this up.

I'm happy to start here because I just mentioned these three inflection points. What I didn't mention is that for each of those three inflection points, we had to completely re-architect. A lot. Right. And so I think the hard-earned lesson here is you can't make point-in-time decisions and then stick to them. This is kind of a new playbook for engineering, versus one where you want to cut things once and then just scale it. You have to be able to project forward progress and, in many cases, exponential progress. And so I think the coding agent example is a great one, where if you had asked me six months ago what our architecture looked like, it was fundamentally different than what it looks like today. And if we hadn't been willing to say, we need to scrap this and go agent-native, we just wouldn't have these capabilities in our platform. The ground is literally shifting under us.

Is your planning timeline truncated, then? Is it, like, months or weeks? Or how do you think about that?

So we still plan on a quarterly basis, but we have baked into the week-to-week execution retros that say, like, hey, do we need to just totally deprioritize this and prioritize this other thing in its favor?

Yeah. It's like, you need to be able to do that on a weekly basis.

Yeah, totally. That's kind of the way of life of building in AI right now: you have to accept that the thing you build today is very likely going to be scrapped in six months to a year, especially if you're keeping up with the times and doing things correctly.
And so I don't look back and regret the fact that we did any of the previous things we did and had to get rid of them. I think the part that was really important to invest in, and that we should have invested more in, if anything, was building everything you need to make sure you can keep changing your product as the quality of the models changes. And so that actually looks like the underlying logging, observability, and the ability for your engineers to dig into any decision your agent makes, replay that, figure out whether new models make that decision better, and have evals for those things. The easier it is for you to answer these questions about your system and why something went wrong, the easier it is for you to say, okay, new model capabilities mean that we have to do this differently now. We have to re-architect this. And I've seen a lot of companies that kind of build in the dark. They add a prompt and they're like, oh, hopefully this makes things better. But you really don't know. And so if you want to reliably work across new model upgrades and different models, the observability and your internal tooling are super, super important to get right.

Got it. So observability, and don't lock into anything for too long because everything changes. How about you?

I mean, echoing a lot of what Nico and Walden are saying, we're actually living through this right now, where we're re-architecting, re-orchestrating our entire generation architecture. And this is because, if I take us back in time, when we first built our generation system in Gamma, the breakthrough was speed, right? You come to Gamma, you put in a prompt, you choose some settings, you hit generate, and you go from a blank page to a beautifully designed deck in a minute. And I remember the first time I experienced that as a Gamma user, and I thought it was mind-blowing. Like, that was the breakthrough, right?
When, you know, the recent wave of AI hit, I feel like speed was the thing that everybody was so mesmerized by. But things have changed and evolved. Not only has AI gotten much smarter in terms of its ability to self-critique and coach the user and push back and reason and research and so much more, user expectations have evolved right along with that as well. So today, depending on the workflow, a lot of the time users are willing to wait. Instead of having a "wow, that was fast" moment, they want to have a "wow, it really nailed it" moment. And they're willing to wait for that moment. And so now we're re-architecting our system to be able to take advantage of the latest capabilities of AI, at the expense of speed in some cases. Because the nuance here is that it isn't speed versus quality, it's always speed and quality. I just told you all about how, for generation, people are willing to wait longer. We find that when they're editing and making tweaks, they want snappy, responsive, fast, at the cost of quality sometimes. So we're taking a step back and figuring out, how do we build a system where it's not speed versus quality? These are all parameters that we can dial up and down and often, depending on the workflow, pass that choice on to the user, so they can decide what they're in the mood for today and what they have time for today. So designing a system that's extensible across those parameters and flexible is the exercise that's happening again.

And are you using, like, different models to be able to play this whole field?

Totally. We were always using different models, but I think we were almost optimizing them for a certain outcome.
And what we're trying to do is, well, we have a whole layer of models that we're orchestrating, but now we're also trying to give the user the power to decide: you know what, today I need it fast. I don't have time. I have customers who are in their car who say, make me a deck, I'm on my way to my customer. So that's when you need speed over quality. And we want to give that power to the user to decide.

Fascinating. I love that. You brought us to the topic that I was going to logically flow to next, which is, what is the big bet that you're making today? That you're kind of betting the next three months on. We'll give it a shorter time horizon. Or we can pivot it. What's a big bet that you're making today that founders in this room should be paying attention to?

I'm happy to answer the bet question as well. No, I'm very excited for this next year. It feels like cloud agents have been becoming very possible over the last few months. And I think the thing that everyone has really high demand for, whether you call it self-driving codebases, whether you call it the software factory, is just: how can you take everything that you do as a software org and automate as much of that as possible? And then whatever does need a human to review, to look at, have the AI lift that up to a human. You're going from a world where the default is the human at the driving wheel and AI can take over, to one where the AI is the one driving the projects end to end, doing the planning, the coding, the reviewing, and the testing, and figuring out when it actually needs to pull the human in. And when you make this shift, that's absolutely crazy. And it significantly increases the amount of work you can do as a software org. So I've been telling people I've been meeting today, Cognition as a company, we're like 50 engineers right now, but every engineer has like 10 Devins that they're using to do everything they need to do.
And the role of the software engineer has to change when your codebase becomes self-driving and you have a self-maintaining codebase. There are a lot of big companies in the world who have to make this change over the next three years. And a lot of what we're doing is going in and partnering with them to help them figure out how you restructure the way you think about coding and about project management to get to this point. But there are a lot of really cool automations and setups you can do to actually get you to this frontier. It's what we're hearing from all our customers, and that's what we're building on right now.

Yeah, I mean, I definitely agree with this transition from reactive intelligence to proactive intelligence. Where my brain went with this is actually more back to the lessons-learned part of this, which is that individual productivity gains from AI, distributed widely, do not equal organizational productivity gains. Right? And so this is almost exactly what Walden was just describing: the nature of your role changes. If you move 10x faster, that means you can also move in the wrong direction 10x faster, or make mistakes 10x more quickly, or the blast radius is 10x larger. So how do you actually move one layer of abstraction up in the decision-making process? The engineering equivalent here would be, like, does this architecture scale, is it secure, et cetera. I think we're going to see that broaden into more general knowledge work as well. And then I think the part that I'm really excited for Harvey to play in this is collaboration. Like, what does the interface for that even look like? Right? You want to enable lawyers to collaborate within their firm, you want to enable lawyers to collaborate with their clients outside of their firm, and then you want humans to be able to collaborate with agents, all in the same workspace.
I think it's a really interesting problem from all aspects: product, AI, infrastructure. And I think it's going to be a big part of the rest of our year and going into next year.

And knowing that, like, I know that you're planning three months ahead, how are you thinking about that for Harvey? The personal productivity versus organizational productivity point stands out to me.

Yeah. So this actually is where, I know the tendency, especially on Twitter, is just to lean into the bleeding edge at the frontier of model intelligence and try to push it. This is actually where we're taking a step back and focusing on infrastructure first. So to give you an example, if you have agents running around in these workspaces, there are certain data constraints that law firms have. This is extremely sensitive data, right? Client data. One thing we run into pretty frequently is what's called ethical walls. So imagine Walden and I are associates at the same law firm. You could be representing Pepsi as a client. I could be representing Coca-Cola. We then get staffed on the same project. How do you make sure there's no data contamination, right, of this very sensitive client information? That's maybe not something that you want to immediately offload onto an agent to manage. You want hard data boundaries there. Same thing with jumping between public and private data sources. So that's the immediate emphasis. And then I think you back out from there into really interesting AI concepts like memory. That's the thing.

Incredible. Do you have any thoughts on the future, like bets that are going to happen next?

I mean, adding to what Nico was saying, yeah, I think there are two bets, slash things founders should be looking out for, that are super top of mind for me. And it echoes some of what's been said. Well, the first one is a little bit different, which is, I think this year we really want to...
I mean, taste is the new buzzword in Silicon Valley. And guess what? At Gamma we've been agonizing over taste for years. And so we're going to continue down that route. We're really thinking about how to increase the visual range of what you can do in terms of design in Gamma. We've always been doing it. We're going to double down on it a lot more. And that's going to be super critical for us in the next few months. But also there's the thing that both Nico and I mentioned earlier, which is that we are increasingly in a world where agents are everywhere. They're helping us and they're mediating our world. And we're thinking about this very deeply. Like, what does Gamma look like in a world where communication is agent-mediated? Where agents are supposed to be able to use your product just as humans do, how do you make your product delightful and useful both for humans and agents and for human-agent collaboration? So that's going to be something that we're spending a lot of time on this year and a big area of focus. And I think something that I haven't seen enough people think about is, what does your product look like? Does your product exist in its current form in a world where it's all agent-mediated? I think that's something that everybody should be thinking about.

A lot to chew on. OK, I'm looking at the timer, and we are going to move to our lightning round. He's excited. OK, one thing that AI has solved in your personal life that's been a massive unlock. Go. We're just going to have to... Five words. I mean, you guys, popcorn. Five words.

All right, this is somewhat embarrassing. I've offloaded my entire weekly meal planning and day-to-day diet to Claude Code. I'm not kidding. I have a log on my local machine.

And what? Are you eating better than ever?

Oh, yeah. I, like, massaged all sycophantic behavior out of the model with the Claude.

Oh, massive unlock.
OK, that was more than five words.

Yes, sorry.

Travel planning. My mom.

That's a good one. Travel planning. What was it? Travel planning? Yeah. Love it.

I'm going to give you more than five words. I ran a concert at my home a few weeks ago, and Claude actually was the event planner for it. It found me caterers. It took care of my decor. It advised me on the setup. It was amazing.

You guys are so cool. It's like travel, food, and concerts at your house. I love that. OK, one AI prediction in the next 12, or let's do six, months that you'd bet money on.

Companies are really leaning into a multitude of AI tools. Like I said, we're using so many different tools, even in the same sphere. I think pockets are not going to be deep for long. So I think there will start to be AI tool consolidation in the next 12 months.

I think there will be a greater demand for software engineering jobs in a year than there is today, even if they're not called software engineering jobs. Some kind of technical role. I think that might be a hot take in Silicon Valley today, but I do believe it.

Love it.

I actually don't think that's that hot of a take. I think Jevons paradox is real across all of knowledge work, and we're going to see that this year. I'm going to piggyback on Walden's point earlier. I think proactive intelligence, and specifically ambient intelligence, where the agents are just doing things on your behalf without you even knowing, is going to be a big theme, especially in consumer products.

Feels right. OK, more compute or better evals?

Better evals.

Yeah, I want both. But better evals in the short term.

OK, I'll be the bitter lesson person here and say more compute. And more compute will help you build better evals.

Love it. One generalist agent, or many specialists?

I'm going to say false dichotomy.

Yeah. Yeah.

Thank you, Fish. A specialized agent is just a very nice generalized agent with the right skills and tools.
It's a big, big center.

OK, last one. Founders in this room, parting advice, in five or fewer words. Let's go this way this time.

I think ship it before it's ready.

OK.

Prioritize hiring great people.

Love it. That's a good one.

I would say seek uncomfortability.

I love it. Well, thank you all for being here. We're so grateful. Thank you guys for joining us. We're grateful. Thank you.
