- The rapid advancement of AI was fueled by a shift away from the cautious "AI winter" mentality, embracing ambitious scaling laws that demonstrated consistent performance gains with larger models.
- Early AI safety efforts, including the "Concrete Problems in AI Safety" paper, aimed to ground abstract concerns in practical machine learning, building consensus across institutions.
- Anthropic's Responsible Scaling Policy (RSP) is presented as a foundational, iterative framework designed to integrate safety as a core product requirement, ensuring models are tested and secured at various development thresholds.
Building Anthropic | A conversation with our co-founders
- The prevailing "AI winter" mindset initially discouraged ambitious AI visions, which was gradually overcome as empirical evidence for scaling laws emerged.
- Scaling laws revealed that simply making AI models larger, by increasing compute and parameters, led to eerily consistent performance improvements across diverse tasks.
- Language models, particularly when combined with Reinforcement Learning from Human Feedback (RLHF), were identified as a promising path to align AI systems with human values by enabling them to understand implicit knowledge.
- The "Concrete Problems in AI Safety" paper (circa 2016) was a strategic effort to build consensus around AI safety by presenting practical problems grounded in contemporary ML, thereby making safety a credible area of research.
- "Constitutional AI" leverages the ability of large language models to read and internalize principles, effectively acting like a "multiple-choice exam solver" for ethical guidelines.
- Anthropic's Responsible Scaling Policy (RSP) functions as an internal "holy document" that establishes thresholds for model capabilities, requiring specific safety tests and security measures at each stage of development.
- The RSP ensures organizational alignment by making safety a non-negotiable product requirement, clarifying expectations, and preventing both excessive caution and uncalibrated "fire drills."
- Developing and implementing effective AI safety policies like the RSP is an iterative, challenging process that frequently uncovers unforeseen "gray areas," necessitating continuous adaptation and refinement.
AI winter — A period of reduced funding and interest in artificial intelligence research, typically following periods of over-optimism.
Google Brain — A research division at Google focused on deep learning and artificial intelligence.
OpenAI — An AI research and deployment company that aims to ensure artificial general intelligence benefits all of humanity.
scaling laws — Empirical relationships observed in AI that describe how model performance improves predictably as resources like compute, data, and model size increase.
GPT-2 — A transformer-based language model developed by OpenAI, notable for its ability to generate coherent text.
GPT-3 — A successor to GPT-2, even larger and more powerful, capable of performing a wide range of natural language tasks.
Anthropic — An AI safety and research company, founded by former OpenAI members, focused on building reliable, interpretable, and steerable AI systems.
language models — AI models designed to understand, generate, and process human language.
RLHF — Reinforcement Learning from Human Feedback; a technique used to align AI models with human preferences by training them with human-generated feedback.
Constitutional AI — A method for aligning AI models by giving them a "constitution" of principles or rules, which they use to evaluate and refine their own outputs.
Responsible Scaling Policy (RSP) — A framework developed by Anthropic to manage the risks associated with increasingly powerful AI models, setting thresholds for capabilities and requiring safety measures.
ImageNet — A large visual database designed for use in visual object recognition software research.
GPUs — Graphics Processing Units; specialized electronic circuits designed to rapidly manipulate and alter memory to accelerate the creation of images, crucial for AI training.
arXiv papers — Pre-print research papers published on arXiv.org, a repository for electronic preprints of scientific papers in fields like physics, mathematics, computer science, and more.
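As a rough illustration of the "scaling laws" entry above, one commonly reported form in the scaling-law literature is a power law relating test loss to model size. The symbols here are illustrative assumptions, not something stated in the conversation: $L$ is test loss, $N$ is the number of parameters, and $N_c$ and $\alpha_N$ are constants fitted to experiments.

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}
\quad\Longleftrightarrow\quad
\log L(N) \approx \alpha_N \left(\log N_c - \log N\right)
```

On a log-log plot this is a straight line with slope $-\alpha_N$, which is one way to read the "eerily consistent" improvements described in the conversation.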
Why are we working on AI in the first place? I'm just going to arbitrarily pick Jared. Why are you doing AI at all? I mean, I was working on physics for a long time, and I got bored, and I wanted to hang out with more of my friends. I thought Dario pitched you on it? I don't think I explicitly pitched you at any point. I just kind of showed you results of AI models. And I was trying to make the point that they're very general and they don't apply to only one thing. And then at some point, after I showed you enough of them, you were like, oh yeah, it seems like it's right. How long had you been a professor when you started? I think like six years or so. I think I helped recruit Sam. I talked to you and you were like, I think I've created a good bubble here, and my goal is to get Tom to come back. And then it worked. And did you meet everyone through Google when you were doing the interpretability stuff, Chris? No, so I guess I actually met a bunch of you when I was 19 and I was visiting the Bay Area for the first time. So I guess I met Dario and Jared then. I guess they were postdocs, which I thought was very cool at the time. And then I was working at Google Brain and Dario joined. And we sat side by side, actually, for a while. We had desks beside each other. And I worked with Tom there as well. And then of course, I went to work with all of you at OpenAI when I went there. Yeah. So I guess I've known a lot of you for more than a decade, which is kind of wild. If I remember correctly, I met Dario in 2015, when I went to a conference you were at and I tried to interview you, and Google PR said I would have to have read all of your research papers. Yeah, I think I was writing "Concrete Problems in AI Safety" when I was at Google. I think you wrote a story about that paper. I did.
I remember right before I started working with you, I think you invited me to the office to come chat and just tell me everything about AI. And you explained, and I remember afterwards being like, oh, I guess this is much more serious than I realized. And you were probably explaining the big blob of compute and parameter counting and how many neurons are in the brain, everything. I feel like Dario often has that effect on people: "This is much more serious than I realized." Yeah, I'm the bringer of happy tidings. But I remember when we were at OpenAI, there was the scaling law stuff and just making things bigger, and it started to feel like it was working. And then it kind of kept on eerily working on a bunch of different projects, which I think is how we all ended up working closely together, because it was first GPT-2, and then scaling laws and GPT-3, and we ended up being the blob of people that were making things work. Yeah. That's right. I think we were also excited about safety, because there was sort of this idea that AI would become very powerful, but potentially not understand human values or not even be able to communicate with us. And so I think we were all pretty excited about language models as a way to kind of guarantee that AI systems would have to understand implicit knowledge. And RL from human feedback on top of language models, which was the whole reason for scaling these models up: the models weren't smart enough to do RLHF on top of. So that's the kind of intertwinement of safety and scaling of the models that we still believe in today. Yeah, I think there was also an element of, like, the scaling work was done as part of the safety team that Dario started at OpenAI, because we thought that forecasting AI trends was important to be able to have us taken seriously and to take safety seriously as a problem. Correct.
Yeah, I mean, I remember being in some airport in England sampling from GPT-2 and using it to write fake news articles and Slacking Dario and being like, oh, this stuff actually works and might have huge policy implications. I think Dario said something like, yes. That's a tough one. But then we worked on that a bunch, as well as the release stuff, which was kind of wild. Yeah, I remember the release stuff. I think that was when we first started working together. Yeah. That was a fun time, the GPT-2 launch. Yeah, but I think it was good for us, because we did a kind of slightly strange, safety-oriented thing all together, and then we ended up doing Anthropic, which is a much larger, slightly strange, safety-oriented thing. That's right. So I guess just going back to the concrete problems, because I remember, so I joined OpenAI in 2016, one of the first 20 employees or whatever, with you, Dario. And I remember at that time "Concrete Problems in AI Safety" seemed like it was the first mainstream AI safety paper. Yes. I don't really know if I ever asked you what the story was for how that came about. Chris knows the story because he was involved in it. I think we were both at Google. I forget what other project I was working on. But as with many things, it was my attempt to procrastinate from whatever other project I was working on, and I've now completely forgotten what it was. But I think it was like, Chris and I decided to write down what are some open problems in terms of AI safety. And also, AI safety is usually talked about in this very kind of abstruse, abstract way. Can we kind of ground it in the ML that was going on at the time? I mean, now there's been like six, seven years of work in that vein, but that was almost a strange idea at the time. Yeah, I think there's a way in which it was almost a kind of political project, where at the time a lot of people didn't take safety seriously.
So I think that there was sort of this goal to collate a list of problems that people agreed were reasonable, that often already existed in the literature, and then get a bunch of people across different institutions who were credible to be authors. And I remember I had this whole long period where I just talked to like 20 different researchers at Brain to build support for publishing the paper. In some ways, if you look at it in terms of the problems and a lot of the things it emphasized, I think it hasn't held up that well, in that I think it's not really the right problems. But I think if you see it instead as a consensus-building exercise, that there's something here that is real and that is worth taking seriously, then it was a pretty important moment. I mean, you end up in this really weird sci-fi world, where I remember at the start of Anthropic we were talking about constitutional AI. And I think Jared said, oh, we're just going to write a constitution for a language model and that'll change all of its behavior. And I remember that sounded incredibly crazy at the time. But why did you guys think that was going to work? Because I remember that was one of the first big research ideas we had at the company. Yeah, I mean, I think Dario and I had talked about it for a while. I guess I think simple things just work really, really well in AI. And so I think the first versions of that were quite complicated, but then we kind of whittled it away to: just use the fact that AI systems are good at solving multiple-choice exams, and give them a prompt that tells them what they're looking for. And that was kind of what we needed. And then we were able to just write down these principles. I mean, it goes back to the big blob of compute, the bitter lesson, or the scaling hypothesis.
If you can identify, you know, something that you can give the AI data for and that's kind of a clear target, you'll get it to do it. Here's this set of instructions. Here's this set of principles. AI language models can read that set of principles and they can compare it to the behavior they themselves are engaging in. And so you've got your training target there. So once you know that, I think my view and Jared's view is there's a way to get it to work. You just have to fiddle with enough of the details. Yeah, I think it was always weird for me, especially in these early eras, because I was coming from physics. And I think now we forget about this because everyone's excited about AI. But I remember talking to Dario about concrete problems and other things, and I just got the sense that AI researchers were very, very kind of psychologically damaged by the AI winter, where they just kind of felt like having really ambitious ideas or ambitious visions was very disallowed. And that's kind of how I imagine it was in terms of talking about safety. In order to care about safety, you have to believe that AI systems could actually be really powerful and really useful. And I think that there was kind of a prohibition against being ambitious. And I think one of the benefits is that physicists are very arrogant, and so they're constantly doing really ambitious things and talking about things in terms of grand schemes. And so, yeah. I mean, I think that's definitely true. I remember in 2014, there were just, I don't know, some things you couldn't say. Right? But I actually think it was kind of an extension of problems that exist across academia, other than maybe theoretical physics, where they've kind of evolved into very risk-averse institutions for a number of reasons.
And even the industrial parts of AI had kind of transplanted or forklifted that mentality. And it took a long time. I think it took until like 2022 to get out of that mentality. So there's a weird thing about what it means to be conservative and respectful. One version you could have is that what it means to be conservative is to take the risks or the potential harms of what you're doing really seriously and worry about that. But another kind of conservatism is to be like, ah, taking an idea too seriously and believing that it might succeed is sort of scientific arrogance. And so I think there are kind of two different kinds of conservatism or caution. And I think we were sort of in a regime that was very controlled by that one. I mean, you see it historically, right? If you look at the early discussions in 1939 between, you know, people involved in nuclear physics about whether nuclear bombs were a serious concern, you see exactly the same thing, with Fermi resisting these ideas because it just seemed kind of like a crazy thing, and other people like Szilard or Teller taking the ideas seriously because they were worried about the risks. Perhaps the deepest lesson that I've learned in the last 10 years, and probably all of you have learned some form of it as well, is that there can be this kind of seeming consensus, these things that kind of everyone knows, that seem sort of wise, seem like they're common sense. But really they're just kind of herding behavior masquerading as maturity and sophistication. And you've seen the consensus can change overnight. And when you've seen it happen a number of times, you suspected, but you didn't really bet on it. And you're like, oh man, I kind of thought this. But what do I know? How can I be right and all these people are wrong? You see that a few times.
Then you just start saying, nope, this is the bet we're going to make. I don't know for sure if we're right, but just ignore all this other stuff. See it happen. And I don't know, even if you're right 50% of the time, being right 50% of the time contributes so much, right? You're adding so much that is not being added by anyone else. Yeah. It feels like that's where we are today with some safety stuff, where there's a consensus view that a lot of this safety stuff is unusual or doesn't naturally fall out of the technology. And then at Anthropic we do all of this research where weird safety misalignment problems fall out as a natural dividend of the tech we're building. So it feels like we're in that counter-consensus view right now. But I feel like that has been shifting over the past, even just like 18 months. We've been helping, definitely. I mean, publishing, doing research. But I also think just world sentiment around AI has shifted really dramatically. And you know, it's more common in the user research that we do to hear customers, regular people, all say, I'm really worried about what the impact of AI on the world more broadly is going to be. And sometimes that means jobs or bias or toxicity. But it also sometimes means, is this just going to mess up the world? Right? How is this going to contribute to fundamentally changing how humans work together or operate? Which I wouldn't have predicted, actually. Yeah. But for whatever reason, it seems like people in the ML research sphere have always been more pessimistic about AI becoming very powerful than the general public. Maybe it's a real public disability or something. When Dario and I went to the White House in 2023,
in that meeting Harris and Raimondo and others basically said, I'm paraphrasing, but basically said, we've got our eye on you guys. AI is going to be a really big deal and we're now actually paying attention. And they're right. They're absolutely right. But I think in 2018 you wouldn't have been like, the president will call you to the White House to tell you they're paying close attention to the development of language models. Yeah. It's just a crazy place. That's not the big thing. Yeah. Like to hear to you. One thing that I think is interesting is, I guess all of us kind of got into this when it didn't seem like there was, like, we thought that it could happen. But it was like Fermi being skeptical of the atomic bomb. He was just a good scientist, and there was some evidence that it could happen, but there was also a lot of evidence against it happening. And he, I guess, decided that it would be worthwhile because if it was true, then it would be a big deal. And I think for all of us it was like, yeah, 2015, 2016, 2017, there was some evidence, and increasing evidence, that this might be a big deal. But I remember in 2016 talking to all my advisors, and I was like, I've done startup stuff. I want to help out with AI safety, but I'm not great at math. I don't exactly know how I can do it. And I think at the time people either were like, well, you need to be super good at decision theory in order to help out, and I was like, that's probably not going to work. Or they were like, it doesn't really seem like we're going to get some crazy AI thing. And so I had only a few people, basically, that were like, yeah, okay, that seems like a good thing to do.
I remember in 2014 making graphs of ImageNet results over time, when I was a journalist, and trying to get stories published about them, and people thought it was completely mad. And then I remember in 2015 trying to persuade Bloomberg to let me write a story about Nvidia, because every AI research paper had started mentioning the use of GPUs. And they said that was completely mad. And then in 2016, when I left journalism to go into AI, I have these emails saying, you're making the worst mistake of your life. Which I now occasionally look back on. But it seemed crazy at the time, from many perspectives, to go and take seriously that scaling was going to work and that something was maybe different about the technology paradigm. You're like Michael Jordan and that coach that didn't believe in him. How did you actually make the decision though? Did you feel torn, or was it obvious to you? I did a crazy counter-bet where I said, let me become your full-time AI reporter and double my salary, which I knew they wouldn't say yes to. And then I went to sleep and then I woke up and resigned. It was all fairly relaxed. So you're just a decisive guy? In that instance, so far. I think it's because I was going to work, reading arXiv papers, and then printing arXiv papers off and coming home and reading arXiv papers, including Dario's papers from the Baidu stuff. And being like, something completely crazy is happening here. And at some point I thought, you should bet with conviction. Which I think everyone here has done in their career. This is just betting with conviction that this is going to work. Yeah. I definitely was not as decisive as you guys. For like six months I was waffling, like, okay, should I do it? Should I try to do a startup? Should I kind of do this thing?
But I also feel like back then there wasn't as much talk of engineers and the impact that an engineer could have in AI, right? That feels so natural to us now. And we're in the same sort of talent race for engineers of all different types. But at the time it was like, you're a researcher, and those are the only people that can work on AI. So I don't think it was crazy that you were spending time thinking about that. Yeah. And I think that was basically the thing that got me to join OpenAI. I messaged the people there, and they were like, yeah, we actually think that you can help out by doing engineering work. And that you can help out with AI safety in that way. Which I think there hadn't really been an opportunity for. So that was what got me in. You were my manager at OpenAI. I was. That's right. I think I joined after you'd been there for a while. A little bit. I was at Brain for a bit. Yeah. I don't know if I ever asked you what it was that got you to join? Yeah. So I had been at Stripe for about five and a half years. And I knew Greg. He was my boss at Stripe for a while. And I actually introduced him and Dario, because when he was starting OpenAI, I was like, the smartest person that I know is Dario. You would be really lucky to get him. So Dario was at OpenAI. I had a few friends from Stripe that had gone there too. And I think, sort of like you, I'd been thinking about what I wanted to do after Stripe. I had gone there just because I wanted to get more skills after working in nonprofits and international development. And I actually thought I was going to go back to doing that. Essentially, I had always been working on, like, I really want to help people that have less than I do. But I didn't have the skills when I was doing it before Stripe. And so I looked at going back to public health.
I thought about going back into politics very briefly. But I was also looking around at other tech companies and other ways of having impact. And OpenAI at the time felt like it was a really nice intersection. It was a nonprofit. They were working on this really big, lofty mission. I really believed in the AI potential, because, I mean, I know Dario a little bit. And they needed management help. Yeah, they did need management help. That is a fact. And so I think that felt very me-shaped. I was like, oh, there's this nonprofit, and there's all these really great people with these really good intentions, but it seems like they're a little bit of a mess. And that felt really exciting to me, to get to come in. And I was such a utility player. I was running, like, people, but I was also running some of the technical teams. Yeah, the scaling org. I worked on the language team. I took over. I worked on policy stuff. I worked with Chris. And I felt like there was just so much goodness in so many of the employees there. And I felt a very strong desire to come and sort of try to help make the company a little more functional. I remember towards the end, after we'd done GPT-3, you were like, have you guys heard of something called trust and safety? I said, you know, I used to run some trust and safety teams at Stripe. There's a thing called trust and safety that you might want to consider for a technology like this. And it's funny, because it sort of is the intermediary step between AI safety research, which is how do you actually make the model safe, and something just much more practical. I do think there was value in saying, this is going to be a big thing. We also have to be doing this sort of practical work today to build the muscles for when things are going to be a lot higher stakes. That might be a good transition point to talk about
the Responsible Scaling Policy and how we came up with that, or why we came up with it, and how we're using it now, especially given how much trust and safety work we do on today's models. So whose idea was the RSP? You and Paul? Yeah, it was me and Paul Christiano who first talked about it, in late 2022. First it was like, oh, should we cap scaling at a particular point until we've discovered how to solve certain safety problems? And then it was like, well, it's kind of strange to have this one place where you cap it and then you uncap it. So let's have a bunch of thresholds. And then at each threshold, you have to do certain tests to see if the model is capable, and you have to take increasing safety and security measures. Originally we had this idea. And then the thought was just, look, this will go better if it's done by some third party. We shouldn't be the ones to do it. It shouldn't come from one company, because then other companies are less likely to adopt it. So Paul actually went off and designed it, and many features of it changed. And we were kind of on our side working on how it should work. And once Paul had something together, then pretty much immediately after he announced the concept, we announced ours, within a month or two. I mean, many of us were heavily involved in it. I remember writing at least one draft of it myself, but there were several drafts of it. There were so many drafts. I think it's got the most drafts of any doc. Which makes sense, right? I feel like it is the same way that the US treats the Constitution as the holy document. I think it's just a big thing that strengthens the US. And we don't expect the US to go off the rails, in part because every single person in the US is like, the Constitution is a big deal, and if you tread on that, I'm mad. Yeah. I think that the RSP is our, it holds that thing. It's the holy document for Anthropic.
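The threshold idea described above (tests at each capability threshold, with increasing safety and security measures once a threshold is crossed) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Anthropic's actual policy or tooling: the tier names, eval names, and safeguard names are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """One capability tier. All names used below are hypothetical."""
    name: str
    trigger_evals: frozenset  # evals that, if the model passes any, mean this tier is reached
    safeguards: frozenset     # measures required once this tier is reached


# Hypothetical ladder, ordered from least to most stringent.
LADDER = (
    Threshold("tier-1", frozenset(), frozenset({"baseline-security"})),
    Threshold(
        "tier-2",
        frozenset({"dangerous-capability-eval-a"}),
        frozenset({"baseline-security", "hardened-security"}),
    ),
    Threshold(
        "tier-3",
        frozenset({"dangerous-capability-eval-b"}),
        frozenset({"baseline-security", "hardened-security", "strict-deployment-controls"}),
    ),
)


def required_safeguards(capabilities_shown):
    """Union of the safeguards for every tier the model has reached."""
    shown = set(capabilities_shown)
    required = set()
    for tier in LADDER:
        # A tier with no trigger evals always applies (the baseline tier).
        if not tier.trigger_evals or tier.trigger_evals & shown:
            required |= tier.safeguards
    return required


def blocked_on(capabilities_shown, safeguards_in_place):
    """Safeguards still missing; an empty set means work may proceed."""
    return required_safeguards(capabilities_shown) - set(safeguards_in_place)
```

For example, `blocked_on({"dangerous-capability-eval-a"}, {"baseline-security"})` reports that `"hardened-security"` is still missing, so the hypothetical team is blocked until that measure exists. Keeping the check as plain data plus one function makes it easy to revise the ladder as the policy is iterated on.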
So it's like, we're doing a lot of iterations to get it right. Some of what I think has been so cool to watch about the RSP development at Anthropic, too, is it feels like it has gone through so many different phases, and there are so many different skills that are needed to make it work, right? There are the big ideas, which I feel like Dario and Paul and Sam and Jared and so many others are working on: what are the principles? What are we trying to say? How do we know if we're right? But there's also this very operational approach to just iterating, where we're like, well, we thought that we were going to see this at this safety level and we didn't. So should we change it so that we're making sure that we're holding ourselves accountable? And then there are all kinds of organizational things, right? We're like, let's change the structure of the RSP organization for clearer accountability. And I think my sense is that for a document that's as important as this, right, I love the Constitution analogy. There are all of these bodies and systems that exist in the US to make sure that we follow the Constitution, right? There are the courts, there's the Supreme Court, there's the presidency, there are both houses of Congress, and they do all kinds of other things, of course, but there's all of this infrastructure that you need around this one document, and I feel like we're also learning that lesson here. I think it's sort of reflective of a view a lot of us have about safety, which is that it's a solvable problem. It's just a very, very hard problem that's going to take tons and tons of work. Yeah. All of these institutions that we need to build up, like there are all sorts of institutions built up around automotive safety, built up over many, many years. But we're like, do we have the time to do that?
We've got to go as fast as we can to figure out what the institutions we need for AI safety are, and build those, and try to build them here first but make it exportable. It forces unity also, because if any part of the org is not in line with our safety values, it shows up through the RSP. The RSP is going to block them from doing what they want to do. And so it's a way to remind everyone over and over again, basically to make safety a product requirement, part of the product planning process. And so it's not just a bunch of bromides that we repeat. It's something that, if you show up here and you're not aligned, you actually run into. And you either have to learn to get with the program or it doesn't work out. The RSP's become kind of funny over time, because we spend thousands of hours of work on it, and then I go and talk to senators and I explain the RSP and I'm like, we have some stuff that means it's hard to steal what we make and also that it's safe. And they're like, yes, that's a completely normal thing to do. Are you telling me not everyone does this? You're like, oh, okay, yeah. It's not true that everyone does this. But it's amazing, because we've spent so much effort on it. Yeah, when you boil it down, they're like, yes, that sounds like a normal way to do it. That sounds good. That's been the goal. Like, Daniela was saying, let's make this as boring and normal as possible, let's make this a financial thing. Yeah, imagine it's like an audit. Yeah, yeah, yeah. Boring and normal is what we want. Certainly right respect. Well, also, Dario, I think in addition to driving alignment, it also drives clarity, because it's written down what we're trying to do. And it's legible to everyone in the company, and it's legible externally, what we think we're supposed to be aiming towards from a safety perspective. It's not perfect. We're iterating on it. We're making it better.
But I think there's some value in saying, this is what we're worried about, this thing over here. You can't just use this word to sort of derail something in either direction, right? To say, oh, because of safety, we can't do X. Or because of safety, we have to do X. We're really trying to make it clearer what we mean. Yeah, it prevents you from worrying about every last little thing under the sun, right? Because it's actually the fire drills that damage the cause of safety in the long run. I've said, if there's a building and the fire alarm goes off every week, that's a really unsafe building, because when there's actually a fire, it's like, oh, it just goes off all the time. So it's very important to be calibrated. Yes, that's right. Yeah. A slightly different frame that I find kind of clarifying is that I think the RSP creates healthy incentives at a lot of levels. So I think internally it aligns the incentives of every team with safety, because it means if we don't make progress on safety, we're going to be blocked. I also think that externally it creates a lot of healthier incentives than other possibilities, at least that I see, because it means that, you know, if we at some point have to take some kind of dramatic action, like if at some point we have to say, we've reached some point and we can't make a model safe, it aligns that with the point where there's evidence that supports that decision. And there's a pre-existing framework for thinking about it, and it's legible. And so I think there are a lot of levels at which the RSP, in ways that maybe I didn't initially understand when we were talking about the early versions of it, creates a better framework than any of the other ones that I've thought about.
I think this is all true, but I feel like it undersells how challenging it's been to sort of figure out what the right policies and evaluations are and what the lines should be. I think that we have iterated and continue to iterate a lot on that. And I think there is a question also that's difficult: you could be at a point where it's very clear something's dangerous or very clear that something's safe, but with a technology that's so new, there's actually a big gray area. And so all of the things that we're saying were things that made me really, really excited about the RSP at the beginning and still do, but I also think enacting this in a clear way and making it work has been much harder and more complicated than I anticipated. I think this is exactly the point. The gray areas are impossible to predict. There's so many of them. Until you actually try to implement everything, you don't know what's going to go wrong, so what we're trying to do is go and implement everything so we can see as early as possible what's going to go wrong. You have to do three or four passes before you really get it right. Like, iteration is just very powerful, and you're not going to get it right the first time, and so if the stakes are increasing, you want to get your iterations in early, you don't want to get them in late. You're also building the internal institutions and processes, so the specifics might change a lot, but building the muscle of just doing it is the really valuable thing. I'm responsible for compute at Anthropic and so it's important. So thank you. So I think that for me, we have to deal with external folks. And different external folks are at different points on the spectrum of how fast they think stuff is going to get. And I think that's also been a thing where I started out not thinking stuff would be that fast and have changed over time, and so I have sympathy for that. 
And so I think the RSP has been extremely useful for me in communicating with people who think that things might take longer, because then we have a thing where it's like, we don't need to do extreme safety measures until stuff gets really intense. And then they might be like, I don't think stuff will get intense for a long time, and then I'll be like, OK, yeah, we don't have to do extreme safety measures. And so that makes it a lot easier to communicate with other folks externally. Yeah, yeah, it makes it like a normal thing you can talk about rather than something really strange. Yeah. How else is it, like, showing up for people? Evals, evals, evals. Good. It's all about evals. Everyone's doing evals. The training team is doing evals all the time. We're trying to figure out, like, has this model gotten enough better that it has the potential to be dangerous? So how many teams do we have that are evals teams? You know, Frontier Red Team. There must be. There's a lot of people that are in the middle of the night. Every team, like, evals, evals, basically. And that means you're just measuring against the RSP, like measuring for certain signs of things that would concern you or not concern you. Exactly. Like, it's easy to lower bound the abilities of a model, but it's hard to upper bound. So we just put tons and tons of research effort into saying, can this model do this dangerous thing or not? Maybe there's some trick that we haven't thought of, like chain of thought or best-of-n or some kind of tool use, that's going to make it so it can help you do something very dangerous. It's been really useful in policy, because it's been a really abstract concept, what safety is. And when I'm like, we have an eval which changes whether we deploy the model or not, then you can go and calibrate with policymakers or experts in national security or some of the CBRN areas that we do to actually help us build evals that are well calibrated. 
And that actually just wouldn't have happened otherwise. But once you've got the specific thing, people are a lot more motivated to help you make it accurate. So it's been useful for that. How is it for you? The RSP shows up for me for sure. Often. I actually think that, weirdly, the way that I think about the RSP the most is what it sounds like. Just like the tone. I think we just did a big rewrite of the tone of the RSP because it felt overly technocratic and even a little bit adversarial. I spent a lot of time thinking about, how do you build a system that people just want to be a part of? It's so much better if the RSP is something that everyone in the company can walk around and tell you, just like with OKRs, like we do right now. Like, what are the top goals of the RSP? How do we know if we're meeting them? What AI safety level are we at right now? Are we at ASL-2? Are we at ASL-3? That people know what to look for, because that is how you're going to have good common knowledge of if something's going wrong. If it's overly technocratic and it's something that only particular people in the company feel is accessible to them, it's just not as productive. And I think it's been really cool to watch it sort of transition into this document where I actually think most, if not everybody, at the company, regardless of their role, could read it and say, this feels really reasonable. I want to make sure that we're building AI in the following ways. And I see why I would be worried about these things. And I also kind of know what to look for if I bump into something, right? It's almost like, make it simple enough that if you are working at a manufacturing plant and you're like, huh, it looks like the safety seat belt on this should connect this way, but it doesn't connect, that you can spot it. And that there's just healthy feedback flow between leadership and the board and the rest of the company and the people that are actually building it. 
Because I actually think the way this stuff goes wrong in most cases is just like the wires don't connect or they get crossed. And that would just be a really sad way for things to go wrong, right? It's just all about operationalizing it, making it easy for people to understand. The thing I would say is none of us wanted to found a company. We felt like it was our duty, right? I felt like we had to. We have to do this thing. This is the way we're going to make things go better with AI. That's also why we did the pledge, right? Because the reason we're doing this is it feels like our duty. I wanted to invent and discover things in some kind of beneficial way. That was how I came to it. And that led to working on AI, and AI required a lot of engineering, and eventually AI required a lot of capital. But what I found was that if you don't do this in a way where you're setting the environment, where you set up the company, then a lot of it gets done. A lot of it repeats the same mistakes that I found so alienating about the tech community. It's the same people, it's the same attitude, it's the same pattern matching. And so at some point it just seemed inevitable that we do it in a different way. When we were hanging out in graduate school, I remember you had kind of this whole program of trying to figure out how to do science in a way that would sort of advance the public good. And I think that's pretty similar to how we think about this. Maybe you have this, like, Project Vannevar or something to do that. I was a professor. I think basically I just looked at the situation and I was convinced that AI was on a very, very, very steep trajectory in terms of impact. It didn't seem like, because of the necessity for capital, as a physics professor, I could continue doing that. And I kind of wanted to work with people that I trusted in building an institution to try to make AI go well. 
But yeah, I would never recommend founding a company or really want to do it. I think it's just a means to an end. I mean, I think that's usually how things go well, though. If you're doing something just to sort of enrich yourself or gain power, or, like, you have to sort of actually care about accomplishing a real goal in the world, and then you find whatever means you have to. Well, something I think about a lot is just a strategic advantage for us is, I mean, it sounds really funny to say, but just how much trust there is at this table, right? Like, I think that's not, I mean, Tom, you were at other startups. I was never a founder before, but it's actually really hard to get a big group of people to have the same mission, right? And I think the thing that I feel the happiest about when I come into work, and probably most proud of at Anthropic, is how well that has scaled to a lot of people. It feels to me like in this group and with the rest of leadership, everyone is here for the mission, and our mission is really clear and it's very pure, right? And I think that is something that I don't see as often, to Dario's point, in sort of the tech industry. It feels like there's just a wholesomeness to what we're trying to do. Like, no, I agree. Like, none of us were like, let's just go and found a company. I felt like we had to do it, right? It just felt like we couldn't keep doing what we were doing at the place where we were doing it. We had to do it by ourselves. I mean, it felt like with GPT-3, you know, which all of us have touched or worked on, and scaling laws and everything else, we could see it in front of us in 2020, and it felt like, well, if we don't do something soon all together, you're going to hit the point of no return, and you have to do something to have any ability to change the environment. I think, building on Daniela, I do think that there's just like a lot of trust. 
In this group, I think each of us knows that we got into this because we want to help out with the world. We did the 80% pledge thing and I think that everybody was just like, yes, obviously, we're going to do this. It was, yeah. And yeah, I do think that the trust thing is a special thing that's extremely rare. I credit Daniela with, like, keeping the bar high. I credit you with the fact that we still keep it out. Oh, it's a touch. It's a Claude right there. It's a good thing. You're the reason I called you skilled, I think. People say how nice people are here, which is actually a wildly important thing. I think Anthropic is really low politics. And of course, we all have a different vantage point than average, and I try to remember that. Because of low ego. But it's low ego. And I do think our interview process and just the type of people who work here, like, there's almost an allergic reaction to politics. And unity. Unity is so important. The idea that the product team, the research team, the trust and safety team, you know, the go-to-market team, the policy team, the safety folks, they're all trying to contribute to kind of the same goal, the same mission of the company, right? I think it's dysfunctional when different parts of the company think they're trying to accomplish different things. Yeah. And about different things, or think that other parts of the company are trying to undermine what they're doing. Yeah. And I think the most important thing we've managed to preserve is, and again, things like the RSP drive it, this idea that it's not, you know, that there are some parts of the company causing damage and other parts of the company trying to repair it, but that there are different parts of the company doing different functions and that they all function under a single theory of change. Extreme pragmatism, right? Yeah. 
The reason I went to OpenAI in the first place, you know, it was a nonprofit, it was a place where I could go and focus on safety. And I think over time, you know, that maybe wasn't as good a fit, and there were some difficult decisions. And I think, in a lot of ways, I really trusted Dario and Daniela on that. But I didn't want to leave. That was something that I think I was actually pretty reluctant to go along with, because for one thing, I didn't know that it was good for the world to have more AI labs, and I think it was something that I was pretty reluctant for. And I think as well, when we did leave, I was reluctant to start a company. I was arguing for a long time that we should do a nonprofit instead and just focus on safety research. And I think it really took pragmatism and confronting the constraints and just being honest about what the constraints implied for accomplishing that mission that led to Anthropic. But just a really important lesson that we were good about early on is, like, make less promises and keep more of them. Yeah. Like, try to be calibrated, be realistic, confront the trade-offs, because trust and credibility are more important than any particular policy. It is so unusual to have what we have. And watching Mike Krieger defend safety reasons why we shouldn't ship a product yet, but also then to watch Vene sort of say, like, okay, we have to do the right thing for the business. How do we get this across the finish line? And to hear people deep in the technical safety org talking about how it's also important that we build things that are practical for people, and hearing engineers on inference talk about safety, that's amazing. Like, I think that is, again, one of the most special things about working here, is everybody with that unity is prioritizing the pragmatism, the safety, the business. That's wild. 
I think about it as spreading the trade-offs from just the leadership of the company to everyone. I think the dysfunctional world is like, you have a bunch of people who only see a big, you know, safety is like, we always have to do this. And product is like, we always have to do this. And research is like, you know, this is the only thing we care about. And then you're stuck at the top, right? You're stuck at the top. You have to decide between them, and you don't have as much information as either of them. That's the dysfunctional world. The functional world is when you're able to communicate to everyone, there are these trade-offs we're all facing together. The world is a far from perfect place. There's trade-offs, everything you do is going to be suboptimal. Everything you do is going to be some attempt to get the best of both worlds that, you know, doesn't work out as well as you thought it would. And everyone is on the same page about confronting those trade-offs together. They just feel like they're confronting them from a particular post, from a particular job, as part of the overall job of confronting all the trade-offs. It's a bet on race to the top, right? It's a bet on race to the top. Like, it's not a pure upside bet. Things could go wrong. But we're all aligned on, like, this is the bet that we're making. And markets are pragmatic. So the more successful Anthropic becomes as a company, the more incentive there is for people to copy the things that make us successful, and the more that success is tied to actual safety stuff we do, the more it just creates a gravitational force in the industry that will actually get the rest of the industry to compete. And it's like, sure, we'll build seat belts and everyone else can copy them. That's good. That's like the good world. That's really good. Yeah, this is the race to the top, right? 
But if you're saying, well, we're not going to build the technology, you're not going to build it better than someone else, in the end, that just doesn't work, because you're not proving that it's possible to get from here to there. Where the world needs to get, never mind the industry, never mind one company, is it needs to get us successfully through from "this technology doesn't exist" to "the technology exists in a very powerful way and society has actually managed it." And I think the only way that's going to happen is if, at the level of a single company and eventually at the level of the industry, you're actually confronting those trade-offs. You have to find a way to actually be competitive, to actually lead the industry in some cases, and yet manage to do things safely. And if you can do that, the gravitational pull you exert is so great. There's so many factors, from the regulatory environment, to the kinds of people who want to work at different places, to even sometimes the views of customers, that kind of drive in that direction if you can show that you can do well on safety without sacrificing competitiveness, right? If you can find these kind of win-wins, then others are incentivized to do the same thing. Yeah, I mean, I think that's why getting things like the RSP right is so important, because I think that we ourselves, seeing where the technology is headed, have often thought, oh, wow, we need to be really careful of this thing. At the same time, we have to be even more careful not to be crying wolf, saying that, like, innovation needs to stop here. We need to sort of find a way to make AI useful, innovative, delightful for customers, but also figure out what the constraints really have to be, that we can stand behind, that make systems safe, so that it's possible for others to think that they can do that too, and they can succeed, they can compete with us. We're not doomers, right? 
Like, we want to build the positive thing, we want to build the good thing. And we've seen it happen in practice. A few months after we came out with our RSP, the three most prominent AI companies had one, right? Interpretability research, that's another area we've done it. Just the focus on safety overall, like collaboration with the AI Safety Institutes, other areas. Yeah, the Frontier Red Team got cloned almost immediately, which is good. You want all the labs to be testing for very, very secure, scary risks. Export the seat belts. Yeah, export the seat belts. Well, Jack also mentioned it earlier, but customers also really care about safety, right? Customers don't want models that are hallucinating, they don't want models that are easy to jailbreak, they want models that are helpful and harmless, right? And so a lot of the time what we hear in customer calls is just, we're going with Claude because we know it's safer. I think that is also a huge market impact, right? Because our ability to have models that are trustworthy and reliable, that matters for the market pressure that it puts on competitors too. Maybe to unpack something that Dario said a little bit more. I think there's this narrative or this idea that maybe the virtuous thing is to almost, like, nobly fail, right? It's like you should go and put safety, you should go and put things, you should sort of demonstrate, like, in an unpragmatic way, so you can sort of demonstrate your purity to the cause or something like this. And I think if you do that, it's actually very self-defeating. For one thing, it means that you're going to have the people who are making decisions be self-selected for being people who don't care, for people who aren't prioritizing safety and who don't care about it. 
And I think, on the other hand, if you try really hard to find the way to align the incentives and make it so that if there are hard decisions, they happen at the points where there is the most force to go and support making the correct hard decisions and where there's the most evidence, then you can sort of start to trigger this race to the top that Dario is describing, where instead of having the people who care get pushed out of influence, you instead pull other people to have to go and follow. So what are you all excited about when it comes to the next thing we'll be working on? I think there's a bunch of reasons you can be excited about interpretability. One is obviously safety, but there's another one that I find at an emotional level equally exciting or equally meaningful to me, which is just that I think neural networks are beautiful. And I think that there's a lot of beauty in them that we don't see. We treat them like these black boxes that we're not particularly interested in the internals of. But when you start to go and look inside them, they're just full of amazing, beautiful structure. It's sort of like if people looked at biology and they were like, you know, evolution is really boring. It's a simple thing that goes and runs for a long time and then it makes animals. And instead, it's like, actually, each one of those animals that evolution produces, and I think it's an optimization process like training a neural network, they're full of incredible complexity and structure. And, like, we have an entire sort of artificial biology inside of neural networks. If you're just willing to look inside them, there's all this amazing stuff. And I think that we're just starting to slowly unpack it, and it's incredible. And there's so much there. There's just so much to be discovered there. We're just starting to crack it open. And I think it's going to be amazing and beautiful. 
And sometimes I imagine, you know, like a decade in the future, walking into a bookstore and buying, you know, the textbook on neural networks, on the biology of neural networks, and just the kind of wild things that are going to be inside of it. And I think that in the next decade, in the next couple of years, even, we're going to go and start to really discover all of those things. And it's going to be wild and incredible. It's also going to be great that you get to buy your own textbook. I mean, I'm excited that a few years ago, if you had said, like, governments will set up new bodies to test and evaluate AI systems and they will actually be competent and good, you would not have thought that was going to be the case, but it's happened. And it's kind of like governments have built these new embassies, almost, to deal with this new kind of class of technology, or, like, thing that Chris studies. And I'm just very excited to see where that goes. I think it actually means that we have state capacity to deal with this kind of societal transition. So it's not just companies, and I'm excited to help with that. I'm already excited about this to a certain extent today. But I think just imagining the future world of what AI is going to be able to do for people, it's impossible to not feel excited about that. Dario talks about this a lot, but I think even just the sort of glimmers of Claude being able to help with vaccine development and cancer research and biological research is crazy, like just to be able to watch what it can do now. But when I fast forward three years in the future or five years in the future, imagining that Claude could actually solve so many of the fundamental problems that we just face as humans, even just from a health perspective alone, even if you sort of take everything else out, feels really exciting to me. 
Just thinking back to my international development times, it would be amazing if Claude was responsible for helping to do a lot of the work that I was trying to do a lot less effectively when I was, like, 25. I mean, I guess similarly I'm excited to build Claude for work. Like, I'm excited to build Claude into the company and into companies all over the world. I guess I'm excited just, I guess, personally, like, I like using Claude a lot. So there's been increasing amounts of, like, home times with me just chatting with Claude about stuff. I think the biggest recent thing has been code, where, like, six months ago I didn't use Claude to do any coding work, and our teams didn't really use Claude that much for coding. And now it's just a phase difference. Like, I gave a talk at YC, like, a week before last, and at the beginning I just asked, okay, so how many folks here use Claude for coding now? And literally 95% of hands. Like, all the hands in the room, which is just totally different than how it was four months ago. So when I think about what I'm excited about, I think about places where, you know, like I said before, there's this kind of consensus that, again, seems like consensus, seems like what everyone wise thinks, and then it just kind of breaks. And so places where I think that's about to happen and it hasn't happened yet. One of them is interpretability. I think interpretability is both the key to steering and making safe AI systems, and we're about to understand, and interpretability contains insights about intelligent optimization problems and about how the human brain works. I've said, and I'm really not joking, Chris Olah is going to be a future Nobel Medicine laureate. I'm serious. I'm serious, because a lot of these, I used to be a neuroscientist. A lot of these mental illnesses, the ones we haven't figured out, right? Schizophrenia or the mood disorders. 
I suspect there's some higher level system thing going on, and that it's hard to make sense of those with brains because brains are so mushy and hard to open up and interact with. Neural nets are not like this. They're not a perfect analogy, but as time goes on, they will be a better analogy. That's one area. Related to that, I think, is just the use of AI for biology. Biology is an incredibly difficult problem. People continue to be skeptical for a number of reasons. I think that consensus is starting to break. We saw a Nobel Prize in Chemistry awarded for AlphaFold, a remarkable accomplishment. We should be trying to build things that can help us create 100 AlphaFolds. And then finally, using AI to enhance democracy. We worry that if AI is built in the wrong way, it can be a tool for authoritarianism. How can AI be a tool for freedom and self-determination? I think that one is earlier than the other two, but it's going to be just as important. Yeah, I mean, I guess two things that at least connect to what you were saying earlier. One is, I feel like people frequently join in, and they're sort of scientifically really curious about AI. And then they kind of get convinced by AI progress to sort of share the vision of the need not just to advance the technology, but to understand it more deeply and to make sure that it's safe. But I feel like it's actually just sort of exciting to have people that you're working with kind of more and more united in their vision for both what AI development looks like and this sort of sense of responsibility associated with it. I feel like that's been happening a lot due to a lot of advances that have happened in the last year, like what Tom talked about. Another is that, I mean, going back really to Concrete Problems, I feel like we've done a lot of work on AI safety up until this point, and a lot of it's really important. 
But I think we're now with some recent developments really getting a glimmer of what kinds of risks might literally come about from systems that are very, very advanced so that we can investigate and study them directly with interpretability, with other kinds of safety mechanisms, and really understand what the risks from very advanced AI might look like. And I think that that's something that is really going to allow us to further the mission in a really deeply scientific and critical way. And so I'm excited about the next six months of how we use our understanding of what can go wrong with advanced systems to characterize that and figure out how to avoid those pitfalls. Perfect. Finn. Okay. Good job. We got to do this all the time. This is the only time we ever get to do it.
Takeaways
- The prevailing "AI winter" mindset initially discouraged ambitious AI visions, which was gradually overcome as empirical evidence for scaling laws emerged.
- Scaling laws revealed that simply making AI models larger, by increasing compute and parameters, led to eerily consistent performance improvements across diverse tasks.
- Language models, particularly when combined with Reinforcement Learning from Human Feedback (RLHF), were identified as a promising path to align AI systems with human values by enabling them to understand implicit knowledge.
- The "Concrete Problems in AI Safety" paper (circa 2016) was a strategic effort to build consensus around AI safety by presenting practical problems grounded in contemporary ML, thereby making safety a credible area of research.
- "Constitutional AI" leverages the ability of large language models to read and internalize principles, effectively acting like a "multiple-choice exam solver" for ethical guidelines.
- Anthropic's Responsible Scaling Policy (RSP) functions as an internal "holy document" that establishes thresholds for model capabilities, requiring specific safety tests and security measures at each stage of development.
- The RSP ensures organizational alignment by making safety a non-negotiable product requirement, clarifying expectations, and preventing both excessive caution and uncalibrated "fire drills."
- Developing and implementing effective AI safety policies like the RSP is an iterative, challenging process that frequently uncovers unforeseen "gray areas," necessitating continuous adaptation and refinement.
Vocabulary
AI winter — A period of reduced funding and interest in artificial intelligence research, typically following periods of over-optimism.
Google Brain — A research division at Google focused on deep learning and artificial intelligence.
OpenAI — An AI research and deployment company that aims to ensure artificial general intelligence benefits all of humanity.
scaling laws — Empirical relationships observed in AI that describe how model performance improves predictably as resources like compute, data, and model size increase.
GPT-2 — A transformer-based language model developed by OpenAI, notable for its ability to generate coherent text.
GPT-3 — A successor to GPT-2, even larger and more powerful, capable of performing a wide range of natural language tasks.
Anthropic — An AI safety and research company, founded by former OpenAI members, focused on building reliable, interpretable, and steerable AI systems.
language models — AI models designed to understand, generate, and process human language.
RLHF — Reinforcement Learning from Human Feedback; a technique used to align AI models with human preferences by training them with human-generated feedback.
Constitutional AI — A method for aligning AI models by giving them a "constitution" of principles or rules, which they use to evaluate and refine their own outputs.
Responsible Scaling Policy (RSP) — A framework developed by Anthropic to manage the risks associated with increasingly powerful AI models, setting thresholds for capabilities and requiring safety measures.
ImageNet — A large visual database designed for use in visual object recognition software research.
GPUs — Graphics Processing Units; specialized electronic circuits designed to rapidly manipulate and alter memory to accelerate the creation of images, crucial for AI training.
arXiv papers — Pre-print research papers published on arXiv.org, a repository for electronic preprints of scientific papers in fields like physics, mathematics, computer science, and more.
Transcript
Why are we working on AI in the first place? I'm just going to arbitrarily pick Jared. Why are you doing AI at all? I mean, I was working on physics for a long time, and I got bored, and I wanted to hang out with more of my friends. I thought Dario pitched you on it? I don't think I explicitly pitched you at any point. I just kind of showed you results of AI models. And I was trying to make the point that they're very general and they don't apply to just one thing. And then at some point, after I showed you enough of them, you were like, oh yeah, it seems like it's right. How long had you been a professor before you started? I think like six years or so. I think I helped recruit Sam. I talked to you and you were like, I think I've created a good bubble here, and my goal is to get Tom to come back. And then it worked. And did you meet everyone through Google when you were doing the interpretability stuff, Chris? No, so I guess I actually met a bunch of you when I was 19 and I was visiting the Bay Area for the first time. So I guess I met Dario and Jared then. I guess they were postdocs, which I thought was very cool at the time. And then I was working at Google Brain and Dario joined. And we sat side by side actually for a while. We had desks beside each other. And I worked with Tom there as well. And then of course, I went to work with all of you at OpenAI when I went there. Yeah. So I guess I've known a lot of you for more than a decade, which is kind of wild. If I remember correctly, I met Dario in 2015, when I went to a conference you were at and I tried to interview you, and Google PR said I would need to have read all of your research papers. Yeah, I think I was writing Concrete Problems in AI Safety when I was at Google. I think you wrote a story about that paper. I did.
I remember right before I started working with you, I think you invited me to the office to come chat and just tell me everything about AI. And I remember afterwards being like, oh, I guess this is much more serious than I realized. And you were probably explaining the big blob of compute and parameter counting and how many neurons are in the brain, everything. I feel like Dario often has that effect on people: this is much more serious than I realized. Yeah, I'm the bringer of happy times. But I remember when we were at OpenAI, there was the scaling law stuff and just making things bigger, and it started to feel like it was working. And then it kind of kept on eerily working on a bunch of different projects, which I think is how we all ended up working closely together, because it was first GPT-2, and then scaling laws and GPT-3, and we ended up being the blob of people that were making things work. Yeah. That's right. I think we were also excited about safety, because there was sort of this idea that AI would become very powerful but potentially not understand human values, or not even be able to communicate with us. And so I think we were all pretty excited about language models as a way to kind of guarantee that AI systems would have to understand kind of implicit knowledge. And RL from human feedback on top of language models, which was the whole reason for scaling these models up: the models weren't smart enough to do RLHF on top of. So that's the kind of intertwinement of safety and scaling of the models that we still believe in today. Yeah, I think there was also an element where the scaling work was done as part of the safety team that Dario started at OpenAI, because we thought that forecasting AI trends was important to be able to have us take them seriously and take safety seriously as a problem. Correct.
Yeah, I mean, I remember being in some airport in England sampling from GPT-2 and using it to write fake news articles and Slacking Dario and being like, oh, this stuff actually works and might have huge policy implications. I think Dario said something like, yes. That's a tough one. But then we worked on that a bunch, as well as the release stuff, which was kind of wild. Yeah, I remember the release stuff. I think that was when we first started working together. Yeah. That was a fun time, the GPT-2 launch. Yeah, but I think it was good for us, because we did a kind of slightly strange safety-oriented thing all together, and then we ended up doing Anthropic, which is a much larger, slightly strange, safety-oriented thing. That's right. So I guess just going back to Concrete Problems: I joined OpenAI in 2016, one of the first 20 employees or whatever, with you, Dario. And I remember at that time Concrete Problems in AI Safety seemed like it was the first mainstream AI safety paper. Yes. I don't really know if I ever asked you what the story was for how that came about. Chris knows the story because he was involved in it. I think we were both at Google. As with many things, it was my attempt to procrastinate from whatever other project I was working on, which I've now completely forgotten. But I think it was like Chris and I decided to write down what are some open problems in terms of AI safety. And also, AI safety was usually talked about in this very kind of abstruse, abstract way. Can we kind of ground it in the ML that was going on at the time? I mean, now there's been like six, seven years of work in that vein, but that was almost a strange idea at the time. Yeah, I think there's a way in which it was almost a kind of political project, where at the time a lot of people didn't take safety seriously.
So I think that there was sort of this goal to collate a list of problems that people agreed were reasonable, that often already existed in the literature, and then get a bunch of people across different institutions who were credible to be authors. And I remember I had this whole long period where I just talked to like 20 different researchers at Brain to build support for publishing the paper. In some ways, if you look at it in terms of the problems and a lot of the things it emphasized, I think it hasn't held up that well, in that I think it's not really the right problems. But I think if you see it instead as a consensus-building exercise, showing that there's something here that is real and that is worth taking seriously, then it was a pretty important moment. I mean, you end up in this really weird sci-fi world where I remember at the start of Anthropic we were talking about constitutional AI. And I think Jared said, oh, we're just gonna write like a constitution for a language model and that'll change all of its behavior. And I remember that sounded incredibly crazy at the time. But why did you guys think that was gonna work? Because I remember that was one of the first early big research ideas we had at the company. Yeah, I mean, I think Dario and I had talked about it for a while. I guess I think simple things just work really, really well in AI. And so I think the first versions of that were quite complicated, but then we kind of whittled it down to: just use the fact that AI systems are good at solving multiple-choice exams and give them a prompt that tells them what they're looking for. And that was kind of what we needed. And then we were able to just write down these principles. I mean, it goes back to the big blob of compute, the bitter lesson, or the scaling hypothesis.
If you can identify, you know, something that you can give the AI data for and that's kind of a clear target, you'll get it to do it. Here's this set of instructions. Here's this set of principles. AI language models can read that set of principles and they can compare it to the behavior they themselves are engaging in. And so you've got your training target there. So once you know that, I think my view and Jared's view is there's a way to get it to work. You just have to fiddle with enough of the details. Yeah, I think it was always weird for me, especially in these early eras, because I was coming from physics. And I think now we forget about this because everyone's excited about AI. But I remember talking to Dario about Concrete Problems and other things, and I just got the sense that AI researchers were very, very kind of psychologically damaged by the AI winter, where they just kind of felt like having really ambitious ideas or ambitious visions was very disallowed. And that's kind of how I imagine it was in terms of talking about safety. In order to care about safety, you have to believe that AI systems could actually be really powerful and really useful. And I think that there was kind of a prohibition against being ambitious. And I think one of the benefits is that physicists are very arrogant, and so they're constantly doing really ambitious things and talking about things in terms of grand schemes. And so yeah. I mean, I think that's definitely true. Like I remember in 2014, there were just, I don't know, some things you couldn't say. Right? But I actually think it was kind of an extension of problems that exist across academia, other than maybe theoretical physics, where they've kind of evolved into very risk-averse institutions for a number of reasons.
And even the industrial parts of AI had kind of transplanted or forklifted that mentality. And it took a long time. I think it took until like 2022 to get out of that mentality. So there's a weird thing about what it means to be conservative and respectful. One version you could have is that what it means to be conservative is to take the risks or the potential harms of what you're doing really seriously and worry about that. But another kind of conservatism is to be like, ah, taking an idea too seriously and believing that it might succeed is sort of scientific arrogance. And so I think there's kind of two different kinds of conservatism or caution. And I think we were sort of in a regime that was very controlled by that one. I mean, you see it historically, right? If you look at the early discussions in 1939 between, you know, people involved in nuclear physics about whether nuclear bombs were a serious concern, you see exactly the same thing, with Fermi resisting these ideas because it just seemed kind of like a crazy thing, and other people like Szilard or Teller taking the ideas seriously because they were worried about the risks. Perhaps the deepest lesson that I've learned in the last 10 years, and probably all of you have learned some form of it as well, is that there can be this kind of seeming consensus, these things that kind of everyone knows, that, I don't know, seem sort of wise, seem like they're common sense. But really they're just kind of herding behavior masquerading as maturity and sophistication. And you've seen the consensus can change overnight. And when you've seen it happen a number of times, you suspected, but you didn't really bet on it. And you're like, oh man, I kind of thought this. But what do I know? Ha, ha, you know, how can I be right and all these people are wrong? You see that a few times.
Then you just start saying, nope, this is the bet we're going to make. I don't know for sure if we're right, but just ignore all this other stuff. See it happen. And I don't know, even if you're right 50% of the time, being right 50% of the time contributes so much, right? You're adding so much that is not being added by anyone else. Yeah. It feels like that's where we are today with some safety stuff, where there's a consensus view that a lot of this safety stuff is unusual or doesn't naturally fall out of the technology. And then at Anthropic we do all of this research where weird safety misalignment problems fall out as a natural dividend of the tech we're building. So it feels like we're in that counter-consensus view right now. But I feel like that has been shifting over the past, even just like 18 months. We've been helping. We definitely have. I mean, by publishing and doing research. But I also think just world sentiment around AI has shifted really dramatically. And you know, it's more common in the user research that we do to hear just customers, regular people, all say, I'm really worried about what the impact of AI on the world more broadly is going to be. And sometimes that means jobs or bias or toxicity. But it also sometimes means, is this just going to mess up the world? Right? How is this going to contribute to fundamentally changing how humans work together or operate? Which I wouldn't have predicted, actually. Yeah. But yeah, for whatever reason it seems like people in the ML research sphere have always been more pessimistic about AI becoming very powerful than the general public. Maybe it's a real public disability or something. Dario and I went to the White House in 2023.
In that meeting, Harris and Raimondo and others basically said, to paraphrase, we've got our eye on you guys. AI is going to be a really big deal and we're now actually paying attention. And they're right. They're right. Absolutely right. But I think in 2018 you wouldn't have been like, the president will call you to the White House to tell you they're paying close attention to the development of language models. Yeah. Just like a crazy place. That's not the big thing. Yeah. Like to hear to you. One thing that I think is interesting is, I guess all of us kind of got into this when it didn't seem like there was, like, we thought that it could happen. But yeah, it was like Fermi being skeptical of the atomic bomb. He was just a good scientist, and there was some evidence that it could happen, but there also was a lot of evidence against it happening. And he, I guess, decided that it would be worthwhile, because if it was true, then it would be a big deal. And I think for all of us it's like, yeah, 2015, 2016, 2017, there was some evidence, and increasing evidence, that this might be a big deal. But I remember in 2016 talking to all my advisors, and I was like, I've done startup stuff. I want to help out with AI safety, but I'm not great at math. I don't exactly know how I can do it. And I think at the time people either were like, well, you need to be super good at decision theory in order to help out, and I was like, that's probably not going to work. Or they were like, it doesn't really seem like we're going to get some crazy AI thing. And so I had only a few people, basically, that were like, yeah, okay, that seems like a good thing to do.
I remember in 2014 making graphs of ImageNet results over time, when I was a journalist, and trying to get stories published about them, and people thought it was completely mad. And then I remember in 2015 trying to persuade Bloomberg to let me write a story about Nvidia, because every AI research paper had started mentioning the use of GPUs. And they said that was completely mad. And then in 2016, when I left journalism to go into AI, I have these emails saying, you're making the worst mistake of your life. Which I now occasionally look back on. But it seemed crazy at the time, from many perspectives, to go and take seriously that scaling was going to work and that something was maybe different about the technology paradigm. You're like Michael Jordan and that coach that didn't believe in him. How did you actually make the decision though? Did you feel torn or was it obvious to you? I did a crazy counter bet where I said, let me become your full-time AI reporter and double my salary, which I knew they wouldn't say yes to. And then I went to sleep and then I woke up and resigned. It was all fairly relaxed. So you're just a decisive guy? In that instance, so far. I think it's because I was going to work, reading arXiv papers, and then printing arXiv papers off and coming home and reading arXiv papers, including Dario's papers and the Baidu stuff. And being like, something completely crazy is happening here. And at some point I thought you should bet with conviction. Which I think everyone here has done in their career. This is just betting with conviction that this is going to work. Yeah. I definitely was not as decisive as you guys. For like six months I was walking around like, okay, should I do it? Should I try to do a startup? Should I kind of do this thing?
But I also feel like back then there wasn't as much talk of engineers and the impact that an engineer could have in AI, right? That feels so natural to us now. And we're in the same sort of talent race for engineers of all different types. But at the time it was like, you're a researcher, and those are the only people that can work on AI. So I don't think it was crazy that you were spending time thinking about that. Yeah. And I think that was basically the thing that got me to join OpenAI. I messaged the people there, and they were like, yeah, we actually think that you can help out by doing engineering work. Yeah. And that you can help out with AI safety in that way. Which I think there hadn't really been an opportunity for. So that was what got me in. You were my manager at OpenAI. I was. That's right. I think I joined after you'd been there for a while. A little bit. I was at Brain for a bit. Yeah. I don't know if I ever asked you what it was that got you to join? Yeah. So I had been at Stripe for about five and a half years. And I knew Greg. He was my boss at Stripe for a while. And I actually introduced him and Dario, because when he was starting OpenAI, I was like, the smartest person that I know is Dario. You would be really lucky to get him. So Dario was at OpenAI. I had a few friends from Stripe that had gone there too. And I think sort of like you, I'd been thinking about what I wanted to do after Stripe. I had gone there just because I wanted to get more skills after working in nonprofit and international development. And I actually thought I was going to go back to doing that. Essentially, I had always been like, I really want to help people that have less than I do. But I didn't have the skills when I was doing it before Stripe. And so I looked at going back to public health.
I thought about going back into politics very briefly. But I was also looking around at other tech companies and other ways of having impact. And OpenAI at the time felt like it was a really nice intersection. It was a nonprofit. They were working on this really big, lofty mission. I really believed in sort of the AI potential, because, I mean, I know Dario a little bit. And they needed management help. Yeah, they did need management help. That is a fact. And so I think that felt very me-shaped. I was like, oh, there's this nonprofit, and there's all these really great people with these really good intentions, but it seems like they're a little bit of a mess. And that felt really exciting to me, to get to come in. And I was such a utility player. I was running people, but I was also running some of the technical teams. Yeah, the scaling org. I worked on the language team. I took over. I worked on some policy stuff. I worked with Chris. And I felt like there was just so much goodness in so many of the employees there. And I felt a very strong desire to come and sort of try to help make the company a little more functional. I remember towards the end, after we'd done GPT-3, you were like, have you guys heard of something called trust and safety? Yes. No, no, we haven't. I was like, yeah. I said, you know, I used to run some trust and safety teams at Stripe. There's a thing called trust and safety that you might want to consider for a technology like this. And it's funny, because it sort of is the intermediary step between AI safety research, which is how do you actually make the model safe, and something just much more practical. I do think there was a value in saying, this is going to be a big thing. We also have to be doing this sort of practical work day to day to build the muscles for when things are going to be a lot higher stakes. That might be a good transition point to talk about.
I think that's the Responsible Scaling Policy, and how we came up with that, or why we came up with it, and how we're using it now, especially given how much trust and safety work we do on today's models. So whose idea was the RSP? You and Paul? Yeah, Paul Christiano and I first talked about it in late 2022. First it was like, oh, should we cap scaling at a particular point until we've discovered how to solve certain safety problems? And then it was like, well, it's kind of strange to have this one place where you cap it and then you uncap it. So let's have a bunch of thresholds. And then at each threshold, you have to do certain tests to see if the model is capable, and you have to take increasing safety and security measures. Originally we had this idea. And then the thought was, look, this will go better if it's done by some third party. We shouldn't be the ones to do it. It shouldn't come from one company, because then other companies are less likely to adopt it. So Paul actually went off and designed it, and many features of it changed. And we were kind of on our side working on how it should work. And once Paul had something together, then pretty much immediately after he announced the concept, we announced ours within a month or two. I mean, many of us were heavily involved in it. I remember writing at least one draft of it myself, but there were several drafts of it. There were so many drafts. I think it's got the most drafts of any doc. Which makes sense, right? I feel like it is in the same way that the US treats the Constitution as the holy document. I think it's just a big thing that strengthens the US. And we don't expect the US to go off the rails, in part because just every single person in the US is like, the Constitution is a big deal, and if you tread on that, I'm mad. Yeah. I think the RSP is like that for us. It's the holy document for Anthropic.
So it's worth doing a lot of iterations to get it right. Some of what I think has been so cool to watch about the RSP development at Anthropic, too, is that it feels like it has gone through so many different phases, and there's so many different skills that are needed to make it work, right? There's the big ideas, which I feel like Dario and Paul and Sam and Jared and so many others are working on: what are the principles? What are we trying to say? How do we know if we're right? But there's also this very operational approach to just iterating, where we're like, well, we thought that we were gonna see this at this safety level and we didn't. So should we change it so that we're making sure that we're holding ourselves accountable? And then there's all kinds of organizational things, right? We're like, let's change the structure of the RSP organization for clearer accountability. And I think my sense is that for a document that's as important as this, right, I love the Constitution analogy. There's all of these bodies and systems that exist in the US to make sure that we follow the Constitution, right? There's the courts, there's the Supreme Court, there's the presidency, there's, you know, both houses of Congress, and they do all kinds of other things, of course, but there's all of this infrastructure that you need around this one document, and I feel like we're also learning that lesson here. I think it's sort of reflective of a view a lot of us have about safety, which is that it's a solvable problem. It's just a very, very hard problem, and it's gonna take tons and tons of work. Yeah. All of these institutions that we need to build up. There's all sorts of institutions built up around automotive safety, built up over many, many years. But we're like, do we have the time to do that?
We've got to go as fast as we can to figure out what the institutions we need for AI safety are, and build those, and try to build them here first but make it exportable. It forces unity also, because if any part of the org is not kind of in line with our safety values, it shows up through the RSP. The RSP is gonna block them from doing what they wanna do. And so it's a way to remind everyone over and over again, basically to make safety a product requirement, part of the product planning process. And so it's not just a bunch of bromides that we repeat. It's something where, if you show up here and you're not aligned, you actually run into it. And you either have to learn to get with the program or it doesn't work out. The RSP's become kind of funny over time, because we spend thousands of hours of work on it. And then I go and talk to senators and I explain the RSP, and I'm like, we have some stuff that means it's hard to steal what we make and also that it's safe. And they're like, yes, that's a completely normal thing to do. Are you telling me not everyone does this? You're like, oh, okay, yeah. It's not true that everyone does this. But it's amazing, because we've spent so much effort and we love it. Yeah, when you boil it down, they're like, yes, that sounds like a normal way. That sounds good. That's been the goal. Like Daniela was saying, let's make this as boring and normal as possible, let's make this like a finance thing. Yeah, imagine it's like an audit. Yeah, yeah, yeah. Boring and normal is what we want. Certainly right respect. Well, also, Dario, I think in addition to driving alignment, it also drives clarity, because it's written down what we're trying to do. And it's legible to everyone in the company, and it's legible externally, what we think we're supposed to be aiming towards from a safety perspective. It's not perfect. We're iterating on it. We're making it better.
But I think there's some value in saying, this is what we're worried about. This thing over here, you can't just use this word to sort of derail something in either direction, right? To say, oh, because of safety, we can't do X. Or because of safety, we have to do X. We're really trying to make it clearer what we mean. Yeah, it prevents you from worrying about every last little thing under the sun, right? Because it's actually the fire drills that damage the cause of safety in the long run. I've said, if there's a building and the fire alarm goes off every week, that's a really unsafe building, because when there's actually a fire, people are like, oh, it just goes off all the time. So it's very important to be calibrated. Yes, that's right. Yeah. A slightly different frame that I find kind of clarifying is that I think the RSP creates healthy incentives at a lot of levels. So I think internally it aligns the incentives of every team with safety, because it means if we don't make progress on safety, we're going to be blocked. I also think that externally it creates a lot of healthier incentives than other possibilities, at least that I see, because it means that, you know, if we at some point have to take some kind of dramatic action, like if at some point we have to say, you know, we've reached some point and we can't make a model safe, it aligns that with the point where there's evidence that supports that decision. And there's sort of a pre-existing framework for thinking about it, and it's legible. And so I think there's a lot of levels at which the RSP, in ways that maybe I didn't initially understand when we were talking about the early versions of it, creates a better framework than any of the other ones that I've thought about.
I think this is all true, but I feel like it undersells how challenging it's been to sort of figure out what the right policies and evaluations are and what the lines should be. I think that we have iterated and continue to iterate a lot on that. And I think there is a question, also, that's difficult: you could be at a point where it's very clear something's dangerous or very clear that something's safe, but with a technology that's so new, there's actually a big gray area. And so all of the things that we're saying were things that made me really, really excited about the RSP at the beginning, and still do, but I also think enacting this in a clear way and making it work has been much harder and more complicated than I anticipated. I think this is exactly the point. The gray areas are impossible to predict. There's so many of them. Until you actually try to implement everything, you don't know what's going to go wrong, so what we're trying to do is go and implement everything so we can see as early as possible what's going to go wrong. You have to do three or four passes before you really get it right. Iteration is just very powerful, and you're not going to get it right the first time, and so if the stakes are increasing, you want to get your iterations in early. You don't want to get them in late. You're also building the internal institutions and processes, so the specifics might change a lot, but building the muscle of just doing it is the really valuable thing. I'm responsible for compute at Anthropic and so it's important. So thank you. So I think that for me, we have to deal with external folks. And different external folks are at different points on the spectrum of how fast they think stuff is going to go. And I think that's also been a thing where I started out not thinking stuff would be that fast and have changed over time, and so I have sympathy for that.
And so I think the RSP has been extremely useful for me in communicating with people who think that things might take longer, because then we have a thing where it's like, we don't need to do extreme safety measures until stuff gets really intense. And then they might be like, I don't think stuff will get intense for a long time, and then I'll be like, OK, yeah, then we don't have to do extreme safety measures. And so that makes it a lot easier to communicate with other folks externally. Yeah, it makes it like a normal thing you can talk about rather than something really strange. Yeah. How else is it showing up for people? Evals, evals, evals. Good. It's all about evals. Everyone's doing evals. The training team is doing evals all the time. We're trying to figure out, has this model gotten enough better that it has the potential to be dangerous? So how many teams do we have that are evals teams? You know, the Frontier Red Team. There must be. There's a lot of people that are in the middle of the night. Every team does evals, basically. And that means you're just measuring against the RSP, measuring for certain signs of things that would concern you or not concern you. Exactly. It's easy to lower-bound the abilities of a model, but it's hard to upper-bound them. So we just put tons and tons of research effort into saying, can this model do this dangerous thing or not? Maybe there's some trick that we haven't thought of, like chain of thought or best-of-N or some kind of tool use, that's going to make it so it can help you do something very dangerous. It's been really useful in policy, because it's been a really abstract concept, what safety is. And when I'm like, we have an eval which changes whether we deploy the model or not, then you can go and calibrate with policy makers or experts in national security or some of the CBRN areas that we work on, to actually help us build evals that are well calibrated.
And that actually just wouldn't have happened otherwise. But once you've got the specific thing, people are a lot more motivated to help you make it accurate. So it's been useful for that. How is it for you? The RSP shows up for me for sure. Often. I actually think that, weirdly, the way that I think about the RSP the most is what it sounds like. Just like the tone. I think we just did a big rewrite of the tone of the RSP because it felt overly technocratic and even a little bit adversarial. I spent a lot of time thinking about, how do you build a system that people just want to be a part of? It's so much better if the RSP is something that everyone in the company can walk around and tell you, just like with OKRs, like we do right now. Like, what are the top goals of the RSP? How do we know if we're meeting them? What AI safety level are we at right now? Are we at ASL-2? Are we at ASL-3? That people know what to look for, because that is how you're going to have good common knowledge of whether something's going wrong. If it's overly technocratic and it's something that only particular people in the company feel is accessible to them, it's just not as productive. And I think it's been really cool to watch it sort of transition into this document where I actually think most, if not everybody, at the company, regardless of their role, could read it and say, this feels really reasonable. I want to make sure that we're building AI in the following ways. And I see why I would be worried about these things. And I also kind of know what to look for if I bump into something, right? It's almost like, make it simple enough that if you are working at a manufacturing plant and you're like, huh, it looks like the safety seat belt on this should connect this way, but it doesn't connect, you can spot it. And that there's just healthy feedback flow between leadership and the board and the rest of the company and the people that are actually building it.
Because I actually think the way this stuff goes wrong in most cases is just that the wires don't connect or they get crossed. And that would just be a really sad way for things to go wrong, right? It's all about operationalizing it, making it easy for people to understand. The thing I would say is none of us wanted to found a company. We felt it. We felt like it was our duty, right? I felt like we had to. We have to do this thing. This is the way we're going to make things go better with AI. That's also why we did the pledge, right? Because the reason we're doing this is it feels like our duty. I wanted to invent and discover things in some kind of beneficial way. That was how I came to it. And that led to working on AI, and AI required a lot of engineering, and eventually AI required a lot of capital. But what I found was that if you don't do this in a way where you're setting the environment, where you set up the company, then a lot of it gets done. A lot of it repeats the same mistakes that I found so alienating about the tech community. It's the same people, it's the same attitude, it's the same pattern matching. And so at some point it just seemed inevitable that we do it in a different way. When we were hanging out in graduate school, I remember you had kind of this whole program of trying to figure out how to do science in a way that would advance the public good. And I think that's pretty similar to how we think about this. Maybe you had this, like, Project Vannevar or something to do that. I was a professor. I think basically I just looked at the situation and I was convinced that AI was on a very, very, very steep trajectory in terms of impact. It didn't seem like, because of the necessity for capital, as a physics professor I could continue doing that. And I kind of wanted to work with people that I trusted in building an institution to try to make AI go well.
But yeah, I would never recommend founding a company or really want to do it. I think it's just a means to an end. I mean, I think that's usually how things go well, though. Rather than doing something just to enrich yourself or gain power, you have to actually care about accomplishing a real goal in the world, and then you find whatever means you have to. Well, something I think about a lot as a strategic advantage for us is, I mean, it sounds really funny to say, but just how much trust there is at this table, right? Like, I mean, Tom, you were at other startups. I was never a founder before, but it's actually really hard to get a big group of people to have the same mission, right? And I think the thing that I feel the happiest about when I come into work, and am probably most proud of at Anthropic, is how well that has scaled to a lot of people. It feels to me like in this group and with the rest of leadership, everyone is here for the mission, and our mission is really clear and it's very pure, right? And I think that is something that I don't see as often, to Dario's point, in the tech industry. It feels like there's just a wholesomeness to what we're trying to do. Like, no, I agree. Like none of us were like, let's just go and found a company. I felt like we had to do it, right? It just felt like we couldn't keep doing what we were doing in the place where we were doing it. We had to do it by ourselves. I mean, it felt like with GPT-3, you know, which all of us had touched or worked on, and scaling laws and everything else, we could see it in front of us in 2020, and it felt like, well, if we don't do something soon, all together, you're going to hit the point of no return, and you have to do something to have any ability to change the environment.
I think, building on Daniela, I do think that there's just a lot of trust in this group. I think each of us knows that we got into this because we want to help out with the world. We did the 80% pledge thing, and I think that everybody was just like, yes, obviously, we're going to do this. It was, yeah. And yeah, I do think that the trust thing is a special thing that's extremely rare. I credit Daniela with keeping the bar high. I credit you with the fact that we still keep it out. Oh, it's a touch. It's a Claude right there. It's a good thing. You're the reason I called you skilled, I think. People say how nice people are here, which is actually a wildly important thing. I think Anthropic is really low politics. And of course, we all have a different vantage point than average, and I try to remember that. Because of low ego. But it's low ego. And I do think our interview process and just the type of people who work here, like, there's almost an allergic reaction to politics. And unity. Unity is so important. The idea that the product team, the research team, the trust and safety team, you know, the go-to-market team, the policy team, the safety folks, they're all trying to contribute to kind of the same goal, the same mission of the company, right? I think it's dysfunctional when different parts of the company think they're trying to accomplish different things. Yeah. Or think the company is about different things, or think that other parts of the company are trying to undermine what they're doing. Yeah. And I think the most important thing we've managed to preserve is, and again, things like the RSP drive it, this idea that it's not, you know, that there are some parts of the company causing damage and other parts of the company trying to repair it, but that there are different parts of the company doing different functions and that they all function under a single theory of change. Extreme pragmatism, right? Yeah.
The reason I went to OpenAI in the first place, you know, it was a nonprofit, it was a place where I could go and focus on safety. And I think over time, you know, that maybe wasn't as good a fit, and there were some difficult decisions. And I think, in a lot of ways, I really trusted Dario and Daniela on that. But I didn't want to leave. That was something that I think I was actually pretty reluctant to go along with, because for one thing, I didn't know that it was good for the world to have more AI labs. And as well, when we did leave, I think I was reluctant to start a company. I was arguing for a long time that we should do a nonprofit instead and just focus on safety research. And I think it really took pragmatism and confronting the constraints, and just being honest about what the constraints implied for accomplishing that mission, that led to Anthropic. But just a really important lesson that we were good about early on is, like, make fewer promises and keep more of them. Yeah. Like, try to be calibrated, be realistic, confront the trade-offs, because trust and credibility are more important than any particular policy. It is so unusual to have what we have. And watching Mike Krieger defend safety things, or reasons why we shouldn't ship a product yet, but also then to watch Vene sort of say, okay, we have to do the right thing for the business, how do we get this across the finish line? And to hear people deep in the technical safety org talking about how it's also important that we build things that are practical for people, and hearing engineers on inference talk about safety, that's amazing. I think that is, again, one of the most special things about working here: everybody, with that unity, is prioritizing the pragmatism, the safety, the business. That's wild.
I think about it as spreading the trade-offs from just the leadership of the company to everyone. I think the dysfunctional world is like, you have a bunch of people who only see, you know, safety is like, we always have to do this. And product is like, we always have to do this. And research is like, you know, this is the only thing we care about. And then you're stuck at the top, right? You're stuck at the top. You have to decide between them, and you don't have as much information as either of them. That's the dysfunctional world. The functional world is when you're able to communicate to everyone, there are these trade-offs we're all facing together. The world is a far from perfect place. There are trade-offs, everything you do is going to be suboptimal. Everything you do is going to be some attempt to get the best of both worlds that, you know, doesn't work out as well as you thought it would. And everyone is on the same page about confronting those trade-offs together. They just feel like they're confronting them from a particular post, from a particular job, as part of the overall job of confronting all the trade-offs. It's a bet on the race to the top, right? It's a bet on the race to the top. Like, it's not a pure upside bet. Things could go wrong. But we're all aligned on, like, this is the bet that we're making. And markets are pragmatic. So the more successful Anthropic becomes as a company, the more incentive there is for people to copy the things that make us successful, and the more that success is tied to the actual safety stuff we do, the more it just creates a gravitational force in the industry that will actually get the rest of the industry to compete. And it's like, sure, we'll build seat belts and everyone else can copy them. That's good. That's the good world. That's really good. Yeah, this is the race to the top, right?
But if you're saying, well, we're not going to build the technology, we're not going to build it better than someone else, then in the end that just doesn't work, because you're not proving that it's possible to get from here to there. Where the world needs to get, never mind the industry, never mind one company, is successfully through from the technology doesn't exist to the technology exists in a very powerful way and society has actually managed it. And I think the only way that's going to happen is if, at the level of a single company and eventually at the level of the industry, you're actually confronting those trade-offs. You have to find a way to actually be competitive, to actually lead the industry in some cases, and yet manage to do things safely. And if you can do that, the gravitational pull you exert is so great. There are so many factors, from the regulatory environment, to the kinds of people who want to work at different places, to even sometimes the views of customers, that kind of drive in that direction. If you can show that you can do well on safety without sacrificing competitiveness, right, if you can find these kinds of win-wins, then others are incentivized to do the same thing. Yeah, I mean, I think that's why getting things like the RSP right is so important, because I think that we ourselves, seeing where the technology is headed, have often thought, oh, wow, we need to be really careful of this thing. At the same time, we have to be even more careful not to be crying wolf, saying that innovation needs to stop here. We need to find a way to make AI useful, innovative, delightful for customers, but also figure out what the constraints really have to be, constraints that we can stand behind and that make systems safe, so that it's possible for others to think that they can do that too, and they can succeed, they can compete with us. We're not doomers, right?
Like, we want to build the positive thing, we want to build the good thing. And we've seen it happen in practice. A few months after we came out with our RSP, the three most prominent AI companies had one, right? Interpretability research, that's another area where we've done it. Just the focus on safety overall, like collaboration with the AI Safety Institutes, other areas. Yeah, the Frontier Red Team got cloned almost immediately, which is good. You want all the labs to be testing for the very, very scary risks. Export the seat belts. Yeah, export the seat belts. Well, Jack also mentioned it earlier, but customers also really care about safety, right? Customers don't want models that are hallucinating, they don't want models that are easy to jailbreak, they want models that are helpful and harmless, right? And so a lot of the time what we hear in customer calls is just, we're going with Claude because we know it's safer. I think that is also a huge market impact, right? Because our ability to have models that are trustworthy and reliable matters for the market pressure that it puts on competitors too. Maybe to unpack something that Dario said a little bit more. I think there's this narrative or this idea that maybe the virtuous thing is to almost, like, nobly fail, right? It's like you should go and sort of demonstrate safety in an unpragmatic way so you can demonstrate your purity to the cause or something like this. And I think if you do that, it's actually very self-defeating. For one thing, it means that you're going to have the people who are making decisions be self-selected for being people who don't care about safety and who aren't prioritizing it.
And I think, on the other hand, if you try really hard to find the way to align the incentives, and make it so that if there are hard decisions, they happen at the points where there is the most force to go and support making the correct hard decisions and where there's the most evidence, then you can start to trigger this race to the top that Dario is describing, where instead of having the people who care get pushed out of influence, you instead pull other people to have to go and follow. So what are you all excited about when it comes to the next thing we'll be working on? I think there's a bunch of reasons you can be excited about interpretability. One is obviously safety, but there's another one that I find at an emotional level equally exciting or equally meaningful to me, which is just that I think neural networks are beautiful. And I think that there's a lot of beauty in them that we don't see. We treat them like these black boxes that we're not particularly interested in the internals of. But when you start to go and look inside them, they're just full of amazing, beautiful structure. It's sort of like if people looked at biology and they were like, you know, evolution is really boring. It's a simple thing that goes and runs for a long time and then it makes animals. And instead, actually, each one of those animals that evolution produces, and I think evolution is an optimization process like training a neural network, is full of incredible complexity and structure. And we have an entire sort of artificial biology inside of neural networks. If you're just willing to look inside them, there's all this amazing stuff. And I think that we're just starting to slowly unpack it, and it's incredible. There's so much there. There's just so much to be discovered there. We're just starting to crack it open. And I think it's going to be amazing and beautiful.
And sometimes I imagine, you know, a decade in the future, walking into a bookstore and buying the textbook on neural networks, on the biology of neural networks, and just the kind of wild things that are going to be inside of it. And I think that in the next decade, in the next couple of years even, we're going to start to really discover all of those things. And it's going to be wild and incredible. It's also going to be great that you get to buy your own textbook. I mean, I'm excited that, a few years ago, if you had said governments will set up new bodies to test and evaluate AI systems and they will actually be competent and good, you would not have thought that was going to be the case, but it's happened. And it's kind of like governments have built these new embassies, almost, to deal with this new class of technology, or this thing that Chris studies. And I'm just very excited to see where that goes. I think it actually means that we have state capacity to deal with this kind of societal transition. So it's not just companies, and I'm excited to help with that. I'm already excited about this to a certain extent today. But just imagining the future world of what AI is going to be able to do for people, it's impossible to not feel excited about that. Dario talks about this a lot, but I think even just the glimmers of Claude being able to help with vaccine development and cancer research and biological research are crazy, like just to be able to watch what it can do now. But when I fast forward three years or five years into the future, imagining that Claude could actually solve so many of the fundamental problems that we just face as humans, even just from a health perspective alone, even if you take everything else out, feels really exciting to me. Just thinking back to my international development times, it would be amazing if Claude was responsible for helping to do a lot of the work that I was trying to do a lot less effectively when I was like 25.
I mean, I guess similarly I'm excited to build Claude for work. Like, I'm excited to build Claude into the company and into companies all over the world. I guess personally, I like using Claude a lot. So there's been increasing amounts of time at home with me just chatting with Claude about stuff. I think the biggest recent thing has been code, where six months ago I didn't use Claude to do any coding work, and our teams didn't really use Claude that much for coding. And now it's just a phase difference. Like, I gave a talk at YC the week before last, and at the beginning I just asked, okay, so how many folks here use Claude for coding now? And literally 95% of hands. Like all the hands in the room, which is just totally different than how it was four months ago. So when I think about what I'm excited about, I think about places where, you know, like I said before, there's this kind of consensus that, again, seems like consensus, seems like what everyone wise thinks, and then it just kind of breaks. And so, places where I think that's about to happen and it hasn't happened yet. One of them is interpretability. I think interpretability is both the key to steering and making safe AI systems that we're about to understand, and interpretability contains insights about intelligent optimization problems and about how the human brain works. I've said, and I'm really not joking, Chris Olah is going to be a future Nobel Medicine laureate. I'm serious. I'm serious because, I used to be a neuroscientist, a lot of these mental illnesses, the ones we haven't figured out, right? Schizophrenia or the mood disorders.
I suspect there's some higher-level system thing going on, and that it's hard to make sense of those with brains because brains are so mushy and hard to open up and interact with. Neural nets are not like this. They're not a perfect analogy, but as time goes on, they will be a better analogy. That's one area. Second, related to that, I think, is just the use of AI for biology. Biology is an incredibly difficult problem. People continue to be skeptical for a number of reasons. I think that consensus is starting to break. We saw a Nobel Prize in Chemistry awarded for AlphaFold, a remarkable accomplishment. We should be trying to build things that can help us create 100 AlphaFolds. And then finally, using AI to enhance democracy. We worry that if AI is built in the wrong way, it can be a tool for authoritarianism. How can AI be a tool for freedom and self-determination? I think that one is earlier than the other two, but it's going to be just as important. Yeah, I mean, I guess two things that at least connect to what you were saying earlier. One is, I feel like people frequently join, and they're sort of scientifically really curious about AI. And then they kind of get convinced by AI progress to share the vision of the need, not just to advance the technology, but to understand it more deeply and to make sure that it's safe. And I feel like it's actually just exciting to have the people that you're working with be more and more united in their vision for both what AI development looks like and this sort of sense of responsibility associated with it. I feel like that's been happening a lot due to a lot of advances that have happened in the last year, like what Tom talked about. Another is that, I mean, going back really to Concrete Problems, I feel like we've done a lot of work on AI safety up until this point, and a lot of it's really important.
But I think we're now with some recent developments really getting a glimmer of what kinds of risks might literally come about from systems that are very, very advanced so that we can investigate and study them directly with interpretability, with other kinds of safety mechanisms, and really understand what the risks from very advanced AI might look like. And I think that that's something that is really going to allow us to further the mission in a really deeply scientific and critical way. And so I'm excited about the next six months of how we use our understanding of what can go wrong with advanced systems to characterize that and figure out how to avoid those pitfalls. Perfect. Finn. Okay. Good job. We got to do this all the time. This is the only time we ever get to do it.