
What do people use AI models for?

TL;DR

  • Anthropic developed Clio, a novel tool, to understand how people are actually using their AI systems like Claude in the real world.
  • Clio employs a privacy-preserving "bottom-up" approach, using AI models to summarize and cluster user conversations without human review of raw data.
  • The insights gained from Clio are critical for informing model safety, improving evaluation design, and discovering unexpected but beneficial AI use cases.

Takeaways

  • Clio (Claude Insights and Observations) is Anthropic's system for empirically analyzing real-world interactions with their AI models.
  • It represents a shift from "top-down" safety methods, like red teaming or asserting harms, to a "bottom-up" approach grounded in actual user data.
  • The core process involves using one language model to summarize user requests from conversations, converting these summaries into numerical embeddings, and then grouping them into clusters based on user intent.
  • Privacy is a foundational design principle for Clio; no human ever reads raw conversations, and the system employs "defense in depth" safeguards, including an auditor model to check for private information and minimum aggregation thresholds.
  • Clio's insights inform the design of more relevant model evaluations, ensuring they are grounded in real-world usage scenarios rather than purely hypothetical ones.
  • The tool has revealed diverse and often unexpected use cases for Claude, such as in-depth research, brainstorming complex scientific ideas, parenting advice, and support during personal crises.
  • Clio helps augment trust and safety efforts by identifying emergent safety-relevant activities (e.g., spam generation, cybersecurity testing, emotional attachment) that may not be covered by existing "top-down" policies or classifiers.

Vocabulary

  • Claude: Anthropic's large language model or AI assistant.
  • Clio: An internal tool (Claude Insights and Observations) at Anthropic that analyzes real-world user interactions with Claude to understand usage patterns.
  • Top-down approach: A method of identifying AI risks by asserting potential harms (e.g., discrimination) and then designing specific evaluations to measure them.
  • Bottom-up approach: A method of identifying AI risks and use cases by analyzing real-world user data and interactions to discover emergent patterns.
  • Red teaming: A process where individuals are hired to adversarially probe an AI system to discover vulnerabilities or harmful behaviors.
  • Embeddings: Numerical representations that capture the semantic meaning or context of text, used to group similar content.
  • Cluster: A group of similar user conversations or intents, identified by an AI model during analysis.
  • Auditor: A dedicated AI model designed to review data (e.g., clusters) for specific issues, such as privacy violations.
  • Classifier: An AI model trained to categorize or flag specific types of content or behavior, often used in trust and safety.

Transcript

All right, let's start off with a round of quick introductions. I'll start. I'm Deep Ganguli. I'm a research scientist on the Societal Impacts team. I'm really driven by fundamental questions like: how are people using and affected by the systems that we are building at Anthropic? How do we use that understanding to make our systems safer? And how can we anticipate what the societal impacts might be down the line? This is a very tall order, because the systems we're building are very general purpose and they can have myriad downstream uses and effects on people. These days my job is to find people much smarter than me and quickly get out of their way. And so that's this group right here. We're all part of the Societal Impacts team, and I'll pass it on to Esin.

Awesome. Hi, I'm Esin Durmus. I'm a research scientist on the Societal Impacts team. I'm very lucky to work with this amazing group of people. I'm interested in understanding how AI systems will impact society at large. One aspect of this is to understand what values AI systems should have, how we can incorporate these values, and, once we incorporate them, how we can evaluate systems to see what values they actually represent. And I was part of this Clio work. I feel very fortunate about that. I guess we will get to that soon. But yeah, I want to pass it to Miles.

Yeah, thanks, Esin. I'm Miles. I'm a research engineer on the Societal Impacts team. Like everyone else here, I care a lot about understanding the ways that our systems are used in the wild and the impact that has on real people around the world. And I am particularly interested in building systems that allow us to understand the ways our systems are used empirically. I've had a blast working with this team over the past couple of months to build Clio.

Hey, everyone. My name's Alex. I'm a researcher on our Societal Impacts team. And I'm just really interested in, like Deep was saying, these are such general-purpose systems.
They're capable of so many applications, even many more than any one person can anticipate. And I think I'm just motivated by trying to understand how these systems are used today as a way of building an understanding of how they might be used in the future, and just generally building societal resilience as you have weird new technology coming into the world, and seeing if we can do a good job at understanding, preparing, and informing people. I always try to think about the parallel universe of me that didn't work inside a large AI lab, what that version of me would want, and what sort of information I'd want to have to be informed. So that's what motivates me. And I've really loved working with all the other folks in this room on this Clio project.

Okay. So going around the room, I heard a couple of things. The first is we all want to understand how our models might impact society. And then I also heard you all mention Clio. What is Clio? Let's start with you, Alex. And how does it help us understand how our models might impact society?

Clio stands for Claude Insights and Observations. Basically, it's a tool that, at a bird's-eye view, lets you understand the different use cases that people are using Claude for. So it could be anything from "understanding Mediterranean history" to "help me design this science experiment." And it shows these high-level aggregate clusters of usage that help us understand the risks, the benefits, and where the technology is heading in the future.

Yeah. And maybe, Esin, for you: what were we doing prior to Clio to understand how people are using our systems and/or might be affected by them? Some things off the top of my head: we as a team had investigated a lot of top-down approaches, where maybe we assert a type of harm that we might see in the world, and then we go off and try to measure that.
For example, are our language models, or AI systems more broadly, discriminating when they're used in high-stakes decision-making scenarios? Or we go more general and have developed processes for red teaming our systems, where we pay contract workers to adversarially probe our systems for harm and then see where they're successful and where they're not. And I'm curious to hear your perspective: prior to actually doing more bottom-up work with Clio, where we analyze, sort of like a Google Trends for real-world interactions, what else were we doing, and what gap, from your perspective, did Clio fill?

Yeah. So we were designing a lot of different evaluations, as you already mentioned. For example, discrimination: Alex had some work on this, to see if models are discriminating against certain protected groups. We thought that this is important because we don't want our models to discriminate. Or persuasion, where we designed an evaluation to measure if models are persuasive or if they generate misinformation. So we would come up with different things to evaluate for, and then we would design evaluations around them to see how models are behaving. And also, as you said, doing more human studies to see what humans think, how they evaluate our systems. I guess this is still an important aspect, and we are still doing a lot of evaluation work to evaluate our models for different specific aspects. But one thing that was missing was to see what is actually happening in the real world, right? For example, where is it most relevant to evaluate discrimination, or persuasion, or misinformation? To really understand how models are being used and to be able to tailor our evaluations to these specific use cases, I think this is really important.
And it really guides us to design more thoughtful evaluations that match real-world use cases. I think it's really informative in that sense. Rather than us just coming up with, "Oh, we should maybe evaluate this aspect," and making an evaluation for it that maybe isn't perfectly representative of what's going on in the real world, I think we can come up with much better ways of evaluating, basically taking insights from real-world usage.

Yeah. In other words, we're trying to bridge the gap between the laboratory setting, where things are hypothetical, and the real-world setting, in which we're actually grounding our evaluations and our measurements in real-world usage. So to that effect, Miles, can you describe to us in a little bit more detail how we built Clio and how it helps us go from the bottom up, from the data, to understand these problems?

Totally. So the way Clio works is it starts with a large number of real-world conversations, as Alex mentioned. Then we use a language model to process each conversation and extract a private, high-level summary of what's happening in that conversation. The aspect that we often care about is: what is the user's overall request for the AI assistant? Then we group the related answers together, and we get these really interesting clusters that tend to correspond to user intent. Then we can use another language model, once again, to look at those clusters and explain what is actually happening in this group of conversations. And we can do that over and over again until we get this really nice hierarchy of uses, which allows us to get insight into the ways that our models are being used on several different axes without ever having to read raw conversations.
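
The summarize, embed, and group steps just described can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Clio's implementation: `summarize` is a hypothetical placeholder for the language-model summarizer, `embed` is a toy bag-of-words vector rather than a learned embedding, and the greedy cosine-similarity grouping with a 0.5 threshold is an illustrative stand-in for whatever clustering method is actually used. Cluster naming and the repeated hierarchy-building pass are omitted.

```python
import math
from collections import Counter


def summarize(conversation: str) -> str:
    """Hypothetical stand-in for the language-model summarizer."""
    return conversation.lower().strip()


def embed(summary: str) -> Counter:
    """Toy embedding: bag-of-words counts (a real system would use a learned model)."""
    return Counter(summary.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(summaries, threshold=0.5):
    """Greedy grouping: join the most similar existing cluster, else start a new one."""
    clusters = []
    for summary in summaries:
        vector = embed(summary)
        best = max(clusters, key=lambda c: cosine(vector, c["centroid"]), default=None)
        if best is not None and cosine(vector, best["centroid"]) >= threshold:
            best["members"].append(summary)
            best["centroid"].update(vector)  # running sum of member vectors
        else:
            clusters.append({"centroid": Counter(vector), "members": [summary]})
    return [c["members"] for c in clusters]


conversations = [
    "Help me build a web application in Elixir",
    "Help me debug a web application in Python",
    "Explain Mediterranean history of trade",
    "Explain Mediterranean history of Rome",
]
# Only the summaries are carried forward; the raw text is not needed after this step.
groups = cluster([summarize(c) for c in conversations])
```

With these four made-up inputs, the two web-development requests land in one group and the two history requests in another.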
And then what we finally do, once we have that hierarchy, is we have another model look through all of the clusters and make sure that there is nothing in those clusters that is private or identifying. The way that we've operationalized that is as anything that could be identifying to the order of roughly a thousand individuals. And finally, we apply quantitative aggregation minimums, so we make sure that our clusters have a minimum number of distinct organizations and conversations. And then we expose those results internally so that we can design, for example, better evaluations, and so that we can understand the ways our systems are being used across a variety of different use cases. And we can do this with pretty high confidence that we are maintaining a high privacy bar for our users.

Yeah, that's fascinating. If I were to summarize what you said, it looks something like: we use Claude to analyze conversations people are having with Claude.

Exactly. And none of us actually read any of those conversations. No human actually has to look at the data. And even though that's strictly true for general traffic, we still implemented a defense-in-depth strategy to make sure that no private information is divulged in our analyses.

I want to dwell on this a little bit. My memory of the early days of working on Clio is the group of us sitting down for lunch, before we had even written a single line of code, thinking about the ethics of this. There's a fundamental tension here, where we want to understand how people are using our systems, but we also really want to respect user privacy. There's a trade-off between the amount of insight you can get and the amount of privacy you have. With really high privacy, there's very low insight. With very low privacy, there can be very high insight, but this is ethically dubious.
So Alex, can you walk us through your memory of that conversation, which was very intellectually stimulating and important? How did we coalesce on that framing and decide how we were going to approach this project from the beginning?

Yeah. Again, I feel like we were all thinking: what would we want, or be comfortable with, if we were users of Claude outside of Anthropic? I value privacy a lot when I decide which technologies to use. And I think we were worried: would this just be building a tool that people might think we were using to spy on them? Would this tool be seen as invasive? And could it be misused, maybe to look for traffic patterns that people didn't want looked for? I think we just thought through it really carefully and designed a bunch of safeguards that ended up making me feel like, oh yeah, I don't feel restrained when I type whatever I want into Claude on my personal account, because it'll be so high-level and aggregated that it doesn't really affect what I feel like I can write. I remember all of us just went around the table being like, "What are we worried about?" And then everyone else was like, "Oh yeah, that resonates," or "Oh, actually, I think we can do this." It was alternating between the high level (what could go well? what could not go well?) and the very granular ("I think we could do this, I think we could do that"). I really liked that. It was really energizing, because sometimes these conversations can be very head-in-the-clouds, right? And sometimes they can be very much missing the forest for the trees.
And I felt like we had a few of these, and I just remember going and sitting at the lunch table, talking about them, and really hashing them out before we felt comfortable.

Yeah. It was one of the more intellectually stimulating and thoughtful conversations I think we've ever had as a team together, and that's saying a lot for our team. And my memory of this conversation was that you were the most concerned early on, again, before writing any lines of code. Now that you've been a part of this project from the beginning and the ground up, and you've seen how we've approached all of the ethical considerations we articulated at that early lunch, how has your thinking, or your feelings, changed since those early conversations?

Yeah. I definitely feel better about it in terms of user privacy, because of all the thoughtfulness that went into it and all the methods that try to make sure it preserves privacy as much as possible. So yeah, I definitely feel much better about the overall approach we took. And also, seeing the impact it has already made within Anthropic, I think it was definitely worth it. It has already had a lot of different use cases, in terms of safety, or to understand how users are using Claude. We get a lot of different insights, as I said, to inform our evaluations, product, safety, all these different aspects. I think it was definitely a good idea to do this project, and with a very thoughtful approach to preserving privacy, in my opinion.

Awesome. And Miles, can you go back to the concrete step-by-step: how do we go from one conversation, to a cluster of summarized conversations, to actually insightful analyses? Walk us through the lifecycle of how Clio works, step by step.

Step by step, awesome.
One thing I also want to just flag: Esin talked a moment ago about how Clio has helped us with evaluations, and with designing more representative evaluations that are grounded in empirical usage. One place where we actually designed an evaluation that was grounded in empirical usage from Clio is the Clio privacy evaluation, because we actually built a tool that scans clusters for privacy issues, and we grounded our evaluations of that auditor using actual Clio data. Of course, we only used the privacy-preserving data, and then we made synthetic data for the non-privacy-preserving examples. That's just one example within Clio.

Yeah, so how do we go from an individual conversation to a cluster that we can use for downstream analysis? Suppose I'm asking for Claude's help programming a web application. Well, my conversation is probably going to be like many other conversations that people are having with Claude. So what Clio will do, when it takes a random sample of Claude conversations, is look at my transcript (and this is Claude, not a human) and summarize my request for Claude in a sentence. It'll say, "The user's overall request for the assistant was for help designing a web application in the Elixir programming language." Then we take those conversations and compute a numerical representation for them, called an embedding. An embedding corresponds to the semantic content of the sentence. Then my conversation is going to get grouped with a ton of other conversations all about web development, maybe in Elixir, maybe in related programming languages. And then it will throw away the actual raw conversation. We don't need it anymore. All we have now is this group of conversations with summaries of each individual one. And Claude, again, will look at that group, and it will see, "Ah, okay, these are a bunch of conversations about web development, maybe web development in Elixir."
And it will come up with a name and a description for that cluster. We've specifically instructed Claude to avoid including any private details. So, for example, it will not include the name of the website, because there's no need. Really, what matters is that it's web development. Then, provided the cluster is sufficiently large, because we have minimum cluster sizes, it'll pass on to the next step, where we have Claude look at the conversation and say, "Huh, just double-checking: is there any private information here that could identify, you know, maybe fewer than a thousand people?" We've calibrated and benchmarked that auditor in a few ways. And if it passes, then we have this final aggregate cluster that has been stripped of any raw identifiers for the underlying conversations, that includes, say, a thousand conversations about web development, maybe in Elixir if there are enough, and some statistics about, for example, the language breakdown of that cluster. Then we can use that to understand, for example, whether Claude is as useful giving web development advice for people in English as in Spanish, or we can understand what programming languages people are generally asking for help with. We can do all of this in a really privacy-preserving way, because we are so far removed from the underlying conversations that we're very confident that we can use this in a way that respects the spirit of privacy that our users expect from us.

Yeah, that's such a crystal-clear explanation of how Clio works. I want to riff off of this a little bit. You mentioned a cluster of use cases about high-level programming, and you can drill down and get into more specifics, like the actual programming languages or the types of questions about programming.
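
The minimum cluster sizes and the per-cluster language breakdown from the walkthrough above can be sketched as below. This is a minimal sketch under assumptions: the `Record` fields (`conversation_id`, `org_id`, `language`) and the threshold values are hypothetical, since the transcript does not state Clio's actual schema or minimums.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    conversation_id: str
    org_id: str    # hypothetical field name for the owning organization
    language: str


def passes_minimums(records, min_conversations, min_orgs):
    """Keep a cluster only if it has enough distinct conversations and organizations."""
    conversations = {r.conversation_id for r in records}
    orgs = {r.org_id for r in records}
    return len(conversations) >= min_conversations and len(orgs) >= min_orgs


def language_breakdown(records):
    """Share of each language within one cluster."""
    counts = Counter(r.language for r in records)
    total = sum(counts.values())
    return {language: n / total for language, n in counts.items()}


web_dev = [
    Record("c1", "org-a", "en"),
    Record("c2", "org-b", "en"),
    Record("c3", "org-c", "es"),
    Record("c4", "org-a", "en"),
]
too_small = [Record("c5", "org-a", "en"), Record("c6", "org-a", "en")]

# Illustrative thresholds only; real minimums would be far larger.
kept = passes_minimums(web_dev, min_conversations=3, min_orgs=2)
dropped = not passes_minimums(too_small, min_conversations=3, min_orgs=2)
breakdown = language_breakdown(web_dev)
```

Counting distinct organizations as well as conversations means a single very active account cannot fill a cluster on its own.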
Let's zoom back out again. In addition to programming, maybe Alex, what was the distribution of the types of clusters that we saw? And what was the most surprising to you, and why?

So one thing that I thought was really fascinating: I was expecting there to be a ton of clusters about how Claude was useful for writing. And we did see that, but I also saw a ton of clusters for people using Claude for research and ideating and brainstorming, things like understanding Mediterranean history, but also understanding and brainstorming new ideas in quantum mechanics, in physics, in materials science, in biology. I don't think I would have expected such a large fraction of usage to be what seem like these really high-level idea-generating tasks. And it was actually kind of inspirational. I was like, wow, this tool we're building is actually helping people design, I don't know, maybe better medicines, or improve, basically, the frontiers of human knowledge. I remember seeing that and being like, oh wow, this is pretty cool.

Yeah. I don't know if "surprise" is the right word, but something impacted me in an interesting way. I saw a big cluster of people asking for parenting advice. And as a parent, I was like, wait, I have never once thought to ask Claude for parenting advice. So I asked Claude, "Hey, what kind of parenting advice do you have?" And it actually suggested something that I now use, which was, "You can ask me to code up, using Artifacts, simple games that are meant to teach algebra." And I was like, "Oh great, can you do it in Spanish?" And Claude said, "Sí." And I said, "Let's do it." And I sat down with my kids, and we coded up these little games that try to teach you algebra in Spanish. They loved it, and it was so fun.
And I think if it weren't for Clio, I would never have thought that that's a use case that would apply to my personal life.

I just think that this is a great example of how it is extremely difficult to anticipate all the ways that people are going to use AI systems. For one thing, it can be difficult to know all the ways that you can incorporate Claude into your own life. Sometimes we think about unknown unknowns from a safety perspective, for example, but there are also unknown unknowns from an uplift and personal development perspective. And using Claude to generate algebra games in Spanish is a great example of that.

Yeah. And then maybe riffing on the Spanish language thing: Esin, you and I have spent a lot of time trying to understand, does Claude have cultural competency? How does it behave in different linguistic contexts? A lot of your work using Clio was to drill into that question. And I want to understand from your perspective: what are the main findings you had? Is Claude as useful in different languages? How are people using it in different languages, in different cultures? What has Clio enabled you to learn about that?

Yeah, that was very interesting, and also related to what you just said and what Alex said: people actually use Claude in very subjective settings as well. For example, to get relationship advice, or to ask "How should I do my hair?", or, as you said, parenting advice. That was really interesting to me because, as I said earlier, I am very interested in these values questions: what should the model do in subjective settings, open-ended settings where there are no clear-cut answers? Seeing that this is actually relevant to real-world usage was very interesting to me, and it validated this question even more.
And it motivated me to explore this direction more: okay, this is really relevant, and we should really spend time exploring it further, because it comes up in real-world interactions. But related to usage in different languages, I had some interesting findings. For example, as Alex said, maybe in English people ask Claude software-engineering-related questions, but I saw that the percentage of tasks in different languages differs quite a lot. For example, people ask for professional and academic writing assistance more in other languages such as Spanish or Arabic. Also, as you might guess, translation, translating text to other languages, comes up a lot more in other languages. This was interesting to see because we want models to be really good at these different tasks that belong to other languages. I think those two were the main findings. And I also saw that some questions around cultural contexts and global issues and things like that appear more in other languages as well.

Yeah, that's awesome. And Alex, how do we know that Clio works as advertised?

Yeah, so we do a whole range of experiments in the paper. One that I think is really interesting is we generate a huge synthetic corpus of tens of thousands of conversations, and we do it through a process where we know what the ground-truth distribution looks like. We know that this should be, like, 10% math content, 5% coding, 2% questions about teddy bears, and we just give all of that to Clio without telling it how these conversations should be grouped. Then we have it do that aggregation process, and we see whether we can reconstruct that ground-truth distribution. And we do this for a bunch of different types of data. We do it for random data.
We do it for synthetic concerning data, and we see that generally Clio is just really accurate at reconstructing that ground-truth distribution. And so that's one of the many ways that we know that Clio is actually doing a good job.

Yeah, and on a personal note, I remember posing this question to you, actually, and walking away being like, yeah, we should figure out how to quantitatively, empirically verify that Clio actually works. I remember going home and thinking that was a very tough problem that I had handed out. And then I came back and saw the solution, and I thought it was extremely elegant and thoughtful. I was like, wow, how did they come up with this amazing idea? So I was very impressed with the way the team took that very hard, ambiguous problem and really knocked it out of the park.

One other very nice thing about this synthetic data reconstruction analysis is that it allows us to break down our accuracy based on other attributes that we care about, for example, the language of the conversation. So we actually have pretty good insight into Clio's multilingual performance, and we have confidence that Clio works roughly as well for English conversations as it does for, say, Georgian conversations. That gives us some more confidence for, say, Esin's awesome multilingual studies.

Let's switch gears a little bit. Anthropic has a very strong safety mission, and that goes with things like our Responsible Scaling Policy, where we assert the types of catastrophic risks that we're concerned about, and then we try to look for evidence of those risks from the top down by constructing evaluations, as we were talking about earlier.
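
The synthetic-data check and the per-language accuracy breakdown described earlier in this exchange can be sketched as follows. This is a minimal sketch assuming each synthetic conversation carries a known topic label and a language tag, and that the pipeline's clusters have already been mapped back to topic names. The `predicted` list is a hypothetical stand-in for that output, and plain label accuracy is an illustrative metric, not necessarily the one used in the paper.

```python
from collections import Counter, defaultdict


def distribution(labels):
    """Fraction of items carrying each label."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


def accuracy_by_language(truth, predicted, languages):
    """Share of conversations assigned to their true topic, split by language."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for true_label, predicted_label, language in zip(truth, predicted, languages):
        totals[language] += 1
        hits[language] += int(true_label == predicted_label)
    return {language: hits[language] / totals[language] for language in totals}


# Known topics of four made-up synthetic conversations, and their languages.
truth = ["math", "math", "coding", "coding"]
languages = ["en", "ka", "en", "ka"]
# Hypothetical pipeline output after mapping clusters back to topic names.
predicted = ["math", "math", "coding", "math"]

true_dist = distribution(truth)
reconstructed_dist = distribution(predicted)
per_language = accuracy_by_language(truth, predicted, languages)
```

Comparing `true_dist` with `reconstructed_dist` shows how closely the known mix was recovered, while `per_language` shows whether errors concentrate in one language.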
On the trust and safety side, we also have acceptable usage policies that assert, "These are the types of behaviors that are not okay," and we train up classifiers that check for this behavior. Then, only if a conversation has been flagged by our trust and safety classifiers can the trust and safety team go in, via human review, and adjudicate what to do in these instances. This is again a top-down thing, where we have to write those policies, whether it's our Responsible Scaling Policy or our acceptable usage policy. With Clio, we can strictly augment that top-down approach with a bottom-up thing, which is: wait a minute, just by looking at the user traffic, maybe there are blind spots that we didn't see a priori when asserting these policies. So maybe we can go around the table, starting with Alex: what are some instances of bottom-up things we saw that were safety relevant, and what did we do about them using Clio?

I love that framing. I think there's a cycle of, "Oh, what do we think the world looks like?", and then empirically actually looking at the world and saying, "Oh, we were so wrong," or in some cases, "actually pretty right," and then using that to continue the cycle and repeat.
I think we saw a bunch of things. Miles did a bunch of runs that found a bunch of suspicious activity that we then flagged to our trust and safety team, including people trying to write spam emails, people trying to make spam articles about gardening, and also a few other types of harms that we disclose in our report. We found a whole bunch of people using these models for different scientific applications, and people trying to test how good the model would be at hacking, cyber attacks, and cyber defense. These all help us figure out: what are the risks that we actually should be worried about? Where are these models actually seeing progress and adoption? Maybe those are leading indicators of when they'll spill over and we'll see larger societal harm or benefit. So I think those were a couple of things. And then also all sorts of things like emotional attachment to models: clusters that said things like "human-model romantic discussion" or "role play." Without further investigation, it's harder to know what those are and what the appropriate limits are, and that's probably a discussion that all of society should be having. But those are things that we noticed, and things that we want to share with people.

Yeah. And maybe Miles, do you want to pile on to that?

Yeah, I mean, I agree with Alex. You can't know where the puck is heading if you don't know where the puck is, and I think Clio tries to tell us where the puck is. One area that I saw that was safety relevant but didn't strictly fall into abuse, and I think it's important to distinguish between those two things, is people talking to Claude in moments of extreme personal crisis. Often, people may not have access to someone who can counsel them through really challenging moments, and I was surprised to see how prevalent that was, and
this does pop up as a cluster, as a couple of clusters actually. I think one thing that Clio lets us do is get a more granular view of the ways that people are engaging with Claude in those moments, which are safety relevant, a view that is a bit more precise than, "Oh, did this violate our policy?" Classifiers often give you a binary indicator, yes or no, violative or not, and a lot of harms don't neatly translate to that kind of binary. Crisis moments are one such example, and we need to make sure that Claude, for example, is responsible in those contexts, when someone comes to it in their darkest moment. So one area where Clio has been helpful is disaggregating what is triggering our classifiers, so we can get a more granular view and say, "Okay, this cluster definitely is really violative; this cluster is right on the border, and maybe it superficially looks like something that would be violative, but it's not." Then we can go back and improve our classifiers, and improve our policies if we want to go that far, to draw better boundaries.

One point of criticism that some of the labs have gotten is that these models can sometimes be kind of annoying. Once, I asked Claude for help killing a process that had run amok on my computer, and it was like, "I'm sorry, that goes against ethical software development practices." And I was like, come on, Claude. This was an older version; I don't think it would do that anymore. But one of the things we can do is look at clusters with high refusal rates, for example, or high trust and safety flag rates, and say, "Huh, this is clearly an over-refusal; this is clearly fine." Then we can use that to close the loop and say, "Okay, here are examples that we want to add to our human training data so that Claude is less refusal-y in the future on those topics." And importantly, we're not using the
actual conversations to make Claude less refusally. Instead, we are looking at the topics and then hiring people to generate data in those domains, and generating synthetic data in those domains. So we're able to use our users' activity with Claude to improve their experience while also respecting their privacy.
One thing that I've seen a fair amount of, and that others on the trust and safety team who really lead this work have also seen, is that there's a shape to coordinated abuse. It tends to look like a really dense cluster of many different accounts. You have this very large cluster that's disproportionately dense, and you can zoom in on it and immediately spot it, often because it's just so clear, since normal behavior tends to be much more diffuse. If you have tons of different conversations coming from tons of different organizations that are all about the same exact topic, or that have the same format, you can really quickly spot that on the map because it's just this tight ball, and real-world regular usage just doesn't show up like that.
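The "tight ball" intuition can be illustrated with a small Python sketch. It is only a minimal illustration under stated assumptions, not the method Clio or the trust and safety team actually uses: the record layout (cluster label, account ID, 2D map coordinates), the function names, and both thresholds are hypothetical. It measures how spread out each cluster is around its centroid and flags clusters that are far tighter than the typical cluster while still spanning many distinct accounts.

```python
from collections import defaultdict
from math import dist
from statistics import median


def cluster_stats(points):
    """Summarize clusters from (cluster_id, account_id, (x, y)) tuples (hypothetical schema)."""
    by_cluster = defaultdict(list)
    for cluster_id, account_id, xy in points:
        by_cluster[cluster_id].append((account_id, xy))

    stats = {}
    for cluster_id, members in by_cluster.items():
        coords = [xy for _, xy in members]
        centroid = tuple(sum(axis) / len(coords) for axis in zip(*coords))
        # Mean distance to the centroid: small values mean a "tight ball".
        spread = sum(dist(xy, centroid) for xy in coords) / len(coords)
        accounts = len({account for account, _ in members})
        stats[cluster_id] = {"size": len(members), "accounts": accounts, "spread": spread}
    return stats


def flag_tight_clusters(stats, min_accounts=5, spread_ratio=0.25):
    """Flag clusters much tighter than the median cluster that span many accounts.

    Both thresholds are illustrative assumptions, not values from the source.
    """
    typical = median(s["spread"] for s in stats.values())
    return sorted(
        cluster_id
        for cluster_id, s in stats.items()
        if s["accounts"] >= min_accounts and s["spread"] <= spread_ratio * typical
    )


# Toy data: two diffuse clusters and one very tight cluster, six accounts each.
shape = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 0), (2, 4)]
points = []
for i, (x, y) in enumerate(shape):
    points.append(("recipes", f"acct-r{i}", (x, y)))
    points.append(("coding help", f"acct-c{i}", (x + 10, y + 10)))
    points.append(("spam", f"acct-s{i}", (20 + x * 0.025, 20 + y * 0.025)))

flagged = flag_tight_clusters(cluster_stats(points))
print(flagged)
```

In this toy data only the "spam" cluster is both dense and spread across many accounts, so it is the only one flagged. A flag like this would be a prompt for closer review, not a verdict.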
Yeah. And then going back to the refusals, maybe this one's for you, Esin. When Claude decides to refuse or not to refuse, it is implicitly making some sort of value judgment. With Clio we're able to identify the refusal ratios within clustered topics of conversation, and sometimes we have found things where Claude is really refusing, like "kill a programming process"; that's an over-refusal. And sometimes it's under-refusing. For example, a request to translate harmful content from English into a different language might be in violation of our usage policies, but just by virtue of being a translation task as opposed to a generation task, Claude actually under-refuses. So there is some sort of value judgment here; it's a gray area. How do you think about using Clio analyses to address this problem? How can we use our understandings and our learnings here to tune up the over- and under-refusals?
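The idea of refusal ratios within clustered topics can be sketched in a few lines of Python. This is a hedged illustration only: the record format (cluster label plus a refused flag), the function name, and the `min_size` threshold are assumptions made for the example, and the cluster labels are invented.

```python
from collections import Counter


def refusal_rates(records, min_size=3):
    """Return the refusal rate per cluster from (cluster_label, was_refused) pairs.

    Clusters with fewer than min_size conversations are dropped, echoing the
    idea of minimum aggregation thresholds; the value 3 is an assumption.
    """
    totals, refused = Counter(), Counter()
    for cluster, was_refused in records:
        totals[cluster] += 1
        refused[cluster] += int(was_refused)
    return {c: refused[c] / n for c, n in totals.items() if n >= min_size}


# Invented example records: (cluster label, whether the model refused).
records = (
    [("kill a runaway process", True)] * 3
    + [("kill a runaway process", False)]
    + [("translate a document", False)] * 4
    + [("write a poem", True)]
)

rates = refusal_rates(records)
# Highest refusal rates first: candidates to review for over-refusal.
review_queue = sorted(rates.items(), key=lambda item: item[1], reverse=True)
print(review_queue)
```

A high rate on a benign cluster suggests over-refusal, and a low rate on a policy-sensitive cluster suggests under-refusal. Deciding which is which still takes human judgment about the topic.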
Yeah, that's a great question. We are interested in understanding, first of all, whether Claude is refusing: does it refuse the queries that are obviously attempts at misuse? And in this gray area, one thing we could do is really pinpoint value-related interactions, or interactions where a value judgment would be relevant, and then look at the refusal rates for those interactions. We are interested in this direction and we are currently exploring it. I think it would be really interesting to see, for example, whether Claude is refusing less in English but refusing more in other languages for similar queries, or vice versa. This is still ongoing work, but it's definitely very interesting. Clio allows us to analyze these interactions where there is more subjectivity and look at the refusal rates to see in what contexts Claude is more hesitant to respond versus in what contexts it feels more confident to give a response.
Yeah. And while we were developing Clio, the US general elections were taking place. I remember sitting down as a team and thinking, we actually don't know: this is the first time in the history of the country that anyone can go to a chatbot and ask it either information-seeking questions, like "where do I register to vote?", or subjective questions, like "who should I vote for?" And I remember thinking we can maybe use Clio to understand this, and this feels very important. So Esin, can you walk us through the analyses we did, which were very exploratory, and what we found in that effort, maybe at a high level?
Yeah. So as you said, we have been working on election integrity for quite some time now, and we initially developed a lot of different evaluations to test our models, both for factuality of the information and also for how it can be more
nuanced and unbiased. So we developed a lot of different evaluations, but one thing missing was how relevant these evaluations are, whether people are actually asking questions that are relevant. I think Clio enabled us to base these in real-world usage. We started to use Clio to understand whether people are asking questions that may be related to elections, and we found some usage that was interesting. For example, people asked for political information, or information about different policy issues, or to really understand how the Electoral College works in the US, to get more information about how the system works and about the political issues. We were already building evaluations to make sure our models are as nuanced and as unbiased as possible, but seeing this usage gave us more validation. We could also look at the refusal rates for different clusters, as Miles was also talking about. I think it's important for the model to be aware of misuse and refuse, so it also enabled us to see: okay, this cluster is maybe potential misuse, and the model is doing a good job of refusing it. I think that was a good validation as well.
Yeah. My memory of all of this work, going back to something Alex said earlier, is that we have an idealized vision of what the world looks like, and then we use Clio to actually understand what the world looks like. With respect to your amazing election integrity work, it turned out that your vision of what might be happening, and the evaluations in the election integrity suite that you built, actually mapped onto the real kinds of things we were seeing in the wild. I just remember we were in a period of a lot of uncertainty, and I remember feeling that Clio actually really helps us address those
moments of uncertainty. Would you agree, or care to comment on that?
Yeah, I agree. Maybe I can give more concrete examples. During the evaluations we found that Claude doesn't always acknowledge its limitations in terms of cutoff dates. You may ask a recent question, but Claude was trained up until much earlier than that, and it should say "I don't have the most recent information", or it should refer to reliable sources when needed. So we developed a lot of evaluations around this, and we basically made Claude better at doing these things. But Clio, as you can imagine, allows us to test this really specifically. For example, you can ask Clio, "what are the conversations where these things are really relevant?", and see how the model is behaving, whether it's actually referring to its cutoff date or referring to reliable sources. So it really allows us to base this evaluation in the real world: how relevant is this, is Claude doing what it is supposed to do, and how can we improve Claude in these respects?
Yeah, okay, thanks, Esin. And going back to this idea that Clio can provide some amount of comfort in these moments of uncertainty, where we want to make sure our version of the world matches what we're actually seeing in the data: another thing that happened while we were building Clio was that we deployed, in an early access program, a new capability where Claude can actually use a computer. It can point and click; you can give it tasks, and then it can go off, somewhat agentically, and solve problems. We did so much work on the pre-deployment testing of it, but we're not perfect. And I remember thinking, you know what we need to do? We need to have some sort of post-deployment monitoring with Clio to understand how this is actually going to go and whether pre-deployment testing was sufficient. So Miles, how did that work?
Yeah, so we put a ton,
there was a ton of effort across Anthropic trying to anticipate the ways that computer use might be used harmfully. But the reality is that the world is incredibly creative, and we have to complement our proactive safety measures with really effective post-deployment monitoring. In other words, Clio allows us to strictly augment our approach to safety. We have all of this effort in top-down pre-deployment testing, and with Clio we can augment that with post-deployment monitoring and make sure that we're seeing things and thinking clearly from both sides of the safety spectrum.
Okay, Alex. It's a bit unusual for frontier labs to openly discuss the patterns that we're seeing in user data, whether those are beneficial use cases like the ones we've been talking about or issues with safety. So there are a lot of tensions here. What are some of those tensions, and why did we decide to publish? What was your vision for putting this out there anyway?
Yeah. I think on the face of it, if you say, "let's just release a lot of information about our products and the top use cases and all the ways people are misusing our systems," you'd probably expect people to be like, "that's a terrible idea, don't ever bring that idea to me again." And the truth is that companies definitely have all sorts of metrics internally about all of their top use cases and what people like and don't like. But Anthropic is a little bit weird in that we're a public benefit company, and we will do things that are not optimal for the company because we think it's right to share them with society, because we want to build societal resilience, and because we think this technology has the potential to be pretty transformational. We don't know on what timelines, we don't know, you
know, in what ways and to what degree. But a world that doesn't know how the technology is already being used and is transforming the ways in which we do work and the ways in which we interact with each other is definitely not going to be prepared to tackle much more advanced versions of these technologies. So I think we saw this opportunity to really say, look, we're going to share a lot of this information. And to their credit, a lot of the folks on product and policy and legal backed us up on that and said, yeah, it's for the benefit of everyone to share this information. We hope that a bunch of other folks in other labs start sharing some of this information too, because hopefully it makes the world a better place, both with respect to the negative use cases and risks of the technology and with respect to all the benefits, like seeing all the ways in which it could help make people productive and, in general, improve people's lives.
Yeah, amazing. And along those lines, how reproducible is Clio? If I'm at, let's say, a different organization, have we put enough detail into our methods that anyone can rebuild this and also do this sort of pro-social work?
Yeah, we have a very long appendix with all of the prompts that we used, the hyperparameters, a lot of details. Thank you, Miles and many of the other folks who worked on this, for really documenting all that carefully, because we want people to build their own versions, have this information, and share it. One of the big question marks of Clio is that we have only our data to look at, and there are many other language models out there in the world, many other types of AI tools. We can share what we know, but we really don't know the whole picture. We just know a slice of the pie, and it's only
when the whole ecosystem starts sharing this information that we can really get the fullest picture of what this technology is today.
I think I'll close with a round table discussion that's a little bit more future-forward. We all as a team have been heads down building out Clio, showing signs of life, signs of success, interesting measurements that are strictly improving our approach to safety and helping us really understand how people are using, and might be affected by, our systems. We're just getting started, and I want to end with where we are going to go next. What do you want to work on using this new technology that we built, and why is that important? Let's start with Esin.
One thing I'm interested in is looking at where subjectivity is coming from, what the subjective use cases are, and how Claude is making value judgments, because I'm really interested in the value pluralism direction. We want models to be as pluralistic as possible and represent different viewpoints, not be very homogeneous or make the world more homogeneous. I think Clio gives us a really good tool to understand this: where subjectivity is, how models are behaving currently, what we want to improve, and how we can go in that direction. I think this is one of the areas I want to explore with Clio.
Amazing, me too. How about you, Miles?
Yeah, everything Esin said. I'm also particularly excited about showing by example that we can set an extremely high bar for privacy while also gaining important insight into our systems, so that we can, for example, enforce our policies really effectively and understand and mitigate harm from our models.
Another area is understanding the emotional impacts of these models. One thing that I have seen in Clio clusters is people connecting really deeply with these tools in many different parts of their lives: as a coach, as an emotional partner in some cases, as someone giving advice on really challenging questions and challenging moments. We have a responsibility to understand the ways that people are talking to Claude in those moments of vulnerability and to make sure that Claude lives up to their expectations and is a sound partner.
Totally agree. I'm really interested in using Clio to understand how the way we do work changes. What are the economic impacts of the technology? How is it diffusing across different use cases and different patterns? Is the technology augmenting people? Is it replacing certain tasks? Can we use that to protect people, or arm them with information about how the world might change in the future? I think that's really exciting. I'm also just excited to use the technology to understand new positive use cases. Is Claude actually getting a lot of traction for positive medical applications? Should we try to accelerate and empower people who are experimenting with Claude to reap the full benefits of that? How is it being used in educational contexts? There's a lot of discussion about the role of AI in the classroom, and if we can get a better picture of what that looks like, can we engage with teachers and classrooms to actually make that better? Those are some things I'm excited about.
Yeah. Did anyone make sense, or do you think we're like nerds or
