Demand-Driven Context: A Methodology for Coherent Knowledge Bases Through Agent Failure

Thank you. Maybe we can get started. First of all, thank you so much for coming to the workshop, especially those of you who didn't get a seat. I promise I'll do my best to make it entertaining, especially for you who are standing. Thank you so much.
>> Yeah. Actually, it makes sense. So now I know why the tickets sold out, right? Which workshop actually sold out the tickets.
So let's start with my introduction. I'm Raj. I work as a staff software engineer at IKEA, in a domain called deliverance services. We are more than 100 engineers across six teams altogether. It's like a mini company within the company. I'm very interested in architecture, neuroscience and linguistics, and now AI. So if anyone has a cool project, because everybody is building cool projects these days, please find me after this session.
A quick pulse check with the audience. Who is visiting London for the first time? Okay, cool. Welcome to London. Who here is from an engineering background, including vibe coding and prototyping?
>> No, all of... okay. So who actively uses agents like Copilot? Okay, this is going to be tough for me then. So, extensions. Okay, so everybody's a pro. Fine.
>> Not so much as I want.
>> Okay. So much tension now. You're going to sit here in this hot room for more than an hour, so first I'll give a bit of an introduction to what I'm presenting today. It's about agents and context management. I've divided it into three parts. One is the situation, which all of you already know, so I'll keep it tight and short, about five minutes. Then I'll talk about the problem. This is where I'll spend a bit more time on the slides, because I think nobody is seriously looking into the problem.
And I want to bring it up. Then fewer slides and more hands-on: how demand-driven context actually works. All good? Okay. So, let's start with the first one. How many have seen this movie, Memento?
>> Okay. Okay, cool. I'll give the gist of the movie. This guy is very skilled, very talented. The only problem he has is that he can't hold a memory for more than 15 minutes. So every 15 minutes he has to take his notebook, look at the tattoos he has put on his body, and figure out what he was doing 15 minutes ago, and he does it again and again. If you relate it to AI and agents, it fits exactly how agents are right now. If you go and watch this movie, you don't need to watch YouTube or read blogs to understand agents and MCP. The movie tells you literally everything.
Just as this guy has a memory problem, the AI we were introduced to a couple of years ago is very good at reasoning, computation and code generation; it benchmarks above par. The only problem is institutional knowledge, the domain knowledge that you have. That's the only thing that is a bit problematic.
From AI to agents, if you look at the evolution, it exploded. It started with prompt engineering, then there was RAG, now MCP, then multi-agent, and now deep agents. I recently found out that using Replit you can build a full-stack app in 10 minutes. That means by the time you make instant noodles, you already have a million-dollar app working on your laptop. So we've got to the point where it's extraordinarily good.
That's AI and agents. So let's talk about enterprise AI. I don't know how many of you have this question, but in most enterprises I see the question is: okay, AI is pretty smart.
It's doing code generation, full-stack apps, reviewing your PRs, doing incident management, all those things. Right. So if AI is doing that much, why are the Jira tickets or epics not moving on the dashboard? Why don't I see the delivery? Everybody says, look, in 3 minutes everything is ready. Okay, fine. Then why are my Jira epics not moving? Because that defines the business delivery, and that defines the return on investment, right? As you see here, from McKinsey this year, 88% of all companies use AI, but they only see about 6% of value creation.
So I think this is the problem we have. I have four sample Jira tickets, different ones. The green parts I have marked are what the LLM is already trained on, like API standards; it's general knowledge. That's fine: those tasks from the ticket it can pick up and do. Then there is the second part, the orange ones, which we have to teach it: you know this, but do it in this way. All these orange things fit into your agent extensions, like skills. But the red ones are the institutional knowledge, which sits within the company and within the people. When it picks up a ticket, it has to fulfill all of them. It is very good with the green and orange ones, but it struggles with the red ones, the institutional knowledge. And what I believe is that coding agents are getting so much better right now that if AGI is coming, the first AGI will be a coding agent for sure.
To fix the problem of giving institutional knowledge to agents, we already have an industry solution. This is basically what your return-on-investment pipeline for AI looks like. You have LLM model quality, you have agents and the agent harness, and your institutional knowledge sits in Confluence, Jira, SharePoint, GitHub, all those things. The retrieval layer is what the industry tells us will fix the issue: you build the retrieval layer, it fetches all those things and gives them to an agent, and the agent should be able to do the work. About 40% factual accuracy can be achieved through RAG or knowledge graphs, but only with a documented knowledge base.
Now, if you build a retrieval layer, it has to work, right? Let me ask you a question. How many of you have built retrieval-layer things like RAG and MCP servers? Okay, cool, all of you. Now the question is how many did you build. How many MCP servers did you build? How many have built more than 20 MCP servers? Okay. So nobody beats my record then.
What I see is that in enterprise organizations people are building 10, 15 or 20 MCP servers, or RAG pipelines and knowledge graphs, on top of their institutional knowledge and connecting them to the agent. The assumption is: if I build all those MCP servers and give them to the agent, I don't need to do anything; it will do the work. But when you plug in these MCP servers, the data coming out is mostly non-deterministic, unreliable and untested. Especially in engineering, nobody does evals. It's more of a data and machine learning concept, and we don't do evals. So when we plug in an MCP server or RAG, we check whether output is coming or not, rather than whether it is really valuable and really solving the problem. That's the main problem I see.
I'm not pointing at other people, because I was that person. I thought: let me build all the MCP servers, plug in my institutional knowledge, and prove the point that agents can semi-autonomously pick up those Jira tickets and finish them. But every time I built those MCP servers, it was accurate 10%, 20%, 30% of the time, and the rest of the time I was doing the data-entry job for them: filling the gaps, answering the questions. So I was doing more work rather than less. I think this is the main problem. I was in this fourth stage where I literally started to write the domain context by hand. Let me write everything and prove the point. But I got really exhausted doing it.
I don't know how many of you can relate to this pie chart, but in most enterprises the institutional knowledge looks something like this. 20% is outdated, 20% is unreliable, 10% is duplicated in different places, and the major problem is that 40% is tribal knowledge, which means people know how things work but it is never documented. In this situation, you can build 100 MCP servers and plug them into that monolith, and it doesn't matter how many you build, it won't work, because your whole institutional knowledge is a monolith.
You're all from an engineering background, so you already know the transformation from a legacy monolith to microservices. In the same way, only if we break that monolithic knowledge base down into context blocks that are useful for agents can we make them able to do the tasks semi-autonomously.
So in this workshop we're mostly going to talk about that monolith: how to break it, what the approach is, and how it is useful once we break it. And this is a job we need to do ourselves, because the LLM providers will focus on model quality, the agent vendors will focus on the harness, and there is a big retrieval market of 9 billion focusing on retrieval, but nobody is going to come to your company and fix your knowledge base. You have to fix it yourself. So how can we do it?
Demand-driven context is the solution I'm proposing. If I have to give an abstract of it: we had monolithic services and a process for breaking them into microservices. We had the waterfall model, which we transformed into agile. In the same way, when you have a monolith of institutional knowledge, how do you transform it into context blocks? This is an approach for doing that. Before starting: it's not just an idea. We have already tried it with some data sets to show the approach works, and in March we published a preprint on arXiv. If anyone is interested in reading academic papers, you can find it by searching for demand-driven context, or I can give you a link after the workshop.
So how does it work? When we give institutional knowledge to agents today, what we are doing is a push strategy: we build everything and push it to the agent.
This approach is more of a pull approach. For example, say a new joiner has joined your company. How do you onboard that person? You onboard them for one or two days, give some initial orientation, tell them these are the Confluence links, this is the GitHub, this is the documentation you have to follow, and then you just assign a task. You're not going to say, go and graduate in this knowledge and come back, then I'll give you work. You assign the work item, and the person starts asking questions and filling the gaps. If the person is really into documentation, they will also fill in the documentation for you. They gradually build up the institutional knowledge. In the same way, we don't push all the knowledge to the agent. Instead we give the agent problems, like work items, and let it pull the information from us, and once it has pulled the information, we also ask it to document it in a better way rather than in a monolithic structure.
So those are the four layers. You have a monolith and a framework that pulls from it and creates better context blocks. If you know the legacy-monolith-to-microservices transition, you can relate it directly to that. So this is how it works: this is one cycle of giving a problem to an agent, and on the first attempt the agent will fail.
It will say: you gave me a problem, but I couldn't find most of the documentation, so I couldn't do it; these are the things I need in order to finish this task. And it gives a checklist. We fulfill the checklist, and once that is given and the problem is solved, it takes that knowledge and curates it in a particular place so that it, or other agents, can reuse it. That is one cycle. The idea is that if we do this over multiple sessions with multiple problems, it will gradually curate your monolithic knowledge base and also document it for you.
You can also relate it to TDD. How many of you do TDD? Nobody hates TDD, right, before I... yeah.
>> Okay. In a TDD approach, we don't build the product first. We write failing test cases, we see what code is missing for the failing test to pass, we write that code, and we gradually build the product from the failing tests. In the same way, we give the agent problems it will definitely fail on, we gradually fill those gaps, and at a certain point it becomes semi-autonomous, with good institutional knowledge already in place.
Okay, I think I can jump into a demo. I will use the terminal, so don't hate me. You're all from an engineering background, so I think you'll like the terminal. Let me switch to it. Okay. On the far left, what you see is how it works under the hood. When you give it a problem, how does the agent fail?
How does it demand the knowledge needed to solve the problem? A human, a domain expert, fills those gaps, then it curates a new knowledge base for you, which is much better, then the agent succeeds, and you repeat with the next problems. That is one cycle.
How can it be implemented? With any agent. It can be implemented on Claude or Copilot, because it's an approach; you can do it any way you want. At work I use Copilot, so I implemented it using Copilot. But because I believe everybody loves Claude Code more, I created this demo with Claude Code. You can see it's just a combination of skills, rules, agents and hooks, plus a place to save the knowledge base.
In the middle pane, what you're seeing at the top is your monolith. It represents your Confluence, Slack, GitHub and so on, but for the sake of the demo I just put in some flat files that look like them. That is what your monolithic knowledge base looks like. At the bottom, what you're seeing is live: as it solves a problem, how it adds new knowledge.
What I'm going to do is go to an agent and give it an incident to do the root cause analysis. Remember the Jira ticket samples I showed on the previous slide? They were a combination of documented knowledge and undocumented knowledge. This incident represents the same kind of combination. Some knowledge is already documented in your monolith.
Some of it is not there or is outdated, and most of it can't be found because it was never written down. When I give it this problem, it uses the skills I developed with this approach and first goes to your monolith, the knowledge base, to find what is there. Think about it like this. The first part is retrieval, so it is already doing what RAG and MCP do. But what else it does is decide, after it fetches the data, what to do with the data. That is the missing part. For example, when you give Confluence links to a new employee, the employee goes there, looks, and doesn't find the information. But he doesn't stop there. He keeps asking questions to solve the problem, and then adds more knowledge. Those next steps are what's missing right now; we just stop at retrieval. So these are the next three steps it does.
You can see the confidence score is about 1 out of 5, because it says: these are the particular terminologies I don't understand, and this business logic is not documented. One thing to notice here is that everything it listed is undocumented information, meaning it was never written down. Unless you do it this way, you will never know what is not documented. If somebody says documentation is missing and we need to write it, okay, what do you want me to write? There is so much in people's heads; I can't write all of it. Somehow it has to surface. When you give it a problem, it surfaces what is not documented and tells me: this is missing, I need new information here. So it does all three steps.
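A minimal sketch of one such cycle in Python may help make the idea concrete: the agent looks up what it can, reports what it could not find, a human answers, and the answers are curated for reuse. Everything here is a hypothetical illustration, not the skills used in the demo: the `KnowledgeBase` class, the exact term matching, and the linear 1-to-5 confidence formula are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KnowledgeBase:
    """Toy stand-in for the curated knowledge base (hypothetical)."""
    entries: dict[str, str] = field(default_factory=dict)

    def lookup(self, term: str) -> Optional[str]:
        return self.entries.get(term.lower())

    def curate(self, term: str, explanation: str) -> None:
        self.entries[term.lower()] = explanation


@dataclass
class Attempt:
    confidence: float   # 1.0 = nothing found, 5.0 = everything found
    demands: list[str]  # terms the agent could not find and asks a human about


def attempt(problem_terms: list[str], kb: KnowledgeBase) -> Attempt:
    """Retrieval plus the step after it: report what is missing."""
    demands = [t for t in problem_terms if kb.lookup(t) is None]
    found = len(problem_terms) - len(demands)
    coverage = found / len(problem_terms) if problem_terms else 1.0
    # Assumed scoring: a linear map from coverage to a 1-to-5 scale.
    return Attempt(confidence=round(1.0 + 4.0 * coverage, 1), demands=demands)


def run_cycle(problem_terms, kb, expert_answers):
    """One cycle: fail, demand, let the expert fill gaps, curate, retry."""
    first = attempt(problem_terms, kb)
    for term in first.demands:
        if term in expert_answers:
            kb.curate(term, expert_answers[term])
    return first, attempt(problem_terms, kb)


if __name__ == "__main__":
    kb = KnowledgeBase({"notification service": "Sends customer messages."})
    terms = ["notification service", "sms gateway", "retry policy", "carrier code"]
    answers = {"sms gateway": "Third-party SMS relay.", "retry policy": "Three attempts."}
    first, second = run_cycle(terms, kb, answers)
    print(first.confidence, first.demands)
    print(second.confidence, second.demands)
```

The point of the sketch is the return value of `attempt`: the list of demands is what surfaces undocumented knowledge, and anything the expert leaves unanswered stays on that list for the next cycle.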
Then, I already have a pre-prepared, very high-level answer about the missing information, and I give it that: okay, this is the missing information you asked for; can you solve the problem now? I didn't expect this one. Notifications is... yes. I'll just say yes.
>> It also asks what fictional name it should use.
>> Okay, I didn't see it. See, when I did the test run, it didn't ask me these questions.
>> Let's see. It knows it is a demo.
>> I trained it to... Okay. No, what it is doing already, you can see live, is adding the entities that have come into the new knowledge base.
>> So the knowledge base is managed as a file system, right? As a system of files?
>> For the demo, I'm just showing it as a file system, but it's basically your MCP servers. The data will stay in Confluence, Slack and so on. You can plug in and use the same MCP servers or RAG. It doesn't need to be a flat file.
>> You treat this as your persistence layer for this,
>> right?
>> Do you use any kind of memory tool for this?
Yeah, I will show you in the next part what it looks like. Okay, so it started from 56 entities or so. Now one problem surfaced six entities that had never been documented, and when I gave it that information, it was able to discover and curate another five or six new entities that were never documented. So it discovers the gaps, gets information from me, and stores the new information. That is one cycle.
Okay, next. This is a busy window. I tried to tidy things up but it didn't work out. What you are seeing in the window is 14 incidents. You have seen one problem that I solved with an agent, right? The communication one.
What if I took 14 incidents and ran 14 cycles of this? How does it do? On the left side is the first incident. At that point it has 1.5 confidence and everything is critical or high; basically nothing is documented, so the data is missing. I started giving answers on the first incident, then repeated the same for the second and third, and continued for 14 incidents. By the 14th incident it was able to reach a confidence level of 4.4, because on every incident it got the list of answers from me and also documented everything for me. So it gradually improved from around 1.5 to almost the five range of knowledge.
In the traditional way, we solve the whole context problem ourselves first and then give it to the agent. Here, we are moving the agent from a consumer to a knowledge manager: you don't just consume what I tell you; the whole knowledge management is also your job, and you have to do it for me.
Okay, I think we can get back to the slides for a bit. What we have seen is that I ran one cycle and also showed what it looks like when I run 15 or 16 different cycles. But doing it manually is really painful; I tried it. After 15 cycles, nobody wants to sit with an agent: you have an incident, and you won't sit with your agent answering its questions and explaining your problems. That is super painful. But we can automate this process, and this is where it gets really interesting.
Here is the thing: we already have all the work items. We have Jira, we have incidents, we have customer support tickets, all those kinds of work items already sitting in the archive. So why can't we take them, use the framework to validate them against your monolithic knowledge base, run it as an automation, and see what the state of your knowledge base actually is? Let me show you how it looks when, rather than doing it manually, we do this at scale.
In the demo you're seeing, almost everything is preset. For example, I have a platform operations agent, and I'm saying: these are the recent incidents. Say I have 20 recent incidents in an MD file or a JSON file, with all the details: description, comments and everything. The rest of the files are your knowledge base. It's a file system here, but you can connect Confluence and the others in the same way; for demo purposes I'm just showing flat files. What I'm going to do is take all those incidents, validate each one against my knowledge base, and ask the agent: tell me how much of the documentation is good, how much I can't trust because it is old or outdated, and how much is missing, not documented, as far as this incident is concerned. Let me run it. It will take some time. It takes three steps: first it generates probes, basic tests it writes to test your knowledge; then it runs those tests; and then it analyzes the gaps. Okay, it's a little bit hot in the room. I mean the apps, it's just like a clever problem, I imagine.
Okay, for example, let's say you have an incident: the notification service is not sending customer messages to the SMS service. The notification service is mentioned, so the agent checks: is there documentation related to the notification service?
>> If it doesn't find any, that means you never wrote documentation for the notification service. I do understand what customer SMS is, but the customer notification service you mentioned is a gap, because it's never documented. Or it takes the customer notification service, goes to Confluence, and sees how old the documentation is. If it's a year old, it will tell you: look, I looked into it, it's a year old, and I don't know whether I can trust this documentation, or it is incomplete.
As you see, it scored each incident. It looked at all the knowledge bases you have connected and produced a consolidated scoring, something like: the agents can only partially handle the basic edge cases of the incidents you give them, because your knowledge base is not complete. And it shows you how much tribal knowledge, system information and business process is missing from your institutional knowledge, whatever is not documented. These are the probes, and it also identifies what is critical and what is high. This is really important. Take the notification service example I mentioned: if it appears repeatedly, on something like 20 occasions, and you don't have it documented, this is the first thing you need to fix in your documentation.
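The scan described here can be sketched in a few lines of Python: each incident mentions some entities, each entity is checked against the documentation, and the gaps are ranked by how many incidents they appear in. This is an illustrative sketch only. The data shapes, the `classify` and `scan` names, and the one-year staleness cutoff (borrowed from the spoken example) are assumptions, and the real scanner uses an agent rather than dictionary lookups.

```python
from collections import Counter
from datetime import date, timedelta

# Assumption: documentation older than a year is treated as stale.
STALE_AFTER = timedelta(days=365)


def classify(entity, docs, today):
    """Return 'missing', 'stale', or 'ok' for one entity named in an incident."""
    last_updated = docs.get(entity)
    if last_updated is None:
        return "missing"
    if today - last_updated > STALE_AFTER:
        return "stale"
    return "ok"


def scan(incidents, docs, today):
    """Rank documentation gaps by the number of incidents that hit them."""
    hits = Counter()
    for entities in incidents.values():
        for entity in set(entities):  # count each entity once per incident
            status = classify(entity, docs, today)
            if status != "ok":
                hits[(entity, status)] += 1
    # Most frequent gap first; ties broken alphabetically for stable output.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    docs = {  # entity -> date the documentation was last updated (made up)
        "notification service": date(2023, 1, 10),
        "order api": date(2025, 3, 1),
    }
    incidents = {  # incident id -> entities it mentions (made up)
        "INC-1": ["notification service", "sms gateway"],
        "INC-2": ["notification service", "order api"],
        "INC-3": ["sms gateway", "notification service"],
    }
    for (entity, status), count in scan(incidents, docs, date(2025, 6, 1)):
        print(f"{entity}: {status}, seen in {count} incidents")
```

Mapping the counts to critical, high and medium labels would need thresholds, which the talk does not specify, so the sketch stops at the ranking.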
So it also helps us understand, when you're breaking down your knowledge base, what is critical, what to focus on first, what creates value. It's organized into critical, high and medium. I showed flat files, but you can also connect it to the various data sources that we have.
Step one is demand extraction: for every incident it extracts the checklist of missing information. In the second step it consolidates everything that is missing. It groups things into systems, APIs and so on, and classifies how many are clean, how many are stale, which are incomplete, what is entirely missing, and what is tribal. And it creates a kanban board for you. So if you want to fix your institutional knowledge base, it's just like Jira tickets: we work through them and document these missing pieces. It also saves into the context lake, so it builds its own knowledge base as well, and you can see the performance change as you fix the tickets on the kanban board.
So what we've seen is, first, the approach itself, meaning not the push approach but the pull approach; then how one cycle or multiple cycles look; and then how much more valuable it becomes when you automate it at scale. Okay. Now the important question is...
>> Sorry about this.
>> Double mic. Yeah, it's cutting out a bit, so I'm just going to put that one on as well.
>> Should I do a voice-over later, actually?
>> I think you need to repeat. Have patience for a second.
>> So the question is: all this time I was saying it receives the context, we give it the information, and it stores it. But where does it actually store it?
I have a very opinionated view. Hear me out. I prefer that it goes into a GitHub repository. Eventually somebody will come up with a 20-million seed-funded SaaS solution for you, but until then I prefer to put it in a GitHub repository. Why? Because if you do this at scale, there will be multiple agents and multiple teams contributing to the same knowledge base, and there will be conflicts to resolve. The easiest way to handle that is GitHub, because it comes with built-in PR and review processes. If multiple domain experts are uploading files, or agents are contributing, the most efficient way to manage it is in GitHub, in a structure something like this. The other advantage is that if you put it on GitHub, you can publish it later to Confluence or Slack, or to whatever other solution you want to use. So I prefer to have it on GitHub, but if you want to integrate directly with Confluence, you can do that instead.
Next is a meta model. How many of you are aware of the term meta model? Okay, maybe I can quickly show you what it looks like. A meta model is basically something like this: how your domain is structured, how a business process is related to systems, how systems are related to APIs, and which of them the business jargon or tech jargon is linked to. This kind of relationship meta model is really important.
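One way to picture such a meta model is as a small graph of typed relationships. The Python sketch below is only an illustration under assumed names: the triples, the relation labels (`runs_on`, `exposes`, `refers_to`) and the helper functions are all made up, and a real meta model would live in the repository structure rather than in a list.

```python
from collections import defaultdict

# (source, relation, target) triples; every name is invented for illustration.
TRIPLES = [
    ("order-fulfilment", "runs_on", "notification-service"),   # process -> system
    ("returns-handling", "runs_on", "notification-service"),   # process -> system
    ("notification-service", "exposes", "send-sms-api"),       # system -> API
    ("notification-service", "exposes", "send-email-api"),     # system -> API
    ("customer SMS", "refers_to", "send-sms-api"),             # jargon -> API
]


def build_index(triples):
    """Index the triples in both directions so any node can be a starting point."""
    outgoing = defaultdict(list)
    incoming = defaultdict(list)
    for source, relation, target in triples:
        outgoing[source].append((relation, target))
        incoming[target].append((relation, source))
    return outgoing, incoming


def related(node, outgoing, incoming):
    """Everything one hop away from `node`, grouped by relation and direction."""
    result = defaultdict(list)
    for relation, target in outgoing.get(node, []):
        result[relation].append(target)
    for relation, source in incoming.get(node, []):
        result[f"inverse:{relation}"].append(source)
    return dict(result)


if __name__ == "__main__":
    outgoing, incoming = build_index(TRIPLES)
    for relation, nodes in related("notification-service", outgoing, incoming).items():
        print(relation, nodes)
```

Starting from a system, a single lookup returns both the APIs it exposes and the business processes that depend on it, which is the kind of navigation a flat pile of files does not give an agent.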
It's not necessary for the approach I have proposed; it's an add-on. Why do you need it? Think of it as a map. Right now your agents don't have any map to navigate your knowledge base. You are dumping some number of files on the agent, and it needs to figure out which file it needs. But if your file structure is a representation of your meta model, the agent knows how to navigate. For example, if you say, can you fix this system, it will understand which business processes will be affected if it makes changes and which APIs it needs to change or touch. So it's important to have a meta model. If you have one, it will produce more value, so I strongly prefer to have a meta model along with this approach.
Okay, the last part is the value it created. You have seen a lot of slides and a lot of demos, so I also need to share the value I saw when I was using it, and what the other people I shared it with came back with as feedback. The first and most valuable thing is knowing the unknown. What was never documented can be surfaced only by this approach. Otherwise you end up with an endless Miro board of tickets: this is missing, this is missing, I need to add it, I need to add it, and you keep on doing that. This is the faster and better way to discover, from your previous work items, what was never documented.
The second is that I can now give work to agents rather than doing all of it myself. I give the agents all this information and let them manage my knowledge. I don't want to be the knowledge manager, so let the agent do it. Those are the two big values I have seen. If you use it, I think you will also see those two as the most valuable, but these are the other things I have seen.
I also need to tell you the drawbacks. First, if you come from a small team, or if you say, no, my documentation, my knowledge base is really good, I'm super happy for you. You're the lucky ones in this world of agents right now. For you it might not be very relevant, unless your documentation is very complicated. Second, as I already mentioned, doing it manually is very painful. I don't recommend anyone do it. If you just want to try it for testing purposes you can, but automation is the best way to use it. This approach is also very early. By tomorrow morning somebody may have posted something different and better on YouTube. In the era of AI nobody knows how long a thesis, an approach or a product is going to survive. For now, I see this as the best approach.
So, the whole workshop: we started with one pipeline, the ROI one. Demand-driven context sits between the monolith and the retrieval layer, and what it does is help you build curated context blocks. You can also think of it as a cache database.
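Taking the cache analogy literally, a context-block lookup could follow the usual cache-aside pattern: answer from the curated blocks when possible, and only search the full knowledge base on a miss. The Python sketch below assumes a hypothetical `ContextCache` class and a caller-supplied `search_monolith` function; neither comes from the talk or its repository.

```python
class ContextCache:
    """Curated context blocks in front of a slow, full knowledge-base search."""

    def __init__(self, blocks, search_monolith):
        self._blocks = dict(blocks)              # topic -> curated context block
        self._search_monolith = search_monolith  # expensive fallback (assumed)
        self.hits = 0
        self.misses = 0

    def get(self, topic):
        block = self._blocks.get(topic)
        if block is not None:
            self.hits += 1
            return block
        self.misses += 1
        found = self._search_monolith(topic)
        if found is not None:
            # Keep what was found so the next request for it is a hit.
            self._blocks[topic] = found
        return found


if __name__ == "__main__":
    monolith = {"retry policy": "Three attempts, then dead-letter queue."}

    def slow_search(topic):
        # Stand-in for searching Confluence, Slack, GitHub and so on.
        return monolith.get(topic)

    cache = ContextCache({"notification service": "Sends customer messages."}, slow_search)
    cache.get("notification service")  # served from the curated blocks
    cache.get("retry policy")          # miss, falls back to the monolith
    cache.get("retry policy")          # now served from the curated blocks
    print(cache.hits, cache.misses)
```

Tracking hits and misses is a design choice worth keeping in any real version: a rising miss rate on a topic is a signal that it deserves its own curated block.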
So your agent doesn't need to boil the ocean every time it fixes an issue. If you have a good context block of information, 80% of the time that is usable, because I also believe it's always the 80/20 rule: 20% of your documentation is the most useful, and 80% is corner cases you only sometimes have to look into. So rather than giving 100% of things, you need to figure out which 20% is super helpful for the agent, keep that context block like a cache database, and leave the rest as links. Only when the agent feels it needs more information does it go and check the whole monolith of institutional knowledge. Okay. So what you can take from this workshop is three things. First, I hope I made sense of this approach. There is a GitHub repo, which I detailed out, and also a starter guide, so if you want to go home and try it, you can. Second, you already know how the framework works, so if you want to go home and remix the whole approach, you can do that. Let me know as well; I'll leave this one and join you to contribute. Third, you have the context gap scanner that I showed you, which is already live with presets. I think I added about $20 on it, so hit it as much as possible. 200, right? Okay. Okay. >> So after $20 it's first come, first served. So all three of these you can take away from this workshop. Okay. Because this is a workshop, I would also like you to try something. What you can try is three things.
One: if you say, "You know what, I'm so tired already, it's almost 4, it's almost time to go to a party, I don't want to do it," you can just go to the context gap scanner. Everything is a preset here. You can just try it out, hit it, and see how it works. If you think it can be done better, let me know so that we can work together. Or otherwise, say you're very technical and want to know how it works under the hood: this is a GitHub repository. It's under... maybe I'll just take this out. This is a GitHub repository and it has all the information, plus there's a starter guide if you want to try it out. But if you still feel, "No, I want something much simpler," you can also try this one. You don't need to do anything. Basically take this prompt and take one of the Jira tickets or incidents that you have right now. If you have already built MCP servers or anything of that kind, just give that prompt to your agent with the incident or Jira ticket and ask it, "Give me the quality of the knowledge base that I have as per this incident or Jira ticket," and see how much of it comes back in red, meaning never documented. So you can also try the simple one. I'll just leave it like this. Maybe you can take a picture. Maybe I can switch to the slide if anyone wants. Cool. >> This is your >> So this approach, I find it very interesting.
So the first question is: have you already used this way of working at scale? Because we've mostly seen toy examples, right? >> Yeah. >> I used it, but not at scale. I started simpler, because you also need to see what the scope is. Say I have an enterprise and I try it at enterprise level: I can't do it, because there are multiple domains. I tried it at domain level, and even at domain level there is so much domain expertise I need to fill in to close those gaps. So I cut it down again to the smallest team that I have: that team's Jira tickets, that team's incidents, and that team's Confluence pages. When I drill down the scope like that, it is faster and more useful. If I do it at a bigger scope, no one person has the whole domain expertise, so it again becomes a case where five or six people have to sit down and start doing this. >> Yeah, I'm a bit concerned that this might denial-of-service attack your team members in a certain way, because our LLMs are fine-tuned to keep eliciting information, to keep getting more information, to ask follow >> so I think it will be hard on the engineers that have to do the question answering. And secondly, the scanner is nice, but it is still built on the assumption that all of your team members and the rest of the enterprise are still using your enterprise it >> well as planned, that they're actually filling in their tickets with all the details, etc., and I know from practice that most of the time that is not the case. >> That is true, that is true, I agree with you. My assumption is also that even if I go to leadership for buy-in, like "Hey, can you give me bandwidth?" or "I need these people to actually sit and fix the context,"
I don't think anybody will do it at this point in time. But I think it will happen, because we are slowly moving towards being agent managers, where agents become semi-autonomous or autonomous and we manage them. At a certain point somebody has to fix that knowledge, because it's not going to come from anywhere else; you have to. So then the enterprise focus will shift towards the gap. That's what I started by saying: I don't think anybody is looking into the problem yet. Everybody is very focused on how good the agent is and how good the retrieval is, but how good the context is, you're not solving. I think down the line, in a year or so, people will realize the importance of it, and the Kanban board will definitely become reality very soon. >> Yeah. Thanks. >> I think actually the same point I think when we look at large search actually documentation actually code. Just wondering, have you applied it to the code base? >> I did, I also applied it to the code base, but I got mixed results. Here is the thing. When I use only the codebase, it is particularly good, and when I use only Confluence or other textual data, it gives good results. But when I combine them, it somehow conflicts, because it builds a theory out of the GitHub repository, but the same repository's documentation is also on Confluence. So it gets a conflict over what the source of truth is: the code says this, but should I implement it the way the documentation says?
So then again I need to create an additional skill or rules for the ranking it needs to apply: if you see it in GitHub, that is the source of truth, and if you don't see it there, then you look for the information in Confluence, and so on. But I'm still trying to fix those things, seeing the gaps and fixing them. I definitely see that issue when combining those two. >> And the second question is, um, interestingly actually skills, because what we find out is actually like you have your kind of like your, um, like the process that's like run a context and identify the right >> then you do the task identify >> you go back and then you're fixing the knowledge back. Okay. >> I think right now the skill that I have built is static, but what you are proposing, if I'm not wrong, is more like an evolving skill, right? If the skill fails, it has to evolve. I agree with you. I never tried it, but I think it has to be like that, because I'm also concentrating more on how to do it at scale. The reason is also that I want the context to be fixed before retrieval itself, not during operations.
When I first started, I did it during operations, which means: I have a work item, I assign it to the agent, it fails, and then I start giving it context. But that takes a lot of time and a lot of patience from me. So rather than doing that, I decided to fix the context before retrieval. As I said while answering her question, if you take a team, the context that you need to fix is very small, so you can use something like the context gap scanner, and if you have a good domain expert, I think in a couple of weeks you can fix your documentation. Not 100%, but at least 60, 70, 80% good quality that you can already build. So my proposal would always be: don't do it at an operational level, at a real-time level, but do it before retrieval itself. That is much better in this approach. >> Yeah, especially sitting in situations where you ask questions that may need I don't know like five or six different there >> Right now, after Claude Code announced 1 million tokens in the context window, I don't have any problem. I calculated it, and on average it's about 96k tokens. I tried different domains, and per domain I see around 96k tokens if I consolidate everything, Confluence and so on, so it easily fits in the context window. I tried some experimentation with a graph RAG: rather than just taking all the files, use a graph RAG to understand the intent. But for me, just putting the whole context in the window right now gives better results than doing RAG, unless you have a very big context, almost around a million tokens, that you want to fit in. Then maybe you have to use more retrieval mechanisms in between, but otherwise I think it should be fine.
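The sizing check described in that answer can be sketched roughly as follows. This is a minimal illustration, not the speaker's tooling: the four-characters-per-token estimate, the `reserve_tokens` headroom, and the function names are all assumptions made for the example.

```python
# Hypothetical sizing check; names and thresholds are illustrative assumptions.
CHARS_PER_TOKEN = 4  # rough rule of thumb for English text, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a text from its character length."""
    return len(text) // CHARS_PER_TOKEN


def choose_strategy(documents: list[str],
                    window_tokens: int = 1_000_000,
                    reserve_tokens: int = 100_000) -> str:
    """Pick 'full_context' when the consolidated docs fit in the window.

    reserve_tokens is headroom kept free for the task prompt and the
    model's own output (an assumed value, tune it for your setup).
    """
    total = sum(estimate_tokens(doc) for doc in documents)
    budget = window_tokens - reserve_tokens
    return "full_context" if total <= budget else "retrieval"


# A domain of about 96k tokens fits comfortably in a 1M-token window.
domain_docs = ["x" * 384_000]
print(choose_strategy(domain_docs))
```

With a real tokenizer the estimate would be more accurate, but the decision has the same shape: load everything while it fits, and add a retrieval layer only once the consolidated knowledge approaches the window size.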
>> I have a question. I opened your paper; could you explain this graph, like the comparison between different techniques, like domain knowledge, strategy knowledge, access? >> Which one? Yep. Sure. Okay. So I also cited other papers. Not directly related, but you have the ACE paper, which also does a similar thing. But ACE is not exactly about, how do you say, discovery and curation, if I remember correctly. Maybe I need to refresh my memory. >> What do you mean by the difference between domain knowledge and strategic knowledge here? >> Okay. So, strategic knowledge. What ACE and the others are doing is this: when you have a conversation with the AI, you can see in Claude Code and the like that it updates its memory, or its relationship with your things. And from the chat history it understands what the most important context to remember is. So what they propose is improvement from those operational conversations with the AI. My proposal is not based on your conversation with an AI, but rather on your domain knowledge, which is actually documented. >> Somebody else has a question. >> Sorry. When to remote knowledge so things are confident how do you ensure that your agent only points to the relevant documentation in your local you have liked >> Okay. So when I wrote the pipeline for extracting from Confluence, it also gives you analytics on the space: dates, last updated, who created it. So you can use that to set a threshold, like "whatever is older than this particular date, consider it outdated and let me know." Don't just treat it as outdated; let me know, because sometimes a document can be stale for a long time and still be an important document.
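A minimal sketch of that date threshold, assuming the extraction pipeline yields a title, a last-updated date, and an author per page. The `Page` type and `flag_for_review` function are hypothetical names for this example, not part of the speaker's repository. Pages older than the cutoff are only reported for a human to judge; nothing is dropped.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    """Hypothetical record for one extracted Confluence page."""
    title: str
    last_updated: date
    author: str


def flag_for_review(pages: list[Page], cutoff: date) -> list[Page]:
    """Return pages last updated before the cutoff, for human review.

    The pages are not removed from the knowledge base: an old page can
    still be important, so a person decides what is really stale.
    """
    return [page for page in pages if page.last_updated < cutoff]


pages = [
    Page("Delivery API overview", date(2021, 3, 1), "alice"),
    Page("Incident runbook", date(2025, 1, 15), "bob"),
]
for page in flag_for_review(pages, cutoff=date(2023, 1, 1)):
    print(f"Review needed: {page.title} (last updated {page.last_updated})")
```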
So it lets you know but doesn't take decisions on this right now. You decide which one is stale and which one is not. >> You don't have like an intermediate layer where in the repo you store this is stale >> Okay, so when it is curated in the context, it is also updated with a date and the state of the document, like stale, active, and clean. So the agent also looks at that: okay, this is stale, I'm not going to touch it, and I'll go and look whether there are any other, newer documents. >> You think about how to manage access accessible. >> Okay. Because it's not a product or a SaaS solution, it's basically just GitHub for me right now, so permissions are not difficult to implement. GitHub out of the box lets me decide who gets permission, who has read and write access, who can merge, and so on. But in case it evolves into a product, for example the context gap scanner as a product, and I want you to test it... The reason I was using presets for this workshop, and not asking you to upload files, is that I don't want to take your IP data. So unless it becomes a product, you don't have any problem with GitHub. If you have a SaaS solution for this, then it's about how the SaaS solution manages it. But the approach has nothing to do with access. How you implement access on the knowledge is up to you. >> You have so this is of course about documentation, but did you give any consideration to using it on some central tooling that a company would use? Let's say that you have a platform team and you have a CLI that the different teams are using, and now it's used by different agents, right? >> Okay. >> And so the agents can also be like, "Well, this action is available for a resource, but I don't want to do 500 calls just because I have a list of 500 resources." >> Okay. >> it would be
nice if the tool could do that. I don't know if you've given any consideration to this. >> I think that is how it has to work. In an organization you need to have a central solution for it, but how you build the solution is up to the organization. For example, we are doing agile, right? You can do agile with Scrum, Kanban, lean, or something else, and you can use different apps to do it. The process is the same, but which method and which app you choose in your organization is different. In the same way, what we have discussed is the approach. If you want to put it in the organization, you can use the approach and do it whichever way you want. >> My point is more like, so with this you can identify gaps in your >> Right. >> Could you use it to identify gaps in your tooling for >> Okay, when you say tooling, is it the agentic >> Internal tools that, I don't know, maybe a team is building for the rest of the company. >> Infrastructure in general, right? >> Maybe, yeah. >> Can you give me an example of how >> Let's say that, I don't know, you build some sort of abstraction on top of Kubernetes. >> Okay. >> You don't want your developers to necessarily know what to do with that. >> Okay. >> Then you have a different CLI or you have something, right? But then, say, I don't know, maybe you thought that they would release one of your custom applications or corporate applications one by one, >> but a team has grown into using more of that, and suddenly they have a lot and they don't want to do that many calls, or >> perhaps even the agent is like, "Well, this is inefficient. I would like this internal tool to work in a different way." >> Okay. >> And in that way you would identify a gap in the tool, or a performance improvement, kind of like this does for documentation. >> It could be extended, actually. Because we have seen the business processes also, right? So it can also document business processes.
A business process is nothing but how the process actually runs in the application and does things, right? So you can extend it to also find gaps in the business process or in how it works. It could be an extension to it. >> Sorry. >> How would you ensure that you know changes? >> Yeah. The answer for question is tomorrow on Friday to identify B1 and B and over text that's problem. >> Okay. So when I showed the context gap scanner you also saw an indicator of duplication, right? If today you have a document and tomorrow you have version 2.0 and something else, it will find that the same information exists in three different places. If you have only one document and it is changed, it will take the latest updated one, because you as a human changed it, so it takes that as the source of truth. But if you have three versions of the same document, that's duplication, and it will flag it as a duplicate. >> Right, but it's a s how do you ensure the performance? Let's say a document of 100,000 words >> just change a word, like let's say there for example right has changed >> Okay. >> so you have to find compare those three documents tools you use to ensure >> Um, I didn't quite get your question. Is it the token usage you're worried about, like with that many tokens used, what is the cost saving? >> Yeah, precisely changes maintain maintain that whole database right >> I don't know whatever right my question is well because what you presented is sort of a happy path where you have a cup it and then you reuse it >> but in a while you will have a where you pretend to have that gap but actually it doesn't contain up to date information contains wrong information. So you want to preserve that. >> Ah, okay. Okay. So it can flag by when it was created or last updated. You can set those kinds of filters. But let's say you have a latest document which has wrong information. >> Right.
know that right? >> No, that's true. But take a human being, for example. You tell somebody to look into the documentation, the person looks into it, and as per the documentation this is to be implemented in this way, so the person will do it that way, right? It's not an agent or a human issue. >> too late I mean you solved one >> and the agent is trying to assume >> so it's solving it but the solution is wrong. Okay. >> So you say there is a scanner, fine. Do you run it daily? How much will it cost? >> You can have another process spot that would try to see like last updated >> try to see on some cadence, try to run a process that will update the knowledge. >> Fantastic. How much will it cost? >> For example, sorry, you're saying >> less than doing more meetings every day to onboard someone or That's a strong claim >> but I don't think it will cost that much as >> premise of AI. >> Yep. As I said, when I tested it, none of the domains crossed 100k tokens. Take the context gap scanner, for example: I don't think you have to run it on a daily basis or anything. Even if you run it daily, like 100 tokens and do one scan, for example, right? If all of you start hitting the context gap scanner, I don't think you can even burn $1, if I'm not wrong. It already had, oh, you're ready, okay. Okay, I'll cancel the subscription now. But I >> go back and defend two things. One is a compository system documentation for so I agree with you. So scale you would have to solve this question. >> Okay. >> which is going to depend also on how fast the data changes. I think not that much, you get to like 80% >> Yeah, use case by use case. Yeah. Any other questions? >> Okay. >> How do I know that's >> it it >> it actually tries to detail out as much as possible.
Right now I haven't exposed everything it did, just for UI purposes. But for each ticket, it writes what it actually found as 100 or 150 lines of Markdown and saves it somewhere. That gives you more details in case you want to know. For the demo I just put the nice UX on top of it, but you also have the detailed information at test. >> Anyone else has any questions? >> Can I have one? >> Yeah, sure. Go on. >> Right. >> Easy one. Huh? I destroy >> if I translation right your claim is >> not >> not fill the gaps but discover, but if you scope down to a team, within weeks you can do I mean if you wouldn't fill it and keep asking those questions then make sense right so at some point you have another value >> Yeah, so you do it one time first of all, or multiple times at first, to see the whole picture: what is the state of your knowledge base? First fix it at that level, then you go into operations. You can still continue doing it with the agent, with skills. >> Yep. >> And I think the same process of kind of providing that knowledge is >> happens pretty frequently when a new member joins >> the organization. So >> wouldn't a replacement that be just Zoom calls, transfer them and use them as source. Assuming all the calls or all the knowledge transfers or maybe introductions happen >> or team or whatever the new member join asking questions and you have access to that. >> Okay.
So you mean you can also give all the transcripts rather than doing the cycle? That can also be done, if all of your discussions are documented in the meeting transcripts themselves. But I don't think that's the case for everyone, at least. >> but easier that I don't know I'm unsure >> I think, given the amount of time people spend in Teams, if you do use the transcripts, those are the ones which have more tokens. There are so many useless meetings in the transcripts. >> Totally agree, but compression only one and then part of the database >> Could be. Again, it depends from institution to institution, right? Are you more into meetings, solving problems within conversations, so those conversations have the data, or does your Confluence have the data? If the conversations have it, use those transcripts as your knowledge base, and the compression does work; that is more useful. Yeah. Anyone else? Any questions? No. All good. Then thank you so much for attending this session.
