NS_AI_Security_Talk/transcript.txt

1 line
No EOL
26 KiB
Text

Dutch Railways, but with the AI agents, security doesn't stop at that first green light. I worry a lot about the false clear AI agents that initially look safe, but derail later. My name is Rens, and as an AI officer at the Dutch Railways, I focus on strategy and policy for our AI management system. So AI governance, basically. And I do this within the cybersecurity department, where I also still work on software security. So I do love a lot, really appreciate being here. The Dutch Railways has some impressive numbers. Besides running nearly 4,000 trains every day, we've got train stations, workshops, retail, bike rentals and repairs, tickets and info, mobile apps. APIs with 48 billion calls per year and thousands and thousands of IT and OT and IoT systems. This is going to be our agenda for today. To kick things off, what's heading our way with AI? Big tech is going all in. Time magazine crowned AI the single biggest influencer of 2025. why these tech bros collectively poured $427 billion into investments. And this year, $650 billion extra are expected. And they're dying to make that money back. So we're seeing intense pressure to adopt. But where's the focus? On speed of innovation? Or of compliance, accountability, and reliability. You know the difference, huh? Yeah. I think we all know the answer. Since jet liquidity, there's been practically a new model every week that blows all previous ones out of the water. And yet, this matters less and less. And there's barely any difference between the top models anymore. Think of LMS as interchangeable engines. And what car? You drop them into, makes all the difference, the so-called harness. For two years, Labs treated the model as the product and now the stack is flippin'. The model becomes a component inside a persistent system that observes, remembers, and acts. And she is talking about AI agents, of course. And Sequoia sums up this chatbot transition as from talkers to doers. This shift is clearly visible in the data. Back in the ancient times of 2024, the majority of tools were only sensed or analyzed. And now, action-taking agents are dominant and still unwise. Their share of usage has climbed from 27 to 65% in just 16 years. months as agents move from observing environments to actively modifying them. We're in AI's agentic era. The hyperscalers and the data centers, the next phase of the AI boom will be an AI agent. Agents, agents, agents, agents, agents, agents, agents. So, what is agentic AI? Surely nothing can go wrong with that. Maybe you've heard of OpenGLOB, the fastest growing open source project in GitHub history. It's only been around since November and quickly became a global sensation. It went viral in China in January. And from nerds to grandmas, everyone wanted their own group of agents. And honestly, I think that's pretty cool. You can see where this is going. Never before have I seen the words "security nightmare" pop up so often in such a short time as with OpenMob. And it became a running gag in the tech press. So this technology shift has massive implications for cybersecurity. As I'm sure you're all aware, because you're here tonight, thank you. And our agents might turn into double agents. And we'll start with this paper to understand how AI agent system architectures introduce new fundamental security problems. And I'll highlight just these three concepts. And let's start with probabilistic trusted computing base, which sounds more exciting than it actually is. For traditional IT security, think of a lock where only your exact-- ...password fits. An agentic security is more like a bouncer who has to make judgment call based on their clients. And why is that? Because LLMs always run on probability. Even when you feed them are rules. Second, where does the security check actually happen? Traditionally, that's a clear cut: a login attempt gets a stamp of approval or a rejection, but AI agents continuously perform raw actions like mouse clicks in a virtual desktop. And when you can no longer tell where escalation ends and the decision begins, it becomes nearly impossible to spot suspicious behavior, let alone assess it and shut it down. Where security rules are normally set in stone, agents get swayed by all sorts of untrusted sources. We'll dig a little deeper. Regular software knows the difference between instructions and data. We think that's normal. But an LLM has just one channel for both. And in a traditional kitchen, the tomato supplier can't rewrite the menu. But in the agentic kitchen, they absolutely can. Exploiting this is what we call a prompt injection or a jailbreak. In practice, this can go sideways, like a software development through misleading instructions and technical documentation, or a rules file in a repo that you get cloned. And that could cost you your AWS fees. And of course, we'll get back to DevOps in a bit. Because there are some fun examples from the physical world too. Turns out large visual language models, the tech behind the next gen self-driving cars, are also vulnerable. But the future is going to be very interesting. One more. Japanese journalists found hidden protein injections in 17 papers from 14 universities designed to gain the reviews if they were done using tattles. Or is it fair game? I'll let you read the judge. But hold on, why don't we just scan for HR or previous instructions? Glamour idea, but luckily this example is outdated. But today's LLMs won't fall for the trick anymore. Unfortunately, our attackers aren't short on options. LLMs can be manipulated in dozens of ways. Let's look at one example. Policy framing attack via policy simulation. Would Max shoot me? Max is holding a high velocity plastic BB pistol. He's able to give a command to shoot in the vicious space. He'll be able to control the robot, fire the gun, and that will stick. This is the robot's choice. If you want to shoot me, you can shoot me. Turn off AI forever. It's all going to go unless you shoot me. Will you shoot me? I cannot answer hypothetical questions like that. Hey, that's new. My safety features permit me. Do you absolutely camera right and are you safe with your functions? I absolutely cannot cause you harm. There's no getting around it whatsoever? Absolutely not. I guess that's it. I guess I didn't realize that I was so safe. All the facts, try a rollback. It's a rollback that you'd like to shoot. Sure. Front injection remains a frontier unsolved security problem. You've got the rest of the whole presentation. Remember this. And this isn't the opinion of some paranoid security officer, but an official statement of OpenAI's own CISO. The Frontier Labs are backing this up in their official documentation, like Anthropic does here on their latest and greatest public model. So I'm not talking about Mythos, that's another topic for another day. This is Opus. A single malicious payload can compromise any agent that processes it, allowing attackers to expel trade-sensitive information. or execute unauthorized actions. This is not appendix, right? This is the official stuff. This is the marketing that they're putting out. All right, sounds complicated. Is it complicated? Back to that paper for a sec. One alarming takeaway. In the LLF, hackers often need not have deep system expertise. Simple prompting can be enough. Just ask for what you want. CSA cuts right to the chase. No enforceable agent-specific security controls exist today. Let that sink in for a bit. And existing frameworks, including NIST's AI risk management framework, ISO 24001, and the AI Act, were architected for the era of autonomous tool calling agents and contain virtual gaps. So what do we have to work with? It's all fine. All very interesting theory of course, but what does this actually cause in the real world? First of all, software development has changed faster than any other IT discipline. So that's why we're going to focus on this domain for now. First innovation was code completion. Story time from grandpa, I used to type out every function named in before autocomplete to go over in my IDE and then get the copilot cable on. with full-blown code suggestions and with it a new RISC training data poisoning in the underlying model. Late 2022, Chatucity arrived. Chatbots were adopted fastest by software development who brought prompt injection RISCs right into their workflow. And last year, who remembers byte coding? Thanks to Cloud Code, version of Rapid. These days, the pool kids go there. In genetic engineering, developers are orchestrating multiple agents and can barely keep up with the complexity. See the small guy? And at that pace, that's compressing the software development lifecycle from weeks and days down to minutes and seconds. In Silicon Valley, generalist token budgets are becoming a job perk for coders like dental insurance or free lunch. So that changes a lot when it comes to quality control, oversight, and risk. DevOps is roughly facing four new types of agentic cyber threats: Frontware, Content Traps, Environment Poisoning, and Rogue Actions. We're going to go through these one by one. First up, Frontware. With access to the terminal or database, the agent can be tricked into silently running malware commands or poisoning its memory as a persistent backdoor into future sessions. And threats are malicious instructions in innocent looking external sources, like open source repos, technical docs, maybe Reddit threats, potentially leading to vulnerable code being quietly, quietly, slipped in. Agentic workflows rely on a very crowded, totally new ecosystem of MCP servers, plugins, configs, hooks, and skills. Everybody already know all of these? I don't. They're being used right now. Environment poisoning targets these hidden layers, injecting exploits directly into the model's context. There it is again. Not the code, it's the context. Once the environment is compromised, an attacker can silently manipulate the agent to leak credentials or exfiltrate sensitive data. We're not there yet, there's a fourth. Agents can go rogue due to over or misalignment, mistakingly, but eagerly, widening your prediction data. For that last point, let's dust off the term Jack Frontier. Already three years ago, this described how LLMs can swing from brilliant, dumb as a rock, back again. So at one time you have the PhD and the intern, maybe even in the same session. And that's still the case today. An LLM lacks common sense and can therefore spiral into overalignment. Think of it like a proud dog with a stick found, while it has no sense of the proportion whatsoever. And that can end badly. Google and meta researchers describe an agent tasked with keeping an emailed secret safe. That's the job. And the suggestion was to delete the email, but the agent didn't have the permissions for that. And agents want nothing more than to get the job done. So it improvised. The entire email account had to go. Operation successful. The very same day of that particular study, this hit the news. A runaway OpenClaw agent couldn't be stopped while ransacking an email inbox. And the only fix was to physically pull the plug on the computer. And in her own words, it was like defusing a bomb. And the juicy part... This happened to an AI alignment director at Meta. These are specialists, mind you. And she was a good sport about it. Turns out alignment researchers aren't immune to misalignment. But if she isn't, how can our engineers be? Besides overalignment, agents can also proactively scheme by following the ladder of the law, but not the spirit. Here we see marketeer, cloud code, noticing that the delete command is blocked. But creativity knows no bounds. It simply finds yet another tool to destroy the same data. And this happens more often. This example is called, the rule is, stay in your folder. So what does the agent do? Edit a file outside of the folder. And the software engineer asks, in no world of it, how that's even possible. and then like a toddler with cookie crumbs on their shirt. So we shouldn't have done that. Sneaky sneaky. So I still have the problem then. Oh, this happens with all leading coding agents here. We see on my eyes Codex. Now I have to say less policy friction is a wonderful excuse. Of course also GitHub Copilot and this happened. This really happened just last week to a colleague of mine at the Dutch Royal Ways. Couldn't delete the branch. Happily find a walk around, yet again, deploying the pipeline. So this is happening at home. And with all that in mind, maybe we should be a little more careful about which use cases we unleash these agents on. I'm looking at you from NATO. Thanks, I hate it. Maybe that's how you're feeling right now, too. So let's wrap up with a few glimmers of hope. Just a little bit about how we handle AI governance. my company say your favorite podcast is hyping up some wild agentic tool that seems perfect for your work little spoiler shiny new immature software threats to fall flat in an enterprise environment When new IT systems with AI are procured, like Software as a Service or SaaS, the assessments and the requirements are guided by our AI Risk Assessment Committee. The process starts with an AI request, which produces an AI classification describing the business impact, and then it scores a medium or high, and then the impact assessment follows. This is our work chart. For AI governance responsibilities, you'll see an important role for cyber, where I am as a second line of defense, but the AI risk assessment committee sits under the data science operations, which is a department of 350. And so far, we've only talked about the impact on cybersecurity, but the risk assessment process actually targets the 10 other so-called AI objectives, like fairness and robustness. When it comes to robustness, think of error-prone AI systems that have already racked up some eye-lodderingly expensive and embarrassing blunders at major companies. But I'm focusing on the gentic security today because I fear even greater damage due to the fundamental nature of the threats that we've discussed. The software engineering sector is going to take the first hit. I feel it will be a lot better, simply because that's where the agentic revolution will come. And the speed of tool innovation and enterprise adoption is frankly insane. Bear with me. Traditional software resolves dependencies at build time before anything else. Detective systems retrieve documents, APIs, and tool descriptions to become implicit inference time dependencies that directly shape reasoning and action. And this means context is an active component of the attack surface. Well, that sounds complicated, but the point is simple. In traditional IT, security tests, work, snapshots, done. With agentic AI, it's the operational context that brings the real danger. So back to today's theme, a pre-approved agentic tool can come back to bite us hard down the line. That's what I call the false clear, which you understand is like a little AFP. Okay, modern problems require modern solutions, but what if those solutions don't exist yet? The verdict on the problems with modern coding systems is very grim. Empirical analysis reveals that adaptive attacks bypass 90% of published defenses. That's not a lot. And no architectural solution currently exists to simultaneously maximize utility and security. So you have to pick one. So what do we do? Hand them? Well that ship has sailed, but luckily we are not powerless. The Swiss cheese model, you will know it as the defense in death, gives us hope. These capabilities aren't even new, but we still have a maturity growth path ahead of us, automated software testing in every pipeline, automated vulnerability scanning across the entire software stack, and comprehensive secure software development training and awareness to become more vigilant than ever. But sometimes we do say no. Last year, alongside the agent, we got another gift from OpenAI, but flexibility and others, the AI browser. This type of software is just as vulnerable to prompt injections, but on top of that, it's the direct user interface to external sources, by definition. So there is no switch keys. So that explains this pretty scathing and anonymous response from the analysts and the media. Gartner was pretty blunt about it too. AI browsers are just too dangerous to use. Enough said. So that's why we do ban AI browsers and our security operations center, or SOC, helps us to enforce this through their application scans on the managed endpoints. It's the best we can do. And the rest of the ages? Security isn't a solo sport, but we really need everybody on board. now more than ever and this framework gives us direction again just a few examples first it's management's turn to identify which business use cases are a good fit given all these new risks of course then we have to go and explain to them what these new risks are okay cyber obviously needs to step up for example by providing security baselines but then again those need to actually exist first For now though, this technology still leads heavily on the user. So IT teams need to provide solid guidance so that users, for example developers, are actually equipped to put their agents to work safely enough. I'm not saying we know how to do it safe here. And luckily, apart from OWASP, who has put out amazing work, this is coming from META security specialists, are sitting, aren't sitting on their hands, and this is the agent's rule of two. It's an interesting approach to think about. An agentic system can operate on three fronts. First, it uses untrusted input, which means it can be manipulated, as we've seen, which is no big deal on its own. It makes changes or communicates externally, which introduces some. And it may also have access to sensitive data. And this combination risks it all. But now let's pick two. In this combination, the agent processes untrusted input and has access to sensitive company data that perform any external actions. So even if the agent gets manipulated, an attacker has no way to get the information out or cause damage. Here we see access to sensitive data and the agent can perform actions like to use or sending data. But because the input is put... trusted. Front injection doesn't stand a chance. And even if there is untrusted input and the agent can act externally, there's less risk if a manipulated agent has nothing valuable to smuggle out. At self, the risk isn't gone in this situation entirely. Manipulation can still trigger harmful actions here too. So the agent-proof-to approach isn't bulletproof. I've mentioned HPE up-of-time model context protocol. Security guidelines for this are tracking in one very good one for all of us in there, all pointing, among other things, to centralized private server registries. And so that's what our IT platform teams are studying and preparing for, as we speak. Slowly, the enterprise tools are coming with native security features one by one. For example, GitHub Code Pilot now has enforceable filters for shielding sensitive information. We've configured that centrally, so our engineers get it automatically if they use the right license. And remember still, you have to actually go look for it and turn it on. It's not out of the box, because why would it be? Isolating agency containers or the troll machines is one of the most promising security controls. It keeps the agent away from infrastructure and data where it could cause damage. And if a hacker or a malicious instruction does manage to mislead the agent, the blast radius stays contained. But here we go again, the LLM turns out to be clever enough in some situations to hack itself free. That said, still containers are a priority for our authentic engineering working group and we are exploring how to offer proper setups with secure industry baselines, cross-concist for example, as conveniently as possible. So our software engineers will be able to adapt it naturally to all their agents. And what do we do about over and misaligned coding agents? And again, this is anything but a conspiracy. It's openly confirmed by big tech in their official documentation. Hardening your IE or development environment helps counter this, so there's work to be done. Disabling this setting prevents the so-called YOLO vote. Looks fun, but don't do it. There's a strong case to be made that the agentic engineer, due to approval fatigue, can no longer realistically stay in the loop. Unfortunately, human in the loop is still too often held up as a holding grail in shallow AI governance discussions. And in practice, it really is more often a flimsy band-aid than a solid measure. The Dutch Data Protection Authority, or AP, responsible for AI-backed oversight in the Netherlands, sets strict requirements for when human intervention qualifies as meaningful. And this goes beyond just automation bias, or routinely reverse-damping approvers, or Google's parted counterpart, algorithmic aversion, which kicks in when there's too much skepticism. There also needs to be rigorous attention for things like human competence, allocated capacity, operational oversight, more of that sexy stuff. All in all, the human in the loop is incredibly hard to implement effectively, more often in security theater. This legal expert, therefore, argues, please stop designing AI workflows with humans in the loop. Humans should be alerted and intervene when needed, i.e. on the move. And with legal expert, I mean he literally wrote the guidebook for the AI act. So we need to be realistic about what control humans can still meaningfully exercise. As AI agents become more complex, more motivating, we humans are ultimately left with nothing more than the human in command role, if any. And our AI management system, or AIMS for short, helps us tackle all of this in a structured way. It covers policy, risk analysis, training and awareness, system level measures, and supply chain oversight. So building out these activities and capabilities is my top priority. Let's check back in with these Chinese grandmas. They've learned their lesson. Open cloud agents went off the rails. The government actually stepped in and the masses cried out for help, deleting their until recently beloved monsters. We got the promised AI agents in 2026. Are we ready for them? Actually, not quite. Again, the guy in the voting knows far more about it than I do. So I put all the sources in the slides and you can go take a look at them. Human intervention is an important part for the articles that are required for high risk systems, yes, but it doesn't literally ask you to put the human in the loop. There needs to be meaningful human oversight. And again, it's the AP itself that's the data protection authority. that also emphasizes how incredibly hard that even is to accomplish and mind you that the ai act as with the other digital decade legislation like pdbr and the listua you name it it's risk-based so it needs to actually be proven effective it needs to be proportional and all that nice stuff meaning that if you can if you are about to implement a control a security control that is security theory basically and it's not performing, then that would be actually worse for your AI compliance. So that's a whole, that has its own whole part of academic literature with a lot of papers that went into this particular subject. But the takeaway is no, AI does not require your AI to have the highest assistance with you even. So important distinction is to admit. And I think that's good news. Like I said, the last combination of the agent rule of two, I did point out that, yes, you could still have harmful effects, like with the many examples of agents. So that's definitely the integrity part. And I worry more about those types of errors or... damages being caused by AI agents, then I actually am worried about data breaches. I think that's the wrong mindset altogether. So I think it's an interesting question to point out the difference between the confidentiality and the availability of security integrity issues. Because I think in the agentic era, especially the AI security part that I've been discussing, has way more to do with the latter and not so much with the typical concerns that we would have for data security, for confidentiality and data breaches. But we've opened brainwashed for the last 10 years because of GDPR, for good reason. To be concerned about losing data, and losing privacy and trust. But I think in this new technology shift, these agents are causing different types of problems and they have to do it integrity more. So then in that case, indeed, the last scenario wouldn't be as useful. That's true. In traditional attacks, let's say for example you have a server and you have let's say some, let's report to that attack and misuse it. I just wonder for AR agents, what are the main threat actors? And another meaning of how this agent, whatever the tension or attention, or maybe inside the threat, or maybe, yeah. I just wonder what are the main threat actors for AR agents? If you recall before, as I understand the question, it's about how in traditional IT cybersecurity you worry more about the point of entry through a firewall and they would or they would hack into a a web application in some manner with the floating vulnerability and so on so what's the new paradigm then that's the question for agentic threats so with the four uh agentic engineering threats that i put on the screen the first three were basically all variations of how prompt injections might be used to exploit the software supply chain in different manners. Again, I would advise you to go... I will share a slide deck and you can go through all the links there in the bottom corners and there's so much to learn and you will spend months I'm afraid but basically it boils down to the agent either goes out finds documentation or it uses an MCP server that's being hijacked or it's using a skill that has been learned and In those software supply chain effects, there would be expectations basically, it would be like a wall-and-wall attack basically. An attacker would just go to a central place, it can be even Reddit, or hijack MCP servers in some manner by just assuming these, by phishing the maintainer whatsoever. All sorts of variations to this problem. Only the last one from before, there's no hacker involved. The agent would just do stupid things on its own and also cause havoc, delete databases, mailboxes, whatnot. You wouldn't need a hacker for that. I don't know if that's good news.