Margin of Safety #45 — Two Security Investors' Take on Anthropic’s Code Security Push
Will AI kill cybersecurity?
You don’t have to be a security investor to have noticed the stock market waves coming from Anthropic’s recent spate of capability announcements. It’s tempting to view Anthropic’s code security push as an entry into the AppSec market. But economically, Anthropic is not in the vulnerability management business; it is in the Claude consumption business. Code security likely emerges for two reasons. First, it reduces enterprise hesitation by addressing a key buyer concern. Second, the model capabilities needed for code security overlap cleanly with the core capability of reasoning about code, and code security provides an excellent test bed for that capability.[1] Security is serving the goal of developing and deploying Claude’s broader coding capabilities, which target the multi-trillion-dollar market for software engineering skills, rather than standing alone as a priority monetization vector.
We have seen this pattern repeatedly. GitHub added baseline code scanning not to become a full security platform, but to make the developer ecosystem more trustworthy and to get ahead of a weakness that could have opened the door to competitors. AWS embedded foundational security controls to accelerate cloud adoption. Modern browsers introduced sandboxing to make the web safer to use at scale. In each case, the platform raised the floor while leaving the enterprise risk ownership problem largely intact.
Anthropic appears to be following the same playbook. A frontier lab valued in the hundreds of billions has limited incentive to aggressively pursue a standalone AppSec TAM of perhaps ten or twenty billion dollars, particularly when doing so could create ecosystem friction. The more likely outcome is continued investment in (1) developer-adjacent safety features that unlock usage and (2) code security to the extent it furthers the core goal of accelerating models’ abilities to reason about complex codebases.
Why the improvements still matter
None of this means the progress is trivial. Directionally, the models are getting meaningfully better at producing secure-by-default code, identifying obviously unsafe patterns during generation, and incorporating more context into their reasoning about risk. There are early signals that the SDLC itself is becoming more agent-aware.
For developers, this translates into less first-pass vulnerability noise, faster iteration velocity, and lower psychological resistance to AI-generated code entering production. Those are real and meaningful gains. Where the market sometimes over-rotates is in assuming that better code generation meaningfully solves enterprise security. In our view, it does not, at least not in the way large organizations actually experience risk.
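To make “secure-by-default” concrete, here is a toy sketch (hypothetical function names, not drawn from any actual model output) contrasting the classic injection-prone query pattern with the parameterized form that models are increasingly emitting by default:

```python
import sqlite3

def find_user_unsafe(conn, username: str):
    # Injection-prone: user input is concatenated directly into the SQL string.
    query = f"SELECT id FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchone()

def find_user_safe(conn, username: str):
    # Secure by default: a parameterized query lets the driver handle escaping.
    return conn.execute("SELECT id FROM users WHERE name = ?", (username,)).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

payload = "x' OR '1'='1"  # classic injection payload
print(find_user_unsafe(conn, payload))  # matches a row despite no user named "x"
print(find_user_safe(conn, payload))    # correctly returns None
```

The second form is no harder to write than the first, which is exactly why better defaults at generation time are a real, if local, win.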
At scale, the hard problems are not confined to whether a single function contains a known vulnerability pattern. Enterprises care about cross-environment risk aggregation and mitigation, policy enforcement across thousands of services, runtime correlation, audit evidence generation, identity-aware access modeling, and governance orchestration across increasingly heterogeneous systems. These are largely issues that emerge only once software is deployed into messy, dynamic production environments. Generating safer code and governing enterprise risk are structurally different problem classes. Frontier labs are optimizing the former. Enterprise security vendors, for now, still own most of the latter. And notably, most of these problems require skills beyond reasoning about large codebases: reasoning must instead extend to things like networks, IT deployments, and internal policy languages. These may well become priority targets for model development; after all, reasoning about deployed IT environments is a key part of debugging production systems, and we’re certainly seeing interest in agentic solutions for that space. However, it’s not currently at the core of coding model investment.
Where pressure shows up first
These distinctions matter when thinking about where disruption lands inside the cybersecurity market. Traditional pattern-heavy SAST is an obvious pressure point. Large language models are already reasonably good at reasoning through many common vulnerability classes that legacy tools historically detected through syntactic or regex-heavy approaches. The capability of static analysis does not disappear, but the standalone SKU becomes harder to defend. Over time, we would expect increasing bundling, embedding into developer platforms, and aggressive downward price compression for the lowest-differentiation offerings.
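A toy illustration of why pattern-heavy detection is brittle (the regex rule below is invented for illustration, not any vendor’s actual implementation): a syntactic rule both misses an aliased call and flags a comment, which is exactly the gap that semantic reasoning closes.

```python
import re

# A toy regex rule of the kind legacy SAST tools rely on: flag calls to eval().
EVAL_RULE = re.compile(r"\beval\s*\(")

def scan(source: str) -> list[int]:
    # Return the 1-based line numbers that match the rule.
    return [i + 1 for i, line in enumerate(source.splitlines())
            if EVAL_RULE.search(line)]

flagged = scan("result = eval(user_input)\n")           # true positive
missed  = scan("fn = eval\nresult = fn(user_input)\n")  # aliased call: regex misses it
noise   = scan("# never call eval(untrusted) here\n")   # comment: false positive

print(flagged, missed, noise)  # [1] [] [1]
```

A model that actually reads the code can see that `fn` is `eval` and that the third snippet is a comment, which is why the standalone pattern-matching SKU is the first to feel pressure.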
Software composition analysis (SCA) remains necessary because open source risk is not going away (and tensions[2] between open source maintainers and model-powered vulnerability scanners remain unresolved, to put it mildly). But developer-only dependency scanning looks increasingly insufficient in a world where the real question is exploitability in context. The more durable platforms are likely to be those that combine reachability, runtime signals, and environment-aware prioritization rather than stopping at inventory.
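As a sketch of what “exploitability in context” means (the call graph and library names here are hypothetical, not a real application): a vulnerable function in a dependency only matters if the application’s call graph can actually reach it.

```python
# Toy reachability check: a vulnerability in a dependency is only exploitable
# in context if the application's call graph can reach the vulnerable function.
CALL_GRAPH = {  # hypothetical app: caller -> callees
    "app.main": ["requestslib.get"],
    "requestslib.get": ["requestslib._parse_url"],
    "requestslib._legacy_auth": [],  # vulnerable, but nothing calls it
}

def reachable(entry: str, target: str) -> bool:
    # Depth-first search from the entry point.
    seen, stack = set(), [entry]
    while stack:
        fn = stack.pop()
        if fn == target:
            return True
        if fn in seen:
            continue
        seen.add(fn)
        stack.extend(CALL_GRAPH.get(fn, []))
    return False

print(reachable("app.main", "requestslib._parse_url"))   # True: exploitable in context
print(reachable("app.main", "requestslib._legacy_auth")) # False: inventory-only finding
```

Inventory-style SCA would flag both findings identically; reachability-aware prioritization separates the one that matters from the one that does not.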
A similar dynamic is emerging for standalone ASPM and SBOM workflow tools whose primary value proposition is aggregation and prioritization. LLMs are becoming surprisingly capable at stitching together context across disparate sources of truth. That creates subtle but real bundling pressure. Vendors in this category likely need to move up the stack toward risk intelligence, exploitability modeling, and automated remediation if they want to remain structurally differentiated.
There is also a slower-moving but important pricing implication for seat-based DevSecOps platforms. AI is changing the software production function. Fewer engineers can produce more output, and more of the workflow becomes automated or agent-driven. Any business model tightly coupled to per-seat expansion should at least be stress-tested against that trajectory. Usage-based and automation-aligned pricing models appear better positioned over the medium term.
Even parts of attack surface management may feel early pressure at the lowest end of the signal spectrum. As models improve at reasoning about system architecture (e.g. service boundaries, trust zones, privilege segmentation, and network exposure) they may increasingly help developers avoid the most obvious classes of external misconfiguration. That does not eliminate ASM as a category, but it could compress some of the more commoditized findings over time.
The paradox: AI may expand security TAM
The piece many observers miss is that none of this necessarily shrinks the long-term security opportunity. If anything, the opposite may be true. The cost of producing software is collapsing. When production costs fall, volume expands. We are already seeing early signs of what that means: more services, more agents, more ephemeral infrastructure, and more machine-generated code entering production environments at unprecedented speed.
Even if each individual artifact is cleaner than before, the system-level attack surface is growing faster. Historically, security spend has tracked complexity and exposure far more reliably than it has tracked raw vulnerability counts. From that perspective, AI may reduce local bug density while simultaneously expanding the global risk surface that enterprises must manage.
Where frontier labs likely go next
Looking forward, the most probable path for frontier labs is continued investment in areas that directly support model adoption: stronger code reasoning capabilities, more secure code generation, stronger inline detection, early agent guardrails, basic dependency awareness, and developer-time policy hints.
What seems less likely in the near term is a full push into heavy enterprise control planes such as runtime security ownership, identity governance, or SIEM/XDR-scale platforms. Those markets are operationally intensive and only indirectly tied to model consumption.
Anthropic’s move is therefore important, but not because it eliminates the need for cybersecurity. It matters because it raises the baseline expectations for what “secure by default” should look like in software creation.
Conclusion
In the end, Anthropic’s push doesn’t shrink the need for cybersecurity; it shifts where durable value lives. As frontier models make code increasingly secure by default, the lowest layers of pattern detection inevitably commoditize, but the system-level risk created by explosive software and agent proliferation only grows more complex. The control point in security is already migrating upward, away from static developer tooling and toward platforms that can continuously understand identity, runtime behavior, policy, and business context in dynamic environments.
[1] Code security is interesting because it contains code reasoning challenges (is there an exploit? can you fix it?) that stress-test coding capabilities and are often validation-friendly, meaning it’s easy to agree on whether a model did a correct job. Think of it as similar to the startups offering model playgrounds as a way to help LLMs build domain expertise, with the existing vulnerable codebase being the playground. This makes AppSec highly interesting as a way to measure and improve core capabilities with respect to code manipulation. Since those improvements will likely transfer to model coding capabilities in general, this space has enormous economic value, probably far greater than the AppSec TAM.
[2] As examples of recent FOSS friction between maintainers and agents, see the termination of curl’s bug bounty in response to false-positive reports from agents: https://daniel.haxx.se/blog/2026/01/26/the-end-of-the-curl-bug-bounty/ or this humorous but sad issue in which an agent tried to shame the matplotlib maintainers into changing their contribution policy requiring that humans oversee contributing agents: https://github.com/matplotlib/matplotlib/pull/31132