Traditional backup and disaster recovery approaches were designed for more contained environments. Today, organisations are operating across SaaS platforms, hybrid cloud infrastructure, and increasingly autonomous AI-driven systems, introducing new layers of complexity for Infrastructure and Operations (I&O) teams.
This shift was a key theme at the Gartner IT Infrastructure, Operations & Cloud Strategies Conference (IOCS) in Sydney, where I shared perspectives on how resilience strategies need to evolve alongside agentic AI adoption.
In this blog, I outline the key themes from that session and what they mean for I&O leaders who are rethinking how to protect critical data, reduce operational risk, and maintain continuity in modern environments.
Why Traditional Backup No Longer Cuts It
The environments we’re asked to protect have changed fundamentally as digital transformation accelerates. Critical data no longer sits in a handful of on‑premises systems. Today, it’s spread across SaaS applications, DevOps pipelines, identity platforms, and cloud infrastructure. At the same time, data is growing faster than teams can manage, making it increasingly difficult to maintain consistent, comprehensive protection and harder to know what’s truly protected.
The cost of that complexity is climbing. Gartner predicts that by 2029, enterprise backup and recovery costs will rise by 40%, driven by expanding data volumes and the growing burden of managing fragmented environments. This makes consolidation, prioritisation, and long-term resilience planning increasingly critical for I&O leaders.
But cost is only part of the challenge. When an incident occurs – whether it’s ransomware, accidental deletion, or a configuration issue – many organisations still struggle to restore what matters most. The impact is felt across the business: missing customer history, blank analytics dashboards, corrupted storage, deleted databases, and even disrupted operational systems.
These gaps point to a broader issue: traditional recovery approaches must keep pace with how modern organisations operate and with their digital transformation strategies. As data estates become more distributed and interdependent, resilience can’t rely on coverage alone. It needs to prioritise what matters, surface risk earlier, and adapt to an increasingly complex environment.

Four Pillars of Infrastructure Resilience
In my session, I outlined a simple but critical shift. Resilience can no longer be treated as a reactive function. It needs to be proactive, unified, and aligned to how data – and risk – actually flow across modern environments.
To do this effectively, organisations need to rethink resilience across four core pillars that work together to reduce blind spots, prioritise recovery, and prevent disruption before it occurs.
1. Protect the Data That Runs the Business
Protection today must extend beyond traditional infrastructure. Critical data lives across platforms such as Microsoft 365, Azure, Power Platform, and identity systems, often spanning multiple cloud providers. Consistent protection requires multiplatform coverage and resilience by design, including capabilities such as air‑gapped and immutable storage.
2. Identify Risk Before It Becomes Downtime
Visibility is what turns protection into resilience. Without a clear view of where gaps exist, organisations are often reacting to failures rather than preventing them. A unified approach to identifying backup risk – across cloud environments and workloads – allows teams to surface issues early, reduce operational overhead, and avoid downstream disruption. In larger or more complex environments, this can translate into meaningful efficiency gains by reducing manual administrative effort and repetitive recovery tasks.
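To make the idea of surfacing backup risk concrete, here is a minimal sketch in Python. It assumes you can export two things from your own tooling: a list of workloads that should be protected, and the timestamp of each workload's last successful backup. The function name `find_backup_gaps`, the workload names, and the 24-hour policy are all hypothetical and chosen only for illustration; they do not describe any particular product's API.

```python
from datetime import datetime, timedelta, timezone


def find_backup_gaps(inventory, last_backup, now, max_age=timedelta(hours=24)):
    """Return {workload: reason} for workloads that are unprotected or stale.

    inventory   -- names of workloads that should be protected (assumed input)
    last_backup -- {workload: datetime of last successful backup} (assumed input)
    max_age     -- hypothetical policy: the oldest acceptable backup
    """
    gaps = {}
    for workload in inventory:
        taken_at = last_backup.get(workload)
        if taken_at is None:
            gaps[workload] = "no backup recorded"
        elif now - taken_at > max_age:
            gaps[workload] = "backup older than policy"
    return gaps


# Illustrative data only; a fixed "now" keeps the example reproducible.
now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
inventory = ["m365-mailboxes", "azure-sql-prod", "power-platform-apps"]
last_backup = {
    "m365-mailboxes": now - timedelta(hours=3),
    "azure-sql-prod": now - timedelta(hours=40),
}

gaps = find_backup_gaps(inventory, last_backup, now)
for workload, reason in sorted(gaps.items()):
    print(f"{workload}: {reason}")
```

The point of the sketch is the comparison itself: risk shows up as the difference between what should be protected and what demonstrably is, which is only visible when both lists live in one place.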
3. Recover What Matters First
Not all data carries the same business impact, yet traditional recovery approaches often treat it that way. Resilience depends on the ability to prioritise which systems, users, and data are restored first to minimise disruption. Modern recovery approaches are designed to accelerate this process, enabling faster restoration times and ensuring organisations can resume operations with minimal delay.
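As a rough illustration of prioritised recovery, the sketch below orders workloads by business impact and then by estimated restore time. The `Workload` fields, the 1 to 5 impact scale, and the sample values are assumptions made for the example; in practice the ranking would come from your own business impact analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    business_impact: int       # hypothetical scale: 1 (low) to 5 (critical)
    est_restore_minutes: int   # assumed estimate from past restore tests


def restore_order(workloads):
    """Highest business impact first; among equals, the quickest restore first."""
    return sorted(
        workloads,
        key=lambda w: (-w.business_impact, w.est_restore_minutes),
    )


# Illustrative data only.
workloads = [
    Workload("analytics-dashboards", 2, 90),
    Workload("customer-history-db", 5, 120),
    Workload("identity-platform", 5, 30),
    Workload("archive-storage", 1, 240),
]

ordered = [w.name for w in restore_order(workloads)]
print(ordered)
```

Breaking ties on restore time is one possible design choice: when two systems are equally critical, bringing back the faster one first returns some capability to the business sooner.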
4. Prevent Access-Related Breaches
As environments become more distributed, access is a primary vector of risk. This is especially true with the rise of AI-driven workflows, where new applications and agents are introduced at pace. Preventing access-related breaches requires stronger guardrails, including visibility into what is running in the environment and the ability to manage risks introduced by expanding application and agent footprints.
Agentic AI Is Expanding the Risk Surface
As organisations adopt AI more deeply, resilience strategies need to account for a rapidly growing layer of complexity. Agentic AI introduces new patterns of access, automation, and interaction that can significantly accelerate efficiency and scale, but without the right governance in place, those same capabilities can also expand the risk surface.
Agent Sprawl and the Visibility Gap
Agent activity is growing faster than most organisations realise, often without clear visibility into what those agents can access. During my session, I used a demo of the AgentPulse Command Center to show just how quickly this can scale across an environment.
As more agents are deployed across SaaS platforms and cloud environments, it becomes increasingly difficult to know what is running, where, and how it is behaving. Left unmanaged, this sprawl can introduce security, compliance, and operational risks that traditional controls are not designed to catch.

A Layered Approach to Resilience
Visibility alone isn't enough without a structure to act on it. This is why resilience should be framed as a unified model that extends beyond infrastructure into three coordinated layers:
- Traditional technology protection for safeguarding infrastructure, workloads, and core systems
- Information governance covering data protection through classification, access controls, and policy enforcement
- AI governance overseeing agent behaviour through runtime inspection, enforcement, and lifecycle management
This aligns with broader discussions at Gartner IOCS on agentic AI in observability, reinforcing that successful adoption depends on governance and operational practices, not just technology.
Preventing Risk From Agent Sprawl
As the number of agents grows, so does the risk of unmanaged expansion. In the presentation, I highlighted agent application sprawl as a challenge that becomes a key risk if it is not addressed early.
Preventing this type of risk requires more than static controls. Organisations need visibility into agent usage and the ability to apply governance throughout the lifecycle, from deployment to ongoing monitoring and control.
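One way to picture lifecycle governance is as a recurring review of an agent inventory. The sketch below is a simplified illustration, not a description of how AgentPulse or any other product works: the `Agent` fields, the scope names in `BROAD_SCOPES`, and the three checks are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical scope names that would count as overly broad access.
BROAD_SCOPES = {"tenant:read_all", "tenant:write_all"}


@dataclass(frozen=True)
class Agent:
    name: str
    owner: Optional[str]            # accountable person or team, if known
    scopes: frozenset = frozenset() # permissions the agent holds (assumed format)
    approved: bool = False          # has it passed a deployment review?


def review_findings(agents):
    """Return {agent name: [issues]} for agents that need attention."""
    findings = {}
    for agent in agents:
        issues = []
        if not agent.approved:
            issues.append("not approved")
        if agent.owner is None:
            issues.append("no owner")
        if agent.scopes & BROAD_SCOPES:
            issues.append("broad scope")
        if issues:
            findings[agent.name] = issues
    return findings


# Illustrative inventory only.
agents = [
    Agent("invoice-bot", "finance-ops", frozenset({"mail:send"}), approved=True),
    Agent("hr-helper", None, frozenset({"tenant:read_all"}), approved=False),
    Agent("report-writer", "analytics", frozenset({"tenant:write_all"}), approved=True),
]

findings = review_findings(agents)
for name, issues in findings.items():
    print(f"{name}: {', '.join(issues)}")
```

Running a check like this on a schedule, rather than once at deployment, is what separates lifecycle governance from a static control.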
Rethinking Resilience: Where to Start
If there’s one takeaway from my session, it is this: Resilience is not a backup line item. It is a strategic priority that needs to be treated with the same rigour as cloud strategy, identity management, or AI adoption.
Throughout the session, I framed the conversation around four diagnostic questions designed to help leaders assess where they stand:
- Where is your most critical data, and is it all protected?
- If all that data were unavailable, could you recover?
- Do you know where your risk is across those systems?
- What guardrails do you have in place to prevent and recover from a breach?
Each question maps to one of the four pillars I walked through. Together, they’re designed to surface blind spots quickly and help organisations prioritise where to act first.
The broader theme of the conference reinforced this urgency. With the skills gap widening and environments growing more complex, the case for investing in platforms that consolidate visibility and reduce manual overhead is only getting stronger. Resilience strategies that depend on fragmented tools and reactive workflows will struggle to keep pace.
From Reactive Recovery to Proactive Resilience
As the complexity of agentic AI, hybrid cloud, and SaaS accelerates, the path forward is clear: Protect critical data, surface risk early, recover what matters first, and apply AI governance to the agents already running in your environment.
That shift – from reactive recovery to proactive resilience – was the throughline of my session, and I encourage every I&O leader to pressure-test it against their own environment.
If you'd like to learn what that looks like in practice, take a solutions tour or get free access to AgentPulse and see what AI agents are already running in your environment.


