One of the most strategically impactful projects I led at Kadant Carmanah Design was the complete redesign of our Business Continuity and Disaster Recovery (BCDR) framework using Microsoft Azure Site Recovery (ASR) and Azure Backup. The goal was a resilient, real-time, cloud-native system capable of protecting critical manufacturing and engineering operations across regions.
Previously, our DR approach was outdated and centered around on-prem failover servers and offsite tape backups. It lacked scalability, hadn’t been tested properly, and failed us during a power outage that halted production for days. I proposed a full transformation to Azure-based BCDR, built around multi-region resilience, automated failover, and measurable RTO/RPO objectives.
I began by conducting a business impact analysis with department heads to define recovery priorities. Systems like ERP, CNC job schedulers, and engineering databases required near-zero data loss and recovery in under an hour. Others, like archive storage, could tolerate longer RTOs.
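To illustrate how recovery priorities like these can be written down and checked, here is a minimal sketch in Python. It is not the tooling used on the project. The tier-1 RTO of 60 minutes follows from "recovery in under an hour"; the 5-minute RPO standing in for "near-zero data loss", every tier-2 number, and all system and tier names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTier:
    name: str
    rto_minutes: int  # maximum tolerable downtime
    rpo_minutes: int  # maximum tolerable window of lost data
    protection: str   # how the tier is protected


# Tier 1 mirrors the targets described above; the 5-minute RPO and all
# tier-2 values are illustrative assumptions, not figures from the project.
TIER_1 = RecoveryTier("tier-1", rto_minutes=60, rpo_minutes=5,
                      protection="asr-replication")
TIER_2 = RecoveryTier("tier-2", rto_minutes=24 * 60, rpo_minutes=24 * 60,
                      protection="azure-backup-daily")

# Hypothetical system identifiers mapped to their recovery tier.
SYSTEM_TIERS = {
    "erp": TIER_1,
    "cnc-job-scheduler": TIER_1,
    "engineering-db": TIER_1,
    "archive-storage": TIER_2,
}


def meets_objectives(system: str, measured_rto: int, measured_rpo: int) -> bool:
    """Return True if a measured recovery stayed within the tier's targets."""
    tier = SYSTEM_TIERS[system]
    return measured_rto <= tier.rto_minutes and measured_rpo <= tier.rpo_minutes
```

Keeping the targets in one table like this makes it straightforward to compare the results of each failover test against what the business agreed to.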
Our production workloads ran in Canada Central, so I architected Canada East and East US 2 as our secondary DR sites. ASR provided continuous replication for tier-1 systems, while Azure Backup handled daily snapshots for non-critical VMs using Geo-Redundant Storage (GRS). I configured Recovery Services vaults, VSS-integrated snapshots for app-aware backups (especially SQL), and paired VNets across all regions, with a VPN Gateway as the failover path behind ExpressRoute.
I used Azure Automation runbooks to update DNS, reassign internal IPs, and adjust load balancers post-failover. I also built ASR Recovery Plans with service dependencies sequenced for seamless recovery. All traffic was encrypted in transit, access was restricted via RBAC, with just-in-time (JIT) VM access and Azure Bastion for administrative sessions, and monitoring was integrated via Defender for Cloud and Azure Policy for ongoing compliance.
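The idea of sequencing service dependencies can be sketched as a topological sort that produces ordered start-up groups, similar in spirit to the groups in a recovery plan. This is a rough illustration using only the Python standard library, not the ASR API; the service names and the dependency map are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what must be up before it.
DEPENDENCIES = {
    "domain-controller": set(),
    "sql-server": {"domain-controller"},
    "erp-app": {"sql-server"},
    "cnc-scheduler": {"sql-server"},
    "dms-web": {"erp-app"},
}


def boot_groups(dependencies):
    """Return lists of services that can start together, in recovery order."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    groups = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        groups.append(ready)
        sorter.done(*ready)
    return groups


for number, group in enumerate(boot_groups(DEPENDENCIES), start=1):
    print(f"Group {number}: {', '.join(group)}")
```

Services in the same group have no dependency on each other, so they can be brought up in parallel, while each later group waits for the one before it.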
I led a live failover test of our document management system over a long weekend. Within minutes, the workloads were operational in a secondary region, and end users resumed work with zero impact. We performed a controlled failback, then updated runbooks and DNS TTLs based on the lessons learned.
Azure Backup policies were deployed in parallel for systems that didn’t require real-time replication. I configured VM, SQL, and file-level backups, and used Power BI-integrated Azure Backup Reports to monitor success rates and storage usage, enabling better tiering decisions.
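As a rough sketch of the kind of success-rate figure such a report surfaces, the snippet below aggregates backup job records per workload type. The record shape (the `workload` and `status` keys and the `"Completed"` value) is an assumption made for illustration and is not the Azure Backup Reports schema.

```python
from collections import defaultdict


def success_rates(jobs):
    """Return the fraction of completed backup jobs per workload type."""
    totals = defaultdict(int)
    completed = defaultdict(int)
    for job in jobs:
        totals[job["workload"]] += 1
        if job["status"] == "Completed":
            completed[job["workload"]] += 1
    return {workload: completed[workload] / totals[workload] for workload in totals}


# Hypothetical job records for demonstration only.
sample_jobs = [
    {"workload": "vm", "status": "Completed"},
    {"workload": "vm", "status": "Completed"},
    {"workload": "sql", "status": "Completed"},
    {"workload": "sql", "status": "Failed"},
]

rates = success_rates(sample_jobs)
```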
Whether you’re planning a cloud migration, improving security, or building a more resilient IT environment, I’m here to help.