Recent Outages Highlight the Need for Digital Resilience and Experience Assurance (2024)

A string of network outages at major companies in the first half of 2024 underscores the critical role of the IT network in our always-on digital world. These disruptions, which affected millions of users, highlight the importance of ensuring network infrastructure performance and resilience. Network complexity, combined with increasing demand and persistent cyber threats, calls for a new approach that delivers the scalability, flexibility, security, and ease of use needed to ensure consistent performance and protect against disruptions.

The specific causes of these outages are still being investigated, but one thing is clear: network disruptions are on the rise and can be caused by a multitude of factors, so you need to be prepared for anything. Causes range from hardware failures and software glitches to cyberattacks and human errors in network configuration.

The recent outages raise a critical question for all organizations, especially large enterprises with global footprints and millions of customers: Is our network equipped to handle the ever-evolving demands of today's digital landscape? If the answer to that question is “no” or “I’m not sure,” you may have a serious problem on your hands.

Modernization is essential

The good news is that advancements in network technology offer solutions. Modernized networks that use intelligent automation offer the agility and resilience needed for today's world. Here are some best practices for IT operations management (ITOM) that can help reduce the risk of network outages. Organizations that implement these practices might have prevented, or at least lessened the impact of, the significant outages we saw in the first half of this year.

Testing and Validation: On February 22, a major cellular network provider in the United States experienced a widespread network outage, impacting millions of customers across the country for roughly 12 hours. The company attributed the issue to an error during a network expansion project. Although we lack complete knowledge of the company's specific network environment, we do know that automated testing and validation are key to minimizing the risk of these types of errors. This includes pre-change lab-based testing, pre-checks, and post-checks to confirm that the network is in a healthy state before and after changes are made. While it's impossible to say definitively whether these techniques would have prevented this specific outage, they could lessen the impact and help organizations in a similar situation restore service more quickly.
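To make the pre-check and post-check idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any provider's tooling: the check names, the `get_state`, `apply_change`, and `rollback` callables, and the three result strings are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical representation: check name -> observed value.
State = Dict[str, str]


def failed_checks(state: State, expected: State) -> List[str]:
    """Return the names of checks whose observed value differs from the expected one."""
    return sorted(name for name, want in expected.items() if state.get(name) != want)


def run_change(
    get_state: Callable[[], State],
    apply_change: Callable[[], None],
    rollback: Callable[[], None],
    expected: State,
) -> str:
    """Gate a change with a pre-check and a post-check."""
    # Pre-check: do not start a change on a network that is already unhealthy.
    if failed_checks(get_state(), expected):
        return "aborted"
    apply_change()
    # Post-check: if the change broke an expected condition, undo it.
    if failed_checks(get_state(), expected):
        rollback()
        return "rolled_back"
    return "applied"
```

In practice the expected state would come from lab testing of the planned change, and `get_state` would query real devices rather than an in-memory dictionary.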

Configuration Management: In early March, a technical issue caused widespread outages for a major social media platform, impacting roughly half a million users globally for over two hours. According to ThousandEyes, the outage was likely caused by a backend service such as authentication. Outages like this often stem from software bugs or configuration errors introduced during updates or maintenance. Traditional troubleshooting methods can be time-consuming, which delays resolution and extends downtime for users. By automating configuration management, operations teams can thoroughly vet new updates and configurations before they reach production. This can help catch and fix bugs much sooner, potentially preventing outages entirely. Rollback and preview capabilities provide additional safeguards against major outages. Additionally, continuous integration and continuous delivery (CI/CD) practices can streamline the deployment of bug fixes once they've passed testing. This helps resolve outages quickly and minimizes user disruption.
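The preview and rollback capabilities mentioned above can be sketched in a few lines of Python. This is a simplified illustration that keeps configurations as plain text in memory; the `ConfigStore` class and its method names are hypothetical, and only the standard-library `difflib` module is used to produce the preview.

```python
import difflib
from typing import List


class ConfigStore:
    """Keeps every committed version so a known-good one can be restored."""

    def __init__(self, initial: str) -> None:
        self._history: List[str] = [initial]

    @property
    def running(self) -> str:
        """The most recently committed configuration."""
        return self._history[-1]

    def preview(self, candidate: str) -> str:
        """Show what would change, as a unified diff, without committing."""
        diff = difflib.unified_diff(
            self.running.splitlines(),
            candidate.splitlines(),
            fromfile="running",
            tofile="candidate",
            lineterm="",
        )
        return "\n".join(diff)

    def commit(self, candidate: str) -> None:
        """Make the candidate the running configuration."""
        self._history.append(candidate)

    def rollback(self) -> str:
        """Discard the latest commit and return the previous version."""
        if len(self._history) > 1:
            self._history.pop()
        return self.running
```

A reviewer or an automated pipeline step would inspect the output of `preview` before calling `commit`, and `rollback` gives a fast path back to the previous version if a post-change check fails.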

Network Monitoring: On March 6, a brief disruption left many users unable to access another social media platform. Based on early analyses, an issue with the platform’s backend system (likely the servers storing user data and posts) prevented it from responding to requests from the edge network, resulting in a temporary outage for users. Modernized networks often have sophisticated monitoring tools that can detect issues like these early on. Combining monitoring with auto-remediation capabilities allows problems to be resolved before they significantly impact users. Additionally, networks with intelligent features can reroute traffic or switch to backup systems automatically, minimizing overall downtime.
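As a rough illustration of monitoring combined with auto-remediation, the Python sketch below probes the active backend and fails over after a number of consecutive failed probes. The `FailoverMonitor` class, the `probe` callable, and the default threshold of three are assumptions made for the example, not features of any particular product.

```python
from typing import Callable, List


class FailoverMonitor:
    """Probes the active backend and switches to the next one after repeated failures."""

    def __init__(
        self,
        backends: List[str],
        probe: Callable[[str], bool],
        threshold: int = 3,
    ) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = backends
        self.probe = probe          # hypothetical health check: True means healthy
        self.threshold = threshold  # consecutive failures tolerated before failover
        self.active = 0
        self.failures = 0

    def tick(self) -> str:
        """Run one probe cycle and return the backend that should receive traffic."""
        if self.probe(self.backends[self.active]):
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                # Auto-remediation step: move traffic to the next backend in the list.
                self.active = (self.active + 1) % len(self.backends)
                self.failures = 0
        return self.backends[self.active]
```

Requiring several consecutive failures before switching is a deliberate choice: it avoids flapping between systems on a single dropped probe, at the cost of a slightly slower reaction.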

Network Visibility: On March 15, a major outage crippled a fast-food restaurant chain’s operations worldwide for several hours, impacting millions of customers across numerous countries. The outage was caused by a minor configuration change by a third-party vendor, highlighting the complexity and increased vulnerability of interconnected technology systems. With better visibility into the entire technology stack, including what third-party vendors are doing, enterprises can better identify potential problems before they cause outages. To further strengthen their defenses, enterprises can implement redundancy and diversification, making their networks less susceptible to outages caused by single points of failure.
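One way visibility data can be put to work is to search a topology map for single points of failure. The Python sketch below uses a simple brute-force approach: remove each node in turn and check whether the remaining nodes can still reach one another. The topology format (a symmetric adjacency map of an initially connected network) and the node names in the usage note are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical topology format: node -> set of directly connected nodes.
# Links are assumed to be bidirectional and listed on both ends.
Topology = Dict[str, Set[str]]


def _reachable(topology: Topology, start: str, removed: Optional[str]) -> Set[str]:
    """Breadth-first search from start, pretending that one node does not exist."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in topology[node]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def single_points_of_failure(topology: Topology) -> List[str]:
    """Return nodes whose loss would split the remaining network."""
    nodes = set(topology)
    found = []
    for candidate in sorted(nodes):
        remaining = nodes - {candidate}
        if not remaining:
            continue
        start = min(remaining)
        if _reachable(topology, start, candidate) != remaining:
            found.append(candidate)
    return found
```

For example, in a topology where a vendor gateway connects only through a core router, the core router is reported; adding a second link from the gateway to an edge router removes it from the result. The check is quadratic in the worst case, which is acceptable for a sketch but may be too slow for very large networks.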

Steps to network modernization

To take this further, below are key steps all enterprises can take to modernize their networks and reduce the risk of costly outages:

  • Establish proactive responses: Set up systems to respond to monitoring and alerting conditions. Include periodic and triggered configuration audits, configuration and state drift detection, and proactive troubleshooting procedures to pinpoint network issues.

  • Enable self-healing mechanisms: Use technology such as network automation with auto-remediation to handle common network problems automatically, for example by correcting configuration errors, restarting failed devices, and rerouting traffic.

  • Enforce standardization with configuration management: Implement a system to enforce standard configurations, track changes, and enable rollbacks to known-good states.

  • Integrate continuous testing: Incorporate automated testing and validation, including pre-change lab-based testing, pre-checks, and post-checks to ensure optimal network state throughout changes.

  • Maintain clear documentation and visualization: Update network documentation, device inventories, and topology maps regularly. This minimizes manual errors and speeds up troubleshooting.

  • Streamline security posture with enforcement: Enforce security policy configuration automatically to minimize threats and the likelihood of security-related outages. Make sure patching and OS upgrades are current to reduce exposures.
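Several of the steps above (configuration audits, drift detection, and standardization) rest on comparing what a device is supposed to run with what it actually runs. A minimal Python sketch of that comparison follows; the flat key-value configuration format and the setting names in the usage note are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical flat representation: setting name -> value.
Config = Dict[str, str]
Drift = Dict[str, Tuple[Optional[str], Optional[str]]]


def detect_drift(intended: Config, running: Config) -> Drift:
    """Map each drifted setting to (intended value, running value).

    None on the intended side means the setting should not exist;
    None on the running side means a required setting is missing.
    """
    drift: Drift = {}
    for key in sorted(set(intended) | set(running)):
        want, have = intended.get(key), running.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift
```

If the intended configuration sets `ntp_server` to `10.0.0.1` and the device reports `10.0.0.9` plus an unexpected `telnet` setting, both appear in the result. An empty result means the device matches its standard, and a non-empty one can raise an alert or trigger a rollback to the known-good state.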

The network outages from the first half of the year serve as a wake-up call for organizations of all sizes. No longer can businesses cross their fingers and hope for their networks to get the job done. They need to be proactive about managing their networks and battle-testing them for today’s unpredictable landscape. By embracing network modernization best practices, businesses can build more agility and resilience into their existing infrastructure. This minimizes downtime and mitigates the impact of outages, and it also ensures a smoother and more reliable experience for users and employees alike. Investing in network modernization is no longer a luxury; it's a business imperative for thriving in today's ever-connected digital landscape.

Article information

Author: Horacio Brakus JD
