• SpikesOtherDog
    11 hours ago

    Yes. A worldwide service provider should be able to achieve at least four nines of uptime. That’s 99.99% availability, or roughly 52.6 minutes of downtime a year. That’s accomplished through best practices: redundancy, planned maintenance, and solid disaster recovery plans.
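
The downtime figure above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a 365-day year; the function name `allowed_downtime_minutes` is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; assumes a non-leap year


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget in minutes for an availability of N nines."""
    unavailability = 10 ** -nines  # 4 nines -> 0.0001, i.e. 99.99% available
    return MINUTES_PER_YEAR * unavailability


# Print the yearly budget for two through five nines.
for n in range(2, 6):
    print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes/year")
```

For four nines this works out to about 52.56 minutes a year, and each extra nine cuts the budget by a factor of ten.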

    The ways to achieve a disaster of this magnitude include:

    • No hot spares
      • A security event has locked all redundant servers and they are now rebuilding servers from backup.
    • Lack of effective redundancy
      • A disaster has taken out one data center, and the load shifted onto the remaining servers is making them unresponsive
        • This is unlikely because there would be intermittent reports of success
    • Poor patching management
      • Patches were sent to all servers without proper testing or rollback strategy
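
For the patching case, the opposite of "sent to all servers at once" is a staged rollout with a rollback path. The sketch below is only an illustration of that idea under stated assumptions: `apply_patch`, `is_healthy`, and `rollback` are hypothetical callables you would supply, and nothing here reflects how any particular provider actually deploys.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    servers: Sequence[str],
    apply_patch: Callable[[str], None],   # hypothetical: patches one server
    is_healthy: Callable[[str], bool],    # hypothetical: post-patch health check
    rollback: Callable[[str], None],      # hypothetical: reverts one server
    batch_size: int = 1,
) -> bool:
    """Patch servers in small batches; undo everything if a batch fails its check."""
    patched: List[str] = []
    for start in range(0, len(servers), batch_size):
        batch = servers[start:start + batch_size]
        for server in batch:
            apply_patch(server)
            patched.append(server)
        if not all(is_healthy(server) for server in batch):
            # Stop here: servers in later batches are never touched.
            for server in reversed(patched):
                rollback(server)
            return False
    return True
```

With a batch size of one, a bad patch is caught on the first server that fails its health check instead of on the whole fleet.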
    • Bieren@lemmy.world
      9 hours ago

      Key word there is planned. You can have all of the best practices covered with the best possible solutions. But, at the end of the day, shit happens outside of your control.