Downtime is a computer industry term for the time during which a computer or IT system is unavailable, offline or not operational. Downtime has many causes, including shutdowns for maintenance (known as scheduled downtime), human error, software or hardware malfunctions, and environmental events such as power outages, fires, flooding or major temperature changes. Communications failures, for example, may cause network downtime. Downtime duration is the period of time during which a system fails to perform its primary function. In industrial environments, downtime may refer to failures in production equipment; this type of downtime is often measured per work shift or per 12- or 24-hour period.

In IT environments, downtime is one of the metrics used to measure system availability. Availability is often measured against a 100% operational, or never-fails, standard. A common target is 99.999%, known as “five 9s” availability, which allows roughly 5.26 minutes of downtime per year. A “two 9s” system guarantees 99% availability over a one-year period, allowing up to 1% downtime, or 3.65 days of unavailability. Service level agreements (SLAs) often use monthly downtime or availability percentages to calculate billing. Scheduled downtime for system updates and routine maintenance is usually excluded from the availability percentages in SLA contracts. For provisioning, SLAs may use uptime and downtime percentages to describe the dependability of the various services available to clients. Such percentages also help determine the value of each service, as most clients want continuous real-time availability (zero downtime).
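The arithmetic behind these percentages can be sketched in a few lines of Python. This is a minimal illustration, not a standard SLA formula: the function names are hypothetical, a year is taken as 365 days to match the 3.65-day figure above, and scheduled maintenance is handled by removing it from the measured window, which is one common convention but not the only one.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def allowed_downtime_minutes(availability_pct, period_minutes=MINUTES_PER_YEAR):
    """Downtime budget, in minutes, implied by an availability percentage."""
    return (1 - availability_pct / 100) * period_minutes


def measured_availability_pct(period_minutes, unscheduled_minutes, scheduled_minutes=0):
    """Availability over a period, with scheduled maintenance excluded.

    Assumption: scheduled downtime is subtracted from the measured window,
    so only unscheduled downtime counts against the percentage.
    """
    measured_window = period_minutes - scheduled_minutes
    return 100 * (measured_window - unscheduled_minutes) / measured_window


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(pct)
        print(f"{pct}% -> {minutes:.2f} minutes/year ({minutes / 1440:.2f} days)")

    # Hypothetical 30-day month: 43 minutes unscheduled, 120 minutes scheduled.
    month = 30 * 24 * 60
    print(f"{measured_availability_pct(month, 43, 120):.3f}% monthly availability")
```

Under these assumptions, 99% works out to 5,256 minutes (3.65 days) per year and 99.999% to about 5.26 minutes per year. An actual SLA defines its own measurement window and exclusions, so the contract text takes precedence over a sketch like this.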

Large enterprises increasingly depend on high availability for IT services and applications delivered through the cloud. IT organizations may deploy server clusters to improve availability and reduce unscheduled downtime. A server cluster is a group of linked servers that work together to improve system performance, load balancing and service availability. If a server fails, other servers in the cluster can take over the functions and workloads of the failed server. SUSE Linux Enterprise Server can help businesses minimize downtime by exploiting hardware reliability, availability and serviceability features, by providing server clustering for physical and virtual systems, and by enabling live kernel patching without rebooting.
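A rough way to see why clustering reduces unscheduled downtime is to estimate the chance that at least one server in the cluster is up. The sketch below is an idealized model and not a description of any particular product: it assumes servers fail independently, that failover is instantaneous, and that any single surviving server can carry the full workload. The function name is hypothetical.

```python
def cluster_availability(node_availability, nodes):
    """Probability that at least one of `nodes` servers is up.

    Assumptions (idealized): independent failures, instant failover,
    and one surviving node is enough to keep the service running.
    `node_availability` is a fraction between 0 and 1, e.g. 0.99.
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    if not 0 <= node_availability <= 1:
        raise ValueError("node_availability must be between 0 and 1")
    # The service is down only when every node is down at the same time.
    return 1 - (1 - node_availability) ** nodes


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} node(s) at 99% each -> {cluster_availability(0.99, n):.6%}")
```

In this model, two servers that are each available 99% of the time give roughly 99.99% combined availability. Real clusters usually fall short of the idealized figure, because failures are often correlated (shared power, network or software faults) and failover itself takes time.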