When Hurricane Sandy battered Manhattan, Broadview Networks maintained 100% uptime for OfficeSuite Phone, its award-winning cloud VoIP service. But the event highlighted the potential for service disruption from a regional disaster. Leveraging SUSE high availability products and partnership, Broadview proactively created a geographically redundant solution that provides business continuity even during such “storm of the century” events.
An essential requirement for a telecom service provider is maintaining high availability (HA): 99.99% is the minimum acceptable, but the gold standard is 99.999%, often called “five 9s.” Five 9s translates to less than six minutes of downtime per year. Failure to achieve these lofty service availability objectives can have significant consequences, including unhappy customers and loss of business.
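The downtime figure follows from simple arithmetic, which a short Python sketch can make concrete. The function name and the choice of a 365.25-day average year are assumptions made for illustration only; they are not taken from Broadview or SUSE material.

```python
# Minutes in an average year (365.25 days); an assumption for illustration.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the maximum yearly downtime, in minutes, that an availability target allows."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


# Compare the minimum acceptable target with the "five 9s" gold standard.
for target in (99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% availability allows about {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, 99.99% allows a little under an hour of downtime per year, while 99.999% allows just over five minutes, which matches the "less than six minutes" figure.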
Hurricane Sandy put Broadview Networks’ high availability to the test and showed the way to securing its competitive advantage. In 2012, when the massive storm pounded the East Coast, Manhattan’s bridges and tunnels were shut down or submerged. The basement of one of Broadview Networks’ buildings in New York City was under water.
In the fourth-floor central telecom office, however, Broadview Networks was prepared. Despite the loss of commercial power for two weeks and unreliable power for the next two months, on-site Broadview personnel and battery banks backed by diesel-fueled generators maintained 100% uptime. It was a dramatic success, but it also drove home the importance of geographical redundancy: the ability to switch service over to a replicated, geographically distant site if an entire site fails because of a regional disaster.
This lesson was reinforced by what came next. Companies began demanding geo redundancy as their insurance companies threatened to increase premiums, or even deny coverage, unless companies and their service providers could meet this requirement. The message was clear: Broadview Networks needed to enable geo redundancy to increase sales and comprehensively safeguard its customers’ business continuity.
For a long time, there was no viable Linux technology for geo clustering. However, by the time the destructive storm and customer demands elevated geo clustering to the top R&D priority, Broadview Networks was able to turn to SUSE for geo clustering, as it had for high availability (HA).
BUILDING ON ITS SUSE PRODUCTS AND PARTNERSHIP
In 2010, when Broadview Networks was evolving its software for future success, it chose to incorporate SUSE Linux Enterprise Server and the SUSE Linux Enterprise High Availability Extension. “We evaluated several Linux vendors. We found that the SUSE Linux Enterprise High Availability Extension was the best-in-class,” says Brett Buckingham, Vice President, Technology, Broadview Networks.
Broadview embedded SUSE Linux Enterprise Server with the High Availability Extension in silhouette and used it to set up HA clusters. Software components distributed among the cluster servers (some of which have primary and backup instances) can recover from failures, including the failure of an entire server’s worth of components, via cluster resource management mechanisms. If the primary instances fail, the backup instances are promoted; the network peers connect to the new primary system; and the recovered former primary instance becomes the new backup. This setup provides 99.999% availability for each site.
During this development period, the SUSE-Broadview relationship grew. Broadview continued as a customer, but also became a SUSE Original Equipment Manufacturer (OEM) and development partner, providing feedback to SUSE R&D in its efforts to create innovative HA technology. One such advance was Geo Clustering for SUSE Linux Enterprise High Availability Extension, introduced with SUSE Linux Enterprise Server 11 Service Pack 2. This extends the HA clustering capabilities of SUSE Linux Enterprise High Availability across unlimited distances, so that in a regional disaster an entire site can fail over to another site, protecting business continuity and data integrity. “It's an engineering feat to make a telecom or any system geo redundant. You need to deploy a foundation technology and a framework, such as Geo Clustering for SUSE Linux Enterprise High Availability Extension,” says Buckingham. “No other Linux vendor has a geo clustering solution.”
In close collaboration with SUSE Support and Engineering, Broadview Networks evaluated the ability of the SUSE Geo Clustering technology to meet its new requirements. During a deployment pilot, SUSE made functionality from the SUSE Linux Enterprise 12 code stream available for the SUSE Linux Enterprise 11 platform used by silhouette. This functionality is now present in SUSE Linux Enterprise 12 and will be made available in SUSE Linux Enterprise 11 Service Pack 4.
“We made a very significant design and test effort, using the ingenuity of our team and the partnership with SUSE,” says Brett Buckingham.
GEO CLUSTERING AND THE BROADVIEW NETWORKS GEO REDUNDANT ARCHITECTURE
Broadview Networks located the silhouette-based primary and backup geo sites in geographically diverse regions so that a disaster affecting one site wouldn't affect the other site. Based on SUSE Geo Clustering and running on x86-64 hardware, these geo sites are linked to each other in a geo cluster and to an arbitrator node in a third geo site. (While two sites are sufficient for manual failover, three are needed for automatic failover.) The primary site continuously replicates the silhouette configuration and operational data to the backup geo site. Only the primary geo site provides a specific service at any one time, as directed by a ticket scheme in the geo cluster. When the software detects a failure at the primary site, it promotes the backup to be the new primary instance and implements mechanisms for client systems (i.e., network peers and phones) to recognize and tolerate changing silhouette locations. (See the figure on the following page.)
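One way to see why a third site matters for automatic failover is to model the ticket as something a site may hold only while a majority of geo cluster members are reachable. The Python sketch below is a simplified illustration under that majority-vote assumption; it is not the SUSE Geo Clustering implementation, and the member names and the `elect_ticket_holder` function are hypothetical.

```python
# Hypothetical member names: two service sites plus an arbitrator in a third location.
MEMBERS = ("primary", "backup", "arbitrator")
SERVICE_SITES = ("primary", "backup")


def elect_ticket_holder(current_holder, up_members, members=MEMBERS, service_sites=SERVICE_SITES):
    """Return the site that should hold the ticket, or None if no majority exists.

    Simplifying assumption: a ticket may be held only while more than half of
    the geo cluster members are up and can see each other.
    """
    votes = len(set(up_members) & set(members))
    if votes <= len(members) // 2:
        return None  # no majority: no site may run the service automatically
    if current_holder in up_members:
        return current_holder  # current holder is healthy, so no failover
    for site in service_sites:
        if site in up_members:
            return site  # promote the surviving service site
    return None


# The primary site is lost, but the backup and the arbitrator still form a majority.
print(elect_ticket_holder("primary", {"backup", "arbitrator"}))

# With only two members, the survivor cannot tell a site failure from a link failure.
print(elect_ticket_holder("primary", {"backup"}, members=("primary", "backup")))
```

In this model, the three-member cluster hands the ticket to the backup site, while the two-member cluster returns no holder at all, which corresponds to the case where a person has to decide on a manual failover.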
BROADVIEW GEO CLUSTERING EXTENSIONS INCLUDING THE GEOGRAPHIC FAILOVER SWITCH
Broadview Networks developed custom extensions, including a custom design and resource agent for database replication between geo sites, technology for file system replication to the backup geo site and a geographic failover switch. While Geo Clustering for SUSE Linux Enterprise High Availability Extension provides rules-based failover for automatic and manual transfer of a workload to another cluster outside of the affected area, Broadview wanted to increase the opportunity for human intervention. Brett Buckingham explains, “If you failover prematurely or for the wrong reason, you can make matters worse.”
With the geographic failover switch, Broadview Networks extended the SUSE Geo Clustering failover policy. The technology sends an alarm if there is a perceived failure in an entire site and starts a timer. Automatic failover will occur after a reasonable, set period.
Ideally, however, when an alarm is received, decision makers can convene and follow a procedural script to verify the outage and, based on evidence, authorize or abort a failover. This feature can also help with compliance before sensitive data fails over to another region automatically. SUSE will be implementing this functionality in future product versions.
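The alarm-and-timer behavior described above can be sketched as a small state machine. This is a minimal illustration, not Broadview's geographic failover switch: the class name, method names and the 600-second grace period are hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
from enum import Enum


class Decision(Enum):
    PENDING = "pending"        # alarm raised, nobody has decided yet
    AUTHORIZED = "authorized"  # decision makers verified the outage
    ABORTED = "aborted"        # decision makers judged the alarm false


class FailoverSwitch:
    """Delay automatic site failover so that people can authorize or abort it."""

    def __init__(self, grace_period_s: float):
        self.grace_period_s = grace_period_s
        self.alarm_time = None
        self.decision = Decision.PENDING

    def raise_alarm(self, now: float) -> None:
        """Record a perceived site failure and start the timer (first alarm only)."""
        if self.alarm_time is None:
            self.alarm_time = now
            self.decision = Decision.PENDING

    def authorize(self) -> None:
        self.decision = Decision.AUTHORIZED

    def abort(self) -> None:
        self.decision = Decision.ABORTED

    def should_fail_over(self, now: float) -> bool:
        """Fail over if authorized, or if the grace period expired with no decision."""
        if self.alarm_time is None or self.decision is Decision.ABORTED:
            return False
        if self.decision is Decision.AUTHORIZED:
            return True
        return now - self.alarm_time >= self.grace_period_s


switch = FailoverSwitch(grace_period_s=600)  # hypothetical 10-minute window
switch.raise_alarm(now=0)
print(switch.should_fail_over(now=300))  # still inside the grace period
print(switch.should_fail_over(now=600))  # grace period expired with no decision
```

The design choice worth noting is that silence leads to failover: if nobody responds before the timer expires, the switch behaves like ordinary automatic failover, while an explicit abort blocks it.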
Presently, SUSE Linux Enterprise Server, SUSE Linux Enterprise High Availability Extension and Geo Clustering for SUSE Linux Enterprise High Availability Extension are embedded in silhouette 7, the latest generally available release. Deployment in the Broadview Networks labs and on the company’s internal and newly commissioned sites is expected to occur in Q1 2015.
The geo clustering solution provides strategic business benefits. First, it goes beyond typical definitions of availability and uptime, addressing extreme disaster scenarios not generally included in these traditional measurements. This is evident in a regional disaster, when silhouette with geo clustering assures disaster recovery, data integrity and business continuity for customers.
The solution also gives Broadview Networks a competitive advantage. Most telecom providers rely on another company for their networks, so their claims of geo redundancy apply only to the cloud service portion of the solution. Broadview Networks owns its networks and delivers geo redundancy for both the networks and its telephone and other services, more comprehensively and rigorously than competitors.
This advantage drives sales. According to Buckingham, “One reason companies come to us for services is because we offer and manage all pieces of the puzzle.” He adds, “We couldn't have done this without SUSE. It speaks positively to our partnership with SUSE and their responsiveness. We would not have this solution today without SUSE and our strong relationship.”