Industry: Telecom
Location: United States

Broadview Networks gains a competitive advantage with in-house disaster recovery solutions

Highlights

  • In a regional disaster, an entire site can fail over to another site, protecting business continuity and data integrity
  • Geo clustering solution goes beyond typical definitions of availability and uptime, addressing extreme disaster scenarios not generally included in traditional measurements
  • In-house disaster recovery systems provide Broadview Networks with a competitive advantage

Broadview Networks’ OfficeSuite Phone, underpinned by its Silhouette carrier-grade telecom product, is a cloud-based VoIP (Voice over Internet Protocol) service for small and mid-sized businesses. The company hosts the OfficeSuite service for over 100,000 business users every day. It also licenses Silhouette to 17 other CSPs (cloud service providers), which together serve nearly 80,000 additional business users.

At-a-Glance

When Hurricane Sandy battered Manhattan, Broadview Networks maintained 100% uptime for OfficeSuite Phone, its award-winning cloud VoIP service. But the event highlighted the potential for service disruption from a regional disaster. Drawing on SUSE high availability products and its partnership with SUSE, Broadview proactively created a geographically redundant solution that provides business continuity even during “storm of the century” events.

The Challenge

An essential requirement for a telecom service provider is maintaining high availability (HA): 99.99% is the minimum acceptable, but the gold standard is 99.999%, often called “five 9s.” Five 9s translates to less than six minutes of downtime per year. Failure to achieve these lofty service availability objectives can have significant consequences, including unhappy customers and loss of business.
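The arithmetic behind these targets is simple enough to check. The short Python sketch below converts an availability percentage into a yearly downtime budget. The function name and the use of an average 365.25-day year are assumptions made for illustration; they do not come from Broadview Networks or SUSE.

```python
# Illustrative sketch: yearly downtime budget for an availability target.
# Assumption: an average year of 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


for target in (99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, 99.99% leaves a budget of roughly 53 minutes per year, while five 9s leaves a little over five minutes, which is consistent with the "less than six minutes" figure.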

Hurricane Sandy put Broadview Networks’ high availability to the test and showed the way to securing its competitive advantage. In 2012, when the massive storm pounded the East Coast, Manhattan’s bridges and tunnels were shut down or submerged. The basement of one of Broadview Networks’ buildings in New York City was under water.

In the fourth-floor central telecom office, however, Broadview Networks was prepared. Despite the loss of commercial power for two weeks and unreliable power for the next two months, on-site Broadview personnel and battery banks backed by diesel-fueled generators maintained 100% uptime. It was a dramatic success, but it also drove home the importance of geographical redundancy: the ability to switch service over to a replicated, geographically distant site if an entire site fails because of a regional disaster.

This lesson was reinforced by what came next. Companies began demanding geo redundancy as their insurance companies threatened to increase premiums, or even deny coverage, unless companies, and their service providers, could meet this requirement. The message was clear: Broadview Networks needed to enable geo redundancy to increase sales and comprehensively safeguard its customers’ business continuity.

For a long time, there was no viable Linux technology for geo clustering. However, by the time the destructive storm and customer demands elevated geo clustering to the top R&D priority, Broadview Networks was able to turn to SUSE for geo clustering, as it had for HA.

SUSE Solution

Building on its SUSE products and partnership

In 2010, when Broadview Networks was evolving its software for future success, it chose to incorporate SUSE Linux Enterprise Server (SLES) and the SUSE Linux Enterprise (SLE) High Availability Extension. “We evaluated several Linux vendors. We found that the SUSE Linux Enterprise High Availability Extension was the best-in-class,” says Brett Buckingham, vice president, technology, Broadview Networks.

Broadview embedded SLES with the High Availability Extension in Silhouette and used it to set up HA clusters in its environment. Software components distributed among the cluster servers (some of which have primary and backup instances) can recover from failures, including the failure of an entire server’s worth of components, via cluster resource management mechanisms. If the primary instances fail, the backup instances are promoted; the network peers connect to the new primary system; and the recovered former primary instance becomes the new backup. This setup provides 99.999% availability for each site.
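The promotion cycle described above can be pictured with a very small model. The Python sketch below is illustrative only: the class, function and node names are hypothetical, and it is not the SUSE High Availability Extension API or Broadview's implementation. In a real cluster these transitions are driven by the cluster resource manager.

```python
# Illustrative model of primary/backup promotion within one site.
# All names here are hypothetical.
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    role: str  # "primary", "backup" or "failed"


def fail_over(primary: Instance, backup: Instance) -> None:
    """Mark the failed primary and promote the backup in its place."""
    primary.role = "failed"
    backup.role = "primary"


def rejoin(recovered: Instance) -> None:
    """A recovered former primary comes back as the new backup."""
    recovered.role = "backup"


node_a = Instance("node-a", "primary")
node_b = Instance("node-b", "backup")

fail_over(node_a, node_b)  # node-a fails, node-b is promoted
rejoin(node_a)             # node-a recovers and becomes the backup

print(node_a)
print(node_b)
```

The point of the sketch is that roles swap rather than revert: after recovery the former primary does not reclaim its role, which avoids a second service interruption.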

During this development period, the SUSE-Broadview relationship grew. Broadview continued as a customer, but also became a SUSE Original Equipment Manufacturer (OEM) and development partner, providing feedback to SUSE R&D in its efforts to create innovative HA technology. One such innovation was Geo Clustering for SLE High Availability Extension, introduced with SLES 11 Service Pack 2. This extends the HA clustering capabilities of SLE High Availability Extension across unlimited distances, so that in the event of a regional disaster an entire site can fail over to another site, protecting business continuity and data integrity.

“It’s an engineering feat to make a telecom or any system geo redundant. You need to deploy a foundation technology and a framework, such as Geo Clustering for SUSE Linux Enterprise High Availability Extension,” says Buckingham. “No other Linux vendor has a geo clustering solution.”

In close collaboration with SUSE Support and Engineering, Broadview Networks evaluated the ability of the SUSE Geo Clustering technology to meet its new requirements. During a deployment pilot, SUSE made functionality from the SLES 12 code stream available for the SLES 11 platform used by Silhouette. This functionality is now present in SLES 12 and will be made available in SLES 11 Service Pack 4.

“We made a very significant design and test effort, using the ingenuity of our team and the partnership with SUSE,” says Buckingham.

Geo clustering and the Broadview Networks geo redundant architecture

Broadview Networks located the Silhouette-based primary and backup geo sites in geographically diverse regions so that a disaster affecting one site wouldn’t affect the other site. Based on SUSE Geo Clustering and running on x86-64 hardware, these geo sites are linked to each other in a geo cluster and to an arbitrator node in a third geo site. (While two sites are sufficient for manual failover, three are needed for automatic failover.) The primary site continuously replicates the Silhouette configuration and operational data to the backup geo site. Only the primary geo site provides a specific service at any one time, as directed by a ticket scheme in the geo cluster. When the software detects a failure at the primary site, it promotes the backup to be the new primary instance and implements mechanisms for client systems (i.e., network peers and phones) to recognize and tolerate changing Silhouette locations.
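A simplified way to see why automatic failover needs a third site is to treat the ticket as something granted by majority vote. The Python sketch below assumes a plain majority rule among the two geo sites and the arbitrator; the function and member names are hypothetical, and the actual SUSE Geo Clustering ticket mechanism is more involved than this.

```python
# Illustrative sketch: a ticket moves only when a strict majority of
# cluster members agree on the new holder. Names are hypothetical.
from collections import Counter


def decide_ticket_holder(votes: dict, total_members: int):
    """Return the site that wins a strict majority, or None if no site does.

    votes maps each reachable member to the site it votes for;
    unreachable members are simply absent from the mapping.
    """
    if not votes:
        return None
    site, count = Counter(votes.values()).most_common(1)[0]
    return site if count > total_members // 2 else None


# Three members: site_a is down, site_b and the arbitrator agree.
three_site = decide_ticket_holder(
    {"site_b": "site_b", "arbitrator": "site_b"}, total_members=3
)

# Two members only: site_b cannot tell a dead peer from a broken link.
two_site = decide_ticket_holder({"site_b": "site_b"}, total_members=2)

print(three_site)
print(two_site)
```

With three members, the surviving site and the arbitrator form a majority and the ticket can move automatically. With only two, a single surviving site never reaches a majority, so a person has to decide.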

Broadview geo clustering extensions including the geographic failover switch

Broadview Networks developed custom extensions, including a custom design and resource agent for database replication between geo sites, technology for file system replication to the backup geo site and a geographic failover switch. While Geo Clustering for SLE High Availability Extension provides rules-based failover for automatic and manual transfer of a workload to another cluster outside of the affected area, Broadview wanted to increase the opportunity for human intervention. Buckingham explains, “If you failover prematurely or for the wrong reason, you can make matters worse.”

With the geographic failover switch, Broadview Networks extended the SUSE Geo Clustering failover policy. The technology sends an alarm if there is a perceived failure in an entire site and starts a timer. Automatic failover will occur after a reasonable, set period. Ideally, however, when an alarm is received, decision makers can convene and follow a procedural script to verify the outage and, based on evidence, authorize or abort a failover. This feature can also help with compliance before sensitive data fails over to another region automatically. SUSE will be implementing this functionality in future product versions.
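The alarm-and-timer behavior can be sketched as a small state machine. The Python below is an illustrative model only: the class name, the method names and the 600-second grace period are assumptions, since the source says only that failover occurs after "a reasonable, set period." Time is passed in explicitly so the example runs without waiting.

```python
# Illustrative model of a failover switch with a human decision window.
# Names and the grace period are hypothetical.
from enum import Enum


class Decision(Enum):
    PENDING = "pending"
    FAILOVER = "failover"
    ABORTED = "aborted"


class GeoFailoverSwitch:
    def __init__(self, grace_seconds: float):
        self.grace_seconds = grace_seconds
        self.alarm_time = None
        self.decision = None

    def raise_alarm(self, now: float) -> None:
        """A site-wide failure is suspected: alert operators and start the timer."""
        self.alarm_time = now
        self.decision = Decision.PENDING

    def authorize(self) -> None:
        """Operators verified the outage and approve failover immediately."""
        if self.decision is Decision.PENDING:
            self.decision = Decision.FAILOVER

    def abort(self) -> None:
        """Operators found a false alarm and cancel the failover."""
        if self.decision is Decision.PENDING:
            self.decision = Decision.ABORTED

    def tick(self, now: float) -> Decision:
        """Fail over automatically once the grace period has elapsed."""
        if (self.decision is Decision.PENDING
                and now - self.alarm_time >= self.grace_seconds):
            self.decision = Decision.FAILOVER
        return self.decision


switch = GeoFailoverSwitch(grace_seconds=600)
switch.raise_alarm(now=0)
print(switch.tick(now=300))  # still inside the decision window
print(switch.tick(now=600))  # nobody intervened, so failover proceeds
```

The timer acts as a safety net: a human decision takes precedence when one arrives in time, and automatic failover still happens if nobody responds.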

The Results

Presently, SLES, SLE High Availability Extension and Geo Clustering for SLE High Availability Extension are embedded in Silhouette 7, the latest generally available release. Deployment in the Broadview Networks labs and on the company’s internal and newly commissioned sites is expected to occur in Q1 2015.

The geo clustering solution provides strategic business benefits. First, it goes beyond typical definitions of availability and uptime, addressing extreme disaster scenarios not generally included in these traditional measurements. This is evident in a regional disaster, when Silhouette with geo clustering assures disaster recovery, data integrity and business continuity for customers.

The solution also gives Broadview Networks a competitive advantage. Most telecom providers rely on another company for their networks, so their claims of geo redundancy apply only to the cloud service portion of the solution. Broadview Networks owns its networks and delivers geo redundancy for both those networks and its telephone and other services, more comprehensively and rigorously than competitors.

This advantage drives sales. According to Buckingham, “One reason companies come to us for services is because we offer and manage all pieces of the puzzle.” He adds, “We couldn’t have done this without SUSE. It speaks positively to our partnership with SUSE and their responsiveness. We would not have this solution today without SUSE and our strong relationship.”