Enterprise data storage: complex, expensive and mission critical
Since former Meta Group analyst Doug Laney (now at Gartner) famously defined the three ‘V’s of big data in 2001, we’ve all been seeing increases in the volume of data stored. Analyst firms might make varying projections of total market growth and value, but they are all agreed on one thing: growth is ‘exponential’ – it doesn’t grow in annual percentage terms, but by orders of magnitude, making storage one of the biggest drains on the enterprise IT budget.
The technology itself is constantly changing, with new features and benefits arriving on an ongoing basis. In the past few years we’ve seen capacity improvements from de-duplication, huge increases in IOPS from the introduction of flash, and hybrid arrays that combine disk and flash. Enterprises have gone to great lengths to build the optimum environment, adopting best of breed technology, tiering data according to access rates and improving disaster recovery capability, consistently adding to the infrastructure, array on array, project on project, year after year. The result is a highly complex, multi-vendor environment crying out for consolidation and the resulting simplicity: few storage architects would choose their existing setup given the chance to build from scratch.
The cost of disk may have plummeted, and flash may be following suit, but that doesn’t mean architects are finding it easy to keep up with rising costs. Far from it – complexity continues to increase, and the sums of money involved in keeping pace with rising data volumes and requirements provoke difficult conversations with the CIO and the CFO: today’s architects are under sustained pressure to find ways of reducing cost without compromising on capability. Meanwhile data pools are rapidly becoming data lakes.
To successfully cope with the escalating requirements for those data lakes, to allow them to be cleaned of pollution and enable data scientists to provide the management board with the 360 degree view of the business they demand from analytics, administrators need to deliver high availability, high performance storage, consolidate and simplify it to reduce the management headache, and find ways to add to the storage, all without breaking the bank – no small challenge.
In theory, the best way to achieve these objectives is to do away with expensive proprietary hardware and software in favour of industry standard commodity hardware running free open source software which works across standard infrastructure – including cabling and switches. In practice, fears about the maturity, stability and capability of open source, software-defined storage hold IT teams back from realising the cost savings they need. In this blog, we will examine why that should no longer be the case, illustrate typical use cases, and provide a look into the future roadmap of SUSE Enterprise Storage to explain how – as so often happens in the technology market – the rules of the game have changed, and changed in favour of the storage buyer and away from vendors.
Perfecting the SAN
Consolidating storage – often the first step towards cost control – can be difficult and disruptive, particularly in mission critical environments with complex dependencies, different and incompatible proprietary software and variances in the media itself. But with volume growth being what it is, ‘if it ain’t broke don’t fix it’ is ultimately a losing strategy, because if it ain’t broke now, it soon will be. It’s small wonder there’s such a large industry in providing the IT team with guidance and support, and help with troubleshooting when, as any admin can tell you, things don’t go quite as planned.
The challenge lies in bringing together multiple SANs so that capability can be pooled between different applications and storage media, right-sizing provisioning can be more easily achieved, and cost more readily controlled. Done right, consolidation means you can add capacity without increasing complexity – like pouring more water into a single pool instead of topping up hundreds of individual buckets. It follows that getting your platform choice right is hugely important.
Making the right platform choice
Storage platforms are typically compared on the basis of a number of capabilities, taking into consideration different requirements. Here are Gartner’s comparison criteria for software defined storage – all 26 of them.
| Deployed as bare metal on Windows Server | Microsoft Hyper-V | Clones |
| Deployed as a Virtual Machine | KVM | Integrated Encryption |
| Scale-out Architecture | Global Namespace | Synchronous Replication |
| Scale-up Architecture | Compression | Policy Based Management |
| SMB Protocol | Deduplication | Provisioned at the VM level |
| NFS Protocol | Auto-Tiering | Managed at the VM level |
| iSCSI | Thin Provisioning | Provisioned at the LUN level |
| Fibre Channel | Space Reclamation | Provisioned at the Volume level |
Relatively few platforms tick every box (at least not without an enormous accompanying price tag), but a few are particularly important in the enterprise – the top four being:
- Manageability – ease of installation, centralised management, monitoring and reporting.
- Interoperability – unified block/file/object, heterogeneous OS (fabric and native).
- Efficiency – cache tiering, deduplication/compression, hierarchical storage management, thin provisioning, and erasure coding.
- Availability – back-up/archive, continuous data protection, remote replication.
SUSE Enterprise Storage is the first open source software defined storage solution to meet all four critical enterprise requirements with robust, mature, and hardened software.
Your choice: iSCSI vs Fibre Channel
There are two main approaches to SAN – dedicated Fibre Channel (FC) and iSCSI. FC is a highly stable and mature technology, pre-dating iSCSI by some nine years, and is usually considered the ‘gold standard’ in the enterprise. In FC, storage arrays are connected to application servers via dedicated fibre running to dedicated fibre switches, which in turn link to fibre HBAs (Host Bus Adaptors) on the target array, up to a maximum speed of 32Gb/s. The key word here is ‘dedicated’: because the links aren’t shared with any other network traffic, and the switches route intelligently, it is difficult to saturate the links with enough traffic to impair performance. This makes for very low latency and arguably more bandwidth than is actually required, certainly for spinning disk, though not, perhaps, for flash – at least where the fabric is concerned (there’s plenty of space for performance bottlenecks elsewhere).
iSCSI came onto the market shortly after the turn of the century, exploiting the opportunity raised by the cost and complexity of Fibre Channel: Fibre Channel switches and HBAs are not cheap, and the dedicated fabric requires dedicated skills – adding thousands to the cost of purchase. In contrast, iSCSI uses Ethernet as its communications fabric. This means iSCSI is cheaper in terms of headcount, as no special skills are needed, and commodity switches and cabling can be used in its deployment, thereby generating major cost savings. And on a typical 10Gb/s network, the difference in throughput can be negligible: it’s actually tough to saturate a 1Gb/s link, let alone a 32Gb/s link. So, unless you’re actually in the 1% of organisations that really need the performance of fibre across the board, you’re probably best off saving it for high performance requirements and standardising everywhere else on iSCSI.
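To put the link speeds in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the speeds quoted above are nominal rates in gigabits per second (as is usual for Ethernet and Fibre Channel) and ignores protocol overhead, so real-world figures will be somewhat slower. The function name and the one terabyte example are purely illustrative.

```python
def transfer_seconds(data_gigabytes: float, link_gbit_per_s: float) -> float:
    """Idealised time to move data over a link, ignoring protocol overhead."""
    data_gigabits = data_gigabytes * 8  # 1 byte = 8 bits
    return data_gigabits / link_gbit_per_s


if __name__ == "__main__":
    # Nominal link speeds mentioned in the text: 1, 10 and 32 gigabits per second.
    for speed in (1, 10, 32):
        minutes = transfer_seconds(1000, speed) / 60  # 1000 GB = 1 TB (decimal)
        print(f"{speed:>2} Gb/s: {minutes:.1f} minutes per terabyte")
```

Even on this idealised basis, a workload has to sustain well over 100 megabytes per second, continuously, to fill a single 1Gb/s link, which is why the choice of fabric often matters less than the bottlenecks elsewhere.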
SUSE was first to bring iSCSI to Ceph – the leading open source software defined storage solution
Ceph is the leading open source software defined storage platform. Ceph stores data on a single distributed computer cluster, providing interfaces for object, block and file level storage. Ceph can be completely distributed without a single point of failure, is scalable to the exabyte level, and, as an open source platform, is freely available. Ceph replicates data, providing fault tolerance, and uses commodity hardware (even your existing hardware, no matter who supplied it). The design provides for storage that is self-healing and self-managing, thereby reducing administration costs. Compared with proprietary hardware and software solutions, Ceph can deliver savings in the region of 40-60%.
However, until SUSE innovated with iSCSI support, Ceph had a really big problem: a lack of support for heterogeneous operating systems and versions, effectively limiting it to Linux and killing its appeal as an approach to consolidation. Big savings on a small part of your SAN infrastructure that cannot be applied to the majority are a bit like 50% off one wheel of your new car instead of 50% off the car itself.
That picture has changed in the last year: SUSE was first to market with a Ceph version delivering robust support of the universally accepted iSCSI standard, which translates into near universal cost savings and amplifies the appeal of open source software defined storage by the same order of magnitude that is powering data storage growth. SUSE Enterprise Storage is a truly game-changing technology that can deliver vastly more affordable storage in the long term – regardless of data growth. Commodity cabling and routing + commodity infrastructure skills + commodity servers and media + open source software = massive savings that proprietary vendors cannot compete with.
Version 3 of SUSE Enterprise Storage fully supports block as well as object storage, with the file system (CephFS) supported as a Tech Preview (until the next release). With heterogeneous OS access in place, the cost savings it delivers are broadly applicable in the data centre. Heterogeneous support was achieved by building on and extending the long established, robust, hardened and enterprise proven Linux-IO Target, delivering capability you can trust.
SUSE Enterprise Storage Version 3 iSCSI support delivers robust software defined storage that is VMware aware, and works across UNIX variants such as IBM AIX, as well as Microsoft Windows and Linux.
Open source offers other advantages for the hard pressed IT team too – not least of which is the easy portability of data across the hybrid cloud and into different off-premise providers, a critical requirement for avoiding expensive vendor lock-in. Storage administrators attracted to the public cloud need to work out the cost of extracting data as well as storing it to avoid unpleasant surprises – especially when it comes to the cold store.
Why your storage roadmap should – arguably must – include open source
SUSE believes the future of big data lies in open source. So should you. Why? Because open source projects dominate the big data landscape: the majority of big data analytics projects today are built on Hadoop, and the majority of real-time data analytics projects in the future will be based on Apache Spark – both of which are open source projects. The number of active users of Spark has doubled in the past year, and the number of active developers has risen six-fold in the past two years – making Spark the single largest Apache project. Investment is in place from IBM and from venture capital groups like Andreessen Horowitz and NEA, and funding is abundant: open source is already a de facto standard in big data analytics.
Open source projects share many developers motivated by the same working ethos. They readily borrow code from each other, they meet at the same conventions, and they share staff and information from the highest to the lowest level: open source development road maps don’t happen in isolation.
The convergence of storage and analytics roadmaps is inevitable: one roadmap for the data lake, for analytics and for storage. At the very least you are going to need to build your capability, or you are at risk of every CEO and shareholder’s worst nightmare: a competitor who is making better decisions than you, more quickly, on more accurate data, who is more agile and who gets products to market faster than you, products that customers want more, and which are genuinely better than yours. In this digital era with its fast changing business models, company boards dream of being the next Netflix, and fear being the next Blockbuster Video in equal measure.
10 approaches to generating large scale cost savings on 6 enterprise storage use-cases
Use SUSE Enterprise Storage 3 to:
- Build storage on commodity x86 servers to create multiple nodes over Ethernet.
- Use RESTful APIs (Swift or Amazon S3) to move data between resources, on and off premise.
- Achieve similar or even lower total cost of storage to Amazon Glacier – whilst avoiding heavy charges for moving data at speed or returning it to on premise in the event of need.
- Keep your storage on premise at the same or lower cost than the cheapest cloud services (developments in analytics have a tendency to render seemingly useless data useful and you never know when you will need it).
- Use erasure coding for redundancy, including in high capacity enclosures, so limiting your storage footprint in terms of square feet in the data centre.
- Use with your existing archiving software (e.g. Veritas NetBackup or Commvault Simpana etc.) to build block or object storage.
- Replicate copies for redundancy across standard network fabric.
- Use as a VERY LARGE disk array and achieve heterogeneous network access across VMware, Windows, Linux etc.
- Use for your Windows file store and general purpose block storage.
- Perform rapid disk to disk backup.
- Automatically tier cached data according to use frequency.
Cut the cost of:
#1. Object storage
#3. Bulk block storage
#4. Data backup
#5. VMware data repository or back up
#6. Rich media and video/audio storage
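The footprint benefit of erasure coding over plain replication, mentioned in the list above, comes down to simple arithmetic. The Python sketch below compares the raw capacity needed for the same usable capacity under each scheme. The function names, the three-copy replication and the 4+2 erasure coding profile are illustrative assumptions, not SUSE recommendations or defaults.

```python
def raw_capacity_replicated(usable_tb: float, copies: int) -> float:
    """Raw capacity needed when every object is stored `copies` times."""
    return usable_tb * copies


def raw_capacity_erasure_coded(usable_tb: float, k: int, m: int) -> float:
    """Raw capacity needed when data is split into k data chunks plus m coding chunks."""
    return usable_tb * (k + m) / k


if __name__ == "__main__":
    usable = 100.0  # usable terabytes required (illustrative figure)
    print("3x replication:", raw_capacity_replicated(usable, 3), "TB raw")
    print("4+2 erasure coding:", raw_capacity_erasure_coded(usable, 4, 2), "TB raw")
```

Under these assumptions both schemes survive the loss of two chunks or copies, but erasure coding needs half the raw disk. The trade-off is extra CPU and network work on writes and recovery, which is why it tends to suit bulk and archive data rather than latency sensitive workloads.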
What’s holding you up? Two questions you should ask yourself:
The arrival of iSCSI on Ceph was a game-changing technology development. For the first time ever, commodity hardware and open source software have combined to generate huge cost savings and flexibility with a road map that protects you against dramatic rises in the cost of storage and sets your organisation in the right direction for the big data architecture future. No longer limited to Linux powered web apps, open source software defined storage is spreading to the entire enterprise, and its effect is revolutionary.
With the growth in data volumes accelerating, and open source becoming the dominant force in big data analytics, ask yourself this:
- How long can you continue to pay for proprietary hardware and software?
- How quickly will the roadmaps for open source big data analytics and open source storage converge?
Open source, software defined storage should come under the category of ‘when’, not ‘if’, and SUSE can put you on the path to cost savings now.