The popularity of Big Data lies in its broad definition: data sets of high volume, velocity, and variety that are difficult to manage and extract value from. Clearly, most software development companies can see themselves as facing Big Data challenges and opportunities, today or in the future. This is not a new issue, yet it has taken on a new quality as it has intensified in recent years. Cheaper storage, omnipresent data collection, and the availability of third-party data have overtaken the capabilities of traditional data warehouses and processing solutions. Businesses investigating Big Data frequently recognize that they lack the capacity to process and store it adequately. This shows up either as an inability to use existing big data sets to the fullest or as an inability to expand their current data strategy with additional data.
Three strata of cloud computing as a service
Big Data as a Service is one of countless as-a-Service offerings. Three of them are the most noteworthy and allow us to classify any other service. Infrastructure as a Service (IaaS), e.g. virtual machines, networks, storage devices, or servers, is the most basic building block and includes everything (real or virtual) you would expect inside a data center. Above this level sits Platform as a Service (PaaS), which includes frequently used software like web and database servers, or Hadoop and its ecosystem. At the next level is Software as a Service (SaaS), which covers services that are still generic but more user-facing, like email, CMS, or CRM. Finally, above SaaS there are usually domain-specific or business-specific applications.
Hadoop, or an alternative distributed compute and storage technology at the platform level, naturally forms the core of a BDaaS. Consequently, any BDaaS solution includes the PaaS layer and may also include SaaS and/or IaaS. This leaves us with four possible groupings for BDaaS:
- PaaS only – focuses on Hadoop
- IaaS and PaaS – focuses on Hadoop and optimizes infrastructure for performance
- PaaS and SaaS – focuses on Hadoop, productivity features, and exchangeable infrastructure
- IaaS and PaaS and SaaS – focuses on total vertical integration for features and performance
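The four groupings above can be written down as a small lookup, which makes the rule "PaaS is always present" explicit. This is only an illustrative sketch: the `Layer` enum, the `classify` function, and the use of the labels "core", "performance", "feature", and "integrated" as return values are assumptions made for this example, borrowing the model names used later in this article.

```python
from enum import Enum


class Layer(Enum):
    """The three service layers discussed above (hypothetical helper type)."""
    IAAS = "IaaS"
    PAAS = "PaaS"
    SAAS = "SaaS"


# Each valid combination of layers maps to one BDaaS grouping.
# The labels are the model names used later in the article.
GROUPINGS = {
    frozenset({Layer.PAAS}): "core",
    frozenset({Layer.IAAS, Layer.PAAS}): "performance",
    frozenset({Layer.PAAS, Layer.SAAS}): "feature",
    frozenset({Layer.IAAS, Layer.PAAS, Layer.SAAS}): "integrated",
}


def classify(layers):
    """Return the BDaaS grouping for a set of layers.

    Raises ValueError if PaaS is missing, because the platform layer
    (Hadoop or an alternative) is the core of any BDaaS.
    """
    key = frozenset(layers)
    if Layer.PAAS not in key:
        raise ValueError("a BDaaS offering must include the PaaS layer")
    return GROUPINGS[key]


if __name__ == "__main__":
    for combo in GROUPINGS:
        names = " + ".join(sorted(layer.value for layer in combo))
        print(f"{names}: {classify(combo)}")
```

Using a set of layers as the key keeps the order of the layers irrelevant, which matches how the groupings are described in the list.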
Big Data as a Service Models
Big Data as a Service (BDaaS) offerings in the cloud can be categorized into one of four types: core, performance, feature, and integrated.
The core BDaaS implements a minimal platform, e.g. Hadoop with YARN and HDFS and a few popular services like Hive. Amazon Web Services’ Elastic MapReduce (EMR) is the most prominent core BDaaS and represents this model. EMR is one of a myriad of services in Amazon’s offering, and it integrates well with many of the others, such as the NoSQL store DynamoDB or S3 storage. Users can combine them to build anything from data pipelines to a complete company infrastructure around the EMR service. However, because the composability of its services is Amazon’s strength, the core BDaaS offering is meant to stay generic so that it can interact with the other services.
One way of vertical integration for BDaaS is in the downward direction, to include an optimized infrastructure. This makes it possible to remove some of the overhead of virtualization and to build hardware servers and networks specifically with Hadoop’s performance needs in mind.
This model serves businesses whose understanding and use of Hadoop are growing but that are held back by scale and complexity. Such a company can outsource its infrastructure and platform needs, and their management around Hadoop, to an infrastructure service provider. The business can then focus on putting Hadoop to work and on the stack from SaaS upwards. A package pricing strategy based on storage and compute usage aims to remove the common problem of choosing between performance and cost optimization, and to give predictable, fixed costs.
The other way of vertical integration for BDaaS is in the upward direction, to include features beyond the common Hadoop ecosystem offerings. The feature-driven BDaaS focuses on productivity and abstraction to get users started with Big Data quickly. A feature BDaaS company’s services include web and programming interfaces and database adapters, which push technologies like Hadoop into the background, so the service reaches into the SaaS layer. In fact, Hadoop clusters are started, scaled, and even stopped transparently according to the load.
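The idea of starting, scaling, and stopping a cluster according to load can be sketched as a simple sizing rule. This is a minimal illustration and not how any particular provider works: the function `desired_nodes`, its parameters, and the default values are all hypothetical.

```python
import math


def desired_nodes(pending_tasks, tasks_per_node=10, max_nodes=50):
    """Return how many cluster nodes a given load calls for.

    All parameters are hypothetical and chosen for illustration only:
    - pending_tasks: number of tasks currently waiting or running
    - tasks_per_node: how many tasks one node is assumed to handle
    - max_nodes: upper bound on the cluster size
    """
    if pending_tasks <= 0:
        # No load: the cluster can be stopped entirely.
        return 0
    # Scale with the load, rounding up, but never beyond the cap.
    return min(max_nodes, math.ceil(pending_tasks / tasks_per_node))


if __name__ == "__main__":
    for load in (0, 7, 25, 10_000):
        print(f"{load} pending tasks: {desired_nodes(load)} nodes")
```

A real service would also need to account for start-up time and for data stored on the nodes, both of which this sketch ignores.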
The feature model uses IaaS to provide computing and storage, with one noteworthy difference. Being independent of any single cloud provider allows a feature BDaaS to treat computing and storage as a fully scalable and, more importantly, interchangeable commodity. Interestingly, the compute and storage costs from IaaS are passed through on a pay-as-you-go basis, which makes this model ideal for highly variable, volatile, or exploratory workloads.
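The difference between pass-through pay-as-you-go pricing and a fixed package can be made concrete with a small break-even calculation. The sketch below is an illustration under stated assumptions: the function names are hypothetical, pricing is reduced to a single per-node-hour rate versus a flat monthly fee, and the numbers in the usage section are invented for the example rather than taken from any provider.

```python
def pay_as_you_go_cost(node_hours, rate_per_node_hour):
    """Cost when usage is billed per node-hour (hypothetical flat rate)."""
    return node_hours * rate_per_node_hour


def break_even_node_hours(monthly_fee, rate_per_node_hour):
    """Usage at which a fixed package and pay-as-you-go cost the same."""
    return monthly_fee / rate_per_node_hour


def cheaper_option(node_hours, rate_per_node_hour, monthly_fee):
    """Name the cheaper pricing model for a given monthly usage."""
    if pay_as_you_go_cost(node_hours, rate_per_node_hour) < monthly_fee:
        return "pay-as-you-go"
    return "package"


if __name__ == "__main__":
    # Invented example values, for illustration only.
    rate, fee = 0.5, 1000.0
    print("break-even node-hours:", break_even_node_hours(fee, rate))
    for hours in (500, 5000):
        print(hours, "node-hours:", cheaper_option(hours, rate, fee))
```

Under these assumptions, a workload that stays well below the break-even point in most months favors pay-as-you-go, while steady heavy usage favors the fixed package.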
Finally, another possibility is a fully vertically integrated BDaaS that combines the performance and feature benefits of the previous two models. This is an interesting approach, since it could result in the ideal BDaaS: one that is productive, supports both business users and experts, and provides top performance. Both the feature and the performance BDaaS are at an early stage, and the integrated BDaaS could in practice turn out to be a good solution to this difficult problem.
As Big Data grows as a topic, business and service models are evolving, and we can see the similarities and differences between the three competing types of Big Data as a Service. The core BDaaS has been around for quite some time and is in use by many software development companies, especially as part of a bigger architecture or for irregular workloads. It has settled as a model that supports the provider’s broader service architecture.
The feature BDaaS still has to prove that it can compete on performance, although commoditization and service-level generalization mean that, in the end, the success of this model depends less on squeezing the most performance out of comparable hardware than on what it delivers dollar for dollar. The performance BDaaS will face demands from companies that are less and less willing to take on the complex challenge of building their own data architecture and the associated SaaS layer, and that increasingly want to focus on their value-adding, domain-specific processes.