What are the different types of Big Data as a Service (BDaaS)?


The popularity of Big Data lies in its wide-ranging definition: high-volume, high-velocity, and high-variety data sets that are difficult to manage and extract value from. Clearly, most software development companies can identify with facing Big Data challenges and opportunities, today or in the future. The problem is not new, but it has taken on a new quality as it has intensified in recent years. Cheaper storage, ubiquitous data collection, and the availability of third-party data have outgrown the capabilities of traditional data warehouses and processing solutions. Businesses investigating Big Data frequently find that they lack the capacity to process and store it adequately. This shows up either as an inability to make full use of existing big data sets or as an inability to extend their current data strategy with additional data.

Three strata of cloud computing as a service

Big Data as a Service sits among countless as-a-Service offerings. Three of them are the most noteworthy, and they let us classify any other service. Infrastructure as a Service (IaaS), e.g. virtual machines, networks, storage devices, or servers, is the most basic building block and includes everything (physical or virtual) you would expect inside a data center. Above this level sits Platform as a Service (PaaS), which includes commonly used software such as web and database servers, or Hadoop and its ecosystem. At the next level is Software as a Service (SaaS): services that are still generic but more user-facing, such as email, CMS, or CRM. Finally, beyond SaaS are usually domain- or business-specific applications.

Hadoop, or an alternative distributed compute and storage technology at the platform level, naturally forms the core of a BDaaS. Consequently, any BDaaS solution includes the PaaS layer and may also include the IaaS and/or SaaS layers. This leaves us with four possible groupings for BDaaS:

  • PaaS only – focuses on Hadoop
  • IaaS and PaaS – focuses on Hadoop and optimizes infrastructure for performance
  • PaaS and SaaS – focuses on Hadoop and productivity features, with exchangeable infrastructure
  • IaaS and PaaS and SaaS – focuses on total vertical integration for features and performance
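The count of four follows from simple combinatorics: PaaS is mandatory, while IaaS and SaaS are each optional, giving 2 × 2 = 4 groupings. A minimal Python sketch that enumerates them; the layer labels are plain strings chosen for illustration:

```python
from itertools import combinations

REQUIRED_LAYER = "PaaS"             # Hadoop (or a substitute) lives here
OPTIONAL_LAYERS = ("IaaS", "SaaS")  # each may or may not be included
STACK_ORDER = ("IaaS", "PaaS", "SaaS")  # bottom to top of the stack

def bdaas_groupings():
    """Return every layer combination that contains the PaaS layer."""
    groupings = []
    for size in range(len(OPTIONAL_LAYERS) + 1):
        for extra in combinations(OPTIONAL_LAYERS, size):
            layers = {REQUIRED_LAYER, *extra}
            # Order bottom-up so the output reads like the list above.
            groupings.append(tuple(layer for layer in STACK_ORDER if layer in layers))
    return groupings

if __name__ == "__main__":
    for grouping in bdaas_groupings():
        print(" and ".join(grouping))
```

Run as a script, it prints the four groupings in the same order as the list: PaaS, IaaS and PaaS, PaaS and SaaS, IaaS and PaaS and SaaS.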

Big Data as a Service Models


Big Data as a Service (BDaaS) offerings in the cloud can be categorized into one of four types:

Core BDaaS

The core BDaaS implements a minimal platform, e.g. Hadoop with YARN and HDFS, plus a few popular services such as Hive. Amazon Web Services’ Elastic MapReduce (EMR) is the most prominent core BDaaS and represents this model. EMR is one of many services in Amazon’s offering, and it integrates well with many of the others, such as the NoSQL store DynamoDB or S3 storage. Users can combine them to build anything from data pipelines to complete company-wide infrastructures around the EMR service. However, because composability of services is Amazon’s strength, the core BDaaS offering is meant to stay generic so that it can interact with those other services.
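To make the core model concrete, here is a minimal sketch of a request for such a cluster with the AWS SDK for Python (boto3). The function name `emr_core_cluster_request`, the release label, instance type, instance count, and log path are illustrative assumptions. The function only builds keyword arguments for boto3's `run_job_flow` call, so nothing is launched until they are passed to a real EMR client with valid credentials, which incurs charges.

```python
def emr_core_cluster_request(
    name: str,
    log_uri: str,
    release_label: str = "emr-6.15.0",  # assumption: choose a current EMR release
    instance_type: str = "m5.xlarge",   # assumption: example instance type
    instance_count: int = 3,            # assumption: one master + two core nodes
) -> dict:
    """Build keyword arguments for boto3's EMR run_job_flow call.

    The cluster is a minimal "core BDaaS": Hadoop (YARN + HDFS) plus Hive.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "LogUri": log_uri,
        "Applications": [{"Name": "Hadoop"}, {"Name": "Hive"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster alive after its steps finish so it can be reused.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default IAM roles, as created by `aws emr create-default-roles`.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With boto3 installed and credentials configured, `boto3.client("emr").run_job_flow(**emr_core_cluster_request("demo", "s3://example-bucket/logs/"))` would submit the request; the bucket name is hypothetical. Keeping the request generic like this is what lets the cluster slot into a larger architecture built from other AWS services.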

Performance BDaaS

One way to vertically integrate a BDaaS is downward, to include an optimized infrastructure. This makes it possible to avoid some of the overhead of virtualization and to build servers and networks specifically around Hadoop’s performance needs.

This model serves growing businesses that understand and work with Hadoop but are held back by scale and complexity. Such a company can outsource its infrastructure and platform needs, including the management of Hadoop, to an infrastructure service provider. The business can then focus on putting Hadoop to work and on the stack from SaaS upwards. A package pricing strategy based on storage and compute usage aims to remove the common dilemma of choosing between performance and cost optimization, and to give predictable, fixed costs.

Feature BDaaS

The other way to vertically integrate a BDaaS is upward, to include features beyond the common Hadoop ecosystem offerings. The feature-driven BDaaS focuses on productivity and abstraction to get users started with Big Data quickly. A feature BDaaS company’s services include web and programming interfaces and database adapters that push technologies like Hadoop into the background, so the service reaches into the SaaS layer. In fact, Hadoop clusters are started, scaled, and even stopped transparently according to the load.

The feature approach uses IaaS for compute and storage, with a notable difference: being independent of any single cloud provider allows a feature BDaaS to treat compute and storage as a fully scalable and, more importantly, interchangeable commodity. Moreover, the IaaS compute and storage costs are passed through on a pay-as-you-go basis, which makes this model ideal for highly variable, volatile, or exploratory workloads.
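The transparent start, scale, and stop behavior described for the feature BDaaS can be sketched as a simple load-based policy. Everything here is hypothetical: `ScalingPolicy`, `scaling_action`, the per-node throughput, and the limits are illustrative assumptions, not any provider's actual API. The cluster is sized to the pending work, capped for cost, and shut down entirely when idle, which is what makes pass-through pay-as-you-go pricing attractive for volatile workloads.

```python
import math
from dataclasses import dataclass

@dataclass
class ScalingPolicy:
    """Hypothetical sizing rule a feature BDaaS might apply on the user's behalf."""
    tasks_per_node: int = 10      # assumed throughput of one worker node
    max_nodes: int = 50           # assumed cost ceiling
    min_nodes_when_busy: int = 2  # assumed smallest useful cluster

    def desired_nodes(self, pending_tasks: int) -> int:
        """Cluster size for the current load; 0 means the cluster should stop."""
        if pending_tasks <= 0:
            return 0  # no work left: release all compute
        needed = math.ceil(pending_tasks / self.tasks_per_node)
        return max(self.min_nodes_when_busy, min(needed, self.max_nodes))

def scaling_action(current_nodes: int, desired: int) -> str:
    """Translate a size change into a start, scale, or stop action."""
    if current_nodes == 0 and desired > 0:
        return "start"
    if current_nodes > 0 and desired == 0:
        return "stop"
    if desired > current_nodes:
        return "scale out"
    if desired < current_nodes:
        return "scale in"
    return "no change"
```

For example, with the default policy, 95 pending tasks call for 10 nodes, a load of 10,000 tasks is capped at 50 nodes, and an empty queue returns 0, which `scaling_action` reports as "stop" for a running cluster.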

Integrated BDaaS

Finally, another possibility is a fully vertically integrated BDaaS that combines the performance and feature benefits of the previous two types. This is an interesting approach because it could result in the ideal BDaaS: productive, supportive of both business users and experts, and delivering top performance. Both feature and performance BDaaS are still at an early stage, and in practice the integrated BDaaS could turn out to be a good solution to this difficult problem.


As Big Data grows as a topic, business and service models are evolving, and we can see the similarities and differences between the three competing types of Big Data as a Service. The core BDaaS has been around for quite some time and is used by many software development companies, especially as part of a bigger architecture or for uneven workloads. It has settled into a model that supports the provider’s broader service architecture.

The feature BDaaS still has to prove that it is competitive on performance, although commoditization and service-level abstraction mean that, at the end of the day, this model wins not by squeezing the most performance out of comparable hardware but on a dollar-for-dollar basis. The performance BDaaS will face demand from companies that are less and less willing to take on the complex challenge of building their own data architecture and associated SaaS layer, and that increasingly want to focus on their value-adding, domain-specific processes.
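The dollar-for-dollar argument can be made concrete: what matters to the customer is the cost of finishing a job, not raw speed. A sketch with invented runtimes and hourly prices (all figures hypothetical) shows a slower offering winning on cost per job:

```python
from dataclasses import dataclass

@dataclass
class Offering:
    name: str
    job_hours: float       # hypothetical runtime of one reference job
    price_per_hour: float  # hypothetical price of the resources that job uses

    def cost_per_job(self) -> float:
        return self.job_hours * self.price_per_hour

# Invented figures: the performance BDaaS is faster but pricier per hour.
offers = [
    Offering("performance", job_hours=1.0, price_per_hour=12.0),
    Offering("feature", job_hours=1.5, price_per_hour=6.0),
]

fastest = min(offers, key=lambda o: o.job_hours)
cheapest = min(offers, key=Offering.cost_per_job)

if __name__ == "__main__":
    for o in offers:
        print(f"{o.name}: {o.cost_per_job():.2f} per job")
    print(f"fastest: {fastest.name}, cheapest per job: {cheapest.name}")
```

Here the performance offering finishes first, but the feature offering completes the same job for 9.00 instead of 12.00, which is the sense in which it can win without matching raw performance.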


Emerging Technologies and Opportunities for Big Data Applications

Introduction


Software development companies are trying to keep up with the emerging trends in Big Data. Big Data has also earned a well-recognized place with the UK Government, which has categorized it as one of the ‘Eight Great Technologies’ that will drive future growth. COMMUNITY (2014) reports on the increase in the amount of data being produced and on the importance of new types of computing power for reaping the economic value of that data.

Big Data

According to COMMUNITY (2014), the following is a working definition of Big Data:
“Big Data refers to huge volumes of data with a high level of complexity as well as the analytical methods applied to them. This requires more cutting-edge techniques and technologies in order to develop meaningful information and insights in real time”.

Analytics is considered an inherent part of the new techniques and technologies for Big Data. Its scope covers three roles:

a) Descriptive analytics - to understand what is happening in the world, using visualization techniques, some modeling and regression.

b) Predictive analytics - to predict what will happen, using forecasting.

c) Prescriptive analytics - to work out what we want, using simulation, optimization, scenario testing and Multi-Criteria Decision Analysis.
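A toy end-to-end example of the three roles, using invented monthly sales figures: a least-squares trend line for the descriptive step, extrapolation of that line for the predictive step, and scenario testing over candidate stock levels for the prescriptive step. The prices, costs, and scenarios are assumptions, and plain scenario testing here stands in for the richer optimization and Multi-Criteria Decision Analysis methods named above.

```python
from statistics import mean

# Hypothetical monthly sales (units), invented for illustration.
sales = [100, 110, 120, 130, 140]

def fit_line(ys):
    """Ordinary least-squares fit of y = a + b*x, with x = 0, 1, 2, ..."""
    xs = list(range(len(ys)))
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - b * x_bar, b

# a) Descriptive: what is happening? Summarize the level and the trend.
average = mean(sales)
intercept, slope = fit_line(sales)

# b) Predictive: what will happen? Extrapolate the trend one month ahead.
forecast = intercept + slope * len(sales)

# c) Prescriptive: what should we do? Score candidate stock levels against
#    demand scenarios around the forecast and keep the best average profit.
def profit(stock, demand, unit_price=5.0, unit_cost=2.0):  # assumed economics
    sold = min(stock, demand)
    return sold * unit_price - stock * unit_cost

scenarios = [forecast - 10, forecast, forecast + 10]
candidates = [140, 150, 160]
best_stock = max(candidates, key=lambda s: mean(profit(s, d) for d in scenarios))

if __name__ == "__main__":
    print(f"average={average}, slope={slope}, forecast={forecast}, stock={best_stock}")
```

With these numbers the average is 120 units, the trend is +10 units per month, the forecast for the next month is 150, and stocking 150 units gives the best average profit across the three scenarios.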

Trends in Big Data Analytics

Big data technologies and practices are moving quickly. According to Mitchell (2013), here is what one should know to stay ahead of the game:

a) Big data analytics in the cloud

Cloud-based analytics gives users access to highly scalable computing and storage resources over the Internet. Companies can obtain server capacity as needed and expand it rapidly to the enormous scale required to process big datasets and run complicated mathematical models. Cloud computing lowers the cost of data storage because resources are shared among many users, who pay only for the capacity they actually use. Companies can access this capability much more quickly, without the expense and time needed to set up their own systems, and they do not have to buy enough capacity to cover peak usage.
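The last point, not having to buy capacity for peak usage, comes down to simple arithmetic. A sketch with a made-up daily load profile and an assumed per-server-hour price; the price is deliberately identical in both models, since real cloud and on-premises unit costs differ, so the example isolates only the provisioning effect:

```python
# Hypothetical hourly demand (servers needed) over one day: light load for
# 20 hours, then a 4-hour burst for a nightly batch job.
hourly_demand = [2] * 20 + [40] * 4

# Assumed price per server-hour, identical in both models.
PRICE_PER_SERVER_HOUR = 0.50

def fixed_capacity_cost(demand, price):
    """Own enough servers for the peak hour and pay for all of them every hour."""
    return max(demand) * len(demand) * price

def pay_per_use_cost(demand, price):
    """Pay only for the servers actually used in each hour."""
    return sum(demand) * price

fixed = fixed_capacity_cost(hourly_demand, PRICE_PER_SERVER_HOUR)
elastic = pay_per_use_cost(hourly_demand, PRICE_PER_SERVER_HOUR)

if __name__ == "__main__":
    print(f"fixed: {fixed:.2f}, pay-per-use: {elastic:.2f}")
```

With these assumptions the peak-sized setup costs 480.00 for the day against 100.00 for pay-per-use; the gap grows as the workload becomes burstier.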

b) Hadoop: The new enterprise data operating system

Hadoop is by far the most popular implementation of MapReduce, and it is a completely open-source platform for handling Big Data. Because it is flexible, it works with multiple data sources: it can aggregate several sources of data for large-scale processing, or read data from a database to run processor-intensive machine learning jobs. It has many diverse applications, but one of the top uses is handling large volumes of constantly changing data, such as web-based or social media data, location-based data from weather or traffic sensors, or machine-to-machine transactional data.
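The MapReduce model that Hadoop implements can be illustrated with the classic word-count example. This single-process Python sketch mimics the map, shuffle, and reduce phases that Hadoop distributes across a cluster; it is not Hadoop's own (Java) API, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))

if __name__ == "__main__":
    print(word_count(["big data big value", "data lakes hold data"]))
```

In Hadoop, many mappers and reducers run these phases in parallel on different nodes, and the framework performs the shuffle over the network; the logic per record stays this simple.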

c) Big data lakes

Traditional database theory dictates that you design the data set before entering any data. A data lake, also known as an enterprise data lake or enterprise data hub, turns that model on its head: data from many sources is loaded into a large repository, such as Hadoop, in its raw form, without designing a data model beforehand. Instead, the lake offers tools for people to analyze the data, along with a high-level definition of what data exists in the lake.
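Loading raw data first and deciding on structure later is often called schema-on-read. A minimal sketch, assuming JSON-lines records held in a Python list stand in for files in the lake; `read_with_schema` and the sample records are hypothetical:

```python
import json

# Raw records land in the lake as-is; no table is designed up front.
# (Hypothetical records from two different sources.)
raw_lake = [
    '{"source": "web", "user": "a", "page": "/home"}',
    '{"source": "sensor", "device": "t-1", "temp_c": 21.5}',
    '{"source": "web", "user": "b", "page": "/pricing"}',
]

def read_with_schema(lines, fields):
    """Schema-on-read: parse raw lines and project them onto the fields the
    current analysis needs, using None where a record lacks a field."""
    for line in lines:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}

# Two analyses impose two different schemas on the same raw data.
page_views = [
    r for r in read_with_schema(raw_lake, ["user", "page"]) if r["page"] is not None
]
temperatures = [
    r["temp_c"] for r in read_with_schema(raw_lake, ["temp_c"]) if r["temp_c"] is not None
]
```

The structure lives in the query (`fields`), not in the stored data, so a new analysis needs no migration of what is already in the lake.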

d) More predictive analytics

Predictive analytics is the branch of data mining concerned with predicting future outcomes and trends. The central element of predictive analytics is the predictor, a variable that can be measured for an individual or other unit to predict future behavior.

With big data, analysts have not only more data to work with, but also the processing power to handle large numbers of records with many attributes.

e) In-memory analytics

In-memory analytics increases the speed, performance, and reliability of querying data. Business Intelligence deployments are typically disk-based; that is, the application queries data stored on physical disks. In contrast, with in-memory analytics, the queries and data reside in the server's random access memory (RAM).

The use of in-memory databases to speed up analytic processing is increasingly popular and highly valuable in the right setting. Many web application development companies use in-memory analytics to achieve greater reliability and performance.
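The difference between disk-based and in-memory querying can be tried with Python's built-in `sqlite3` module, which keeps a whole database in RAM when given the special path `":memory:"`. SQLite is only a stand-in for a BI engine here, and the table, query, and row count are illustrative. Timings vary by machine, and the operating system's file cache can make a small on-disk database nearly as fast, so treat this as showing where the data lives rather than as a benchmark.

```python
import os
import sqlite3
import tempfile
import time

def build_and_query(conn: sqlite3.Connection) -> tuple[int, float]:
    """Load sample rows, then time one aggregate query on the connection."""
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rows = [("north" if i % 2 else "south", float(i % 100)) for i in range(100_000)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    start = time.perf_counter()
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM sales WHERE region = 'north' AND amount > 50"
    ).fetchone()
    return count, time.perf_counter() - start

if __name__ == "__main__":
    # Disk-based: the database lives in a file.
    with tempfile.TemporaryDirectory() as tmp:
        disk = sqlite3.connect(os.path.join(tmp, "sales.db"))
        disk_count, disk_secs = build_and_query(disk)
        disk.close()  # close before the directory is removed
    # In-memory: the same data and query, held entirely in RAM.
    memory = sqlite3.connect(":memory:")
    mem_count, mem_secs = build_and_query(memory)
    memory.close()
    print(f"disk: {disk_count} rows in {disk_secs:.4f}s")
    print(f"memory: {mem_count} rows in {mem_secs:.4f}s")
```

Both runs return the same answer, 25,000 matching rows; only the storage location of the data differs.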

f) More, better NoSQL

Alternatives to traditional SQL-based relational databases, termed NoSQL (short for “Not Only SQL”) databases, are rapidly gaining importance as tools for use in specific kinds of analytic applications.

According to Wikipedia (n.d.), the working definition of NoSQL is as follows:
“A NoSQL, formerly referred to as "non SQL" or "non-relational", database provides a mechanism for storage as well as retrieval of data which is modeled in means other than the tabular relations used in relational databases.”
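The contrast with "tabular relations" can be shown with plain Python structures: the same customer and orders modeled as relational rows joined by a key, and as one nested document fetched by key. The in-memory `document_store` with `put`/`get` is a hypothetical stand-in, not the API of any particular NoSQL product.

```python
import json

# Relational modeling: data split across tables linked by keys and
# reassembled with a join at query time.
customers = [(1, "Asha")]                           # (customer_id, name)
orders = [(101, 1, "laptop"), (102, 1, "monitor")]  # (order_id, customer_id, item)

def orders_for(customer_id):
    """Equivalent of: SELECT item FROM orders WHERE customer_id = ?"""
    return [item for _, cid, item in orders if cid == customer_id]

# Document modeling: one self-contained, nested record per customer,
# stored and fetched by key with no join (hypothetical store).
document_store = {}

def put(key, document):
    # Serialize to JSON, since many document databases store JSON-like data.
    document_store[key] = json.dumps(document)

def get(key):
    return json.loads(document_store[key])

put("customer:1", {
    "name": "Asha",
    "orders": [{"id": 101, "item": "laptop"}, {"id": 102, "item": "monitor"}],
})
```

Both models answer "what did customer 1 order?", but the document model trades the join for data that is already grouped the way the application reads it.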


While the subject of Big Data is broad and encompasses many trends and new technology developments, software development companies in India are keeping pace with the global market. It is essential for organizations to handle Big Data cost-effectively. Emerging technologies for Big Data applications include Hadoop, MapReduce, column-oriented databases, and schema-less (NoSQL) databases, to name a few.


References
Mitchell, R. L. (2013, October 23). 8 big trends in big data analytics. Retrieved April 25, 2016.
Rouse, M. (2012, June). Cloud ERP. Retrieved April 20, 2016.
Wikipedia. (n.d.). NoSQL. Retrieved from https://en.wikipedia.org/wiki/NoSQL