Big Data Analytics: Don't Miss the Woods for the Trees

Posted by Mahesh Yellai on 17th Aug, 2016

Harnessing Big Data for data-driven decision making is a tough challenge for any business, because of Big Data's 3 Vs: Volume, Velocity and Variety. Whether it is in Ecommerce, Retail, Financial Services, Healthcare or any other industry, the scale of the data is overwhelming for most traditional analytics tools.

How do you process, analyze and visualize such datasets as quickly as the data changes? This blog post discusses the challenges you may face and the strategies that may help in deriving actionable insights from Big Data.

Size Matters

"The Milky Way", Image Courtesy: Peter Ozdzynski at Flickr

The biggest hurdle to harnessing Big Data is its sheer size. How do you store such large datasets in a way that non-technical business users can easily query for insights? How do you scale your hardware to process such large datasets? How do you process and summarize such large datasets so that your users can make the right decisions? How do you do all of this within a response time that is good enough for users to make their decisions in real time? To do all of this and more, you need a platform that is elastic and can scale out to handle the volumes of Big Data.

Need for Speed

"Speeding Away", Image Courtesy: snapp3r at Flickr

In most businesses today, delayed decisions can have a significant adverse impact on business performance. In use cases related to Banking and Financial Services, Ecommerce, or IoT especially, data changes at an overwhelming pace and businesses have to respond to new data as fast as it arrives. How can you make decisions in environments like these if the tool you use to process the data cannot respond to your requests at the pace at which new data comes in? You need a platform that is responsive at web scale.

Variety is the spice of Data

"Variety of Spices", Image Courtesy: Delphine Ménard at Flickr

The last V of the 3 Vs is Variety. Data that is relevant to business decisions often comes from varied data sources which are, by nature, disparate. For example, a retailer making stocking decisions should consider not only the transaction history in the organization's enterprise systems but also signals from the market, the weather and macroeconomic conditions. Even within an organization, the enterprise systems are varied and so are the underlying data structures. How do you aggregate and store data from all these disparate data sources to provide a single version of the truth? To tackle the challenges of a wide variety of data structures, you need a datastore that is schema-less or one that supports a loosely defined schema. NoSQL datastores are a great choice for this class of problems.
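To make the schema-less idea concrete, here is a minimal sketch in plain Python. It is not tied to any particular NoSQL product: the records, field names and the merge_by_date helper are all hypothetical, and an in-memory dictionary stands in for the datastore. It only illustrates how documents of different shapes can sit side by side and still be folded into one view.

```python
import json
from collections import defaultdict

# Hypothetical records from three unrelated sources. Each has its own shape;
# the field names are illustrative, not taken from any real system.
records = [
    {"source": "pos", "store_id": "S1", "date": "2016-08-01", "sku": "A100", "units_sold": 12},
    {"source": "weather", "store_id": "S1", "date": "2016-08-01", "temp_c": 31.5, "rain_mm": 0.0},
    {"source": "macro", "date": "2016-08-01", "consumer_confidence": 97.3},
]


def merge_by_date(docs):
    """Fold documents of different shapes into one combined view per date."""
    view = defaultdict(dict)
    for doc in docs:
        # Keep whatever fields the document happens to carry; no fixed schema.
        fields = {k: v for k, v in doc.items() if k not in ("source", "date")}
        view[doc["date"]].setdefault(doc["source"], {}).update(fields)
    return dict(view)


combined = merge_by_date(records)
print(json.dumps(combined, indent=2, sort_keys=True))
```

A new source with entirely new fields can be added to the list without changing merge_by_date, which is the property a loosely defined schema is meant to give you.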

Don't miss the Woods for the Trees

Each of the Vs by itself poses a non-trivial challenge when harnessing Big Data. In combination, they can make most Big Data implementations fall short of business expectations. To optimize your Big Data strategy, it is important to understand that Big Data can be used for three different kinds of analytics: Descriptive Analytics, Predictive Analytics and Prescriptive Analytics. Once you understand the differences, you can define your strategy to address any or all of them, depending on your business objectives.

"Trees in Woods", Image Courtesy: jungle_group at Flickr

Descriptive Analytics is the art and science of mining past data with the objective of drawing patterns and deriving insights from history. It is the process of aggregating and summarizing historical data, to provide users with hindsight so they can make course corrections. Descriptive Analytics is the first step in your journey to Big Data mastery. It is also where the maximum bang for your investment buck is. A key success factor at this stage is your choice of the right data discovery platform that can handle the scale of Big Data.
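As a small illustration of aggregating and summarizing historical data, the sketch below groups a handful of made-up transactions by month and region and reports a count, total and average for each group. The data and the summarize function are assumptions for illustration only; at Big Data scale the same grouping would run inside a distributed platform rather than in a single Python process.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transaction history: (month, region, revenue).
transactions = [
    ("2016-06", "North", 1200.0),
    ("2016-06", "South", 800.0),
    ("2016-07", "North", 1500.0),
    ("2016-07", "South", 700.0),
    ("2016-07", "North", 300.0),
]


def summarize(rows):
    """Group revenue by (month, region) and report count, total and average."""
    groups = defaultdict(list)
    for month, region, revenue in rows:
        groups[(month, region)].append(revenue)
    return {
        key: {"count": len(values), "total": sum(values), "average": mean(values)}
        for key, values in sorted(groups.items())
    }


summary = summarize(transactions)
for (month, region), stats in summary.items():
    print(month, region, stats)
```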

Predictive Analytics is the use of machine learning algorithms on past data to predict future outcomes. At this stage, you are trying to solve one of two categories of problems: Classification or Regression. Classification problems ask which class new data is likely to fall into, while Regression problems ask what value a variable is likely to take. For example, a question like "Is this new transaction fraudulent?" is answered by a Classification algorithm, while a question like "What is the expected lifetime value of a new customer?" is answered by a Regression algorithm. In either category, you have a variety of algorithms at your disposal for almost any problem statement you can come up with. Each algorithm has a different cost of implementation. You can be dragged down a rabbit hole if you do not weigh the incremental accuracy a complex algorithm gives you over a simpler one against the additional cost of implementing it. Pick the right strategy with your budgets and timeframes in view.
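The difference between the two problem types can be sketched with two deliberately simple models written in plain Python. The nearest-centroid classifier and the one-variable least-squares line below are stand-ins chosen for brevity, not recommendations, and the training data, feature choices and function names are all hypothetical.

```python
from statistics import mean


def fit_centroids(samples):
    """samples: list of (features, label). Returns the mean feature vector per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: [mean(col) for col in zip(*rows)] for label, rows in by_label.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: distance(centroids[label]))


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Classification: is this new transaction fraudulent?
# Made-up features: [amount, transactions in the last hour].
labelled = [
    ([20.0, 1.0], "legitimate"),
    ([35.0, 2.0], "legitimate"),
    ([900.0, 14.0], "fraudulent"),
    ([1200.0, 20.0], "fraudulent"),
]
centroids = fit_centroids(labelled)
label = classify(centroids, [1000.0, 15.0])
print("predicted class:", label)

# Regression: what is the expected lifetime value of a new customer?
# Made-up data: first-month spend versus observed lifetime value.
first_month_spend = [10.0, 20.0, 30.0, 40.0]
lifetime_value = [120.0, 210.0, 330.0, 420.0]
slope, intercept = fit_line(first_month_spend, lifetime_value)
predicted_ltv = slope * 25.0 + intercept
print("predicted lifetime value:", round(predicted_ltv, 2))
```

The classifier returns a label and the regression returns a number, which is the whole distinction. A simple baseline like this also gives you a reference point for judging whether a more complex algorithm earns its extra cost.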

Prescriptive Analytics is the application of algorithms to prescribe actions. This type of analytics makes use of descriptive as well as predictive analytics to prescribe the best course of action. This is the most advanced form of analytics and requires specialized algorithms that are built for individual use cases. There is no one-size-fits-all software that addresses the requirements of prescriptive analytics; you will need to custom-build the right software for your requirements.
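One simple way to picture the step from prediction to prescription is a decision rule that takes a predicted probability and picks the action with the lowest expected cost. The sketch below is only an illustration under stated assumptions: the three actions, the flat review cost and the 10% margin are invented numbers, and a real prescriptive system would be built around the costs and constraints of the specific use case.

```python
def expected_cost(action, fraud_probability, amount):
    """Hypothetical cost model for handling one transaction."""
    review_cost = 5.0            # assumed flat cost of a manual review
    lost_margin = 0.1 * amount   # assumed margin lost when a legitimate sale is blocked
    if action == "approve":
        return fraud_probability * amount
    if action == "review":
        return review_cost
    if action == "block":
        return (1 - fraud_probability) * lost_margin
    raise ValueError(f"unknown action: {action}")


def prescribe(fraud_probability, amount):
    """Pick the action with the lowest expected cost."""
    actions = ("approve", "review", "block")
    return min(actions, key=lambda a: expected_cost(a, fraud_probability, amount))


# The fraud probability would come from a predictive model; here it is supplied by hand.
for probability in (0.001, 0.3, 0.95):
    print(probability, "->", prescribe(probability, 100.0))
```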

Descriptive analytics is where you should start, and prescriptive analytics is where you should advance to. For descriptive analytics, you need a platform that helps your non-technical business users with data discovery at Big Data scale, without high implementation costs (keeping the TCO low). In addition, your data discovery platform should support the deployment of predictive models, so that the outcomes those models predict can easily be visualized by your users.

Vizard, our award-winning data discovery tool, can help you on your journey to Big Data nirvana; watch this video. Vizard lets your non-technical business users search and analyze Big Data, opening up Big Data to a wider audience within your organization.

About Infruid

Infruid Labs is a Business Analytics and Data Visualization solutions company. Vizard is Infruid’s patent-pending and award-winning Big Data Visualization platform. Vizard helps business users ask questions in simple English and answers back with interactive charts. Ask for a demo to know more.