MapReduce

MapReduce is a processing methodology that makes it easier for computers to process, compare and analyze huge datasets. The core principle in MapReduce is to reduce all data fields into their smallest but most meaningful descriptor.

The methodology of MapReduce is similar to solving a seemingly complex arithmetic problem by reducing its components as much as possible. For example, consider the problem:

If “x” = 3, then 4×2 + 4×2 + 4×2 = ?

The slow way to calculate this problem would be to substitute “x” with “3” at each instance then calculate the sum. But instead, we can reduce the problem to look like this:

12×2 = ?

This reduction dramatically condenses the steps needed to solve the problem, lowering the amount of processing power and time needed overall to make the calculation.

MapReduce applies the same logic and methodology to enormous datasets. For instance, if we wanted to predict the most advantageous launch date for a seasonal product, a database can compare data from millions of samples and transactions over time. Normally, each transaction described would have several fields:

((product name X), (retail location), (retail price), (product wholesale unit cost to retailer), (transaction date))

Yet, MapReduce can simplify this to:

((price), (daily profits from product name X))

By reducing data points in this fashion, fewer fields are considered when making massive calculations. Only the data changes or movement that we’re interested in is being considered, and the computer no longer has to store values for each individual transaction.

MapReduce became popular when organizations began adopting Apache Hadoop and Spark parallel computing systems. With these systems, advanced statistical techniques and machine learning simplify large distributed data-sets while running parallel calculations to deliver rapid results.

In the BI world, MapReduce tools are critical since they allow predictive analytics functions to work with lower processing requirements. Examples of applications for MapReduce analysis include:

  • Making business predictions
  • Determining key factors that drive business events
  • Determining look-alike competitor products or customer demographics

Thanks to MapReduce, big data analytics and business intelligence are able to function at a competitive cost while being accessible to a much wider swathe of users.