Big Data: Humble Objectives!

The (generalized) ‘Birthday coincidence’ problem and the ‘Gambler’s ruin’ problem, which are in a sense contrapositives of each other, are perhaps the most counter-intuitive of basic probability problems. What sounds less likely (a birthday coincidence) is more likely, and what sounds more likely (winning a jackpot) is less likely. For those not easily shocked: it takes just 23 randomly chosen people to reach a slightly better than 50% chance (about 50.7%) that at least one pair shares a birthday (ignoring the year of birth), and with 70 randomly chosen people the probability that at least one pair shares a birthday is about 99.9%. It is a no-brainer that 100% probability is attained with 366 people (ignoring February 29).
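The 23- and 70-person figures follow from the complement rule: the chance that n people all have distinct birthdays is (365/365)·(364/365)·…·((365-n+1)/365), and subtracting that from 1 gives the chance of at least one shared birthday. A minimal Python sketch, assuming 365 equally likely birthdays and ignoring February 29 (the function names are hypothetical, chosen for illustration):

```python
def birthday_collision_probability(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday.

    Assumes every one of `days` birthdays is equally likely (Feb 29 ignored).
    """
    if n > days:
        return 1.0  # pigeonhole principle: more people than days forces a match
    p_all_distinct = 1.0
    for k in range(n):
        # the (k+1)-th person must avoid the k birthdays already taken
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def smallest_group_for(target: float, days: int = 365) -> int:
    """Smallest group size whose collision probability reaches `target` (target <= 1)."""
    n = 1
    while birthday_collision_probability(n, days) < target:
        n += 1
    return n


for n in (23, 70, 366):
    print(n, round(birthday_collision_probability(n), 4))
print(smallest_group_for(0.5))  # 23
```

The whole calculation is a single running product, which is the point: high school probability, no data required.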

The answers to both these problems come entirely from first principles of high school probability, notwithstanding alternative ways (e.g. probability as the limit of a relative frequency) of looking at them. No intimidating math, no advanced statistics, no software, no hardware, no analytics, no Big Data. It is simple math. Assuming these problems had not yet been identified by humanity, a proof by observation would have been possible: repeat the trials a sufficiently large number of times across time and space; store the data points, including environmental variables like race, weather, height of respondents and distance from Rome, in a 100,000 TB columnar data warehouse; and then analyze this data to perhaps conclude that in 9 out of 10 samples of 50 or more random people at least one pair shares a birthday, but that the chances are twice as high in Geo X because twins are born there three times as often as the global average.
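The "proof by observation" thought experiment is, in effect, a Monte Carlo simulation, and it needs no data warehouse: simulate many groups of random people and count how often a pair shares a birthday. A minimal sketch, assuming uniformly random birthdays and a fixed seed for reproducibility (the function name and defaults are hypothetical):

```python
import random


def simulated_collision_rate(n: int, trials: int = 20_000, days: int = 365, seed: int = 42) -> float:
    """Estimate P(at least one shared birthday among n people) by repeated random trials."""
    rng = random.Random(seed)  # fixed seed so reruns give the same estimate
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(n)]
        if len(set(birthdays)) < n:  # any duplicate birthday collapses in the set
            hits += 1
    return hits / trials


for n in (23, 50):
    print(n, simulated_collision_rate(n))
```

With enough trials the estimate settles near the exact first-principles answer, without recording race, weather or distance from Rome.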

No, I am not a professional statistician or mathematician. I am a ‘supply chain management’ professional. I intuitively understand the systemic corruption by chance and variability (of anything) that characterizes the behavior of a supply chain as a whole with respect to its ability to meet or exceed expected service levels at a given cost. I can also discern certain underlying patterns merely by looking at the data. Not all patterns, but some. For example, I may know that each time a certain customer orders a certain product between December and January, service levels have failed because of a scheduling issue associated with this product. I may also know that each time this product is produced, it is likely to delay the schedule objectives of 10 other products that share the same resources. I may also know that customers from a particular region always order product A whenever they order B and K, but not when they order B and D. I may also know that selectively promoting a particular product X results in higher gross margins than promoting Y along with X. I may also know that my SKU competes with 28 other possible SKUs from the competition, not necessarily based on how the product is grouped and targeted from a sales or marketing point of view. I may also know that chocolate cookies compete with dark chocolates, and hence the combined sales of both may be more predictable than either on its own. I may know that if the competition launches a new product next month, I will lose sales of 5 of my ‘unrelated’ products in varying proportions, even though the new product does not directly compete with the remaining n-1 products. I may also know that a 5% price discount erodes the bottom line by 14%, but a 20% price discount marginally improves it by 3% (well, I am selling more because of the appealing discount).
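A pattern such as "customers in a region order A whenever they order B and K, but not when they order B and D" is what market-basket analysis calls an association rule, and its strength is commonly measured by confidence: the share of orders containing the antecedent items that also contain the consequent. A minimal sketch over a handful of made-up orders (the order data and the function name are hypothetical, for illustration only):

```python
from typing import Iterable, Set


def rule_confidence(orders: Iterable[Set[str]], antecedent: Set[str], consequent: Set[str]) -> float:
    """Confidence of antecedent -> consequent, i.e. P(consequent | antecedent) in the orders."""
    matching = [order for order in orders if antecedent <= order]  # orders that trigger the rule
    if not matching:
        return 0.0  # rule never triggered in this data
    hits = sum(1 for order in matching if consequent <= order)
    return hits / len(matching)


# Hypothetical orders from one region; each order is a set of product codes.
orders = [
    {"A", "B", "K"},
    {"A", "B", "K", "X"},
    {"B", "D"},
    {"B", "D", "Z"},
    {"A", "B", "K"},
    {"B", "D", "Y"},
]

print(rule_confidence(orders, {"B", "K"}, {"A"}))  # 1.0
print(rule_confidence(orders, {"B", "D"}, {"A"}))  # 0.0
```

In practice one would also check support (how many orders trigger the rule), since a rule observed in only three orders, as here, proves little.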

These observations MAY help in many ways, e.g. rationalizing SKUs in the interest of better delivery performance and/or lower direct costs of delivery; devising more targeted promotional strategies; improving gross margins and profitability; scheduling my resources better; devising innovative product groupings for forecasting demand; changing the production schedule at the eleventh hour based on developments in the market that may or may not be in my interest; or deciding not to sell certain products at all.

Talking of Big Data, the only USEFUL statement that I have come across after sifting through a few dozen pages and videos is “Big data is not about more information but NEW information”: ‘hidden’ patterns, information hitherto unknown that MAY be useful for arriving at some fruitful outcome in the interest of whatever. But I am interested in a set of hypotheses to begin with. ‘Unknown unknowns’ is what some people call the hidden patterns in such data. But it would be prudent to ask the following questions first.

1. Do I currently have a set of hypotheses, based on my currently available data, that can be conclusively useful for a new business or operational strategy?
2. If so, has the hypothesis been tested by doing things on the ground?
3. If tested, what benefits accrued?
4. If the expected benefits were not realized, did I carry out some kind of sensitivity analysis and repeat step 2?

What is easy to visualize at this point is how data, seemingly relevant or irrelevant, including unstructured data, can be harvested and stored. That is the easy bit. Technology for storage and real-time analytics is out there, and some organizations are beginning to store tons of data. Maybe that too will cost peanuts a decade down the line, with syndicated big data from outside the organization available for purchase on a subscription basis. What is not there yet is readiness with the possible use cases of Big Data analytics.

Can we think of some substantive ones in supply chain, or more generally in typical business functions, for a start? I am keen to know how existing platforms and solutions can be enriched in the interest of a, b, c… in the context of revenues, costs, profits, service levels, employee engagement…

Arijit Dutta