Data Integration

Opportunities

Integrating data from different computer systems presents great opportunities but significant challenges whether the data is all internal (usually the result of acquisitions) or a combination of internal and external (from syndicated data services such as Information Resources or The Nielsen Company).

The list below describes some of the key Business Intelligence opportunities with integrated data, that is, internal shipment data from orders and invoices, trade promotion deals, integrated with syndicated point of sale consumption data and advertising and demographics data.

  1. Trade promotion effectiveness.  Syndicated point of sale scanner data can tell us if we sold more product resulting from a price promotion (cents off or buy-one-get-one) or other trade promotion (advertising).  Internal data tells us what our cost of the promotion was (discounts given to the retailer plus advertising costs).  Did we sell enough additional product to make money?  Or, if not, did we get a lasting share increase (called ‘lift’) so that when we raise the price or the deal ends, we sell more than before the promotion.  Even if we did get a lift, is it enough to make a profit on the promotion over time?

  2. Out of stocks.  Out of stocks translates to a lost sale, or worse, a consumer switching to a competitor.  With a Direct Store Delivery (DSD) business when the driver is in the store every day, this is not a problem: the driver can fix it immediately.  However, for second and third tier products or for retailers where the delivery schedule is weekly or thereabouts, it can be a problem.  With integrated data, we can see that when the syndicated consumption data is very close to the internal shipments, this tells us that the retailer is selling everything we ship.  At the end of the delivery period, there is very likely an empty shelf.  Worse, it may have been that way for a couple of days.  We have no other way of knowing, other than to visit the store every day.  When we see this situation, the company’s sales rep has the ammunition he/she needs to persuade the retailer’s buyer to order more.  These second and third tier products are usually the ones sold at full price and more profitable to both the company and retailer.  We can mine the data to detect where there are product-retailer combinations that need monitoring.

  3. Trade promotion planning.  When marketing plans a promotion, they want to know if last year’s advertisements worked.  Syndicated data will show that.  The only way to know what the exact terms were is to view the ad itself, so they need an easy way to put the 2 together.  Next question to answer is whether the ads we ran at (say) Jewel Food Stores worked better that the ads at (say) Meijers Stores.  Without integrated data, figuring this out is so time consuming, no one has the time.

  4. Micro marketing.  Demographic data from the US census can tell us the socio-economic profile of the neighborhood around every store we ship to.  This is not so useful for the major brands because over a period of time, shoppers buy their preferred product, the retailer orders more and the supply/demand is in balance.  It is most useful for new product introductions and for detecting trends.  We can target neighborhoods for new product introductions and get ahead of the curve when a new trend appears (this requires a straightforward correlation model).

  5. Retailer compliance.  This is a hot topic in the industry today.  Did the retailer implement the trade deal exactly as agreed?  For various reasons, the pricing or the display or the ad did not get done as agreed.  Again, with a DSD business where the company’s representative sets the shelf and sets up the displays, this is not a significant problem.  The only way to detect these situations is by comparing trade deal data with actual performance from syndicated data.

  6. Diversion.  It has been known to happen in the industry that a retailer purchases product from a manufacturer at a deep discount for sale in a particular market.  The retailer then ships some of the product to another market and sells it at full price.  The only way to detect this is by matching up internal shipments with syndicated consumption data.  Again, with a DSD business this is an unlikely scenario.

  7. Demand forecasting.  This is the biggest opportunity for integrated data.  This is long term, strategic consumer demand forecasting, not the short term tactical retailer demand forecasting that is done today.  This requires a regression model with consumption data including sales, pricing, promotion activity for in house and competitors, and all the other business drivers: holiday schedule, category trends, weather forecast (!), commodity prices etc.  The key here is to weight all trade deals historically so that we can enter the planned deal calendar and get a long term forecast of consumer demand.  We at Ironbridge have had good success with this approach at a client.  In the salad dressings category, we were able to increase the accuracy of the long term forecast by a few percentage points.  This was enough to enable longer runs at the plant for each flavor and avoid down time for line sanitizing. Our client avoided the need to build a new plant and saved millions.

  8. Efficient Assortment.  This is one of the key components of Category Management.  Here is an example:  your sales rep is trying to sell Jewel Food Stores on the idea of buying more of a particular product.  The sales rep analyses the consumption data and shows the Jewel buyer that in the Chicago market as a whole, this product is selling well and other retailers are carrying more of it.  Also, we must convince the retailer that a competitor’s product is not selling well and that ours will sell better and be more profitable for the retailer.  This gets down to the level of the packaging, not just the product itself.

  9. Loyalty (frequent shopper) data.  There have been some attempts to leverage this data.  However there is so much of it, and because of privacy rules it is masked by the retailer so manufacturers are still challenged to find insights from the frequent shopper data.

 

Challenges

  1. 1. Strategy.  Usually, the data being integrated resides entirely in a Data Warehouse and is integrated as part of the ETL (Extract, Transform and Load) process. In the case of syndicated data however, the volume can be so great as to make this infeasible. In this case, data is integrated (or 'harmonized') in real time, that is, at the time an end user makes a query. Ironbridge's Net-Bench product has this capability. This, then is the first question to answer: is the data entirely resident, or a mixture with the external data remaining at their sources.

  2. 2. Data Acquisition.  Acquiring data from its original source, the company's transaction systems or external services requires cooperation from the data provider whether internal or external. Large scale ERP systems are notoriously difficult to extract data from, they are complex in design, require many tables and abstract the company's data into surrogate keys that are meaningless for data recognition. This step requires specialized skills in the ERP technologies and even more specialized if external data is being brought in house. (Ironbridge has a long history of successful data extraction projects from both internal and external data sources).

  3. 3. Timing.  The different data sources become available at different times of the day, week or month. This is especially true when data is from different divisions of the company. It is important to not make data available from one source before all integrated sources are ready.

  4. 4. Harmonizing.  This is the most significant challenge of data integration. Harmonizing is the process of matching data items from each source. This breaks down into four separate tasks for the four common dimensions of the data: geographies, products, time and measures.

     

    Geographies. The geography dimension includes the company's regions, districts, sales territories all the way down to individual customers. The "Global Location Number" is an international standard for all physical locations. If this is known for the company's customer ship-to locations, this is a very useful concept for standardizing geographies across systems. When integrating different internal systems that have no geographical overlap, the only challenge is to apply consistent coding to the levels above customer. Zipcode or Postal Code is a good starting point.

    When different systems contain the same customer but with different coding schemes, there is no choice but to make a harmonizing relationship table. With luck, the customer name and address is spelled consistently between systems. If not, there must be manual maintenance.

    The process of integrating syndicated or any external data depends entirely on the levels available in the external data. For a DSD business that has syndicated store level data available, matching the retail chain and store number is straightforward. If the syndicated data is at trading area or retail market area or metro-market, then we must know the stores that make up the area. This is available from the syndicated supplier. We then aggregate our data from customer up to these levels.

    The same concept applies to a Distribution Center based business.

    Products. Products refers to the company's manufactured items and in the case of external data, competitors' items. If internal systems use different SKU or Item ID schemes, then the most effective harmonizing method is to use product attributes and descriptions to match items. As for customers, there will be a need for manual matching to achieve 100% coverage.

    External (syndicated) data has only the UPC/NDC/EAN for matching since this is what the retailer's point-of-sale scanners provide. If our product system does not contain an equivalent UPC for retail items (it may be a case or pallet identifier), then we harmonize using product attributes. External data, of course, includes competitor information. When harmonizinf using attributes, we must decide whether to use the syndicated provider's coding system or our own. (Since competitors may have products outside our range, the syndicated system is usually preferred).

    Time. Harmonizing time is not always straightforward. Perhaps our company closes its week at Saturday midnight but IRI (for example) closes at Sunday midnight. Individual time-stamped transactions can always be aggregated to any time period.

    Measures. Generally, companies 'equivalize' product volumes when they are sold in different packages so that volume measures are available in 'equivalent pounds/kilos/gallons/liters' in addition to the unit quantity sales. This is the only reliable way to compare internal and external sales volumes. 'Unit volume' will be unreliable when our company counts a shipped case of 24 items a 'unit' whereas at the point of sale, each item counts as one. 'Sales dollars' from internal data are wholesale, whereas point of sale 'sales dollars' are retail.