Data Synchronization in Supply Chains

Updated: Dec 26, 2019

Supply chains are multi-dimensional and complex. There are products, geographies, customers, suppliers, channels, stocking locations, transportation, warehouses, and factories. Each of these dimensions has entities that are connected to other entities in other dimensions. But none of these dimensions has as much impact on complexity as the one dimension that cuts across all the other dimensions: time. Products are connected to customers, warehouses, and factories not in a static way, but across a time horizon that unfolds like a synchronized dynamic symphony that never rests.

But this symphony does not operate or sound quite like the Berlin Philharmonic. In the case of supply chains, the right hand is often not time-synchronized with the left hand. The best orchestras not only have the best individual musicians, but also have the best synchronization, a function provided by the conductor. So it goes with supply chains – functional excellence is very important, but its value can be completely lost when functional areas are not adequately synchronized with each other. But who is the orchestra leader in a supply chain? It’s increasingly the customer, in this case the farthest downstream customer you can find in your supply chain.

Time is Money

To properly orchestrate a supply chain, entities across the various dimensions (demand, physical assets, machinery, equipment, materials, people) must be linked in time. Data and information must flow between an entity A and every other entity with which entity A has a relationship. This creates a very complex web of data transmission.

The time dimension sets supply chain software apart from other enterprise software such as CRM and two-sided marketplaces. For example, most CRM software has no time dimension; this makes the software much less architecturally complex. In a CRM system, if you want to capture information across a time continuum, you have to periodically download Excel snapshots, or purchase add-on modules that reverse engineer a time dimension by filtering through time-stamped system log files. (This is interesting considering that log files are at the center of streaming architectures, to be discussed below.)

Integration and Synchronization

The supply chain world has been awash in discussions on “integrated supply chain” for the past twenty years. Data integration has historically represented up to 60% of the investment required for the software component of supply chain transformation programs. It has been a principal revenue lifeline for systems integrators. But integration is just a mechanism: it means, for example, that data can be transferred and interpreted between a warehouse system, a transportation system, and a factory system. It says nothing of the time stamp associated with such data and whether the factory systems are basing their decisions on warehouse data that is current, two hours old, or two days old. Just because data is pulled from a database and used immediately does not mean that the data is current; the data may have been deposited in the database via an ETL (extract, transform, load) batch process 12 or 24 hours ago.

Synchronization means making decisions in the factory based on current data from the downstream warehouse or, importantly, understanding that the warehouse data against which decisions are being made is two days old, requiring the factory to buffer appropriately. Due to distance and other physical considerations, it may not be possible to always base decisions on exactly current data from two different systems in two different locations. Thus, it is important to understand the concept of snapshots and to account for them. (Even Einstein had to account for distance and the speed of light when determining whether two lightning strikes were simultaneous.)
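To make the distinction concrete, here is a minimal sketch of a factory accounting for the age of warehouse data. The `WarehouseReading` record, its field names, and the buffering rule (expected demand over the age of the data) are illustrative assumptions, not taken from any particular system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class WarehouseReading:
    """Hypothetical inventory reading with the time it was actually observed."""
    on_hand_units: float
    observed_at: datetime  # when the warehouse measured it, not when it was loaded


def data_age(reading: WarehouseReading, now: datetime) -> timedelta:
    """Age of the reading at decision time."""
    return now - reading.observed_at


def staleness_buffer(reading: WarehouseReading, now: datetime,
                     demand_per_hour: float) -> float:
    """Units that may have been consumed since the reading was taken.

    Assumed rule: buffer = expected demand over the age of the data.
    """
    age_hours = data_age(reading, now).total_seconds() / 3600.0
    return max(age_hours, 0.0) * demand_per_hour


now = datetime(2019, 12, 26, 12, 0)
fresh = WarehouseReading(500.0, now - timedelta(minutes=5))
stale = WarehouseReading(500.0, now - timedelta(days=2))

fresh_buffer = staleness_buffer(fresh, now, demand_per_hour=4.0)
stale_buffer = staleness_buffer(stale, now, demand_per_hour=4.0)
```

Both readings report the same 500 units, yet the two-day-old one calls for a far larger buffer. The point is that the observation time has to travel with the data; the load time into a database says little about how current the data is.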

Plans are Developed Based on Snapshots

Plans are necessarily developed based on snapshots of the current state of the supply chain at a given point in time. A model of the supply chain, now commonly called a digital twin, is the foundation against which plans are developed and decisions are made. This model contains data on the current state of warehouses, transportation equipment, factory machinery, inventory, and demand. It gets this current state by pulling data from various systems, including ERP, MES, CRM, supplier systems, and customer systems.

Once the data in the model is updated, algorithms are run to develop plans and make decisions, both automatically and interactively. Once plans are developed and decisions made, there is a post process to populate new decision data into systems that consume such data. In the past, the process of gathering up the latest data and updating the model could take hours; the process of running plans, making decisions, and populating the results could also take hours.

With newer technologies, this process, particularly the decision-making part of this process, has been compressed significantly. For example, the best systems can now execute the planning and decision-making process in minutes or seconds. However, much of the front-end and back-end data updating processes may still be time consuming.

The point in time at which this process starts is called a snapshot. The snapshot captures the state of the supply chain at a point in time, at least as represented by the data. However, the data itself, depending on when, where, and how it was gathered, may have a latency even at the time of the snapshot. Much of this is due to cascaded batch data processing, such as that shown in the figure below.

For example, the retailer cascades its data up (“south to north”) from store execution systems to higher level planning and decision systems; likewise, the distributor and manufacturer cascade data up from their transportation, warehousing, and manufacturing execution systems. These cascaded processes may be on the same or different cadences and frequencies. As shown in the figure, this results in situations where time T0 does not equal time T0’, which does not equal time T0’’, and so on back through the supply chain (“east to west”). In other words, the snapshot timestamps are different both within an enterprise and even more so across the supply chain.
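A small sketch of how one might measure this mismatch, assuming each enterprise reports the timestamp of the snapshot its plan was built on. The enterprise names and timestamps are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict

# Hypothetical snapshot times standing in for T0, T0' and T0''.
snapshot_times: Dict[str, datetime] = {
    "retailer": datetime(2019, 12, 26, 6, 0),
    "distributor": datetime(2019, 12, 25, 22, 0),
    "manufacturer": datetime(2019, 12, 25, 2, 0),
}


def snapshot_skew(times: Dict[str, datetime]) -> timedelta:
    """Largest gap between any two snapshot timestamps."""
    return max(times.values()) - min(times.values())


def lag_behind_newest(times: Dict[str, datetime]) -> Dict[str, timedelta]:
    """How far each snapshot trails the most recent one."""
    newest = max(times.values())
    return {name: newest - taken_at for name, taken_at in times.items()}


skew = snapshot_skew(snapshot_times)
lags = lag_behind_newest(snapshot_times)
```

In this made-up case the manufacturer is planning against a picture of the world that is 28 hours older than the retailer's.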

And, like the case of the Tacoma Narrows bridge, these discrepancies, if large enough, can lead to large back-and-forth oscillations. Supply chains are complex systems, and as anyone knows, small changes to inputs for even simple systems can lead to large changes in outputs.
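The effect of stale data on stability can be illustrated with a toy replenishment loop. This is a deliberately simplified linear model under assumed parameters (constant demand, an order-up-to rule, and no floor on orders or inventory), not a model of any real supply chain.

```python
from typing import List


def simulate_inventory(delay_periods: int, periods: int = 12,
                       target: float = 100.0, start: float = 80.0,
                       demand: float = 10.0) -> List[float]:
    """Inventory trajectory when orders are based on delayed observations.

    Each period: order = demand + (target - observed inventory), where the
    observed inventory is `delay_periods` periods old.
    """
    # Assume inventory sat at `start` before the simulation begins.
    history = [start] * (delay_periods + 1)
    for _ in range(periods):
        observed = history[-1 - delay_periods]
        order = demand + (target - observed)
        history.append(history[-1] + order - demand)
    return history[delay_periods:]  # trajectory from period 0 onward


def swing(trajectory: List[float]) -> float:
    """Peak-to-trough range of a trajectory."""
    return max(trajectory) - min(trajectory)


current_data = simulate_inventory(delay_periods=0)
stale_data = simulate_inventory(delay_periods=2)
```

With current data the loop settles on the target after one period. With the same rule acting on observations that are two periods old, it overshoots, overcorrects, and the swings grow.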

There is No Such Thing as Static Data

A lot of this can be accounted for by understanding what data is relatively static and what data is relatively dynamic. In the past this was pretty clear – structural data was static and operational data was dynamic. Now, even this is changing.

In another area of increasing complexity for supply chain management, data that was relatively static in the past has become more dynamic. Unfortunately, many supply chain systems continue to treat such data as static; this has an adverse effect on supply chain performance. For example, most supply chain systems consider lead times to be relatively static and only change them based on infrequent analysis; they are entered into systems and then revisited once per year. In fact, most lead times are dynamic.

For example, Kinaxis has used its machine learning-based self-healing supply chain software to show that a significant number of lead times previously considered static can be off by significant margins. Kinaxis is now using this same software to ensure its digital twin is in sync with actual lead times. In the process it has improved the quality of the plans it creates. Even supply chain structure (supply sources, factories, warehouses, etc.) has become increasingly dynamic, as customers choose to shift relationships on the fly.
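As a much simpler illustration of the same idea (this is not the Kinaxis method, which is not described here), one could compare the lead time stored in a planning system against the lead times actually observed on recent shipments. The shipment dates and the 14-day planning value below are assumptions.

```python
from datetime import date
from statistics import median
from typing import List, Tuple

Shipment = Tuple[date, date]  # (ordered_on, received_on)


def observed_lead_times(shipments: List[Shipment]) -> List[int]:
    """Actual lead time in days for each shipment."""
    return [(received - ordered).days for ordered, received in shipments]


def lead_time_drift(static_days: int, shipments: List[Shipment]) -> float:
    """Median observed lead time minus the static planning value."""
    return median(observed_lead_times(shipments)) - static_days


# Hypothetical recent receipts for one supplier and item.
shipments = [
    (date(2019, 11, 1), date(2019, 11, 19)),
    (date(2019, 11, 8), date(2019, 11, 29)),
    (date(2019, 11, 15), date(2019, 12, 5)),
]

drift = lead_time_drift(static_days=14, shipments=shipments)
```

A positive drift means the plan assumes material arrives sooner than it actually does; recomputing this continuously, rather than once per year, is what treating lead time as dynamic data amounts to.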

New Synchronization Architecture

The past thirty years have shown that an inherently distributed system like a supply chain cannot be adequately served by an inherently centralized database system like that provided by ERP software and its architectural derivatives. Nor, it seems, can supply chains be data-synchronized by stringing together sequences of such systems.

Batch data cascading from south to north and from east to west, such as that shown in the earlier figure, results in unknown latencies with unknown consequences. In today’s world, these cascading data pipelines are being replaced by direct and near latency-free updating via IoT and streaming architectures.

Computer scientists have worked on the distributed time synchronization problem for a long time. Among the hard problems to solve in distributed systems are fault tolerance and time synchronization, which are interrelated. When one computer node fails, how do you ensure that a backup can pick up the load and the integrity of the overall system be maintained? How do you ensure that data that is processed separately and individually by different computer nodes can be time-sequenced across those nodes?
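The article does not name a specific technique, but one classic answer to the time-sequencing question is the Lamport logical clock: each node keeps a counter, stamps its events with it, and advances past any timestamp it receives. The sketch below is a minimal version; the `factory` and `warehouse` nodes are illustrative.

```python
class LamportClock:
    """Minimal Lamport logical clock for one node."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Record a local event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Stamp an outgoing message; sending is itself an event."""
        return self.tick()

    def receive(self, message_time: int) -> int:
        """Advance past both the local clock and the sender's timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time


factory = LamportClock()
warehouse = LamportClock()

t_pack = factory.tick()                 # factory packs a pallet
t_ship = factory.send()                 # factory sends a ship notice
t_count = warehouse.tick()              # warehouse does an unrelated cycle count
t_receive = warehouse.receive(t_ship)   # warehouse processes the ship notice
```

The guarantee is limited but useful: if one event causally precedes another (the ship notice is sent before it is received), its timestamp is smaller, even though the two nodes never share a wall clock.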

When you have a system interoperating across 100 or 1000 computer nodes, these are real and important problems. Albeit in a different context, these problems are very similar to the ones we have with supply chain synchronization. Interestingly, the advent of big data and artificial intelligence has led to novel approaches in distributed systems thinking. Some of this thinking has turned the database upside down: the event becomes the center of the architecture, and every other system, including the database and ERP, simply becomes a producer or consumer of events.
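A minimal in-memory sketch of this inversion: an append-only event log sits in the middle, and systems either append to it or read from it at their own pace. The `EventLog` and `Consumer` classes and the event names are hypothetical; production systems would use a durable, distributed log instead.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class EventLog:
    """Append-only, ordered log of events."""
    events: List[Dict[str, Any]] = field(default_factory=list)

    def append(self, source: str, kind: str, payload: Dict[str, Any]) -> int:
        """Add an event and return its position (offset) in the log."""
        offset = len(self.events)
        self.events.append(
            {"offset": offset, "source": source, "kind": kind, "payload": payload}
        )
        return offset

    def read_from(self, offset: int) -> List[Dict[str, Any]]:
        """Return all events at or after the given offset."""
        return self.events[offset:]


@dataclass
class Consumer:
    """A system that reads the log and remembers how far it has read."""
    name: str
    next_offset: int = 0

    def poll(self, log: EventLog) -> List[Dict[str, Any]]:
        new_events = log.read_from(self.next_offset)
        self.next_offset += len(new_events)
        return new_events


log = EventLog()
log.append("erp", "order_created", {"order_id": "SO-1", "qty": 40})
log.append("wms", "goods_picked", {"order_id": "SO-1", "qty": 40})

planner = Consumer("planner")
first_batch = planner.poll(log)

log.append("tms", "shipment_departed", {"order_id": "SO-1"})
second_batch = planner.poll(log)
```

Here ERP, warehouse, and transportation systems are all just producers, and the planner is just a consumer that tracks its own offset. No system needs to query another system's database.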

New requirements drive technical innovation, but technical innovation also drives new requirements, as leaders understand the art of the possible. Big data analytics led to distributed systems, MapReduce, and large-scale batch data systems. At the same time, there was a requirement to harmonize real-time data with batch systems. This led to the Lambda architecture, which is a hybrid of batch and real-time. This in turn caused people to question why a hybrid approach was necessary; instead of thinking of real-time as a special case in a batch-dominant world, things started to evolve in the opposite direction: we live in a real-time (streaming) world in which batch is a useful subset. Thus emerged the idea of a streaming (event) architecture, which has the potential to become the dominant design pattern for data synchronization.


The streaming approach could be particularly useful for event processing across enterprises. Since stream processing is inherently distributed, as are supply chains, it would seem that streaming would be a useful abstraction for multi-enterprise supply chain problems. Streams could be set up to subscribe to other streams. For example, a manufacturer or distributor could subscribe to a retailer’s streams.
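The idea of one stream subscribing to another can be sketched with a small in-process publish/subscribe class. The `Stream` class, the stream names, and the event fields are assumptions for illustration; a real multi-enterprise setup would also need a network transport, access control, and durability.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]


class Stream:
    """Tiny in-process stream: handlers are called for every published event."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._subscribers:
            handler(event)


retailer_sales = Stream("retailer.point_of_sale")
manufacturer_demand = Stream("manufacturer.demand_signal")


def forward_sale_as_demand(event: Event) -> None:
    """Derive a manufacturer-side demand event from each retailer sale."""
    manufacturer_demand.publish(
        {"sku": event["sku"], "units": event["units"], "origin": retailer_sales.name}
    )


# The manufacturer's stream subscribes to the retailer's stream.
retailer_sales.subscribe(forward_sale_as_demand)

# A downstream consumer at the manufacturer simply collects what arrives.
received: List[Event] = []
manufacturer_demand.subscribe(received.append)

retailer_sales.publish({"sku": "A-100", "units": 3, "store": "S-17"})
```

The manufacturer sees the sale as it happens instead of waiting for it to cascade up through the retailer's and distributor's batch cycles.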

With streams of data from across the supply chain, one might imagine creating a “movie” of what happened in the supply chain in the past week, with an ability to rewind, pause, and fast forward. This is the essence of where the term “stream” came from: a time-sequenced frame-by-frame record of events. This streaming data could also be used to feed multi-enterprise machine learning algorithms.
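The rewind-and-pause idea follows directly from keeping the events rather than only the latest state. A minimal sketch, assuming events are simple (timestamp, location, inventory change) tuples with made-up values:

```python
from datetime import datetime
from typing import Dict, List, Tuple

InventoryEvent = Tuple[datetime, str, int]  # (timestamp, location, change in units)

events: List[InventoryEvent] = [
    (datetime(2019, 12, 20, 8, 0), "DC-1", +100),
    (datetime(2019, 12, 21, 9, 0), "DC-1", -30),
    (datetime(2019, 12, 22, 10, 0), "DC-2", +50),
    (datetime(2019, 12, 23, 11, 0), "DC-1", -20),
]


def replay_until(stream: List[InventoryEvent], as_of: datetime) -> Dict[str, int]:
    """Rebuild on-hand inventory per location from events up to `as_of` inclusive."""
    state: Dict[str, int] = {}
    for timestamp, location, delta in sorted(stream, key=lambda e: e[0]):
        if timestamp > as_of:
            break
        state[location] = state.get(location, 0) + delta
    return state


paused = replay_until(events, datetime(2019, 12, 21, 23, 59))
latest = replay_until(events, datetime(2019, 12, 31))
```

Choosing a different `as_of` is the "rewind": any past frame of the movie can be reconstructed from the same event record.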

Consider the tracking of all events and activities associated with a product as it moves from its genesis (the very first materials and operations that go into that product) to the point of consumption in a customer’s hands. And, consider how this information might be a selling point for consumers: Here’s your product, and here’s the URL into which you can put the product serial number and can get a list of everything that has happened to that product since it started its supply chain journey. Of course, in certain regulated industries like pharmaceuticals, serialization and genealogy are already requirements; this takes it to another level. These requirements are finding their way into consumer products industries, particularly food and beverage.
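In its simplest form, the lookup behind such a URL is an index of events by serial number. The sketch below assumes hypothetical event records with `serial`, `at` (a sequence number), and `step` fields; real serialization systems carry far more detail.

```python
from collections import defaultdict
from typing import Dict, List


def build_genealogy(events: List[dict]) -> Dict[str, List[str]]:
    """Group event steps by product serial number, in time order."""
    by_serial: Dict[str, List[str]] = defaultdict(list)
    for event in sorted(events, key=lambda e: e["at"]):
        by_serial[event["serial"]].append(event["step"])
    return dict(by_serial)


# Hypothetical events arriving out of order from different systems.
events = [
    {"serial": "SN-001", "at": 3, "step": "shipped_to_retailer"},
    {"serial": "SN-001", "at": 1, "step": "raw_material_received"},
    {"serial": "SN-002", "at": 2, "step": "assembled"},
    {"serial": "SN-001", "at": 2, "step": "assembled"},
]

genealogy = build_genealogy(events)
```

Looking up `genealogy["SN-001"]` returns that unit's journey from first material receipt to shipment.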


The rollout of these approaches will unfold over a number of years. It takes time for new approaches to become pervasive across any interconnected multi-process, multi-organization, and multi-enterprise system. Architectural and organizational inertia, risk aversion, investment cycles, and talent shortages will cause adoption to follow the usual bell curve with early, mainstream, and late adopters. Furthermore, architecture shifts are especially challenging because they touch all areas of an enterprise and its systems; these shifts also place a premium on planning, collaboration, and leadership.

Finally, while it is right to focus a lot of attention on digital transformation, particularly artificial intelligence, it is important not to lose sight of supply chain fundamentals, particularly time, which is one of supply chain’s most important variables. The path to digital transformation requires a close look at time synchronization issues and the underlying data pipelines that contribute to them.

#dataarchitecture #supplychainsoftware #streamingarchitecture #ERP #synchronization #integration