We all want a zero-wait infrastructure. This has spurred many organizations to push all data through real-time infrastructure. Unfortunately, I’ve seen the mistaken point of view that real-time is the “modern” solution and batch is the “old way”. But both batch and real-time processing are very old, and they both exist for a reason. It is important to consider the nature of the business need and the reality of data acquisition when determining whether data should be moved in “real-time” or in “batch”.
The basic tradeoff here is between high throughput and low latency. This can be somewhat counterintuitive for the wider team, so it is important to determine the throughput requirements and the latency requirements separately from each other. The best example of this is a question: what is the fastest way to get from Boston to Wilmington? You might say taking a plane. That would be true for a few people, giving them the best latency. But what would be the fastest way to move a million people? How do you get the best throughput?
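To make the travel analogy concrete, here is a minimal Python sketch. The trip times and seat counts are made-up figures chosen only for illustration, and the model is deliberately crude: one vehicle per mode, trips run back to back, and the return leg is ignored. The names `Mode`, `PLANE`, `TRAIN` and `fastest` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    trip_hours: float  # latency of one trip (illustrative figure, not real data)
    capacity: int      # people carried per trip (illustrative figure, not real data)

    def hours_to_move(self, people: int) -> float:
        """Total hours if trips run back to back with a single vehicle."""
        trips = math.ceil(people / self.capacity)
        return trips * self.trip_hours

    def throughput(self) -> float:
        """People moved per hour once trips are running continuously."""
        return self.capacity / self.trip_hours


# Assumed numbers: the plane is quicker per trip, the train carries more per trip.
PLANE = Mode("plane", trip_hours=1.5, capacity=150)
TRAIN = Mode("train", trip_hours=5.0, capacity=1000)


def fastest(people: int) -> Mode:
    """Pick the mode that finishes moving everyone soonest."""
    return min((PLANE, TRAIN), key=lambda m: m.hours_to_move(people))


if __name__ == "__main__":
    for n in (100, 1_000_000):
        best = fastest(n)
        print(f"{n} people: {best.name} ({best.hours_to_move(n)} hours)")
```

Under these assumed numbers the low-latency option wins for a small group and the high-throughput option wins for a very large one. The same question, asked separately for latency and for throughput, can give different answers.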
Another important tradeoff is that zero wait means information is ready when a user needs it. Since the user will usually need data that has been processed (e.g. cleaned, combined, augmented, …), it is important that the data infrastructure is built with this in mind. If the user is going to need information that includes an average, a sum or comparisons, there is a natural need to have a set of data. In these cases a "real-time" approach can actually increase the wait.
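One way this can happen is sketched below in Python. It assumes a deliberately naive "real-time" handler that recomputes an average from scratch every time a record arrives, and compares it with a single batch pass over the finished set. The function names are hypothetical, and the count of records touched is only a rough stand-in for work done. An incremental running average would narrow the gap, so treat this as an illustration rather than a benchmark.

```python
def per_event_average(values):
    """Naive per-record handling: recompute the average on every arrival."""
    seen = []
    records_touched = 0
    average = None
    for value in values:
        seen.append(value)
        average = sum(seen) / len(seen)  # full recomputation over everything so far
        records_touched += len(seen)
    return average, records_touched


def batch_average(values):
    """Batch handling: wait for the full set, then compute once."""
    values = list(values)
    return sum(values) / len(values), len(values)


if __name__ == "__main__":
    readings = [10, 20, 30, 40]  # made-up sample data
    print(per_event_average(readings))
    print(batch_average(readings))
```

Both functions arrive at the same average, but the per-record version touches far more records along the way, and the final figure still cannot exist until the last record has arrived.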
Real-time is very good for collecting input and responding to a user. But it is not the solution for all data movement. It is not even necessarily the fastest. Architects always need to ask: is this a batch problem or a real-time problem?
More about Jeff