Welcome! GovernYourData.com is an open peer-to-peer community of data governance practitioners, evangelists, thought leaders, bloggers, analysts and vendors.

The goal of this community is to share best practices, methodologies, frameworks, education, and other tools to help data governance leaders succeed in their efforts.

Asset In, Garbage Out: Measuring data degradation

We all know about the "Garbage In/Garbage Out" reality that data quality and data governance practitioners have been fighting against for decades.   If you don't trust data when it's initially captured, how can you trust it when it's time to consume or analyze it?

But I'm also looking at the tougher problem of data degradation.  The data comes into your environment just fine, but any number of actions, events - or inactions - turns that "good" data "bad".  

I'll be blogging on this data degradation topic in much more detail in the near future, but to provide a few examples to set the context, good data can turn bad due to:

  • Age. A customer's postal address may be perfect the day they place an order with you, but are they still at the same address three years later?
  • Data modeling or metadata lapses. The right data lands in the wrong context (e.g., a first name entered in the Last Name field).
  • Ungoverned human and system processes that update or change the data, e.g., upstream applications that allow non-validated updates to valid data. (A rough sketch of checks along these lines follows below.)
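To make these categories concrete, here is a minimal, hypothetical sketch of rule-based degradation checks. The field names (address_last_verified, last_updated_by, approved_systems) and the three-year staleness threshold are illustrative assumptions, not something from the original post.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=3 * 365)  # illustrative threshold for "aged" data

def degradation_flags(record, today=None):
    """Return the reasons a record that arrived 'good' may have silently gone 'bad'."""
    today = today or date.today()
    flags = []

    # Age: an address verified years ago may no longer be current.
    last_verified = record.get("address_last_verified")
    if last_verified is None or today - last_verified > STALE_AFTER:
        flags.append("address not verified within the last 3 years")

    # Metadata lapse: a value that looks like it landed in the wrong field.
    if record.get("first_name") and record.get("first_name") == record.get("last_name"):
        flags.append("first and last name identical - possible wrong-field entry")

    # Ungoverned update: the last change came from a non-approved source.
    if record.get("last_updated_by") not in record.get("approved_systems", []):
        flags.append("last update made by a non-validated upstream source")

    return flags

# A record that was perfectly fine the day it was captured in 2019:
customer = {
    "first_name": "Pat",
    "last_name": "Pat",
    "address_last_verified": date(2019, 6, 1),
    "last_updated_by": "legacy_crm",
    "approved_systems": ["mdm_hub"],
}
print(degradation_flags(customer))
```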

There are many more "flavors" of degradation that I'm tracking - eight or nine, actually - that I'll dive into in detail in that blog post.

But first, I'd be very interested in feedback on whether and how you're tracking this issue.

Tags: Data, Governance, Quality



Replies to This Discussion

Hi Rob. I have worked a lot with the first issue you mention, the ageing of data. Hooking your data up with external reference data sources is essential to prevent this kind of decay: people and companies move, die or dissolve, and plenty of other things happen in the real world that should be reflected in your master data.

Just this Thursday I wrote about such a service, now available as self-service in the cloud. The post is called Data Management in the Cloud.

The post describes a business case in which the self-service element plays a role in using such a service to implement data governance around master data.
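A minimal sketch of that reference-data idea, assuming a hypothetical reference_lookup callable standing in for an external reference-data service (a business or address registry, say); the field names are illustrative, not a real API.

```python
def find_decayed_records(master_records, reference_lookup):
    """Yield master-data records that no longer match the external reference source."""
    for record in master_records:
        current = reference_lookup(record["party_id"])
        if current is None:
            # The person or company has moved away, died or been dissolved -
            # flag the record for stewardship review rather than keep it silently.
            yield record, "no longer found in reference source"
        elif current.get("address") != record.get("address"):
            yield record, "address differs from reference source"

# Usage with an in-memory stand-in for the external service:
reference = {"P-1": {"address": "1 New Street"}}
masters = [{"party_id": "P-1", "address": "9 Old Road"},
           {"party_id": "P-2", "address": "5 Gone Lane"}]
for rec, reason in find_decayed_records(masters, reference.get):
    print(rec["party_id"], "-", reason)
```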

My personal favorite is [Age].  As in collecting the customer's/person's age at the time of the transaction, rather than collecting their birth date (or at least the month and year).

On certain sites and systems, I'll be 31 forever (oh, if only it were true).
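For contrast, here is a minimal sketch of the more durable design Marc is pointing toward: persist the birth date (or just the year and month) and derive the age whenever it's needed, instead of storing an "age" value that silently goes stale. The function and field names are illustrative assumptions.

```python
from datetime import date

def age_on(birth_date, as_of):
    """Age in whole years on a given date, derived from the stored birth date."""
    years = as_of.year - birth_date.year
    # Knock a year off if the birthday hasn't happened yet in the as_of year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

# Stored once, never stale: the age is recomputed at the time of each transaction.
print(age_on(date(1983, 5, 20), date.today()))
```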

Thanks Marc, Age is one of my favorite examples as well.  It's the DQ poster child for how NOT to plan for the long-term usefulness of data!  But it's amazing how many other forms of degradation beyond age I'm finding, forms that are likely not the focus of many DQ and DG efforts.

Marc said:

My personal favorite is [Age].  As in collecting the customer's/person's age at the time of the transaction, rather than collecting their birth date (or at least the month and year).

On certain sites and systems, I'll be 31 forever (oh, if only it were true).

Degradation of data quality in and of itself is a bit of a red herring, in my view.

I keep banging on about understanding the true VALUE of the data; what use will it be put to, what action will you take as a result, and what changes of outcome do you then expect to achieve? The measures of "quality" are only a function of the data's use - without that defining context, you might as well not bother!

Which then begets the conversation of measurable outcomes for the business process, an area that too few organisations have given real thought to. And so to understanding whether the data that is actually collected supports the chosen outcome measures to define whether the business is performing (or not).

Then - and only then - does the question of monitoring data quality and tracking the up-stream issues that contribute to degradation come into play. 

Thanks for your comment, Alan.  I agree that the first step in doing anything with regard to data quality is to identify the business outcome you're looking to drive.  My most recent blog post, "Build a Prioritized Data Management Roadmap", focuses on that very subject, as do many of the other discussions on this site.

But I don't think that priority focus should eliminate the very real need to understand how and why the data got to a point where it started to damage the business in the first place.  How are you going to improve the critical business processes, decisions and interactions that rely on trusted, secure data if you don't have an understanding of the data itself?

If I were evangelizing the need to analyze data degradation as a first step - or a higher priority - then I would agree with your statement that it's a red herring.  But that was not my recommendation.  

I hope this helps to clarify the purpose of my discussion to ask the community for feedback on their discovery of this very real problem.

Thanks!


Alan David Duncan said:

Degradation of data quality in and of itself is a bit of a red herring, in my view.

I keep banging on about understanding the true VALUE of the data; what use will it be put to, what action will you take as a result, and what changes of outcome do you then expect to achieve? The measures of "quality" are only a function of the data's use - without that defining context, you might as well not bother!

Which then begets the conversation of measurable outcomes for the business process, an area that too few organisations have given real thought to. And so to understanding whether the data that is actually collected supports the chosen outcome measures to define whether the business is performing (or not).

Then - and only then - does the question of monitoring data quality and tracking the up-stream issues that contribute to degradation come into play. 

Thanks Robert - this is good debate. (It would be easier face to face!) You raise an interesting consideration when you ask "how and why the data got to a point where it started to damage the business in the first place." When the wind starts blowing is a different scenario from when it's blowing so hard that your house will fall down!

I've not come across too many organisations that are able to even identify the causal link between poor data and business impact, let alone measure it or act upon it (there are therefore two things to measure - the quality of the data, and the impact on business).

I think the reason might be that all too often data issues either get conflated with the Process agenda, or with the IT agenda, which ends up with the data not actually getting any attention. For me, this is why we need to advocate explicitly for Data Governance and Data Quality as a separate (but complementary) discipline.

There's another problem at play - the issue of incremental impact. It's very seldom that any individual data quality issue will be of sufficient magnitude to cause any real impact on the business's overall performance. It's only when you aggregate hundreds - or thousands - of errors together that you can see the impact data quality problems can cause (if one telephone call drops out, who cares. If ten thousand calls can't be billed, big issue!). I also observe that most operational job roles are defined to operate at the micro-task level, so it's often just no-one's problem to take the more macro view.

I'm seeing this sort of thing right now in my current role at the University, and their realisation that there's another set of issues at play was the trigger to set up a Data Governance office and get me on board. 
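A minimal sketch of the incremental-impact point above, using Alan's telephone-billing example: a single record-level error is invisible, but rolling the errors up to a count and an estimated monetary figure makes the cumulative damage visible. The record layout and the per-minute rate are illustrative assumptions.

```python
RATE_PER_MINUTE = 0.12  # illustrative billing rate, not a real figure

def unbillable_impact(call_records):
    """Count the calls that can't be billed and estimate the revenue they represent."""
    bad = [c for c in call_records if not c.get("billable_account_id")]
    lost_revenue = sum(c["duration_minutes"] * RATE_PER_MINUTE for c in bad)
    return len(bad), round(lost_revenue, 2)

# One dropped call is noise; ten thousand of them are a board-level problem.
calls = [{"billable_account_id": "A-1", "duration_minutes": 5},
         {"billable_account_id": None, "duration_minutes": 12}]
calls += [{"billable_account_id": None, "duration_minutes": 8}] * 10_000
count, lost = unbillable_impact(calls)
print(count, "unbillable calls, est. revenue at risk:", lost)
```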
