Does your data have a passport?
A passport does many things – it establishes unique identity, origin, prior destinations, and ultimately it helps determine entry to the destination. For a traveler, it is the primary document to establish trust.
Only 1 in 3 business executives trust the information used to make decisions. 1 in 3. Many research papers cite data growth as a challenge, but perhaps the greater challenge is to understand and trust the growing number of data sources. The growth of sources, points of integration, and complexity further drives the need for a “data passport”. Does your data have a valid passport?
Here are 4 questions of trust that a data passport will answer.
1 – Where did this data come from?
The data passport must identify origin, one of the pillars of trust. Business users may trust an address from a billing system over a marketing system. You’d think this would be easy – you could simply ask where the data came from in a certain report. But … who would you ask? IT? And is IT really staffed to handle “passport questions”? Think about a real passport; if a customs officer had to call your country’s consulate to verify your identity, how efficient would that be? Just like a real traveler, data needs to carry its passport with it.
2 – Where has this data been?
Like a real passport, a ‘data passport’ needs to have stamps to indicate where the data has been on its trip. In other words, to document the transformations (standardization, verification, matching) and combinations (combining data from system A with system B) that have occurred since the data left the point of origin. This is another pillar of trust – if the data has been standardized and verified, the business user will trust it more.
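To make the "stamps" idea concrete, here is a minimal Python sketch of a passport that records a point of origin and collects a stamp for each transformation. It is only an illustration: the class names (`DataPassport`, `Stamp`) and the system name `quality_hub` are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Stamp:
    """One stop on the data's trip: a transformation or a combination."""
    step: str      # e.g. "standardization", "verification", "matching"
    system: str    # hypothetical name of the system that applied the step
    at: datetime   # when the step was applied (UTC)


@dataclass
class DataPassport:
    """Metadata that travels with a data value (illustrative only)."""
    origin: str                                   # source system of the value
    stamps: list[Stamp] = field(default_factory=list)

    def stamp(self, step: str, system: str) -> None:
        """Record one more stop since the data left its point of origin."""
        self.stamps.append(Stamp(step, system, datetime.now(timezone.utc)))

    def itinerary(self) -> list[str]:
        """Origin followed by each step, in the order it was applied."""
        return [self.origin] + [f"{s.step}@{s.system}" for s in self.stamps]


passport = DataPassport(origin="billing")
passport.stamp("standardization", "quality_hub")
passport.stamp("verification", "quality_hub")
print(passport.itinerary())
# ['billing', 'standardization@quality_hub', 'verification@quality_hub']
```

Because the stamps live on the passport itself, anyone holding the data can read its history without calling the "consulate".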
3 – What is the data profile?
Real passports have notes, and customs officials also maintain notes on an individual. Have they previously not claimed goods? Have they travelled to watch-list countries? The same concept applies to data. Establish a quality profile for each point of origin (source system). Source profiles are a pillar of trust, and help you combine data from multiple sources (survivorship); when the same data comes from more than one source, which one do you trust?
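Survivorship can be sketched in a few lines of Python. The example below assumes each source system has already been given a quality score between 0 and 1; the scores, the source names, and the sample addresses are all made up for illustration, and real survivorship rules are usually richer than "highest score wins".

```python
from typing import Optional

# Hypothetical quality scores per source system (0.0 to 1.0),
# such as a profiling step might produce.
SOURCE_PROFILES = {"billing": 0.95, "marketing": 0.70}


def survive(candidates: dict[str, Optional[str]],
            profiles: dict[str, float]) -> tuple[str, str]:
    """Return (source, value) from the highest-scoring source that has a value."""
    # Ignore sources that supplied nothing (None or empty string).
    usable = {src: val for src, val in candidates.items() if val}
    if not usable:
        raise ValueError("no source supplied a value")
    # Unknown sources get a score of 0.0, so profiled sources win over them.
    best = max(usable, key=lambda src: profiles.get(src, 0.0))
    return best, usable[best]


# Made-up sample data: the same customer address from two systems.
addresses = {"billing": "12 King St W, Toronto", "marketing": "12 King Street"}
print(survive(addresses, SOURCE_PROFILES))
# ('billing', '12 King St W, Toronto')
```

Returning the winning source along with the value keeps the answer to "which one do you trust?" visible to whoever uses the result.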
4 – Admitted for entry?
Ultimately a passport helps customs determine who may enter a country. The same is true of a data passport – it determines which data should be permitted for entry, and usage, in enterprise applications and data warehouses. But this is when the analogy breaks down. Once you enter a country, you put your passport away. Each citizen that you encounter determines whether they trust you without seeing your passport. But a data passport has a life after ‘crossing the border’ – the data passport should be surfaced to each and every business user to foster trust. When a business user is reading a report, running an analytic query, or viewing data in an enterprise application, they should be able to access the data passport. Business people want to know where the data came from and what happened to it. Trust isn’t established at delivery. Trust is established during usage, when a business user accesses the data and decides whether they trust it. That is why a data passport needs to be a living document – accessible on demand to end users to establish trust.
A data passport may be an interesting analogy, but is it anything more than a concept (that requires a lot of manual work)? It is. Technology can help you manage the data passport and improve trust in information. Data integration and quality technology discovers, profiles, cleanses, and delivers data. With integrated metadata management, it can record the point of origin, the transformations and combinations, the glossary of business terms, and the delivery of data. It can also surface that ‘data passport’ via service APIs (SOA), so that business users can consume it on demand.
A data passport is the foundation for information governance. If the fundamental promise of governance is “Business users will trust information”, then a data passport, or enterprise metadata management, is a mandatory requirement for any governance initiative.
Join me on an upcoming webcast when I will present with Gartner’s Eric Thoo on Data Integration Styles: Choosing an Approach to Match Your Requirements.
Register here http://bit.ly/KLhZ7J for the Gartner webcast on June 13
About David Corrigan
I’ve spent my entire career in industry applications and Information Management technology. I’ve always enjoyed solving abstract problems and that’s exactly how I view Information Management. I’ve helped hundreds of companies understand that they need a strong, underlying foundation for information management to achieve their business objectives. I’m currently responsible for product marketing for the IBM InfoSphere (Information Integration and Governance) and big data portfolios. I’m a frequent speaker at industry conferences and webcasts on both of those topics. Follow me on Twitter @DCorrigan or on LinkedIn at http://ca.linkedin.com/pub/david-corrigan/3/aa3/92. Aside from Information Management, some of my interests include photography, chess, writing, and soccer (TorontoFC and Manchester United fan), and I currently live in Toronto, Canada. The opinions expressed in this blog are mine, and not necessarily those of IBM.