Does your data have a passport?
A passport does many things – it establishes unique identity, origin, prior destinations, and ultimately it helps determine entry to the destination. For a traveler, it is the primary document to establish trust.
Only 1 in 3 business executives trust the information they use to make decisions. 1 in 3. Many research papers cite data growth as a challenge, but perhaps the greater challenge is to understand and trust the growing number of data sources. The growth of sources, points of integration, and complexity further drives the need for a “data passport”. Does your data have a valid passport?
Here are 4 questions of trust that a data passport will answer.
1 – Where did this data come from?
The data passport must identify origin, one of the pillars of trust. Business users may trust an address from a billing system over a marketing system. You’d think this would be easy – you could simply ask where the data came from in a certain report. But … who would you ask? IT? And is IT really staffed to handle “passport questions”? Think about a real passport; if a customs officer had to call your country’s consulate to verify your identity, how efficient would that be? Just like a real traveler, data needs to carry its passport with it.
2 – Where has this data been?
Like a real passport, a ‘data passport’ needs to have stamps to indicate where the data has been on its trip. In other words, it needs to document the transformations (standardization, verification, matching) and combinations (combining data from system A with system B) that have occurred since the data left the point of origin. This is another pillar of trust – if the data has been standardized and verified, the business user will trust it more.
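As a rough illustration of the first two questions, here is a minimal Python sketch of a value that carries its origin and collects a stamp for each transformation. The class names, field names, and the "billing" and "quality_hub" system names are all hypothetical, chosen only for this example; they are not taken from any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Stamp:
    """One stop on the data's trip: a transformation or a combination."""
    step: str    # e.g. "standardization", "verification", "matching"
    system: str  # hypothetical name of the system where the step ran
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class PassportedValue:
    """A value that carries its passport with it (illustrative only)."""
    value: str
    origin: str  # point of origin, e.g. the source system name
    stamps: list[Stamp] = field(default_factory=list)

    def stamp(self, step, system, new_value=None):
        # Record the step, and update the value if the step changed it.
        self.stamps.append(Stamp(step, system))
        if new_value is not None:
            self.value = new_value
        return self

    def itinerary(self):
        # Origin first, then every stop in the order it happened.
        return [self.origin] + [f"{s.step}@{s.system}" for s in self.stamps]


address = PassportedValue("12 main st", origin="billing")
address.stamp("standardization", "quality_hub", new_value="12 Main St")
address.stamp("verification", "quality_hub")
print(address.itinerary())
# ['billing', 'standardization@quality_hub', 'verification@quality_hub']
```

The point of the design is that the origin and the stamps live on the value itself, so nobody has to "call the consulate" to find out where it came from.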
3 – What is the data profile?
Real passports have notes, and customs officials also maintain notes on an individual. Have they previously failed to declare goods? Have they travelled to watch-list countries? The same concept applies to data. Establish a quality profile for each point of origin (source system). Source profiles are a pillar of trust, and they help you combine data from multiple sources (survivorship): when the same data comes from more than one source, which one do you trust?
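One simple way to picture survivorship is to give each source system a quality score and keep the value from the highest-scoring source. The sketch below assumes a single hypothetical score between 0.0 and 1.0 per source; real profiles are usually richer (completeness, validity, and freshness per attribute), and the scores here are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceProfile:
    name: str
    quality_score: float  # hypothetical score: 0.0 (untrusted) to 1.0 (fully trusted)


def survive(candidates, profiles):
    """Return (source, value) from the most trusted source, or None.

    candidates: dict mapping source name -> value for the same attribute
    profiles:   dict mapping source name -> SourceProfile
    """
    # Ignore empty values and sources that have no profile on record.
    known = {s: v for s, v in candidates.items() if v and s in profiles}
    if not known:
        return None
    best = max(known, key=lambda s: profiles[s].quality_score)
    return best, known[best]


profiles = {
    "billing": SourceProfile("billing", 0.9),
    "marketing": SourceProfile("marketing", 0.6),
}
candidates = {"billing": "12 Main St", "marketing": "12 Main Street, Apt 4"}
print(survive(candidates, profiles))
# ('billing', '12 Main St')
```

Returning the winning source alongside the value keeps the origin attached to the surviving data, so the choice can be explained later.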
4 –Admitted for entry?
Ultimately a passport helps customs determine who may enter a country. The same is true of a data passport – it determines which data should be permitted for entry, and usage, in enterprise applications and data warehouses. But this is when the analogy breaks down. Once you enter a country, you put your passport away. Each citizen that you encounter determines whether they trust you without seeing your passport. But a data passport has a life after ‘crossing the border’ – the data passport should be surfaced to each and every business user to foster trust. When a business user is reading a report, running an analytic query, or viewing data in an enterprise application, they should be able to access the data passport. Business people want to know where the data came from and what happened to it. Trust isn’t established at delivery. Trust is established during usage, when a business user accesses the data and decides whether they trust it. That is why a data passport needs to be a living document – accessible on demand to end users to establish trust.
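The entry decision can be sketched as a small policy check that also returns its reason, so that the explanation can be shown to a business user later. The required stamps and the dictionary layout below are assumptions made for this example, not a prescribed format.

```python
# Hypothetical admission policy: the stamps a value must carry before it
# may enter the warehouse or an enterprise application.
REQUIRED_STAMPS = {"standardization", "verification"}


def admit(passport: dict) -> tuple[bool, str]:
    """Decide whether data may enter, and say why.

    passport: dict with an "origin" string and a "stamps" list of step names.
    """
    if not passport.get("origin"):
        return False, "no origin recorded"
    missing = REQUIRED_STAMPS - set(passport.get("stamps", []))
    if missing:
        return False, "missing stamps: " + ", ".join(sorted(missing))
    return True, "admitted"


verified = {"origin": "billing", "stamps": ["standardization", "verification"]}
unverified = {"origin": "marketing", "stamps": ["standardization"]}

print(admit(verified))    # (True, 'admitted')
print(admit(unverified))  # (False, 'missing stamps: verification')
```

Unlike a traveler's passport, the passport dictionary here would stay attached to the record after admission, so a report or application can display it on demand.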
A data passport may be an interesting analogy, but is it anything more than a concept (that requires a lot of manual work)? It is. Technology can help you manage the data passport and improve trust in information. Data integration and quality technology discovers, profiles, cleanses, and delivers data. With integrated metadata management, it can record the point of origin, the transformations and combinations, the glossary of business terms, and the delivery of data. It can also surface that ‘data passport’ via services APIs (SOA), so that it may be consumed by business users on demand.
A data passport is the foundation for information governance. If the fundamental promise of governance is “Business users will trust information”, then a data passport, or enterprise metadata management, is a mandatory requirement for any governance initiative.
Join me on an upcoming webcast, where I will present with Gartner’s Eric Thoo on Data Integration Styles: Choosing an Approach to Match Your Requirements.
Register here http://bit.ly/KLhZ7J for the Gartner webcast on June 13
Tags: Data governance, data lineage, data passport, data quality, data trust, etl, Information governance, metadata, metadata management
About David Corrigan
I’ve spent my entire career in industry applications and Information Management technology. I’ve always enjoyed solving abstract problems and that’s exactly how I view Information Management. I’ve helped hundreds of companies understand that they need a strong, underlying foundation for information management to achieve their business objectives. I’m currently responsible for product marketing for the IBM InfoSphere (Information Integration and Governance) and big data portfolios. I’m a frequent speaker at industry conferences and webcasts on both of those topics. Follow me on Twitter @DCorrigan or on LinkedIn at http://ca.linkedin.com/pub/david-corrigan/3/aa3/92. Aside from Information Management, some of my interests include photography, chess, writing, and soccer (TorontoFC and Manchester United fan), and I currently live in Toronto, Canada. The opinions expressed in this blog are mine, and not necessarily those of IBM.
2 Responses to “Does your data have a passport?”
David, I would call out an important implication of your reflection and insights: getting data to where it needs to go is not enough; there is a critical need to ensure fitness for purpose. There is also synergy between delivering data really well (solid data integration) and delivering data that the business can trust (high-quality data). Organizations may choose, at their own risk, to pursue their data integration activities without a focus on data quality. From my interactions with enterprises over the past year, I see increasing interest in and recognition of the importance of data quality, and an expanding focus and strategy for data delivery that includes data quality competence. On the other hand, where there is poor synergy between data integration and data quality, the pain associated with wrong decisions and poor productivity continues: the battles over whose numbers are correct, and the significant effort required to work around incomplete and inaccurate data. Many end users already use transformation logic and business rules from their data integration tools to mimic data quality functions such as matching, cleansing, and standardization. Data governance will increasingly become a required focus in various data integration efforts.
Hi
I totally agree with this article. I have worked on several large data migration projects over the years, all in the financial industry. At the moment I am acting as a Data Quality Architect on a Solvency II project, and we are using the complete suite of InfoSphere tools: FastTrack, Business Glossary, Information Analyzer, DataStage, and QualityStage.
I am also responsible for developing Java tools using the HTTP API and the REST API to integrate the content and results from InfoSphere Analyzer and Business Glossary. Trying to establish the accurate data lineage (history trail) of the data before we receive it requires a lot of effort and time interacting with data producers and data consumers.
If the data had sufficient embedded metadata, containing the detail of where it originated and every time it had been changed, exported, or imported, that would have been fantastic. It would also have been considerably easier to implement some of the current mandatory requirements.