In order to forget something, first you need to remember it. That simple premise will cause organizations a great deal of pain as consumer privacy legislation takes effect.
Concern about consumer data privacy is at an all-time high: 70% of Europeans are concerned about the reuse of their personal data, and 86% of Americans are concerned about data collection from internet browsing and how it is used to generate personalized banner advertisements. Their primary concern is how that data may be used for other purposes, or packaged and resold to other entities. With data breaches and issues such as the NSA’s collection of private data making headlines each week, it’s no wonder that consumer sensitivity is heightened.
This will present a very large problem for companies, because lawmakers are starting to take action. The European Union announced changes to the 1995 Data Protection Directive to take effect starting in 2014. It contains one very logical and innocent-looking provision – “the right to be forgotten” – which means that upon request from a consumer, an organization must delete all of their personal data. That sounds simple. It’s actually a wildly complex problem, because of the premise above – you cannot forget what you cannot remember. And most organizations aren’t particularly good at remembering their customers.
Forrester Research, “EU Regulations And Public Opinion Shift The Scope Of Data Governance,” by Henry Peyret, October 17, 2013.
Susan E. Gindin, “Perfect Storm For Behavioral Advertising: How The Confluence Of Four Events In 2009 May Hasten Legislation (And What This Means For Companies Which Use Behavioral Advertising).”
Protecting and securing sensitive big data is necessary before that data can be shared for new forms of analysis. Before the owners of the data will share it (yes, political silos still exist, and yes, individuals still feel they own data and can say no to sharing it), they want to ensure it is adequately protected – especially if they are the ones in the cross-hairs if that data is misused.
At the Data Governance Financial Services Conference last week in New York, I spoke on the issue of Confidence in Big Data. And boy, did that topic ever resonate with the audience. I spoke with a Chief Data Officer who said confidence was really the main issue she deals with – governance is all about confidently ensuring that her business users trust and protect their information. A head of governance approached me to discuss confidence in customer data; they were struggling to ensure they were confident in accurately identifying customers and households as the basis for big data analytics. There were a lot of common themes that came out of my discussions – customer data and big data, rapid integration of new data and business user self-service, how to visually display data confidence to business audiences – but one issue dominated the conversations: privacy and security.
Ensuring privacy and security for big data, or any data for that matter, is always a top concern. Why? Well, someone might go to jail if sensitive data is exposed. Or face compliance fines. That’s always a compelling reason to act. But I heard something different at this conference. One Chief Data Officer described it this way – “Imagine you want to buy a new car and safety and security are your top concerns. Ten years ago you could always decide to add a security device or alarm after you bought the car. But now, you want a system integrated with the ignition. And for safety you want front and side curtain airbags – you’re never going to install those after the fact. So the issue becomes a non-starter – you’ll only buy a car with the features already integrated. The same thing is happening at our firm. Security is a prerequisite for big data. If we can ensure data security for sensitive information, that project will be approved over one that lacks security. It’s a non-starter for big data and analytics – no security, no data.”
That certainly makes sense. Data security is as fundamental to sharing big data for new analysis as policing is to a healthy and thriving society and economy – it’s a fundamental prerequisite. And it offers an interesting twist on the reason to worry about privacy and security. If you want to share big data freely and combine it in new and interesting ways in new technologies such as Hadoop or NoSQL, then you need to ensure it is protected. Big data is by definition sensitive data – it’s important information about your customers, your products, your suppliers. That data must be masked when it’s appropriate to do so (a good rule of thumb: if the actual data value isn’t relevant for the analysis, mask it). And it must be monitored to ensure that internal users aren’t accessing it inappropriately.
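To make that rule of thumb concrete, here is a minimal sketch in Python – the record, field names, and hashing scheme are all hypothetical illustrations, not any particular product’s masking implementation. Fields the analysis actually needs pass through; everything else is replaced with an irreversible but consistent token:

```python
import hashlib

def mask_value(value: str, salt: str = "analysis-salt") -> str:
    """Replace a sensitive value with a consistent, irreversible token."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

def mask_record(record: dict, analysis_fields: set) -> dict:
    """Mask every field whose actual value is not needed for the analysis."""
    return {k: (v if k in analysis_fields else mask_value(str(v)))
            for k, v in record.items()}

# Hypothetical customer record: only age band and region matter for the
# analysis, so the direct identifiers are masked before the data is shared.
customer = {"name": "Jane Doe", "ssn": "123-45-6789",
            "age_band": "35-44", "region": "Northeast"}
shared = mask_record(customer, analysis_fields={"age_band", "region"})
```

Because the same input always yields the same token, masked records can still be joined and grouped for analysis without exposing the underlying values.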
Before embarking on a new big data and analytics project, make sure you’ve taken care of the fundamentals. Make sure you can adequately protect and secure sensitive data before you ask a data owner to share it.
For tips on how to protect and secure big data, check out this ebook – Top Tips for Securing Big Data Environments
Confidence in big data is highly variable. Some data sources have inherent uncertainty. So why shouldn’t you spend as much time as needed to make big data perfect? Time. You simply don’t have enough time to sort out every data irregularity, every ambiguity, every incomplete attribute. And for many big data use cases, you don’t need to. That’s why perfect is the enemy of good. In the era of big data, governance has evolved to first diagnose the usage, then prescribe the appropriate amount of governance. So the objective is not to make it perfect for every possible usage up front, it’s to make it good enough for the use case at hand.
Tony Baer of Ovum explains this in more detail in his blog post here – http://bit.ly/18bDetn.
For more information on building big data confidence check out IBM Big Data Hub
Confidence in big data is essential. Without confidence, decision-makers may not act on big data insights – which would completely negate the benefits of big data in the first place. Understanding the level of confidence in big data, and selectively improving confidence to the required level, ensures the successful adoption of big data and analytics.
Last week IBM launched new innovations in Information Integration and Governance at an event titled “Building Confidence in Big Data”. The term “confidence” really resonated with the clients, analysts, and press in attendance. It’s a business issue that organizations are struggling with – and their ability to understand and improve confidence is directly related to their ability to leverage big data.
One of the speakers at last week’s event was Michele Goetz from Forrester. She unveiled new research strongly linking big data success with the presence of mature information integration and governance (IIG). Organizations with mature IIG technology and practices were far more likely to be doing big data projects, and also more likely to be successful with them. One of the interesting aspects uncovered is the notion that big data is governed in ‘zones’. Certain types of big data require certain types of governance. And specific big data use cases have specific requirements for governance. This changes the old notion of governing the data once and then using it; the new approach is to understand the usage and the data, then govern to the appropriate level – that is Agile Governance. There are many other interesting conclusions and also recommendations based on surveys of hundreds of organizations, which you may download here: http://ibm.co/17DNTvS
This is an important topic in the big data market, and we plan on continuing the conversation on big data confidence. Our next conversation will be Tuesday September 17, on a webcast that Michele and I will host, entitled “Building Confidence in Big Data with Information Integration and Governance”. I hope you can join us tomorrow at 2 PM EST – you can register here – http://bit.ly/19p2WgI
Willingness to act is directly related to confidence. Low levels of confidence lead to mitigating risk in another way – by not acting boldly. For example, a chief marketing officer has a report on his desk. The marketing and data scientist team analyzed big data sources to identify new sub-segments and life event triggers for purchase decisions. But he immediately questions the conclusion based on the data that was used. Where did it come from? How did they verify social media and external data and link it to customer records? And without answers to these questions of confidence, he is faced with a decision – should he act on this insight? He might opt to take less risk – instead of investing $500,000 in a marketing campaign, he might invest only $25,000 to test it out. The result? Timidity, lost time, and wasted opportunity. Confidence impacts the level of investment and return from big data.
As a business user, there are important requirements for confidence. First, a baseline must be established. What is the current confidence level in various sources of big data? Understanding is the basis for improving confidence, and in a lot of cases simply understanding and communicating (i.e., not improving) confidence is a huge improvement in making decisions with ‘your eyes wide open’. Second, data confidence needs to be improved – selectively. Certain big data use cases will need a higher level of data confidence (read: governance) than others. The key is to identify the usage, then apply the appropriate level of governance. Out with the old model (‘make the data perfect and then share it for various purposes’) and in with the new (‘understand how data is being used and make appropriate improvements’). Third, you need to communicate confidence. It can’t be something that is invisible to the business user. And that’s why confidence needs to be an open book – accessible when and where a business user needs it to determine whether they will take action.
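As a toy illustration of establishing that baseline – the metric, source names, and required fields below are my own hypothetical choices, not a product feature – a first-cut confidence score per big data source could be as simple as field completeness, computed and then communicated alongside the data:

```python
def completeness(records, required_fields):
    """Fraction of required fields that are populated (truthy) across records."""
    total = len(records) * len(required_fields)
    filled = sum(1 for r in records for f in required_fields if r.get(f))
    return filled / total if total else 0.0

# Hypothetical sources: an internal CRM and an external social media feed
sources = {
    "crm":    [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": None}],
    "social": [{"id": 1, "email": None},      {"id": 2, "email": None}],
}
baseline = {name: completeness(recs, ["id", "email"])
            for name, recs in sources.items()}
# A business user sees, e.g., crm at 0.75 and social at 0.5 – eyes wide open
```

The point is not the particular metric; it is that the score is visible at decision time, so the level of governance to apply can be matched to the use case rather than applied uniformly.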
IBM made exciting announcements for its InfoSphere IIG portfolio this past week at an event called “Building Confidence in Big Data.” Automated integration with Data Click ensures that data users can access and move data when and where it’s needed with just two clicks. And as that data is integrated, it can also be matched and mastered with Big Match – MDM matching running at a big data scale.
Visual context enables business users to leverage and understand confidence in their data. The Information Governance dashboard helps to visualize confidence by showing status on governance metric KPIs. Big Data Catalogue profiles metadata from a wider variety of big data sources to help data users find and utilize big data rapidly.
Agile Governance is about applying the appropriate level of governance for the use case at hand. Big data privacy and security monitors and masks big data in a wider variety of Hadoop, NoSQL, and relational systems, delivering a single security solution for the big data environment. MDM for Big Data joins MDM and InfoSphere Data Explorer, to provide an extended and dynamic complete view of important business entities.
You will hear much more about these announcements at the upcoming Information on Demand Conference Nov 3 – 7 in Las Vegas. IBM will feature a number of demonstrations of the capabilities above, as well as other solutions and future capabilities, in the InfoSphere demo room.
Confidence is clearly a topic that resonates with the market. It’s about understanding current levels of confidence. It’s about acting with greater certainty. It’s about making bigger bets. Ultimately, it’s about acting on big data insights.
To learn more about Big Data Confidence, click here: www.ibm.com/software/data/information-integration-governance
I’ll be speaking with Eric Thoo of Gartner on a webcast on June 13 entitled “Data Integration Styles: Choosing an Approach to Match your requirements.” Click here to register – http://bit.ly/KLhZ7J
In the webcast, we will go into detail on three styles of integration: bulk data movement, real-time, and federation. Bulk data integration involves the extraction, transformation, and loading of data from multiple sources to one or more target databases. One of the key capabilities of bulk integration is extreme performance and parallel processing. Batch windows continue to shrink and data volumes continue to grow; and the new wave of big data puts even more emphasis on batch integration performance. Real-time integration involves replication and low-latency integration. It is often used to synchronize operational databases and to power real-time reporting and analysis. Federation is a completely different approach – it leaves data in place and allows users to access it via federated queries. This style of integration is very important for operational systems and it is a cost-efficient complement to batch integration – only move what is necessary, leave the rest in place and access it as required. In the webcast Eric Thoo will provide details on each style and the uses for each.
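The contrast between the bulk and federation styles can be sketched in a few lines of Python – using SQLite as a stand-in for real enterprise sources, with hypothetical table names, purely for illustration. Bulk integration extracts, transforms, and loads rows into a separate target; federation attaches the source and queries it in place:

```python
import os
import sqlite3
import tempfile

# A "source" system (file-backed so a second connection can reach it)
src_path = os.path.join(tempfile.mkdtemp(), "orders.db")
src = sqlite3.connect(src_path)
src.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
src.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
src.commit()

# Bulk style: extract from the source, transform, load into a target
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE order_facts (id INTEGER, amount_cents INTEGER)")
rows = src.execute("SELECT id, amount FROM orders").fetchall()      # extract
target.executemany("INSERT INTO order_facts VALUES (?, ?)",
                   [(i, int(a * 100)) for i, a in rows])            # transform + load

# Federation style: leave the data in place and query it where it lives
fed = sqlite3.connect(":memory:")
fed.execute("ATTACH DATABASE ? AS remote", (src_path,))
total = fed.execute("SELECT SUM(amount) FROM remote.orders").fetchone()[0]
```

The bulk path pays the cost of moving and reshaping every row up front; the federated path moves nothing and pays per query – which is exactly the trade-off behind “only move what is necessary, leave the rest in place.”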
These three styles of integration should not be independent and discrete from one another. They should share something in common – a foundation that establishes trust in information: one that profiles data quality, improves the accuracy and completeness of data, tracks its lineage, and exposes enterprise metadata to facilitate integration. Clients derive real value from a common approach to all three styles because they leverage a common foundation for information trust – common rules for data quality, metadata, lineage, and governance.
On the webcast we will explore the specific requirements for which each style is suited. If you look at the larger IT project, typically all three styles are required. For example, supplying trusted information to a data warehouse will require bulk data integration, but for specific reporting needs it may also need real-time integration, and potentially even federation to access other data sources. Building and managing a single view with MDM will again require bulk integration to populate MDM, real-time integration both to and from the MDM system, and federation to augment MDM’s business services to blend data stored within MDM and data stored in other source systems.
The common foundation of trust and governance, and the need to use multiple technologies are the keys to making a strategic choice of technology – one that you can leverage across the life of a project and into other projects that require integration.
Please join us on June 13, when we will share more details on this topic.
Register here http://bit.ly/KLhZ7J