3 Data Integration Technologies, 1 Common Foundation
I’ll be speaking with Eric Thoo of Gartner on a webcast on June 13 entitled “Data Integration Styles: Choosing an Approach to Match your requirements.” Click here to register – http://bit.ly/KLhZ7J
In the webcast, we will go into detail on three styles of integration: bulk data movement, real-time, and federation. Bulk data integration involves the extraction, transformation, and loading of data from multiple sources to one or more target databases. The key capabilities of bulk integration are extreme performance and parallel processing. Batch windows continue to shrink and data volumes continue to grow, and the new wave of big data puts even more emphasis on batch integration performance. Real-time integration involves replication and low-latency integration. It is often used to synchronize operational databases and to power real-time reporting and analysis. Federation is a completely different approach – it leaves data in place and allows users to access it via federated queries. This style of integration is very important for operational systems, and it is a cost-efficient complement to batch integration – only move what is necessary, leave the rest in place, and access it as required. In the webcast Eric Thoo will provide details on each style and the uses for each.
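To make the difference between federation and bulk movement concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is only an analogy: two in-memory SQLite databases stand in for separate source systems, and the table names (orders, customers, customers_copy) and the sample rows are invented for illustration. It does not represent any particular integration product.

```python
import sqlite3

# Two in-memory databases stand in for separate systems (an assumption for
# illustration): "main" plays the warehouse, "crm" plays a source system.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")

conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO main.orders VALUES (?, ?)",
                 [(1, 40.0), (1, 10.0), (2, 25.0)])
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])

# Federation style: leave customer data in place and join across databases.
federated = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM crm.customers AS c
    JOIN main.orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Bulk style: copy the source table into the target, then query the copy.
conn.execute("CREATE TABLE main.customers_copy AS SELECT * FROM crm.customers")
bulk = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM main.customers_copy AS c
    JOIN main.orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(federated)
print(bulk)
```

Both queries return the same totals. The difference is where the customer data lives when the query runs: in the source system for the federated query, in a copy for the bulk approach.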
These three styles of integration should not be independent and discrete from one another. They should share something in common – a foundation that establishes trust in information. A foundation that profiles data quality, improves the accuracy and completeness of data, tracks its lineage, and exposes enterprise metadata to facilitate integration. Clients derive real value from a common approach to all three styles because they leverage a common foundation for information trust – common rules for data quality, metadata, lineage, and governance.
On the webcast we will explore the specific requirements for which each style is suited. If you look at the larger IT project, typically all three styles are required. For example, supplying trusted information to a data warehouse will require bulk data integration, but for specific reporting needs it may also need real-time integration, and potentially even federation to access other data sources. Building and managing a single view with MDM will again require bulk integration to populate MDM, real-time integration both to and from the MDM system, and federation to augment MDM’s business services to blend data stored within MDM and data stored in other source systems.
The common foundation of trust and governance, and the need to use multiple technologies are the keys to making a strategic choice of technology – one that you can leverage across the life of a project and into other projects that require integration.
Please join us on June 13, when we will share more details on this topic.
Register here http://bit.ly/KLhZ7J
Two Big Data Podcasts
I recently sat down and recorded a pair of podcasts about big data. In the first I discuss the definition of big data and how new big data technology can yield new insights – listen here – http://ibm.co/Lxuals. In the second I cover the requirements for a big data platform to build a new class of analytic applications – listen here – http://ibm.co/NFUxHr.
Questions from the Market – “How can I reduce the cost of data?”
I’m starting an ongoing series that will be based on questions I’ve been asked when speaking at the Big Data and Information Governance forums. The first question I’ll cover is “How can you control the cost of data growth?”
Is your company an “Information Hoarder”? If so, answer the five questions below and you’ll be able to reduce the cost of your data.
1 – Do you need the data that you have?
The first step is to determine whether you need all of the data in your production systems. You might be surprised by the answer. Often, up to 85% of data in production systems is old and not used. Information lifecycle management software will manage the lifecycle (including the expiry date) of your information, as well as archive it to lower-cost storage alternatives. The cost savings are immediate and significant.
2 – Do you know the expiry date for your data?
All data has an expiry date. Or at least it should. Most organizations do not determine the expiry date for their data. The result? They keep data indefinitely. There’s a simple rule on the TV show Hoarders – if you haven’t worn an item of clothing in the past year, throw it away. The same rule applies to data. If you haven’t accessed it recently, consider deleting or archiving it.
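The "haven't used it in a year" rule is easy to sketch in code. The example below is a minimal illustration, not a feature of any lifecycle management product: the Record class, the split_by_retention function, and the 365-day default are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    record_id: str        # hypothetical identifier
    last_accessed: date   # when the record was last read or updated


def split_by_retention(records, today, max_idle_days=365):
    """Split records into (keep, stale) using the last access date."""
    cutoff = today - timedelta(days=max_idle_days)
    keep = [r for r in records if r.last_accessed >= cutoff]
    stale = [r for r in records if r.last_accessed < cutoff]
    return keep, stale


records = [
    Record("order-1", date(2012, 5, 1)),
    Record("order-2", date(2010, 1, 15)),
]
keep, stale = split_by_retention(records, today=date(2012, 6, 1))
print([r.record_id for r in stale])  # candidates for archiving or deletion
```

In practice the cutoff would come from a retention policy for each type of data rather than a single number, but the principle is the same: make the expiry rule explicit and apply it.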
3 – What is the cost of managing your data?
Every database has a cost. Do you know the cost of yours? There are several published performance benchmarks for relational database software. The performance of the database is directly correlated to cost. Migrating to another database may be easier than you think, as some vendors have invested in portability and migration capabilities.
4 – The quality of data carries a cost
The poorer the quality, the higher the cost. Seems like an obvious relationship, doesn’t it? But not always. Cleaning address data will correct downstream errors, which reduces return mail costs and can earn postage discounts. Aside from the direct cost savings, consider the indirect cost of inefficiency. How much time do your employees spend investigating and correcting data errors?
5 – Unifying fragmented data reduces cost
In the typical organization, many data entities are fragmented across dozens of systems. Customers, products, locations, suppliers, to name just a few. The fragmentation drives a cost that is not easy to detect. It’s the cost of your employees manually searching for data in multiple systems, the cost of duplicating data entry in multiple systems, and the cost of the inevitable errors that result from fragmentation and duplication of effort. Unifying data into one master data system often has far reaching cost savings implications.
There are many ways to reduce the cost of your data. If you are looking for “the low hanging fruit” and a fast ROI, then take a look at the data you have today and determine whether you need all of it. Likely the answer will be “no”, and information lifecycle management software will help you realize an immediate cost savings.
What is the Big Deal about MDM + Big Data?
Mark Beyer of Gartner spoke about the emerging relationship between Big Data and MDM at the IBM Big Data, Integration and Governance Forum. His session had a provocative title – “Big MDM?” – i.e., is MDM big data? Several interesting discussions with the audience emerged.
He highlighted the two major use cases for MDM and Big Data integration – extracting master data from big data, and using MDM as a ‘starting point’ as you mine big data (see this recent blog post – http://bit.ly/HOJJFI). Organizations could potentially accelerate their initial MDM implementations by extracting master data from previously untapped big data sources. For example, a company may want to analyze SEC filing documents for risk exposure, to understand their organization customers, their financial health, and key individuals at those companies. Implicit is the notion of master data, to determine unique records for organizations and people, and the relationships among them. The danger in big data projects lies in not recognizing the requirement for MDM – and treating data quality, matching, and storing unique records as a “one off” tactical task. It isn’t.
The second use case features MDM as a starting point, or as Mark described it, a “search index” for big data. Start with master data concepts and then analyze new sources of data for specific master data records. Don’t analyze all customers, analyze the most valuable ones. Don’t analyze all of your products, analyze the most profitable ones. This may initially be expressed entirely as an “analytics” requirement from business owners. I recently visited with the CIO of an entertainment and betting company whose CEO set a direction to “analyze social media to understand potential online bets their customers might make.” Wait a second – what does that mean? Which customers? What constitutes a “betting event”? And how will you respond in time to capture that opportunity? That company realized they didn’t have the answer to the first and most fundamental question – who are their customers? There’s no point in analyzing all available social media feeds and then determining who your customers are. There are 2 billion internet users globally. How many customers do you have – fewer than 2 billion? Doesn’t it make sense to start the other way around? Know what you’re looking for before you start looking.
In both scenarios, the initial big data analytics project may require master data, but many organizations do not realize it.
Here are 3 clues that indicate when you should integrate MDM and Big Data:
- You are searching and matching for the same entity types over and over – If your big data project requires you to know whether a social media blogger is a customer, and you will run this same determination every time an interesting social media post is detected, then you have a master data problem – you need to know your customers.
- You are performing targeted analysis, not an aggregate analysis – When you are looking for particular product feedback to respond to isolated incidents vs. general sentiment towards your brand, or you are looking for a particular customer’s multi-channel service experiences vs. tracking the general service levels, then you have a master data problem – you need to know specific customers and products in order to guide your big data analysis.
- You want to combine the analysis of multiple master data domains from new big data sources – If your big data use case involves matching multiple data domains and gleaning new insights from big data sources, you likely have an MDM requirement. For example, telecommunications companies are increasingly interested in mobility – understanding the location of mobile devices and the potential implications (selling new products, proactive service alerts, etc). In order to realize that use case, the telco will need to understand unique accounts, devices, customers, households, and locations. That’s a multi-domain MDM problem to be sure, and MDM can be a great starting point for big data analytics.
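The first clue, running the same "is this blogger a customer?" check over and over, can be sketched as a simple lookup against master data. This is a minimal illustration that assumes master records already carry known social media handles. The field names (customer_id, handles, author) and the exact-match rule are invented for the example; real matching would be probabilistic and far more involved.

```python
def index_by_handle(master_customers):
    """Build a lookup from normalized social handle to master customer ID."""
    index = {}
    for customer in master_customers:
        for handle in customer["handles"]:
            index[handle.lower().lstrip("@")] = customer["customer_id"]
    return index


def match_posts(posts, handle_index):
    """Keep only posts whose author resolves to a known master record."""
    matched = []
    for post in posts:
        key = post["author"].lower().lstrip("@")
        customer_id = handle_index.get(key)
        if customer_id is not None:
            matched.append({**post, "customer_id": customer_id})
    return matched


# Hypothetical sample data for illustration only.
master_customers = [
    {"customer_id": "C001", "handles": ["@JaneDoe"]},
    {"customer_id": "C002", "handles": ["sam_lee", "@SamLee"]},
]
posts = [
    {"author": "@janedoe", "text": "My new phone keeps dropping calls"},
    {"author": "@random_user", "text": "Nice weather today"},
]

matched = match_posts(posts, index_by_handle(master_customers))
print(matched)
```

The index is built once from master data and reused for every post. That is the sense in which MDM acts as a "search index" for big data: you start from the customers you know instead of analyzing every feed and working backwards.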
The answer to Mark’s provocative question – is there a “Big MDM”? – was no: there isn’t an MDM technology that stores all big data and there shouldn’t be – that’s what a big data platform is for. But there is absolutely a need to integrate the two, and it’s often overlooked. Big data analytics will certainly encompass the most important concepts within your organization – customers, products, prospects, accounts, locations, suppliers, among others. Those are all master data concepts – and therefore MDM is a good starting point for many big data projects.
Don’t reinvent the wheel. Make sure that big data leverages your existing enterprise technologies, MDM being just one of them.
Five Tips for Selling Big Data to your Business Sponsors
Big data is perhaps the hottest topic in IT right now. But do your business users know what it is?
When I ask business executives what big data means, I get a wide variety of responses, from “I don’t know” and “Bigger databases?” to “Social Media?”.
Here’s the best answer I’ve heard: “I don’t particularly care because it’s an IT technology. My job is to solve business problems.”
Exactly.
Business executives care about solving business problems. The big data market is born out of the technical community – innovations from technologists to address the growing volume, variety, and velocity of data. And predictably, the descriptions of big data technology are predominantly technical. If you want to move big data from a research war room to the boardroom, you need to illustrate big data as a business opportunity.
Based on conversations I’ve had over the past few months, here are five tips for selling big data inside your organization.
1 – Build a Business Case … A different type of business case
While this is an obvious step, there are some challenges with big data. This is a new market and there are few public proof points or metrics to leverage. So you’ll have to create much of it from scratch. A business case for big data could be exhaustive, so you should focus on a single problem and only a handful of metrics. Quantify four or five items, and simply list the rest. The qualitative points will sell themselves. You’ve really just scratched the surface with the metrics you measure, but it’s enough to prove there’s something worth pursuing.
2 – Evangelize Big Data … in business terms
More often than not, an IT person makes the case for big data technology. And IT people tend to add lots of details, usually very technical details, to explain why a new technology is necessary. Resist that temptation. Make an evangelization deck that is no more than 7 slides. Explain how your company will benefit from big data and the business opportunities it creates. Include a very simple slide on why big data technology is different from other technologies you already have, the business case, and next steps. Remember, the objective isn’t for you to pitch this deck; the objective is for business people to embrace it and include it in their plans. Make it business-friendly.
3 – Identify a sponsor … with some moxie
Here’s the challenge with big data technology – it can solve hundreds of business problems and you could identify any or all of your business executives as potential sponsors. So where do you start?
Start with a relevant and pressing business issue that clearly demonstrates the use of big data technology. Then, evaluate the business executive sponsor. You’re looking for someone dynamic, who understands the business and believes that technology can drive competitive advantage. Above all, you want someone who will be a change agent.
After identifying that individual, book a short meeting to review your evangelization deck. Convey your idea of how new technology can address one of his or her strategic priorities. If you’re successful, your evangelization deck will become his or her evangelization deck.
4 – Capture metrics … and use them to tell a story
Many projects identify as many metrics as possible for “shock and awe”. Resist that temptation. Instead, identify only a few metrics that you will measure. Focus the business community on tracking a handful of truly meaningful metrics from the business case. I’ve seen many successful programs gain momentum by measuring just one metric. The key is reporting on it often – so it becomes ingrained in the business community.
Even more important – tell a story. If you are using big data technology to improve fraud detection, tell one story of detecting a fraudulent incident that would have gone undetected previously. People will remember the stories long after they forget the numbers in your business case. Stories sell initiatives.
5 – Put a face on the big data opportunity
Business executives can’t see big data. And it’s hard to get passionate about abstract concepts. You may need to visualize the problem and the opportunity. Consider an internal demonstration of big data technology. If your first project is social media analysis, then do a demonstration to analyze your social media channels. If it’s multi-channel customer service analytics, then analyze your data and show what new results will occur. A picture is always worth a thousand words.
Emerging technologies don’t sell themselves. Big data technology has huge potential to change the way you do business. In order to drive adoption in your organization, you must explain and evangelize big data to your business community – in business terms.
“Is MDM Ready for Big Data?”
Last week, after I presented at the Gartner MDM Summit on MDM and big data, one person asked me whether MDM was ready for big data (meaning, is it ready to store large volumes of any variety of data). MDM isn’t meant to be a big data system – it will never store all social media data, transactional data, etc. MDM is meant to be an operational, structured repository of key enterprise data entities – customers, households, products, locations, and many others. But there will be more and more use cases that require MDM to integrate with big data technologies. In order for MDM to work with big data systems, there are several requirements for each.
Let’s start with the requirements for MDM. An MDM system must be able to store social media profiles and relate them to customers (e.g., Facebook and Twitter accounts stored on the master customer record). MDM must also be able to store profiles for any big data source that needs to be linked to a master record. Examples include account IDs to link transactional big data to customer and account records, and mobile device IDs to link mobile device data and real-time location data to a customer record, among others. The MDM system must also be able to store preferences for each big data source. Does a customer want you to analyze their tweets? Or their Facebook profile? MDM must track the customer’s preferences and consent for certain types of communication and interaction. MDM should support many-to-many relationships between customers and profiles – for example, a household is related to a single social media profile on a photo-sharing website (one SM profile for many customers who belong to a household). This enables MDM to effectively feed a big data application with relevant master data and big data links.
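As a rough illustration of these requirements, here is a minimal sketch of a data model that links master customers to source profiles many-to-many and tracks consent per source. All of the names (SourceProfile, MasterCustomer, ProfileLinks, the "photo_site" source) are hypothetical and chosen for the example. This is not the schema of any particular MDM product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceProfile:
    source: str        # e.g. "twitter", "photo_site", "mobile_device"
    external_id: str   # handle, account ID, or device ID in that source


@dataclass
class MasterCustomer:
    customer_id: str
    name: str
    consent: dict = field(default_factory=dict)  # source -> True/False


class ProfileLinks:
    """Many-to-many links between master customers and source profiles."""

    def __init__(self):
        self._by_customer = {}
        self._by_profile = {}

    def link(self, customer_id, profile):
        self._by_customer.setdefault(customer_id, set()).add(profile)
        self._by_profile.setdefault(profile, set()).add(customer_id)

    def customers_for(self, profile):
        return sorted(self._by_profile.get(profile, set()))

    def analyzable_profiles(self, customer):
        """Profiles the customer has consented to have analyzed."""
        linked = self._by_customer.get(customer.customer_id, set())
        allowed = [p for p in linked if customer.consent.get(p.source, False)]
        return sorted(allowed, key=lambda p: (p.source, p.external_id))


links = ProfileLinks()
alice = MasterCustomer("C001", "Alice",
                       consent={"photo_site": True, "twitter": False})
bob = MasterCustomer("C002", "Bob")  # no consent recorded

family_album = SourceProfile("photo_site", "smith-family")
alice_twitter = SourceProfile("twitter", "alice_s")

links.link("C001", family_album)   # one household profile ...
links.link("C002", family_album)   # ... shared by two customers
links.link("C001", alice_twitter)

print(links.customers_for(family_album))
print(links.analyzable_profiles(alice))
```

Consent defaults to "no" when nothing is recorded, so a big data application fed from this model would only ever receive profiles that a customer has explicitly allowed.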
MDM must also be able to store the output from big data analytics. Intent to purchase, next best action, customer churn alert flags, negative customer sentiment – these are all attributes that should be stored in MDM. Insights from big data should be available to multiple operational channels (for example, if you detect that a customer is angry with your company, then you want all channel personnel to know that fact, no matter which channel the customer interacts with). The MDM system should also have the capability to proactively detect events and send event notifications, triggering action in business applications and enterprise processes as necessary. MDM must be an active participant in big data analytics.
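Here is a minimal sketch of that idea: analytic insights stored against a master record, with a notification sent to subscribers whenever an insight changes. The InsightStore class, its method names, and the sample insight names are assumptions made for illustration, not a real MDM interface.

```python
_MISSING = object()  # sentinel so that a first-time value always counts as a change


class InsightStore:
    """Stores analytic insights per customer and notifies subscribers of changes."""

    def __init__(self):
        self._insights = {}
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callback taking (customer_id, name, value)."""
        self._subscribers.append(callback)

    def record_insight(self, customer_id, name, value):
        insights = self._insights.setdefault(customer_id, {})
        changed = insights.get(name, _MISSING) != value
        insights[name] = value
        if changed:
            for callback in self._subscribers:
                callback(customer_id, name, value)

    def insights_for(self, customer_id):
        """Return a copy of the insights, as any channel would read them."""
        return dict(self._insights.get(customer_id, {}))


store = InsightStore()
alerts = []
store.subscribe(lambda cid, name, value: alerts.append((cid, name, value)))

store.record_insight("C001", "sentiment", "negative")
store.record_insight("C001", "sentiment", "negative")  # unchanged: no new event
store.record_insight("C001", "churn_risk", True)

print(store.insights_for("C001"))
print(alerts)
```

Every channel reads the same insights from the master record, and the subscriber callbacks stand in for the business applications and processes that would be triggered by an event.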
There are implications for big data technology as well. The big data system must be able to interact with MDM. Whether persisting transactional data, or analyzing social media data, or analyzing streaming call detail data off a network – the big data system needs to understand the master view of customers and products. There’s no point in the big data system re-inventing the wheel and trying to determine unique records and identities. Therefore, big data applications need to be MDM-aware. They should obtain master data from MDM either in batch load or in real-time if necessary. This, of course, has a return implication for MDM – it must be capable of integration with big data systems via batch and possibly real-time SOA as required.
I’ve seen many use cases for MDM and Big Data working together – social media analytics to predict customer churn or intent to purchase, mobile network analytics to make real-time location-specific product offers, and multi-channel interaction analysis to predict and prevent customer churn. In future blog posts we’ll explore these use cases via customer scenarios.
