A step-by-step approach
Photo by Viktor Forgacs through Unsplash.
Data assets (or products) — a set of prepared data or information that is easily consumed for a set of identified use cases — are the hype in data management land. Being able to identify, build, and govern an individual data product is one thing, but how to go about this at an enterprise level? Where to start?
Data enablement leaders, and specifically Chief Data Officers, grapple with this mobilization challenge. In this point of view, we’ll discuss how one can take a portfolio approach to data assets. Figure 1 below presents the stepwise approach, and the remainder of this article will elaborate on the 7 steps. Throughout, we explain both the approach and methodology, mixing in examples as we go.
Figure 1 — 7-step approach to start managing a portfolio of data assets. Image by the author.
I have followed this approach in various real-life engagements. To avoid any suspicion that the data comes from a given client, and to show how generative AI can be put to practical use when prompted well, I’ve used ChatGPT 4.0 to generate the examples. The full chat is available here.
Step 1: Use Cases & Impact
The first step is to identify the data-driven use cases that matter for your organization. You don’t have to do this for the entire enterprise all at once — you can start with one domain or business line, and this might even be recommended.
Use cases are the specific mechanisms through which the overall organizational strategy can be implemented. Data strategy and data governance drive no value in and of themselves — they only do so to the extent that broader strategic goals are achieved. Hence, use cases must be the first step.
There are various ways to go about this. You can internally build an inventory of use cases by interviewing business and analytics leaders. For your sector, you can cobble together an overview of use cases from external sources. Most success is usually to be had with a hybrid approach — bring in an external list of use cases, and then refine this list with internal leaders.
As explained above, for the purposes of this article, I’ve used ChatGPT 4.0 to build the inventory, which is presented in Figure 2 below. For example, under Finance & Accounting, the use case fraud detection and prevention uses real-time analytics and machine learning models on a combination of customer and transaction data to recognize patterns and identify suspicious events. Or under Marketing & Sales, as part of marketing mix modeling, the historic relationship between marketing efforts and sales performance is investigated to optimize the allocation of marketing budgets and usage of channels and tactics.
Figure 2 — Overview of 90 data-driven use cases across 14 business and functional areas. Data generated by ChatGPT 4.0, image by the author.
Having the use cases is not enough — we need a sense of how important they are. There are 4 critical ways in which use cases can drive value:
Increase revenues
Reduce costs
Enhance customer experience
Mitigate risk
Some list “drive innovation” as a 5th value driver, but in my view that’s just a matter of timelines, because any innovation itself is to eventually drive value through the 4 above-mentioned mechanisms as well.
Figure 3 gives an overview of marketing-related use cases and the typical “top line impact” associated with them. For the use case of marketing mix modeling (“MMM”) introduced above, we see “1 to 2% top line impact.” If your company averages $1 billion in revenues, these estimates suggest that marketing mix modeling can drive $10–20 million on top of that.
Figure 3 — A set of marketing use cases with the impact they typically have on overall enterprise revenues. Source: Identifying data-driven use cases with a value driver tree (coauthored by the author).
At the end of step 1, you have a set of use cases alongside their estimated impact on the organization.
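The arithmetic behind such an estimate is simple enough to keep next to the use case inventory. Below is a minimal sketch, assuming a hypothetical `UseCase` record and `impact_range` helper (neither comes from the article); the only figures taken from the text are the 1 to 2% range for marketing mix modeling and the $1 billion revenue example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """One row of the use case inventory (hypothetical structure)."""
    name: str
    area: str
    impact_low_pct: float   # lower bound of top-line impact, in % of revenue
    impact_high_pct: float  # upper bound of top-line impact, in % of revenue


def impact_range(use_case: UseCase, annual_revenue: float) -> tuple[float, float]:
    """Return the (low, high) estimated incremental revenue for a use case."""
    low = annual_revenue * use_case.impact_low_pct / 100
    high = annual_revenue * use_case.impact_high_pct / 100
    return low, high


# Figures from the text: 1 to 2% top-line impact on $1 billion in revenues.
mmm = UseCase("Marketing mix modeling", "Marketing & Sales", 1.0, 2.0)
low, high = impact_range(mmm, 1_000_000_000)
print(f"{mmm.name}: ${low / 1e6:.0f}-{high / 1e6:.0f} million")
```

Keeping the low and high bounds separate, rather than storing a single midpoint, makes it easier to qualify the numbers later when they are rolled up per data source.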
Step 2: Required Data
In this step, we investigate what data is needed to power the identified use cases. The first step is to define what the critical data inputs are for the use cases. For example, for product line optimization under Operations, required data includes production volume data, machine performance logs, and raw material availability. Or for employee turnover prediction under Human Resources, data is required from employee satisfaction surveys, exit interview feedback, and industry turnover rates.
Once you have a partial or complete list of use cases, respective SMEs or process owners can help clarify what data is needed. As your list of critical data inputs grows, you will reach a point where you can start grouping the data into data types or domains. Within sectors, and to a lesser extent even across sectors, these data types and domains are actually quite stable. Data domains that are almost always applicable include Customer (or the equivalent of the customer, such as student, patient, or member), Employee, and Finance, as most organizations serve some group of people, have employees to do so, and need to manage their budgets. Some other domains, like Supply Chain or Research & Safety Data, are more specific and may only apply if the organization manages a physical supply chain of products and materials.
Figure 4 — Overview of data types and domains. Data generated by ChatGPT 4.0, image by the author.
Figure 4 above shows what the result could be. There, 12 data domains are presented with about 100 sub-domains. All the organization’s data can be mapped back to the types that are listed here. For example, Campaign Spend data under Marketing & Sales may include data on the initiatives and costs of digital advertising, traditional media campaigns, and sponsorships, and Sensor Data under Operational may include data from temperature sensors that are placed in storage areas and vibration sensors to monitor machinery health in factories.
Once you start to identify the critical data inputs for use cases, and to map those critical data inputs to data types or domains, you can start building a matrix as in Figure 5. Above, we had the example use case of product line optimization, which is mapped to the Operational data domain, as indeed it requires operational data. In Figure 5, use cases are mapped to the broader data domains so that it allows for visualization here, but in real life, you could (and should) map the use cases to the underlying, more granular sub-domains.
Figure 5 — Mapping of data-driven use cases against data types. Data generated by ChatGPT 4.0 and refined by the author; image by the author. Full resolution image available upon request.
A panoramic understanding of just this — the key use cases mapped against the data types they require as inputs — is already invaluable for developing a data strategy and prioritizing specific data domains… but we’re going to take this a lot further and make it even more actionable.
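A matrix like this does not need special tooling to get started. The sketch below, under the assumption that each use case is simply tagged with the data domains it needs, builds the use case by domain grid and counts how many use cases depend on each domain. The specific domain assignments are illustrative guesses, not the content of Figure 5.

```python
# Use case -> data domains it requires (illustrative assignments only).
required_domains = {
    "Product line optimization": {"Operational"},
    "Employee turnover prediction": {"Employee"},
    "Fraud detection and prevention": {"Customer", "Finance"},
    "Marketing mix modeling": {"Marketing & Sales", "Finance"},
}

# Column order of the matrix: every domain that is needed at least once.
domains = sorted({d for needed in required_domains.values() for d in needed})

# Row per use case, one boolean per domain column.
matrix = {
    use_case: [domain in needed for domain in domains]
    for use_case, needed in required_domains.items()
}

# How many use cases depend on each domain.
demand = {
    domain: sum(domain in needed for needed in required_domains.values())
    for domain in domains
}

for use_case, row in matrix.items():
    cells = " ".join("X" if cell else "." for cell in row)
    print(f"{use_case:<32} {cells}")
print(demand)
```

The same structure works one level down: replacing the domain names with sub-domains gives the more granular mapping recommended above.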
Step 3: Data Sources
Before we take the step to identify source systems based on the (logical) data requirements from Step 2, let’s take one group of use cases and assess the data they require. Figure 6 below shows an overview of Marketing and Sales use cases, and the critical data they rely upon. This is in line with what is shown in Figure 5, just at a higher level of granularity.
Figure 6 — Use cases for Marketing and Sales, and the critical data they require as inputs. Data generated by ChatGPT 4.0, image by the author.
For example, we see that the first use case, customer segmentation and targeting, needs data on Customer Demographics. For the company in question, that data is stored in a physical system called Global CRM Master. Similarly, the Purchase History data that the same use case needs is stored in two systems: E-commerce Transaction History and Retail Point of Sale System.
And so on and so forth. If we take all the critical data inputs from Figure 6 above and we identify the source systems, we get the table of Figure 7. As you can see, some data sources contain multiple types of critical data. For example, the Global CRM Master contains Customer Demographics, but also Customer Preferences, Customer Feedback, and Customer Segmentation Data.
Figure 7 — Critical data inputs for Marketing and Sales use cases mapped against the source systems of that data. Data generated by ChatGPT 4.0, image by the author.
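The lookup from critical data inputs to source systems can be inverted to see which systems carry several types of critical data. The sketch below uses only the mappings named in the text; the variable names are my own.

```python
from collections import defaultdict

# Critical data input -> source systems that hold it (as named in the text).
input_to_sources = {
    "Customer Demographics": ["Global CRM Master"],
    "Customer Preferences": ["Global CRM Master"],
    "Customer Feedback": ["Global CRM Master"],
    "Customer Segmentation Data": ["Global CRM Master"],
    "Purchase History": [
        "E-commerce Transaction History",
        "Retail Point of Sale System",
    ],
}

# Invert the mapping: source system -> data inputs it contains.
source_to_inputs = defaultdict(list)
for data_input, sources in input_to_sources.items():
    for source in sources:
        source_to_inputs[source].append(data_input)

# Sources that hold more than one type of critical data.
multi_purpose = {
    source: inputs
    for source, inputs in source_to_inputs.items()
    if len(inputs) > 1
}

for source, inputs in multi_purpose.items():
    print(f"{source}: {', '.join(inputs)}")
```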
Step 4: Use Cases vs Sources
We identified the required data for use cases (Step 2) and then mapped that against source systems (Step 3). The next view that can now be created is the mapping of use cases against source systems, which for Marketing and Sales is shown in Figure 8 below.
Figure 8 — Mapping use cases against source systems. Data generated by ChatGPT 4.0, image by the author.
Here, dark green signifies that the data is critical for the use case, and light green means that it is ‘nice-to-have’ or supporting. For example, for customer segmentation and targeting, data from the Global CRM Master is critical, but data from Social Media Analytics is ‘nice-to-have’.
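This view can be derived by chaining the two earlier mappings. The sketch below assumes one simple rule that the article does not spell out: a source counts as critical for a use case if it holds at least one critical data input, and as nice-to-have otherwise. The "Social Media Engagement" input is a hypothetical label added for illustration.

```python
CRITICAL = "critical"
NICE_TO_HAVE = "nice-to-have"

# Use case -> data inputs and how important each one is (Step 2).
use_case_inputs = {
    "Customer segmentation and targeting": {
        "Customer Demographics": CRITICAL,
        "Purchase History": CRITICAL,
        "Social Media Engagement": NICE_TO_HAVE,  # hypothetical input name
    },
}

# Data input -> source systems (Step 3).
input_to_sources = {
    "Customer Demographics": ["Global CRM Master"],
    "Purchase History": [
        "E-commerce Transaction History",
        "Retail Point of Sale System",
    ],
    "Social Media Engagement": ["Social Media Analytics"],
}


def map_use_cases_to_sources(use_case_inputs, input_to_sources):
    """Return use case -> {source: importance}, where critical always wins."""
    result = {}
    for use_case, inputs in use_case_inputs.items():
        levels = {}
        for data_input, level in inputs.items():
            for source in input_to_sources.get(data_input, []):
                # A critical input upgrades the source; a supporting input
                # never downgrades a source already marked as critical.
                if level == CRITICAL or source not in levels:
                    levels[source] = level
        result[use_case] = levels
    return result


use_case_sources = map_use_cases_to_sources(use_case_inputs, input_to_sources)
for source, level in use_case_sources["Customer segmentation and targeting"].items():
    print(f"{source}: {level}")
```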
But we already know a lot more about the use cases. In Step 1 above, the very first thing we did was to identify the use cases and the incremental revenues they could drive. This now enables us to say something about the value creation that depends on specific data sources: if we know that a given dataset is critical for 3 use cases that are estimated to drive 2, 3, and 5 million dollars in incremental revenues respectively, we can state that 10 million dollars in revenues is dependent on this dataset.
You cannot complete this exercise in isolation — you’ll need to engage the respective use case and business process SMEs and owners. It might take some time to identify these people, but once you’ve found them, you will typically find them to be cooperative because they have a stake in ensuring that the use case is successful, and therefore to clarify what data is critical and the impact that it can drive.
As you go along, you can start building an overview like the one on the right-hand side of Figure 8, where the top-line revenue impact is estimated across all data sources critical for Marketing and Sales use cases. Beware of double-counting here, and make sure you explain and qualify the numbers appropriately; for example, if a given use case with a value creation potential of $1 million depends on 2 data sources, you cannot say that the two data sources together drive $2 million.
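Both the roll-up and the double-counting caveat can be made concrete. In the sketch below, the use case and dataset names are placeholders; the amounts mirror the examples in the text (2, 3, and 5 million dollars on one dataset, and a $1 million use case that depends on two sources). The per-source figure answers "how much value depends on this source", while the combined figure counts each use case only once.

```python
# Estimated incremental revenue per use case, in $ millions (placeholders).
use_case_value = {
    "Use case A": 2.0,
    "Use case B": 3.0,
    "Use case C": 5.0,
    "Use case D": 1.0,
}

# Use case -> data sources that are critical for it (placeholders).
critical_sources = {
    "Use case A": {"Dataset X"},
    "Use case B": {"Dataset X"},
    "Use case C": {"Dataset X"},
    "Use case D": {"Dataset Y", "Dataset Z"},
}


def value_at_stake(source: str) -> float:
    """Value of all use cases for which this single source is critical."""
    return sum(
        value
        for use_case, value in use_case_value.items()
        if source in critical_sources[use_case]
    )


def combined_value_at_stake(sources: set[str]) -> float:
    """Value dependent on a group of sources, counting each use case once."""
    return sum(
        value
        for use_case, value in use_case_value.items()
        if critical_sources[use_case] & sources
    )


print(value_at_stake("Dataset X"))                          # 2 + 3 + 5
print(value_at_stake("Dataset Y"), value_at_stake("Dataset Z"))
print(combined_value_at_stake({"Dataset Y", "Dataset Z"}))  # not the sum of the two
```

Summing the per-source figures for Dataset Y and Dataset Z would give $2 million, which is exactly the overstatement to avoid; the combined function reports $1 million.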
Step 5: Asset Evaluation
In the previous step, we mapped use cases and the value they drive against a set of data sources. Now that we know that these data sources (can) drive value, it follows that they have inherent value for the company, and hence that they can be considered data assets.
While Figure 8 is very insightful already, it does not yet enable us to prioritize certain data assets (and the investment in them) over other ones. If a given data asset can drive a lot of value, but it is already in place and “fit-for-purpose,” no further action may be needed.
Figure 9 presents four data asset assessment statuses, ranging from “fit-for-purpose” to “missing or large gaps,” that enable a consistent evaluation of the data assets. Here, fitness-for-purpose should be interpreted broadly. On the positive end of the spectrum, it means that the right data is readily available, at the right granularity and timeliness; it is of a high quality and reliable, and the source system is never down. On the other end, it means that either the data asset is not there at all, or if it is, that the data is hugely deficient, unreliable, and/or incomplete.
Figure 9 — Data asset assessment values. Image by the author.
We now have the tools we need to build a so-called heat map, where the “hot areas” (i.e., the red or amber parts) signify opportunities for value creation, because that’s where use cases cannot rely on the data they critically need — see Figure 10 below.
Figure 10 — A “heat map” of use cases against data assets. Image by the author.
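As a rough sketch of how such a heat map could be represented, the code below uses an ordered four-level status scale and flags the hot cells. Only the two end points of the scale ("fit-for-purpose" and "missing or large gaps") are named in the text; the two intermediate labels, the threshold for what counts as hot, and the individual cell values are assumptions for illustration.

```python
from enum import IntEnum


class Status(IntEnum):
    """Assessment scale, ordered from best to worst."""
    FIT_FOR_PURPOSE = 0
    MINOR_GAPS = 1             # assumed label for an intermediate status
    SIGNIFICANT_GAPS = 2       # assumed label for an intermediate status
    MISSING_OR_LARGE_GAPS = 3


# (use case, data asset) -> assessed status (illustrative values only).
heat_map = {
    ("Customer segmentation and targeting", "Global CRM Master"): Status.SIGNIFICANT_GAPS,
    ("Customer segmentation and targeting", "Instagram Insights"): Status.FIT_FOR_PURPOSE,
    ("Marketing mix modeling", "Google Ads Data"): Status.FIT_FOR_PURPOSE,
    ("Marketing mix modeling", "Shopify Analytics"): Status.MISSING_OR_LARGE_GAPS,
}


def hot_cells(heat_map, threshold=Status.SIGNIFICANT_GAPS):
    """Return the (use case, asset) pairs at or beyond the given severity."""
    return sorted(
        cell for cell, status in heat_map.items() if status >= threshold
    )


for use_case, asset in hot_cells(heat_map):
    print(f"{use_case} <- {asset}: {heat_map[(use_case, asset)].name}")
```

Using an ordered enum rather than free-text labels keeps the evaluation consistent across assessors and lets the threshold be tightened or relaxed in one place.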
Step 6: Asset Prioritization
The next step is to prioritize the data assets based on everything we now know about them. Figure 11 presents the same heat map that we had in Figure 10, but I added back in the revenue impact and the number of dependent use cases. I then reordered the data assets, organizing them in descending order based on the total revenue impact they generate.
Figure 11 — Heat map of use cases against data assets and the value they drive. Image by the author.
Now it becomes clearer which data assets could be prioritized for enhancements and investments. For example, it is clear that the Global CRM Master is a big problem, as it is not optimally feeding 9 (!) use cases, at an impact of over $13 million. Various data assets, like Instagram Insights, Customer Support Portal, and Google Ads Data, are fit-for-purpose and hence don’t seem to require remediation. And then we have a few at the bottom, like Shopify Analytics and News Aggregator Platform, that may not be in place but that only support 1 use case each, with a limited impact.
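The ordering logic can be sketched in a few lines: rank the assets by the revenue that depends on them, then keep only those that are not fit-for-purpose. The `DataAsset` record is hypothetical, and apart from the Global CRM Master (9 use cases, roughly $13 million), all numbers below are made-up placeholders rather than values from Figure 11.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAsset:
    name: str
    dependent_use_cases: int
    revenue_impact_musd: float  # dependent revenue in $ millions
    fit_for_purpose: bool


# Placeholder figures, except for the Global CRM Master row.
assets = [
    DataAsset("Google Ads Data", 4, 6.0, True),
    DataAsset("Shopify Analytics", 1, 0.5, False),
    DataAsset("Global CRM Master", 9, 13.0, False),
    DataAsset("Instagram Insights", 3, 4.0, True),
]

# Descending order by dependent revenue, as in the reordered heat map.
ranked = sorted(assets, key=lambda a: a.revenue_impact_musd, reverse=True)

# Assets that both matter and need work come out on top.
remediation_queue = [a.name for a in ranked if not a.fit_for_purpose]

for asset in ranked:
    flag = "ok" if asset.fit_for_purpose else "needs work"
    print(f"{asset.name:<20} ${asset.revenue_impact_musd:>5.1f}M "
          f"{asset.dependent_use_cases} use cases  {flag}")
print(remediation_queue)
```

In practice the ranking key could blend revenue impact with the number of dependent use cases or remediation cost; a single-key sort is just the simplest starting point.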
If you were a Chief Data Officer and this panorama reflected the data assets and use cases of your organization for a given domain, an impact-driven roadmap opens itself up for you. There is a clear opportunity to take one or two data assets, and use them as a strategic location to enhance the governance of strategically important data. This can be used to embed and operationalize various data governance capabilities like data ownership and stewardship, metadata management, and data quality, because each of these is critical to ensure that data assets are governed adequately.
Step 7: Portfolio of Data Assets
It is commonly cited that the expected tenure of data leaders like Chief Data Officers is short, on average less than 2.5 years. To a large extent, this is explained by the fact that CDOs struggle to achieve meaningful business impact in the short to medium term.
That is exactly why the approach outlined in this point of view is so powerful — if you prioritize data assets using the logic presented in steps 1–6, you are almost guaranteed to generate an impact. And as you start with use cases and their impact, you engage the business and functional areas from the get-go, and therefore avoid falling into the trap of “doing data for the sake of data,” which will go a long way to avoid the perception that data governance is a cost and hindrance to the business.
You’re not done here. The data assets that you have identified are analogous to the properties in a real estate portfolio — you need to actively manage them, ensuring that they are kept up, that data users continue to be satisfied, that new requirements are incorporated as they pop up, and that the value generation is not assumed but explicitly tracked over time.
Figure 12 below shows the data asset portfolio dashboard for the organization we analyzed in this point of view. It shows the number of data assets that have been certified, the number of use cases mapped against them, and the value created through incremental revenues and risk mitigation.
Figure 12 — A dashboard for data assets. Image by the author.
In the center, you see a graph that tracks the number of certified data assets over time and, more critically, the number of enabled use cases and the associated impact expressed in revenues. Being able to evidence the value that is created through targeted data enablement and governance activities is key for CDO career longevity.
At the bottom, you can see a pipeline view of the data assets. Some of them are being pushed through a structured activation lifecycle, while others are live already. You’ll see that the Global CRM Master that we investigated earlier has indeed been prioritized: it is currently in the “development” phase.
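The headline numbers of such a dashboard can be computed from a simple list of portfolio records. The sketch below is one possible shape, not the dashboard's actual data model: the `PortfolioAsset` fields are hypothetical, "backlog" is an assumed phase name (the text only mentions "development" and live assets), and all figures are placeholders.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PortfolioAsset:
    name: str
    phase: str                   # lifecycle phase of the asset
    certified: bool
    enabled_use_cases: int
    tracked_revenue_musd: float  # realized incremental revenue, $ millions


# Placeholder portfolio; "backlog" is an assumed phase name.
portfolio = [
    PortfolioAsset("Global CRM Master", "development", False, 0, 0.0),
    PortfolioAsset("Google Ads Data", "live", True, 4, 3.5),
    PortfolioAsset("Instagram Insights", "live", True, 3, 2.0),
    PortfolioAsset("Shopify Analytics", "backlog", False, 0, 0.0),
]


def summarize(assets):
    """Compute the headline numbers a portfolio dashboard would show."""
    live = [a for a in assets if a.phase == "live"]
    return {
        "certified_assets": sum(a.certified for a in assets),
        "enabled_use_cases": sum(a.enabled_use_cases for a in live),
        "tracked_revenue_musd": sum(a.tracked_revenue_musd for a in live),
        "pipeline": dict(Counter(a.phase for a in assets)),
    }


summary = summarize(portfolio)
print(summary)
```

Recomputing this summary at regular intervals and storing the snapshots gives the time series that the central graph of the dashboard is built on.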
Anecdote from the marketplace
Photo by Lalit Kumar through Unsplash.
As mentioned at the outset of this point of view, I’ve used and refined this approach working with multiple companies across Europe and the US, and across banking, insurance, retail, technology, and manufacturing.
In one example in the manufacturing sector, we followed a slightly altered version of the 7 steps outlined here. Given that it was a complex, global company, identifying use cases across the entire organization was not feasible. Instead, we picked one business domain as our primary focus, namely the commercial division, and then the sub-domains of marketing and sales (not unlike the scope of use cases in step 3 above).
We identified a set of ~30 use cases, most of which had already been defined for other purposes. We executed a light, accelerated version of steps 2–4, to identify the required data and the corresponding sources, and map use cases against the sources. We skipped ahead to step 5 and engaged the use case owners and SMEs, asking them whether or not they had access to the right, fit-for-purpose data. If not, then what data or source was missing — what was the issue?
Fairly quickly, we settled on a set of 8 use cases that struggled in terms of the data they needed, and discovered that 2 specific data sources were a problem for 6 out of 8 of these use cases. We didn’t boil the ocean any further, and got to work. Together with the central data team and the commercial team, we aligned on owners for the 2 data sources, we assessed them against a set of formal certification criteria, and drafted a plan to address the gaps.
Fast forward a few months, and the first data asset had been enhanced and certified to meet the needs of the documented use cases. At the moment of writing, the exact impact was yet to be measured (as it takes time for the impact to be realized), but initial anecdotal evidence suggested that marketing effectiveness may have increased by high single or even double digits. In any case, the CDO in question was able to introduce and refine the approach, put a modest win on the board, and kickstart a broader roadmap with additional assets, use cases, and domains.
Good luck!
Building and managing a data portfolio isn’t necessarily easy or fast, but it is worth the effort. I hope that the steps outlined here are of use to you. I’d love to hear how it goes, so if you have feedback or your own stories to share, feel free to drop them in the comments.
Safe travels on your data asset enablement journey!
How to Build and Manage a Portfolio of Data Assets was originally published in Towards Data Science on Medium.