By Jay Yeo, Data Analyst, ORI
With growing recognition of the importance of data analytics in driving success, many organizations are seeking out firms that can help them reach their data goals and generate actionable insights. Often, the conversation starts out something like this:
“We are ready to be data-driven and are interested in data mining.”
“We need data cleansing.”
“We’ve got plenty of data, and we’re ready to start mining it for insights.”
While these are excellent goals, they all prompt the same question: Where does your data stand? In other words, what shape is your data in, does it need updating or cleansing before it can be analyzed productively, and if so, where? Without an initial data assessment, it is extremely difficult to tell what requires cleansing, whether there is enough data to run the highest-priority analyses, and whether the data is of high enough quality to generate results that drive ROI and bottom-line impact.
Why Data Benchmarking Matters
Because overall data health determines whether data cleansing and mining are needed, whether they are feasible, and how good their results can be, a phased approach that begins with data benchmarking is often the most efficient and cost-effective way to achieve data-driven organizational goals. Data benchmarking requires organizations to take a close look at their data and identify the fields that are essential to their operations and to the types of analyses that will help them make informed decisions. This initial step focuses attention on the data that matters most, the organization's critical fields, which form the core of a healthy, well-established data foundation. Testing these critical fields for completeness (is the field populated?), validity (does the value follow the expected format or rules?), and recency (has it been updated recently?), the top three indicators of data health, then guides the next steps toward deriving value from organizational data.
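To make the three indicators concrete, here is a minimal Python sketch of a critical-field profile. It assumes records arrive as a list of dictionaries, and the field names (`email`, `zip_code`, `last_updated`), the validity rules, and the 365-day recency window are hypothetical examples chosen for illustration, not ORI's actual criteria or tooling.

```python
import re
from datetime import date
from typing import Callable

# Hypothetical validity rules for two example critical fields.
# Real rules would come from the organization's own data standards.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

VALIDATORS: dict[str, Callable[[object], bool]] = {
    "email": lambda v: isinstance(v, str) and bool(EMAIL_RE.match(v)),
    "zip_code": lambda v: isinstance(v, str) and v.isdigit() and len(v) == 5,
}


def is_missing(value: object) -> bool:
    """Treat None and blank strings as missing values."""
    return value is None or (isinstance(value, str) and not value.strip())


def benchmark(records: list[dict], critical_fields: list[str], date_field: str,
              as_of: date, max_age_days: int = 365) -> dict:
    """Score critical fields for completeness and validity, and records for recency."""
    n = len(records)
    fields = {}
    for field in critical_fields:
        populated = [r.get(field) for r in records if not is_missing(r.get(field))]
        rule = VALIDATORS.get(field)
        # Fields without a rule count every populated value as valid.
        valid = sum(1 for v in populated if rule(v)) if rule else len(populated)
        fields[field] = {
            # Completeness: share of all records where the field is populated.
            "completeness": len(populated) / n if n else 0.0,
            # Validity: share of populated values that pass the field's rule.
            "validity": valid / len(populated) if populated else 0.0,
        }
    # Recency: share of records updated within max_age_days of as_of.
    # Assumes date_field holds datetime.date values (or None).
    recent = sum(
        1 for r in records
        if isinstance(r.get(date_field), date)
        and (as_of - r[date_field]).days <= max_age_days
    )
    return {"fields": fields, "recency": recent / n if n else 0.0}


# Illustrative (made-up) records.
records = [
    {"email": "a@example.org", "zip_code": "02139", "last_updated": date(2024, 3, 1)},
    {"email": "not-an-email", "zip_code": "", "last_updated": date(2021, 6, 15)},
    {"email": None, "zip_code": "1234", "last_updated": date(2023, 11, 20)},
    {"email": "b@example.com", "zip_code": "94105", "last_updated": None},
]
report = benchmark(records, ["email", "zip_code"], "last_updated", as_of=date(2024, 6, 30))
print(report)
```

Measuring validity over populated values only keeps the two indicators separate: a field can be sparsely filled but accurate where filled, or fully filled with mostly bad values, and those call for different fixes.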
A data benchmarking assessment shows an organization the current state of its data. Combined with the organization's unique needs and goals, it reveals the logical next steps for maximizing the return on its data. Organizations that are initially interested in data mining often find that they need to do some cleansing or gap-filling work before they can mine their data effectively. Benchmarking can also surface low-effort, high-return fixes; for example, purging obsolete data that can be easily isolated can save considerable time and expense as an organization transitions to a more data-driven model. A phased approach allows organizations to progressively identify areas of weakness, find solutions where they are needed, and generate high-quality, actionable insights from their datasets with confidence. Rather than overinvesting in solutions they are not yet prepared to take full advantage of, organizations come out of the initial benchmarking process with clearly identified needs, defined next steps, and a roadmap for realizing their goals.
What About Unstructured Data?
So, what does all of this mean for unstructured data (e.g., data from audio recordings, open-ended survey comments, emails, and social media posts that lack a standardized format)? It doesn’t need to be cleansed or checked for accuracy in the same way, right? Can we start there and worry about the structured data later? Yes and no.
- Yes: Accuracy, completeness, and “cleanliness” are not as relevant for unstructured data. However, the volume of unstructured data, its sources, and what those sources represent are fundamental considerations: they determine which analyses are possible and what the results of those analyses represent. For example, a newer organization that is just ramping up its interactions with customers or members in unstructured formats may not yet have enough volume for unstructured data mining to be analytically helpful. Alternatively, an organization may have plenty of unstructured data that all comes from a single segment. In that case, unstructured data mining can still prove highly valuable, but stakeholders need to keep in mind that the results reflect that segment rather than the whole population.
- No: It is true that because unstructured data mining typically requires less front-end checking and validation, it can produce high-quality analyses before the structured data is ready to be mined to its full extent. However, unstructured data is most powerful when it is combined with structured data. For example, when open-ended feedback in a survey remains connected to the respondent's demographic information, the insights gleaned from the analysis are far more valuable. Examining structured and unstructured data together allows stakeholders to segment specific populations, understand centers of influence, identify trends, and gain more detailed and precise insight to drive their decision making.
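The volume and segment-coverage concerns in the first point and the structured-unstructured link in the second can be sketched together. The Python below is a minimal illustration, assuming survey comments and respondent profiles share a hypothetical `respondent_id` key; the `segment` field, the simple keyword match (a deliberately crude stand-in for real text mining), and the `min_volume` and `max_segment_share` thresholds are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def join_feedback(comments: list[dict], respondents: dict) -> list[dict]:
    """Attach each respondent's segment (structured) to their comment (unstructured)."""
    joined = []
    for c in comments:
        profile = respondents.get(c["respondent_id"])
        # Comments with no matching profile are kept but labeled "unknown".
        segment = profile["segment"] if profile else "unknown"
        joined.append({"segment": segment, "text": c["text"]})
    return joined


def coverage_check(joined: list[dict], min_volume: int = 500,
                   max_segment_share: float = 0.8) -> tuple[Counter, list[str]]:
    """Flag low overall volume or heavy concentration in a single segment."""
    counts = Counter(row["segment"] for row in joined)
    total = sum(counts.values())
    warnings = []
    if total < min_volume:
        warnings.append(f"only {total} comments; below min_volume={min_volume}")
    if total:
        top_segment, top_count = counts.most_common(1)[0]
        if top_count / total > max_segment_share:
            warnings.append(f"{top_segment!r} supplies {top_count / total:.0%} of comments")
    return counts, warnings


def term_hits_by_segment(joined: list[dict], term: str) -> dict[str, int]:
    """Count comments mentioning a term, broken out by segment."""
    hits: dict[str, int] = defaultdict(int)
    for row in joined:
        if term.lower() in row["text"].lower():
            hits[row["segment"]] += 1
    return dict(hits)


# Illustrative (made-up) survey data.
respondents = {
    1: {"segment": "new member"},
    2: {"segment": "long-time member"},
    3: {"segment": "long-time member"},
}
comments = [
    {"respondent_id": 1, "text": "Renewal was confusing."},
    {"respondent_id": 2, "text": "Great events, but renewal pricing is high."},
    {"respondent_id": 3, "text": "Love the newsletter."},
    {"respondent_id": 4, "text": "Renewal form kept timing out."},
]
joined = join_feedback(comments, respondents)
counts, warnings = coverage_check(joined, min_volume=10)
print(counts, warnings)
print(term_hits_by_segment(joined, "renewal"))
```

Keeping the segment label on every comment is what lets an analysis say who is raising an issue, not just that it is being raised, and the coverage warnings make the interpretation caveat explicit before any mining begins.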
As more organizations commit to using their data to drive strategic decision making and growth, it is essential that they have a solid data foundation, and that begins with knowing where they stand. While diving right into structured and unstructured data mining is tempting, taking the (short) time for an initial assessment allows organizations to invest wisely and maximize their ROI for all types of data analysis.