HEXstream just launched Koios™, a new Data Quality Management solution powered by Naveego and designed to build trust in enterprise data by automatically scrubbing data across silos to eliminate duplicates and inaccuracies.
Data quality is of the utmost importance when implementing or maintaining analytic operations. But why?
On one hand, the answer to this question is obvious. If the data is of poor quality, the insights from analytics will be flawed. On the other hand, if people interpret data quality as needing perfect data before beginning analytics, analytics will never begin, because the data is never perfect. We need to explore the tradeoff between the ever-higher expenditures needed to squeeze the last few imperfections out of the data and the ever-smaller gains in insight that flow from that last bit of data quality.
In addition, many organizations assume that once data is clean, it stays clean. However, data must be checked and recleansed periodically. Companies reorganize. They merge and divest. Product lines are bought and sold. New categories emerge. People are born, grow up, and die. So do organizations. Data must reflect these changes to remain relevant over time.
Also, organizations can and should put process changes in place to maximize the quality of data entering the organization, rather than cleaning it up on the back end. This Total Quality Management (TQM) or Six Sigma approach to data quality will pay dividends, increasing the quality of the insights gleaned from the data.
For these reasons, we believe that organizations need to be concerned not just with data quality but with data quality management. Data quality is a process, and like any process, it must be managed.
Data must always be of high enough quality that users can trust it. Without trust in the data, users will not heed the insights analytics provides.
In assessing how high the quality of the data must be, we need to look at the use to which the data will be put. The weaker the signal one must pull from the noise, the greater the need for high-quality data.
For a given application or use case, organizations need to understand what constitutes “good enough” or “fit for purpose” data. Because we are using this data for analytic purposes, some errors may be tolerated. For example, we may have purchases from suppliers we cannot identify. If they are 1% of total purchases, this may be tolerable. If “unknown” becomes 5% of total purchases, the error may be material, and the organization should attempt to discover from whom these purchases were sourced. Often, moving the error rate from 10% to 1% will cost as much as moving it from 1% to 0.1%. Is it worth it? There is no easy answer, but organizations need to ask this question, attempt to answer it, and get help in doing so.
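The unknown-supplier example above can be sketched as a simple check. This is only an illustration, not a description of how Koios or any other product works: the function names, the sample purchase records, and the 1% default tolerance are all assumptions chosen to mirror the figures in the text, and the right tolerance depends on the use case.

```python
def unknown_supplier_share(purchases, unknown_label="unknown"):
    """Return the fraction of total purchase value with no identified supplier.

    `purchases` is a list of (supplier, amount) pairs. A supplier of None or
    the placeholder label (hypothetical convention) counts as unidentified.
    """
    total = sum(amount for _, amount in purchases)
    if total == 0:
        return 0.0
    unknown = sum(
        amount
        for supplier, amount in purchases
        if supplier is None or supplier.strip().lower() == unknown_label
    )
    return unknown / total


def is_fit_for_purpose(share, tolerance=0.01):
    """True if the unidentified share is within the tolerance (assumed 1%)."""
    return share <= tolerance


# Made-up sample data: 50 of 1,000 in purchases cannot be tied to a supplier.
purchases = [("Acme", 600.0), ("Globex", 350.0), (None, 30.0), ("unknown", 20.0)]

share = unknown_supplier_share(purchases)
fit = is_fit_for_purpose(share)
print(f"Unknown share: {share:.1%}, fit for purpose: {fit}")
```

With this sample data the unidentified share is 5%, which exceeds the assumed 1% tolerance, so the check flags the data for follow-up. Running a check like this on a schedule, rather than once, is one way to notice when data that used to be good enough has drifted.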
About the Author
Will Hutchinson is the Director of the Analytics Practice at HEXstream. He is a former Distinguished Sales Consultant for Oracle and has over 30 years of experience with data warehousing and analytics. He is the author of a book on analytics and has extensive industry experience spanning pharmaceuticals, oil and gas, consumer goods, insurance, and manufacturing. Will is an expert in ROI and TCO analysis and is a polished speaker and trainer.