The value of a well-implemented data warehouse to a business can be substantial. Potential benefits include an enterprise-wide, single version of the truth, more timely and accurate reporting, more accurate data, and the discovery of non-obvious business trends. With these sorts of benefits being touted, the data warehousing trend is continuing, with more and more companies using the technology to deliver management information for the enterprise. Indeed, a mere 4% of the Conspectus survey respondents (see previous article) have definitely ruled out the implementation of a data warehouse in the future.
The survey shows a range of business benefits which warehouse projects were tasked with delivering - from improving customer service and the efficiency of the sales and marketing forces to improving reporting and data accuracy. The vast majority of the respondents indicated that these benefits had been met. However, further analysis suggests that a gap between the anticipated and actual benefits delivered by the data warehouse still exists and, in comparison with last year's survey, this gap has not narrowed. When asked about benefits such as data standardisation and systems integration, the level achieved in practice fell well below the users' expectations. In part this might be due to the amount of hype surrounding the subject.
However, the value a data warehouse provides to its users can be directly linked to the quality of data stored in it, and not surprisingly, the one common issue running through the survey was that of data quality. When asked how they would do things differently, a number of the interviewees said they would invest more time in addressing the data quality issues, and a number of them now have ongoing data quality improvement programmes.
Single View
The ultimate goal in developing a data warehouse is to create a single, consolidated view of data from an assortment of disparate systems. Typically, the data stored in these systems is held in a variety of different formats and can be populated with blank, invalid or incorrect entries. Inevitably, this presents a major problem when attempting to match records (only one quarter of the respondents had managed to achieve a single view of a customer) or to resolve discrepancies between conflicting, duplicate data items.
A general perception is that data cleaning improvements should be performed at the data entry stage. However, it is worth pointing out that only a few specific problems can be addressed at this point, as inaccuracies may not be apparent in isolation. In contrast, most issues can be resolved once the data has been loaded into the warehouse and compared with the same record from other source systems. By using the warehouse to identify and clean data anomalies, it should be possible to put a confidence factor on the value of the data in the system, which can help the company to make better informed business decisions.
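As a rough illustration of the cross-source comparison described above, the sketch below reconciles one field of one record as reported by several source systems and attaches a simple confidence factor. It is a minimal sketch under stated assumptions, not a method taken from the survey: the function names `normalise` and `reconcile`, the majority-vote rule and the definition of confidence as the share of sources that agree are all hypothetical choices made for the example.

```python
from collections import Counter


def normalise(value):
    """Treat None and blank strings as missing; otherwise trim and lower-case."""
    if value is None:
        return None
    text = str(value).strip().lower()
    return text or None


def reconcile(values):
    """Pick the most widely reported value for one field of one record.

    values: the same field as held by each source system.
    Returns (best_value, confidence), where confidence is the share of
    all sources that agree with the chosen value (an assumed definition).
    """
    cleaned = [normalise(v) for v in values]
    present = [v for v in cleaned if v is not None]
    if not present:
        # Every source is blank, so there is nothing to trust.
        return None, 0.0
    best, votes = Counter(present).most_common(1)[0]
    return best, votes / len(values)


# Hypothetical postcode for one customer, as held in three source systems.
postcode, confidence = reconcile(["SW1A 1AA", "sw1a 1aa ", ""])
```

Here two of the three sources agree once formatting differences are removed, so the reconciled value carries a confidence of two thirds; a blank or conflicting entry lowers the score rather than silently overwriting the record. A production warehouse would more likely weight sources by their known reliability than count them equally.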
The data quality issue could also help to explain why data mining is still used relatively rarely despite its obvious benefits (only one third of respondents are using data mining). One of the key benefits of data mining tools is their ability to trawl through huge amounts of data and identify non-obvious business patterns. Until the quality of data held in the warehouse is of a sufficiently high standard, any patterns or trends identified by a tool will need to be carefully validated before business-critical decisions can be made. Among the companies using data mining tools, all used analogue/rule-based systems (none used AI/neural network-based tools), which allow users to determine how the tool has reached its conclusions and help them to validate the results.
Progress
The survey also highlights some positive improvements. One of the most telling results is that two thirds of those companies interviewed said they do not have any problems analysing their data, which suggests that they now have fast and timely access to the data in their warehouses.
It is also interesting to note that 50% of respondents said that the cost of developing their warehouse met their initial expectations - and with the average cost of developing a system around £1.25m, the technology is no longer the sole preserve of large multinational companies.
Summary
The survey results suggest that a number of the issues which were prevalent in the early days of data warehousing technology are now being resolved (eg high development costs, the ability to handle large volumes of data). This is to be expected as data warehousing solutions mature over the years. However, the problem of poor data quality persists, and until companies address this issue, the potential of the new and more sophisticated data exploration tools such as data mining and data visualisation will remain unrealised.