This first appeared in CIO Review.
Until recently, it was acceptable to connect a structured data store to a reporting or visualization tool and call that an analytics platform. But times have changed.
The analytics platforms emerging today are cloud-based, collaborative, and multi-entity. They aggregate data from inside and outside sources, automate extraction and transformation, integrate seamlessly with prediction and machine learning algorithms, and deliver insights in real time to business users.
Let’s unpack this statement, step-by-step:
1) Cloud-based – It is now clear that new analytics solutions must be built in the cloud. But even the definition of a cloud-based implementation is changing. Moving servers to a cloud-based hosting environment is a great first step, given the flexibility, scalability, and lower cost of ownership. The eventual vision, however, is to get to serverless computing: using on-demand compute services such as AWS Lambda or Azure Functions to simplify infrastructure and concentrate on data quality and analytics application development.
2) Collaborative – Historically, analytics has been done in silos, with different groups building fit-for-purpose data stores that respond to specific needs. Data warehouses served as a centralized source of data, but the real analysis and insight generation occurred on desktops, where analysts combined multiple sources manually. Next-generation platforms allow multiple users to publish data sources, algorithms and insights, either through APIs or through data virtualization.
3) Multi-entity – Analytics platforms are increasingly used by multiple parts of the organization, or even across organizations. Platforms are used for more than just sharing data; insights, visualizations and algorithms can be shared across organizational boundaries. This creates additional requirements for security and authentication, as well as data masking.
4) Multi-source – Many organizations focus on internal data sources for analysis (transactional, financial, CRM, etc.), but tend to neglect the ever-increasing availability of public and third-party data sources (such as primary research, devices or social media feeds) that can be used to augment analytical solutions. This ability to blend data from multiple sources is critical, but it requires a sophisticated approach to managing data feeds, covering both licensing and updates. Otherwise, there is a risk that external data sources will become outdated or inaccurate, and therefore no longer useful for blending.
5) Automated – To ensure productivity of analytics teams, workflow automation is quickly becoming a necessity. As data blending and updating become more complex, it becomes more important to automate data collection, processing and tracking. Organizations are starting to combine primary research platforms, like survey tools, together with workflow automation and visualization components, to streamline work and improve the quality of analyses.
6) Intelligent – The availability of high-quality data from multiple sources is simply the precondition for the true purpose of analytical organizations: deriving high-value insights from data. This is increasingly done by applying machine learning and AI to both structured and unstructured data sources. Leading-edge organizations are applying these tools to data that was previously inaccessible, such as customer service call recordings, to derive insights about customer engagement. Predictive analytics is no longer an ivory-tower endeavor; it is now routinely used for most business functions, from lead generation and back-office automation to demand forecasting and hiring.
7) Real-time – The always-on nature of the internet has driven us to a world of instant gratification. This is also the case for analytics solutions; it is no longer enough to use “last month’s data” or “last year’s results” as a source of analysis. As a result, a batch approach to data collection and analysis is being replaced with on-demand data updating. This puts pressure on the computing capacity of the platform, which must process ever-increasing amounts of data in real time. It also puts pressure on data availability, to make sure that the most recent data sources are used for analysis. Tools for managing data streams, such as Apache Kafka or Apache Flume, are emerging to help organizations deal with high-volume, real-time data analysis.
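To make the shift described in point 7 concrete, here is a minimal Python sketch of on-demand updating: a summary statistic is refreshed as each record arrives, rather than recomputed from a stored batch. It is an illustration only. The names `RunningStats` and `consume` are hypothetical, the list of order values stands in for a live feed, and no particular streaming product is assumed.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Mean and variance kept current one event at a time (Welford's method)."""
    count: int = 0
    mean: float = 0.0
    _m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; undefined with fewer than two observations.
        return self._m2 / (self.count - 1) if self.count > 1 else math.nan


def consume(stream, stats: RunningStats):
    """Yield the up-to-date mean after every incoming event."""
    for value in stream:
        stats.update(value)
        yield stats.mean


# Hypothetical usage: a short list stands in for a continuous feed.
order_values = [120.0, 80.0, 100.0]
stats = RunningStats()
latest_means = list(consume(order_values, stats))
```

Because the statistic is updated incrementally, the platform does not need to hold or rescan the full history to answer "what is the average right now," which is one reason streaming designs can keep up with growing data volumes.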
So what are the challenges ahead for reaching this brave new world? Companies looking to adopt next-generation analytics platforms should consider the following challenges while researching their options.
• Security: In a multi-user, cross-organizational world, security and access management become more important than ever. Implementing cloud-based single sign-on technologies can simplify navigation across platforms and tools, but there will continue to be challenges with a centralized approach to data sharing. The emerging field of blockchain applications, with its distributed-ledger approach to secure data sharing, may provide some relief, but there is still much work to be done before those applications become robust enough for the typical IT organization to implement and manage.
• Legacy systems: Migrating from large investments in data warehouses and legacy reporting tools can be daunting. There will be resistance from teams used to working with those tools, and there may be contractual penalties for discontinuing use of legacy platforms. However, it is possible to build new analytics solutions modularly, and integrate with legacy tools using middleware.
• Unstructured data management: Traditional data warehouses are focused mostly on structured data. However, most analytical insights now depend on blending unstructured and structured data in near real time. This requires a flexible approach, as well as educating the organization on capturing and tagging unstructured data in ways that are useful for retrieval and analysis.
• Resilience: As we move to real-time analytics, algorithms and data processing routines will need to become more resilient: scalable, fault-tolerant, and able to handle data gaps and errors automatically so that insights can still be produced with incomplete data. This may require a fundamentally different approach to algorithm development, one that relies more on probabilistic data augmentation than on assuming perfect data availability every time.
• Tool evolution: The analytics field is exploding with new tools and approaches, from both large commercial players and the open-source community. Companies that relied on SAS for years are now training their analysts to use R and Python to keep up with the latest libraries and packages coming out of academia. Deep learning algorithms, enabled by advances in GPU design, are revolutionizing how analytical solutions are developed, but require specialized skills that are hard to find outside of the technology giants. The challenge in building a future-proof analytics platform is to use a modular approach, so that as components change, they can be quickly replaced by next-generation tools.
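As a rough illustration of the Resilience point above, the Python sketch below shows a calculation that tolerates gaps and malformed values instead of failing, and reports how much of the expected data it actually used. The names `Estimate` and `tolerant_mean` and the `min_coverage` threshold are assumptions made for this example. It simply skips unusable readings; the probabilistic data augmentation mentioned above would be a further step that this sketch does not attempt.

```python
import math
from typing import Iterable, NamedTuple, Optional


class Estimate(NamedTuple):
    value: Optional[float]  # None when there is too little usable data
    coverage: float         # share of expected readings that were usable


def tolerant_mean(readings: Iterable[object], min_coverage: float = 0.5) -> Estimate:
    """Average whatever readings are usable and report how many were."""
    total = 0
    usable = []
    for raw in readings:
        total += 1
        try:
            number = float(raw)  # raises on None or malformed text
        except (TypeError, ValueError):
            continue
        if math.isfinite(number):  # drop NaN and infinities
            usable.append(number)
    coverage = len(usable) / total if total else 0.0
    if not usable or coverage < min_coverage:
        return Estimate(None, coverage)
    return Estimate(sum(usable) / len(usable), coverage)


# Hypothetical feed with a gap, a malformed entry, and a NaN.
result = tolerant_mean([10, None, "12.5", "n/a", float("nan"), 14])
```

Returning the coverage alongside the value lets a downstream dashboard decide whether to show the insight, flag it as partial, or withhold it, rather than silently presenting a number built on very little data.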
We are truly in a golden age for analytics, but implementing the right platform to take advantage of the latest technologies and data science developments is a moving target. Flexible, creative organizations that are able to imagine and deploy new ways of solving problems using analytics will stay ahead of the competition.