The language around Machine Learning (ML) can sound complex and obscure to the non-specialist. There are so many kinds of ML algorithms, with names as engaging as “k-nearest neighbours”, “Hopfield networks” and “AdaBoost”, each with specific pros and cons, that it is difficult to know which one to choose for a particular problem unless you are a data scientist.
ML techniques are sometimes categorised by broader families of mathematical methods: regression; clustering; dimensionality reduction; neural networks and deep learning. These terms are somewhat easier to understand, but they still bear little connection to the operational contexts where they could be applied.
Perhaps it is more useful for industry decision makers to grasp the classification of ML techniques by “learning style”, as this better reflects the role of input data and of the modelling process, and gives clues about what the algorithms can achieve from the data. The three main “learning styles” are generally considered to be: supervised learning, unsupervised learning, and reinforcement learning.
ML is called “supervised learning” when it is given pre-existing labelled data to “train” on, that is, explicit information on what the output should be for a given set of inputs. This learning style is usually used for classification or forecasting tasks. In classification, an ML system is asked to find the category (output result) that an observation (input data) belongs to, for example to decide whether an email is spam or non-spam, or to find the relevant health and safety incident category of a reported event. In forecasting, ML is used to predict the level of a variable (output result) based on other variables (input data), for example an item price, a credit risk, or a component failure probability.
In both classification and forecasting tasks, the ML algorithm compares the results it obtains to the results it should have reached according to the pre-existing labels. It then adjusts its model weights accordingly until it reaches a satisfactory error level. For instance, consider data that includes the length and weight of a train vehicle (input data), and a corresponding train class (expected result).
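The compare-and-adjust loop described above can be sketched in a few lines of Python. This is only an illustration, using a perceptron (one simple supervised technique among many, chosen here for brevity). The vehicle measurements, the two class labels and the scaling constants are all invented for the example and do not come from real rolling stock data.

```python
# Hypothetical labelled data: (length in metres, weight in tonnes) -> class.
# 0 = light rail vehicle, 1 = locomotive. All values are invented.
TRAINING_DATA = [
    ((15.0, 20.0), 0), ((16.5, 22.0), 0), ((14.0, 18.5), 0), ((17.0, 24.0), 0),
    ((21.0, 120.0), 1), ((22.5, 126.0), 1), ((20.0, 110.0), 1), ((23.0, 130.0), 1),
]


def scale(features):
    """Bring both inputs to a similar range (assumed scaling constants)."""
    length_m, weight_t = features
    return (length_m / 25.0, weight_t / 150.0)


def predict(weights, bias, features):
    """Return class 1 if the weighted sum of the inputs is positive, else 0."""
    x1, x2 = scale(features)
    score = weights[0] * x1 + weights[1] * x2 + bias
    return 1 if score > 0 else 0


def train(data, learning_rate=0.1, max_epochs=1000):
    """Perceptron rule: compare each prediction to its label, adjust on error."""
    weights, bias = [0.0, 0.0], 0.0
    errors = 0
    for _ in range(max_epochs):
        errors = 0
        for features, label in data:
            error = label - predict(weights, bias, features)
            if error != 0:
                x1, x2 = scale(features)
                weights[0] += learning_rate * error * x1
                weights[1] += learning_rate * error * x2
                bias += learning_rate * error
                errors += 1
        if errors == 0:  # satisfactory error level reached on the training set
            break
    return weights, bias, errors


weights, bias, remaining_errors = train(TRAINING_DATA)
print("errors left on training data:", remaining_errors)
print("predicted class for a 15.75 m, 21 t vehicle:",
      predict(weights, bias, (15.75, 21.0)))
```

The stopping rule here is "no errors on the training data", which only works because the invented classes are cleanly separable. Real data would normally call for a tolerance on the error and a separate test set.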
Stored data does not always include explicit labels. ML uses “unsupervised learning” when it is given unlabelled data and is left on its own to come up with a segmentation of the data. Unsupervised learning is often well suited to anomaly detection, for example detecting cybersecurity intrusions. It is also valuable for “pattern discovery”, that is, the discovery of unsuspected relationships in the data. However, the patterns discovered in this way generally lack the theoretical underpinning required to ensure that the results remain robust over time.
Consider the case where vehicle classes have not been assigned to the vehicle lengths and weights, and we therefore end up with unlabelled data (inputs with no assigned correct outputs). “Learning” from this data is still possible using algorithms that detect patterns by comparing the inputs and grouping them in clusters.
The resulting segmentation is ambiguous, as one cannot be certain what the clusters represent: the unsupervised learning algorithm does not propose a functional model that underpins the grouping. This is nonetheless a remarkable result, considering that the ML system was given only raw data, without any instructions. Unsupervised learning is even more impressive when it detects patterns in the data that the human eye cannot see.
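To make the clustering idea concrete, here is a minimal sketch using k-means, one common clustering algorithm (the text above does not say which algorithm is meant, so this choice is an assumption). The observations are invented length and weight pairs with no labels attached, and the number of clusters (2) is also assumed rather than discovered.

```python
# Hypothetical unlabelled observations: (length in metres, weight in tonnes).
OBSERVATIONS = [
    (15.0, 20.0), (21.0, 120.0), (16.5, 22.0), (22.5, 126.0),
    (14.0, 18.5), (20.0, 110.0), (17.0, 24.0), (23.0, 130.0),
]


def scale(point):
    """Bring both inputs to a similar range (assumed scaling constants)."""
    return (point[0] / 25.0, point[1] / 150.0)


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    distances = [squared_distance(point, c) for c in centroids]
    return distances.index(min(distances))


def k_means(points, k, max_iterations=100):
    """Alternate between assigning points to centroids and moving centroids."""
    centroids = list(points[:k])  # simple deterministic start: first k points
    assignments = None
    for _ in range(max_iterations):
        new_assignments = [nearest(p, centroids) for p in points]
        if new_assignments == assignments:  # nothing moved: converged
            break
        assignments = new_assignments
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centroids


scaled = [scale(p) for p in OBSERVATIONS]
assignments, centroids = k_means(scaled, k=2)
print(assignments)  # -> [0, 1, 0, 1, 0, 1, 0, 1]
```

The output is only a list of cluster numbers. Nothing in it says that cluster 0 is "light vehicles" and cluster 1 is "locomotives"; that interpretation has to come from a person looking at the groups.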
In many control applications, a model of the environment may not be captured analytically due to the complexity of the system. However, a “reinforcement learning” approach can still be used to mimic or simulate how an intelligent agent would try to learn the underlying process. Learning from interactions with the environment is key here, as the agent uses available decisions to try out strategies and observe the “reward” from these decisions. The idea is to discover which decision paths yield the most reward by trying many decisions randomly and observing their outcomes (trial and error). In a sense, it is akin to finding and implementing a winning strategy based on the “rules of the game” (given or inferred) in a given situation.
The "agent" or algorithm can learn the optimal strategy or policy from historical data or, more likely, by playing enough simulated games. The objective is to maximise the long-term reward (or minimise the long-term loss), rather than the immediate reward (or loss). For example, the goal of an agent learning how to play chess is to win a match, not to capture an opponent's piece with the next move.
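The trial-and-error idea can be sketched with tabular Q-learning, one standard reinforcement learning algorithm (used here as an assumption; the text does not name a specific method). The environment is a made-up corridor of five cells in which only the last cell gives a reward, so the agent has to learn that moves with no immediate payoff still lead to the long-term goal. The learning rate, discount factor and number of episodes are illustrative values.

```python
import random

N_STATES = 5            # cells 0..4 of a hypothetical corridor
GOAL = N_STATES - 1     # reaching cell 4 ends the episode
LEFT, RIGHT = 0, 1


def step(state, action):
    """Simulated environment: move one cell; reward 1.0 only at the goal."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values from many simulated episodes of random moves."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.choice((LEFT, RIGHT))  # pure trial and error
            next_state, reward = step(state, action)
            # Target = immediate reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
policy = ["right" if q[RIGHT] > q[LEFT] else "left" for q in q_table[:GOAL]]
print(policy)  # -> ['right', 'right', 'right', 'right']
```

The discount factor `gamma` is what encodes the preference for long-term reward: a value close to 1 makes distant rewards count almost as much as immediate ones.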
As we saw in the introduction of this blog series, reinforcement learning is not confined to gaming, and has demonstrated its ability to bring industrial benefits, as in Google DeepMind's work on reducing the energy consumption of Google's data centres. Reinforcement learning works well in complex environments that are difficult to model, and for real-time adaptive decisions. It does not need vast volumes of historical data, since it uses simulations to generate new data and learn from it, but it requires an adequate representation of the key strategic interactions and of the reward associated with each strategic choice.
The broad categories of supervised, unsupervised and reinforcement learning are far from a perfect classification: some techniques (for example neural networks) can be found in all three categories, and others, such as "semi-supervised learning", sit half-way between categories.
The choice of techniques depends on the data constraints and on the objectives pursued. Are labels created by human expertise available for training the ML system in a supervised way? Are we hoping for new insight, in an unsupervised way? Do we have sufficient historical data, or do we need simulated data, in a reinforcement learning way? Are we looking for a long-term understanding of the modelled relationships between the data, as with a parametric approach, or do we mostly want short-term efficiency in predictions, as with neural networks? These are questions to consider when embarking on an ML journey. It is likely that the choice of technique will be constrained by the specific problem to be solved.
Below we provide an initial mapping of railway domains and applications where ML techniques could bring benefits, organised by learning style. We plan to refine this mapping and develop it further as the series continues, and we welcome your comments and feedback.