Enabling computers to recognise the world around them and 'see' as a human can has long been one of the great challenges of Artificial Intelligence (AI). It is, however, a field that has seen dramatic progress in recent years, driven primarily by the large-scale use of convolutional neural networks, often known as deep learning. This is the technology that famously allowed Google to recognise cats in YouTube videos.
Google has now made this technology available as a public beta via the Google Vision API. PA’s Innovation Lab, which explores emerging technology and its impacts, creating proofs of concept, prototypes and demonstrators to bring practical applications to life, was lucky enough to be granted early, pre-beta access to the algorithm.
Our team used the Google Vision API to create 'Foodstagram' – a simple application that recognises different types of food and meals from pictures taken with a phone camera. Having identified the type of food, our app was then able to estimate the calorific content of a meal. Our client – a provider of health and wellness products – was interested in creating services that would automatically recommend and supply customers with products that supplement their diets, based upon knowledge of their nutritional and eating habits.
Being able to recognise different types of food from a photo was key to enabling a very simple and intuitive interface – improving the likelihood that the app would be used. Having recognised food at a click, it was then possible to generate nutritional insight by referencing publicly available databases. This data could then be used to target products based upon detailed insight into a customer’s diet.
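As a minimal sketch of how the pipeline fits together: send an image to the Vision API's label-detection feature, then map the highest-confidence food label onto a nutrition lookup. The request shape and `LABEL_DETECTION` feature follow the public Vision API; the calorie table, threshold, and helper names below are illustrative assumptions rather than the actual Foodstagram implementation, and a real app would query a proper nutrition database.

```python
import base64

# Illustrative calorie table (kcal per typical serving). These values are
# rough assumptions for the sketch; the real app referenced publicly
# available nutrition databases instead.
CALORIES_PER_SERVING = {
    "pizza": 285,
    "salad": 150,
    "burger": 500,
    "sushi": 350,
}

def build_annotate_request(image_bytes):
    """Build the JSON body for a label-detection request, as POSTed to
    the Vision API's images:annotate endpoint
    (https://vision.googleapis.com/v1/images:annotate?key=API_KEY)."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "LABEL_DETECTION", "maxResults": 10}],
        }]
    }

def estimate_calories(labels, min_score=0.7):
    """Return (label, kcal) for the highest-scoring label that appears
    in the calorie table, or None if nothing matches confidently."""
    for label in sorted(labels, key=lambda l: l["score"], reverse=True):
        name = label["description"].lower()
        if label["score"] >= min_score and name in CALORIES_PER_SERVING:
            return name, CALORIES_PER_SERVING[name]
    return None

# Labels in the shape the API returns them (description + score)
labels = [
    {"description": "Food", "score": 0.97},
    {"description": "Pizza", "score": 0.93},
    {"description": "Cuisine", "score": 0.88},
]
print(estimate_calories(labels))  # -> ('pizza', 285)
```

The design point is simply that the heavy lifting (recognition) happens in the cloud service; the app itself only needs to rank the returned labels and join them against nutritional data.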
This ability to instantaneously identify objects from images would simply not have been possible a few years ago, but it is just the start of the machine vision journey. The next stage is likely to move beyond simple recognition and identification to understanding what is happening in a picture. This is a non-trivial problem, and one which has led to calls for a 'Turing Test for machine vision'. Researchers from Brown University and Johns Hopkins University, with support from the Defense Advanced Research Projects Agency, have developed a list of attributes that a picture might have, e.g. which people are included, and are they carrying something? Their goal is to develop an information hierarchy for a picture against which machines can demonstrate their capabilities compared with humans. In other words, a more quantitative test for a specific aspect of 'intelligence'.
Whilst none of today’s machine vision technologies are close to passing this test, the field is fast-moving, with new capabilities being developed all the time. Initiatives such as the Google Vision beta allow any organisation to harness capabilities that would previously have been accessible only to dedicated research teams.
Can you predict how machine vision could benefit your organisation and your business? We’d be keen to hear your ideas, and if you’d like to discuss this further, please come and talk to us at the Innovation Lab.