Chao Chen, a software engineer at Google Research, published an article on the Google AI Blog on 11 August 2020 titled On-device Supermarket Product Recognition. Although I have mostly been writing about natural-language processing over the last few days, I thought I would take a short break from that endeavour to look at this research.
Chen stresses the challenges faced by users who are visually impaired.
It can be hard to identify packaged foods in the grocery store and in the kitchen.
Many foods share the same kind of packaging: boxes, tins, jars, and so on.
In many cases the only difference is text and imagery printed on the product.
With the ubiquity of smartphones, Chen thinks we can do better.
He suggests addressing this challenge with machine learning (ML). As the speed and computing power of smartphones have increased, many vision tasks can now be undertaken entirely on a mobile device.
In COVID-19 times, it may also be advantageous not to have to physically touch a product to examine its packaging information.
He mentions the development of on-device models such as MnasNet and MobileNets (based on resource-aware architecture search).
Building on developments such as these, Google recently released Lookout, an Android app that uses computer vision to make the physical world more accessible for users who are visually impaired.
“Lookout uses computer vision to assist people with low vision or blindness get things done faster and more easily. Using your phone’s camera, Lookout makes it easier to get more information about the world around you and do daily tasks more efficiently like sorting mail, putting away groceries, and more.”
Lookout was built with guidance from the blind and low-vision community, and it supports Google’s mission to make the world’s information universally accessible to everyone.
It is brilliant to see Google going in this direction for those who have difficulty accessing information. Chen writes:
“When the user aims their smartphone camera at the product, Lookout identifies it and speaks aloud the brand name and product size.”
How is this accomplished? Lookout combines four components:
- A supermarket product detection and recognition model.
- An on-device product index.
- MediaPipe object tracking.
- An optical character recognition (OCR) model.
This leads to an architecture that is efficient enough to run in real time entirely on-device.
Chen argues that running on-device may be a necessity.
An on-device approach has the benefits of low latency and no reliance on network connectivity.
The datasets used by Lookout consist of two million popular products chosen dynamically according to the user’s geographic location.
Selected this way, the dataset could cover most of the products a user is likely to encounter.
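To make the idea of an on-device product index a little more concrete, here is a minimal sketch of how such an index might be searched. It assumes each product is stored as a small embedding vector and that a query embedding from the camera image is compared against them by cosine similarity. The product names, vectors, and function names are all hypothetical, and this is not a description of how Lookout actually implements its index.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_index(index, query, top_k=2):
    """Return the top_k (product_name, similarity) pairs, best match first."""
    scored = [(name, cosine_similarity(vec, query)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy index: a real one would hold far more products and
# much longer embedding vectors produced by a trained model.
PRODUCT_INDEX = {
    "Tomato soup 400 g": [0.9, 0.1, 0.0],
    "Chicken soup 400 g": [0.7, 0.6, 0.1],
    "Oat cereal 500 g": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the product currently in front of the camera.
query_embedding = [0.88, 0.15, 0.02]
matches = search_index(PRODUCT_INDEX, query_embedding)
for name, similarity in matches:
    print(f"{name}: {similarity:.3f}")
```

A brute-force scan like this is fine for three products, but with an index of millions of entries a phone would presumably need an approximate nearest-neighbour method instead.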
Chen has created a figure of the design.
“The Lookout system consists of a frame cache, frame selector, detector, object tracker, embedder, index searcher, OCR, scorer and result presenter.”
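The quoted list names the stages but does not say how they are wired together. Purely as an illustration, the sketch below shows one way a scorer might combine the index searcher's similarity with words read by OCR before a result presenter produces the text to speak. The `Candidate` type, the function names, the word-overlap measure, and the `ocr_weight` value are all assumptions made for this example, not details taken from Chen's post.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str          # product name stored in the index
    similarity: float  # similarity reported by the index searcher


def ocr_overlap(name, ocr_words):
    """Fraction of the words in the product name that OCR also read on the package."""
    name_words = set(name.lower().split())
    if not name_words:
        return 0.0
    return len(name_words & ocr_words) / len(name_words)


def score(candidates, ocr_text, ocr_weight=0.5):
    """Re-rank candidates by similarity plus a weighted OCR word overlap (assumed rule)."""
    ocr_words = set(ocr_text.lower().split())
    return sorted(
        candidates,
        key=lambda c: c.similarity + ocr_weight * ocr_overlap(c.name, ocr_words),
        reverse=True,
    )


def present(ranked):
    """Build the sentence a result presenter could speak aloud."""
    if not ranked:
        return "No product recognised"
    return f"Detected: {ranked[0].name}"


# Hypothetical inputs: two look-alike tins and the text read from the label.
candidates = [
    Candidate("Chicken soup 400 g", 0.86),
    Candidate("Tomato soup 400 g", 0.84),
]
ranked = score(candidates, "TOMATO soup net 400 g")
message = present(ranked)
print(message)
```

In this toy case the visual similarity alone slightly favours the wrong tin, and the printed text tips the ranking the other way, which echoes the earlier point that the text on the package is often the only thing that tells two products apart.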
For detailed information on this architecture I suggest you read the original blog post by Chen.
Regardless, a system like the one outlined here clearly has the potential to be useful for people with disabilities and is worth trying out.