Recently news emerged of Google hoping to try out their cars on the bustling streets of London. As someone who regularly cycles on the streets of London, I can certainly attest to their unpredictable, often combative nature.
As such, they represent an enormous challenge for the Google engineers as their cars will have to manage an unprecedented range of unpredictable actors, whether drivers, cyclists or pedestrians.
Researchers from the University of California, San Diego may be able to provide some assistance, as a recent study has explored how computers can accurately recognize objects as well as humans can.
The team have developed an algorithm that can detect a pedestrian in near real-time and with high levels of accuracy.
“We’re aiming to build computer vision systems that will help computers better understand the world around them,” the authors say.
A central part of this is to give computers real-time vision, with particular applications for things like pedestrian detection systems.
The method uses traditional computer vision classification architecture alongside deep learning models.
Managing scale at speed
As you can imagine, the challenge of identifying pedestrians involves a huge amount of data as pedestrians appear in a number of different shapes, sizes and locations. As such, it typically requires millions of windows to be inspected to make an accurate diagnosis, and doing this in real-time is a huge challenge.
This approach works broadly as follows. Firstly, the algorithm triages the images so that ones that are obviously not pedestrians are discarded. It then goes to work on the trickier images, so it might try and differentiate between a tree and a human. The final stage then sees the most challenging images worked on.
This approach typically requires ‘weak learners’ to work at each stage. The approach has the advantage of speed, it isn’t really powerful enough to tackle the complex images in the final stages.
Thinking fast and slow
To overcome this, the team developed an algorithm that uses deep learning during these complex latter stages. They believe that deep learning models are better suited to complex pattern recognition, thus allowing things to scale.
Alas, the models require large amounts of computational grunt, so they often prevent real-time application, thus are unsuitable for the earlier stages of identification.
The team propose a hybrid approach that uses weak learners in the early stages, and then deep learning in the latter stages.
“No previous algorithms have been capable of optimizing the trade-off between detection accuracy and speed for cascades with stages of such different complexities. In fact, these are the first cascades to include stages of deep learning. The results we’re obtaining with this new algorithm are substantially better for real-time, accurate pedestrian detection,” they say.
It is very early stages at the moment, so the algorithm is only capable of working on binary detection tasks, but it will hopefully expand to more complex tasks such as simultaneous multi-object identification in time.
“One approach to this problem is to train, for example, five different detectors to recognize five different objects. But we want to train just one detector to do this. Developing that algorithm is the next challenge,” the team conclude.
Check out the video below for more on the method.