Computers are getting increasingly clever when it comes to identifying objects, and then knowing how to react around them. Whilst many of these advances have come in areas such as driverless technology, the web is not being left behind.
A recent study highlights how computers are now capable of understanding the contents of an image merely by looking at it.
It’s a task that humans are incredibly good at, but without metadata it’s something that has befuddled most computers, until now at least.
The Google-based team have trained a machine to figure out where a photo was taken, just from looking at it.
The all-seeing machine
The new machine is so good at the task that it outperforms humans. Indeed, it’s even capable of determining the location of images taken indoors or of very specific items such as pets and food that have no discernible location cues.
The team achieve this in a very clever way. It involves dividing the world up into a grid of 26,000 squares or so, with the size of each square dependent upon the number of images taken in that location.
So cities, where photos are plentiful, are covered by many small squares, whereas remote areas, where photographic coverage is less dense, get fewer and larger ones.
The team developed a database of 126 million images, together with the location data for each image. Around 91 million of these were then used to train the neural network to work out the grid location of an image simply by looking at it.
Once the system, called PlaNet, was trained, they tested it out on a further 34 million images from the database, before testing it on unvalidated images harvested from Flickr.
“PlaNet is able to localize 3.6 percent of the images at street-level accuracy and 10.1 percent at city-level accuracy,” they say.
It was able to guess the correct country for 28% of the photos, and the correct continent for 48%, which was better than 10 handpicked human experts.
If you fancy your chances, the team have developed a game where you can pit yourself against the machine and see how you get on (just don’t get your hopes up).
“In total, PlaNet won 28 of the 50 rounds with a median localization error of 1131.7 km, while the median human localization error was 2320.75 km,” the team say. “[This] small-scale experiment shows that PlaNet reaches superhuman performance at the task of geolocating Street View scenes.”
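For anyone wondering how figures such as a median localization error might be worked out, the sketch below shows one common approach. It assumes the error is the great-circle distance between the guessed and true coordinates, calculated with the haversine formula on a spherical Earth. The function names are hypothetical and the distance thresholds are left as parameters, since the article does not say where the team draw the line for street or city level.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1: float, lon1: float,
                 lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def localization_errors(guesses: Sequence[Point],
                        truths: Sequence[Point]) -> List[float]:
    """Distance in km between each guess and the matching true location."""
    return [haversine_km(g[0], g[1], t[0], t[1])
            for g, t in zip(guesses, truths)]


def fraction_within(errors: Sequence[float], threshold_km: float) -> float:
    """Share of guesses whose error is no larger than threshold_km."""
    return sum(e <= threshold_km for e in errors) / len(errors)


# Made-up guesses and true locations, purely for illustration.
guesses = [(48.86, 2.35), (40.71, -74.01), (0.0, 0.0)]
truths = [(48.85, 2.29), (34.05, -118.24), (0.0, 90.0)]

errors = localization_errors(guesses, truths)
print(f"median error: {median(errors):.1f} km")
print(f"within 25 km: {fraction_within(errors, 25.0):.0%}")
```

The median is a sensible summary here because a handful of wildly wrong guesses, on the wrong continent say, would otherwise dominate an average.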
How it does it
Given that it seems to be thrashing human players thus far, you may wonder just how it does it. It would appear that experience is key.
“We think PlaNet has an advantage over humans because it has seen many more places than any human can ever visit and has learned subtle cues of different scenes that are even hard for a well-traveled human to distinguish,” the team suggest.
The potential for this approach is emphasized by the ability of PlaNet to successfully identify images taken indoors where no location clues are present.
It does this by looking for contextual clues, such as the album the picture is a part of. Interestingly, it’s capable of doing this with relatively little grunt.
The team reveal that the network uses just 377 MB of memory, which means it can fit onto most smartphones and opens up no end of possible applications.
Watch this space.