Volume 16, No 1, 2019

Efficiency of Web Crawling for Geotagged Image Retrieval


Nancy Fazal, Khue Q. Nguyen and Pasi Fränti

Abstract

The purpose of this study was to assess the efficiency of a web crawler for finding geotagged photos on the internet. We consider two alternatives: (1) extracting the geolocation directly from the metadata of the image, and (2) geo-parsing the location from the content of the web page that contains the image. We compare the performance of simple depth-first and breadth-first searches with that of a selective search using a simple guiding heuristic. The selective search starts from a given seed web page and then chooses the next link to visit based on a relevance calculation over all the available links and the web pages that contain them. Our experiments show that the crawling finds images all over the world, but the results are rather sparse. Only a fraction of 6845 retrieved images (<0.1%) contained a geotag, and among them only 5 percent could be attached to a geolocation.
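
The three crawl orders compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it walks a small in-memory link graph instead of fetching pages, and the function name `crawl`, the `score` callback, and the example graph and relevance values are all hypothetical. The abstract does not specify the relevance heuristic, so `score` is only a placeholder for it.

```python
import heapq
from collections import deque


def crawl(graph, seed, strategy="bfs", score=None, limit=100):
    """Return pages of an in-memory link graph in visiting order.

    graph:    dict mapping a page to the list of pages it links to
    strategy: "bfs", "dfs", or "selective" (best link first)
    score:    hypothetical relevance function, used only by "selective"
    """
    visited, seen = [], set()
    counter = 0  # tie-breaker so the heap never compares page names
    if strategy == "selective":
        frontier = [(-score(seed), counter, seed)]
    else:
        frontier = deque([seed])

    while frontier and len(visited) < limit:
        if strategy == "bfs":
            page = frontier.popleft()          # oldest link first (queue)
        elif strategy == "dfs":
            page = frontier.pop()              # newest link first (stack)
        else:
            page = heapq.heappop(frontier)[2]  # most relevant link first
        if page in seen:
            continue
        seen.add(page)
        visited.append(page)

        links = graph.get(page, [])
        if strategy == "dfs":
            links = reversed(links)  # so the first link on a page is explored first
        for link in links:
            if link in seen:
                continue
            if strategy == "selective":
                counter += 1
                heapq.heappush(frontier, (-score(link), counter, link))
            else:
                frontier.append(link)
    return visited


# Hypothetical toy data: a seed page linking to "a" and "b", and so on.
toy_graph = {"seed": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}
toy_relevance = {"seed": 0.0, "a": 0.1, "b": 0.9, "c": 0.5, "d": 0.8}

bfs_order = crawl(toy_graph, "seed", "bfs")
dfs_order = crawl(toy_graph, "seed", "dfs")
selective_order = crawl(toy_graph, "seed", "selective", score=toy_relevance.get)
```

The only difference between the strategies is the data structure holding the unvisited links: a queue for breadth-first, a stack for depth-first, and a priority queue ordered by the relevance score for the selective search.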


Pages: 16-39

DOI: 10.14704/WEB/V16I1/a177

Keywords: Location-based application; GPS; Web crawler; Location photos; Web application
