CW 10 (2nd March – 8th March): LIRe Feature Extraction, Annoy Index Model and Flickr Dataset Subset

The project is back on schedule!

As discussed in the previous post, AlexNet features extracted with Caffe cannot be used because the planned release of the pre-extracted AlexNet features has been delayed. For this prototype, image feature extraction will therefore be implemented using the LIRe software library and integrated with an Approximate Nearest Neighbour index.

Installation of the LIRe library turned out to be much simpler than Caffe's. It depends on a few other libraries and packages, but it is not as restrictive as Caffe.
LIRe (Lucene Image Retrieval) [1] is an open-source Java development library provided under the GPL licence. It is built on top of the Lucene open-source text search engine, which provides text-based retrieval and indexing [2]. As discussed in the previous post, the same feature set released with the Flickr dataset can be extracted using the LIRe library.

Based on the Insight Centre's evaluation of advanced feature extraction and visual matching techniques, AlexNet layer 7 features appear to produce more accurate results [3][4][5]. Ideally, AlexNet features could be extracted locally using the Caffe software library, but for 100 million images this would take roughly 231 days (at 5 images/second) on a standard desktop machine with an NVIDIA Titan graphics card installed. A subset of the dataset could be used instead, but because of the time limitation the prototype will be based on what is currently available, namely LIRe features, even if they may not produce the same quality of results. However, the prototype system is designed to be flexible, and LIRe features can be replaced with AlexNet features once these become available.
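As a quick check on the 231-day estimate, at 5 images per second the full collection works out to:

\[
\frac{10^{8}\ \text{images}}{5\ \text{images/s}} = 2\times10^{7}\ \text{s} = \frac{2\times10^{7}}{86{,}400}\ \text{days} \approx 231.5\ \text{days}
\]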

This week, a Java project has been developed that extracts the following features from an input image (the same set as the features released with the Flickr dataset): CEDD, Auto Color Correlogram, basic features, JPEG Coefficient Histogram, Color Layout, Edge Histogram, Fuzzy Color Histogram, Fuzzy Opponent Histogram, Gabor, Joint Histogram, Scalable Color and Tamura. The extracted features are output as a list of feature values.
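For illustration, the sketch below shows how one of these descriptors can be extracted with LIRe. It assumes a LIRe 0.9.x-style API, in which global descriptors such as CEDD implement extract(BufferedImage) and expose the resulting vector via getDoubleHistogram(); package and method names vary between LIRe versions, so this is a sketch rather than the project's actual code.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

import net.semanticmetadata.lire.imageanalysis.CEDD;

public class FeatureExtractionDemo {
    public static void main(String[] args) throws Exception {
        // Load the query image from disk.
        BufferedImage image = ImageIO.read(new File("query.jpg"));

        // Extract the CEDD descriptor; other LIRe global features
        // (EdgeHistogram, ColorLayout, Tamura, ...) follow the same pattern.
        CEDD cedd = new CEDD();
        cedd.extract(image);

        // The descriptor is available as a plain vector of feature values.
        double[] featureVector = cedd.getDoubleHistogram();
        System.out.println("CEDD dimensions: " + featureVector.length);
    }
}
```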

Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings, which has been selected to index the Flickr image feature dataset. This approach was chosen based on previous research work carried out by a group of researchers at Insight [3]. Annoy creates static, read-only index files, which means that building the index is decoupled from loading it: the model can be created once and then shared by different processes. Index trees created with Annoy are very memory efficient and performance is very good as long as the index fits in RAM. This suits this project very well, as millions of images with high-dimensional vectors need to be indexed.

Because LIRe is a Java library and the feature extraction code is written in Java, the Java version of Annoy (annoy-java, https://github.com/spotify/annoy-java) will be used to integrate feature extraction with the index model. However, annoy-java only implements loading trees built by the Python version of Annoy; it cannot yet create its own. The index model therefore needs to be built in Python, and once it has been built for the real dataset, the tree can be loaded from Java to search for images similar to the query image.
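Assuming the tree has already been built and saved with the Python version of Annoy (via AnnoyIndex's add_item, build and save calls), a minimal sketch of the Java side might look like the following. It assumes annoy-java's ANNIndex class with a getNearest(queryVector, n) method; the exact constructor arguments and metric handling should be checked against the annoy-java version used.

```java
import java.io.IOException;
import java.util.List;

import com.spotify.annoy.ANNIndex;

public class SimilarImageSearch {
    public static void main(String[] args) throws IOException {
        // Dimension must match the vectors the Python-built tree was created with
        // (144 here is a placeholder value for a LIRe descriptor).
        int dimension = 144;

        // Memory-map the static tree file built offline with the Python version of Annoy.
        ANNIndex index = new ANNIndex(dimension, "lire_features.tree");

        // Query vector produced by the LIRe feature extraction step (zeros as a placeholder).
        float[] query = new float[dimension];

        // Item ids of the 10 approximate nearest neighbours in the index.
        List<Integer> neighbours = index.getNearest(query, 10);
        System.out.println("Nearest neighbour ids: " + neighbours);
    }
}
```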

Several experiments have been carried out to analyse the performance of Annoy in terms of memory usage and index build time. These experiments will determine the hardware resources required for processing the entire dataset, or the number of images that can be processed given the resources available. Figure 1 shows the relative memory used and index build time vs. the number of images. The experiments were run on a standard desktop machine with 16 GB RAM, an Intel i7-3770 CPU and Ubuntu 14.04 LTS installed.


Figure 1: Annoy index experiment results

In order to evaluate the performance of the system, a subset of the 100-million-image dataset has been selected.

If the performance is satisfactory, the size of the subset can be increased or the whole dataset can be used, depending on the time and resources needed and available. Five possible approaches to selecting a subset have been identified:
– Linear selection: take every Xth image.
– Random selection: select images on a random basis.
– Region selection (e.g. country, continent) based on the geo-tag: there are third-party services that convert geo-coordinates to a location, such as PyGeocoder. However, like most other such services, this Python package uses the Google Maps API at the back end, which only allows 25,000 requests per day and 5 requests per second. This means that for the full dataset it would take over 4,000 days to extract regions.
– Extracting the location from the Flickr website: the link to the image's page is included in the metadata. Experiments show that extracting the geo-location for 600 images takes around 13 minutes using a single thread and 3.4 minutes using 10 threads. If the whole dataset were to be used, a single thread would therefore take approximately 578 days and 10 threads around 150 days.
– Box solution: in this work, the box solution (developed by an Insight researcher) was applied to select images from the whole dataset whose geo-tag values fall within the region of Ireland and the UK. This solution is extremely fast: it takes about 400 seconds to process the 100-million-image dataset (excluding loading the data file from the hard drive). Using it, 5,339,258 images from Ireland and the UK have been selected, which will be used for building and evaluating the proposed system. According to the experiments above, the available RAM is sufficient to use this subset for prototyping. A sketch of this kind of bounding-box filter is shown after this list.
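The box solution itself is not reproduced here; the following is only a hypothetical sketch of the underlying idea, filtering a tab-separated metadata file by a latitude/longitude bounding box. The column positions and the box coordinates are illustrative assumptions, not the actual values used.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.PrintWriter;

public class GeoBoxFilter {
    // Rough bounding box around Ireland and the UK (illustrative values only).
    private static final double MIN_LAT = 49.8, MAX_LAT = 60.9;
    private static final double MIN_LON = -10.7, MAX_LON = 1.8;

    public static void main(String[] args) throws Exception {
        // Hypothetical column positions of longitude/latitude in the metadata file.
        final int LON_COL = 10, LAT_COL = 11;

        try (BufferedReader in = new BufferedReader(new FileReader("yfcc100m_dataset.tsv"));
             PrintWriter out = new PrintWriter("ireland_uk_subset.tsv")) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.split("\t", -1);
                if (cols.length <= LAT_COL || cols[LON_COL].isEmpty() || cols[LAT_COL].isEmpty()) {
                    continue;  // skip records without a geo-tag
                }
                double lon = Double.parseDouble(cols[LON_COL]);
                double lat = Double.parseDouble(cols[LAT_COL]);
                if (lat >= MIN_LAT && lat <= MAX_LAT && lon >= MIN_LON && lon <= MAX_LON) {
                    out.println(line);  // keep images whose geo-tag falls inside the box
                }
            }
        }
    }
}
```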

This week I have completed two major tasks: the query image feature extraction code and the index model. The subset of the dataset has also been selected and is ready to use. With everything now in place, the next step is to integrate these three components into one working system. I believe that the biggest challenges have been overcome and that the only remaining challenge for the rest of the semester is time. As I do not have any team members, writing the project documentation, completing the project, and completing assignments for the other four modules is extremely challenging in terms of time.

References:
[1] Lux, Mathias. "Content based image retrieval with LIRe." In Proceedings of the 19th ACM International Conference on Multimedia, pp. 735–738. ACM, 2011.
[2] Lux, Mathias, and Savvas A. Chatzichristofis. "LIRe: Lucene Image Retrieval: an extensible Java CBIR library." In Proceedings of the 16th ACM International Conference on Multimedia. ACM, 2008.
[3] McGuinness, Kevin, et al. "Insight Centre for Data Analytics (DCU) at TRECVid 2014: instance search and semantic indexing tasks." 2014.
[4] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pp. 1097–1105. Curran Associates, Inc., 2012.
[5] Babenko, Artem, Anton Slesarev, Alexander Chigorin, and Victor Lempitsky. "Neural codes for image retrieval." In Computer Vision – ECCV 2014, 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part I, pp. 584–599. 2014.
