Monthly Archives: March 2015

CW 13 (23rd March – 29th March) Server

This week the work on the server from CW 10 has been continued.

Work focused on building the index model from one of the LIRE features (Edge Histogram), using a selected subset of 5 million images.

The released LIRE feature set contains thirteen different visual image descriptors, of which only one, namely Edge Histogram, has been chosen for similarity matching in this project, due to limited processing resources.

Edge Histogram is a structure that represents the local edge distribution of an image using 5 types of edges: vertical, horizontal, 45 degree, 135 degree and non-directional. The image is partitioned into 16 sub-images of equal size and an edge distribution histogram is generated for each sub-image [1].

The edge histogram descriptor captures the spatial distribution of edges. The distribution of edges is a good texture signature that is useful for image-to-image matching even when the underlying texture is not homogeneous [2].

Previous work shows that Edge Histogram achieved relatively good results in content-based image retrieval tasks [2][3]; thus, it has been selected for this work. As previously discussed, recent state-of-the-art research shows that AlexNet features achieve better performance in content-based image retrieval tasks; however, this feature set will not be available during the time frame of this project.

Since the aim of this project is to prove the concept of auto tagging using a large reference image dataset, selecting and evaluating an optimal set of features is out of the scope of this work.

The LIRE features for the full 100 million image dataset are released as 9,921 compressed files (tar.gz). In order to build the index model for the selected subset, a number of steps were taken:

  1. Get the Edge Histogram feature from the feature dataset:

Decompress the 9,921 compressed LIRE feature files released by Yahoo! and gather the Edge Histogram feature, on which the similarity matching will be based, from among all the features. To do this, a Python program has been written which loads the compressed feature files one by one, extracts the Edge Histogram feature and appends it to a feature output file. The time cost of this extraction process is approximately 9.5 hours. The final file containing the Edge Histogram features for 100 million images is 20.3 GB. A minimal sketch of this step is shown below.
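The sketch assumes the archives sit in a local directory and that each line of a feature file is tab-separated, with the MD5 feature identifier first and the Edge Histogram field at a fixed column; the directory name, layout and column position are all assumptions, not the actual format of the released files.

```python
import glob
import tarfile

# Assumptions: directory name, tab-separated layout, and the column
# position of the Edge Histogram field are all hypothetical.
EDGE_HISTOGRAM_COLUMN = 6

with open("edge_histogram_features.txt", "w") as out:
    for archive_path in sorted(glob.glob("lire_features/*.tar.gz")):
        with tarfile.open(archive_path, "r:gz") as archive:
            for member in archive.getmembers():
                if not member.isfile():
                    continue
                for raw_line in archive.extractfile(member):
                    fields = raw_line.decode("utf-8").rstrip("\n").split("\t")
                    feature_id = fields[0]  # MD5-hashed feature identifier
                    edge_histogram = fields[EDGE_HISTOGRAM_COLUMN]
                    out.write(feature_id + "\t" + edge_histogram + "\n")
```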

  2. Map entries in the image metadata dataset to entries in the feature dataset:

The metadata released by Yahoo! contains the image ID and other information for the 100 million images. The feature files, however, contain MD5-hashed image feature identifiers. In order to link the feature data with the metadata, a Python dictionary (key: image ID, value: feature ID) has been built, which returns the feature ID for a given image ID.
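A minimal sketch of building this lookup table, assuming the metadata is a tab-separated file with the image ID and feature ID in the first two columns (the file name and column positions are assumptions):

```python
# Hypothetical file name and column positions.
image_to_feature = {}
with open("yfcc100m_metadata.tsv") as meta:
    for line in meta:
        fields = line.rstrip("\n").split("\t")
        image_id, feature_id = fields[0], fields[1]
        image_to_feature[image_id] = feature_id
```

For 100 million entries such a dictionary is itself memory-hungry, which is one more reason the steps below restrict themselves to the 5 million image subset.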

  3. Construct the feature set of the subset data:

As described in CW 10, processing the whole 100 million images requires substantial computing resources, which are only available in a computing cloud. For this project, a 5 million image subset, captured in Ireland and the UK, was selected (see CW 10). Based on our previous experiments, the index model built from this sub-dataset can fit into the RAM of a standard desktop machine.

To get the Edge Histogram features of the subset, a second dictionary has been built, which contains all the feature IDs of the selected subset as keys. Thus, for a given feature ID, if it is in this key set, then the feature is in the selected subset.

All features in the feature output file (created in step 1) were looped through, line by line; if a line's feature ID was in the subset, it was saved to a separate sub-feature file (see the sketch below). The size of the subset feature file is approximately 4.5 GB.
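A minimal sketch of this filtering pass, with assumed file names (a plain set is used for the membership test, which plays the same role as the dictionary's key set):

```python
# Hypothetical file names; the membership structure mirrors the
# dictionary key set described above.
with open("subset_feature_ids.txt") as ids:
    subset_ids = {line.strip() for line in ids}

with open("edge_histogram_features.txt") as full, \
        open("edge_histogram_subset.txt", "w") as sub:
    for line in full:
        feature_id = line.split("\t", 1)[0]
        if feature_id in subset_ids:
            sub.write(line)
```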

  4. Build the ANN index model:

At this stage, having image IDs and feature IDs for the subset, an attempt was made to build the ANNOY index model using the unique image IDs; however, it was unsuccessful. Two possible reasons for this are that the image IDs were not in order, or that they were not sequential (5 million images extracted from 100 million images).
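This is consistent with how ANNOY works: add_item expects dense integer item indices starting at 0 (it allocates space up to the largest index seen), so sparse, non-sequential image IDs need to be remapped to 0..n-1 first. A hedged sketch follows; the input file format, the "euclidean" metric and the tree count are assumptions, while the dimensionality of 80 is the standard MPEG-7 Edge Histogram size (16 sub-images × 5 edge types):

```python
from annoy import AnnoyIndex

DIM = 80  # MPEG-7 Edge Histogram: 16 sub-images x 5 edge types
index = AnnoyIndex(DIM, "euclidean")
row_to_image_id = []  # dense row index -> original identifier

# Hypothetical file name and line format (ID, then space-separated values).
with open("edge_histogram_subset.txt") as features:
    for row, line in enumerate(features):
        feature_id, values = line.rstrip("\n").split("\t")
        vector = [float(v) for v in values.split()]
        index.add_item(row, vector)  # dense row index, not the raw ID
        row_to_image_id.append(feature_id)

index.build(10)  # 10 trees: an assumed accuracy/size trade-off
index.save("edge_histogram.ann")
```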

The work this week was very challenging because of the number and size of the files used to create the desired final output. The Python programs took several hours to run and crashed many times. The next step is to set up a database, which will store the metadata of the dataset. Once the metadata is saved in the database, the integration and testing of the auto tagging system can start.

References:

[1] Won, Chee Sun, Dong Kwon Park, and Soo-Jun Park. "Efficient use of MPEG-7 edge histogram descriptor." ETRI Journal 24.1 (2002): 23-30 [Online]. Available from: http://etrij.etri.re.kr/etrij/journal/article/article.do?volume=24&issue=1&page=23 [Last Accessed 29 March 2015]

[2] Manjunath, Bangalore S., et al. “Color and texture descriptors.” Circuits and Systems for Video Technology, IEEE Transactions on 11.6 (2001): 703-715.

[3] Won, Chee Sun. “Feature extraction and evaluation using edge histogram descriptor in MPEG-7.” Advances in Multimedia Information Processing-PCM 2004. Springer Berlin Heidelberg, 2005. 583-590.

 


CW 12 (16th March – 22nd March) Client: Graphical User Interface

The development of the initial version of the client-side system has been completed this week.

The Java programming language has been chosen as the development language to enable easy integration of the feature extraction code with the Graphical User Interface (GUI).

Specifically, Java 8, the newest release of Java, has been used, as it provides many features and tools that were not available in previous versions of Java, such as the built-in JavaFX library.

JavaFX is a set of graphics and media packages that enables developers to design, create, test, debug, and deploy rich client applications that operate consistently across diverse platforms.

A new visual layout tool, JavaFX Scene Builder, has also been released. It allows users to quickly design JavaFX application user interfaces without coding. Users can drag and drop UI components onto a work area, modify their properties and apply style sheets, and the FXML code for the layout they are creating is automatically generated in the background. The result is an FXML file that can then be combined with a Java project by binding the UI to the application's logic [1].

The design of the client-side system follows the Model-View-Controller (MVC) pattern [2], which separates presentation, logic and data.


Figure 1: JavaFX Scene Builder

The view requests information from the model and presents this information to the user. In this client system, the view is built using JavaFX Scene Builder (shown in Figure 1).

The controller contains a series of functions that handle user inputs (such as a click of a button) or system events (e.g. an image feature has been extracted).

The model notifies its associated views and controllers when there has been a change in its state, e.g. processing results returned from the server.


Figure 2: Packages of the client side system

A list of packages is shown in Figure 2. GUIPanel.fxml is the UI file generated from JavaFX Scene Builder; it is linked to GUIController.java, which handles all the user interactions and system events. ModelContainer.java contains all the data. The Utils package consists of a few tools, including reading/writing files from/to the hard drive, reading the system time and converting it to string format for display, etc.

A screenshot of the client-side GUI is shown in Figure 3. This initial version of the client system does not yet contain the functionality for data communication with the back-end server.


Figure 3: A screenshot of Client side system User Interface

The project source code is hosted on Bitbucket, a cloud-based revision control service.

References:
[1] JavaFX Scene Builder – A Visual Layout Tool for JavaFX Applications, http://www.oracle.com/technetwork/java/javase/downloads/javafxscenebuilder-info-2157684.html (Last Accessed: 22 March 2015)
[2] Glenn E. Krasner and Stephen T. Pope. 1988. A cookbook for using the model-view controller user interface paradigm in Smalltalk-80. J. Object Oriented Program. 1, 3 (August 1988), 26-49.


CW 11 (9th March – 15th March) Quick Update

Due to a number of assignments and upcoming deadlines, this post is a short update on the CW 11 work, which will be explained in detail in next week's post (e.g. system design, libraries used, etc.).

Development of the interface for demonstrating the backend system has started.


Figure 1: Initial design of interface for demonstration of the backend system

The following functionality has been designed and an initial test version has been developed (as shown in Figure 1):

  • uploading of a query image from the local file system or from a URL
  • a word cloud figure (the example shown uses dummy data, which will be replaced with real image tags returned from the backend system)
  • histogram and pie chart plots (the examples shown use dummy data) and
  • a list of the most similar images to the query image, shown at the bottom (Figure 1 shows a sample image, which will be replaced by a list of images returned from the backend system).

 


CW 10 (2nd March – 8th March) LIRe feature extraction, Annoy Index Model and Flickr Dataset Subset

Project is back on schedule!

As discussed in the previous post, caffe cannot be used due to the delayed release of the AlexNet features. Thus, for the development of this prototype, image feature extraction will be implemented using the LIRe software library and integrated with an Approximate Nearest Neighbor index.

The installation of the LIRe library turned out to be much simpler than caffe's. It depends on some other libraries and packages, but it is not as restrictive as caffe.
LIRe (Lucene Image Retrieval) [1] is an open source Java development library provided under the GPL licence. It is built on top of Lucene, an open source text search engine that provides text-based retrieval and indexing [2]. As discussed in a previous post, the same feature set released by Flickr can be extracted using the LIRe library.

Based on the evaluation of advanced techniques in feature extraction and effective visual matching performed by the Insight Centre, it appears that AlexNet Layer 7 features produce more accurate results [3][4][5]. Although, ideally, AlexNet features could be extracted locally using the caffe software library, for 100 million images this would take around 231 days (at 5 images/second) on a standard desktop machine with an NVidia Titan graphics card installed. Alternatively, a subset of the dataset could be used, but because of the time limitation the prototype will be based on what is currently available, namely the LIRe features, even if they may not produce the same quality of results. However, the prototype system is designed to be flexible, and the LIRe features can be replaced with AlexNet features once these are available.

This week, a Java project has been developed which extracts the following features from the input image (the same features as released by Flickr): CEDD, Auto Color Correlogram, Basic features, JPEG Coefficient Histogram, Color Layout, Edge Histogram, Fuzzy Color Histogram, Fuzzy Opponent Histogram, Gabor, Joint Histogram, Scalable Color and Tamura. The extracted features are represented as a list of feature values.

ANNOY (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings, which has been selected to index the Flickr image feature dataset. This approach has been chosen based on previous research work carried out by a group of researchers at Insight [3]. ANNOY creates static, read-only index files, which means that creating the index is separate from loading it: the model can be created once and shared by different processes. Index trees created with ANNOY are very memory efficient, and performance is very good if the index fits in RAM. This suits this project very well, as millions of images with high-dimensional vectors need to be indexed.

Because LIRe is a Java library and the feature extraction code is written in Java, the Java version of ANNOY (annoy-java, https://github.com/spotify/annoy-java) will be used to integrate feature extraction with the index model. However, annoy-java only implements loading trees built by the Python version of ANNOY; it cannot yet create its own. Thus, the ANN index model needs to be built in Python, and once it has been built for the real dataset, the tree model can be loaded from Java to enable searching for images similar to the query image.
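For illustration, a minimal sketch of the search step, shown here with the Python bindings (the deployed client would do the equivalent through annoy-java); the index file name, metric and dimensionality are assumptions:

```python
from annoy import AnnoyIndex

# Assumed dimensionality, metric and file name.
DIM = 80
index = AnnoyIndex(DIM, "euclidean")
index.load("edge_histogram.ann")  # memory-maps the static index file

# Placeholder query vector; in the real system this would be the
# feature vector extracted from the query image.
query_vector = [0.0] * DIM
rows, distances = index.get_nns_by_vector(query_vector, 10,
                                          include_distances=True)
```

Because the index file is memory-mapped rather than loaded eagerly, several processes can share one model, which is the property highlighted above.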

Several experiments have been carried out to analyze the performance of ANNOY in terms of memory usage and index-model build time. These experiments will determine the hardware resources required for processing the entire dataset, or the number of images that can be processed given the resources available. Figure 1 shows the relative memory used and index build time vs. the number of images. The experiments were run on a standard desktop machine with 16 GB RAM, an Intel i7-3770 CPU and Ubuntu 14.04 LTS installed.


Figure 1: Annoy Index Experiments Result
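A hedged sketch of the kind of benchmarking loop behind such measurements; the sizes, dimensionality, tree count and use of random vectors are assumptions, not the exact experimental setup:

```python
import random
import resource
import time

from annoy import AnnoyIndex

DIM = 80  # assumed feature dimensionality

for n_items in (100000, 500000, 1000000):
    index = AnnoyIndex(DIM, "euclidean")
    for i in range(n_items):
        index.add_item(i, [random.random() for _ in range(DIM)])
    start = time.time()
    index.build(10)  # 10 trees, an assumed setting
    elapsed = time.time() - start
    # ru_maxrss is reported in kilobytes on Linux
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(n_items, "items:", round(elapsed, 1), "s, peak", peak_kb, "KB")
```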

In order to evaluate the performance of the system, a subset of the 100 million image dataset has been selected.

If the performance is satisfactory, the size of the subset can be increased, or the whole dataset can be used, depending on the time and resources needed and available. Five possible approaches to selecting a subset have been identified:
  • Linear selection: take every Xth image.
  • Random selection: select images on a random basis.
  • Region (e.g. country, continent) selection based on the geo-tag: there are third-party services that convert the geo-value to a location, such as PyGeocoder. However, like most other services, this Python package uses the Google Maps API at the backend, which only allows 25,000 requests per day and 5 requests per second. This means that for the full dataset it would take over 4,000 days to extract regions.
  • Extracting the location from the Flickr website: the link to the website is included in the image metadata. Experiments show that extracting the geo-location for 600 images takes around 13 minutes using a single thread and 3.4 minutes using 10 threads. Therefore, processing the whole dataset would take approximately 578 days with a single thread, and around 150 days with 10 threads.
  • Box solution: in this work, the box solution (developed by an Insight researcher) was applied to select images from the whole dataset whose geo-tag value lies in the region of Ireland and the UK; a minimal sketch of this bounding-box filter is shown after this list. The speed of this solution is extremely fast: it takes around 400 seconds to process the 100 million records (loading the data file from the hard drive not included). After applying this solution, 5,339,258 images from Ireland and the UK have been selected, which will be used for building and evaluating the proposed system. According to the experiments above, the RAM resources available are sufficient to use this subset for prototyping.
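The sketch below shows the idea of the bounding-box filter under loudly-labelled assumptions: the metadata file name and column positions are hypothetical, the latitude/longitude bounds are rough values covering Ireland and the UK, and the Insight box solution may differ in detail.

```python
# All file names, column positions and bounds below are assumptions.
LAT_MIN, LAT_MAX = 49.5, 61.0   # rough box around Ireland and the UK
LON_MIN, LON_MAX = -11.0, 2.0

selected_ids = []
with open("yfcc100m_metadata.tsv") as meta:
    for line in meta:
        fields = line.rstrip("\n").split("\t")
        lat_field, lon_field = fields[2], fields[3]  # assumed columns
        if not lat_field or not lon_field:
            continue  # skip images without a geo-tag
        lat, lon = float(lat_field), float(lon_field)
        if LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX:
            selected_ids.append(fields[0])  # image ID
```

A simple numeric comparison per record, with no network calls, is what makes this approach orders of magnitude faster than the geocoding services above.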

This week two major tasks were completed: the development of the query image feature extraction code and of the index model. The subset of the dataset has also been selected and is ready to use. As everything is now ready, the next step is to integrate these three components into one working system. I believe that the biggest challenges have been overcome and the only remaining challenge for the rest of the semester is time. As I do not have any team members, writing the project documentation, completing the project, and completing assignments for the other four modules is extremely challenging in terms of time.

References:
[1] Lux, Mathias. “Content based image retrieval with LIRe.” In Proceedings of the 19th ACM international conference on Multimedia, pp. 735-738. ACM, 2011.
[2] Lux, Mathias, and Savvas A. Chatzichristofis. “Lire: lucene image retrieval: an extensible java CBIR library.” Proceedings of the 16th ACM international conference on Multimedia. ACM, 2008.
[3] McGuinness, Kevin, et al. "Insight Centre for Data Analytics (DCU) at TRECVid 2014: instance search and semantic indexing tasks." 2014.
[4] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C.J.C. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.
[5] Artem Babenko, Anton Slesarev, Alexander Chigorin, and Victor S. Lempitsky. Neural codes for image retrieval. In Computer Vision – ECCV 2014 – 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I, pages 584–599, 2014.


CW 9 (23rd February – 1st March) Change of plan: from Caffe and AlexNet to LIRe

The Caffe (an image feature extraction library) installation was finally successful this week. The likely cause of the problems faced was a conflict between the Nvidia graphics driver and the Ubuntu desktop environment. The server version of Ubuntu, which does not include a desktop environment by default, was used instead of the desktop version. Figure 1 is an extract showing that all the tests have passed, meaning a successful Caffe installation.


Figure 1: Successful Caffe installation

Caffe performance is better (approximately 11 times faster) when used with a GPU instead of a CPU [3]. According to the official Caffe website (http://caffe.berkeleyvision.org/), Caffe can process 60M images per day with a single NVIDIA K40 GPU. This is because a GPU has thousands of cores that enable images to be processed in parallel, whereas a CPU has few cores and would process them sequentially [1][2]. An image can be divided into small patches (an m by n grid) and these patches can be processed simultaneously. Figure 2 below shows a Caffe performance comparison between CPU, GPU, and GPU with cuDNN (a deep neural network library).

I was planning to use my laptop as the development machine and a much more powerful server machine at the Insight Centre, which has caffe installed, for processing the full dataset. The caffe library has only been tested on a few specific Nvidia graphics cards, excluding the one in my laptop; thus, caffe's CPU mode was used on the laptop. However, when processing the full dataset on the server, caffe will run in GPU mode.

According to the official Flickr dataset website (http://www.yli-corpus.org/home/future-releases), the AlexNet features for the Flickr dataset (extracted using the caffe library) were planned to be published on Friday (27.02.2015). The plan was to extract this type of features from the query image using the caffe library and to search the indexed Flickr image features to find similar images from which the tags could be extracted. However, the release date has been changed to 30th April, a week after my project submission deadline. This postponement means that the approach we selected (in week CW 6) cannot be applied.
After discussion with my supervisor, we have altered our original solution to use the LIRE image features (already released by Flickr) instead of the AlexNet features.
LIRE is an open source Java library that extracts several low-level features, such as MPEG-7 ScalableColor, ColorLayout and EdgeHistogram, from the input image.

What has also changed is that the features, which were meant to be hosted on the Amazon cloud where they could be accessed directly, are now shared on Google Drive as 9,921 compressed tar files. The total size of the dataset is 329 GB, and approximately 1.5 TB after decompression. This means that we need to process this data locally instead of on the Amazon cloud as we had originally planned. It also means that processing this data requires a lot of computational resources. After discussing with my supervisor, we decided that we may have to process a subset of the whole dataset, depending on the computer resources available to us. Whether to process the full dataset or a subset, and the size of the subset, has not been determined yet. I will review some literature, ask for advice and experiment to find out how much data we can process.

I met my supervisor on Friday. We discussed the progress since our last meeting, what I am working on at the moment and the next steps. I also received some advice and clarifications on what I should include in the different sections of the Final Delivery Documentation, especially in the target market analysis section. I will be posting progress on this as well.

This week the problem with the caffe installation, which was stopping me from making progress on the project, has been fixed. However, this solution may not be applicable now: the AlexNet features that caffe extracts will not be available from Flickr within my time frame, and I need to change my approach.

In CW 10, I will install the LIRE library and read the documentation to learn how to use it. Hopefully it will not cause issues like the caffe library did, so I can still get back on schedule.

Bibliography:
[1]. Gregg, Chris, and Kim Hazelwood. "Where is the data? Why you cannot debate CPU vs. GPU performance without the answer". Performance Analysis of Systems and Software (ISPASS), 2011 IEEE International Symposium on. IEEE, 2011.
[2]. Asano, Shuichi, Tsutomu Maruyama and Yoshiki Yamaguchi. "Performance comparison of FPGA, GPU and CPU in image processing". Field Programmable Logic and Applications, 2009. FPL 2009. International Conference on. IEEE, 2009.
[3]. Larry Brown, 2014. “Accelerate Machine Learning with cuDNN Deep Neural Network Library” [Online]. Available from: http://devblogs.nvidia.com/parallelforall/accelerate-machine-learning-cudnn-deep-neural-network-library/ [Accessed on 1 March 2015]