
CW 16 (13th April – 19th April) Final Post… Establishing the Communication Channel and Integrating the Components

The prototype of the system is now complete and the Final Delivery Documentation has been written, so this is the last blog post.

This week, communication between the client and the server has been established, and all the building blocks have been integrated into a complete system.

  • The index model is loaded into the Django server at startup, enabling the search for the IDs of similar images.
  • MongoDB is now automatically queried for the images matching the IDs returned by the index model.
  • The tags are extracted from these images and analysed, and the frequency of each tag's occurrence is obtained.
  • A word cloud image is generated on the server from the extracted tags, with a unique file name created each time. Currently, the word cloud images are stored on the server, but ideally each image should be deleted when the session ends, or generated on the client side in a real deployed system (see the sketch after this list).
  • The communication channel built between the client and the server enables the result to be sent back to the client.
  • The client generates the pie chart and histogram and displays the results to the user.
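As an illustration of the tag analysis and word cloud steps, a minimal sketch is shown below. It assumes the open source wordcloud Python package; the function and directory names are illustrative, not the actual project code.

import uuid
from collections import Counter
from wordcloud import WordCloud

def build_word_cloud(tags, output_dir='static/images'):
    # Count how often each tag occurs across the similar images.
    frequencies = Counter(tags)
    # Render the word cloud from the tag frequencies.
    cloud = WordCloud(width=400, height=300).generate_from_frequencies(frequencies)
    # A unique name per request avoids clashes between concurrent sessions;
    # ideally the file would be deleted when the session ends.
    filename = '%s/wordcloud_%s.png' % (output_dir, uuid.uuid4().hex)
    cloud.to_file(filename)
    return filename, frequencies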

The three design patterns used during the development of this prototype are:

  1. Model View Controller (MVC) described in CW 12.
  2. Singleton Pattern, which ensures that only one instance of a class exists and provides global access to that instance. ModelContainer.java in the Model package is a singleton class, ensuring data consistency and easy access.
  3. Observer and Observable, used to notify the controller (observer) of any changes in the data held in the model (observable).

During the development of the web server, many errors were encountered. Some of them were simple to fix, and some needed deeper analysis. A few examples are described below.

  • ASCII Error

Some of the tags extracted from the images similar to the query image were not in ASCII format, and the word cloud could not be generated because of this. The proper solution would have been to decode these tags, but because of the time limitation, statements that simply ignore the offending characters have been written.
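For illustration, the workaround amounts to something like the following (a Python 2 style sketch, not the actual statements):

# Workaround: drop any non-ASCII bytes from a tag.
clean_tag = tag.decode('ascii', 'ignore')
# The proper fix would be to decode the tags as UTF-8 instead:
# clean_tag = tag.decode('utf-8')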

  • KeyError

The error shown in Figure 1 was caused by an inconsistent name for the feature values: in the JSON data the field was defined as 'featureValue', while on the server it was expected as 'feature'.

Figure 1: Snapshot of the Key Error


  • Value Error

The value error shown in Figure 2 occurred because JSON sends data from the client to the server as a string, which needs to be converted into a number. The presence of a square bracket caused the problem; it had to be removed to allow the conversion to a number.

Figure 2: Snapshot of the Value Error

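An illustrative fix, assuming the value arrives as a bracketed string such as "[0.5, 1.0, 0.25]":

# Remove the square brackets, then convert each entry to a number.
raw = received_value.strip('[]')
feature_vector = [float(x) for x in raw.split(',')]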

  • Argument Error

A wrong parameter was passed to the index loading function. Instead of the feature dimension, which is required, a variable 'f' was passed, which had previously been defined as a file handle. A snapshot of the argument error is shown in Figure 3 below.

Figure 3: Snapshot of Argument Error

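For reference, a corrected call might look as follows; the variable and file names are illustrative, and 80 is the Edge Histogram dimension discussed in CW 13:

from annoy import AnnoyIndex

feature_dim = 80                    # the feature dimension the function requires
index = AnnoyIndex(feature_dim)     # not AnnoyIndex(f), where f was a file handle
index.load('edge_histogram.ann')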


In addition to the completion of the system development, work on the Final Delivery Documentation has continued. The remaining sections have been written and submitted to my supervisor for review.

The document will be corrected according to my supervisor's suggestions and submitted on the 25th April as a PDF and in printed form.

This project has been very challenging, and the experience and knowledge gained are invaluable.



CW 15 (6th April – 12th April) Web Server with Django

The building blocks of the overall system are now ready. The next milestone is to establish communication between the client and the server and to integrate the final system.

The communication between the client and the server is as follows:

  • The user selects an image from the local file system or provides an image URL from the web.
  • When the process button is clicked, a feature set of the image is extracted using the LIRE feature extraction library.
  • The client sends an HTTP request to the server with the extracted feature values in JSON data interchange format.
  • The server receives the request and searches the pre-built ANN index model for similar entries. The index model returns the index IDs of these similar images.
  • The metadata collection in MongoDB is searched based on these IDs. Tags, together with the image URLs and other information, are extracted.
  • Tag analysis takes place on the server. The word cloud image is generated from the extracted tags.
  • The server returns the data to the client. The client's Graphical User Interface displays the word cloud, pie chart, histogram and the five most similar images.
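For illustration, the client might send a POST form field named 'details' (the key used in the server code below) whose value is a JSON string of the extracted feature values; the inner field name and the numbers here are hypothetical:

details = '{"feature": [0.125, 0.0, 0.375, 0.25]}'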

Originally, a socket server was planned to be used to build the communication channel between the client and the server. However, after reading some online forums and discussing with my mentor, it was decided to build a web server instead.

Some of the reasons for this decision include the richer functionality a web server offers, and its easier maintenance and expansion at a later stage compared to a socket server.

To do this, Django has been chosen. It is an open source, high-level Python web framework which reduces the amount of repetitive and time-consuming work, for example by providing shortcuts for frequent programming tasks [1].

Django automatically creates a project with several files, such as settings.py, urls.py and wsgi.py. These files can later be modified to suit specific needs.

To host our service, a Django app named flickrservices is created and registered in the settings.py file:

INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'flickrservices',  # the application created to hold the processing functionality
)

The urls.py file parses the URL provided by the user through the web browser and routes the request to the functions in the application that process it.

The urls.py file contains:

url(r'^$', 'flickrservices.views.jsonRespone', name='jsonRespone'),
# ^$ matches the empty (root) URL, so such requests are routed to the flickrservices.views.jsonRespone function.

The flickrservices.views file contains the functions for processing the request and returning the result to the client user interface. At the moment, it only contains test code, created to get started with the development and to find out how Django works. Next week (CW16), this file will include all the code necessary to process the user request and return the result to the client user interface (searching the index, processing the extracted tags, etc.); this code is already developed and only needs slight modification.

import json

from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

@csrf_exempt
def jsonRespone(request):

    # For testing: other ways the client could send raw JSON data:
    # received_json_data = json.loads(request.body)
    # objs = json.loads(request.raw_post_data)
    # objs = json.loads(request.POST['jsonData'])
    # json_data = request.POST.get('details', None)
    # if json_data:
    #     data = json.loads(json_data)
    #     response_data = {}
    #     response_data['result'] = 'ok'
    #     response_data['wordMapURL'] = '/static/template/images/wordcloud1.jpg'

    if request.method == 'POST':

        # The 'details' field holds the feature values sent from the client.
        received_json_data = request.POST.get('details', None)

        if received_json_data:
            data = json.loads(received_json_data)

            # TODO: search the index model to get a list of the IDs of similar images
            # TODO: search mongoDB to get the tags of all similar images and the first 5 most similar image URLs
            # TODO: process the tags, statistical analysis for pie and bar chart
            # TODO: generate the word cloud image
            # TODO: encode the results into a JSON array and return to client

        response_data = {}
        response_data['result'] = 'ok'
        response_data['wordMapURL'] = '/images/wordcloud1.jpg'

        # How to return JSON data before Django 1.7:
        return HttpResponse(json.dumps(response_data), content_type="application/json")
        # return JsonResponse(response_data)  # Django 1.7+

To learn how to get started with Django development, I have read parts of "The Django Book", which is open source and available online. I have also watched tutorials on YouTube; the first seven tutorials from the given link are sufficient for the needs of this project.

Figure 1 illustrates the Django project created by the above procedure. PyCharm, a Python IDE, is used for the Django development.

Figure 1: Snapshot of Django project and files in PyCharm IDE


This week, the query image feature extraction code has been integrated into the user interface, so when the user selects an image and clicks the process button, the features are extracted and sent to the server.

Work on the Final Delivery Documentation has also continued this week. The following sections have been written: Software Architecture, Database Design, Data Flow Diagram, UML Cases, Competitors, Time Description, Timeline and Costing for Completion of Development.

To create the Data Flow Diagram and UML cases I have used the following book for reference: Kendall, Kenneth E. and Kendall, Julie E., Systems Analysis and Design: Global Edition, 8/E, Pearson, 2011, pp. 221-255 and 63-74.


To help me identify the development stages, and the tasks within these stages, for completing the development of this prototype as a web service, I have referred to the following book: Erl, T., Service-Oriented Architecture, Prentice Hall, 2012, pp. 357-586.


References:
[1] Holovaty, A. and Kaplan-Moss, J. "The Django Book: Version 2.0." (2009) [Online]. Available from: http://www.djangobook.com/en/2.0/index.html


CW 14 (30th March – 5th April) MongoDB

This week, MongoDB has been installed to store the image subset metadata for tag retrieval.

MongoDB is an open source NoSQL database based on a key-value store, where each item in the database is stored as an attribute together with its value. MongoDB has a dynamic schema, which allows extra fields to be inserted at any time, because the schema is not predefined as it is in relational databases.

MongoDB has been chosen as it is a free, open source NoSQL database. It is easy to install and use, and provides good documentation covering all aspects of deployment and usage, as well as additional resources such as user groups, guides and webinars. Its dynamic schema is especially useful for this project, as the requirements are not known well in advance.

Following the installation guide available on the official MongoDB website, the installation process was smooth and quick; no problems were encountered.

To learn how to use MongoDB and how to insert the required data, I have used the 'Reference' section of the MongoDB manual. Each subsection contains a clear explanation with specific examples and references to other material. The official MongoDB website also provides free online courses at https://university.mongodb.com/.
To insert the subset metadata and index model IDs into MongoDB, a Python program has been written; the actual program is shown in Figure 1.
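A minimal sketch of the same idea using PyMongo is shown below; the database, collection and field names are our illustrative choices, not necessarily those used in the actual program:

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
collection = client['flickr']['metadata']

# Each document pairs an image's metadata with its ID in the ANN index model,
# so that IDs returned by the index can be mapped back to tags and URLs.
document = {
    'index_id': 0,                            # position of the image in the index model
    'image_id': '6985418911',                 # hypothetical Flickr image ID
    'tags': ['dublin', 'river', 'bridge'],    # hypothetical tags
    'image_url': 'http://www.flickr.com/photos/example/6985418911/',
}
collection.insert_one(document)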


Figure 1: Snapshot of program to insert data into MongoDB

All the Python programs have been written in the IPython Notebook, a web-based interactive Python development environment, as shown in Figure 1.

Figure 2 below demonstrates the output of the findOne() command, which returns a document satisfying the queried criteria. In Figure 2, the specified criteria select the first document from the Ireland and UK image collection, i.e. the metadata of the first image of the subset.

Figure 2: MongoDB result of findOne() command

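In PyMongo, the equivalent of this findOne() query could look as follows, using the illustrative field name from the insertion sketch above:

# Fetch the metadata of the first image of the subset by its index model ID.
document = collection.find_one({'index_id': 0})
print(document['tags'], document['image_url'])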

In addition to the work on MongoDB, the Executive Summary, Service Description, Service Analysis, Management Report and Interface Rationale sections of the Final Delivery Documentation have been written.

The work over the next 3 weeks will involve building a web server, integrating it with the user interface and continuing the writing of the Final Delivery Documentation.


CW 13 (23rd March – 29th March) Server

This week, the work on the server from CW 10 has continued.

The index model has been built with one of the LIRE features (Edge Histogram), using the selected 5 million image subset.

The released LIRE feature set contains thirteen different visual image descriptors, of which only one, the Edge Histogram, has been chosen for similarity matching in this project, due to limited processing resources.

The Edge Histogram is a structure representing the local edge distribution of an image using 5 types of edges: vertical, horizontal, 45 degree, 135 degree and non-directional. The image is partitioned into 16 sub-images of equal size and an edge distribution histogram is generated for each sub-image, yielding an 80-bin descriptor (16 sub-images × 5 edge types) [1].

The edge histogram descriptor captures the spatial distribution of edges. The distribution of edges is a good texture signature that is useful for image to image matching even when the underlying texture is not homogeneous [2].

Previous work shows that the Edge Histogram achieved relatively good results in content based image retrieval tasks [2][3]; thus, it has been selected for this work. As previously discussed, recent state-of-the-art research shows that AlexNet features achieve better performance in content based image retrieval tasks; however, that feature set will not be available within the timeframe of this project.

Since the aim of this project is to prove the concept of auto tagging using a large reference image dataset, selecting and evaluating an optimal set of features is out of the scope of this work.

The LIRE features for the full 100 million image dataset are released in 9,921 compressed files (tar.gz). In order to build the index model for the selected subset, a number of steps have been taken:

  1. Get the Edge Histogram feature from the feature dataset:

Decompress the 9,921 compressed LIRE feature files released by Yahoo! and gather the Edge Histogram feature, on which the similarity matching will be based, from among all the features. To do this, a Python program has been written which loads the compressed feature files one by one, extracts the Edge Histogram feature and appends it to a feature output file (a sketch is shown below). The extraction process took approximately 9.5 hours, and the final file containing the Edge Histogram feature for 100 million images is 20.3 GB.
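The sketch below assumes each archive contains text files whose lines carry the descriptor name as a field; this is a guess at the file layout, not the documented format:

import glob
import tarfile

with open('edge_histogram_features.txt', 'wb') as output_file:
    # Loop over the 9,921 compressed feature archives one by one.
    for archive_path in sorted(glob.glob('features/*.tar.gz')):
        with tarfile.open(archive_path, 'r:gz') as archive:
            for member in archive.getmembers():
                member_file = archive.extractfile(member)
                if member_file is None:  # skip directory entries
                    continue
                for line in member_file:
                    # Keep only the Edge Histogram rows and append them
                    # to the single feature output file.
                    if b'edgehistogram' in line.lower():
                        output_file.write(line)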

  2. Map entries in the image metadata to entries in the feature dataset:

The metadata released by Yahoo! contains the image ID and other information for the 100 million images. The feature files, however, contain MD5-hashed image feature identifiers. In order to link the feature data with the metadata, a Python dictionary table (key: image ID, value: feature ID) has been built, which returns the feature ID for a given image ID.
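A sketch of this lookup table, assuming a simple two-column mapping file (the actual source of the mapping may differ):

# Build the dictionary: image ID -> MD5-hashed feature ID.
image_to_feature = {}
with open('image_feature_mapping.txt') as mapping_file:
    for line in mapping_file:
        image_id, feature_id = line.split()
        image_to_feature[image_id] = feature_id

# Usage: look up the feature ID for a given (hypothetical) image ID.
feature_id = image_to_feature['6985418911']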

  3. Construct the feature set of the subset data:

As described in CW10, processing the whole 100 million image dataset requires computing resources that are only available in a computing cloud. For this project, a 5 million image subset, captured in Ireland and the UK, was selected (see CW10). Based on our previous experiments, the index model built from this subset can fit into the RAM of a standard desktop machine.

To get the Edge Histogram features of the subset, a second dictionary has been built, which contains all the feature IDs of the selected subset as its key set. Thus, a given feature ID belongs to the selected subset if and only if it is in this key set.

All the features in the feature output file (created in step 1) have been looped through line by line; if a line's feature ID is in the subset, it is saved to a separate sub-feature file (see the sketch below). The size of the subset feature file is approximately 4.5 GB.
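A sketch of this filtering pass, reusing the mapping dictionary from step 2; the assumption that the feature ID is the first field of each line is ours:

# Feature IDs of the 5 million image subset, used as a fast membership test.
subset_feature_ids = set(image_to_feature[image_id] for image_id in subset_image_ids)

with open('edge_histogram_features.txt') as full_file, \
     open('subset_features.txt', 'w') as subset_file:
    for line in full_file:
        if line.split()[0] in subset_feature_ids:
            subset_file.write(line)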

  4. Build the ANN index model:

At this stage, having the image IDs and feature IDs for the subset, an attempt was made to build the ANNOY index model using the unique image IDs; however, it was unsuccessful. Two possible reasons are that the image IDs were not in order, or that they were not sequential (5 million images extracted from 100 million images).
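A likely explanation is that ANNOY expects item indices to be small, contiguous integers starting from 0 (it allocates storage up to the largest index used), so the large, non-sequential Flickr image IDs cannot serve as item indices directly. A sketch of a workaround, where read_subset_features is a hypothetical helper yielding one image's ID and feature vector at a time:

from annoy import AnnoyIndex

feature_dim = 80                   # Edge Histogram descriptor length
index = AnnoyIndex(feature_dim)

index_to_image = {}                # sequential index -> original image ID
for i, (image_id, vector) in enumerate(read_subset_features()):
    index.add_item(i, vector)      # ANNOY wants 0, 1, 2, ... as item indices
    index_to_image[i] = image_id

index.build(10)                    # number of trees; a tunable accuracy/size trade-off
index.save('edge_histogram.ann')
# index_to_image can then be stored alongside the metadata, e.g. in MongoDB.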

The work this week was very challenging because of the number and size of the files used to create the desired final output. The Python programs took several hours to run and crashed many times. The next step is to set up a database to store the metadata of the dataset. Once the metadata is saved in the database, the integration and testing of the auto tagging system can start.

References:

[1] Won, Chee Sun, Dong Kwon Park, and Soo-Jun Park. “Efficient use of MPEG-7 edge histogram descriptor.” Etri Journal 24.1 (2002): 23-30 [Online]. Available from: http://etrij.etri.re.kr/etrij/journal/article/article.do?volume=24&issue=1&page=23 [Last Accessed 29 March 2015]

[2] Manjunath, Bangalore S., et al. “Color and texture descriptors.” Circuits and Systems for Video Technology, IEEE Transactions on 11.6 (2001): 703-715.

[3] Won, Chee Sun. “Feature extraction and evaluation using edge histogram descriptor in MPEG-7.” Advances in Multimedia Information Processing-PCM 2004. Springer Berlin Heidelberg, 2005. 583-590.



CW 12 (16th March – 22nd March) Client: Graphical User Interface

The development of the initial version of the client side system has been completed this week.

The Java programming language has been chosen as the development language to enable easy integration of the feature extraction code with the Graphical User Interface (GUI).

Specifically, Java 8, the newest release of Java, has been used, as it provides many features and tools that were not available in previous versions, such as the built-in JavaFX library.

JavaFX is a set of graphics and media packages that enables developers to design, create, test, debug, and deploy rich client applications that operate consistently across diverse platforms. More details about JavaFX can be found at the following link.

A new visual layout tool, the JavaFX Scene Builder, has also been released. It allows users to quickly design JavaFX application user interfaces without coding: users can drag and drop UI components onto a work area, modify their properties and apply style sheets, while the FXML code for the layout being created is automatically generated in the background. The result is an FXML file that can then be combined with a Java project by binding the UI to the application's logic [1].

The design of the client side system follows the Model-View-Controller (MVC) [2] pattern, which separates presentation, logic and data.


Figure 1: JavaFX Scene Builder

The view requests information from the model and presents this information to the user. In this client system, the view is built using the JavaFX Scene Builder (shown in Figure 1).

The controller contains a series of functions that handle user inputs (such as the click of a button) or system events (e.g. an image feature has been extracted).

The model notifies its associated views and controllers when there has been a change in its state, e.g. processing results returned from the server.

Figure 2: Packages of the client side system


A list of the packages is shown in Figure 2. GUIPanel.fxml is the UI file generated by the JavaFX Scene Builder; it is linked to GUIController.java, which handles all user interactions and system events. ModelContainer.java contains all the data. The Utils package consists of a few tools, including reading/writing files from/to the hard drive, and reading the system time and converting it to string format for display.

A screenshot of the client side GUI is shown in Figure 3. This initial version of the client system does not yet contain the functionality for data communication with the back end server.

Figure 3: A screenshot of Client side system User Interface


The project source code is hosted on Bitbucket, a cloud based code hosting and revision control service.

References:
[1] JavaFX Scene Builder – A Visual Layout Tool for JavaFX Application, http://www.oracle.com/technetwork/java/javase/downloads/javafxscenebuilder-info-2157684.html, (Last Accessed: 22 March  2015)
[2] Glenn E. Krasner and Stephen T. Pope. 1988. A cookbook for using the model-view controller user interface paradigm in Smalltalk-80. J. Object Oriented Program. 1, 3 (August 1988), 26-49.


CW 11 (9th March – 15th March) Quick Update

Due to a number of assignments and upcoming deadlines, this post is only a short update on the CW 11 work, which will be explained in detail in next week's post (e.g. system design, libraries used, etc.).

Development of the interface for demonstrating the backend system has been started.

Figure 1: Initial design of interface for demonstration of the backend system


The following functionality has been designed and an initial test version has been developed (as shown in Figure 1):

  • uploading of a query image from the local file system or from a URL,
  • a word cloud figure (the example shown uses dummy data, which will be replaced with real image tags returned from the backend system),
  • histogram and pie chart plots (the examples shown use dummy data), and
  • a list of the images most similar to the query image, shown at the bottom (Figure 1 shows a sample image, which will be replaced by a list of images returned from the backend system).



CW 10 (2nd March – 8th March) LIRe feature extraction, Annoy Index Model and Flickr Dataset Subset

Project is back on schedule!

As discussed in the previous post, Caffe cannot be used due to the delayed release of the planned AlexNet features. Thus, for the development of this prototype, image feature extraction will be implemented using the LIRe software library and integrated with an Approximate Nearest Neighbour index.

The installation of the LIRe library turned out to be much simpler than Caffe's. It depends on some other libraries and packages, but it is not as restrictive as Caffe.
LIRe (Lucene Image Retrieval) [1] is an open source Java development library provided under the GPL licence. It is built on top of Lucene, an open source text search engine that provides text based retrieval and indexing [2]. As discussed in the previous post, the same feature set released by Flickr can be extracted using the LIRe library.

Based on the evaluation of advanced feature extraction and visual matching techniques performed by the Insight Centre, it appears that AlexNet layer 7 features produce more accurate results [3][4][5]. Although ideally the AlexNet features could be extracted locally using the Caffe software library, for 100 million images this would take 231 days (at 5 images/second) on a standard desktop machine with an NVidia Titan graphics card installed. A subset of the dataset could also be used, but because of the time limitation the prototype will be based on what is currently available, namely the LIRe features, even if they may not produce the same quality of results. However, the prototype system is designed to be flexible, and the LIRe features can be replaced with the AlexNet features once these are available.

This week, a Java project has been developed which extracts the following features from the input image (the same features as those released by Flickr): CEDD, Auto Color Correlogram, Basic features, JPEG Coefficient Histogram, Color Layout, Edge Histogram, Fuzzy Color Histogram, Fuzzy Opponent Histogram, Gabor, Joint Histogram, Scalable Color and Tamura features. The extracted features are presented as a list of feature values.

ANNOY (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings, which has been selected to index the Flickr image feature dataset. This choice is based on previous research carried out by a group of researchers at Insight [3]. ANNOY creates static, read-only index files, which means that the creation of the index is separate from loading it: the model can be created once and shared by different processes. Index trees created with ANNOY are very memory efficient, and performance is very good as long as the index fits in RAM. This suits this project very well, as millions of images with high dimensional vectors need to be indexed.
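As an illustration of this build-once, load-anywhere property, a minimal sketch of the query side in Python (the file name and dimension are illustrative, matching the earlier sketches):

from annoy import AnnoyIndex

index = AnnoyIndex(80)               # must match the dimension used at build time
index.load('edge_histogram.ann')     # memory-maps the static index file

# Return the index IDs of the 10 items nearest to a query feature vector.
similar_ids = index.get_nns_by_vector(query_vector, 10)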

Because LIRe is a Java library and the feature extraction code is written in Java, the Java version of ANNOY (annoy-java, https://github.com/spotify/annoy-java) will be used to integrate feature extraction with the index model. However, annoy-java only implements loading trees built by the Python version of Annoy; it cannot yet create its own. Thus, the ANN index model needs to be built in Python, and once it is built for the real dataset, the tree model can be loaded from Java to enable the search for images similar to the query image.

Several experiments have been carried out to analyse the performance of ANNOY in terms of memory usage and index building time. These experiments determine the hardware resources required for processing the entire dataset, or the number of images that can be processed given the resources available. Figure 1 shows the relative memory used and the index build time vs. the number of images. The experiments have been run on a standard desktop machine with 16 GB RAM, an Intel i7-3770 CPU and Ubuntu 14.04 LTS installed.


Figure 1: Annoy Index Experiments Result

In order to evaluate the performance of the system, a subset of the 100 million image dataset has been selected.

If the performance is satisfactory, the size of the subset can be increased, or the whole dataset can be used, depending on the time and resources needed and available. Five possible approaches to selecting a subset have been identified:

  • Linear selection: every Xth image.
  • Random selection: select images on a random basis.
  • Region (e.g. country, continent) selection based on the geo-tag: there are third party services that convert a geo-value to a location, such as PyGeocoder. However, like most other services, this Python package uses the Google Maps API at the backend, which only allows 25,000 requests per day and 5 requests per second. This means that for the full dataset it would take over 4,000 days to extract the regions.
  • Extracting the location from the Flickr website: the link to the website is included in the image metadata. An experiment shows that extracting the geo location for 600 images takes around 13 minutes using a single thread and 3.4 minutes using 10 threads. Therefore, processing the whole dataset would take approximately 578 days with a single thread, or around 150 days with 10 threads.
  • Box solution: in this work, the box solution (developed by an Insight researcher) was applied to select those images from the whole dataset whose geo-tag value lies in the region of Ireland and the UK (see the sketch after this list). This solution is extremely fast: it takes 400 s to process the 100 million image dataset (loading the data file from the hard drive not included). Using this solution, 5,339,258 images from Ireland and the UK have been selected, which will be used for building and evaluating the proposed system. According to the experiments above, the available RAM allows this subset to be used for prototyping.
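A sketch of such a bounding box check is shown below; the coordinates are rough, illustrative bounds for Ireland and the UK, not the values used in the actual solution:

# Approximate bounding box covering Ireland and the UK (illustrative values).
MIN_LAT, MAX_LAT = 49.8, 60.9
MIN_LON, MAX_LON = -10.7, 1.8

def in_ireland_or_uk(latitude, longitude):
    # An image is kept if its geo-tag falls inside the box.
    return MIN_LAT <= latitude <= MAX_LAT and MIN_LON <= longitude <= MAX_LON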

This week I have got two major tasks done: the development of the query image feature extraction code and of the index model. The subset of the dataset has also been selected and is ready to use. As everything is now ready, the next step is to integrate these three components into one working system. I believe that the biggest challenges have been overcome and that the only remaining challenge for the rest of the semester is time: as I do not have any team members, writing the project documentation, completing the project and completing the assignments for the other four modules is extremely challenging in terms of time.

References:
[1] Lux, Mathias. “Content based image retrieval with LIRe.” In Proceedings of the 19th ACM international conference on Multimedia, pp. 735-738. ACM, 2011.
[2] Lux, Mathias, and Savvas A. Chatzichristofis. “Lire: lucene image retrieval: an extensible java CBIR library.” Proceedings of the 16th ACM international conference on Multimedia. ACM, 2008.
[3] McGuinness, Kevin, et al. "Insight Centre for Data Analytics (DCU) at TRECVid 2014: instance search and semantic indexing tasks." 2014.
[4] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." In F. Pereira, C.J.C. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.
[5] Artem Babenko, Anton Slesarev, Alexander Chigorin, and Victor S. Lempitsky. Neural codes for image retrieval. In Computer Vision – ECCV 2014 – 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I, pages 584–599, 2014.