Monthly Archives: April 2015

CW 16 (13th April – 19th April) Final Post… Establishing Communication Channel and Integrating the Components

The prototype of the system is now complete and the Final Delivery Documentation is written, so this is the last blog post.

This week the communication between the client and the server has been established, and all the building blocks have been integrated into a complete system.

  • The index model is now loaded into the Django server when it starts, to enable the search for the IDs of similar images.
  • MongoDB is now automatically searched for the images whose IDs are returned by the index model.
  • The tags are extracted from the images and analysed, and the frequency of each tag is obtained.
  • A word cloud image is generated on the server from the extracted tags. The code creates a unique file name every time an image is generated. Currently, the word cloud images are stored on the server; ideally, in a deployed system they should be deleted when the session ends, or generated on the client side.
  • The communication channel built between the client and the server enables the result to be sent back to the client.
  • The client generates the pie chart and histogram and displays the result to the user.
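The tag analysis and the unique file naming described above can be sketched roughly as follows. This is only an illustration and not the project code: the function names, the lower-casing of tags and the use of a UUID for the file name are assumptions.

```python
import uuid
from collections import Counter


def tag_frequencies(tag_lists):
    """Count how often each tag occurs across all similar images."""
    counts = Counter()
    for tags in tag_lists:
        # Normalise case and whitespace so "Dublin" and "dublin " count as one tag.
        counts.update(tag.strip().lower() for tag in tags if tag.strip())
    return counts


def unique_wordcloud_name(prefix="wordcloud", extension="jpg"):
    """Build a file name that will not clash with earlier word cloud images."""
    return "{}_{}.{}".format(prefix, uuid.uuid4().hex, extension)


if __name__ == "__main__":
    # Made-up tags standing in for the tags of three similar images.
    example = [["dublin", "river"], ["Dublin", "bridge"], ["river", "dublin"]]
    print(tag_frequencies(example).most_common(2))
    print(unique_wordcloud_name())
```

The resulting counts could feed both the word cloud and the pie chart and histogram drawn on the client.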

The three design patterns used during the development of this prototype are:

  1. Model View Controller (MVC) described in CW 12.
  2. Singleton pattern, which ensures that a class has only one instance and provides global access to that instance. ModelContainer.java in the Model package is a singleton class, to ensure data consistency and easy access.
  3. Observer and Observable, used to notify the controller (observer) of any changes in the data held in the model (observable).
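The client itself is written in Java, so the following Python sketch only illustrates the idea behind patterns 2 and 3 and does not reproduce ModelContainer.java. The method names add_observer and set_value are assumptions made for the example.

```python
class ModelContainer:
    """Singleton model that notifies registered observers when its data changes."""

    _instance = None

    def __new__(cls):
        # Create the single shared instance on first use, then always return it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._observers = []
            cls._instance._data = {}
        return cls._instance

    def add_observer(self, callback):
        """Register a callable that is invoked as callback(key, value) on every change."""
        self._observers.append(callback)

    def set_value(self, key, value):
        """Store a value and tell every observer about the change."""
        self._data[key] = value
        for callback in self._observers:
            callback(key, value)


if __name__ == "__main__":
    changes = []
    ModelContainer().add_observer(lambda key, value: changes.append((key, value)))
    ModelContainer().set_value("tags", ["dublin", "river"])
    print(ModelContainer() is ModelContainer(), changes)
```

Because every call to ModelContainer() returns the same object, the controller registered through one reference is notified about changes made through any other.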

During the development of the web server, many errors were encountered. Some were simple to fix, and some needed deeper analysis. A few examples are described below.

  • ASCII Error

Some of the tags extracted from the images similar to the query image were not in ASCII format, so the word cloud could not be generated. The proper solution would have been to decode them, but because of the time limitation, statements that ignore the non-ASCII characters were written instead.
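A minimal sketch of such an "ignore" workaround is shown below. It is written in Python 3 syntax and is not the statement used in the project; the helper name strip_non_ascii and the sample tags are made up.

```python
def strip_non_ascii(tag):
    """Drop every character that cannot be represented in ASCII."""
    return tag.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    tags = ["café", "dublin", "日本"]
    # Tags made up purely of non-ASCII characters become empty and are filtered out.
    cleaned = [t for t in (strip_non_ascii(tag) for tag in tags) if t]
    print(cleaned)
```

The drawback of this shortcut is visible in the example: information is lost, which is why decoding the tags properly would be the better long-term fix.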

  • KeyError

The error shown in Figure 1 was caused by an inconsistent name for the feature values. In the JSON data the name was defined as ‘featureValue’, whereas the server expected ‘feature’.

Figure 1: Snapshot of the Key Error
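The mismatch can be reproduced with a plain dictionary. The sketch below assumes a payload layout with a single 'featureValue' field holding made-up values; only the two key names come from the error described above.

```python
import json

# Payload as the client sent it: the key is 'featureValue'.
payload = json.loads('{"featureValue": "[0.12, 0.5, 0.33]"}')


def read_features(data):
    """Return the feature string, raising a descriptive error if the key is absent."""
    try:
        return data["featureValue"]
    except KeyError:
        raise KeyError("expected key 'featureValue', got: {}".format(sorted(data)))


if __name__ == "__main__":
    try:
        payload["feature"]  # the name the server originally used
    except KeyError as error:
        print("KeyError:", error)
    print(read_features(payload))
```

Agreeing on one key name in both the client and the server removes the error; the descriptive message simply makes a future mismatch easier to spot.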

  • Value Error

The value error shown in Figure 2 occurred because JSON data is sent from the client to the server as a string, which has to be converted into a number. The square brackets in the string caused the problem; they had to be removed to allow the conversion.

Figure 2: Snapshot of the Value Error
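The following sketch shows two ways such a bracketed string could be converted, assuming the feature values arrive as a string like '[0.12, 0.5, 0.33]'. The exact format sent by the client is an assumption, and both function names are made up.

```python
import json


def parse_features(raw):
    """Turn a string such as '[0.12, 0.5, 0.33]' into a list of floats."""
    # json.loads understands the brackets, so no manual stripping is needed.
    return [float(value) for value in json.loads(raw)]


def parse_features_manually(raw):
    """Same result, removing the square brackets by hand before converting."""
    return [float(part) for part in raw.strip("[]").split(",") if part.strip()]


if __name__ == "__main__":
    raw = "[0.12, 0.5, 0.33]"
    try:
        float(raw.split(",")[0])  # "[0.12" still carries the bracket
    except ValueError as error:
        print("ValueError:", error)
    print(parse_features(raw), parse_features_manually(raw))
```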

  • Argument Error

The wrong parameter was passed to the load index function. Instead of the required feature dimension, a variable ‘f’ was passed, which had previously been defined as a file handle. A snapshot of the argument error is shown in Figure 3 below.

Figure 3: Snapshot of Argument Error
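One way to catch this kind of mistake early is to validate the argument before it reaches the index library. The sketch below uses a hypothetical load_index function with a placeholder return value; it does not show the real index library, and the dimension 64 and the file name are made up.

```python
import io


def load_index(feature_dim, path):
    """Hypothetical loader: checks its arguments before any index library is called."""
    if isinstance(feature_dim, bool) or not isinstance(feature_dim, int) or feature_dim <= 0:
        raise TypeError(
            "feature_dim must be a positive integer, got {}".format(type(feature_dim).__name__)
        )
    # Placeholder for the real index object.
    return {"feature_dim": feature_dim, "path": path}


if __name__ == "__main__":
    f = io.StringIO("placeholder")  # stands in for an open file handle named f
    try:
        load_index(f, "index.bin")
    except TypeError as error:
        print("TypeError:", error)
    print(load_index(64, "index.bin"))
```

Using a more descriptive variable name than f for the file handle would also make the mix-up less likely.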

 

In addition to the completion of the system development, work on the Final Delivery Documentation has continued. The remaining sections have been written and submitted to my supervisor for review.

The document will be corrected according to my supervisor’s suggestions and submitted on the 25th April as a PDF and in printed form.

This project has been very challenging, and the experience and knowledge gained are invaluable.

 


CW 15 (6th April – 12th April) Web Server with Django

The building blocks of the overall system are now ready. The next milestone is to establish communication between the client and the server and to integrate the final system.

The communication between the client and the server is as follows:

  • The user selects an image from the local file system or provides an image URL from the web.
  • When the process button is clicked, a feature set of the image is extracted using the LIRE feature extraction library.
  • The client sends an HTTP request to the server with the extracted feature values in the JSON data interchange format.
  • The server receives the request and searches the pre-built ANN index model for similar entries. The index model returns the index IDs of these similar images.
  • The metadata collection in MongoDB is searched, based on these IDs. Tags, together with the image URLs and other information, are extracted.
  • Tag analysis takes place on the server. The word cloud image is generated from the extracted tags.
  • The server returns the data to the client. The client’s graphical user interface displays the word cloud, pie chart, histogram and the five most similar images.
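The request step in the flow above can be sketched as follows. The real client is a Java application, so this Python version only illustrates the data format: the form field name 'details' and the key 'featureValue' appear elsewhere in these posts, while the function names and the exact payload layout are assumptions.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_request_body(feature_values):
    """Encode the feature values as the form field 'details' holding a JSON document."""
    payload = json.dumps({"featureValue": feature_values})
    return urlencode({"details": payload}).encode("utf-8")


def read_request_body(body):
    """Server-side counterpart: recover the feature values from the form body."""
    fields = parse_qs(body.decode("utf-8"))
    return json.loads(fields["details"][0])["featureValue"]


if __name__ == "__main__":
    body = build_request_body([0.12, 0.5, 0.33])
    print(body)
    print(read_request_body(body))
```

Encoding and decoding in one place like this makes it easy to check that both sides agree on the field names before any network code is involved.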

Originally, a socket server was planned as the communication channel between the client and the server. However, after reading some online forums and discussing the options with my mentor, it was decided to build a web server instead.

The reasons for this decision include the richer functionality which a web server offers, and easier maintenance and expansion at a later stage compared to a socket server.

To do this, Django has been chosen. It is an open source, high-level Python Web framework which reduces the amount of repetitive and time-consuming work, for example by providing shortcuts for frequent programming tasks [1].

Django automatically creates a project with several files, such as the settings, URL configuration and WSGI files. These files can later be modified to suit specific needs.

To host our service, a Django app named flickrservices is created and registered in the settings.py file:

INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'flickrservices',  # the application created to hold the processing functionality
)

The urls.py file matches the URL requested by the client against a list of patterns and points to the function in the application that processes the request.

The urls.py file contains:

url(r'^$', 'flickrservices.views.jsonRespone', name='jsonRespone'),
# ^$ matches only the empty path, so requests to the site root are routed to the flickrservices.views.jsonRespone function.
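The behaviour of the ^$ pattern can be checked with Python's re module alone, without Django. The example paths below are made up.

```python
import re

ROOT_ONLY = re.compile(r"^$")

# Django matches the pattern against the request path without its leading slash,
# so a request for "/" is seen as the empty string.
for path in ["", "search/", "images/1"]:
    print(repr(path), bool(ROOT_ONLY.search(path)))
```

Only the empty path matches, so any other URL would need its own pattern in urls.py.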

The flickrservices.views module contains the functions for processing the request and returning the result to the client user interface. At the moment, it only contains test code, which was written to get started with the development and to find out how Django works. Next week (CW 16), this file will include all the code necessary to process the user request and return the result to the client user interface, such as searching the index and processing the extracted tags. This code is already developed and only needs to be slightly modified.

import json

from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt


@csrf_exempt
def jsonRespone(request):
    # For testing: alternative ways of reading the request, kept for reference.
    # If the client sends raw JSON data:
    # received_json_data = json.loads(request.body)
    # objs = json.loads(request.raw_post_data)
    # objs = json.loads(request.POST['jsonData'])
    # json_data = request.POST.get('details', None)
    # if json_data:
    #     data = json.loads(json_data)
    #     response_data = {}
    #     response_data['result'] = 'ok'
    #     response_data['wordMapURL'] = '/static/template/images/wordcloud1.jpg'

    if request.method == 'POST':
        # The data object holds the feature values sent from the client.
        received_json_data = request.POST.get('details', None)

        if received_json_data:
            data = json.loads(received_json_data)

            # TODO: search index model to get a list of the ids of similar images
            # TODO: search mongoDB to get the tags of all similar images and the first 5 most similar image urls.
            # TODO: process the tags, statistical analysis for pie and bar chart.
            # TODO: generate word cloud image
            # TODO: encode the results into JSON array and return to client

    # Placeholder response, returned for every request until the TODOs are implemented.
    response_data = {}
    response_data['result'] = 'ok'
    response_data['wordMapURL'] = '/images/wordcloud1.jpg'

    # how you return json data before django 1.7
    return HttpResponse(json.dumps(response_data), content_type="application/json")

    # From Django 1.7 onwards: return JsonResponse(response_data)

To learn how to get started with Django development, I have read parts of “The Django Book”, which is open source and available online. I have also watched tutorials on YouTube. The first seven tutorials from the given link are sufficient for the needs of this project.

Figure 1 illustrates the Django project created by the above procedure. PyCharm, a Python IDE, is used for Django development.

Figure 1: Snapshot of Django project and files in PyCharm IDE

This week, the query image feature extraction code has been integrated into the user interface, so when the user selects an image and clicks the process button, the features are extracted and sent to the server.

Work on the Final Delivery Documentation has also continued this week. The following sections have been written: Software Architecture, Database Design, Data Flow Diagram, UML Cases, Competitors, Time Description, Timeline and Costing for Completion of Development.

To create the Data Flow diagram and UML cases, I have used the following book for reference: Kendall, Kenneth E.; Kendall, Julie E., Systems Analysis and Design: Global Edition, 8/E, Pearson 2011, pp. 221-255 and 63-74.

 

To help me identify the development stages, and the tasks within these stages, for completing the development of this prototype as a web service, I have referred to the following book: Erl, T., Service-Oriented Architecture, Prentice Hall 2012, pp. 357-586.

 

References:
[1] Holovaty, A. and Kaplan-Moss, J., “The Django Book: Version 2.0” (2009) [Online]. Available from: http://www.djangobook.com/en/2.0/index.html


CW 14 (30th March – 5th April) MongoDB

This week MongoDB has been installed to store the metadata of the image subset for tag retrieval.

MongoDB is an open source, document-oriented NoSQL database, in which each item is stored as a document made up of fields (attributes) together with their values. MongoDB has a dynamic schema, which allows extra fields to be inserted at any time because the schema is not predefined, as it is in relational databases.

MongoDB has been chosen as it is a free, open source, NoSQL database. It is easy to install and use and provides good documentation covering all aspects of deployment and usage as well as additional resources such as user groups, guides, webinars and many more. Because of its dynamic schema, new fields can be inserted at any time, which is especially useful for this project, as the requirements are not known well in advance.

Following the installation guide available from the official MongoDB website, the installation was smooth and quick, and no problems were encountered.

To learn how to use MongoDB and how to insert the required data, I have used the ‘Reference’ section of the MongoDB manual. Each subsection contains a clear explanation with specific examples and references to other material. The official MongoDB website also provides free online courses at https://university.mongodb.com/.
To insert the subset metadata and the index model IDs into MongoDB, a Python program has been written.

Figure 1: Snapshot of program to insert data into MongoDB

All Python programs have been written in the IPython notebook, a web-based interactive Python development environment, as shown in Figure 1.
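The insertion program itself is only visible in the screenshot, so the following is a rough sketch of what such a program could look like, not a copy of it. The field name index_id, the database and collection names, the connection URI and the assumption that index IDs follow the row order are all hypothetical. The insert step relies on the pymongo package and a running MongoDB server, so it is not called in the example.

```python
def build_documents(metadata_rows):
    """Attach the position in the index model to each metadata record."""
    documents = []
    for index_id, row in enumerate(metadata_rows):
        document = dict(row)  # copy so the input is left untouched
        document["index_id"] = index_id
        documents.append(document)
    return documents


def insert_documents(documents, uri="mongodb://localhost:27017",
                     db="flickr", collection="metadata"):
    """Insert the documents; requires pymongo and a running MongoDB server."""
    from pymongo import MongoClient  # imported here so build_documents works without pymongo

    client = MongoClient(uri)
    return client[db][collection].insert_many(documents).inserted_ids


if __name__ == "__main__":
    # Made-up rows; the second has an extra field, which the dynamic schema allows.
    rows = [
        {"url": "http://example.com/1.jpg", "tags": ["dublin", "river"]},
        {"url": "http://example.com/2.jpg", "tags": ["london"], "title": "Bridge"},
    ]
    for document in build_documents(rows):
        print(document)
```

Storing the index ID alongside the metadata is what would let the server look up tags directly from the IDs returned by the index model.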

Figure 2 below shows the output of the findOne() command, which returns a single document that satisfies the query criteria. In Figure 2, the query returns the first document from the Ireland and UK image collection, which is the metadata for the first image in the subset.

Figure 2: MongoDB result of findOne() command

In addition to the work on MongoDB, the Executive Summary of the project, Service Description, Service Analysis, Management Report and Interface Rationale sections have been written for the Final Delivery Documentation.

The work over the next 3 weeks will involve building a web server, integrating it with the user interface and continuing the writing of the Final Delivery Documentation.