This blog follows the development of a new research paper that I am working on.
The goal of this series of posts is to slowly but surely build a product that can help with attribute labeling for images of people, and eventually for other kinds of image data.
Such a product could be used for people identification, tracking, and statistical data analysis.
Are you ready? Try to keep up!
What is Attribute Recognition? It is the process of identifying which properties are present in an image. It is most often applied to people, but it works on pretty much anything: cities, cars, even airplanes. Being able to predict the presence or absence of an item is useful in many settings: tracking people, a safety check of a vehicle (like a bus or a plane) before departure, visual inspection of an assembled computer, even inspections in nuclear power plants. A simple scan of an image can raise an important warning before a disaster occurs.
The data-set I will be using is Market-1501 (Zheng et al., 2015), which is commonly used for person re-identification problems. Why this data-set? Because of its size and the variety of people in the images. The image quality is akin to that of a standard security camera. The backgrounds vary from image to image, which should help the model generalize better than it would on a cleaned, noise-free data-set. This data-set will give us many attributes to extract over the next few weeks.
Step 1 Battle of the Sexes:
The first and possibly easiest attribute to check is the gender of a person. It can be framed as a binary classification problem, so it is not that big of a deal. If you're reading this, you have more than likely read a dog-vs-cat classification post somewhere when you started learning CNNs. The model we will build is similar, so I will not go into great detail about the model itself.
The first step is pre-processing the images. We need to separate the images into two classes (male, female), which will be our classes for training. Then we need to split the data-set into training and validation sets.
I will use Keras’s image generator to do this, as it not only saves time but also lets me do all the other pre-processing steps at the same time. The generator takes the random image augmentations to apply to each image, along with the pre-processing steps that are always applied.
Here is the code for the generators for both the training and validation sets. Defining the image generators like this saves you from splitting the data-set yourself, or from loading it into memory and using another Python library to do the split.
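The class separation described above can be sketched with the standard library alone. This is a minimal sketch under assumptions: the function name `sort_into_class_folders` and the `labels` mapping (file name to "male" or "female") are hypothetical, and where the labels actually come from is not covered here. The result is one sub-folder per class, which is the layout a directory-based image generator expects.

```python
import shutil
import tempfile
from pathlib import Path


def sort_into_class_folders(image_dir, labels, out_dir):
    """Copy each labelled image into out_dir/<label>/ and return the count.

    `labels` is a hypothetical mapping of file name -> "male" or "female".
    """
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    copied = 0
    for name, label in labels.items():
        src = image_dir / name
        if not src.is_file():
            continue  # skip labels that have no matching image
        dest = out_dir / label
        dest.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest / name)
        copied += 1
    return copied


# Tiny demo in a temporary folder, using two empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw"
    data = Path(tmp) / "data"
    raw.mkdir()
    for name in ("a.jpg", "b.jpg"):
        (raw / name).touch()
    n = sort_into_class_folders(raw, {"a.jpg": "male", "b.jpg": "female"}, data)
    found = sorted(p.relative_to(data).as_posix() for p in data.rglob("*.jpg"))

print(n, found)
```

Copying instead of moving keeps the original download intact, so the sorting step can be rerun if the labels change.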
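from keras.preprocessing.image import ImageDataGenerator

# DATA_PATH is the root folder that holds one sub-folder per class
train_datagen = ImageDataGenerator(rescale=1./255,
    validation_split=0.2) # set validation split
train_generator = train_datagen.flow_from_directory(
    DATA_PATH, # root directory of the sorted images
    subset='training') # set as training data
validation_generator = train_datagen.flow_from_directory(
    DATA_PATH, # same directory as training data
    subset='validation') # set as validation data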
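To make the effect of `validation_split=0.2` concrete, here is a small stand-alone sketch of a per-class hold-out split. It illustrates the idea only and is not Keras's actual implementation; the function name `split_per_class` and the file names are made up for the example.

```python
def split_per_class(files_by_class, validation_split=0.2):
    """Hold out a fixed fraction of each class for validation."""
    train, val = {}, {}
    for label, files in files_by_class.items():
        ordered = sorted(files)  # deterministic order
        n_val = int(len(ordered) * validation_split)
        val[label] = ordered[:n_val]    # first slice goes to validation
        train[label] = ordered[n_val:]  # the rest goes to training
    return train, val


# Hypothetical file names, ten per class.
files = {
    "male": [f"m_{i:02d}.jpg" for i in range(10)],
    "female": [f"f_{i:02d}.jpg" for i in range(10)],
}
train, val = split_per_class(files)
print({label: len(names) for label, names in train.items()})
print({label: len(names) for label, names in val.items()})
```

Splitting inside each class keeps the male/female ratio the same in both subsets, which matters when the classes are not the same size.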
Now with that defined, we can use the generators to train the model. The model will be a simple binary classifier. There is no real need to make it too complex, as this is just one of many models that will be used in the product.
As you can see, the model started to produce decent results after training: validation accuracy peaked at about 78.5% (epoch 6), and training accuracy reached about 80%.
322/322 [==============================] - 139s 432ms/step - loss: 0.6444 - acc: 0.6356 - val_loss: 0.6126 - val_acc: 0.7211
Epoch 00001: val_acc improved from -inf to 0.72109, saving model to GenderID-01-0.7211.ckpt
322/322 [==============================] - 128s 398ms/step - loss: 0.5833 - acc: 0.6987 - val_loss: 0.5848 - val_acc: 0.7490
Epoch 00002: val_acc improved from 0.72109 to 0.74902, saving model to GenderID-02-0.7490.ckpt
322/322 [==============================] - 128s 399ms/step - loss: 0.5459 - acc: 0.7334 - val_loss: 0.5795 - val_acc: 0.7565
Epoch 00003: val_acc improved from 0.74902 to 0.75647, saving model to GenderID-03-0.7565.ckpt
322/322 [==============================] - 125s 388ms/step - loss: 0.5208 - acc: 0.7462 - val_loss: 0.5736 - val_acc: 0.7137
Epoch 00004: val_acc did not improve from 0.75647
322/322 [==============================] - 125s 390ms/step - loss: 0.4986 - acc: 0.7637 - val_loss: 0.5472 - val_acc: 0.7212
Epoch 00005: val_acc did not improve from 0.75647
322/322 [==============================] - 124s 384ms/step - loss: 0.4912 - acc: 0.7667 - val_loss: 0.5136 - val_acc: 0.7851
Epoch 00006: val_acc improved from 0.75647 to 0.78510, saving model to GenderID-06-0.7851.ckpt
322/322 [==============================] - 124s 384ms/step - loss: 0.4674 - acc: 0.7799 - val_loss: 0.5209 - val_acc: 0.7745
Epoch 00007: val_acc did not improve from 0.78510
322/322 [==============================] - 124s 385ms/step - loss: 0.4485 - acc: 0.7925 - val_loss: 0.4978 - val_acc: 0.7643
Epoch 00008: val_acc did not improve from 0.78510
322/322 [==============================] - 123s 381ms/step - loss: 0.4323 - acc: 0.8022 - val_loss: 0.5000 - val_acc: 0.7737
Epoch 00009: val_acc did not improve from 0.78510
322/322 [==============================] - 124s 386ms/step - loss: 0.4277 - acc: 0.8037 - val_loss: 0.5061 - val_acc: 0.7565
Epoch 00010: val_acc did not improve from 0.78510
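The "improved" and "did not improve" lines in the log above are what you would expect from a checkpoint that is written only when `val_acc` beats the best value seen so far. Here is a minimal sketch of that rule in plain Python; the function name `epochs_that_save` is made up for illustration, and the values are copied from the log.

```python
def epochs_that_save(val_accs):
    """Return the 1-based epochs where val_acc beats the best so far."""
    best = float("-inf")
    saved = []
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best:
            best = acc
            saved.append(epoch)
    return saved, best


# val_acc values copied from the training log above
val_accs = [0.7211, 0.7490, 0.7565, 0.7137, 0.7212,
            0.7851, 0.7745, 0.7643, 0.7737, 0.7565]
saved, best = epochs_that_save(val_accs)
print(saved, best)  # [1, 2, 3, 6] 0.7851
```

The saved epochs match the checkpoint files named in the log (01, 02, 03 and 06), so the best weights on disk are the ones from epoch 6.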
Testing on some images of both men and women, the model did OK, as expected.
For men, the accuracy was 65.17%.
For women, the accuracy was 48.36%.
So the model is noticeably more accurate at detecting men than women.
The total accuracy was 58.36%, which is a little better than guessing randomly, so I will take that as a win.
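For readers who want to reproduce this kind of breakdown, here is a minimal sketch of per-class and overall accuracy. The helper names are made up for illustration, and the labels are toy data, not the real test set.

```python
def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each true class."""
    totals, correct = {}, {}
    for truth, pred in zip(y_true, y_pred):
        totals[truth] = totals.get(truth, 0) + 1
        correct[truth] = correct.get(truth, 0) + (truth == pred)
    return {label: correct[label] / totals[label] for label in totals}


def overall_accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels for illustration only: 6 men (4 correct), 4 women (2 correct).
y_true = ["male"] * 6 + ["female"] * 4
y_pred = ["male"] * 4 + ["female"] * 2 + ["female"] * 2 + ["male"] * 2

per_class = per_class_accuracy(y_true, y_pred)
overall = overall_accuracy(y_true, y_pred)
print(per_class, overall)
```

The overall figure is the per-class accuracies weighted by class size, so it sits closer to the accuracy of whichever class has more test images.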
Now we can see the model picks up some signal on this complex problem. But how can we improve it? Some improvements can come from using a pre-trained model to aid feature extraction, along with better data augmentation techniques.
The model can predict, somewhat better than chance, whether a person in an image is a man or a woman without relying on faces, which is a very difficult task. Why is this important? It allows telling someone's sex from a distance even if their face is obscured by clothing or a jacket. So you can use lower-resolution security cameras and still tell, with a certain accuracy, whether the person is a man or a woman.
From here I will add layer initializers, deepen the network, add a pre-trained, fine-tuned model, and improve the data augmentation. This should give somewhat better results and possibly reach my goal of 65%, which would be a very good model for this particular task.
L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang and Q. Tian, "Scalable Person Re-identification: A Benchmark," 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, 2015, pp. 1116-1124.