Unifa Developer Blog

A blog by members of the Product Development Division at Unifa Inc.

Multi-Model Attribute Generator

By Matthew Millar, R&D Scientist at Unifa

Purpose:

This blog post shows the development process of a new research project I am working on.
The goal of this series of posts is to slowly but surely develop a product that can aid in attribute labeling for images of humans and even other types of image data.
Such a tool could be used in several products, from people identification and tracking to statistical data analysis.
Are you ready? Try to keep up!

Attribute Recognition:

What is attribute recognition? It is the process of identifying which properties are present in an image. This is normally done on humans but can be applied to pretty much anything, from cities to cars to even airplanes. The ability to predict the presence or absence of an item can be very beneficial: tracking people, a safety check of a vehicle (like a bus or a plane) before departure, visual inspection of an assembled computer, even uses in nuclear power plants. A simple scan of an image can yield very important warnings, letting problems be detected before a disaster occurs.

Data set:

The data-set I will be using is Market-1501 (Zheng et al., 2015), which is commonly used for re-identification problems. Why this data-set? Because of its size and the variety of people in the images. The image quality is akin to that of a standard security camera, and the backgrounds vary from image to image, which should make the model better at generalizing than training on a clean, noise-free data-set would. This data-set will give us many attributes to extract over the next few weeks.

Step 1 Battle of the Sexes:

The first, and possibly easiest, attribute to check is the gender of a person. This is straightforward because it can be framed as a binary classification problem, so it is not that big of a deal. If you are reading this, then more likely than not you read a dog-vs-cat classification post somewhere when you started out learning CNNs. The model we will build is similar, so I will not go into great detail about the model itself.

Pre-process Steps:

The first step we need to take is pre-processing the images. First, we need to separate the images into the two classes (male, female); these will be our classes for training. Then we need to split the data-set into training and testing sets.
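Market-1501 is not distributed pre-sorted by gender, so the images first have to be copied into male/ and female/ sub-folders. Here is a minimal sketch of how that separation might look; the gender_labels.csv file mapping each person ID to a gender is hypothetical (in practice it would be built from the Market-1501 attribute annotations):

import csv
import shutil
from pathlib import Path

SRC_DIR = Path('Market-1501/bounding_box_train')  # assumed location of the raw images
OUT_DIR = Path('data/gender')  # becomes the DATA_PATH used by the generators below

# Hypothetical CSV with rows like "0002,male".
with open('gender_labels.csv') as f:
    gender_of = dict(csv.reader(f))

for img in SRC_DIR.glob('*.jpg'):
    pid = img.name.split('_')[0]  # Market-1501 file names start with the person ID
    gender = gender_of.get(pid)
    if gender in ('male', 'female'):
        (OUT_DIR / gender).mkdir(parents=True, exist_ok=True)
        shutil.copy(img, OUT_DIR / gender / img.name)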
I will use Keras's ImageDataGenerator for the train/validation split, as it not only saves time but also performs all the other pre-processing steps at the same time. Below is a table of all the random augmentations that may be applied to each image, along with the pre-processing steps that are always performed.

[Figure: Augmentation Table]

Here is the code for the generators for both the training and validation data-sets. Defining the image generators like this saves you from splitting the data-set up yourself, or from loading it into memory and using another Python library to do the splitting.

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255,
    shear_range=0.2,
    rotation_range=15,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2) # set validation split

train_generator = train_datagen.flow_from_directory(
    DATA_PATH,
    target_size=(224, 224),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='training') # set as training data

validation_generator = train_datagen.flow_from_directory(
    DATA_PATH, # same directory as training data
    target_size=(224, 224),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='validation') # set as validation data


Now, with those defined, we can use the generators to train the model. The model will be a simple binary classification model; there is no real need to make it too complex, as it is just one of many models that will be used in the product.

from keras.models import Sequential
from keras.layers import Conv2D, Activation, Dropout, MaxPooling2D, Flatten, Dense

def build():
    model = Sequential()
    # Block 1: 32 filters
    model.add(Conv2D(32, (3, 3), input_shape=(224, 224, 3)))
    model.add(Activation('relu'))
    model.add(Dropout(0.3))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # Block 2: two stacked 64-filter convolutions
    model.add(Conv2D(64, (3, 3)))
    model.add(Conv2D(64, (3, 3)))
    model.add(Activation('relu'))
    model.add(Dropout(0.3))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # Block 3: two stacked 128-filter convolutions
    model.add(Conv2D(128, (3, 3)))
    model.add(Conv2D(128, (3, 3)))
    model.add(Activation('relu'))
    model.add(Dropout(0.3))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # Classifier head: two fully connected layers
    model.add(Flatten())
    model.add(Dense(128))
    model.add(Dense(128))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))

    # Single sigmoid unit for the binary (male/female) output
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    return model

Since this is a binary problem, sigmoid is an appropriate output activation here. The model is not that deep: the head is only two fully connected layers plus one fully connected output layer.

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 222, 222, 32)      896       
_________________________________________________________________
activation_1 (Activation)    (None, 222, 222, 32)      0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 222, 222, 32)      0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 111, 111, 32)      0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 109, 109, 64)      18496     
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 107, 107, 64)      36928     
_________________________________________________________________
activation_2 (Activation)    (None, 107, 107, 64)      0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 107, 107, 64)      0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 53, 53, 64)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 51, 51, 128)       73856     
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 49, 49, 128)       147584    
_________________________________________________________________
activation_3 (Activation)    (None, 49, 49, 128)       0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 49, 49, 128)       0         
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 24, 24, 128)       0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 73728)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               9437312   
_________________________________________________________________
dense_2 (Dense)              (None, 128)               16512     
_________________________________________________________________
activation_4 (Activation)    (None, 128)               0         
_________________________________________________________________
dropout_4 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 129       
_________________________________________________________________
activation_5 (Activation)    (None, 1)                 0         
=================================================================
Total params: 9,731,713
Trainable params: 9,731,713
Non-trainable params: 0
__________________________________________

Now, with the model defined, we can turn our attention to training it.

from keras.optimizers import SGD
from keras.callbacks import ModelCheckpoint

# LR, EPOCHS, and BATCH_SIZE are training constants defined elsewhere.
opt = SGD(lr=LR, momentum=0.9, decay=LR / EPOCHS)
model = build()
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])

# Save the best model (by validation accuracy) seen so far after each epoch.
filepath = "GenderID-{epoch:02d}-{val_acc:.4f}.ckpt"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1,
                             save_best_only=True, mode='max', save_weights_only=False)
callbacks_list = [checkpoint]

model.fit_generator(
    train_generator,
    steps_per_epoch = train_generator.samples // BATCH_SIZE,
    validation_data = validation_generator, 
    validation_steps = validation_generator.samples // BATCH_SIZE,
    epochs = EPOCHS,
    verbose=1,
    callbacks=callbacks_list)

As you can see, the model produced reasonable results during training, peaking at about 78.5% validation accuracy.

Epoch 1/10
322/322 [==============================] - 139s 432ms/step - loss: 0.6444 - acc: 0.6356 - val_loss: 0.6126 - val_acc: 0.7211

Epoch 00001: val_acc improved from -inf to 0.72109, saving model to GenderID-01-0.7211.ckpt
Epoch 2/10
322/322 [==============================] - 128s 398ms/step - loss: 0.5833 - acc: 0.6987 - val_loss: 0.5848 - val_acc: 0.7490

Epoch 00002: val_acc improved from 0.72109 to 0.74902, saving model to GenderID-02-0.7490.ckpt
Epoch 3/10
322/322 [==============================] - 128s 399ms/step - loss: 0.5459 - acc: 0.7334 - val_loss: 0.5795 - val_acc: 0.7565

Epoch 00003: val_acc improved from 0.74902 to 0.75647, saving model to GenderID-03-0.7565.ckpt
Epoch 4/10
322/322 [==============================] - 125s 388ms/step - loss: 0.5208 - acc: 0.7462 - val_loss: 0.5736 - val_acc: 0.7137

Epoch 00004: val_acc did not improve from 0.75647
Epoch 5/10
322/322 [==============================] - 125s 390ms/step - loss: 0.4986 - acc: 0.7637 - val_loss: 0.5472 - val_acc: 0.7212

Epoch 00005: val_acc did not improve from 0.75647
Epoch 6/10
322/322 [==============================] - 124s 384ms/step - loss: 0.4912 - acc: 0.7667 - val_loss: 0.5136 - val_acc: 0.7851

Epoch 00006: val_acc improved from 0.75647 to 0.78510, saving model to GenderID-06-0.7851.ckpt
Epoch 7/10
322/322 [==============================] - 124s 384ms/step - loss: 0.4674 - acc: 0.7799 - val_loss: 0.5209 - val_acc: 0.7745

Epoch 00007: val_acc did not improve from 0.78510
Epoch 8/10
322/322 [==============================] - 124s 385ms/step - loss: 0.4485 - acc: 0.7925 - val_loss: 0.4978 - val_acc: 0.7643

Epoch 00008: val_acc did not improve from 0.78510
Epoch 9/10
322/322 [==============================] - 123s 381ms/step - loss: 0.4323 - acc: 0.8022 - val_loss: 0.5000 - val_acc: 0.7737

Epoch 00009: val_acc did not improve from 0.78510
Epoch 10/10
322/322 [==============================] - 124s 386ms/step - loss: 0.4277 - acc: 0.8037 - val_loss: 0.5061 - val_acc: 0.7565

Epoch 00010: val_acc did not improve from 0.78510

Testing:

Testing on some images of both males and females, the model performed about as expected: OK.
For men, the accuracy was 65.17%.
For women, the accuracy was 48.36%.
So in the end the model is somewhat more accurate at detecting men than at detecting women.
The total accuracy was 58.36%, which is only a little better than random guessing, so I will take that as a win.
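For reference, here is a rough sketch of how this per-class check could be reproduced with the best checkpoint saved above. The held-out folder layout (data/gender_test with male/ and female/ sub-folders) is an assumption:

from keras.models import load_model
from keras.preprocessing.image import ImageDataGenerator

# Best checkpoint from the training run above.
model = load_model('GenderID-06-0.7851.ckpt')

test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
    'data/gender_test',  # assumed held-out test folder
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary',
    shuffle=False)  # keep file order so predictions align with labels

probs = model.predict_generator(test_generator, steps=len(test_generator))
preds = (probs > 0.5).astype(int).ravel()
labels = test_generator.classes

# Accuracy per class, then overall.
for name, idx in test_generator.class_indices.items():
    mask = labels == idx
    print('%s accuracy: %.2f%%' % (name, 100 * (preds[mask] == labels[mask]).mean()))
print('total accuracy: %.2f%%' % (100 * (preds == labels).mean()))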

Conclusion:

We can now see that the model achieves modest accuracy on this complex problem. How can we improve it? Some improvements can be made by using a pre-trained model to aid feature extraction, along with better data augmentation techniques.
The model can predict whether a person in an image is a man or a woman without using faces, which is a very difficult task. Why is this important? It makes it possible to tell someone's sex from a distance even when their face is obscured by clothing or a jacket, so lower-resolution security cameras can still, with a certain accuracy, tell whether a person is a man or a woman.

Future Improvement:

From here I will add layer initializers, deepen the network, bring in a pre-trained, fine-tuned model, and improve the data augmentation. This should give somewhat better results, possibly reaching my goal of 65%, which would be a very good result for this particular task.
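As a preview of the pre-trained direction, here is a sketch of what a transfer-learning version of the model could look like. The choice of VGG16 as the backbone and the 128-unit head are assumptions for illustration, not the final design:

from keras.applications import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense, Dropout

def build_pretrained():
    # ImageNet-pre-trained convolutional base, frozen so it acts as a
    # fixed feature extractor; only the new head is trained at first.
    base = VGG16(weights='imagenet', include_top=False,
                 input_shape=(224, 224, 3))
    for layer in base.layers:
        layer.trainable = False

    x = Flatten()(base.output)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.5)(x)
    out = Dense(1, activation='sigmoid')(x)  # same binary output as before
    return Model(inputs=base.input, outputs=out)

Unfreezing the top convolutional block once the new head has converged would then allow proper fine-tuning.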

References:

L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang and Q. Tian, "Scalable Person Re-identification: A Benchmark," 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, 2015, pp. 1116-1124.
doi: 10.1109/ICCV.2015.133