Autoencoder to Neural Network Learning in PyTorch

By Matthew Millar R&D Scientist at ユニファ

Purpose:

This blog will cover a method for combining unsupervised learning with supervised learning. I will show how to train an autoencoder and then combine its encoder with a neural network for a classification problem in PyTorch.

Data Processing:

The first step is easy, as the same dataloaders can be used for training both the autoencoder and the neural network.
I will be using the CIFAR-10 dataset, as it is available to everyone and is easy to work with.

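# Imports used throughout this post
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, SubsetRandomSampler
from torchvision import datasets, transforms

# Basic transforms
SIZE = (32,32) # Resize the image to this shape
# Test and basic transform. This will resize the raw image and then convert it into a tensor for PyTorch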
basic = transforms.Compose([transforms.Resize(SIZE),
                            transforms.ToTensor()])

# Normalization values (0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261) retrieved from https://github.com/kuangliu/pytorch-cifar/issues/19
mean = (0.4914, 0.4822, 0.4465) # Mean
std = (0.247, 0.243, 0.261) # Standard deviation
# This will resize the image to SIZE, convert it to a tensor, and then normalize it
norm_tran = transforms.Compose([transforms.Resize(SIZE),
                                transforms.ToTensor(), 
                                transforms.Normalize(mean=mean, std=std)])

# Simple data augmentation
'''
Randomly flip the images horizontally to cover mirrored orientations
Randomly rotate the image by up to 15 degrees in either direction. This adds more orientations while limiting the black border that rotation introduces
Randomly resize and crop the image, removing any excess, to act like a zoom feature
Convert each image to a tensor and normalize it
'''
aug_tran = transforms.Compose([transforms.RandomHorizontalFlip(),
                               transforms.RandomRotation(15),
                               transforms.RandomResizedCrop(SIZE, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=3),
                               transforms.ToTensor(),
                               transforms.Normalize(mean=mean, std=std)])

# Create Dataset
# TRAIN_DIR and TEST_DIR are the image folder paths (their values are not shown in this post)
train_dataset = datasets.ImageFolder(TRAIN_DIR, transform=aug_tran)
test_dataset  = datasets.ImageFolder(TEST_DIR, transform=norm_tran) # No augmentation for the test set
# Parameters for setting up data loaders
BATCH_SIZE = 32
NUM_WORKERS = 4
VALIDATION_SIZE = 0.15

# Validation split
num_train = len(train_dataset) # Number of training samples
indices = list(range(num_train)) # Create one index per sample
np.random.shuffle(indices) # Shuffle the indices so the split is random
split = int(np.floor(VALIDATION_SIZE * num_train)) # Number of samples held out for validation
train_idx , val_idx = indices[split:], indices[:split] # Create the train and validation index lists
train_sampler = SubsetRandomSampler(train_idx) # Sample only the training indices with PyTorch
validation_sampler = SubsetRandomSampler(val_idx) # Same here but for validation

# Create the data loaders
train_loader = DataLoader(train_dataset, 
                          batch_size=BATCH_SIZE,
                          sampler=train_sampler, 
                          num_workers=NUM_WORKERS)

validation_loader = DataLoader(train_dataset, 
                               batch_size=BATCH_SIZE,
                               sampler=validation_sampler,
                               num_workers=NUM_WORKERS)

test_loader = DataLoader(test_dataset, 
                         batch_size=BATCH_SIZE, 
                         shuffle=False, 
                         num_workers=NUM_WORKERS)
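
The validation split above can be checked on its own, without loading any images. The following is a minimal sketch that assumes the standard library `random` module in place of NumPy and a hypothetical helper name `split_indices`; it only illustrates that the two index lists are disjoint and together cover every sample.

```python
import math
import random


def split_indices(num_samples, validation_size, seed=0):
    """Shuffle sample indices and split them into train and validation lists."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    split = int(math.floor(validation_size * num_samples))
    # the first `split` shuffled indices go to validation, the rest to training
    return indices[split:], indices[:split]


train_idx, val_idx = split_indices(num_samples=1000, validation_size=0.15)
print(len(train_idx), len(val_idx))
print(set(train_idx).isdisjoint(val_idx))
```

Because both samplers draw from the same `train_dataset`, the validation images in this post also pass through the augmentation transform.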

Also, I have a set of dataloader examples on my Kaggle page for both PyTorch and Keras if you would like to learn how to build custom dataloaders and datasets with both frameworks.
https://www.kaggle.com/matthewmillar/pytorchdataloaderexamples
https://www.kaggle.com/matthewmillar/kerasgeneratorexamples

Autoencoder:

An autoencoder is an unsupervised method for learning encodings of data that can be processed efficiently. This is done through dimension reduction and by ignoring noise in the dataset. There are two sides to an autoencoder: the encoder and the decoder. The encoder's job is to create a useful encoding that removes unwanted noise while keeping the most important parts of the data. The decoder's job is to take the encoding and reassemble it into the original input form. Below is the autoencoder that we will be using as the feature extraction system in our combination model.

The approach taken here is to train the autoencoder separately instead of together with the NN. This allows us to check the output of the encoder as well as the decoder and see how well they work.
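
To make the dimension reduction concrete, the spatial sizes of the autoencoder defined below can be worked out by hand. This sketch uses the standard output-size formulas for convolution, pooling, and transposed convolution (with no dilation or output padding); the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride=1, padding=0):
    """Output length of a transposed convolution (no dilation, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


size = 32                                    # CIFAR-10 images are 32x32
size = conv_out(size, kernel=3, padding=1)   # conv1 keeps 32
size = conv_out(size, kernel=2, stride=2)    # pool halves to 16
size = conv_out(size, kernel=3, padding=1)   # conv2 keeps 16
size = conv_out(size, kernel=2, stride=2)    # pool halves to 8
code_size = size

code_values = 4 * code_size * code_size      # 4 channels in the encoding
input_values = 3 * 32 * 32                   # 3 channels in the input image

size = conv_transpose_out(size, kernel=2, stride=2)  # t_conv1 doubles to 16
size = conv_transpose_out(size, kernel=2, stride=2)  # t_conv2 doubles to 32
decoded_size = size

print(code_size, code_values, input_values, decoded_size)
```

Under these formulas the encoder compresses 3072 input values into 256, a factor of 12, and the decoder restores the original 32x32 spatial size.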

# define the NN architecture
class ConvAutoencoder(nn.Module):
    def __init__(self):
        super(ConvAutoencoder, self).__init__()
        ## encoder layers ##
        # conv layer (depth from 3 --> 16), 3x3 kernels
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)  
        # conv layer (depth from 16 --> 4), 3x3 kernels
        self.conv2 = nn.Conv2d(16, 4, 3, padding=1)
        # pooling layer to reduce x-y dims by two; kernel and stride of 2
        self.pool = nn.MaxPool2d(2, 2)
        
        ## decoder layers ##
        ## a kernel of 2 and a stride of 2 will increase the spatial dims by 2
        self.t_conv1 = nn.ConvTranspose2d(4, 16, 2, stride=2)
        self.t_conv2 = nn.ConvTranspose2d(16, 3, 2, stride=2)


    def forward(self, x):
        ## encode ##
        # add hidden layers with relu activation function
        # and maxpooling after
        x = torch.relu(self.conv1(x))
        x = self.pool(x)
        # add second hidden layer
        x = torch.relu(self.conv2(x))
        x = self.pool(x)  # compressed representation
        
        ## decode ##
        # add transpose conv layers, with relu activation function
        x = torch.relu(self.t_conv1(x))
        # output layer (with sigmoid for scaling from 0 to 1)
        x = torch.sigmoid(self.t_conv2(x))
                
        return x
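
One detail worth checking: the final sigmoid bounds every output value between 0 and 1, while the dataloaders above normalize each channel with the CIFAR-10 mean and standard deviation. The short calculation below, using the same `mean` and `std` values as the transforms, shows the range a normalized pixel can take. The helper name is hypothetical.

```python
mean = (0.4914, 0.4822, 0.4465)
std = (0.247, 0.243, 0.261)


def normalized_range(channel_mean, channel_std):
    """Range of (x - mean) / std when the raw pixel x lies in [0, 1]."""
    low = (0.0 - channel_mean) / channel_std
    high = (1.0 - channel_mean) / channel_std
    return low, high


ranges = [normalized_range(m, s) for m, s in zip(mean, std)]
for name, (low, high) in zip(("R", "G", "B"), ranges):
    print("{}: {:.3f} to {:.3f}".format(name, low, high))
```

Each channel spans roughly -2 to 2, so a sigmoid output cannot match the normalized targets exactly. One possible adjustment, which is an assumption and not something this post does, would be to train the autoencoder on tensors from the `basic` transform, which stay in the 0 to 1 range.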

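# Create the model and move it to the GPU when one is available
use_gpu = torch.cuda.is_available()
ae_model = ConvAutoencoder()
if use_gpu:
    ae_model = ae_model.cuda()

# Loss and optimizers
loss_function = nn.MSELoss()
optimizer = torch.optim.Adam(ae_model.parameters(), lr=0.001)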
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer=optimizer, mode='min', factor=0.1, patience=3, verbose=True) # Automatically reduce learning rate on plateau

# number of epochs to train the model
n_epochs = 35
ae_model_filename = 'cifar_autoencoder.pt'
train_loss_min = np.inf # track change in training loss

ae_train_loss_matrix = []
for epoch in range(1, n_epochs+1):
    # monitor training loss
    train_loss = 0.0
    
    ###################
    # train the model #
    ###################
    for data in train_loader:
        # _ stands in for labels, here
        # no need to flatten images
        
        images, _ = data
        if use_gpu:
            images = images.cuda()
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        outputs = ae_model(images)
        # calculate the loss
        loss = loss_function(outputs, images)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update running training loss
        train_loss += loss.item()*images.size(0)
            
    # print avg training statistics (the running loss is weighted by batch size, so divide by the sample count)
    train_loss = train_loss/len(train_loader.sampler)
    scheduler.step(train_loss)
    ae_train_loss_matrix.append([train_loss, epoch])
    
    print('Epoch: {} \tTraining Loss: {:.6f}'.format(epoch, train_loss))
    
    # save model if training loss has decreased
    if train_loss <= train_loss_min:
        print('Training loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
        train_loss_min,
        train_loss))
        torch.save(ae_model.state_dict(), ae_model_filename)
        train_loss_min = train_loss
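
The running loss in the loop above is accumulated as `loss.item()*images.size(0)`, that is, each batch mean is weighted by its batch size. The matching denominator is therefore the number of samples, not the number of batches. The numbers below are made up purely to illustrate the difference.

```python
# Hypothetical per-batch mean losses and batch sizes (the last batch is smaller)
batch_losses = [0.50, 0.40, 0.30]
batch_sizes = [32, 32, 16]

# Same accumulation as the training loop: batch mean times batch size
running = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))

per_sample = running / sum(batch_sizes)  # divide by the number of samples
per_batch = running / len(batch_sizes)   # divide by the number of batches

print(per_sample, per_batch)
```

With a `SubsetRandomSampler`, `len(train_loader.sampler)` gives the number of samples and `len(train_loader)` gives the number of batches, so dividing by the latter inflates the reported loss by roughly the batch size.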

[Figure: AE Loss]

[Figure: Encoder Results]

Looking at the above image, the encoder works reasonably well, so we can use it with confidence.

Neural Network:

This is the classification, supervised learning section of the model. The first thing we need to do is freeze the autoencoder to ensure that its weights and biases do not get updated during training. We then define the NN using the autoencoder's max-pooling layer as the output of the encoder part, and add fully connected layers on top of it, with a dropout layer to help regularize the network.
Here is the model and training code.

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
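        # Take every child except the two transposed-conv (decoder) layers
        image_modules = list(ae_model.children())[:-2]
        # nn.Sequential runs conv1 -> conv2 -> pool in order, without the ReLUs
        # and the first pooling step used in ConvAutoencoder.forward
        self.modelA = nn.Sequential(*image_modules)
        # Output shape for 32x32 inputs = 4, 16, 16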
        self.fc1 = nn.Linear(4*16*16, 1024)
        self.fc2 = nn.Linear(1024,512)
        self.out = nn.Linear(512, 10)
        
        self.drop = nn.Dropout(0.2)
        
    def forward(self, x):
        x = self.modelA(x)
        x = x.view(x.size(0),4*16*16)
        x = torch.relu(self.fc1(x))
        x = self.drop(x)
        x = torch.relu(self.fc2(x))
        x = self.drop(x)
        x = self.out(x)
        return x
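
Note that `nn.Sequential` built from `children()` applies the layers in registration order (conv1, conv2, pool), which is not the same computation as the encoding half of `ConvAutoencoder.forward`. If the goal is to reuse the exact encoding the autoencoder was trained to produce, one alternative is a small wrapper that replays the encoder steps. This is a sketch of that alternative, not the method used in this post; `Encoder` and `StandInAutoencoder` are hypothetical names, and the stand-in only exists to keep the example self-contained.

```python
import torch
import torch.nn as nn


class StandInAutoencoder(nn.Module):
    """Hypothetical stand-in with the same encoder layers as ConvAutoencoder."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 4, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)


class Encoder(nn.Module):
    """Reuses an autoencoder's layers and replays only its encoding steps."""

    def __init__(self, autoencoder):
        super().__init__()
        self.conv1 = autoencoder.conv1
        self.conv2 = autoencoder.conv2
        self.pool = autoencoder.pool

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))  # 16 channels, 16x16
        x = self.pool(torch.relu(self.conv2(x)))  # 4 channels, 8x8
        return x


encoder = Encoder(StandInAutoencoder())
with torch.no_grad():
    features = encoder(torch.zeros(2, 3, 32, 32))
print(tuple(features.shape))
```

With this wrapper the encoding has 4*8*8 = 256 values per image, so the first linear layer of a classifier built on it would take 256 inputs instead of 4*16*16.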

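# Create the classifier and move it to the GPU when one is available
model = MyModel()
if use_gpu:
    model = model.cuda()

# Freeze the autoencoder layers so they do not train
# Train only the linear layers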
for child in model.children():
    if isinstance(child, nn.Linear):
        print("Setting Layer {} to be trainable".format(child))
        for param in child.parameters():
            param.requires_grad = True
    else:
        for param in child.parameters():
            param.requires_grad = False
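
To get a feel for what the freezing loop leaves trainable, the parameter counts can be worked out from the layer sizes in `MyModel`. This is plain arithmetic with hypothetical helper names; the pooling and dropout layers have no parameters.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


def conv_params(in_channels, out_channels, kernel):
    """Weights plus biases of a square-kernel convolution."""
    return in_channels * out_channels * kernel * kernel + out_channels


# Frozen encoder layers: conv1 (3 -> 16) and conv2 (16 -> 4), both 3x3
frozen = conv_params(3, 16, 3) + conv_params(16, 4, 3)

# Trainable classifier layers: fc1, fc2, and out
trainable = (linear_params(4 * 16 * 16, 1024)
             + linear_params(1024, 512)
             + linear_params(512, 10))

print(frozen, trainable)
```

The frozen encoder holds about a thousand parameters, while the fully connected layers hold roughly 1.58 million, so almost all of the model's capacity sits in the supervised part.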

# Optimizer and Loss function
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr= 0.001)
# Reduce the LR by a factor of 0.1 when the validation loss stops improving (patience of 3 epochs)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer=optimizer, mode='min', factor=0.1, patience=3, verbose=True)

model_filename = 'model_cifar10.pt'
n_epochs = 40
valid_loss_min = np.inf # track change in validation loss
train_loss_matrix = []
val_loss_matrix = []
val_acc_matrix = []

for epoch in range(1, n_epochs+1):

    # keep track of training and validation loss
    train_loss = 0.0
    valid_loss = 0.0
    
    train_correct = 0
    train_total = 0
    
    val_correct = 0
    val_total = 0
    
    
    ###################
    # train the model #
    ###################
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        # move tensors to GPU if CUDA is available
        if use_gpu:
            data, target = data.cuda(), target.cuda()
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the batch loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update training loss
        train_loss += loss.item()*data.size(0)
        
        
    ######################    
    # validate the model #
    ######################
    model.eval()
    val_acc = 0.0
    # no gradients are needed for validation
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(validation_loader):
            # move tensors to GPU if CUDA is available
            if use_gpu:
                data, target = data.cuda(), target.cuda()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the batch loss
            loss = criterion(output, target)
            # update average validation loss 
            valid_loss += loss.item()*data.size(0)
            # calc_accuracy is a helper that is not shown in this post
            val_acc += calc_accuracy(output, target)
        
        
    
    # calculate average losses
    train_loss = train_loss/len(train_loader.sampler)
    valid_loss = valid_loss/len(validation_loader.sampler)
    scheduler.step(valid_loss)
    
    # Add losses and acc to plot later
    train_loss_matrix.append([train_loss, epoch])
    val_loss_matrix.append([valid_loss, epoch])
    val_acc_matrix.append([val_acc, epoch])
        
    # print training/validation statistics 
    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}\tValidation Accuracy: {:.6f}'.format(
        epoch, train_loss, valid_loss, val_acc))
    
    # save model if validation loss has decreased
    if valid_loss <= valid_loss_min:
        print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
        valid_loss_min,valid_loss))
        
        torch.save(model.state_dict(), model_filename)
        valid_loss_min = valid_loss
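
The loop above calls `calc_accuracy`, which is not defined anywhere in this post. The following is one plausible definition, offered only as an assumption about what such a helper might look like: it returns the fraction of samples in a batch whose highest-scoring class matches the label.

```python
import torch


def calc_accuracy(output, target):
    """Fraction of samples whose arg-max class equals the target label."""
    predictions = output.argmax(dim=1)  # index of the largest logit per sample
    return (predictions == target).float().mean().item()


# Toy batch: 3 samples, 2 classes
logits = torch.tensor([[2.0, 0.1],
                       [0.2, 1.5],
                       [0.9, 0.3]])
labels = torch.tensor([0, 1, 1])
batch_accuracy = calc_accuracy(logits, labels)
print(batch_accuracy)
```

If the helper returns a per-batch fraction like this, the value summed into `val_acc` would need to be divided by `len(validation_loader)` to give an approximate epoch-level accuracy.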

Evaluating the trained model on the test set gives the final accuracy for each class.

Test Accuracy of airplane: 45% (231/504)
Test Accuracy of automobile: 61% (312/504)
Test Accuracy of  bird: 18% (91/496)
Test Accuracy of   cat: 11% (55/496)
Test Accuracy of  deer: 27% (139/504)
Test Accuracy of   dog: 35% (181/504)
Test Accuracy of  frog: 63% (315/496)
Test Accuracy of horse: 49% (244/496)
Test Accuracy of  ship: 59% (298/504)
Test Accuracy of truck: 46% (234/504)

Test Accuracy (Overall): 41% (2100/5008)
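
The evaluation code that produced these numbers is not shown in this post. As a rough sketch of how such a per-class report can be computed from predicted and true labels, the example below uses made-up toy labels and a hypothetical helper name; it is not the author's evaluation script.

```python
from collections import Counter


def per_class_accuracy(predictions, labels, class_names):
    """Return {class name: (correct, total)} for every class that appears in labels."""
    correct = Counter()
    total = Counter()
    for pred, label in zip(predictions, labels):
        total[label] += 1
        if pred == label:
            correct[label] += 1
    return {name: (correct[index], total[index])
            for index, name in enumerate(class_names)
            if total[index] > 0}


# Toy data, not results from the model in this post
class_names = ["airplane", "automobile", "bird"]
labels = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 0, 2]

report = per_class_accuracy(predictions, labels, class_names)
for name, (hits, count) in report.items():
    print("Test Accuracy of {}: {:.0f}% ({}/{})".format(name, 100.0 * hits / count, hits, count))
```

In a real run, `predictions` would come from taking the arg-max of the model output for each batch of `test_loader`.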

Conclusion:

Looking at the loss and validation accuracy, the accuracy is moving up steadily (albeit a little jumpy), while both losses are decreasing, with the validation loss consistently lower than the training loss. This suggests that the model is not overfitting or underfitting, so it is learning well going forward. The accuracy is a little low compared to purely supervised learning, but given enough training time it could get higher.