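Hello, I'm Miyazaki from the R&D team. I measured the inference speed of several well-known deep learning image classification models in a CPU environment, and I'd like to share the results.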

Background

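When deploying a deep learning model in a service, inference speed matters as much as accuracy. On time-billed infrastructure such as AWS, doubling the processing speed halves the operating cost. Papers, however, usually report accuracy, parameter counts, and FLOPs but say little about processing speed, and when they do, the figures are often measured on GPUs. None of these numbers is proportional to processing speed on a CPU, so it is not unusual to finish developing a model, deploy and evaluate it, and find that it is much slower than expected.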

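On the other hand, for a given network architecture and execution environment, processing speed should be nearly the same regardless of which labels are predicted or which weights are used. I therefore measured the CPU processing speed of commonly used image classification models. I hope the results help you choose a model.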
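The cost argument at the start of this section is simple arithmetic. On time-billed infrastructure, the billable hours scale linearly with seconds per image, so halving the time halves the cost. The following is a minimal sketch that assumes a single task processing images one after another at a flat hourly price. `PRICE_PER_HOUR` is a hypothetical placeholder, not a real AWS rate, and billing granularity and container start-up time are ignored. The two timings come from the results table below.

```python
def instance_hours(sec_per_image: float, num_images: int) -> float:
    """Hours of compute needed to process num_images one after another."""
    return sec_per_image * num_images / 3600


def estimated_cost(sec_per_image: float, num_images: int,
                   price_per_hour: float) -> float:
    """Cost under a purely time-based billing model."""
    return instance_hours(sec_per_image, num_images) * price_per_hour


# Hypothetical hourly price for one task; substitute your actual rate.
PRICE_PER_HOUR = 0.05

if __name__ == '__main__':
    # Local CPU timings (s/image) for two models from the results table.
    for name, sec in [('MobileNetV2', 0.064), ('EfficientNetB0', 0.115)]:
        hours = instance_hours(sec, 1_000_000)
        cost = estimated_cost(sec, 1_000_000, PRICE_PER_HOUR)
        print(f'{name}: {hours:.1f} h per million images, cost {cost:.2f}')
```

With these timings, MobileNetV2 needs about 17.8 hours per million images and EfficientNetB0 about 31.9 hours. Whatever the hourly price, the cost ratio matches the ratio of the timings.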

Measurement Method

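In the environment configured below, I ran inference on 100 images sequentially, one batch after another, and computed the average processing time per image. Only the prediction calls are timed. Downloading and resizing the images is excluded.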

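Platform: AWS ECS Fargate Spot
CPU: 1 vCPU
Memory: 8 GB (2 GB would normally be enough, but more was allocated here because the container held multiple models)
Input images: 100 images from the COCO val dataset
Models:
- EfficientNet: https://github.com/qubvel/efficientnet/
- Others: https://github.com/keras-team/keras-applications/
Weights: ImageNet pre-trained weights
Batch size: 8
Inference type:
- Local model: load the model from local disk and run inference
- TensorFlow Serving: send inference requests to TensorFlow Serving
Source code: attached at the end of this page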
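The per-image figure is the total time spent inside the prediction calls divided by the number of images. With 100 images and a batch size of 8, the run has 13 batches, and the last one holds only 4 images. The sketch below mirrors the batching loop of `run_predict` in the appendix, with a hypothetical `dummy_predict` standing in for a real model. It swaps in `time.perf_counter`, which is meant for measuring short intervals, for the `time.time` used in the appendix.

```python
import time
from typing import Callable, List, Sequence


def batch_ranges(num_images: int, batch_size: int) -> List[range]:
    """Split [0, num_images) into consecutive batches; the last may be short."""
    return [range(start, min(start + batch_size, num_images))
            for start in range(0, num_images, batch_size)]


def average_sec_per_image(predict_fn: Callable[[Sequence], object],
                          inputs: Sequence, batch_size: int) -> float:
    """Time only the prediction calls, then average over all images."""
    total_sec = 0.0
    for r in batch_ranges(len(inputs), batch_size):
        batch = inputs[r.start:r.stop]  # data preparation is not timed
        start = time.perf_counter()
        predict_fn(batch)
        total_sec += time.perf_counter() - start
    return total_sec / len(inputs)


def dummy_predict(batch: Sequence) -> None:
    """Hypothetical stand-in for a model: sleeps 1 ms per image."""
    time.sleep(0.001 * len(batch))


if __name__ == '__main__':
    print([len(r) for r in batch_ranges(100, 8)])
    avg = average_sec_per_image(dummy_predict, list(range(100)), 8)
    print(f'{avg:.4f} s/image')
```

Dividing by the image count rather than the batch count keeps the shorter final batch from skewing the average.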

Results

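Below is a bubble chart of the results. The vertical axis is ImageNet Top-1 accuracy, the horizontal axis is the processing time per image with the local model (log scale), and the bubble size is the model's parameter count. I did not measure ImageNet Top-1 accuracy in this experiment. Those values are quoted from each model's page. The closer a model is to the upper left of the chart, the faster and more accurate it is.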

[Figure: Processing speed per model]

The detailed values are as follows.

Model	Time [s/image] (local)	Time [s/image] (TF Serving)	ImageNet Top-1 Accuracy	Parameters
VGG16	0.990	0.987	0.713	138,357,544
MobileNetV2	0.064	0.075	0.713	3,538,984
ResNet50	0.286	0.288	0.749	25,636,712
InceptionV3	0.382	0.390	0.779	23,851,784
Xception	0.743	0.754	0.790	22,910,480
EfficientNetB0	0.115	0.120	0.772	5,330,564
EfficientNetB1	0.188	0.199	0.791	7,856,232
EfficientNetB2	0.255	0.264	0.802	9,177,562
EfficientNetB3	0.472	0.494	0.816	12,320,528
EfficientNetB4	1.487	1.536	0.830	19,466,816
EfficientNetB5	3.501	3.511	0.837	30,562,520
EfficientNetB6	6.078	6.197	0.841	43,265,136
EfficientNetB7	12.226	12.250	0.844	66,658,680

Discussion

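For example, InceptionV3 has slightly more parameters than Xception (23.9M vs. 22.9M), yet it takes only about half the processing time (0.382 vs. 0.743 s/image). This shows that parameter count and processing speed are not proportional. InceptionV3 is only about 1.1 percentage points less accurate than Xception, so it seems to offer better cost-performance. The EfficientNet series is accurate across the board, but from EfficientNetB4 onward each image takes more than a second. If you choose one of those models, weigh the cost carefully.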
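The InceptionV3 vs. Xception comparison can be extended to the whole table by asking which models are Pareto-optimal. A model is Pareto-optimal if no other model is at least as fast and at least as accurate while being strictly better on one of the two. The following minimal sketch runs that check on the local timings and Top-1 accuracies from the table above. It is an illustrative reanalysis of the published numbers, not part of the original measurement.

```python
from typing import List, Tuple

# (model, local s/image, ImageNet Top-1), copied from the results table.
RESULTS: List[Tuple[str, float, float]] = [
    ('VGG16', 0.990, 0.713),
    ('MobileNetV2', 0.064, 0.713),
    ('ResNet50', 0.286, 0.749),
    ('InceptionV3', 0.382, 0.779),
    ('Xception', 0.743, 0.790),
    ('EfficientNetB0', 0.115, 0.772),
    ('EfficientNetB1', 0.188, 0.791),
    ('EfficientNetB2', 0.255, 0.802),
    ('EfficientNetB3', 0.472, 0.816),
    ('EfficientNetB4', 1.487, 0.830),
    ('EfficientNetB5', 3.501, 0.837),
    ('EfficientNetB6', 6.078, 0.841),
    ('EfficientNetB7', 12.226, 0.844),
]


def pareto_front(results: List[Tuple[str, float, float]]) -> List[str]:
    """Return the models that no other model beats on both speed and accuracy."""
    front = []
    for name, sec, acc in results:
        dominated = any(
            s <= sec and a >= acc and (s < sec or a > acc)
            for other, s, a in results if other != name)
        if not dominated:
            front.append(name)
    return front


print(pareto_front(RESULTS))
```

On these numbers the front is MobileNetV2 plus EfficientNetB0 through B7. VGG16 drops out because MobileNetV2 matches its accuracy at a fraction of the time. InceptionV3 and Xception both drop out because EfficientNetB1 (0.188 s/image, 0.791) is faster and more accurate than either. That result is consistent with the pairwise comparison above.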

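When you compare the local model with TensorFlow Serving, TensorFlow Serving is slower in almost every case because the images also have to be transferred. The difference averages only about 0.02 s/image, though, so its impact is small. This holds even for EfficientNetB7, which has the largest input size (600×600), where the gap is 0.024 s/image.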
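The 0.02 s/image figure is the mean of the per-model differences between the two timing columns. The short sketch below recomputes it from the table values and also prints each model's gap relative to its local time. It only reprocesses the published numbers.

```python
# (model, local s/image, TF Serving s/image), copied from the results table.
TIMINGS = [
    ('VGG16', 0.990, 0.987),
    ('MobileNetV2', 0.064, 0.075),
    ('ResNet50', 0.286, 0.288),
    ('InceptionV3', 0.382, 0.390),
    ('Xception', 0.743, 0.754),
    ('EfficientNetB0', 0.115, 0.120),
    ('EfficientNetB1', 0.188, 0.199),
    ('EfficientNetB2', 0.255, 0.264),
    ('EfficientNetB3', 0.472, 0.494),
    ('EfficientNetB4', 1.487, 1.536),
    ('EfficientNetB5', 3.501, 3.511),
    ('EfficientNetB6', 6.078, 6.197),
    ('EfficientNetB7', 12.226, 12.250),
]

# Extra seconds per image when going through TF Serving instead of a local model.
diffs = {name: serving - local for name, local, serving in TIMINGS}
mean_diff = sum(diffs.values()) / len(diffs)
max_name = max(diffs, key=diffs.get)

print(f'mean overhead: {mean_diff:.3f} s/image')
print(f'largest gap: {max_name} ({diffs[max_name]:.3f} s/image)')
for name, local, serving in TIMINGS:
    # Relative overhead as a percentage of the local time.
    print(f'{name}: {100 * (serving - local) / local:+.1f}%')
```

The mean comes out at about 0.021 s/image. VGG16's difference is slightly negative (TF Serving was 0.003 s/image faster). The largest absolute gap is EfficientNetB6's 0.119 s/image, just under 2% of its local time.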

Summary

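I measured the CPU processing speed of commonly used deep learning image classification models. Processing speed varies greatly from model to model, so model selection needs to take cost into account. I hope this helps when you choose a deep learning model.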

Appendix (Source Code)

Here is the source code used in the experiment.

Saving the Models

This code exports each model as a SavedModel.

import os
import tensorflow as tf
from absl import logging
import tensorflow.keras.applications as app
import efficientnet.tfkeras as efn

MODELS = [
    app.VGG16,
    app.ResNet50,
    app.InceptionV3,
    app.Xception,
    app.MobileNetV2,
    efn.EfficientNetB0,
    efn.EfficientNetB1,
    efn.EfficientNetB2,
    efn.EfficientNetB3,
    efn.EfficientNetB4,
    efn.EfficientNetB5,
    efn.EfficientNetB6,
    efn.EfficientNetB7,
]


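def build_serve_fn(model):
    # The signature takes a uint8 image batch; the argument name 'images'
    # becomes the input key used by both the gRPC client and the local call.
    @tf.function(
        input_signature=[tf.TensorSpec(model.input.shape, dtype=tf.uint8)])
    def serve(images):
        # Plain /255 scaling rather than each model's own preprocessing:
        # fine for timing, but not for reproducing the published accuracy.
        images = tf.cast(images, tf.float32) / 255
        # training=False runs the model in inference mode (replaces the
        # deprecated tf.keras.backend.set_learning_phase(0)).
        probabilities = model(images, training=False)
        classes = tf.argmax(probabilities, axis=-1)
        return {'classes': classes}

    return serve


def export_model(model, export_path):
    logging.info(f'Export {model.name}, input_shape: {model.input.shape}')
    serve_fn = build_serve_fn(model)
    tf.saved_model.save(model, export_path, signatures={
        'serving_default': serve_fn})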


def export_models(output_dir: str):
    for model_cls in MODELS:
        model = model_cls()
        export_path = os.path.join(output_dir, model.name, '1')
        export_model(model, export_path)


if __name__ == '__main__':
    logging.set_verbosity(logging.INFO)
    export_models(output_dir='models')
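export_models writes each model to models/<model.name>/1, which is the base-path/version layout TensorFlow Serving expects. The article does not say how the container was set up. One way to serve all thirteen models from a single container is a model config file in protobuf text format. The helper below is a hypothetical sketch that renders one. The names `build_model_config` and the `/models` mount point are assumptions.

```python
from typing import Iterable


def build_model_config(model_names: Iterable[str],
                       base_path: str = '/models') -> str:
    """Render a TensorFlow Serving model_config_list (protobuf text format).

    Each model is expected under <base_path>/<name>/<version>/, the layout
    that export_models() writes (models/<model.name>/1).
    """
    entries = []
    for name in sorted(model_names):
        entries.append(
            '  config {\n'
            f"    name: '{name}'\n"
            f"    base_path: '{base_path}/{name}'\n"
            "    model_platform: 'tensorflow'\n"
            '  }\n')
    return 'model_config_list {\n' + ''.join(entries) + '}\n'


if __name__ == '__main__':
    # Directory names equal model.name, e.g. 'vgg16', 'efficientnet-b0'.
    print(build_model_config(['vgg16', 'efficientnet-b0']))
```

You could save the output as models.config inside the models directory and pass it to the server, for example: docker run -p 8500:8500 -v "$(pwd)/models:/models" tensorflow/serving --model_config_file=/models/models.config. Port 8500 is the gRPC port that the measurement script connects to by default.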

Measurement

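This is the measurement code. Before running it, you need to start a TensorFlow Serving container that serves the models exported above. The input images are read from S3 (the --bucket and --s3_dir arguments).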

import os
import io
import time
import argparse
import functools
import grpc
import numpy as np
import tensorflow as tf
import boto3

from PIL import Image
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
from absl import logging

MODEL_INPUT_SHAPES = {
    'vgg16': 224,
    'resnet50': 224,
    'inception_v3': 299,
    'xception': 299,
    'mobilenetv2_1.00_224': 224,
    'efficientnet-b0': 224,
    'efficientnet-b1': 240,
    'efficientnet-b2': 260,
    'efficientnet-b3': 300,
    'efficientnet-b4': 380,
    'efficientnet-b5': 456,
    'efficientnet-b6': 528,
    'efficientnet-b7': 600,
}
BATCH_SIZE = 8
NUM_IMAGES = 100
MODEL_DIR = 'models'


class S3Client:
    def __init__(self, bucket):
        self.s3_client = boto3.Session().client('s3')
        self.bucket = bucket

    def get_s3_keys(self, directory='',
                    white_list_formats=('jpg', 'jpeg')):
        keys = []
        paginator = self.s3_client.get_paginator('list_objects')
        for result in \
                paginator.paginate(Bucket=self.bucket, Delimiter='/',
                                   Prefix=directory):
            if result.get('CommonPrefixes') is not None:
                for subdir in result.get('CommonPrefixes'):
                    keys += self.get_s3_keys(subdir.get('Prefix'),
                                             white_list_formats)
            if result.get('Contents') is not None:
                for file in result.get('Contents'):
                    key = file.get('Key')
                    if key.lower().endswith(white_list_formats):
                        keys.append(key)

        return keys

    def download_image(self, s3_key, target_size=None):
        s3_object = self.s3_client.get_object(Bucket=self.bucket,
                                              Key=s3_key)
        image_data = io.BytesIO(s3_object['Body'].read())
        pil_image = Image.open(image_data).convert('RGB')
        if target_size:
            pil_image = pil_image.resize((target_size, target_size), Image.NEAREST)
        image = np.asarray(pil_image)

        return image

    def download_images(self, s3_keys, target_size=None):
        images = []
        for key in s3_keys:
            image = self.download_image(key, target_size)
            images.append(image)

        images = np.asarray(images)
        return images


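def predict_on_remote(server, model_name, timeout, images):
    # A new gRPC channel is opened for every batch, so connection setup
    # is included in the measured remote time.
    with grpc.insecure_channel(server) as channel: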
        stub = prediction_service_pb2_grpc.PredictionServiceStub(
            channel)

        request = predict_pb2.PredictRequest()
        request.model_spec.name = model_name
        request.model_spec.signature_name = 'serving_default'
        tensor = tf.make_tensor_proto(images,
                                      dtype=images.dtype.type,
                                      shape=images.shape)
        request.inputs['images'].CopyFrom(tensor)
        response = stub.Predict(request, timeout)
        result = {}
        for key, value in response.outputs.items():
            result[key] = tf.make_ndarray(value).reshape(
                [dim.size for dim in value.tensor_shape.dim])

    return result


def predict_on_local(model, images):
    f = model.signatures["serving_default"]
    result = f(images=tf.constant(images))

    return result


def run_predict(predict_fn, input_shape, num_images, batch_size, bucket, s3_dir):
    s3 = S3Client(bucket)
    s3_keys = s3.get_s3_keys(s3_dir)

    total_sec = 0

    for start in range(0, num_images, batch_size):
        end = min(start+batch_size, num_images)
        # Download and resizing happen outside the timed region;
        # only the prediction call itself is measured.
        images = s3.download_images(s3_keys[start:end], input_shape)
        pred_start = time.time()
        predict_fn(images=images)
        pred_end = time.time()
        total_sec += (pred_end - pred_start)

    return total_sec


def main(args):
    logging.set_verbosity(logging.INFO)
    server = args.server
    num_images = args.num_images
    batch_size = args.batch_size
    model_dir = args.model_dir
    timeout = args.timeout
    bucket = args.bucket
    s3_dir = args.s3_dir

    models = os.listdir(model_dir)

    logging.info(f'models: {models}')
    logging.info(f'server: {server}')
    logging.info(f'timeout: {timeout}')

    # Remote prediction
    for model_name in models:
        input_shape = MODEL_INPUT_SHAPES[model_name]
        predict_fn = functools.partial(predict_on_remote,
                                       server=server, model_name=model_name,
                                       timeout=timeout)
        processing_sec = run_predict(predict_fn, input_shape,
                                     num_images, batch_size, bucket, s3_dir)

        logging.info(f'[Remote] model_name: {model_name}, '
                     f'processing_sec: {processing_sec:.3f}, '
                     f'processing_sec/image: {processing_sec/num_images:.3f}')

    # Local prediction
    for model_name in models:
        input_shape = MODEL_INPUT_SHAPES[model_name]
        model = tf.saved_model.load(os.path.join(model_dir, model_name, '1'))
        predict_fn = functools.partial(predict_on_local, model=model)
        processing_sec = run_predict(predict_fn, input_shape,
                                     num_images, batch_size, bucket, s3_dir)

        logging.info(f'[Local] model_name: {model_name}, '
                     f'processing_sec: {processing_sec:.3f}, '
                     f'processing_sec/image: {processing_sec/num_images:.3f}')


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--bucket', type=str, required=True)
    parser.add_argument('--s3_dir', type=str, required=True)
    parser.add_argument(
        '--server', type=str, default='localhost:8500')
    parser.add_argument(
        '--num_images', type=int, default=NUM_IMAGES)
    parser.add_argument(
        '--batch_size', type=int, default=BATCH_SIZE)
    parser.add_argument(
        '--model_dir', type=str, default=MODEL_DIR)
    parser.add_argument(
        '--timeout', type=int, default=80)
    args = parser.parse_args()
    main(args)
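main() reports results only as log lines. A hypothetical helper like the one below could collect them into a dict keyed by (mode, model name), which you could then use to assemble a table like the one in the results section. The regular expression assumes the exact message format of the f-strings in main(). It uses re.search, so any logger prefix in front of the message is ignored.

```python
import re
from typing import Dict, Iterable, Tuple

# Matches the messages logged by main(), e.g.
# "[Local] model_name: vgg16, processing_sec: 99.000, processing_sec/image: 0.990"
LOG_PATTERN = re.compile(
    r'\[(Remote|Local)\] model_name: ([^,]+), '
    r'processing_sec: ([\d.]+), processing_sec/image: ([\d.]+)')


def parse_results(lines: Iterable[str]) -> Dict[Tuple[str, str], float]:
    """Map (mode, model_name) to the logged seconds per image."""
    results = {}
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            mode, model_name, _total_sec, sec_per_image = match.groups()
            results[(mode, model_name)] = float(sec_per_image)
    return results


if __name__ == '__main__':
    # Illustrative lines in main()'s format, built from the VGG16 row.
    sample = [
        '[Local] model_name: vgg16, processing_sec: 99.000, '
        'processing_sec/image: 0.990',
        '[Remote] model_name: vgg16, processing_sec: 98.700, '
        'processing_sec/image: 0.987',
    ]
    print(parse_results(sample))
```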

UniFa Developer Blog

A blog by members of the Product Development Division of UniFa Inc.

Measuring the Processing Speed of Deep Learning Image Classification Models
