Creating a Database with a CloudFormation Custom Resource

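Hello, this is Chou from the Product Engineering Department. Lately I have been working almost exclusively on automating our existing systems, and along the way I have used both Terraform and CloudFormation. Some people may think CloudFormation is unnecessary if you have Terraform, but in this post I would like to introduce CloudFormation Custom Resources, which make it possible to do things that Terraform cannot.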

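First, a quick comparison of the characteristics of Terraform and CloudFormation.

	Terraform	CloudFormation
Platform	AWS, GCP, etc.	AWS
Syntax	HCL resource definitions	JSON/YAML resource definitions
IDE support (JetBrains)	◯	△ (a web-based Designer is also available)
Resource state	Managed in tfstate files	Managed by AWS
Online execution	Terraform Cloud (paid)	Changes can be made and executed from the console (free)
Input	Environment variables, tfvars files, Terraform Cloud (paid)	Environment variables or a JSON file at create/update time; managed by AWS after creation
Rollback	None; a failure partway through stops or crashes the run	Yes
AWS API	Not always up to date (for example, some ECS Service settings)	Up to date
Extension mechanism	Golang	Custom Resource (SNS/Lambda)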

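Overall, Terraform is easy to use thanks to its good IDE support, and it supports multiple platforms. CloudFormation, by contrast, is an AWS service, so you do not need to manage resource state or inputs yourself. The two points that deserve the most attention are rollback and extensibility.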

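Rollback means that when the creation or update of one resource fails during a run, the other resources that were already changed are restored to their previous state. This matters a great deal: when an application is upgraded and the infrastructure needs a small change along with it, ending up in a half-applied state is not acceptable. That is why you need CloudFormation when you want to update something like an ECS Scheduled Task from CodePipeline (both the TaskDefinition and the EventBridge Rule Target).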

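The other point is extensibility. Terraform is written in Golang, so its extensions are written in Golang. Frankly, among people who work on infrastructure, those who can write a little Python are probably the most common. CloudFormation, being an AWS service, can use Lambda. Lambda supports multiple languages, and a Lambda function attached to a VPC can reach resources inside that VPC. The latter is something Terraform basically cannot do.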

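For example, suppose you create an RDS instance and want to create a database on it for your application. With EC2, you would connect to the RDS instance through some tool on the EC2 instance and create the database. With Fargate there is no EC2 instance, and the RDS instance usually sits in a private subnet, so you cannot connect to it and cannot create the database. You end up back at manual work: create a Cloud9/EC2 instance, log in to it, and do the job by hand. With Lambda, the function can be attached to the VPC and talk to the RDS instance.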

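Now let's actually try a CloudFormation Custom Resource (Lambda). Rather than jumping straight into databases, we will start with a simple Custom Resource: a text file in an S3 bucket.

(Create the S3 bucket used in the code below as appropriate for your environment.)

Consumer side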

# s3-text-file-example.yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: S3 Text File Example

Parameters:
    Prefix:
        Type: String

Resources:  
    EcsEnvFile:
        Type: "AWS::CloudFormation::CustomResource"
        Properties:
            ServiceToken: !ImportValue "CrS3TextFileFunctionArn"
            Bucket: !Sub "${Prefix}-ecs"
            ObjectKey: "app.env"
            Content: !Sub |
                PREFIX=${Prefix}
                ACCOUNT_ID=${AWS::AccountId}
                REGION=${AWS::Region}

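For those unfamiliar with CloudFormation, here is a brief explanation. CloudFormation manages resources in units called Stacks. Basically, one Stack corresponds to one template file. A template file contains the following sections:

AWSTemplateFormatVersion: the template format version
Description: a description
Parameters: the inputs
Resources: the resource definitions

Parameters lists the parameter names and their types. Each resource defines a resource type and its properties. For a Custom Resource, the resource type is AWS::CloudFormation::CustomResource (other type names, such as Custom::<name>, are possible too). Among the properties, ServiceToken is the ARN of the Lambda function. All the remaining properties are input parameters for the Lambda function.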
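To make the last point concrete, here is a minimal Python sketch of the request a Lambda-backed Custom Resource receives for the EcsEnvFile resource above. The field names follow the CloudFormation custom resource request format; the concrete values (ARNs, IDs, URL) and the helper name user_properties are made up for illustration.

```python
# Sketch of the event CloudFormation sends to the Lambda for EcsEnvFile.
# All values below are illustrative placeholders, not real identifiers.
SAMPLE_EVENT = {
    "RequestType": "Create",  # "Create", "Update" or "Delete"
    "ResponseURL": "https://example.invalid/presigned-url",
    "StackId": "arn:aws:cloudformation:ap-northeast-1:123456789012:stack/s3-text-file-example/guid",
    "RequestId": "unique-request-id",
    "ResourceType": "AWS::CloudFormation::CustomResource",
    "LogicalResourceId": "EcsEnvFile",
    "ResourceProperties": {
        "ServiceToken": "arn:aws:lambda:ap-northeast-1:123456789012:function:custom-resource-s3-text-file",
        "Bucket": "my-ecs",
        "ObjectKey": "app.env",
        "Content": "PREFIX=my\nACCOUNT_ID=123456789012\nREGION=ap-northeast-1\n",
    },
}


def user_properties(event):
    """Return the template properties without the ServiceToken entry."""
    props = dict(event["ResourceProperties"])
    props.pop("ServiceToken", None)
    return props


print(sorted(user_properties(SAMPLE_EVENT)))  # ['Bucket', 'Content', 'ObjectKey']
```

ServiceToken arrives together with the other properties, so a handler that iterates over ResourceProperties may want to drop it first, as the hypothetical helper does here.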

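CloudFormation template files provide several functions. !ImportValue brings in a value exported by another Stack. !Sub replaces variables enclosed in ${} with their actual values. Besides the Prefix input, AWS::AccountId and AWS::Region (whose values are what the names suggest) are available from the start.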
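As a rough mental model of !Sub, the substitution can be emulated in a few lines of Python. This is only a sketch: render_sub is a hypothetical name, the account ID and region are dummy values, and the real !Sub handles more cases (resource attributes, the ${!Literal} escape) that are ignored here.

```python
import re


def render_sub(template, variables):
    """Replace each ${Name} in template with variables["Name"]."""
    def lookup(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"unresolved variable: {name}")
        return str(variables[name])

    return re.sub(r"\$\{([^}]+)\}", lookup, template)


values = {
    "Prefix": "my",  # the template parameter
    "AWS::AccountId": "123456789012",  # pseudo parameters supplied by AWS
    "AWS::Region": "ap-northeast-1",
}
print(render_sub("${Prefix}-ecs", values))  # my-ecs
```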

Looking at the template again, the EcsEnvFile resource invokes the Lambda function and passes Bucket, ObjectKey, and so on as input. The Lambda side looks like this.

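// index.js
// The AWS SDK for JavaScript v2 is bundled with the nodejs12.x Lambda runtime.
const AWS = require('aws-sdk');

// cloudFormationReply(event, context, success, responseData, physicalResourceId, noEcho)
// is a helper (not shown here) that sends the result to event.ResponseURL.

exports.handler = async (event, context) => {
    const resourceProperties = event['ResourceProperties'];
    const bucket = resourceProperties['Bucket'];
    const objectKey = resourceProperties['ObjectKey'];
    const content = resourceProperties['Content'];
    const physicalResourceId = `s3://${bucket}/${objectKey}`;

    const requestType = event['RequestType'];
    const s3Client = new AWS.S3();
    try {
        if (requestType === 'Create' || requestType === 'Update') {
            // Create and Update both (over)write the object.
            await s3Client.putObject({
                Bucket: bucket,
                Key: objectKey,
                Body: content
            }).promise();
        } else if (requestType === 'Delete') {
            await s3Client.deleteObject({
                Bucket: bucket,
                Key: objectKey
            }).promise();
        }
        await cloudFormationReply(event, context, true, {}, physicalResourceId, false);
    } catch (error) {
        // Always reply; otherwise the Stack waits until the Custom Resource times out.
        console.error(error);
        await cloudFormationReply(event, context, false, {}, physicalResourceId, false);
    }
};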

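Node.js is generally easy to write when working with AWS services, and the logic should not be hard to follow: depending on RequestType, the function creates/updates or deletes the object. The cloudFormationReply call at the end reports the result to CloudFormation and corresponds to the send function of AWS's cfn-response module. The documentation is here.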

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-lambda-function-code-cfnresponsemodule.html
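The reply itself is an HTTPS PUT of a small JSON document to the pre-signed URL in the event's ResponseURL. Below is a minimal Python sketch of that exchange, assuming the response fields described in the documentation above. The function names build_response_body and send_response are hypothetical; the cfn-response module does the equivalent for you.

```python
import json
import urllib.request


def build_response_body(event, status, physical_resource_id, data=None, reason="", no_echo=False):
    """Build the JSON document CloudFormation expects as a custom resource response."""
    return json.dumps({
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason,  # should explain the failure when Status is "FAILED"
        "PhysicalResourceId": physical_resource_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "NoEcho": no_echo,
        "Data": data or {},
    })


def send_response(event, body):
    """PUT the body to the pre-signed URL that came with the request."""
    payload = body.encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"],
        data=payload,
        method="PUT",
        # cfn-response sends an empty content type, so this sketch does the same.
        headers={"Content-Type": "", "Content-Length": str(len(payload))},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A handler would call build_response_body with the same physical resource ID on every request for a given resource (here, s3://bucket/key) and then pass the result to send_response.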

The template for the Lambda function looks like this.

# custom-resource-s3-text-file.yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Custom Resources S3 Text File

Resources:
  CrS3TextFileRole:
    Type: "AWS::IAM::Role"
    Properties:
      RoleName: "custom-resource-s3-text-file"
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              Service:
                - "lambda.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      ManagedPolicyArns:
        - "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
      Policies:
        - PolicyName: s3
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - "s3:PutObject"
                  - "s3:GetObject"
                  - "s3:DeleteObject"
                Resource: "arn:aws:s3:::*/*"

  CrS3TextFileFunction:
    Type: "AWS::Lambda::Function"
    Properties:
      FunctionName: "custom-resource-s3-text-file"
      Code: ../lambda/custom-resource-s3-text-file
      Handler: index.handler
      Role: !GetAtt CrS3TextFileRole.Arn
      Runtime: nodejs12.x

Outputs:
  CrS3TextFileFunctionArn:
    Value: !GetAtt CrS3TextFileFunction.Arn
    Export:
      Name: "CrS3TextFileFunctionArn"

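You can also manage Lambda functions with serverless or AWS SAM. However, because serverless supports multiple platforms, once the Lambda configuration gets complicated you end up having to write CloudFormation resources anyway. AWS SAM is a tool built on top of CloudFormation and specialized for Lambda development. If you have no special requirements, though, you can simply manage the function with plain CloudFormation.

The template file contains two resources: an IAM role and a Lambda function. The IAM role has the basic AWSLambdaBasicExecutionRole plus a policy that allows operations on S3. Pay attention to the Code property of the Lambda resource: the actual code lives in separate files.

Now let's deploy the Lambda function.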

$ # aws s3 mb s3://cfn-package-abcd
$ aws cloudformation package --template-file=custom-resource-s3-text-file.yaml \
                             --s3-bucket=cfn-package-abcd --s3-prefix=custom-resource-s3-text-file \
                             --output-template-file out-custom-resource-s3-text-file.yaml

$ aws cloudformation deploy --stack-name=custom-resources-s3-text-file \
                            --template-file=out-custom-resource-s3-text-file.yaml \
                            --capabilities CAPABILITY_NAMED_IAM

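First, the cloudformation package command uploads the Lambda code to S3 and outputs a transformed template file. Next, the cloudformation deploy command deploys using that output template. deploy creates the Stack if it does not exist and updates it if it does. Finally, because the template creates an IAM role with a custom name, CAPABILITY_NAMED_IAM is required.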
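Conceptually, package zips the directory named in Code, uploads the archive, and rewrites the property to point at the uploaded object. The Python sketch below illustrates only the rewrite step. The function name and the object key are placeholders; the actual CLI chooses the key itself.

```python
import copy


def rewrite_code_property(template, resource_name, bucket, key):
    """Return a copy of the template whose Code points at an S3 object."""
    packaged = copy.deepcopy(template)
    properties = packaged["Resources"][resource_name]["Properties"]
    if isinstance(properties["Code"], str):  # still a local path
        properties["Code"] = {"S3Bucket": bucket, "S3Key": key}
    return packaged


template = {
    "Resources": {
        "CrS3TextFileFunction": {
            "Type": "AWS::Lambda::Function",
            "Properties": {"Code": "../lambda/custom-resource-s3-text-file"},
        }
    }
}
packaged = rewrite_code_property(
    template,
    "CrS3TextFileFunction",
    bucket="cfn-package-abcd",
    key="custom-resource-s3-text-file/placeholder-hash",  # placeholder key
)
print(packaged["Resources"]["CrS3TextFileFunction"]["Properties"]["Code"])
```

This is why deploy is given the output template rather than the original one: only the output template refers to code that CloudFormation can fetch.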

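The Lambda template ends with an Outputs section. This output is what the first template uses. By declaring an output with an export, a CloudFormation template lets other Stacks use the value.

Now that the Lambda Stack exists, let's deploy the consumer-side template.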

$ aws cloudformation deploy --stack-name=s3-text-file-example \
                            --template-file=s3-text-file-example.yaml \
                            --parameter-overrides Prefix="my"

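If app.env appears in the S3 bucket, it worked!

Incidentally, this file is actually an environment variable file, which Fargate supports starting with platform version 1.4.0. CloudFormation has no resource for creating an object in an S3 bucket, but this way you can cover the gap yourself.

Back to the main topic, the database. Now that the overall flow is clear, a Lambda function that creates a database should not be difficult either.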

# custom-resource-pg-database.yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Custom Resources PostgreSQL Database

Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id
  PrivateSubnetIds:
    Type: List<AWS::EC2::Subnet::Id>

Resources:
  # https://docs.aws.amazon.com/lambda/latest/dg/services-rds-tutorial.html
  CrPgDatabaseRole:
    Type: "AWS::IAM::Role"
    Properties:
      RoleName: "custom-resource-pg-database"
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              Service:
                - "lambda.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      ManagedPolicyArns:
        - "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
        - "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"

  CrPgDatabaseSecurityGroup:
    Type: "AWS::EC2::SecurityGroup"
    Properties:
      GroupName: "custom-resources-pg-database"
      GroupDescription: "custom resource pg database"
      VpcId: !Ref VpcId
      SecurityGroupEgress:
        - IpProtocol: tcp
          FromPort: 5432
          ToPort: 5432
          CidrIp: 0.0.0.0/0
          Description: "PostgreSQL"
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 0.0.0.0/0
          Description: "CloudFormation Response"

  CrPgDatabaseFunction:
    Type: "AWS::Lambda::Function"
    Properties:
      FunctionName: "custom-resource-pg-database"
      Code: ../lambda/custom-resource-pg-database
      Handler: app.handler
      Role: !GetAtt CrPgDatabaseRole.Arn
      Runtime: python3.8
      VpcConfig:
        SecurityGroupIds:
          - !Ref CrPgDatabaseSecurityGroup
        SubnetIds: !Ref PrivateSubnetIds

Outputs:
  CrPgDatabaseFunctionArn:
    Value: !GetAtt CrPgDatabaseFunction.Arn
    Export:
      Name: "CrPgDatabaseFunctionArn"

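Because the function is attached to a VPC, AWSLambdaVPCAccessExecutionRole is required. In the Security Group, allow outbound access on port 443 in addition to PostgreSQL's 5432. Otherwise the function cannot send its response to CloudFormation, and the Custom Resource never reaches a successful state. The rest is ordinary Lambda configuration. Next is the consumer side.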

# database.yml
AWSTemplateFormatVersion: "2010-09-09"
Description: Database

Parameters:
  DatabaseHost:
    Type: String
  DatabaseMasterUsername:
    Type: String
    NoEcho: true
  DatabaseMasterPassword:
    Type: String
    NoEcho: true
  DatabaseName:
    Type: String
  MigrationUsername:
    Type: String
    NoEcho: true
  MigrationPassword:
    Type: String
    NoEcho: true

Resources:
  PgDatabase:
    Type: "AWS::CloudFormation::CustomResource"
    Properties:
      ServiceToken: !ImportValue "CrPgDatabaseFunctionArn"
      DatabaseHost: !Ref DatabaseHost
      DatabaseMasterUsername: !Ref DatabaseMasterUsername
      DatabaseMasterPassword: !Ref DatabaseMasterPassword
      DatabaseName: !Ref DatabaseName
      MigrationUsername: !Ref MigrationUsername
      MigrationPassword: !Ref MigrationPassword

  DatabaseHostParameter:
    Type: "AWS::SSM::Parameter"
    Properties:
      Name: /database-host
      Type: String
      Value: !Ref DatabaseHost

  DatabaseNameParameter:
    Type: "AWS::SSM::Parameter"
    Properties:
      Name: /database-name
      Type: String
      Value: !Ref DatabaseName

  DatabaseMigrationSecrets:
    Type: "AWS::SecretsManager::Secret"
    Properties:
      Name: /database-migration-secrets
      SecretString: !Sub '{"username":"${MigrationUsername}","password":"${MigrationPassword}"}'

Outputs:
  DatabaseHostParameter:
    Value: !Ref DatabaseHostParameter
    Export:
      Name: "DatabaseHostParameter"
  DatabaseNameParameter:
    Value: !Ref DatabaseNameParameter
    Export:
      Name: "DatabaseNameParameter"
  DatabaseMigrationUsernameArn:
    Value: !Sub "${DatabaseMigrationSecrets}:username::"
    Export:
      Name: "DatabaseMigrationUsernameArn"
  DatabaseMigrationPasswordArn:
    Value: !Sub "${DatabaseMigrationSecrets}:password::"
    Export:
      Name: "DatabaseMigrationPasswordArn"

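The Parameters on the consumer side set NoEcho. As noted in the comparison table at the beginning, CloudFormation inputs are managed by AWS after the Stack is created, so NoEcho is set on passwords and similar values to keep them from being displayed. Besides the CustomResource, the template stores the database information in Parameter Store and SecretsManager, so that the information is managed in one place. If the database could not be created, this information should not be saved either. Incidentally, CloudFormation cannot create Parameter Store parameters of type SecureString. Presumably the policy is to discourage storing passwords and similar values in Parameter Store and to steer users toward SecretsManager. SecretsManager offers access records, password rotation, storing several values together as JSON, and more, so let's gladly accept that.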
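One thing to keep in mind with a SecretString assembled by !Sub is that a password containing a double quote or a backslash would produce invalid JSON. The Python sketch below shows the shape of the stored secret and of the exported references; the function names are hypothetical and the credentials are dummy values.

```python
import json


def build_secret_string(username, password):
    """Serialize credentials in the same shape the template stores them in."""
    return json.dumps({"username": username, "password": password}, separators=(",", ":"))


def json_key_reference(secret_arn, json_key):
    """Build the '<arn>:<json-key>::' form used in the Outputs section."""
    return f"{secret_arn}:{json_key}::"


# json.dumps escapes the quote and the backslash, so the secret stays valid JSON.
secret = build_secret_string("migration", 'p"ss\\word')
credentials = json.loads(secret)
print(credentials["username"])  # migration
```

If such characters are possible in your passwords, generating the JSON outside the template (or restricting the allowed characters) is one way to avoid the problem.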

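The Lambda function that creates the database is written in Python. It is a little more complex because it handles create, update, and delete. Below is an excerpt of the code.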

import postgresql.driver as pg_driver

# get_resource_property, cloud_formation_reply, delete_database and the
# create_* helpers called below are omitted from this excerpt.

class Request:
    def __init__(self, event):
        self.event = event
        self.kind = event['RequestType']
        self.event_properties = event['ResourceProperties']
        self.database_host = get_resource_property(self.event_properties, "DatabaseHost")
        self.database_port = get_resource_property(self.event_properties, "DatabasePort", required=False, default_value=5432)
        self.database_master_username = get_resource_property(self.event_properties, "DatabaseMasterUsername")
        self.database_master_password = get_resource_property(self.event_properties, "DatabaseMasterPassword")
        self.database_name = get_resource_property(self.event_properties, "DatabaseName")
        self.schema_name = get_resource_property(self.event_properties, "SchemaName", required=False, default_value="app")
        self.migration_role_name = get_resource_property(self.event_properties, "MigrationRoleName", required=False, default_value=f"{self.database_name}_migration")
        self.migration_username = get_resource_property(self.event_properties, "MigrationUsername")
        self.migration_password = get_resource_property(self.event_properties, "MigrationPassword")
        self.crud_role_name = get_resource_property(self.event_properties, "CrudRoleName", required=False, default_value=f"{self.database_name}_crud")
        self.crud_username = get_resource_property(self.event_properties, "CrudUsername", required=False)
        # The CRUD password is only required when a CRUD user is requested.
        self.crud_password = get_resource_property(self.event_properties, "CrudPassword", required=(self.crud_username is not None))

def create_database(request):
    # conn.execute(f'CREATE DATABASE "{database_name}"')
    # (helper renamed here so it does not shadow this function)
    create_database_and_schema(request.database_name, request.schema_name)

    # conn.execute(f'CREATE ROLE "{role_name}"')
    # ...
    create_migration_role(request.database_name, request.schema_name, request.migration_role_name)

    # conn.execute(f"CREATE USER \"{username}\" WITH PASSWORD '{password}'")
    # ...
    create_user(request.migration_username, request.migration_password, request.database_name, request.schema_name, request.migration_role_name)

def handler(event, context):
    request = Request(event)
    try:
        if request.kind == 'Create':
            create_database(request)
            cloud_formation_reply(event, context, "SUCCESS", responseData={}, physicalResourceId=request.database_name)
        elif request.kind == 'Update':
            # Updates are rejected so that CloudFormation rolls the change back.
            print("unsupported operation [update]")
            cloud_formation_reply(event, context, "FAILED", responseData={}, reason="update is not supported", physicalResourceId=request.database_name)
        elif request.kind == 'Delete':
            delete_database(request)
            cloud_formation_reply(event, context, "SUCCESS", responseData={}, physicalResourceId=request.database_name)
    except Exception as error:
        # Report the failure so the Stack does not wait for a timeout.
        cloud_formation_reply(event, context, "FAILED", responseData={}, reason=str(error), physicalResourceId=request.database_name)

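As with the S3 text file, deploy the Lambda function first and then the database template, and the database is created automatically. There is no need to create a Cloud9/EC2 instance. Of course, if you also have SQL that creates tables, you can tweak the code a little to run it. Also, as long as the properties of the Custom Resource do not change, no update request is sent to it, so you can change other resources freely without any problem.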
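The Lambda excerpt above relies on a get_resource_property helper that is not shown. One possible implementation, inferred only from how it is called in the excerpt, might look like the following; the default for required, the treatment of empty strings, and the error type are all assumptions.

```python
def get_resource_property(properties, name, required=True, default_value=None):
    """Look up one custom resource property, enforcing required ones."""
    value = properties.get(name)
    if value is None or value == "":
        if required:
            raise ValueError(f"missing required property: {name}")
        return default_value
    return value


props = {"DatabaseHost": "db.example.internal", "DatabaseName": "app"}
print(get_resource_property(props, "DatabasePort", required=False, default_value=5432))  # 5432
```

Note that CloudFormation passes property values to the function as strings, so a DatabasePort set in the template would arrive as "5432" rather than 5432 and may need converting before use.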

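What do you think after seeing these two examples? Because CloudFormation Custom Resources can use Lambda, the range of things you can manage has grown considerably. Other examples include:

Automatic deployment of API Gateway (v1)
Generating a random IV and key for AES256
Creating an internal OAuth application
Modifying application settings stored in Redis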

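Finally, even though CloudFormation can do things that Terraform cannot, that does not mean you should abandon your existing Terraform and move everything to CloudFormation. Personally, I think:

The closer a setting is to the application, the harder it is for Terraform to handle (for example, it cannot configure the inside of an instance), so it is better to switch to a tool other than Terraform
Do not put the AWS Code series (CodeBuild/CodePipeline) under Terraform
- Simple pipelines are fine, but for complex ones the developers know them best (my sense is 70% developers, 30% infrastructure)
- The more likely something is to change, the less sense it makes to manage tfstate/tfvars for it, and the infrastructure team just becomes a bottleneck
- Hence CloudFormation seems a good fit
Keep application launch settings (ECS Task Definition and so on) out of Terraform
- Applying Terraform would then amount to deploying the application
- A failure can leave things half-applied (deployment needs rollback, Blue/Green deployment, and so on)
If CodePipeline needs to change the environment, CloudFormation is the better choice, for example for ECS Scheduled Task or API Gateway

In short, application and development settings should not have to be requested from the infrastructure team. For things that sit in this gap within DevOps, I think CloudFormation is a good fit, and I believe Custom Resources will let us automate more and more.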

Unifa Developer Blog

A blog by members of the Product Development Division at Unifa Inc.
