Automating The Process Of Archiving Data From RDS to S3 Glacier

MySQL, RDS, DMS, S3, S3 Glacier, Lambda, IAM, EC2, Fast Glacier

14 min readJan 24, 2024

If you compare the cost of per GB storage per month for Amazon RDS PostgreSQL Multi-AZ GP2 with that of Amazon Glacier in the us-east-1 Region, the Amazon Glacier storage cost is about 57 times cheaper than that of Amazon RDS. For example, the cost of 100 GB of storage per month in Amazon Glacier is under a dollar ($0.10), compared to Amazon RDS Multi-AZ storage, which is $23.00 (at the time of this writing).

# Perquisites

An AWS account that provides access to the services shown in the above Diagram.
knowledge of Amazon RDS, AWS DMS, Amazon VPC, Amazon S3, and AWS Lambda.
Installed PostgreSQL Client(psql)

## Table of Task

1. Create S3 Bucket
2. Create the Amazon Glacier vault
3. Create an Amazon EC2 client
4. Create MySQL Demo Database
5. Creating Replication Instance in AWS DMS
6. Populate MySQL Database
7. Create Endpoints in AWS DMS
- Create Source Endpoint as RDS
- Create Endpoint Target Endpoint as S3
8. Create Database Migration Task
9. Creating IAM Role
10. Creating Lambda Function
11. Trigger Events with AWS Lambda
12. Clean Up All Resources

1. Create S3 Bucket

Go to S3 Dashboard and Click on Create Bucket

2. Give the unique name to the bucket glacier-demo-bucket-2024 (make sure bucket is not exist)

3. let all the options default and click on Create Bucket at last.

Note the bucket name. Now, our Bucket has been created. let`s create folder in it for storing our RDS Data.

4. Go inside our bucket and click on Create Folder

5. Give object name as RDS_CSV_DATA

6. Click on Create folder

at this point our bucket and directory for storing RDS data has been created. Note the folder name.

2. Create the Amazon Glacier vault

Go to Amazon Glacier console here. click on Create Vault

2. Create vault rdsarchive

3. Under Event notifications area, choose the option Turn off notifications, and then click on Create vault.

Note:- Note the name and Amazon Resource Name (ARN) of the vault

3. Create an Amazon EC2 client

To create and configure an Amazon EC2 client to populate your source database.

On the EC2 Dashboard, choose Launch instance.

On the Launch an Instance page, enter the following values:

In the Name and tags section, enter DMSClient for Name.
In the Application and OS Images (Amazon Machine Image) section, leave the settings as they are.

3. In the Instance Type section, choose t2.micro.

4. In the Key pair (login) section, choose Create a new key pair or choose an existing one.

On the Create key pair page, enter the following:
Key pair name: DMSKeyPair
Key pair type: Leave as RSA.
Private key file format: Choose pem for OpenSSH on MacOS or Linux, or ppk for PuTTY on Windows.
Save the key file when prompted.

5. In the Network Settings section, choose Edit. Choose the following settings:

VPC — required: Choose the VPC with the ID that you recorded for the DB-Instances.
Subnet: ap-south-1a.
Auto-assign public IP: Choose Enable.
Leave the rest of the settings as they are, and choose Launch instance.

4. Create MySQL Demo Database

Go to Amazon RDS Dashboard and Select databases from navigation pane

2. Click on Create Database and Select Mysql Engine

3. Come down leave Engine Version default and Select Dev/Test box.

4. Change DB Instance name to dms-instance

5. leave user as admin and give password admin123

6. Select burstable classes and then choose db.t3.micro

7. Choose storage type General Purpose SSD(gp2) and Allocated Storage 20GB

8. Select Connect to an EC2 compute resource box and select our MySQL Client EC2 Instance ID

9. In DB Subnet Group → Choose existing and select default vpc

10. In VPC security group → Choose existing then choose default VPC security group

11. Uncheck Enable Enhanced Monitoring

12. Click on Create database

you can see our MySQL Database is in creating stage.

Note the Endpoint of RDS Instance > for this click on RDS Instance name then you will see following page.

5. Creating Replication Instance in AWS DMS

Go to AWS DMS(Data Migration Service) and Click on Replication Instance

2. Click on Create Replication Instance

3. Name it dms-rc and following options are optional but select Istance Class: dms.t3.micro

4. High Availability: Dev or test workload (Single-AZ)

5. Allocated Storage: 10GB

6. Let all the options default and click on Create replication instance

It will take 5–10 minutes to create, without wasting time populate out Database with some queries.

6. Populate MySQL Database

1. we have created EC2 Client Instance in the step 3. so, now Connect to that client instance using the host name and public key that you saved in previous steps. If you are using PuTTY, enable TCP keepalives on the Connection settings page so that your connection doesn’t time out from inactivity

2. Install MySQL Client Confirm installation as needed.

wget https://dev.mysql.com/get/mysql80-community-release-el9-3.noarch.rpm
sudo dnf install mysql80-community-release-el9-3.noarch.rpm
sudo dnf install mysql-community-server -y

3. Start the MySQL client with the following command. Provide your MySQL database endpoint.

mysql -h <RDS_Endpoint> -P 3306 -u admin -padmin123

2. Create Database and Table

CREATE DATABASE dms_db;

USE dms_db;

CREATE TABLE employees (
    id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    phone_number VARCHAR(15)
);

2. Run the following command to insert a record into the database.

INSERT INTO employees (id, first_name, last_name, email, phone_number)
VALUES (1, 'John', 'Doe', 'john.doe@example.com', '123-456-7890');

INSERT INTO employees (id, first_name, last_name, email, phone_number)
VALUES (2, 'Jane', 'Doe', 'jane.doe@example.com', '098-765-4321');

INSERT INTO employees (id, first_name, last_name, email, phone_number)
VALUES (3, 'Jim', 'Brown', 'jim.brown@example.com', '111-222-3333');

INSERT INTO employees (id, first_name, last_name, email, phone_number)
VALUES (4, 'Jill', 'Smith', 'jill.smith@example.com', '444-555-6666');

3. Run this command to see entry

select * from employees;

7. Create Endpoints in AWS DMS

7.1 Create Source Endpoint as RDS

1. Go to AWS DMS Console and choose Endpoints from navigation pane.

2. Click on Create Endpoint

3. check Select RDS DB instance → dms-instance

4. Leave other things default and check Provide access information manually

5. Change your password to admin123 and let all things default

6. Finally, Test the Endpoint. if the test get succeeded it will show successful in status after that click on Create Endpoint

7.2 Create Endpoint Target Endpoint as S3

Before creating a target Endpoint, we have to get the role ARN which will provide permission to store data from RDS to S3 bucket.

For role ARN go to IAM console
search for dms-vpc-role
then click on that role and copy ARN and store it

Now we have the ARN, let`s create Target Endpoint.

Again come to Dashboard and click on Create Endpoint

2. now choose Target endpoint box

3. Endpoint Configuration

Endpoint Identifier: dms_bucket

Target Engine: Amazon S3

ARN: paste ARN which have stored it from IAM dms-vpc-role

Bucket Name: paste s3 bucket name glacier-demo-bucket-2024 which we have created recently.

Bucket folder: give folder name that we have created after created the S3 bucket

Finally, test the Endpoint and make sure status must be successful.

after that click on Create Endpoint

8. Create Database Migration Task

Go to DMS console and select Database Migration tasks form navigation pane then click on Create task

2. Task Configuration

Task Identifier: dms-task-rds-s3
leave empty optional things
Replication Instance: dms-rc
Source Database Endpoint: dms-instance
Target Database Endpoint: dms-bucket
Migration Type: Migrate existing data

3. Leave default rest of the above things and click on Enter a schema and Give source name as dms_db

4. at last select Manually later and click on Create task

9. Creating IAM Role

Create IAM Role with permissions, to get data from S3 and send it to S3 Glacier

Go to IAM Console and select Roles form navigation pane then click on Create role

2. Choose AWS Services box and then choose Lambda in Use Cases

3. Search for following permissions in search box and then select it one by one in last click on Next.

AmazonS3FullAccess
AmazonGlacierFullAccess

4. Give name to the Role S3AndGlacierFullAccessRole

5. Click on Create Role

10. Creating Lambda Function

Go to Lambda dashboard. click on Create Function on the up-right hand corner

2. Create Lambda Function

Function Name: S3-To-Glacier
Runtime: python 3.12

expand the Change default execution role
Check this option Use an existing role
Select our new role S3AndGlacierFullAccessRole
click on Create Function

3. you will see this ⬇️kind of page, go down and you will find editor like environment.

4. click on configuration → General Configuration → Edit

5. Set timeout to 5minutes and the click on Save

6. Create Lambda Automation Script or copy and paste⬇️script.

click on “lambda-function.py” file on left hand side
delete content of that file

7. paste(Ctrl + V) the following script in it and do the following changes:

'glacier-demo-bucket-2024' Your S3 bucket name

'RDS_CSV_DATA'Your S3 folder name

'rdsarchive' —Your S3 Glacier vault name

import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    glacier = boto3.client('glacier')

    # Specify your bucket name
    bucket_name = 'glacier-demo-bucket-2024'

    # Specify your Glacier vault name
    glacier_vault_name = 'rdsarchive'

    # Specify the prefix
    prefix = 'RDS_CSV_DATA/dms_db/employees/'

    # Get the list of all objects in the bucket with the specified prefix
    paginator = s3.get_paginator('list_objects_v2')
    pages = paginator.paginate(Bucket=bucket_name, Prefix=prefix)

    for page in pages:
        for obj in page['Contents']:
            object_key = obj['Key']

            # Get the object from S3
            response = s3.get_object(Bucket=bucket_name, Key=object_key)
            data = response['Body'].read()

            # Check if data is not empty
            if data:
                # Store the data into the Glacier vault
                response = glacier.upload_archive(vaultName=glacier_vault_name, body=data)
            else:
                print(f"No data found in object {object_key}. Skipping...")

    return {
        'statusCode': 200,
        'body': f"All non-empty data from {bucket_name} has been stored in {glacier_vault_name} successfully."
    }

8. after make changes in the script click on Deploy.

9. and then test the script by clicking on Test button → give name to test event as Test-event → in Event JSON Terminal delete existing code and type { } only → Save

11. Trigger Events with AWS Lambda

Click on Add Trigger

2. Trigger Configuration

Choose S3 as a trigger
select our bucket from dropdown menu
Event types: All objects create events

3. Give prefix as a folder name RDS_CSV_DATA then check acknowledgement box → and Save

Now All tasks are completed and its time to test. just start the DMS task now.

Load has been completed and total 4 Rows are stored in S3 bucket

If you will go to this path in our S3 Bucket then you will see our .csv file glacier-demo-bucket-2024 → RDS_CSV_DATA/ → dms_db → employees → LOAD00000001.csv

Go to Lambda functions page and will see the massage Last fetched now that mean our lambda function is triggered and store data into S3 Glacier Vault

Now you can see our data has been stored into S3 Glacier Vault. it might take 12–24 Hours to reflect your stored data in the glacier.

You can`t see S3 Glacier Vault data directly you have use AWSCLI, AWS SDK or third-party software. like in my case I have used Fast Glacier