Automating The Process Of Archiving Data From RDS to S3 Glacier
MySQL, RDS, DMS, S3, S3 Glacier, Lambda, IAM, EC2, Fast Glacier
If you compare the per-GB monthly storage cost of Amazon RDS PostgreSQL Multi-AZ GP2 with that of Amazon Glacier in the us-east-1 Region, Amazon Glacier storage is roughly 57 times cheaper. For example, 100 GB of storage per month costs under a dollar (about $0.40) in Amazon Glacier, compared to $23.00 for Amazon RDS Multi-AZ storage (at the time of this writing).
# Prerequisites
- An AWS account with access to the services shown in the diagram above.
- Working knowledge of Amazon RDS, AWS DMS, Amazon VPC, Amazon S3, and AWS Lambda.
- An installed MySQL client (mysql).
## Table of Tasks
1. Create S3 Bucket
2. Create the Amazon Glacier vault
3. Create an Amazon EC2 client
4. Create MySQL Demo Database
5. Creating Replication Instance in AWS DMS
6. Populate MySQL Database
7. Create Endpoints in AWS DMS
- Create Source Endpoint as RDS
- Create Target Endpoint as S3
8. Create Database Migration Task
9. Creating IAM Role
10. Creating Lambda Function
11. Trigger Events with AWS Lambda
12. Clean Up All Resources
## 1. Create S3 Bucket
1. Go to the S3 dashboard and click Create bucket.
2. Give the bucket a unique name, e.g. glacier-demo-bucket-2024 (the name must not already be in use).
3. Leave all other options at their defaults and click Create bucket.
Note the bucket name. Now that the bucket exists, let's create a folder in it for storing our RDS data.
4. Go inside the bucket and click Create folder.
5. Give the folder the name RDS_CSV_DATA.
6. Click Create folder.
At this point, the bucket and the directory for storing the RDS data have been created. Note the folder name.
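If you prefer scripting, the same bucket and folder steps can be sketched with the AWS CLI (assuming the CLI is installed and configured; the bucket name is the one used in this walkthrough, so substitute your own unique name and region):

```shell
# Create the bucket (us-east-1 needs no LocationConstraint; other regions do)
aws s3api create-bucket --bucket glacier-demo-bucket-2024 --region us-east-1

# An S3 "folder" is just a zero-byte object whose key ends with a slash
aws s3api put-object --bucket glacier-demo-bucket-2024 --key RDS_CSV_DATA/
```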
## 2. Create the Amazon Glacier Vault
1. Go to the Amazon S3 Glacier console and click Create vault.
2. Name the vault rdsarchive.
3. Under the Event notifications area, choose Turn off notifications, and then click Create vault.
Note: record the name and Amazon Resource Name (ARN) of the vault.
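The equivalent CLI commands, as a rough sketch, also print the ARN you need to note down (the `-` account ID means the account owning the current credentials):

```shell
# Create the vault
aws glacier create-vault --account-id - --vault-name rdsarchive

# Print the vault's ARN so you can record it
aws glacier describe-vault --account-id - --vault-name rdsarchive \
  --query 'VaultARN' --output text
```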
## 3. Create an Amazon EC2 Client
Create and configure an Amazon EC2 client that you will use to populate the source database.
1. On the EC2 dashboard, choose Launch instance.
2. On the Launch an instance page, enter the following values:
- In the Name and tags section, enter DMSClient for Name.
- In the Application and OS Images (Amazon Machine Image) section, leave the settings as they are.
3. In the Instance type section, choose t2.micro.
4. In the Key pair (login) section, create a new key pair or choose an existing one.
- On the Create key pair page, enter the following:
- Key pair name: DMSKeyPair
- Key pair type: leave as RSA.
- Private key file format: choose .pem for OpenSSH on macOS or Linux, or .ppk for PuTTY on Windows.
- Save the key file when prompted.
5. In the Network settings section, choose Edit, and then choose the following settings:
- VPC — required: choose the VPC whose ID you recorded for the DB instance.
- Subnet: ap-south-1a.
- Auto-assign public IP: choose Enable.
6. Leave the rest of the settings as they are, and choose Launch instance.
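The launch steps above can be sketched with the AWS CLI as well; the AMI and subnet IDs below are placeholders you must replace with values from your own account:

```shell
# Create the key pair and save the private key locally
aws ec2 create-key-pair --key-name DMSKeyPair \
  --query 'KeyMaterial' --output text > DMSKeyPair.pem
chmod 400 DMSKeyPair.pem

# Launch the client instance with a public IP and the Name tag DMSClient
aws ec2 run-instances \
  --image-id <AMI ID> \
  --instance-type t2.micro \
  --key-name DMSKeyPair \
  --subnet-id <subnet id in your DB VPC> \
  --associate-public-ip-address \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=DMSClient}]'
```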
## 4. Create MySQL Demo Database
1. Go to the Amazon RDS dashboard and select Databases from the navigation pane.
2. Click Create database and select the MySQL engine.
3. Scroll down, leave Engine version at its default, and select the Dev/Test template.
4. Change the DB instance identifier to dms-instance.
5. Leave the master username as admin and set the password to admin123.
6. Select Burstable classes and then choose db.t3.micro.
7. Choose the General Purpose SSD (gp2) storage type with 20 GB of allocated storage.
8. Select Connect to an EC2 compute resource and choose the MySQL client EC2 instance we created earlier.
9. For DB subnet group, choose the existing default group.
10. For VPC security group, choose the existing default VPC security group.
11. Uncheck Enable Enhanced Monitoring.
12. Click Create database.
You can see that our MySQL database is in the creating state.
Note the endpoint of the RDS instance: click the RDS instance name and copy the endpoint from the Connectivity & security tab.
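A rough CLI equivalent of the console steps above, assuming the default VPC and security group (the EC2-connect convenience option is console-only, so you would open port 3306 in the security group yourself):

```shell
# Create the MySQL instance with the same settings as in the console
aws rds create-db-instance \
  --db-instance-identifier dms-instance \
  --engine mysql \
  --db-instance-class db.t3.micro \
  --allocated-storage 20 \
  --storage-type gp2 \
  --master-username admin \
  --master-user-password admin123 \
  --no-multi-az

# Fetch the endpoint once the instance is available
aws rds describe-db-instances --db-instance-identifier dms-instance \
  --query 'DBInstances[0].Endpoint.Address' --output text
```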
## 5. Creating Replication Instance in AWS DMS
1. Go to AWS DMS (Database Migration Service) and click Replication instances.
2. Click Create replication instance.
3. Name it dms-rc. The remaining options are optional, but select Instance class: dms.t3.micro.
4. High availability: Dev or test workload (Single-AZ).
5. Allocated storage: 10 GB.
6. Leave all the other options at their defaults and click Create replication instance.
It will take 5–10 minutes to create. While it does, let's populate our database with some data.
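The same replication instance can be created from the CLI, as a sketch of the console settings above:

```shell
# Single-AZ dms.t3.micro replication instance with 10 GB of storage
aws dms create-replication-instance \
  --replication-instance-identifier dms-rc \
  --replication-instance-class dms.t3.micro \
  --allocated-storage 10 \
  --no-multi-az
```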
## 6. Populate MySQL Database
1. We created the EC2 client instance in step 3. Connect to that instance using its host name and the private key that you saved earlier. If you are using PuTTY, enable TCP keepalives on the Connection settings page so that your connection doesn't time out from inactivity.
2. Install the MySQL client and confirm the installation:
wget https://dev.mysql.com/get/mysql80-community-release-el9-3.noarch.rpm
sudo dnf install mysql80-community-release-el9-3.noarch.rpm
sudo dnf install mysql-community-server -y
3. Start the MySQL client with the following command, providing your MySQL database endpoint:
mysql -h <RDS_Endpoint> -P 3306 -u admin -padmin123
4. Create a database and a table:
CREATE DATABASE dms_db;
USE dms_db;
CREATE TABLE employees (
id INT PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(100),
phone_number VARCHAR(15)
);
5. Run the following commands to insert records into the table:
INSERT INTO employees (id, first_name, last_name, email, phone_number)
VALUES (1, 'John', 'Doe', 'john.doe@example.com', '123-456-7890');
INSERT INTO employees (id, first_name, last_name, email, phone_number)
VALUES (2, 'Jane', 'Doe', 'jane.doe@example.com', '098-765-4321');
INSERT INTO employees (id, first_name, last_name, email, phone_number)
VALUES (3, 'Jim', 'Brown', 'jim.brown@example.com', '111-222-3333');
INSERT INTO employees (id, first_name, last_name, email, phone_number)
VALUES (4, 'Jill', 'Smith', 'jill.smith@example.com', '444-555-6666');
6. Run this query to see the entries:
select * from employees;
## 7. Create Endpoints in AWS DMS
### 7.1 Create Source Endpoint as RDS
1. Go to the AWS DMS console and choose Endpoints from the navigation pane.
2. Click Create endpoint.
3. Check Select RDS DB instance and choose dms-instance.
4. Leave the other settings at their defaults and check Provide access information manually.
5. Enter the password admin123 and leave everything else at its defaults.
6. Finally, test the endpoint. If the test succeeds, the status shows Successful; after that, click Create endpoint.
### 7.2 Create Target Endpoint as S3
Before creating the target endpoint, we need the ARN of the role that allows DMS to write RDS data to the S3 bucket.
1. For the role ARN, go to the IAM console.
2. Search for dms-vpc-role.
3. Click that role, then copy and store its ARN.
Now that we have the ARN, let's create the target endpoint.
1. Return to the DMS dashboard and click Create endpoint.
2. Choose the Target endpoint option.
3. Endpoint configuration:
- Endpoint identifier: dms-bucket
- Target engine: Amazon S3
- Service access role ARN: paste the ARN stored from the IAM dms-vpc-role.
- Bucket name: the S3 bucket glacier-demo-bucket-2024 that we created earlier.
- Bucket folder: the folder RDS_CSV_DATA that we created inside the bucket.
4. Finally, test the endpoint and make sure the status is Successful.
5. After that, click Create endpoint.
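Both endpoints can also be created from the CLI; this is a sketch in which the angle-bracket values are placeholders for your own RDS endpoint and role ARN:

```shell
# Source endpoint: the RDS MySQL instance
aws dms create-endpoint \
  --endpoint-identifier dms-instance \
  --endpoint-type source \
  --engine-name mysql \
  --server-name <RDS_Endpoint> --port 3306 \
  --username admin --password admin123

# Target endpoint: the S3 bucket, via the dms-vpc-role ARN copied earlier
aws dms create-endpoint \
  --endpoint-identifier dms-bucket \
  --endpoint-type target \
  --engine-name s3 \
  --s3-settings 'ServiceAccessRoleArn=<dms-vpc-role ARN>,BucketName=glacier-demo-bucket-2024,BucketFolder=RDS_CSV_DATA'
```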
## 8. Create Database Migration Task
1. Go to the DMS console, select Database migration tasks from the navigation pane, then click Create task.
2. Task configuration:
- Task identifier: dms-task-rds-s3
- Leave the optional fields empty.
- Replication instance: dms-rc
- Source database endpoint: dms-instance
- Target database endpoint: dms-bucket
- Migration type: Migrate existing data
3. Leave the rest of the settings at their defaults, click Enter a schema, and give the source schema name as dms_db.
4. Finally, select Manually later and click Create task.
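As a CLI sketch of the same task, `full-load` corresponds to the console's "Migrate existing data", and the table-mapping JSON selects every table in the dms_db schema (the ARNs are placeholders from the previous steps):

```shell
aws dms create-replication-task \
  --replication-task-identifier dms-task-rds-s3 \
  --source-endpoint-arn <source endpoint ARN> \
  --target-endpoint-arn <target endpoint ARN> \
  --replication-instance-arn <replication instance ARN> \
  --migration-type full-load \
  --table-mappings '{"rules":[{"rule-type":"selection","rule-id":"1","rule-name":"1","object-locator":{"schema-name":"dms_db","table-name":"%"},"rule-action":"include"}]}'
```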
## 9. Creating IAM Role
Create an IAM role with permissions to read data from S3 and send it to S3 Glacier.
1. Go to the IAM console, select Roles from the navigation pane, then click Create role.
2. Choose the AWS service option and then choose Lambda under Use cases.
3. Search for the following permissions in the search box, select them one by one, and then click Next:
- AmazonS3FullAccess
- AmazonGlacierFullAccess
4. Name the role S3AndGlacierFullAccessRole.
5. Click Create role.
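The console wizard wires up the Lambda trust policy for you; from the CLI the same role looks roughly like this:

```shell
# Trust policy that lets the Lambda service assume the role
aws iam create-role --role-name S3AndGlacierFullAccessRole \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}'

# Attach the two managed policies used in this walkthrough
aws iam attach-role-policy --role-name S3AndGlacierFullAccessRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam attach-role-policy --role-name S3AndGlacierFullAccessRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonGlacierFullAccess
```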
## 10. Creating Lambda Function
1. Go to the Lambda dashboard and click Create function in the upper-right corner.
2. Create the Lambda function:
- Function name: S3-To-Glacier
- Runtime: Python 3.12
- Expand Change default execution role.
- Check Use an existing role.
- Select our new role S3AndGlacierFullAccessRole.
- Click Create function.
3. On the function page, scroll down until you find the code editor.
4. Click Configuration → General configuration → Edit.
5. Set the timeout to 5 minutes and then click Save.
6. Write the Lambda automation script yourself, or copy and paste the script below:
- Click the lambda_function.py file on the left-hand side.
- Delete the content of that file.
7. Paste the following script into it and make these changes:
- 'glacier-demo-bucket-2024': your S3 bucket name
- 'RDS_CSV_DATA': your S3 folder name
- 'rdsarchive': your S3 Glacier vault name
import boto3

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    glacier = boto3.client('glacier')

    # Specify your bucket name
    bucket_name = 'glacier-demo-bucket-2024'
    # Specify your Glacier vault name
    glacier_vault_name = 'rdsarchive'
    # Specify the prefix
    prefix = 'RDS_CSV_DATA/dms_db/employees/'

    # Get the list of all objects in the bucket with the specified prefix
    paginator = s3.get_paginator('list_objects_v2')
    pages = paginator.paginate(Bucket=bucket_name, Prefix=prefix)

    for page in pages:
        # Use .get() so a page with no 'Contents' key doesn't raise a KeyError
        for obj in page.get('Contents', []):
            object_key = obj['Key']
            # Get the object from S3
            response = s3.get_object(Bucket=bucket_name, Key=object_key)
            data = response['Body'].read()
            # Check if data is not empty
            if data:
                # Store the data into the Glacier vault
                glacier.upload_archive(vaultName=glacier_vault_name, body=data)
            else:
                print(f"No data found in object {object_key}. Skipping...")

    return {
        'statusCode': 200,
        'body': f"All non-empty data from {bucket_name} has been stored in {glacier_vault_name} successfully."
    }
8. After making the changes in the script, click Deploy.
9. Then test the script: click the Test button → name the test event Test-event → in the Event JSON editor, delete the existing content and type { } only → Save.
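You can also invoke the deployed function from the CLI with the same empty event, which writes the handler's return value to a local file:

```shell
aws lambda invoke \
  --function-name S3-To-Glacier \
  --payload '{}' \
  --cli-binary-format raw-in-base64-out \
  response.json
cat response.json
```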
## 11. Trigger Events with AWS Lambda
1. Click Add trigger.
2. Trigger configuration:
- Choose S3 as the trigger.
- Select our bucket from the dropdown menu.
- Event types: All object create events
3. Give the prefix as the folder name RDS_CSV_DATA, check the acknowledgement box, and click Add.
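Behind the scenes, the console performs two steps that a CLI sketch would do explicitly: grant S3 permission to invoke the function, then attach the bucket notification (the function ARN below is a placeholder):

```shell
# Allow S3 to invoke the function
aws lambda add-permission \
  --function-name S3-To-Glacier \
  --statement-id s3-invoke \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::glacier-demo-bucket-2024

# Wire object-created events under the prefix to the function
aws s3api put-bucket-notification-configuration \
  --bucket glacier-demo-bucket-2024 \
  --notification-configuration '{"LambdaFunctionConfigurations":[{"LambdaFunctionArn":"<function ARN>","Events":["s3:ObjectCreated:*"],"Filter":{"Key":{"FilterRules":[{"Name":"prefix","Value":"RDS_CSV_DATA/"}]}}}]}'
```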
Now all the tasks are complete and it's time to test: start the DMS task.
Once the load completes, a total of 4 rows are stored in the S3 bucket.
If you browse to this path in the S3 bucket, you will see the .csv file: glacier-demo-bucket-2024 → RDS_CSV_DATA/ → dms_db → employees → LOAD00000001.csv
Go to the Lambda function's page and you will see the message "Last fetched now", which means our Lambda function was triggered and stored the data into the S3 Glacier vault.
Now our data has been stored in the S3 Glacier vault. It might take 12–24 hours for the stored data to appear in the vault's inventory.
You can't see S3 Glacier vault data directly; you have to use the AWS CLI, an AWS SDK, or third-party software. In my case, I used FastGlacier.
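With the AWS CLI, retrieval is asynchronous: you initiate a job, wait for it to complete (typically several hours), and then download its output. A minimal sketch for listing the vault's archives:

```shell
# Start an inventory-retrieval job (lists the archives in the vault)
aws glacier initiate-job --account-id - --vault-name rdsarchive \
  --job-parameters '{"Type":"inventory-retrieval"}'

# Later, check the job status and download its output
aws glacier describe-job --account-id - --vault-name rdsarchive --job-id <job id>
aws glacier get-job-output --account-id - --vault-name rdsarchive \
  --job-id <job id> inventory.json
```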
## 12. Clean Up All Resources
Deleting DMS Resources
Make sure to delete the migration tasks before deleting the replication instance and endpoints.
🚫 To delete a migration task using the console:
- On the AWS DMS console navigation pane, choose Database migration tasks.
- Choose dms-task-rds-s3.
- Choose Actions, Delete.
🚫 To delete a replication instance using the console:
- On the AWS DMS console navigation pane, choose Replication instances.
- Choose dms-rc.
- Choose Actions, Delete.
AWS DMS deletes the replication instance within 5 minutes and removes it from the Replication instances page.
🚫 To remove endpoints using the console:
- On the AWS DMS console navigation pane, choose Endpoints.
- Choose the endpoints one by one.
- Choose Actions, Delete.
After you delete your AWS DMS resources, also make sure to delete the following resources.
1. Your RDS database:
- On the Amazon RDS console navigation pane, choose Databases.
- Choose dms-instance.
- Choose Actions, Delete.
- Uncheck Create final snapshot.
- Uncheck Retain automated backups.
- Check the acknowledgement box.
- Type delete me in the confirmation box and click Delete.
2. Your S3 bucket:
- First delete all data from the bucket: select the bucket and click Empty.
- Type permanently delete, click Empty, then Exit.
- Select the bucket again and click Delete.
- Copy and paste your bucket name to confirm.
- Finally, click Delete bucket.
3. Your Lambda function:
- Select the Lambda function → Actions menu → Delete.
- In the pop-up, type delete and then click Delete.
4. Your S3 Glacier vault:
- Select the vault → Delete.
- In the pop-up, click Delete.
- Note: a vault must be empty before it can be deleted; if archives remain, delete them first (which requires an inventory job to list their IDs).
5. Your MySQL client instance:
- Select the instance → Instance state → Terminate instance.
6. Your IAM role:
- Search for the role in the search box → select the role → click Delete.
- Copy and paste the IAM role name S3AndGlacierFullAccessRole in the confirmation box.
- Click Delete.
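For completeness, here is a rough CLI teardown in the same order (ARNs and IDs are placeholders; the commands assume the tasks have stopped and the vault is empty):

```shell
aws dms delete-replication-task --replication-task-arn <task ARN>
aws dms delete-endpoint --endpoint-arn <endpoint ARN>            # repeat per endpoint
aws dms delete-replication-instance --replication-instance-arn <instance ARN>
aws rds delete-db-instance --db-instance-identifier dms-instance \
  --skip-final-snapshot --delete-automated-backups
aws s3 rb s3://glacier-demo-bucket-2024 --force                  # empties, then deletes
aws lambda delete-function --function-name S3-To-Glacier
aws glacier delete-vault --account-id - --vault-name rdsarchive  # vault must be empty
aws ec2 terminate-instances --instance-ids <instance id>
aws iam detach-role-policy --role-name S3AndGlacierFullAccessRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam detach-role-policy --role-name S3AndGlacierFullAccessRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonGlacierFullAccess
aws iam delete-role --role-name S3AndGlacierFullAccessRole
```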