Introduction

Amazon Aurora PostgreSQL is a managed relational database service that combines the performance and availability of commercial databases with the simplicity and cost-effectiveness of open-source databases. Built on a fully distributed, fault-tolerant, and self-healing storage system, Aurora PostgreSQL is ideal for high-performance applications.

Monitoring the health of Aurora PostgreSQL instances is crucial for maintaining optimal performance, ensuring reliability, and catching potential issues before they impact your applications. However, monitoring a large deployment of Aurora PostgreSQL instances poses significant challenges, particularly when resources are limited. The sheer volume of data and the complexity of managing thousands of instances make manual health checks impractical for a small team of database administrators (DBAs). Administrators need a comprehensive view of the overall health of the fleet that shows how many instances are healthy and how many are not. They also need to collect performance data over extended periods to identify persistent issues. What is needed is an automated, scalable monitoring solution that lets them focus on the specific issues affecting unhealthy instances.

In this project, we automated and streamlined the monitoring tasks for large-scale deployments by leveraging AWS managed services for scalability and efficiency. Users can define their own standards for what constitutes an “unhealthy” instance and configure the observation period to establish historical trends. Summary reports and visuals provide an overall view of the status of the entire fleet. Because the performance data for potentially “unhealthy” instances is logged into our system tables, users can focus on a smaller subset of instances, dive into the details of the findings, and use the built-in troubleshooting guide as a reference. Together, these features make troubleshooting easier and significantly reduce the overall monitoring effort for customers with large fleets of databases.

Project Outcome

Main View

Detailed View

Solution Architecture

Deployment

Prerequisites

An AWS Account.
An IAM user with permissions to deploy AWS CloudFormation templates, install the AWS CDK and AWS CLI, and set up Amazon RDS.
A fleet of Aurora PostgreSQL clusters with each instance configured with Aurora PostgreSQL logs published to Amazon CloudWatch Logs. The project outcome visuals are derived from a setup to monitor 7 Aurora PostgreSQL clusters.
You also need to define which metrics to monitor. You can monitor both cluster-level and instance-level metrics listed in the documentation, based on your observability requirements. For demonstration purposes, we chose the following three instance-level metrics:

• BufferCacheHitRatio (%)

• CPUUtilization (%)

• FreeableMemory (bytes)

In the solution, you can define the values of the measures below. An instance is reported as “unhealthy” only when all the conditions are met.

• Metric name: the instance metric to monitor

• Threshold: the specific value the metric is compared against

• Metric interval: the duration of time over which a metric data point is collected and aggregated, for example one hour

• Statistics: the type of statistic computed for the metric, for example Minimum
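To illustrate how these four measures could map onto a CloudWatch query, here is a minimal sketch. It is not the project's actual implementation. It assumes Python, the `AWS/RDS` namespace, and the `DBInstanceIdentifier` dimension; the names `MetricRule` and `build_metric_query` are hypothetical. The threshold is not part of the query itself; it would be compared against the returned statistic afterwards.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MetricRule:
    """One user-defined health measure (hypothetical structure)."""
    metric_name: str       # e.g. "CPUUtilization"
    threshold: float       # value the statistic is compared against later
    interval_seconds: int  # metric interval, e.g. 3600 for one hour
    statistic: str         # e.g. "Minimum", "Maximum", "Average"


def build_metric_query(instance_id: str, rule: MetricRule, end: datetime) -> dict:
    """Build keyword arguments for CloudWatch GetMetricStatistics.

    The query covers exactly one metric interval ending at `end`.
    """
    return {
        "Namespace": "AWS/RDS",
        "MetricName": rule.metric_name,
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "StartTime": end - timedelta(seconds=rule.interval_seconds),
        "EndTime": end,
        "Period": rule.interval_seconds,
        "Statistics": [rule.statistic],
    }


# Usage (needs boto3 and AWS credentials; "my-instance" is a placeholder):
#   import boto3
#   rule = MetricRule("CPUUtilization", 80.0, 3600, "Maximum")
#   query = build_metric_query("my-instance", rule, datetime.now(timezone.utc))
#   points = boto3.client("cloudwatch").get_metric_statistics(**query)["Datapoints"]
```

Keeping the query construction separate from the AWS call makes the mapping easy to inspect without credentials.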

In this project, the default metric setup for checking the health of a database instance is BufferCacheHitRatio at a minimum of 90%, CPUUtilization at a maximum of 80%, and FreeableMemory at a minimum of 2 GB.
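The default thresholds can be expressed as a small rule table. The following is a sketch under stated assumptions, not the project's code: the names `DEFAULT_RULES` and `breached_rules` are hypothetical, and "2 GB" is assumed to mean 2 GiB expressed in bytes, since FreeableMemory is reported in bytes.

```python
GIB = 1024 ** 3  # assumption: "2 GB" is treated as 2 GiB in bytes

# (metric name, bound kind, threshold) taken from the defaults described above.
DEFAULT_RULES = [
    ("BufferCacheHitRatio", "min", 90.0),  # percent; should not drop below
    ("CPUUtilization", "max", 80.0),       # percent; should not rise above
    ("FreeableMemory", "min", 2 * GIB),    # bytes; should not drop below
]


def breached_rules(observed: dict, rules=DEFAULT_RULES) -> list:
    """Return the names of metrics whose observed statistic violates its bound.

    Metrics missing from `observed` are skipped rather than treated as breaches.
    """
    breached = []
    for name, kind, threshold in rules:
        if name not in observed:
            continue
        value = observed[name]
        below_min = kind == "min" and value < threshold
        above_max = kind == "max" and value > threshold
        if below_min or above_max:
            breached.append(name)
    return breached


observed = {
    "BufferCacheHitRatio": 85.0,
    "CPUUtilization": 45.0,
    "FreeableMemory": 4 * GIB,
}
print(breached_rules(observed))  # ['BufferCacheHitRatio']
```

The sketch only lists which thresholds are breached; how breaches are combined into a final “unhealthy” verdict is not modeled here.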

Deploy

Clone this repository and install the component dependencies using the following commands:

git clone https://github.com/aws-samples/monitoring-aurora-postgresql-health-large-scale-deployments.git
cd monitoring-aurora-postgresql-health-large-scale-deployments
cd web
npm install
cd ../server
npm install

Deploy using the following commands. If you are already in the server directory after installing dependencies, skip the cd:

cd server
sourceIp=PUBLIC_IP_GOES_HERE npm run deploy

Replace the PUBLIC_IP_GOES_HERE value with the public IP address of your machine. This ensures that the API Gateway endpoint is accessible only from your machine and is not publicly accessible. You can retrieve your public IP address using a tool like whatsmyip.
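Before passing a value as sourceIp, it can help to confirm that it really is a public address rather than a private or malformed one. The following is a minimal sketch using Python's standard `ipaddress` module. The helper name `is_public_ip` is hypothetical, and whether the deploy script performs any validation of its own is not stated here.

```python
import ipaddress


def is_public_ip(value: str) -> bool:
    """Return True if `value` parses as a globally routable IP address."""
    try:
        return ipaddress.ip_address(value.strip()).is_global
    except ValueError:
        # Not a valid IPv4 or IPv6 address, e.g. an unreplaced placeholder.
        return False


print(is_public_ip("8.8.8.8"))              # True
print(is_public_ip("10.0.0.5"))             # False (private range)
print(is_public_ip("PUBLIC_IP_GOES_HERE"))  # False (placeholder left in)
```

A private address such as 10.x.x.x or 192.168.x.x is usually a sign that the machine's local address was used instead of the address seen from the internet.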

After the deployment succeeds, use the URL from the output to launch the application. Note that by default the solution captures database metrics every hour and monitors Aurora PostgreSQL database instances only.

Cleanup

You can clean up using this command:

cd server
cdk destroy --all
