This project automates setup of Cost and Usage Reports (CUR) in a billing account with an Athena table enabling querying of the latest data for each month. This is an alternative to the pre configured athena setup provided by the Billing Team and is more suited to customers with 10s or 100s of CUR files provided in each report.
In summary this project deploys:
- Creates the CUR in the billing service and the bucket which receives the reports.
- Configures a Glue Database and Table for use by Athena.
- Deploys a lambda to manage the partitions using eventbridge S3 events.
Once deployed all you need to do is wait till AWS pushes the first report to the solution, this can take up to 8 hours in my experience, then you should be able to log into Athena and start querying the data.
The goal of this project is to provide a consistent view of cost and usage reports (CUR) at all times in a Athena table. To do this we use hive symlinks, which are updated each time a new snapshot arrives providing an atomic single file update for new CURs. This is different to the status table provided by AWS Billing team in the pre configured Athena setup.
When you enable the option to keep all versions of the CUR, AWS will upload a new snapshot then once complete update the manifest containing a list of file paths for that billing period. We use an s3 file create event to trigger reading of that manifest and creation of a symlink in the hive directory we maintain in the same bucket. This provides Athena with a partitioned structure to query without worrying about CUR files being updated while it is reading them.
- An AWS account.
- AWS SAM CLI.
- Exported environment variables for
AWS_DEFAULT_REGION
,AWS_REGION
andAWS_PROFILE
, I use direnv to maintain these variables in a.envrc
file.
First you will need to deploy the bucket we use to store lambda and CFN artifacts.
make deploy-bucket
Deploy the solution.
make deploy
This will deploy the following components:
- Setup a bucket for the CUR in the region you configured via
AWS_DEFAULT_REGION
. - Create a CUR in the billing service.
- Deploy the Glue database and table used by Athena.
- Deploy the Athena workspace with an encrypted secure S3 bucket for artifacts.
- Deploy the template which manages hive symlinks and partitions in Athena when new reports arrive.
The Cost Allocation Tags in billing allows you to record data which is included in the CUR. This is a great resource for attributing cost to a user, role or service, or alternatively a cloudformation stack.
I enable the following AWS tags for collection and inclusion in the CUR.
aws:cloudformation:stack-name
aws:createdBy
I also enable some of my own custom tags for collection and inclusion in the CUR.
application
component
branch
environment
You can see how these are added in the Makefile
when the stacks are launched, the tags assigned to the cloudformation stacks are mostly propagated to the resources which they create.
There are some great resources with queries which provide insights from your CUR data, one of the best is Level 300: AWS CUR Query Library from the The Well-Architected Labs website.
This project is released under Apache 2.0 license and is copyright Mark Wolfe.