This solution is a reference implementation for the AWS Kinesis service and its streaming-data capabilities. The primary objective is to show how to ingest large amounts of real-time data into a serverless AWS cloud application and store it in a database. The solution can be extended with AWS services such as Elasticsearch, Comprehend, or SageMaker to further process the real-time data. It is based on a sample solution proposed by AWS.
- A Python script that uses the Twitter API to retrieve tweets in real time and send them to Kinesis.
- A CloudFormation template that contains the following:
  - A Kinesis data stream to ingest tweets in real time.
  - A Lambda function that reads records from Kinesis and stores them in a DynamoDB table.
  - A DynamoDB table to store the data, which can later be used for analytics.
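The Lambda function's role in this pipeline can be sketched as below. This is an illustrative outline, not the actual code shipped in the solution's zip file; the table name `my-stack-EventData` is a hypothetical example of the `<stack-name>-EventData` naming rule, and the item attributes are assumptions:

```python
import base64
import json


def parse_records(event):
    """Decode the base64-encoded Kinesis records carried in a Lambda event."""
    tweets = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        tweets.append(json.loads(payload))
    return tweets


def handler(event, context):
    """Store each decoded tweet in the DynamoDB table created by the stack."""
    import boto3  # imported lazily so parse_records has no AWS dependency

    # Hypothetical table name; the stack actually creates <stack-name>-EventData.
    table = boto3.resource("dynamodb").Table("my-stack-EventData")
    tweets = parse_records(event)
    for tweet in tweets:
        table.put_item(Item={"id": str(tweet["id"]), "text": tweet.get("text", "")})
    return {"stored": len(tweets)}
```

Kinesis delivers record payloads base64-encoded inside the Lambda event, which is why the handler must decode before parsing the JSON.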
The solution includes a Python script that generates a stream of tweets in real time. The tweets are filtered by a user-specified keyword and then sent to an AWS Kinesis data stream.
```
python twitter-client.py <filter keyword>
```

For example:

```
python twitter-client.py pizza
```
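The Kinesis side of such a producer script can be sketched as follows. This is a hypothetical outline of the technique rather than the actual `twitter-client.py`; the function names `build_record` and `send_tweet` are illustrative, and the Twitter streaming part is omitted:

```python
import json
import os


def build_record(tweet, stream_name):
    """Build keyword arguments for kinesis.put_record() from a tweet dict.

    The tweet id serves as the partition key, so the same tweet always
    maps to the same shard.
    """
    return {
        "StreamName": stream_name,
        "Data": json.dumps(tweet).encode("utf-8"),
        "PartitionKey": str(tweet["id"]),
    }


def send_tweet(tweet):
    """Send one tweet to the stream named by the AWS_STREAM_NAME variable."""
    import boto3  # imported lazily so build_record can be used standalone

    kinesis = boto3.client("kinesis", region_name=os.environ["AWS_REGION"])
    kinesis.put_record(**build_record(tweet, os.environ["AWS_STREAM_NAME"]))
```

Each tweet is serialized to JSON and written as one Kinesis record; the Lambda consumer on the other end receives these records in batches.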
The CloudFormation template requires the following input parameters:
- LambdaS3BucketName: S3 bucket name where Lambda code resides
- LambdaZipfileName: Lambda code zipfile name (default: index.zip)
- LambdaHandler: Lambda code handler name (default: index.handler)
The Python script that generates the data stream requires the following environment variables to be set:
- TWITTER_API_CONSUMER_KEY
- TWITTER_API_CONSUMER_SECRET
- TWITTER_API_TOKEN_KEY
- TWITTER_API_TOKEN_SECRET
- AWS_ACCESS_KEY_ID
- AWS_SECRET_KEY
- AWS_REGION
- AWS_STREAM_NAME
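Because a missing variable only surfaces as an error once the script is already running, it can help to validate the environment up front. A minimal sketch (the `missing_vars` helper is an assumption, not part of the solution's script):

```python
import os

# The environment variables required by the producer script.
REQUIRED_VARS = [
    "TWITTER_API_CONSUMER_KEY",
    "TWITTER_API_CONSUMER_SECRET",
    "TWITTER_API_TOKEN_KEY",
    "TWITTER_API_TOKEN_SECRET",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_KEY",
    "AWS_REGION",
    "AWS_STREAM_NAME",
]


def missing_vars(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

Calling `missing_vars()` at startup and aborting with a clear message if the list is non-empty avoids confusing failures deep inside the Twitter or Kinesis calls.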
- To invoke the Twitter APIs and read tweets, you must have a Twitter Developer account.
- Deploy the CloudFormation template in AWS
- Set the environment variables required for the python script
- Execute the python script by providing a filter keyword
- The real-time tweets will be sent to Kinesis and then stored in a DynamoDB table named <stack-name>-EventData
- Stop the python script
- Delete the CloudFormation stack
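Before deleting the stack, you can spot-check that tweets actually reached DynamoDB. A minimal sketch, assuming default AWS credentials are configured; `event_table_name` simply applies the `<stack-name>-EventData` naming rule described above, and `sample_items` is a hypothetical helper:

```python
def event_table_name(stack_name):
    """The stack names its DynamoDB table <stack-name>-EventData."""
    return stack_name + "-EventData"


def sample_items(stack_name, limit=5):
    """Read back a few stored tweets to confirm the pipeline worked."""
    import boto3  # imported lazily so event_table_name needs no AWS setup

    table = boto3.resource("dynamodb").Table(event_table_name(stack_name))
    return table.scan(Limit=limit).get("Items", [])
```

A `scan` with a small `Limit` is enough for a smoke test; for real analytics workloads you would query by key instead of scanning.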