This solution is a reference implementation for the AWS Kinesis service and its streaming-data capabilities. The primary objective is to show how to ingest large amounts of real-time data into a serverless AWS cloud application and store it in a database. The solution can be extended with AWS services such as Elasticsearch, Comprehend, or SageMaker to further process the real-time data. It is based on a sample solution proposed by AWS.
- A Python script that uses the Twitter API to retrieve tweets in real time and send them to Kinesis.
- A CloudFormation template that contains the following:
  - A Kinesis data stream to ingest tweets in real time.
  - A Lambda function that reads records from Kinesis and stores them in a DynamoDB table.
  - A DynamoDB table to store the data, which can later be used for analytics.
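The Lambda function's role in this pipeline can be sketched as below. This is an illustrative outline, not the actual code shipped in the solution's zip file; the table name `my-stack-EventData` is a hypothetical example of the `<stack-name>-EventData` naming rule, and the item attributes are assumptions:

```python
import base64
import json


def parse_records(event):
    """Decode the base64-encoded Kinesis records carried in a Lambda event."""
    tweets = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        tweets.append(json.loads(payload))
    return tweets


def handler(event, context):
    """Store each decoded tweet in the DynamoDB table created by the stack."""
    import boto3  # imported lazily so parse_records has no AWS dependency

    # Hypothetical table name; the stack actually creates <stack-name>-EventData.
    table = boto3.resource("dynamodb").Table("my-stack-EventData")
    tweets = parse_records(event)
    for tweet in tweets:
        table.put_item(Item={"id": str(tweet["id"]), "text": tweet.get("text", "")})
    return {"stored": len(tweets)}
```

Kinesis delivers record payloads base64-encoded inside the Lambda event, which is why the handler must decode before parsing the JSON.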
The solution includes a Python script that generates a stream of tweets in real time. The tweets are filtered by a user-specified keyword and then sent to an AWS Kinesis data stream.
```
python twitter-client.py <filter keyword>
```

For example:

```
python twitter-client.py pizza
```
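The Kinesis side of such a producer script can be sketched as follows. This is a hypothetical outline of the technique rather than the actual `twitter-client.py`; the function names `build_record` and `send_tweet` are illustrative, and the Twitter streaming part is omitted:

```python
import json
import os


def build_record(tweet, stream_name):
    """Build keyword arguments for kinesis.put_record() from a tweet dict.

    The tweet id serves as the partition key, so the same tweet always
    maps to the same shard.
    """
    return {
        "StreamName": stream_name,
        "Data": json.dumps(tweet).encode("utf-8"),
        "PartitionKey": str(tweet["id"]),
    }


def send_tweet(tweet):
    """Send one tweet to the stream named by the AWS_STREAM_NAME variable."""
    import boto3  # imported lazily so build_record can be used standalone

    kinesis = boto3.client("kinesis", region_name=os.environ["AWS_REGION"])
    kinesis.put_record(**build_record(tweet, os.environ["AWS_STREAM_NAME"]))
```

Each tweet is serialized to JSON and written as one Kinesis record; the Lambda consumer on the other end receives these records in batches.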
The CloudFormation template requires the following input parameters:
- LambdaS3BucketName: S3 bucket name where Lambda code resides
- LambdaZipfileName: Lambda code zipfile name (default: index.zip)
- LambdaHandler: Lambda code handler name (default: index.handler)
The Python script that generates the data stream requires the following environment variables to be set:
- TWITTER_API_CONSUMER_KEY
- TWITTER_API_CONSUMER_SECRET
- TWITTER_API_TOKEN_KEY
- TWITTER_API_TOKEN_SECRET
- AWS_ACCESS_KEY_ID
- AWS_SECRET_KEY
- AWS_REGION
- AWS_STREAM_NAME
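Because a missing variable only surfaces as an error once the script is already running, it can help to validate the environment up front. A minimal sketch (the `missing_vars` helper is an assumption, not part of the solution's script):

```python
import os

# The environment variables required by the producer script.
REQUIRED_VARS = [
    "TWITTER_API_CONSUMER_KEY",
    "TWITTER_API_CONSUMER_SECRET",
    "TWITTER_API_TOKEN_KEY",
    "TWITTER_API_TOKEN_SECRET",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_KEY",
    "AWS_REGION",
    "AWS_STREAM_NAME",
]


def missing_vars(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

Calling `missing_vars()` at startup and aborting with a clear message if the list is non-empty avoids confusing failures deep inside the Twitter or Kinesis calls.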
- To invoke the Twitter APIs and read tweets, you must have a Twitter Developer account.
- Deploy the CloudFormation template in AWS
- Set the environment variables required for the python script
- Execute the python script by providing a filter keyword
- The real-time tweets will be sent to Kinesis and then stored in a DynamoDB table named <stack-name>-EventData
- Stop the python script
- Delete the CloudFormation stack
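Before deleting the stack, you can spot-check that tweets actually reached DynamoDB. A minimal sketch, assuming default AWS credentials are configured; `event_table_name` simply applies the `<stack-name>-EventData` naming rule described above, and `sample_items` is a hypothetical helper:

```python
def event_table_name(stack_name):
    """The stack names its DynamoDB table <stack-name>-EventData."""
    return stack_name + "-EventData"


def sample_items(stack_name, limit=5):
    """Read back a few stored tweets to confirm the pipeline worked."""
    import boto3  # imported lazily so event_table_name needs no AWS setup

    table = boto3.resource("dynamodb").Table(event_table_name(stack_name))
    return table.scan(Limit=limit).get("Items", [])
```

A `scan` with a small `Limit` is enough for a smoke test; for real analytics workloads you would query by key instead of scanning.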