
Add limits to get records #87

Open
wants to merge 12 commits into master

Conversation

kailashhd

No description provided.

@kailashhd
Author

Adding a test for this now. Early changes to check if the approach looks good.

@kailashhd
Author

@mhart Request for a review.

@mhart
Owner

mhart commented Mar 5, 2019

At first glance it looks... a bit odd? Like, sumOfBytes is just going to increase each time you call the function – so if you make two requests with 3MB then the second one will go over. Also, I'm not sure what mathjs is for? And in general I prefer to keep the number of dependencies down – so would prefer not to depend on object-sizeof either.

Also, it would be good to have a test that is one byte under 5MB and one that is just one byte over – so both paths can be tested.

@kailashhd
Author

Thank you very much for the response.
I end up resetting sumOfBytes towards the end of the readStream, so after the getRecords call returns, sumOfBytes is reset to zero.

Yes, I agree it is good to have limited dependencies. But I could not find a good way to calculate the size of an object without using object-sizeof (Node.js noob here). The mathjs dependency is pulled in for the pow function (I can remove that).

I will add the tests and will request for a review soon. Thanks a lot for your time reviewing this.

@mhart
Owner

mhart commented Mar 6, 2019

Ah, I see – yeah, JavaScript has a built-in global Math object, so Math.pow is available without needing extra libraries: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/pow

In any case, I prefer 5 * 1024 * 1024, I think it's easier to read.

Leave object-sizeof in there for now, I'll have a look at the PR when you're done with it and see if it's worth including, or whether a simple function would do.

In terms of your sumOfBytes comment – if two read streams are occurring at once, then it won't be correct. (JS is single-threaded, but streams are asynchronous and chunked, so multiple streams could be active at once). Best to just have sumOfBytes as a bound variable in the function.

If you address the tests, I can clean up the JS code as best I can.
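
The bound-variable suggestion above might look something like this sketch (illustrative only, not the actual kinesalite code; the function name, the 5 MiB constant from this thread, and the use of `Buffer.byteLength` in place of object-sizeof are all assumptions):

```javascript
// Illustrative sketch, not kinesalite's implementation.
// Keeping sumOfBytes local to each call means concurrent read
// streams cannot corrupt each other's running total.
var MAX_BYTES = 5 * 1024 * 1024 // same value as Math.pow(2, 20) * 5, but easier to read

function limitRecords(allRecords) {
  var sumOfBytes = 0 // bound per invocation, so it "resets" automatically
  var records = []
  for (var i = 0; i < allRecords.length; i++) {
    // Buffer.byteLength gives the decoded size of a base64 Data payload,
    // which avoids pulling in the object-sizeof dependency
    var size = Buffer.byteLength(allRecords[i].Data, 'base64')
    if (sumOfBytes + size > MAX_BYTES) break
    sumOfBytes += size
    records.push(allRecords[i])
  }
  return records
}
```

Because the counter is scoped to the call, invoking the function twice with the same input yields the same result, which is exactly what module-level state would break.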

@kailashhd
Author

kailashhd commented Mar 9, 2019

I added the tests as requested. I was also able to remove both the object-sizeof and mathjs dependencies. I am not sure if I am handling all the possible exceptions correctly. Do let me know if I can improve that.

Also, do I need to keep the package-lock consistent with the previous version? (I may be using newer libraries and ended up creating a new package-lock file.)

Also, would it make sense for me to squash everything into a single commit?

@kailashhd
Author

@mhart Apologies for the delay in addressing code review comments. Request for review.

@kailashhd
Author

@mhart Gentle reminder.

(Review comment on actions/getRecords.js — outdated, resolved)
@kailashhd
Author

@mhart Gentle reminder.

@mhart
Owner

mhart commented Mar 24, 2019

@kailashhd I'm struggling to reproduce this on production – I'm just reading over the documentation now at https://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetRecords.html and it states "each shard can read up to 2 MiB per second" and "The maximum size of data that GetRecords can return is 10 MiB"

So I'm not sure where the 5MiB figure came from? When I run your test (on production) it only retrieves one record, not five – I'm guessing that's because it's running into the 2MiB per second limit.

@kailashhd
Author

@mhart Apologies. I must have used the limits from PutRecords: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_PutRecords.html instead of the value from GetRecords. That's where the 5 MB figure comes from.

So my tests should be returning 5 records (at least that's what I am asserting) in a single response. It should not hit the 2 MiB per second limit because PutRecords ensures that each record is a maximum of 1 MB. (And in LevelDB with kinesalite we don't change the records at all.)

That's also why I have included another test with a total of 15 records, which are returned over 3 different responses.
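
The expected paging behaviour described in this comment could be sketched as a simplified model (the function name is illustrative, and it assumes each record is exactly 1 MiB with the 5 MiB cap under discussion):

```javascript
// Simplified model of splitting records into size-limited responses,
// the way successive GetRecords calls would page through a shard.
function splitIntoResponses(recordSizes, maxBytes) {
  var responses = []
  var current = []
  var bytes = 0
  for (var i = 0; i < recordSizes.length; i++) {
    // start a new response once the next record would exceed the cap
    if (current.length && bytes + recordSizes[i] > maxBytes) {
      responses.push(current)
      current = []
      bytes = 0
    }
    current.push(recordSizes[i])
    bytes += recordSizes[i]
  }
  if (current.length) responses.push(current)
  return responses
}
```

Under these assumptions, fifteen 1 MiB records against a 5 MiB cap split into three responses of five records each, matching the test expectation.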

@mhart
Owner

mhart commented Mar 25, 2019

When you put those records into stream on Kinesis (I mean, on a real AWS Kinesis stream) – and then retrieve them with GetRecords – you're saying you get all five back? I only get one back.

@kailashhd
Author

This was my test:

1. Created a bunch of records of size ~1 MB.
2. Used the following AWS CLI command to get a shard iterator: `aws kinesis get-shard-iterator --stream-name <test-stream> --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON`
3. Got records: `aws kinesis get-records --shard-iterator <shard-iterator>`

In this case I got all 5 records in one go.

@mhart
Owner

mhart commented Mar 25, 2019

You need to try with `--no-paginate`, otherwise the AWS CLI will automatically keep making GetRecords calls until NextShardIterator is empty.

@kailashhd
Author

Strangely, even with `--no-paginate`, I am still able to get all 5 records in one go. I also produced 5 additional records (a total of 10 records) and did a get-records. Still got only 5 records (as I was close to the 10 MiB mark with 5 records).

My aws version: aws-cli/1.16.70 Python/3.7.1 Darwin/17.7.0 botocore/1.12.60
I wonder if we are using different versions.

@mhart
Owner

mhart commented Mar 26, 2019

Hmmm, strange – maybe it's an issue with the PutRecords call in your test then? Maybe it's only putting one record? I'll double check tomorrow.

@mhart
Owner

mhart commented Mar 26, 2019

Ah yeah, that's the issue. Here's the return from the PutRecords call:

[ { SequenceNumber: '49594189147623985939251953501666965394405037144469929986',
    ShardId: 'shardId-000000000000' },
  { ErrorCode: 'ProvisionedThroughputExceededException',
    ErrorMessage:
     'Rate exceeded for shard shardId-000000000000 in stream __kinesalite_test_415476646818213 under account XXXXXXXXXXX.' },
  { ErrorCode: 'ProvisionedThroughputExceededException',
    ErrorMessage:
     'Rate exceeded for shard shardId-000000000000 in stream __kinesalite_test_415476646818213 under account XXXXXXXXXXX.' },
  { ErrorCode: 'ProvisionedThroughputExceededException',
    ErrorMessage:
     'Rate exceeded for shard shardId-000000000000 in stream __kinesalite_test_415476646818213 under account XXXXXXXXXXX.' },
  { ErrorCode: 'ProvisionedThroughputExceededException',
    ErrorMessage:
     'Rate exceeded for shard shardId-000000000000 in stream __kinesalite_test_415476646818213 under account XXXXXXXXXXX.' } ]
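
The pattern in that output (one success followed by four throttles) is consistent with the 1 MiB per second per-shard write limit. A rough, purely illustrative simulation of that budget (not how Kinesis or kinesalite actually meter throughput):

```javascript
// Rough model of a per-shard write budget: records are accepted until
// the per-second byte budget is exhausted, then throttled.
function simulatePutRecords(recordSizes, budgetBytes) {
  var remaining = budgetBytes
  return recordSizes.map(function (size) {
    if (size <= remaining) {
      remaining -= size
      return { ok: true }
    }
    return { ok: false, errorCode: 'ProvisionedThroughputExceededException' }
  })
}
```

Five ~1 MiB records against a 1 MiB/sec budget leave only the first accepted, matching the shape of the PutRecords response above.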

@kailashhd
Author

Ha, I did not realize that was the issue. The max 5 MB request size for PutRecords defined here: https://docs.aws.amazon.com/cli/latest/reference/kinesis/put-records.html still applies on top of the per-shard limit of 1 MB per second. Since we are testing with 1 shard, this is causing problems. Let me try to change the test to do sequential PutRecord calls instead of PutRecords tomorrow. Will also change the limit to less than 10 MiB instead of 5 MB.
