Add S3Uploader to Flow Aggregator #4143
Conversation
Codecov Report
@@ Coverage Diff @@
## main #4143 +/- ##
==========================================
- Coverage 66.31% 60.06% -6.25%
==========================================
Files 304 383 +79
Lines 46613 54256 +7643
==========================================
+ Hits 30911 32589 +1678
- Misses 13296 19222 +5926
- Partials 2406 2445 +39
Force-pushed from 5c03a61 to 80a6123.
I found an issue, and I haven't worked out a solution yet. The .gz file generated by the latest commit cannot be opened by Mac's Archive Utility app, but it can be opened from the command line. Compared with the first commit, the latest commit writes records to the buffer one by one.
@heanlan I didn't take an in-depth look since you still need to make changes to the code, but maybe you need to call `Reset` on the gzip writer between uploads. Note that I think it's simpler to just allocate a new gzip writer each time.
Thanks. I called it.
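For context, here is a minimal sketch of the "allocate a new gzip writer each time" approach mentioned above, assuming the CSV data for one upload is already in a `bytes.Buffer` (the names are illustrative, not the PR's actual code). With Go's compress/gzip, `Close` must be called so the gzip footer is written and the stream is complete:

```go
package example

import (
	"bytes"
	"compress/gzip"
)

// compressBuffer gzips the contents of buf using a freshly allocated
// gzip.Writer. Close (not just Flush) writes the gzip footer so the
// output is a complete, self-contained stream.
func compressBuffer(buf *bytes.Buffer) (*bytes.Buffer, error) {
	var compressed bytes.Buffer
	gw := gzip.NewWriter(&compressed)
	if _, err := gw.Write(buf.Bytes()); err != nil {
		return nil, err
	}
	if err := gw.Close(); err != nil {
		return nil, err
	}
	return &compressed, nil
}
```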
@heanlan
we probably need 2 queues: Q1 for buffers that are ready to be uploaded (it should typically contain at most 1 buffer), and Q2 for buffers whose upload failed and should be retried later.
we need a single lock to protect currentBuffer and Q1. then in pseudo-code:

func CacheRecord(r) {
    lock()
    defer unlock()
    writeToCurrentBuffer(r)
    if currentBufferIsFull() {
        addCurrentBufferToQ1()
        resetCurrentBuffer()
    }
}

func Upload() {
    lock()
    newBuffers = getAllBuffersFromQ1()
    unlock()
    sendAllBuffers() // newBuffers + Q2 buffers
    foreach failedBuffer {
        addBufferToQ2(failedBuffer) // enforce max length for Q2
    }
}

func TimerFunction() {
    lock()
    addCurrentBufferToQ1()
    resetCurrentBuffer()
    unlock()
    Upload()
}

let me know what you think. Sorry for some of my comments that were misleading. When reading from the buffers during upload, we can wrap the buffer's bytes in a reader: `reader := bytes.NewReader(buffer.Bytes())`. This doesn't make any copy.
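For reference, a minimal Go sketch of the two-queue design described above (type and function names are illustrative, not the actual Flow Aggregator implementation). Only currentBuffer and Q1 are protected by the lock; Q2 is touched only from the upload path, which is assumed to run in a single goroutine:

```go
package example

import (
	"bytes"
	"sync"
)

const (
	maxBufferSize = 1024 * 1024 // seal currentBuffer once it reaches this size (illustrative value)
	maxQ2Len      = 16          // cap on buffers kept after failed uploads (illustrative value)
)

type bufferedUploader struct {
	mu            sync.Mutex
	currentBuffer *bytes.Buffer
	q1            []*bytes.Buffer // full buffers waiting to be uploaded
	q2            []*bytes.Buffer // buffers whose upload failed, retried on the next round
	send          func(*bytes.Reader) error
}

func newBufferedUploader(send func(*bytes.Reader) error) *bufferedUploader {
	return &bufferedUploader{currentBuffer: &bytes.Buffer{}, send: send}
}

// CacheRecord appends a serialized record to currentBuffer and moves the
// buffer to Q1 once it is full. currentBuffer and Q1 are guarded by mu.
func (u *bufferedUploader) CacheRecord(record []byte) {
	u.mu.Lock()
	defer u.mu.Unlock()
	u.currentBuffer.Write(record)
	if u.currentBuffer.Len() >= maxBufferSize {
		u.q1 = append(u.q1, u.currentBuffer)
		u.currentBuffer = &bytes.Buffer{}
	}
}

// Upload drains Q1 under the lock, then sends the drained buffers plus any
// previously failed ones (Q2) without holding the lock. Buffers that fail
// again go back to Q2, which is trimmed to maxQ2Len.
func (u *bufferedUploader) Upload() {
	u.mu.Lock()
	newBuffers := u.q1
	u.q1 = nil
	u.mu.Unlock()

	pending := append(u.q2, newBuffers...)
	u.q2 = nil
	for _, buf := range pending {
		// bytes.NewReader wraps the buffer's existing backing array: no data copy.
		if err := u.send(bytes.NewReader(buf.Bytes())); err != nil {
			u.q2 = append(u.q2, buf)
		}
	}
	if len(u.q2) > maxQ2Len {
		u.q2 = u.q2[len(u.q2)-maxQ2Len:]
	}
}

// OnTimer seals the current buffer into Q1 when the export timer fires,
// then runs an upload round.
func (u *bufferedUploader) OnTimer() {
	u.mu.Lock()
	if u.currentBuffer.Len() > 0 {
		u.q1 = append(u.q1, u.currentBuffer)
		u.currentBuffer = &bytes.Buffer{}
	}
	u.mu.Unlock()
	u.Upload()
}
```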
Thanks @antoninbas. I have two questions:

1. Why will Q1 typically have <= 1 buffer? Is it because of how we set the upload timer interval?

2. Regarding this part of the design:

    foreach failedBuffer {
        addBufferToQ2(failedBuffer) // enforce max length for Q2
    }

   Is Q2 only needed to handle upload failures? And are we making actual copies of the CSV data when adding the failed buffers to Q2?
Yes, but we can also consider waking up the upload goroutine when we add a buffer to Q1, without waiting for the next timer to fire. This can be done with a signal channel for example.
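A minimal sketch of that idea, assuming a dedicated upload goroutine (names are illustrative): the notification channel has capacity 1 and the send is non-blocking, so the caller never waits and repeated notifications are coalesced.

```go
package example

import "time"

// notifyCh has capacity 1; a non-blocking send coalesces notifications so
// the code adding a buffer to Q1 never blocks on the upload goroutine.
var notifyCh = make(chan struct{}, 1)

// notifyUploader is called right after a buffer is added to Q1.
func notifyUploader() {
	select {
	case notifyCh <- struct{}{}:
	default: // a notification is already pending
	}
}

// uploadLoop wakes up either when a buffer is added to Q1 or when the
// periodic timer fires, whichever comes first.
func uploadLoop(interval time.Duration, upload func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-notifyCh:
		case <-ticker.C:
		}
		upload()
	}
}
```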
Yes, it is only to handle upload failures. If we didn't need to handle upload failures, we would be OK with a single queue.
No, we don't make any copy. We remove the buffers from Q1 and add them to Q2.
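To illustrate the no-copy point with a tiny sketch (names are illustrative): both queues hold `*bytes.Buffer` pointers, so moving a buffer from Q1 to Q2 only appends a pointer; the buffered CSV bytes are never duplicated.

```go
package example

import "bytes"

// moveFailed moves the buffer at index i from q1 to q2 after a failed
// upload. Only the pointer is appended; the underlying data is not copied.
func moveFailed(q1, q2 []*bytes.Buffer, i int) ([]*bytes.Buffer, []*bytes.Buffer) {
	failed := q1[i]
	q1 = append(q1[:i], q1[i+1:]...)
	q2 = append(q2, failed)
	return q1, q2
}
```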
I didn't realize before that we are removing buffers from Q1. It looks good then. Thanks.
Signed-off-by: Antonin Bas <[email protected]>
Hi @antoninbas, thanks for the suggestion. I've implemented it. Please take a look.
I didn't add the wake-up of the upload goroutine (the signal channel) for now, as I think it could introduce concurrent reads and writes to Q2. I also didn't add logging for the number of flow records that have been uploaded, just for simplicity. Please let me know if you believe we need either of these two.
do you still have the same issue with the generated gzip files? because I don't remember having this issue when I prototyped that code in the past.
I don't think so. All the accesses to Q2 would still happen in the same upload goroutine.
Yes, I still do. And yes, your PoC code doesn't have this issue, and neither does my first commit. Both of them create the gzip writer, write the whole file at once, and close it. The current version writes to the file every time it caches a record. I suppose it is the known issue with Archive Utility: https://apple.stackexchange.com/questions/388759/archive-utility-cant-open-some-gzipped-text-files-based-on-their-contents
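As a side note, independent of Archive Utility, one way to sanity-check whether a generated .gz file is a well-formed gzip stream is to decompress it fully with Go's compress/gzip (the path and function name here are hypothetical):

```go
package example

import (
	"compress/gzip"
	"io"
	"os"
)

// checkGzip decompresses the whole file and reports any error, which tells
// us whether the stream, including its footer, is well formed.
func checkGzip(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	zr, err := gzip.NewReader(f)
	if err != nil {
		return err
	}
	defer zr.Close()
	_, err = io.Copy(io.Discard, zr)
	return err
}
```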
Yes, you are right, I was thinking about it the wrong way; I have implemented it.

And yes, it should work well with the current Q1 & Q2 approach. We can add it later if we need it.
Could you check that you do get the error described in that post?
I wanted to check that, but I didn't find a way to run the app in the terminal, and I don't know how they did it. That's why I said I don't know how to verify that it is the cause. I've tried a few commands, but they still pop up the UI window.
LGTM
This PR adds S3Uploader as a new exporter for the Flow Aggregator. It periodically exports expired flow records from the Flow Aggregator to an AWS S3 storage bucket. Signed-off-by: heanlan <[email protected]>
/test-all
Approving again after squash
@dreamtalen any more comments on this?
No, LGTM