
Add --unbuffered option to sbp2json.py #1400

Merged

merged 3 commits on Jan 10, 2024

Conversation

@dgburr (Contributor) commented Jan 10, 2024

Description

Add a new --unbuffered command line option to sbp2json.py which enables a mode where it reads and parses messages one at a time. The existing behaviour (which remains the default) waits for 4096 bytes before parsing. This means that when decoding real-time streams which only output messages intermittently, there are often long pauses where no output is produced, followed by a large batch of decoded messages at once.

The Rust version of sbp2json does not have this issue. However, there are various use cases (e.g. embedded platforms) where it is difficult (if not impossible) to build the Rust tools, but the Python version works just fine.
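The behavioural difference can be sketched roughly as follows. This is an illustrative toy, not the real sbp2json.py code: `parse_messages` (newline-framed records here) stands in for the actual SBP framing logic, and the 4096-byte figure is the default threshold described above.

```python
import io

BUFFER_SIZE = 4096  # default read threshold before parsing begins

def parse_messages(buf):
    """Toy stand-in for the SBP framing code: each 'message' is simply a
    newline-terminated record. Returns (messages, leftover bytes)."""
    msgs = []
    while b"\n" in buf:
        msg, buf = buf.split(b"\n", 1)
        msgs.append(msg)
    return msgs, buf

def decode(stream, unbuffered=False):
    """Yield decoded messages. In unbuffered mode, read as little as one
    byte at a time so a lone message on a slow stream is emitted without
    waiting for a full 4096-byte chunk to accumulate."""
    chunk_size = 1 if unbuffered else BUFFER_SIZE
    buf = b""
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        buf += data
        msgs, buf = parse_messages(buf)
        yield from msgs

# Both modes produce identical output; only the latency differs.
stream = io.BytesIO(b"msg1\nmsg2\n")
print(list(decode(stream, unbuffered=True)))  # -> [b'msg1', b'msg2']
```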

API compatibility

Does this change introduce an API compatibility risk?

No. The previous behaviour remains the default.

API compatibility plan

N/A

JIRA Reference

https://swift-nav.atlassian.net/browse/BOARD-XXXX

@dgburr dgburr requested a review from a team as a code owner January 10, 2024 01:43

sonarcloud bot commented Jan 10, 2024

Quality Gate passed for 'libsbp-c'

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@pcrumley (Contributor) left a comment


It would be nice to have a test of a very small message that just lived in buffer land so that it returns when --unbuffered is passed but not without that flag.

Also... I commented in line but I am unsure about this strategy of flushing the buffer by pointing it to a slice in a binary string. I don't know if the earlier memory will actually be freed that way. Do you have some explanation of python that can assuage me there or have some test to prove the buf that isn't part of the slice is garbage collected?

@dgburr (Contributor, Author) commented Jan 10, 2024

> Also... I commented in line but I am unsure about this strategy of flushing the buffer by pointing it to a slice in a binary string. I don't know if the earlier memory will actually be freed that way. Do you have some explanation of python that can assuage me there or have some test to prove the buf that isn't part of the slice is garbage collected?

From what I can find online (e.g. the first answer to https://stackoverflow.com/questions/64871329/does-string-slicing-perform-copy-in-memory and https://discuss.python.org/t/string-slicing-in-python/25651), it seems that taking a slice makes a copy of the original PyObject (except for the special case where the slice corresponds to the entire string). So the buf name will be rebound to the new PyObject, meaning that the reference count on the old PyObject drops to 0, resulting in it being garbage collected.
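A quick way to check this behaviour interactively. Note that the `is` results below are CPython implementation details (immutable bytes objects, refcounted GC), not language guarantees:

```python
buf = b"0123456789" * 100

# A partial slice of a bytes object allocates a new object...
tail = buf[4:]
print(tail is buf)    # -> False

# ...except the full-range slice, which returns the same object.
print(buf[:] is buf)  # -> True

# Rebinding `buf` to the slice drops the last reference to the old object,
# so CPython's reference counting reclaims the old buffer immediately.
buf = buf[4:]
```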

@dgburr (Contributor, Author) commented Jan 10, 2024

> It would be nice to have a test of a very small message that just lived in buffer land so that it returns when --unbuffered is passed but not without that flag.

To be clear: running with or without the --unbuffered flag does not (should not!) make any difference to the final output which is produced. The only difference when running with the --unbuffered option is the delay between the input and the output when the rate of input is very low.

@pcrumley (Contributor) commented Jan 10, 2024

> Also... I commented in line but I am unsure about this strategy of flushing the buffer by pointing it to a slice in a binary string. I don't know if the earlier memory will actually be freed that way. Do you have some explanation of python that can assuage me there or have some test to prove the buf that isn't part of the slice is garbage collected?

> From what I can find online (e.g. the first answer to https://stackoverflow.com/questions/64871329/does-string-slicing-perform-copy-in-memory and https://discuss.python.org/t/string-slicing-in-python/25651), it seems that taking a slice makes a copy of the original PyObject (except for the special case where the slice corresponds to the entire string). So the buf name will be rebound to the new PyObject, meaning that the reference count on the old PyObject drops to 0, resulting in it being garbage collected.

TIL. I suppose that is why the other implementation uses memoryview. I do think using memoryview should be possible here, and as you can see, you are allocating several times per message. Do you think this affects performance in a measurable way?

@dgburr (Contributor, Author) commented Jan 10, 2024

> TIL. I suppose that is why the other implementation uses memoryview. I do think using memoryview should be possible here, and as you can see, you are allocating several times per message. Do you think this affects performance in a measurable way?

I ran the following tests a few times and consistently got results like this:

$ time sbp2json < benchmark.sbp > benchmark.json.buffered

real	0m14.717s
user	0m14.926s
sys	0m1.489s
$ time sbp2json --unbuffered < benchmark.sbp > benchmark.json.unbuffered

real	0m14.438s
user	0m14.665s
sys	0m1.561s

The pattern which I observed was as follows:

  • real and user values are slightly lower with the unbuffered version
  • sys values are (very) slightly higher with the unbuffered version

This makes sense to me, given that:

  1. The buffered version makes fewer calls to read(), explaining why it has a lower value for sys
  2. The unbuffered version performs less shuffling of bytes in memory - in the ideal case it reads exactly the number of bytes required, parses one message and empties the buffer, whereas the buffered version reads 4096 bytes, parses multiple messages from the buffer, then moves any leftover bytes back to the beginning of the buffer
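The "shuffling" in point 2 can be illustrated with a toy framing scheme. The 1-byte length prefix below is hypothetical (real SBP framing is different); the point is the tail copy that a buffered parser performs after draining complete messages:

```python
def drain(buf):
    """Parse as many complete messages as possible from `buf`, then return
    (messages, leftover bytes). The leftover slice is the copy ("shuffling")
    referred to above. Framing here is a toy 1-byte length prefix."""
    msgs = []
    pos = 0
    while pos < len(buf):
        n = buf[pos]                  # 1-byte length prefix
        if pos + 1 + n > len(buf):    # incomplete trailing message: stop
            break
        msgs.append(buf[pos + 1 : pos + 1 + n])
        pos += 1 + n
    return msgs, buf[pos:]            # slicing copies the tail

msgs, rest = drain(b"\x03abc\x02de\x04gh")
print(msgs, rest)  # -> [b'abc', b'de'] b'\x04gh'
```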

But at the end of the day this is all immaterial since previous profiling has shown that sbp2json.py spends the vast majority of its time in the call to sbp.table.dispatch(msg) rather than the actual parsing code.

Also, I do not think that a memoryview is relevant here, since memoryviews can only be used to represent fixed-size objects (per the note "Resizing is not allowed" in https://docs.python.org/dev/library/stdtypes.html#typememoryview).
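For reference, a memoryview slice does share the underlying buffer rather than copying it, but the fixed-size restriction mentioned above shows up as soon as the buffer needs to grow (the `.obj` attribute and the BufferError are documented Python behaviour):

```python
data = bytes(range(256)) * 16

# memoryview slices share the underlying buffer instead of copying it.
view = memoryview(data)
sub = view[100:200]
print(sub.obj is data)  # -> True

# But an object with exported views cannot be resized, which fits poorly
# with a parse buffer that must grow as new bytes arrive.
ba = bytearray(b"hello")
v = memoryview(ba)
try:
    ba += b" world"  # resizing with a live view raises BufferError
except BufferError:
    print("cannot resize a bytearray with a live memoryview")
```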

@pcrumley (Contributor) left a comment

Looks good. Thanks for walking me through the PR!

@dgburr dgburr merged commit 22f5e7a into master Jan 10, 2024
8 checks passed
@dgburr dgburr deleted the dburr/sbp2json.py-unbuffered branch January 10, 2024 14:50
@dgburr dgburr mentioned this pull request Apr 4, 2024