-
Notifications
You must be signed in to change notification settings - Fork 459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inappropriate implementation of LTC_CLEAN_STACK
using burn_stack()
api
#486
Comments
Looking at reason behind introduction of So I have tried to come up with below approach to clear stack in corresponding function only and avoid multiple Approach 1
Above approach works if compiler allocates stack variables at contiguous stack memory locations which might not be true in case padding involved which could lead to some stack memory uncleaned (equal to size of padding). Approach 2Multiple
I would be happy to hear any feedback/suggestions regarding which of above approach seems most suitable. [1] Line 1283 in 3fb0eea
|
You could also stick everything in a struct...
|
I prefer "Approach 2" as it keeps it simple without making any assumptions about the stack layout. Regarding performance I think either approach will beat the old burnstack approach. |
@ksherlock Performance wise approach suggested by you looks better than "Approach 2". The only thing that doesn't looks good is a |
Any more preferences among above 3 approaches (Approach 3: suggested by @ksherlock)? Or any further suggestions to address this issue? |
because I'm not certain yet on how to solve it best... I like approach 3, but prefer 2 for its simplicity and no macro magic I already thought about an approach 4 where we leave the |
What should be the appropriate buffer size? Since we can't have too big buffer for APIs that only need to clean few bytes and too small buffer that caused this overshoot. Overall this approach still seems to be fragile. |
I prefer #2 for its simplicity.
|
@karel-m 2 fine for you as well? |
I do not use |
Thanks everyone for your feedback. Will use "Approach 2" to create a fix patch. |
Does anyone think it's worth to keep the old approach alive? (probably as |
if the initial analysis posted by OP is correct, the old approach uses undefined behaviour. i don't think that should be kept.
explicit zeromem calls for every single (sensitive) local variable seems to be an approach that leads to bigger functions (both in code as in binary), as well as error-prone because the programmer needs to manually take care of every single variable. if we instead stuff everything into a struct, there's only a single call to zeromem, and one can simply put new vars into the struct and forget about adding another call with the right arguments. |
It seems likely that we can't get agreement on one of above discussed approaches. So how about following simple change to use variable length array instead of recursive call in
It seems to resolve stack overflow issue and does stack clean operation too. This change is something that was suggested by @daniel-thompson. |
Except that it still relies on undefined behavior. |
using vla's is the worst possible choice. there have been countless numbers of CVE's due to it. |
@rofl0r: I agree that in general VLAs should be avoided... but I don't agree that they are the worst possible choice since I think that spot is already occupied by the existing burn_stack() code ;-) ! I've got no skin in the game here but FWIW I like approach 3 best, either with macros or just making it such a common idiomatic construct across the library that nobody minds giving it a single letter variable name. |
@rofl0r Agree but with a minimal change like VLA, we could improve the Anyway, let me try to work on a common variable declaration construct that may make |
Sorry for the delay, but I was busy in the last weeks.
If that's the case then I also think that the entire The VLA approach would indeed be the least intrusive change, but then it uses VLA's which officially exist only since c99 ... so a clear no.
that's already the case, so this wouldn't change ;-) but that's also one of the main reasons why I like approach 3, automatic determination of the size of what to clear.
I'm looking forward to an improved version of approach 3! |
Description
Some parts of
LibTomCrypt
have been imported in an open source projectOP-TEE
[1] for implementing crypto in software (imported code can be found here [2]). But recently we have enabledLTC_CLEAN_STACK
so thatLibTomCrypt
wipes out key material and other sensitive data once no longer used.Given stack space is quiet expensive for
optee_os
and currently its limited to 2k only. So while testing on ARM64 machine withLTC_CLEAN_STACK
enabled,burn_stack()
behaved inappropriately burning stack space 2 times the actual requirement leading to corruption of stack canaries.Looking deeper into
burn_stack()
api [3], it seems to be a recursive function which assumes that only 32 bytes of stack space is utilized in every recursive call. But this assumption is incorrect as it doesn't take into account the usage of stack space during function calls. Specifically for function return pointer, stack frame pointer, any padding compiler introduces etc.[1] https://optee.readthedocs.io/
[2] https://github.com/OP-TEE/optee_os/tree/master/core/lib/libtomcrypt/src
[3] https://github.com/libtom/libtomcrypt/blob/develop/src/misc/burn_stack.c
Steps to Reproduce
Simply add a debug print to dump address of 32 bytes buffer kept on stack in
burn_stack()
api as follows:Following are
buf addr
dumps which are taken on ARM64 machine whereburn_stack()
api is used to clean stack for_sha512_compress()
api. Size to be cleared: 724 bytes but actual size cleared from below dump: 1472 bytes.Version
This could be reproduced on latest upstream
develop
branch.Suggestion
We would suggest to clean stack in corresponding function only where it is being used. As it seems one can't assume anything about what stack space has been consumed by the compiler for a given function call.
The text was updated successfully, but these errors were encountered: