Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sophus ensure calls abort() #533

Closed
TLescoatTFX opened this issue Dec 15, 2023 · 10 comments
Closed

Sophus ensure calls abort() #533

TLescoatTFX opened this issue Dec 15, 2023 · 10 comments
Labels

Comments

@TLescoatTFX
Copy link

Describe the bug

In my program, Sophus called std::abort():

Sophus ensure failed in function 'static SO3<Scalar> Sophus::SO3<float>::expAndTheta(const Tangent &, Scalar *) [Scalar_ = float, Options = 0]', file '/opt/homebrew/include/sophus/so3.hpp', line 729.
SO3::exp failed! omega: nan nan nan, real: nan, img: nan

A library shall never ever call std::abort(), std::terminate(), std::exit(), ... It does not know the gravity of the error and the recovery policy of the caller, so please do not impose one.

To Reproduce

  1. Fail any SOPHUS_ENSURE. This is implemented with SOPHUS_DEDAULT_ENSURE_FAILURE_IMPL, which in turns calls std::abort().

Expected behavior

Default implementation should raise an exception. Or just keep going, but don't kill the program !

Sophus version: commit hash

main-1.x:

std::abort(); \

@fwcore
Copy link
Contributor

fwcore commented Dec 18, 2023

SOPHUS_ENSURE can be turned off by setting SOPHUS_DISABLE_ENSURES, see

#if defined(SOPHUS_DISABLE_ENSURES)

In principle, one can enable SOPHUS_ENSURE in debug mode for safety and turn it off in release mode for performance by cmake.

Hope this can help.

@TLescoatTFX
Copy link
Author

Thanks, but I think the default shall not be aborting. It should be throwing an exception for example, which would crash anyway if it's not catched.

By the way, encountering NaN is hardly a situation for aborting a program, NaNs handling depends a lot on the application.

So it boils down to Sophus providing better defaults:

  • in debug, throw an exception or trigger a breakpoint
  • in release, at most print the error, but do not crash

@fwcore
Copy link
Contributor

fwcore commented Dec 22, 2023

@TLescoatTFX Thanks for your suggestions. After checking openCV, I think your suggestion is quite reasonable.

Here is what openCV do:
https://github.com/opencv/opencv/blob/853e5dfcdf091023b78283d6dc03d91fd6763495/modules/core/doc/intro.markdown?plain=1#L296-L298

#CV_Assert(condition) macro that checks the condition and throws an exception when it is not
satisfied. For performance-critical code, there is #CV_DbgAssert(condition) that is only retained in
the Debug configuration.

Interestingly, I also noticed that abort is used in the definition of those macros specifically for static analyzer. I don't know why. Here is the relevant code:
https://github.com/opencv/opencv/blob/853e5dfcdf091023b78283d6dc03d91fd6763495/modules/core/include/opencv2/core/base.hpp#L300-L306

#ifdef CV_STATIC_ANALYSIS

// In practice, some macro are not processed correctly (noreturn is not detected).
// We need to use simplified definition for them.
#define CV_Error(code, msg) do { (void)(code); (void)(msg); abort(); } while (0)
#define CV_Error_(code, args) do { (void)(code); (void)(cv::format args); abort(); } while (0)
#define CV_Assert( expr ) do { if (!(expr)) abort(); } while (0)

@TLescoatTFX
Copy link
Author

TLescoatTFX commented Dec 22, 2023

I'm glad you find such suggestion reasonable ! The behavior of OpenCV seems also good, I think they put the abort() so that the static analysis understands better that the condition in assert shall always be true (a workaround, maybe ?)

@strasdat
Copy link
Owner

strasdat commented May 1, 2024

A library shall never ever call std::abort(), std::terminate(), std::exit(),

Well, in c++ libraries can do anything when preconditions are violated. In my opinion, calling abort is much better than UB.

@strasdat strasdat closed this as completed May 1, 2024
@TLescoatTFX
Copy link
Author

Hard disagree:

  • NaN are not UB, from my understanding it is implementation-defined (but everyone use IEEE754).
  • abort() gets in the way of debuggers, while also hiding the error. This is not just terrible, this is the worst possible way to report an error. If you have a bit of respect for your user's time, use OpenCV behavior please.
  • Doing anything when preconditions are violated is not a C++ thing, this tautology is obviously true for other languages. But most libraries will kindly report an error and not crash in the presence of a minor error...

Anyway, that's your library, you do what you want, but fixing that faulty behavior would definitely be an improvement

@Neoyning
Copy link

Hard disagree:

* NaN are _not_ UB, from my understanding it is implementation-defined (but everyone use IEEE754).

* abort() gets in the way of debuggers, while also hiding the error. This is not just terrible, this is the worst possible way to report an error. If you have a bit of respect for your user's time, use OpenCV behavior please.

* Doing anything when preconditions are violated is not a C++ thing, this tautology is obviously true for other languages. But most libraries will kindly report an error and not crash in the presence of a minor error...

Anyway, that's your library, you do what you want, but fixing that faulty behavior would definitely be an improvement

Yeah, agree~ "The program has unexpectedly finished." with no reason is pretty hard for the user to debug.

@strasdat
Copy link
Owner

strasdat commented Jun 3, 2024

@TLescoatTFX, thanks for you candid feedback. So let me elaborate.

In sophus, NAN's on input variables are undocumented (I have to admit) precondition violations. When I will find some time, I make these preconditions more explicit in the comments.
On on precondition violations, the library if free to do anything, and SOPHUS chooses to call std::abort by default.

  • abort() gets in the way of debuggers, while also hiding the error. This is not just terrible, this is the worst possible way to report an error. If you have a bit of respect for your user's time, use OpenCV behavior please.

I know there are different schools of thoughts, but in my workflow (as well as the workflows of the companies I worked at), it is actually the opposite. In this case, I will get an error message with a line number of the root of the error without the need to run it with a debugger. And when running it with the debugger, I still get a full stack-trace with std::abort (but not sure this is specific to my setup somehow...).

I also believe that it is the opposite of hiding an error. In most cases, such precondition violations, NANs or not normalized quaternions inputs are often caused by bugs in parsing code or numeric optimization routines. Throw an exception or side-stepping the case (e.g. by re-normalizing the quaternion silently), is actually hiding those non-trivial problems, and failing on a precondition violation makes them explicit, and causing a runtime abort often very close to the actual root cause.
I speaking from experience here (on large production mono-repos).

(A future API extension - following my opinionated design - could offer various tryGetFoo(...) overloads for GetFoo(...) which would return an std::expected<...> type. This way you can have an API where the set of preconditions is reduced.)


But, I know that there is a point of view where folks do not like std::abort on precondition violations. That's why there is the custom SOPHUS_ENABLE_ENSURE_HANDLER where you can hook in your own error handling (e.g. OpenCV's).

Having that said, I have not used SOPHUS_ENABLE_ENSURE_HANDLER and similar macros in a while - since I'm mainly working on sophus2 and sophus-rs.

So please, @TLescoatTFX, if you have some PRs to make this easier to plug in your own error handling system, I'm more than happy to accept them (and if CI passes). And please ping me since I'm not that often review the Sophus (1) repo anymore.

@strasdat
Copy link
Owner

strasdat commented Jun 9, 2024

abort() gets in the way of debuggers, while also hiding the error.

I just tried it out and I'm getting this stack-trace in GDB with the corresponding line number (compiled with RelWithDebInfo):

Sophus ensure failed in function 'int main()', file '/home/strasdat/dev/Sophus/examples/HelloSO3.cpp', line 31.
Message: 33

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352537920) at ./nptl/pthread_kill.c:44
44	./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352537920) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737352537920) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737352537920, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7842476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff78287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x000055555555793b in main () at /home/strasdat/dev/Sophus/examples/HelloSO3.cpp:31

@TLescoatTFX what is you concrete OS and build config?

@strasdat strasdat reopened this Jun 9, 2024
@strasdat
Copy link
Owner

The PR above includes an example of how to use a custom SOPHUS_ENABLE_ENSURE_HANDLER and hence throw an exception instead of calling std::abort:

option(SOPHUS_ENABLE_ENSURE_HANDLER "Enable the custem ensure handler." OFF)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants