This repository has been archived by the owner on Oct 15, 2024. It is now read-only.

Crash of Python binding #25

Closed
markus2330 opened this issue Jul 21, 2014 · 9 comments

@markus2330
Contributor

Pino Toscano wrote:

  • 'make test' after build passes almost fully; the only issue seems to
    be in the exception handling within the Python binding. In particular,
    in test_key.py, Key::test_properties:

        with self.assertRaises(kdb.KeyInvalidName):
            k.name = "foo"

    The above causes the crash of the Python (3.4) interpreter (!),
    not even a graceful failure.
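
For reference, the behaviour the test expects can be sketched without the binding. The Key and KeyInvalidName classes below are hypothetical stand-ins for kdb.Key and kdb.KeyInvalidName, and the name check is a simplified assumption rather than Elektra's real validation; the point is only that an invalid name should surface as a catchable exception.

```python
import unittest


class KeyInvalidName(Exception):
    """Hypothetical stand-in for kdb.KeyInvalidName."""


class Key:
    """Hypothetical stand-in for kdb.Key; only the name property is modelled."""

    # Assumption: a simplified validity rule, not Elektra's real one.
    _NAMESPACES = ("user", "system")

    def __init__(self, name):
        self.name = name

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        # Reject names whose first path component is not a known namespace.
        if value.split("/", 1)[0] not in self._NAMESPACES:
            raise KeyInvalidName(value)
        self._name = value


class KeyTest(unittest.TestCase):
    def test_invalid_name_raises(self):
        k = Key("user/foo/bar")
        with self.assertRaises(KeyInvalidName):
            k.name = "foo"
        # The failed assignment must leave the old name intact.
        self.assertEqual(k.name, "user/foo/bar")


def run_tests():
    """Run the single test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(KeyTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_tests() should report one passing test. With the real binding, the same assertRaises block is where the interpreter segfaults instead.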

@manuelm
Contributor

manuelm commented Jul 21, 2014

Works for me...

# python3 -V
Python 3.4.1

# swig -version
SWIG Version 2.0.11

[...]
     Start  6: test_key.py
6/36 Test  #6: test_key.py ........................   Passed    0.09 sec
[...]

@pinotree
Contributor

The crash happens with Elektra 0.8.6. I have not rechecked with current Git/master.

@manuelm
Contributor

manuelm commented Jul 22, 2014

Please retest with master. SWIG bindings have preview state in 0.8.6 so I don't care much.

Imho you shouldn't package them at all until the next release.

@pinotree
Contributor

> Please retest with master.

I can reproduce the crash with the python3 binding built from current Git/master on a Debian/Jessie system (Python3 3.4.1, swig 2.0.12).

> SWIG bindings have preview state in 0.8.6 so I don't care much.
>
> Imho you shouldn't package them at all until the next release.

That is not a problem that would prevent an upload, at least to Debian experimental.

@manuelm
Contributor

manuelm commented Jul 22, 2014

I'm still not able to reproduce this on Fedora. SWIG 2.0.12 works fine here. Don't have any Debian machine available.

Can you post the backtrace?

@markus2330
Contributor Author

Starting program: /usr/bin/python3.4 -B /libelektra/src/bindings/swig/python/tests/test_key.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
....
Program received signal SIGSEGV, Segmentation fault.
SWIG_Python_SetErrorObj (errtype=<optimized out>, obj=0x0) at /build/src/bindings/swig/python/kdbPYTHON_wrap.cxx:1220
1220      Py_DECREF(obj);
(gdb) bt full
#0  SWIG_Python_SetErrorObj (errtype=<optimized out>, obj=0x0) at /build/src/bindings/swig/python/kdbPYTHON_wrap.cxx:1220
        _py_decref_tmp = 0x0
#1  0x00007ffff6763526 in _wrap_Key__setName (args=<optimized out>) at /build/src/bindings/swig/python/kdbPYTHON_wrap.cxx:5839
        arg2 = 0xad8eb0
        argp1 = 0xadea30
        res2 = 512
        swig_obj = {<Key(this=<SwigPyObject at remote 0x7ffff59a2ea0>) at remote 0x7ffff5933978>, 'foo'}
        resultobj = 0x0
        arg1 = <optimized out>
        res1 = <optimized out>
#2  0x000000000050d677 in PyObject_Call (kw=0x0, arg=(<Key(this=<SwigPyObject at remote 0x7ffff59a2ea0>) at remote 0x7ffff5933978>, 'foo'), 
    func=<built-in method Key__setName of module object at remote 0x7ffff6a90458>) at ../Objects/abstract.c:2067
        result = <optimized out>
        call = 0x4f3850 <PyCFunction_Call>
#3  PyObject_CallFunctionObjArgs () at ../Objects/abstract.c:2359
        vargs = {{gp_offset = 8, fp_offset = 48, overflow_arg_area = 0x7fffffffd2e0, reg_save_area = 0x7fffffffd210}}
#4  0x000000000050d46b in property_descr_set.lto_priv () at ../Objects/descrobject.c:1408
        gs = <optimized out>
        func = <optimized out>
        res = <optimized out>
#5  0x00000000004ffa54 in _PyObject_GenericSetAttrWithDict () at ../Objects/object.c:1135
        tp = <optimized out>
        f = 0x50d450 <property_descr_set.lto_priv>
        res = -1
#6  0x00000000004fe84c in PyObject_GenericSetAttr (value='foo', name='name', obj=<Key(this=<SwigPyObject at remote 0x7ffff59a2ea0>) at remote 0x7ffff5933978>)
    at ../Objects/object.c:1185
No locals.
#7  PyObject_SetAttr () at ../Objects/object.c:914
        tp = 0xb9d8a8
        err = <optimized out>
#8  0x00000000004e8def in PyEval_EvalFrameEx () at ../Python/ceval.c:2121
        name = <optimized out>
        owner = <Key(this=<SwigPyObject at remote 0x7ffff59a2ea0>) at remote 0x7ffff5933978>
        v = 'foo'
        err = <optimized out>
        stack_pointer = 0x7ffff59403c8
        next_instr = 0x7ffff7e495db "Wd"
        opcode = 95
        oparg = <optimized out>
        why = WHY_NOT
        fastlocals = 0x7ffff59403b0
        freevars = 0x7ffff59403c0
        retval = <optimized out>
        tstate = <optimized out>
        co = 0x7ffff7ee9390
        instr_ub = -1
        instr_lb = <optimized out>
        instr_prev = -1
        first_instr = <optimized out>
        names = ('assertEqual', 'key', 'name', 'value', 'basename', 'dirname', 'fullname', 'bkey', 'kdb', 'Key', 'KEY_VALUE', 'assertFalse', 'isBinary', 'assertIsNone', 'getMeta', 'assertTrue', 'assertIsInstance', 'assertRaises', 'KeyInvalidName')
        consts = (None, 'user/foo/bar', 'value', 'bar', 'user/foo', 'user:myowner/foo/bar', 'system/bkey', b'bvalue\x00\x00', 'bkey', 'system', 'user/key1', 'binary', 'system/key2', 'key3', 'system/key3', 'user/key2', 'foo')
        opcode_targets = {0x4eff2d <PyEval_EvalFrameEx+32557>, 0x4e8194 <PyEval_EvalFrameEx+404>, 0x4e9765 <PyEval_EvalFrameEx+5989>, 0x4e9d1e <PyEval_EvalFrameEx+7454>, 
          0x4e9944 <PyEval_EvalFrameEx+6468>, 0x4ebb5a <PyEval_EvalFrameEx+15194>, 0x4eff2d <PyEval_EvalFrameEx+32557>, 0x4eff2d <PyEval_EvalFrameEx+32557>,

(shortened) ...... frame 83

The "why = WHY_NOT" in frame #8 is quite amusing.

Valgrind tells even more; here are the last bits, which seem the most interesting:

==15123==    at 0x6A95579: SWIG_Python_SetErrorObj(_object*, _object*) (kdbPYTHON_wrap.cxx:1220)
==15123==    by 0x6AA6525: _wrap_Key__setName (kdbPYTHON_wrap.cxx:5839)
==15123==    by 0x50D676: PyObject_CallFunctionObjArgs (abstract.c:2067)
==15123==    by 0x50D46A: property_descr_set.lto_priv.337 (descrobject.c:1408)
==15123==    by 0x4FFA53: _PyObject_GenericSetAttrWithDict (object.c:1135)
==15123==    by 0x4FE84B: PyObject_SetAttr (object.c:1185)
==15123==    by 0x4E8DEE: PyEval_EvalFrameEx (ceval.c:2121)
==15123==    by 0x4EC0CE: PyEval_EvalFrameEx (ceval.c:4331)
==15123==    by 0x503A39: function_call.lto_priv.376 (ceval.c:3585)
==15123==    by 0x4F1DCD: PyObject_Call (abstract.c:2067)
==15123==    by 0x4EA1DF: PyEval_EvalFrameEx (ceval.c:4558)
==15123==    by 0x503A39: function_call.lto_priv.376 (ceval.c:3585)
==15123==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==15123== 
==15123== 
==15123== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==15123==  Access not within mapped region at address 0x0
==15123==    at 0x6A95579: SWIG_Python_SetErrorObj(_object*, _object*) (kdbPYTHON_wrap.cxx:1220)
==15123==    by 0x6AA6525: _wrap_Key__setName (kdbPYTHON_wrap.cxx:5839)
==15123==    by 0x50D676: PyObject_CallFunctionObjArgs (abstract.c:2067)
==15123==    by 0x50D46A: property_descr_set.lto_priv.337 (descrobject.c:1408)
==15123==    by 0x4FFA53: _PyObject_GenericSetAttrWithDict (object.c:1135)
==15123==    by 0x4FE84B: PyObject_SetAttr (object.c:1185)
==15123==    by 0x4E8DEE: PyEval_EvalFrameEx (ceval.c:2121)
==15123==    by 0x4EC0CE: PyEval_EvalFrameEx (ceval.c:4331)
==15123==    by 0x503A39: function_call.lto_priv.376 (ceval.c:3585)
==15123==    by 0x4F1DCD: PyObject_Call (abstract.c:2067)
==15123==    by 0x4EA1DF: PyEval_EvalFrameEx (ceval.c:4558)
==15123==    by 0x503A39: function_call.lto_priv.376 (ceval.c:3585)

@manuelm
Contributor

manuelm commented Jul 26, 2014

I spent more than 24h tracing this allocation bug, but finally found the issue.

In short, the bug only happens if Python is built with:

  • CFLAGS='-flto -ffat-lto-objects'
  • Modules/Setup.local contains "_pickle _pickle.c"
  • make build_all_generate_profile (or make profile-opt; both targets build Python with gcov support)

As soon as you remove any one of these, the bug is no longer triggered.
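
To get a first hint of whether a given interpreter was built with such options, the recorded build configuration can be inspected. This is only a rough sketch: lto_flags and interpreter_build_hints are hypothetical helper names, and a distribution may pass flags in ways that sysconfig does not record, so an empty result proves nothing.

```python
import sysconfig


def lto_flags(cflags):
    """Return the LTO-related options found in a CFLAGS-style string."""
    wanted = ("-flto", "-ffat-lto-objects")
    return [flag for flag in (cflags or "").split() if flag.startswith(wanted)]


def interpreter_build_hints():
    """Collect build-time hints recorded for the running interpreter."""
    # sysconfig reports values recorded when this interpreter was built;
    # they may be None on platforms that do not record them.
    cflags = sysconfig.get_config_var("CFLAGS")
    config_args = sysconfig.get_config_var("CONFIG_ARGS")
    return {"lto_flags": lto_flags(cflags), "configure_args": config_args}
```

Printing interpreter_build_hints() on the affected Debian system and on a system where the test passes would show whether the LTO flags differ between the two builds.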

Tracing this at the source code level, the allocation failure happens in the SWIG-generated file at the PyBaseObject_Type.tp_new((PyTypeObject*) data->newargs, Py_None, Py_None) call. This call is used to create the shadowed instance of the exception.

Tracing this call in the Python source leads to the object_new function in Objects/typeobject.c. The failing condition is type->tp_new != object_new.

Since type is our data->newargs, couldn't we just call data->newargs->tp_new(data->newargs, ...) to fix this issue? Turns out we can: swig/swig@c063bb8
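
The difference between the two calls has a rough Python-level analogue. The sketch below uses a hypothetical class WithOwnNew; it is only an analogy, because here the extra arguments arrive as a regular tuple, whereas the C call quoted above passes Py_None in the argument slots directly.

```python
class WithOwnNew:
    """A class that defines __new__, so its tp_new slot is not object's."""

    def __new__(cls, *args):
        # Ignore the extra arguments and allocate a plain instance.
        return super().__new__(cls)


def via_base_object():
    # Analogue of PyBaseObject_Type.tp_new(type, Py_None, Py_None):
    # object's constructor is asked to build an instance of another type.
    return object.__new__(WithOwnNew, None, None)


def via_own_type():
    # Analogue of type->tp_new(type, Py_None, Py_None):
    # the type's own constructor is used.
    return WithOwnNew.__new__(WithOwnNew, None, None)


try:
    via_base_object()
    base_result = "created"
except TypeError:
    # object's constructor rejects extra arguments for a type with its own __new__.
    base_result = "TypeError"

own_result = type(via_own_type()).__name__
```

Going through object's constructor is rejected, while going through the type's own constructor yields an instance, which is the direction the proposed fix takes.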

So the fix is to update SWIG or to patch the generated SWIG wrapper file.
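
Patching the generated file could be scripted roughly as follows. This is a sketch under assumptions: the pattern expects the call exactly as quoted in this thread (up to whitespace), and the replacement text is modelled on the description above rather than copied from the SWIG commit. patch_wrapper_source is a hypothetical helper name.

```python
import re

# Assumption: the generated wrapper contains the call as quoted in this
# thread, up to whitespace differences.
OLD_CALL = re.compile(
    r"PyBaseObject_Type\.tp_new\(\s*\(PyTypeObject\s*\*\)\s*data->newargs\s*,"
    r"\s*Py_None\s*,\s*Py_None\s*\)"
)

# Assumption: call the type's own tp_new slot instead of object's.
NEW_CALL = (
    "((PyTypeObject *)data->newargs)->tp_new("
    "(PyTypeObject *)data->newargs, Py_None, Py_None)"
)


def patch_wrapper_source(source):
    """Return (patched_source, number_of_replacements)."""
    return OLD_CALL.subn(NEW_CALL, source)
```

The function works on text only; one would read the generated wrapper (kdbPYTHON_wrap.cxx in the backtrace above), pass its contents through patch_wrapper_source, and write the result back only if the replacement count is what was expected.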

P.S. No idea why this doesn't happen if Python gets built without LTO or gcov or even pickle?!...

@pinotree is it ok to close this issue?

@pinotree
Contributor

Thanks for the detailed analysis!

> Turns out we can: swig/swig@c063bb8

It looks like this is available only in swig 3, so I'll try to switch to it in my next Debian releases.
Would it be possible for you, given that you traced the issue so well, to send a bug report to swig asking them to backport that fix to their 2.x series?

> @pinotree is it ok to close this issue?

I guess it is, thanks for your work.

@manuelm
Contributor

manuelm commented Jul 26, 2014

afaik swig 2 is dead
