[C API] Segmentation fault in PyUnicodeWriter when writing nothing and with initial length = 0 #121849

Closed
picnixz opened this issue Jul 16, 2024 · 3 comments
Labels
topic-C-API type-bug An unexpected behavior, bug, or error

Comments

@picnixz
Contributor

picnixz commented Jul 16, 2024

Bug report

Bug description:

While working on accelerating fnmatch, I wanted to use the PyUnicodeWriter API instead of extracting two substrings and using concat(). I encountered a segmentation fault when writing an empty range to a writer created with an initial length of 0, so that nothing has been allocated yet:

#include <Python.h>

int main(void) {
    Py_Initialize();
    PyUnicodeWriter *writer = PyUnicodeWriter_Create(0);
    PyObject *str = PyUnicode_FromString("abc");
    PyUnicodeWriter_WriteSubstring(writer, str, 3, 3); // SEGFAULT
    Py_DECREF(str);
    PyUnicodeWriter_Discard(writer);
    Py_Finalize();
    return 0;
}

Note that changing PyUnicodeWriter_Create(0) to PyUnicodeWriter_Create(1) makes it work:

#include <Python.h>

int main(void) {
    Py_Initialize();
    PyUnicodeWriter *writer = PyUnicodeWriter_Create(1);
    PyObject *str = PyUnicode_FromString("abc");
    PyUnicodeWriter_WriteSubstring(writer, str, 3, 3); // OK!
    Py_DECREF(str);
    PyUnicodeWriter_Discard(writer);
    Py_Finalize();
    return 0;
}

@vstinner I can investigate, or you can take the task if you want (it would be best to also add a test and to confirm that I did not misuse the API). By the way, keeping 0 as the length estimate but writing a non-empty range works:

#include <Python.h>

int main(void) {
    Py_Initialize();
    PyUnicodeWriter *writer = PyUnicodeWriter_Create(0);
    PyObject *str = PyUnicode_FromString("abc");
    PyUnicodeWriter_WriteSubstring(writer, str, 2, 3); // OK!
    Py_DECREF(str);
    PyUnicodeWriter_Discard(writer);
    Py_Finalize();
    return 0;
}

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Linked PRs

@picnixz picnixz added type-bug An unexpected behavior, bug, or error topic-C-API labels Jul 16, 2024
@ZeroIntensity
Member

ZeroIntensity commented Jul 16, 2024

This is a bug. _PyUnicodeWriter_Prepare does nothing when the requested length is zero, so writer->buffer is left as is (NULL in this case). The caller still tries to copy the characters afterwards, which dereferences a NULL pointer.

The "band-aid" fix would be to simply add 1 to the length if it's zero.
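To make the failure mode described above concrete, here is a minimal sketch using a toy writer. Everything in it is hypothetical: `toy_writer`, `toy_prepare` and `toy_write_substring` are stand-ins invented for illustration and are not CPython's actual structures or functions. It only models the early return on a zero length and shows one possible guard, an early return in the write path before the buffer is touched. It does not claim to be what #121896 does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a writer; NOT CPython's real struct. */
typedef struct {
    char *buffer;   /* NULL until the first allocation */
    size_t size;    /* bytes allocated */
    size_t pos;     /* bytes written so far */
} toy_writer;

/* Models the behaviour described above: a zero-length request
 * returns early and leaves the buffer untouched (possibly NULL). */
static int toy_prepare(toy_writer *w, size_t length)
{
    if (length == 0) {
        return 0;
    }
    size_t needed = w->pos + length;
    if (needed > w->size) {
        char *p = realloc(w->buffer, needed);
        if (p == NULL) {
            return -1;
        }
        w->buffer = p;
        w->size = needed;
    }
    return 0;
}

/* Guarded write: bail out before touching the buffer when the
 * range [start, end) is empty. Without this check, the copy below
 * would run with w->buffer == NULL on a fresh writer. */
static int toy_write_substring(toy_writer *w, const char *src,
                               size_t start, size_t end)
{
    assert(start <= end);
    size_t len = end - start;
    if (len == 0) {
        return 0;
    }
    if (toy_prepare(w, len) < 0) {
        return -1;
    }
    memcpy(w->buffer + w->pos, src + start, len);
    w->pos += len;
    return 0;
}
```

With this guard, writing the empty range (3, 3) to a fresh toy writer is a no-op, while writing (2, 3) allocates and copies one character, matching the OK cases in the report.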

@vstinner
Member

I wrote #121896 to fix the bug.

vstinner added a commit that referenced this issue Jul 17, 2024
@picnixz
Contributor Author

picnixz commented Jul 17, 2024

Fixed by #121896

@picnixz picnixz closed this as completed Jul 17, 2024