
Access violation in stringlib_default_find #106615

Closed
xbeastx opened this issue Jul 11, 2023 · 3 comments
Labels: interpreter-core, type-bug, type-crash


xbeastx commented Jul 11, 2023

Crash report

I was able to reduce the code to a minimal example, but it still needs an input file to map: example.zip (unzip it before use).

import mmap
import sys
from contextlib import closing

print("start")
with open(sys.argv[1], "rb") as f:
    with closing(mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)) as buf:
        print(buf.find(b"@~"))
print("end")

C:\python311x64\python.exe crash.py example.bin

The crash happens in the buf.find call.
If I don't use mmap (just read the file into buf), there is no crash.
If I change the file or the pattern to find, there is no crash.
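As a stopgap on affected versions, the observation that reading into a regular buffer avoids the crash suggests searching copies rather than the mapping itself. CPython bytes objects carry a trailing NUL byte, which presumably is why the non-mmap case stays in bounds. The sketch below slices the map into bounded chunks (each slice of an mmap is an ordinary bytes object) and searches those, overlapping consecutive chunks by len(needle) - 1 bytes so a match that spans a chunk boundary is not missed. `find_in_chunks`, `find_in_file`, and the 1 MiB default chunk size are hypothetical choices, not part of the report or of the mmap API.

```python
import mmap


def find_in_chunks(buf, needle: bytes, chunk: int = 1 << 20) -> int:
    """Return the lowest index of needle in buf, or -1 if it is absent.

    Hypothetical helper. buf may be an mmap.mmap or a bytes object: slicing
    either yields a new bytes object, so find() never scans the mapped
    pages directly.
    """
    if not needle:
        return 0                        # same result as bytes.find(b"")
    overlap = len(needle) - 1           # lets a match straddle two chunks
    n = len(buf)
    for start in range(0, n, chunk):
        end = min(start + chunk + overlap, n)
        idx = buf[start:end].find(needle)
        if idx != -1:
            return start + idx          # translate back to an offset in buf
    return -1


def find_in_file(path: str, needle: bytes) -> int:
    """The reproducer's search, routed through find_in_chunks."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            return find_in_chunks(buf, needle)
```

Usage mirrors the reproducer: `print(find_in_file(sys.argv[1], b"@~"))`. The chunk size trades peak memory for the number of find() calls; any size of at least one byte gives the same answer.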

Error messages

Access violation - code c0000005 (first/second chance not available)

CONTEXT:  (.ecxr)
rax=0000000000000000 rbx=000001c4d1b40000 rcx=0000000000000019
rdx=000000000007fffe rsi=0000000000000001 rdi=0000000000000000
rip=00007ffb2afb6218 rsp=000000a68c9eef88 rbp=000000000007fffe
 r8=0000000000000000  r9=0000000040000001 r10=0000000000000001
r11=0000000000000000 r12=000001c4d1b40001 r13=000001c4d219fbc0
r14=0000000000000002 r15=000000000000007e
iopl=0         nv up ei ng nz ac po cy
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010297
python311!PyList_Reverse+0x1a44:
00007ffb`2afb6218 420fbe442201    movsx   eax,byte ptr [rdx+r12+1] ds:000001c4`d1bc0000=f0
Resetting default scope

FAULTING_IP: 
python311!PyList_Reverse+1a44
00007ffb`2afb6218 420fbe442201    movsx   eax,byte ptr [rdx+r12+1]

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007ffb2afb6218 (python311!PyList_Reverse+0x0000000000001a44)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 000001c4d1bc0000
Attempt to read from address 000001c4d1bc0000

DEFAULT_BUCKET_ID:  INVALID_POINTER_READ

PROCESS_NAME:  python.exe

FOLLOWUP_IP: 
python311!PyList_Reverse+1a44
00007ffb`2afb6218 420fbe442201    movsx   eax,byte ptr [rdx+r12+1]

STACK_TEXT:  
python311!PyList_Reverse+0x1a44
python311!PyList_Reverse+0x18fa
python311!PyList_Reverse+0x169c
python311!Py_BytesMain+0xa04
python311!PyObject_Vectorcall+0x5fc
python311!PyEval_EvalFrameDefault+0x784
python311!PyMapping_Check+0x1eb
python311!PyEval_EvalCode+0x97
python311!Py_SourceAsString+0x646
python311!Py_SourceAsString+0x5c2
python311!PyThread_tss_is_created+0x4ea14
python311!PyRun_SimpleFileObject+0x11d
python311!PyRun_AnyFileObject+0x54
python311!PyDict_DelItemString+0x6ff
python311!PyDict_DelItemString+0x5bb
python311!Py_RunMain+0x368
python311!Py_RunMain+0x15
python311!Py_Main+0x25
python+0x1230
kernel32!BaseThreadInitThunk+0x14
ntdll!RtlUserThreadStart+0x21

I can provide the crash dump on request.

Your environment

For me it reproduces only on Windows with CPython 3.11.0 through 3.11.4. With 3.10.x there is no crash.

xbeastx added the type-crash label Jul 11, 2023
eryksun (Contributor) commented Jul 11, 2023

The stack trace is displayed without private symbols, so the debugger lists addresses relative to exported symbol names such as PyList_Reverse. But an offset of 0x1a44 is a long way from PyList_Reverse. This is actually a bug in stringlib, as shown by debugging a current build of the main branch in source mode:

>  600:             if (!STRINGLIB_BLOOM(mask, ss[i+1])) {
python313_d!stringlib_default_find+0x2c8:
00007ff9`7eeca638 0fbe4001        movsx   eax,byte ptr [rax+1] ds:00000176`db8f0000=??
0:000> kc 5
Call Site
python313_d!stringlib_default_find
python313_d!fastsearch
python313_d!_PyBytes_Find
python313_d!mmap_gfind
python313_d!mmap_find_method

Here's the source of stringlib_default_find(), i.e. STRINGLIB(default_find) in Objects/stringlib/fastsearch.h:

static inline Py_ssize_t
STRINGLIB(default_find)(const STRINGLIB_CHAR* s, Py_ssize_t n,
                        const STRINGLIB_CHAR* p, Py_ssize_t m,
                        Py_ssize_t maxcount, int mode)
{
    const Py_ssize_t w = n - m;
    Py_ssize_t mlast = m - 1, count = 0;
    Py_ssize_t gap = mlast;
    const STRINGLIB_CHAR last = p[mlast];
    const STRINGLIB_CHAR *const ss = &s[mlast];
    unsigned long mask = 0;
    for (Py_ssize_t i = 0; i < mlast; i++) {
        STRINGLIB_BLOOM_ADD(mask, p[i]);
        if (p[i] == last) {
            gap = mlast - i - 1;
        }
    }
    STRINGLIB_BLOOM_ADD(mask, last);
    for (Py_ssize_t i = 0; i <= w; i++) {
        if (ss[i] == last) {
            /* candidate match */
            Py_ssize_t j;
            for (j = 0; j < mlast; j++) {
                if (s[i+j] != p[j]) {
                    break;
                }
            }
            if (j == mlast) {
                /* got a match! */
                if (mode != FAST_COUNT) {
                    return i;
                }
                count++;
                if (count == maxcount) {
                    return maxcount;
                }
                i = i + mlast;
                continue;
            }
            /* miss: check if next character is part of pattern */
            if (!STRINGLIB_BLOOM(mask, ss[i+1])) {
                i = i + m;
            }
            else {
                i = i + gap;
            }
        }
        else {
            /* skip: check if next character is part of pattern */
            if (!STRINGLIB_BLOOM(mask, ss[i+1])) {
                i = i + m;
            }
        }
    }
    return mode == FAST_COUNT ? count : -1;
}
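To see exactly which bytes of s this loop touches, it can be ported to Python and instrumented to record the highest index it reads. This is a sketch for experimentation, not CPython code, and it models FAST_SEARCH mode only. It assumes the bloom macros (not shown above) hash a byte as `ch & (STRINGLIB_BLOOM_WIDTH - 1)` with the width equal to the bit size of `unsigned long` (64 on typical Linux builds, 32 on Windows). `default_find_trace` and its `guard` flag are hypothetical: `guard=True` skips the look-ahead on the final pass (i == w), one possible way to keep the read in bounds, and not necessarily how the duplicate issue was fixed.

```python
def default_find_trace(s: bytes, p: bytes, bloom_width: int = 64,
                       guard: bool = False) -> tuple[int, int]:
    """Python port of STRINGLIB(default_find), FAST_SEARCH mode only.

    Returns (result, max_index_read). max_index_read >= len(s) means the
    loop read past the end of the haystack. Requires 1 <= len(p) <= len(s).
    """
    n, m = len(s), len(p)
    w = n - m
    mlast = m - 1
    gap = mlast
    last = p[mlast]

    def bit(ch: int) -> int:
        # Assumed bloom hash: the low bits of the byte value select one bit.
        return 1 << (ch & (bloom_width - 1))

    mask = 0
    for i in range(mlast):
        mask |= bit(p[i])
        if p[i] == last:
            gap = mlast - i - 1
    mask |= bit(last)

    max_read = -1

    def read(k: int) -> int:
        # Every access to s goes through here so the highest index is kept.
        nonlocal max_read
        max_read = max(max_read, k)
        return s[k] if k < n else 0  # 0 mimics the NUL after a bytes object

    def next_in_pattern(i: int) -> bool:
        # STRINGLIB_BLOOM(mask, ss[i+1]); since ss = s + mlast, that is s[i+m].
        if guard and i == w:
            return True  # hypothetical guard: no look-ahead on the last pass
        return bool(mask & bit(read(i + m)))

    i = 0
    while i <= w:
        if read(i + mlast) == last:           # candidate match: ss[i] == last
            j = 0
            while j < mlast and read(i + j) == p[j]:
                j += 1
            if j == mlast:                    # got a match
                return i, max_read
            i += gap if next_in_pattern(i) else m
        elif not next_in_pattern(i):          # skip
            i += m
        i += 1                                # the C for loop's i++
    return -1, max_read
```

With `b"a" * 8` and `b"@~"`, i steps 0, 3, 6, lands on w = 6, and the look-ahead reads index 8, which equals len(s). With `b"a" * 9`, i jumps from 6 to 9 and never reads past the end. That alignment sensitivity is consistent with the report that changing the file or the pattern made the crash go away. The guard cannot change the result, because after the i == w pass the loop ends regardless of the skip.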

In the source, ss is initialized to &s[mlast], which in this case is s + 1. The value of w is set to n - m, i.e. the size of the string less the size of the searched substring, which in this case is 524286 (524288 - 2). The second loop iterates i up to w inclusive (the loop test is i <= w), so in the last pass i is 524286. The code on line 600 that triggers the access violation peeks ahead to check ss[i + 1], i.e. ss[524287]. Recall that ss is initialized to s + 1, so this code tries to read s[524288], one byte past the end of the input buffer. Since the size of the mapped view in this case is an integral number of pages, even one byte beyond it is unlikely to be in an allocated region of memory. That's the case here, so the out-of-bounds read raises an access violation.
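These numbers can be cross-checked against the registers in the original crash dump, where the faulting instruction is `movsx eax,byte ptr [rdx+r12+1]`. Reading rbx as s, r12 as ss, and rdx as the loop index i is an interpretation of the disassembly rather than something the dump states; under that assumption, the operand works out to exactly the reported fault address, one byte past the 524288-byte view and at the start of a fresh 4 KiB page.

```python
# Register values copied from the crash dump in the report (Windows x64).
rbx = 0x000001C4D1B40000   # assumed: s, start of the mapped view
r12 = 0x000001C4D1B40001   # assumed: ss = s + mlast, with mlast = 1
rdx = 0x000000000007FFFE   # assumed: the loop index i

n, m = 524288, 2           # view size and len(b"@~"), from the analysis
w = n - m                  # largest i allowed by the loop test i <= w
PAGE = 4096                # x64 page size

fault = rdx + r12 + 1      # movsx operand [rdx + r12 + 1], i.e. &ss[i + 1]

print(f"i == w: {rdx == w}")
print(f"ss - s: {r12 - rbx}")
print(f"fault address: {fault:#018x}")
print(f"bytes past start of view: {fault - rbx} (view size {n})")
print(f"fault at page boundary: {fault % PAGE == 0}")
```

The printed fault address, 0x000001c4d1bc0000, matches Parameter[1] in the EXCEPTION_RECORD, and the offset from the start of the view is exactly n.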

eryksun added the interpreter-core and type-bug labels Jul 11, 2023
eryksun changed the title from "Access violation on PyList_Reverse" to "Access violation in stringlib_default_find" Jul 11, 2023
sweeneyde (Member) commented:

I believe this is a duplicate of #105235.

eryksun (Contributor) commented Jul 11, 2023

Yes, this is a duplicate of #105235.

eryksun closed this as not planned Jul 11, 2023