Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLite connections are not closed when calling .close() in 3.11 #103837

Closed
jhoekx opened this issue Apr 25, 2023 · 25 comments
Closed

SQLite connections are not closed when calling .close() in 3.11 #103837

jhoekx opened this issue Apr 25, 2023 · 25 comments
Assignees
Labels
topic-sqlite3 type-bug An unexpected behavior, bug, or error

Comments

@jhoekx
Copy link

jhoekx commented Apr 25, 2023

Bug report

In our tests we are using in-memory SQLite databases with a shared cache. This makes it possible to create multiple connections with the same connection string and see the same data, as recommended in In-Memory Databases. After updating from Python 3.10 to 3.11 our tests started failing.

The minimal reproduction case:

import sqlite3
import pytest

def write_value():
    with sqlite3.connect("file:test?mode=memory&cache=shared", uri=True) as db:
        db.execute("insert into Test (value) values ('hello')")


@pytest.fixture
def db():
    conn_1 = sqlite3.connect("file:test?mode=memory&cache=shared", uri=True)
    conn_1.execute("create table Test (value text, unique(value))")
    yield write_value
    conn_1.close()


def test_hello_1(db):
    db()


def test_hello_2(db):
    db()

This works in 3.10 (and worked in 3.8 and 3.9), but fails in 3.11:

ERROR sqlite-connection-close.py::test_hello_2 - sqlite3.OperationalError: table Test already exists

The SQLite documentations includes:

The database is automatically deleted and memory is reclaimed when the last connection to the database closes.

We used a workaround mentioned in #97641 for now (calling gc.collect()).
I opened a new issue, since that one looks similar, but is Windows specific, while this one happens on Linux as well.

Your environment

  • CPython versions tested on: 3.10.10, 3.11.3
  • Operating system and architecture: both on Ubuntu 22.04 (from deadsnakes PPA) and Arch (AUR)
  • SQLite versions: 3.37.2 and 3.41.2
@jhoekx jhoekx added the type-bug An unexpected behavior, bug, or error label Apr 25, 2023
@erlend-aasland erlend-aasland self-assigned this Apr 25, 2023
@erlend-aasland
Copy link
Contributor

erlend-aasland commented May 8, 2023

I cannot reproduce this on 3.11.3 or 3.12.0a7 (macOS).

@erlend-aasland
Copy link
Contributor

I cannot reproduce with 3.11 on Ubuntu 22.04 either. Unless you have a more deterministic reproducer, I'm not sure what to do with this :(

@WouterWouters
Copy link

For me it fails on 3.11.3 (macOS). Running python -m pytest test.py results in:

test.py .E                                                                                                                                                       [100%]

================================================================================ ERRORS ================================================================================
____________________________________________________________________ ERROR at setup of test_hello_2 ____________________________________________________________________

    @pytest.fixture
    def db():
        conn_1 = sqlite3.connect("file:test?mode=memory&cache=shared", uri=True)
>       conn_1.execute("create table Test (value text, unique(value))")
E       sqlite3.OperationalError: table Test already exists

test.py:12: OperationalError
======================================================================= short test summary info ========================================================================
ERROR test.py::test_hello_2 - sqlite3.OperationalError: table Test already exists
====================================================================== 1 passed, 1 error in 0.06s ======================================================================

@erlend-aasland

This comment was marked as outdated.

@erlend-aasland
Copy link
Contributor

What's the output of this script?

import sqlite3
cx = sqlite3.connect(":memory:")
for row in cx.execute("pragma compile_options"):
    print(row[0])

@erlend-aasland

This comment was marked as outdated.

@erlend-aasland
Copy link
Contributor

All right, I can reproduce this now. Looking into it.

@erlend-aasland
Copy link
Contributor

Yeah, I can confirm this is caused the cyclic reference between the statement cache and the connection. I'm not sure what's the best solution for this ATM.

@erlend-aasland
Copy link
Contributor

@vstinner, do you have tips for how we can handle this better? Some background:

In f461a7f, I tore out the _sqlite3 LRU cache and replaced it with functools.lru_cache. It seemed like a good idea at the time: less code to maintain, don't repeat yourself (don't repeat functools), and remove the old fishy refcount manipulation that was used to avoid the cyclic ref problems. With functools.lru_cache, that ref count hack could be removed. As Python 3.11 gets wider adoption, we now get reports (see also gh-97641) where third party test suites start failing because the sqlite3 connection objects are not collected, because of the ref cycle not being explicitly broken. Eryk Sun suggested wrapping functools.lru_cache in a shim and handling the cycle there, kinda like the old code; it may work. Previously, I was not convinced this was a real bug; Python delivers no guarantees on when objects are collected, so explicit resource management is required where deterministic output is expected (the case in test suites).

@jhoekx
Copy link
Author

jhoekx commented May 8, 2023

Python delivers no guarantees on when objects are collected, so explicit resource management is required where deterministic output is expected (the case in test suites).

From the Python user perspective: we are calling .close() to achieve exactly that (the title given to this issue).

@erlend-aasland
Copy link
Contributor

.close does not guarantee that an object is collected or deallocated, and it never has. Perhaps you meant del con?

@jhoekx
Copy link
Author

jhoekx commented May 8, 2023

Sorry, I was imprecise when quoting. As a user, indeed, I do not care when an object is collected. I do care that the resource that object holds is released. To me, this issue feels as if calling .close() on an open file would not close the underlying file descriptor. (Been there with the JVM not releasing mmaps until you force the GC to run)

Are there any other workarounds besides calling gc.collect() - which slows down the test suite a lot? I had tried del, but I do not understand it enough to know how that could influence things. I also tried setting cached_statements=0 for every connection I create, but that also doesn't solve the issue. Our app uses the SQLite as an application file format approach, so keeping connections open to every database all the time would also be quite wasteful.

@vstinner
Copy link
Member

vstinner commented May 9, 2023

In general, it's a bad idea to rely on the GC to release resources. This assumption is wrong on PyPy which has a different GC implementation.

I strongly prefer explicit resource management. Usually, a close() method should be called. And a ResourceWarning is emitted if a resource is not released explicitly.

In the first example, it's unclear to me when and how write_value() connection is supposed to be closed. For me, a ResourceWarning should be emitted, and the connection should be closed explicitly.

@erlend-aasland
Copy link
Contributor

erlend-aasland commented May 10, 2023

In the sqlite3 module, the underlying SQLite C API we use is sqlite3_close Ah, no we're using sqlite3_close_v2, not sqlite3_close. Perhaps using the latter will help improve things. Quoting the SQLite docs:

If the database connection is associated with unfinalized prepared statements, BLOB handlers, and/or unfinished sqlite3_backup objects then sqlite3_close() will leave the database connection open and return SQLITE_BUSY.

We're asserting that the sqlite3_close sqlite3_close_v2 API return SQLITE_OK1. This means that as far as we can tell (from the information given to us by the SQLite library), the connection was closed. It should not matter if the sqlite3.Connection object lives longer, as it now only wraps a closed sqlite3 * database pointer.

EDIT: I misremembered; we're using sqlite3_close_v2, not sqlite3_close. I'll switch to the former and see what happens.

Footnotes

  1. come think of it, we should probably add code for emitting a ResourceWarning instead of just asserting the assumed return value

@erlend-aasland
Copy link
Contributor

Quick update after #103837 (comment):

Changing the underlying SQLite C API from sqlite3_close_v2 (which might not deallocate the connection immediately) to sqlite3_close (which require explicit resource management wrt. the SQLite data structures) did not help; SQLite reports SQLITE_OK, indicating that the connection was closed and deallocated. Unfortunately, the error still occurs.

@jhoekx
Copy link
Author

jhoekx commented May 10, 2023

Ok. I think you can close this issue.

The documentation clearly spells out that what we're doing is based on incorrect assumptions.

I assumed that the context manager closed the connection, but it only manages a transaction. As @vstinner points out, we are not calling close() on the connection. Then I can agree that this is not a bug.

If I change the sample code to this, then the test passes:

import sqlite3
import pytest

def write_value():
    db = sqlite3.connect("file:test?mode=memory&cache=shared", uri=True)
    with db:
        db.execute("insert into Test (value) values ('hello')")
    db.close()


@pytest.fixture
def db():
    conn_1 = sqlite3.connect("file:test?mode=memory&cache=shared", uri=True)
    conn_1.execute("create table Test (value text, unique(value))")
    yield write_value
    conn_1.close()


def test_hello_1(db):
    db()


def test_hello_2(db):
    db()

Thanks and sorry for the 'panic'.

@erlend-aasland
Copy link
Contributor

I did not spot the missing close! That explains the problem, indeed 😃 I still think we should consider changing the underlying API so we can emit a resource warning if close failed.

@erlend-aasland erlend-aasland closed this as not planned Won't fix, can't repro, duplicate, stale May 10, 2023
@erlend-aasland
Copy link
Contributor

Also, check out contextlib.closing, if you did not know about that one already :)

@jhoekx
Copy link
Author

jhoekx commented May 10, 2023

In our application we do have a context manager that closes the connection in a finally. It also sets the PRAGMAs we need before or after use and starts a transaction immediately on writes so we don't get hit with an SQLITE_BUSY when the transaction turns from reading to writing. That is why I made the mistake in the 'reproduction' case - we never need to close() it manually.

But obviously there is a place where we're not using that context manager in the proper way. We did have some errors in the past we could not explain on application startup where a database was busy during database migrations. That's also the place were we use that shared memory trick with in-memory databases for during tests. More work needed on our side.

Thanks again.

@vstinner
Copy link
Member

changing the underlying API so we can emit a resource warning if close failed.

In the Python developer mode, FileIO.close() logs an unraisable exception on error. It's only in developer mode to avoid any backward compatibilty issue, but I always want to always log this unraisable exception.

ResourceWarning is more if the user didn't call close() explicitly (which is a different case).

@vstinner
Copy link
Member

IMO sqlite3 should be modified to emit a ResourceWarning.

Currently, no ResourceWarning is emitted when a connection is not closed explicitly:

$ ./python -Wdefault
Python 3.12.0a7+ (heads/main:fe694a6db6, May 10 2023, 00:29:39) [GCC 12.2.1 20221121 (Red Hat 12.2.1-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> con=sqlite3.connect("file:test?mode=memory&cache=shared", uri=True)
>>> con=None
>>> import gc; gc.collect()
32

@erlend-aasland
Copy link
Contributor

IMO sqlite3 should be modified to emit a ResourceWarning.

Currently, no ResourceWarning is emitted when a connection is not closed explicitly:

Yes, I support this, as stated in #103837 (comment)

@vstinner
Copy link
Member

Oh ok. I was just confused by "if close failed" in your statement.

@erlend-aasland
Copy link
Contributor

Oh ok. I was just confused by "if close failed" in your statement.

Yeah, I got too side tracked to see the real issue.

@erlend-aasland
Copy link
Contributor

IMO sqlite3 should be modified to emit a ResourceWarning.

Currently, no ResourceWarning is emitted when a connection is not closed explicitly:

See gh-108015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-sqlite3 type-bug An unexpected behavior, bug, or error
Projects
Status: Discarded
Development

No branches or pull requests

4 participants