Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🤔 Discussion — SQLite: disabling foreign keys for migration/import? #2471

Closed
geelen opened this issue Aug 1, 2024 · 5 comments

Comments

@geelen
Copy link
Collaborator

geelen commented Aug 1, 2024

We have a couple of related issues:

1. D1 export produces unimportable files

D1 Export follows SQL dump, which outputs SQL in the following order for a DB with two tables, A and B:

  1. pragma defer_foreign_keys=TRUE
  2. Table definition A
  3. Inserts for row A
  4. Table definition B
  5. Inserts for row B

If table A has a column that references B, the INSERT on step 3 will fail with Parse error: no such table: main.B. If we reorder 3 and 4, then defer_foreign_keys does what you'd expect and allows the rows in A to be added with no corresponding rows in B to reference (without defer_foreign_keys, step 3 would fail with FOREIGN KEY constraint failed).

This appears to be an unavoidable difference between SQLite's default behaviour of setting PRAGMA foreign_keys=OFF; at the beginning of DB exports, and D1/workerd using pragma defer_foreign_keys=TRUE. From the SQLite docs, these are "DML errors" and are separate to the section on immediate/deferred constraints:

If the database schema contains foreign key errors that require looking at more than one table definition to identify, then those errors are not detected when the tables are created. Instead, such errors prevent the application from preparing SQL statements that modify the content of the child or parent tables in ways that use the foreign keys. Errors reported when content is changed are "DML errors" and errors reported when the schema is changed are "DDL errors". So, in other words, misconfigured foreign key constraints that require looking at both the child and parent are DML errors. The English language error message for foreign key DML errors is usually "foreign key mismatch" but can also be "no such table" if the parent table does not exist. Foreign key DML errors are reported if:

• The parent table does not exist, or
• The parent key columns named in the foreign key constraint do not exist, or
• The parent key columns named in the foreign key constraint are not the primary key of the parent table and are not subject to a unique constraint using collating sequence specified in the CREATE TABLE, or
• The child table references the primary key of the parent without specifying the primary key columns and the number of primary key columns in the parent do not match the number of child key columns.

The easy solution would be to reorder the output of d1 export so that all tables are generated first, then all INSERT statements. That should make all foreign key violations deferrable, as there's no way for a dump to contain a DML error. But that doesn't solve the related issues.

2. Dropping tables in migrations cannot prevent ON DELETE CASCADE (workerd-issue prisma-issue)

Some table altering tasks in SQLite cannot be achieved without replacing it with a new table (by dropping - recreating the table) (like adding foreign keys, changing primary keys, updating column type). Dropping a table which has column referenced by other tables with ON DELETE CASCADE will delete all records of the child tables.

This is also an issue with pragma defer_foreign_keys=TRUE not being a substitute for PRAGMA foreign_keys=OFF;. Tools like prisma, aware of SQLite's limitations, generate migrations like the following to migrate between table schemas (from this reproduction, thank you @hrueger!):

-- Initial schema
CREATE TABLE "User" (
    "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
    "email" TEXT NOT NULL
);
CREATE TABLE "Post" (
    "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
    "title" TEXT NOT NULL,
    "authorId" INTEGER,
    CONSTRAINT "Post_authorId_fkey" FOREIGN KEY ("authorId") REFERENCES "User" ("id") ON DELETE SET NULL ON UPDATE CASCADE
);
-- Generated migration for adding 'name' to 'User' table:
PRAGMA defer_foreign_keys=ON;
PRAGMA foreign_keys=OFF;
CREATE TABLE "new_User" (
    "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
    "email" TEXT NOT NULL,
    "name" TEXT NOT NULL DEFAULT 'Anonymous'
);
INSERT INTO "new_User" ("email", "id") SELECT "email", "id" FROM "User";
DROP TABLE "User";
ALTER TABLE "new_User" RENAME TO "User";
CREATE UNIQUE INDEX "User_email_key" ON "User"("email");
PRAGMA foreign_keys=ON;
PRAGMA defer_foreign_keys=OFF;

It's not clear to me how much of the above is D1-specific (the absence of a BEGIN TRANSACTION is surely specific to us, for example), but since D1 always runs queries within transactions, the PRAGMA foreign_keys=OFF; statement has no effect (which is why we use defer_foreign_keys in the first place). The result is that D1 runs all ON DELETE clauses when you drop the old table, resulting in lost data.

3. No support for D1 export for databases containing FTS5 full-text search tables (workers-sdk issue)

We only support one kind of virtual table, FTS5, for full text search. But, when porting SQLite's .dump command we found that it would generate SQL requiring writable_schema, a pragma we don't support.

This input SQL:

CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL, content TEXT NOT NULL);
CREATE VIRTUAL TABLE documents_fts USING FtS5(id, title, content, tokenize = porter);

Generates the following SQL using SQLite's native .dump command:

PRAGMA writable_schema=ON;
INSERT INTO sqlite_schema(type,name,tbl_name,rootpage,sql)VALUES('table','documents_fts','documents_fts',0,'CREATE VIRTUAL TABLE documents_fts USING FtS5(id, title, content, tokenize = porter)');
CREATE TABLE IF NOT EXISTS 'documents_fts_data'(id INTEGER PRIMARY KEY, block BLOB);
INSERT INTO documents_fts_data VALUES(1,X'');
INSERT INTO documents_fts_data VALUES(10,X'00000000000000');
CREATE TABLE IF NOT EXISTS 'documents_fts_idx'(segid, term, pgno, PRIMARY KEY(segid, term)) WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS 'documents_fts_content'(id INTEGER PRIMARY KEY, c0, c1, c2);
CREATE TABLE IF NOT EXISTS 'documents_fts_docsize'(id INTEGER PRIMARY KEY, sz BLOB);
CREATE TABLE IF NOT EXISTS 'documents_fts_config'(k PRIMARY KEY, v) WITHOUT ROWID;
INSERT INTO documents_fts_config VALUES('version',4);
PRAGMA writable_schema=OFF;

We chose to disallow D1 exports containing virtual tables until we could find a resolution, as we didn't want to allow writable_schema (it requires SQLITE_DBCONFIG_DEFENSIVE to be disabled). Our best option was to skip the create table statements for the generated tables (documents_fts_data, documents_fts_idx, documents_fts_content documents_fts_docsize and documents_fts_config) and instead of the INSERT INTO sqlite_schema call, instead recreate the CREATE VIRTUAL TABLE that created it. The result won't necessarily be identical, but should be functionally equivalent for normal use. But maybe we should explore alternatives.

Proposal: SQL unsafe mode for D1 migrations

The lack of PRAGMA foreign_keys=OFF stems from the fact that all workerd queries execute within a transaction, which is implicitly created whenever a SQL query includes a write statement. If we had an API that let us execute a pragma without this logic, we could support importing from unmodified SQL dumps.

I propose the following api, sql.unsafeExec, which instead of implicitly creating a transaction, requires the user to submit SQL with multiple statements:

sql.execUnsafe(`
  PRAGMA ...;
  BEGIN;
  ...
  COMMIT;
`)

unsafeExec enforces these constraints:

  • Within an .execUnsafe block, a different authorizer is invoked, allowing BEGIN/COMMIT/ROLLBACK, as well as SAVEPOINT/RELEASE
  • Any write statement outside of a transaction result in an error
  • Multiple sequential transactions in a single call are permitted

This doesn't give us writable_schema without disabling SQLITE_DBCONFIG_DEFENSIVE (which I don't know what the implications of that would be, so I'm not suggesting it here), but we could wire up D1 to call execUnsafe on migrations/import and allow tools like Prisma to manage the schema without needing so much special treatment for D1.

This solves problem 1 and 2, and gives us a potential way to solve problem 3 in the future. Thoughts?

@kentonv
Copy link
Member

kentonv commented Aug 1, 2024

No, I don't want to give applications the ability to execute transaction/savepoint statements directly, as this will make it difficult to impossible for the runtime code to keep track of the current state of the transaction stack, which it needs to do in order to implement implicit transactions and such.

Instead, perhaps we should provide a more explicit API for setting the foreign_keys pragma, so that it can reliably be set before the transaction begins.

@geelen
Copy link
Collaborator Author

geelen commented Aug 2, 2024

Right, well I missed something here. The lack of PRAGMA foreign_keys = OFF; support is not actually a workerd issue, it's specific to D1's usage of workerd. For example, the following does work:

state.blockConcurrencyWhile(async () => {
  sql.exec(`
    PRAGMA foreign_keys = OFF;
    CREATE TABLE IF NOT EXISTS "A" (
      "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
      "bId" INTEGER NOT NULL REFERENCES "B" ("id") ON DELETE RESTRICT ON UPDATE CASCADE
    );
    INSERT INTO A VALUES(1,1);
    CREATE TABLE IF NOT EXISTS "B" (
      "id" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT
    );
    INSERT INTO B VALUES(1);
  `)
})
state.blockConcurrencyWhile(async () => {
  sql.exec(`PRAGMA foreign_keys = ON;`)
})

I mistakenly thought that notifyWrite was invoked for all pragma statements, but we only use it for SAVEPOINT (invoked by transactionSync, which D1 uses for all queries).

Gut feel says the simplest thing would be if D1 detects a migration/import starting with PRAGMA foreign_keys = OFF; it does the required blockConcurrencyWhile incantations to run it without foreign key support, then turn foreign keys back on. There'll be no way to detect that pragma within a file, but I think that's ok for now.

@geelen
Copy link
Collaborator Author

geelen commented Aug 2, 2024

@kentonv seeing some strange behaviour here. In my test, the following all pass:

state.blockConcurrencyWhile(async () => {
  sql.exec(`PRAGMA foreign_keys = OFF;`)
  // the SQL that needs foreign_keys=OFF to work
  sql.exec(`CREATE A ...; INSERT INTO A ...; CREATE B ...;`)
})
state.blockConcurrencyWhile(async () => {
  sql.exec(`PRAGMA foreign_keys = OFF;`)
})
state.blockConcurrencyWhile(async () => {
  sql.exec(`CREATE A ...; INSERT INTO A ...; CREATE B ...;`)
})
state.blockConcurrencyWhile(async () => {
  sql.exec(`PRAGMA foreign_keys = OFF;`)
})
await scheduler.wait(1)
sql.exec(`CREATE A ...; INSERT INTO A ...; CREATE B ...;`)

But these fail with Parse error: no such table: main.B:

state.blockConcurrencyWhile(async () => {
  sql.exec(`PRAGMA foreign_keys = OFF;`)
})
sql.exec(`CREATE A ...; INSERT INTO A ...; CREATE B ...;`)
state.blockConcurrencyWhile(async () => {
  sql.exec(`PRAGMA foreign_keys = OFF;`)
})
storage.transactionSync(() => {
  sql.exec(`CREATE A ...; INSERT INTO A ...; CREATE B ...;`)
})

The lattermost was how I was going to implement this in D1. Is it obvious to you why that's not working?

@kentonv
Copy link
Member

kentonv commented Aug 2, 2024

blockConcurrencyWhile returns a promise. You have to await it. The inner callback is not called right away.

But if the inner callback doesn't do anything async anyway, then blockConcurrencyWhile() doesn't really accomplish anything.

@geelen
Copy link
Collaborator Author

geelen commented Aug 5, 2024

Oh of course, I'm so used to everything being synchronous. I've raised #2479 because I think that's the only workerd change we need to go and build this all in D1.

@geelen geelen closed this as completed Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants