Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shape meta data not cleaned up properly on delete #1550

Closed
kevin-dp opened this issue Aug 21, 2024 · 1 comment · Fixed by #1562
Closed

Shape meta data not cleaned up properly on delete #1550

kevin-dp opened this issue Aug 21, 2024 · 1 comment · Fixed by #1562
Assignees
Labels

Comments

@kevin-dp
Copy link
Contributor

kevin-dp commented Aug 21, 2024

When we delete a shape we are forgetting to clean up some meta data somewhere because the old schema information re-appears if we later create a new shape for the same table.

To reproduce this problem, first create a table:

CREATE TABLE foo (c1 INT PRIMARY KEY); INSERT INTO foo VALUES (1), (2);

Now, fetch the shape (i'm using HTTPie, can be done with curl too):

$ HTTP GET "localhost:3000/v1/shape/foo" offset==-1
HTTP/1.1 200 OK
access-control-allow-methods: GET, POST, OPTIONS
access-control-allow-origin: *
access-control-expose-headers: *
cache-control: max-age=1, stale-while-revalidate=3
content-type: application/json; charset=utf-8
date: Wed, 21 Aug 2024 12:16:36 GMT
etag: 3833821-1724242596930:-1:0_0
server: ElectricSQL/0.3.3
transfer-encoding: chunked
x-electric-chunk-last-offset: 0_0
x-electric-schema: {"c1":{"type":"int4","not_null":true,"pk_index":0}}
x-electric-shape-id: 3833821-1724242596930
x-request-id: F-29gac4MT1aYc4AAAAk

[
    {
        "headers": {
            "operation": "insert"
        },
        "key": "\"public\".\"foo\"/\"1\"",
        "offset": "0_0",
        "value": {
            "c1": "1"
        }
    },
    {
        "headers": {
            "operation": "insert"
        },
        "key": "\"public\".\"foo\"/\"2\"",
        "offset": "0_0",
        "value": {
            "c1": "2"
        }
    },
    {
        "headers": {
            "control": "up-to-date"
        }
    }
]

Ok, we get the shape with the 2 rows. This is fine.
Now drop the table:

DROP TABLE foo;

Now, tell Electric to delete the shape:

$ HTTP DELETE "localhost:3000/v1/shape/foo"
HTTP/1.1 202 Accepted
cache-control: max-age=0, private, must-revalidate
content-length: 0
content-type: application/json; charset=utf-8
date: Wed, 21 Aug 2024 12:17:49 GMT
server: ElectricSQL/0.3.3
vary: accept-encoding
x-request-id: F-29kphNgnGWR_EAAAAj

Ok, the shape is deleted.
Let's recreate the table but with an extra column:

CREATE TABLE foo (c1 INT PRIMARY KEY, c2 TEXT);
INSERT INTO foo VALUES (9, 'bar');

Let's fetch that table:

$ HTTP GET "localhost:3000/v1/shape/foo" offset==-1
HTTP/1.1 200 OK
access-control-allow-methods: GET, POST, OPTIONS
access-control-allow-origin: *
access-control-expose-headers: *
cache-control: max-age=1, stale-while-revalidate=3
content-type: application/json; charset=utf-8
date: Wed, 21 Aug 2024 12:18:11 GMT
etag: 3833821-1724242692287:-1:0_0
server: ElectricSQL/0.3.3
transfer-encoding: chunked
x-electric-chunk-last-offset: 0_0
x-electric-schema: {"c1":{"type":"int4","not_null":true,"pk_index":0}}
x-electric-shape-id: 3833821-1724242692287
x-request-id: F-29l9xY4smjtZsAAABj

[
    {
        "headers": {
            "operation": "insert"
        },
        "key": "\"public\".\"foo\"/\"9\"",
        "offset": "0_0",
        "value": {
            "c1": "9"
        }
    },
    {
        "headers": {
            "control": "up-to-date"
        }
    }
]

Now, the returned data is wrong as the rows only include the c1 column and not the c2 column (also the schema only includes c1).

@alco alco added the bug label Aug 21, 2024
@kevin-dp kevin-dp self-assigned this Aug 21, 2024
@kevin-dp
Copy link
Contributor Author

kevin-dp commented Aug 21, 2024

This seems to be the reason for the bug:

  • when an HTTP request arrives, we create the shape
  • Shape.new loads the column info via the inspector: Inspector.load_column_info
  • The first time, the inspector loads this information from PG and stores it in ETS
  • All subsequent load_column_info calls fetch this information from ETS

Thus, we must ensure that the ETS entry is removed when a table is altered.

kevin-dp added a commit that referenced this issue Sep 2, 2024
Fixes #1550.
This PR extends the shape log collector to also clean the cached column
information from ETS when a relation changes. This ensures that when a
table is migrated, we don't re-use the old table information from ETS
but instead load it from Postgres and re-populate the cache with the new
column info.

**There's one corner case that this PR does not address:**
- Create table, insert data
- Sync a shape containing that table
- Drop the table
- Delete the shape
- Recreate the table but with a different schema
- Insert some data into the new table
- Sync a shape containing the newly recreated table

In the above scenario, when syncing the newly recreated table, we get
the new data but in the old schema (so we only get the columns that also
existed in the old schema). This is because Postgres logical replication
stream does not inform us when a table is dropped.

As a result, we can't detect that a table was dropped and thus don't
know that we need to clean the cached column information. It's only when
we get a Relation message that we know this. But Postgres only sends a
Relation message the first time the data in the table changes and
**we're subscribed to that table in the replication stream** (and that's
only after syncing the shape).

So, the data that was inserted before we synced the table in the last
step, does not lead to a Relation message. Only if we insert data into
the table after that sync step, will Postgres send a Relation message
that will make us clean the cached column information. But note that at
that point the row that was inserted in the previous step is already
stored in storage in the format of the old schema.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants