Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segfault when setting chunk replica identity #7434

Conversation

fabriziomello
Copy link
Contributor

@fabriziomello fabriziomello commented Nov 11, 2024

The segfault happened executing AlterTableInternal for setting the REPLICA IDENTITY during chunk creation (introduced by #5512) that is dispatched by a trigger invoked by a DDL statement.

Fixed it by using our internal ts_alter_table_with_event_trigger handler function to setup the event trigger context.

Fixes #7406

@fabriziomello fabriziomello self-assigned this Nov 11, 2024
@fabriziomello fabriziomello force-pushed the fix_segfault_when_setting_replica_identity_for_chunk branch from 2951ac2 to eea4b71 Compare November 11, 2024 21:14
Copy link

codecov bot commented Nov 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.04%. Comparing base (59f50f2) to head (2ba3df8).
Report is 588 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7434      +/-   ##
==========================================
+ Coverage   80.06%   81.04%   +0.97%     
==========================================
  Files         190      229      +39     
  Lines       37181    42514    +5333     
  Branches     9450    10625    +1175     
==========================================
+ Hits        29770    34455    +4685     
- Misses       2997     3749     +752     
+ Partials     4414     4310     -104     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@fabriziomello fabriziomello force-pushed the fix_segfault_when_setting_replica_identity_for_chunk branch 2 times, most recently from 5897b74 to 33b1722 Compare November 12, 2024 15:54
@fabriziomello fabriziomello enabled auto-merge (rebase) November 12, 2024 18:53
Copy link
Contributor

@mkindahl mkindahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit comment is very focused on the code, so you might want to re-write it to explain the context so your future self can understand what the problem was.

This problem can occur when you insert into a hypertable from inside a trigger, so starting there might be a good idea, that is:

Insert into hypertable calls AlterTableInternal [why?]... if called from inside a trigger, this will not have currentCommand set up [why and why does it work if not in a trigger?]... we need to wrap the call in ...Start and ...End.

A few suggestions for adding comments and one more call to AlterTableInternal that you might want to deal with (but fine to submit as separate PR if you prefer that).

Approving since the code looks fine and I see no issues with the fix.

src/chunk.c Outdated
Comment on lines 902 to 916
ts_catalog_database_info_become_owner(ts_catalog_database_info_get(), &sec_ctx);
EventTriggerAlterTableStart((Node *) &cmd);
AlterTableInternal(chunk->table_id, list_make1(&cmd), false);
EventTriggerAlterTableEnd();
ts_catalog_restore_user(&sec_ctx);
table_close(ht_rel, NoLock);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be called in a nested fashion since the start and end commands remember the previous value of currentCommand. You might want to add a comment explaining why the start and end are necessary though, for your future self.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we have a function ts_alter_table_with_event_trigger to wrap the event trigger context and execute the AlterTableInternal that have the enough comments about why it is necessary.

src/chunk.c Show resolved Hide resolved
src/chunk.c Outdated
@@ -900,7 +900,9 @@ chunk_set_replica_identity(const Chunk *chunk)
}

ts_catalog_database_info_become_owner(ts_catalog_database_info_get(), &sec_ctx);
EventTriggerAlterTableStart((Node *) &cmd);
AlterTableInternal(chunk->table_id, list_make1(&cmd), false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first argument to AlterTableInternal should be an OID and it indeed looks like table_id is taken from the ObjectAddress for the chunk. You might want to double-check, but it seems to make no difference for this particular case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the chunk->table_id is the OID of the chunk relation returned by the ts_chunk_create_table that internally returns the ObjectAddress->objectId returned by DefineRelation so we're fine here.

@fabriziomello
Copy link
Contributor Author

The commit comment is very focused on the code, so you might want to re-write it to explain the context so your future self can understand what the problem was.

Ok.

This problem can occur when you insert into a hypertable from inside a trigger, so starting there might be a good idea, that is:

Will recheck but it was not so simple like this to reproduce the case.

Insert into hypertable calls AlterTableInternal [why?]...

Because of this PR #5515. But looking now at the implementation looks like we didn't did a good job because we always call the AlterTableInternal, so I think we should improve it a bit:

➜ git diff                                                         
diff --git a/src/chunk.c b/src/chunk.c
index c2535d368..89105ba4a 100644
--- a/src/chunk.c
+++ b/src/chunk.c
@@ -875,6 +875,14 @@ static void
 chunk_set_replica_identity(const Chunk *chunk)
 {
    Relation ht_rel = relation_open(chunk->hypertable_relid, AccessShareLock);
+
+   /* Do nothing if hypertable is using the default replica identity */
+   if (ht_rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT)
+   {
+       table_close(ht_rel, NoLock);
+       return;
+   }
+
    ReplicaIdentityStmt stmt = {
        .type = T_ReplicaIdentityStmt,
        .identity_type = ht_rel->rd_rel->relreplident,

if called from inside a trigger, this will not have currentCommand set up [why and why does it work if not in a trigger?]... we need to wrap the call in ...Start and ...End.

Will recheck!

A few suggestions for adding comments and one more call to AlterTableInternal that you might want to deal with (but fine to submit as separate PR if you prefer that).

The other one need to investigate if it is possible to raise the same problem.

@fabriziomello fabriziomello force-pushed the fix_segfault_when_setting_replica_identity_for_chunk branch 3 times, most recently from 49d64fd to 138d415 Compare November 13, 2024 20:19
The segfault happened executing `AlterTableInternal` for setting the
REPLICA IDENTITY during chunk creation (introduced by timescale#5512) that is
dispatched by a trigger invoked by a DDL statement.

Fixed it by using our internal `ts_alter_table_with_event_trigger`
handler function to setup the event trigger context.

Fixes timescale#7406
@fabriziomello fabriziomello force-pushed the fix_segfault_when_setting_replica_identity_for_chunk branch from 138d415 to 2ba3df8 Compare November 13, 2024 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: PG Segfault when ETL upsert triggers insert to empty hypertable
4 participants