Bulk unique insertion, uniqueness with subset of args #590
Conversation
// JobArgs is an interface that should be implemented by the arguments to a job.
// This definition duplicates the JobArgs interface in the river package so that
// it can be used in other packages without creating a circular dependency.
type JobArgs interface {
	// Kind returns a unique string that identifies the type of job. It's used to
	// determine which worker should work the job.
	Kind() string
}
Needed to use this from `dbunique` so I copied the interface into here. But then I ended up adding an unexported `jobArgs` within that package anyway—maybe that's better than exposing a duplicate in a user-facing package like this?

Edit: Actually, I did need this from other places as well. It has to be available as part of `riverdriver.JobInsertFastParams` as well as `internal/dbunique`.

Can keep it here in `rivertype`, or move to `rivershared`. Lmk what seems better.
github.com/tidwall/gjson v1.17.3
github.com/tidwall/sjson v1.2.5
`gjson` and `sjson` were super handy for efficiently extracting a subset of keys from a JSON payload and reconstructing them into a new sorted subset payload. We could figure out a way to do this without them, but it would likely be slower (if using `encoding/json`) or a lot more verbose (if building up our own JSON bytes).

They're also super small dependencies with minimal indirects, so I feel like it's not a big deal to pull them in. Plus they're also used in riverpro.
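As a rough illustration of the kind of helper these libraries make easy (this is not River's actual code; the function name and shape are assumptions), a whitelisted subset of already-encoded args can be pulled out and rebuilt into a deterministic, sorted payload:

```go
package main

import (
	"fmt"
	"sort"

	"github.com/tidwall/gjson"
	"github.com/tidwall/sjson"
)

// subsetJSON extracts only the given keys from encodedArgs and rebuilds them
// into a new payload. Keys are sorted first so the result is deterministic.
func subsetJSON(encodedArgs []byte, keys []string) ([]byte, error) {
	sort.Strings(keys)

	out := []byte(`{}`)
	for _, key := range keys {
		result := gjson.GetBytes(encodedArgs, key)
		if !result.Exists() {
			continue // fields omitted from the encoded args are omitted here too
		}

		var err error
		// SetRawBytes copies the already-encoded value verbatim, avoiding a
		// decode/re-encode round trip through encoding/json.
		out, err = sjson.SetRawBytes(out, key, []byte(result.Raw))
		if err != nil {
			return nil, err
		}
	}
	return out, nil
}

func main() {
	args := []byte(`{"customer_id":"123","body":"hello","included":true}`)
	subset, _ := subsetJSON(args, []string{"included", "customer_id"})
	fmt.Println(string(subset)) // {"customer_id":"123","included":true}
}
```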
// Advisory locks are currently only used for the fallback/slow path of
// unique job insertion where finalized states are included in a ByState
// configuration.
AdvisoryLockPrefix int32
Once we complete the deprecation and eventual removal of the advisory lock based unique jobs, would you want to just nuke this config altogether? It's possible we could want to use advisory locks again at some point but it'd be easy enough to re-add.
JobInsertFast(ctx context.Context, params *JobInsertFastParams) (*JobInsertFastResult, error)
JobInsertFastMany(ctx context.Context, params []*JobInsertFastParams) ([]*JobInsertFastResult, error)
This relates to the API changes from #584 so we may want to align on the right change to make here.
// Args contains the raw underlying job arguments struct. It has already been
// encoded into EncodedArgs, but the original is kept here to leverage its
// struct tags and interfaces, such as for use in unique key generation.
Args rivertype.JobArgs
Related to discussion from #584, this was added to facilitate tag reflection on the `JobArgs` struct, even though it's otherwise unused (`EncodedArgs` are already encoded at this point).
@@ -5,6 +5,7 @@ go 1.21
toolchain go1.23.0

require (
	github.com/jackc/pgx/v5 v5.7.1
Hmm, I guess this is what I get for using `pgtype.Bits` for the bitmask column. I'd be shocked if pq properly supports that. Thoughts on what to do?
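For reference, a minimal sketch of what the pgx side looks like, assuming pgx v5's `pgtype.Bits` type with its `Bytes`/`Len`/`Valid` fields; the connection string and round-trip query are illustrative only, not River code:

```go
package main

import (
	"context"
	"fmt"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgtype"
)

func main() {
	ctx := context.Background()

	// Hypothetical connection string for the sketch.
	conn, err := pgx.Connect(ctx, "postgres://localhost:5432/river_test")
	if err != nil {
		panic(err)
	}
	defer conn.Close(ctx)

	// pgtype.Bits carries raw bytes plus a bit length, mapping directly onto
	// Postgres's bit(n) type. lib/pq has no equivalent, which is the
	// compatibility concern above.
	states := pgtype.Bits{Bytes: []byte{0b11110101}, Len: 8, Valid: true}

	var roundTripped pgtype.Bits
	if err := conn.QueryRow(ctx, `SELECT $1::bit(8)`, states).Scan(&roundTripped); err != nil {
		panic(err)
	}
	fmt.Printf("%08b\n", roundTripped.Bytes[0])
}
```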
ON CONFLICT (unique_key)
    WHERE unique_key IS NOT NULL
      AND unique_states IS NOT NULL
      AND river_job_state_in_bitmask(unique_states, state)
I wish there were a better way, but AFAICT this is the only way to indicate the conflict given the specific partial index in use here. It can't be a named `CONSTRAINT` because it's a partial index.
WHEN 'available' THEN get_bit(bitmask, 7)
WHEN 'cancelled' THEN get_bit(bitmask, 6)
WHEN 'completed' THEN get_bit(bitmask, 5)
WHEN 'discarded' THEN get_bit(bitmask, 4)
WHEN 'pending'   THEN get_bit(bitmask, 3)
WHEN 'retryable' THEN get_bit(bitmask, 2)
WHEN 'running'   THEN get_bit(bitmask, 1)
WHEN 'scheduled' THEN get_bit(bitmask, 0)
Per the note on the Go bitmask code, Postgres bit positions are numbered left-to-right instead of a more conventional right-to-left.
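A minimal sketch of how that mapping works out, consistent with the SQL above but not necessarily identical to River's actual bitmask code: packing states with the conventional least-significant-bit-first order means the SQL reads Go bit `i` back as `get_bit(bitmask, 7-i)`.

```go
package main

import "fmt"

// Alphabetical state order, matching the CASE expression above: Go bit i
// (least significant = 0) corresponds to Postgres get_bit position 7-i,
// because Postgres numbers bit(8) positions left to right.
var stateBitIndex = map[string]uint{
	"available": 0,
	"cancelled": 1,
	"completed": 2,
	"discarded": 3,
	"pending":   4,
	"retryable": 5,
	"running":   6,
	"scheduled": 7,
}

// uniqueStatesToBitmask packs a list of state names into a single byte.
func uniqueStatesToBitmask(states []string) byte {
	var mask byte
	for _, state := range states {
		if i, ok := stateBitIndex[state]; ok {
			mask |= 1 << i
		}
	}
	return mask
}

func main() {
	// The default unique states (everything but cancelled and discarded).
	mask := uniqueStatesToBitmask([]string{
		"available", "completed", "pending", "retryable", "running", "scheduled",
	})
	// Prints 11110101, the same left-to-right order Postgres shows for bit(8);
	// get_bit(mask, 7) is the rightmost bit, i.e. 'available'.
	fmt.Printf("%08b\n", mask)
}
```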
As detailed in #600, there are certain combinations of schema changes which are not allowed to be run within the same transaction. The example we encountered with #590 is adding a new enum value, then using it in an immutable function during a subsequent migration. In Postgres, these must be separated by a commit. There are other examples of things which cannot be run in a transaction, such as `CREATE INDEX CONCURRENTLY`. While that specific one isn't solved here, moving away from a migrator that bundles migrations into a single transaction will also allow us to update our migration system to exclude certain migrations from transactions and, for example, add indexes concurrently.
Included bool `json:"included" unique:"true"`
Excluded bool `json:"excluded"`
I had another design idea on this. Instead of `unique:"true"`, what if this was something like `river:"unique"`? There would be room to apply additional comma-separated options in that list, such as for batching or sequence keys.

I think I like that more than having a bunch of separate tags with vague names like `unique`, `batch`, etc. Those could be prefixed with `river_` of course, but that'd be long and redundant, plus not a particularly efficient way to declare boolean attrs.
Changed to `river:"unique"` to better and more cleanly support future options. Seems less vague/generic too, which is nice.
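For illustration, a sketch of how a `river:"unique"` tag with comma-separated options could be detected via reflection; this is not River's implementation, and the struct just mirrors the example fields from this diff:

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

type ExampleArgs struct {
	Included bool `json:"included" river:"unique"`
	Excluded bool `json:"excluded"`
}

// uniqueFields returns the JSON names of fields carrying the "unique" option
// inside a river:"..." struct tag.
func uniqueFields(args any) []string {
	var fields []string
	t := reflect.TypeOf(args)
	for i := 0; i < t.NumField(); i++ {
		field := t.Field(i)
		for _, opt := range strings.Split(field.Tag.Get("river"), ",") {
			if opt == "unique" {
				// Prefer the JSON name, falling back to the Go field name.
				name := strings.Split(field.Tag.Get("json"), ",")[0]
				if name == "" {
					name = field.Name
				}
				fields = append(fields, name)
			}
		}
	}
	return fields
}

func main() {
	fmt.Println(uniqueFields(ExampleArgs{})) // [included]
}
```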
@@ -139,6 +148,50 @@ func TestJobScheduler(t *testing.T) {
		requireJobStateUnchanged(t, bundle.exec, retryableJob3) // still retryable
	})

	t.Run("MovesUniqueKeyConflictingJobsToDiscarded", func(t *testing.T) {
I realized I don't have corresponding test changes for the underlying query in the `riverdrivertest`. Lmk if you think these details should be duplicated over there.
conflicting_jobs AS (
    SELECT DISTINCT unique_key
    FROM river_job
    WHERE unique_key IN (
            SELECT unique_key
            FROM jobs_to_schedule
            WHERE unique_key IS NOT NULL
                AND unique_states IS NOT NULL
        )
        AND unique_states IS NOT NULL
        AND river_job_state_in_bitmask(unique_states, state)
),
updated_jobs AS (
    UPDATE river_job
    SET
        state = CASE WHEN cj.unique_key IS NULL THEN 'available'::river_job_state
                     ELSE 'discarded'::river_job_state END,
        finalized_at = CASE WHEN cj.unique_key IS NOT NULL THEN @now::timestamptz
                            ELSE finalized_at END
    FROM jobs_to_schedule jts
    LEFT JOIN conflicting_jobs cj ON jts.unique_key = cj.unique_key
    WHERE river_job.id = jts.id
    RETURNING river_job.id, state = 'discarded'::river_job_state AS conflict_discarded
)
SELECT
    sqlc.embed(river_job),
    updated_jobs.conflict_discarded
FROM river_job
JOIN updated_jobs ON river_job.id = updated_jobs.id;
Ok, so this is one way of handling the scenario where somebody removes `retryable` from their `ByStates` list and then inserts a new job while there's still a would-be dupe in the `retryable` state. Essentially I tried to make this one query able to detect and handle that case, moving the conflicting jobs to `discarded` rather than leaving them `retryable`.

Two questions:

1. Instead of doing this within a larger query like this, should we attempt to detect this conflict in application code and make follow-up queries to (a) clear out/update/discard the conflicting jobs separately, and (b) try the `JobSchedule` query again?
2. Should this set a metadata value on the discarded job so the transition is detectable?

Also open to other ways to handle this. If we wanted to allow other states to be removed from that list, it would involve similar handling and needing to define what the behavior would be when such a conflict arises.
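To make the application-level alternative from question 1 concrete, here's a hedged sketch; `scheduleJobs` and `discardConflictingJobs` are hypothetical stand-ins, not real River or driver APIs:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	"github.com/jackc/pgerrcode"
	"github.com/jackc/pgx/v5/pgconn"
)

// Hypothetical stand-ins for the real queries, stubbed so the sketch compiles.
func scheduleJobs(ctx context.Context) error           { return nil }
func discardConflictingJobs(ctx context.Context) error { return nil }

// scheduleWithConflictHandling runs the schedule query, and if it fails on a
// unique conflict, discards the conflicting rows and retries with a small
// backoff so progress keeps being made.
func scheduleWithConflictHandling(ctx context.Context, maxAttempts int) error {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err := scheduleJobs(ctx)
		if err == nil {
			return nil
		}

		var pgErr *pgconn.PgError
		if !errors.As(err, &pgErr) || pgErr.Code != pgerrcode.UniqueViolation {
			return err // not a unique conflict; surface it
		}

		// Clear out the rows blocking the schedule, then back off briefly
		// before retrying.
		if err := discardConflictingJobs(ctx); err != nil {
			return err
		}
		time.Sleep(time.Duration(attempt) * 100 * time.Millisecond)
	}
	return fmt.Errorf("gave up scheduling after %d attempts", maxAttempts)
}

func main() {
	if err := scheduleWithConflictHandling(context.Background(), 3); err != nil {
		panic(err)
	}
}
```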
Added a metadata value of `"unique_key_confict": "scheduler_discarded"` so we can at least see when this case has been triggered.
Migrations: use a txn + commit for each migration, deprecate MigrateTx (#600)

This is partly extracted from #590. As detailed in that PR, there are certain combinations of schema changes which are not allowed to be run within the same transaction. The example we encountered with #590 is adding a new enum value, then using it in an immutable function during a subsequent migration. In Postgres, these must be separated by a commit.

The migration in that PR exposes an issue with our migration framework, which is that there are certain schema changes which must be committed before they can be referenced by subsequent schema changes. The example of this I found is that if you add a new value to an enum, you cannot later create a function that relies on that new enum value unless the new value has first been committed. Committing it within a DDL transaction does _not_ count; it must be a full commit.

IMO this might mean that the entire idea of a `MigrateTx` API is not workable with certain schema change combinations. At worst, it can result in unpredictable failures depending on the exact sequence of changes and how many migrations are being run at once. As such, in this PR I've deprecated `MigrateTx` and adjusted the migrator so that it opens a new transaction for each individual migration with a commit between them.

Migrator tests were changed to move away from `MigrateTx` and to a setup where they get a new clean schema for each test that's disposed at the end. This makes it possible to test the full sequence of database migrations with a clean slate each time. I believe this is the right long-term direction because it's the approach that other migration libraries use (Rails/ActiveRecord, [Sequel][1], Goose, etc). It also enables the potential for us to add the ability for individual migrations to opt _out_ of having a wrapping transaction, which is essential if we ever want our default to be `CREATE INDEX CONCURRENTLY` rather than synchronous indexing (as it really should be for any at-scale system).

[1]: https://sequel.jeremyevans.net/rdoc/files/doc/migration_rdoc.html#label-Transactions
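To make that direction concrete, here's a minimal sketch of running each migration in its own committed transaction, assuming pgx v5's `pgxpool`; the `Migration` type and `migrateUp` function are illustrative, not the real `rivermigrate` API:

```go
package migratesketch

import (
	"context"

	"github.com/jackc/pgx/v5/pgxpool"
)

// Migration is a hypothetical stand-in for a single versioned schema change.
type Migration struct {
	Version int
	SQLUp   string
}

// migrateUp runs each migration in its own transaction and commits before the
// next begins, so a later migration can safely reference changes (such as a
// new enum value) made by an earlier one.
func migrateUp(ctx context.Context, pool *pgxpool.Pool, migrations []Migration) error {
	for _, migration := range migrations {
		tx, err := pool.Begin(ctx)
		if err != nil {
			return err
		}

		if _, err := tx.Exec(ctx, migration.SQLUp); err != nil {
			_ = tx.Rollback(ctx)
			return err
		}

		// Commit immediately rather than batching every migration into one
		// big transaction. This is what lets, say, migration 6 use an enum
		// value added by migration 4.
		if err := tx.Commit(ctx); err != nil {
			return err
		}
	}
	return nil
}
```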
I believe this is all set. I've updated the changelog with notes on how this PR impacts users and what to watch out for during migration. Going to merge it and aim for a […]
PR #590 introduced a more flexible unique job mechanism that allowed for the unique states to be customized, all while still benefiting from the performance improvements of using a unique index rather than an advisory lock. The `retryable` state is included in the default list of states, but can be removed if the user doesn't want an erroring job to block duplicate inserts.

However this created an issue: when trying to schedule a `retryable` job (move it to `available`), it could potentially have a conflict with a duplicate unique job. To handle this, special logic was added to try to deal with this scenario for unique jobs, moving the conflicting row to `discarded` rather than `available`. Unfortunately this logic had issues and was insufficiently tested. There were a couple of specific scenarios that caused issues:

1. A unique job that was being scheduled (because it was either inserted as `scheduled` or had errored and become `retryable`) would actually be considered a conflict with itself, because the query didn't properly exclude the row being scheduled.
2. Attempting to schedule two duplicate `retryable` unique jobs at the same time would lead to a unique conflict, because there was no mechanism to prevent this.

The query changes in this PR address both of the above cases along with test coverage. The increased complexity is unfortunate, and we're probably nearing the limit of what should be dealt with in a single SQL query. If this still isn't complete, I'm more inclined to fix the issues by catching these conflicts at the application level, explicitly moving the conflicting row(s) to `discarded`, and trying again. This can be looped with a backoff or recursed to ensure that progress keeps being made as individual conflicts get resolved. But hopefully that won't be necessary.

Fixes #618.
Background
Our first two attempts at unique insertions have been limited:

- The initial version relied on advisory locks, which were slow enough and had enough potential for conflicts that we restricted them to the single job insertion path. They were fairly flexible in terms of allowing a full customization of which states should be considered in the uniqueness check.
- The second version of unique jobs brought us significantly improved performance using a database unique index on the hash of a computed unique key. However, it came with less flexibility: that code path would only be used when the user did not customize the list of unique states beyond our default set of states (everything but `cancelled` and `discarded`). River queries were hardcoded to clear out the `unique_key` value if the job transitioned to a designated state (ex. 1 2). If a user wanted to ensure that a job is never duplicated even if there's already an identical job in either of those unsupported states, River would fall back to the slower/older advisory lock uniqueness path. This feature also was only available when inserting one job at a time, and because its unique index was prefixed with `kind` it couldn't be used across job kinds.

An improved approach

In this PR, I've proofed out an improved design for unique jobs that I believe addresses all of the shortcomings of the previous designs, maintaining full flexibility while also keeping the performance improvements of the database-level unique constraint design. The new design avoids design 2's need for a hardcoded list of allowed states by moving the list of unique states into the database on a per-row basis.

I initially looked at doing this with a `river_job_state[]` column, but that was both verbose and not particularly space-efficient. Instead I landed on a `bit(8)` bitmask column, which is included as part of the compound partial index for uniqueness. A custom SQL function allows that unique index to only apply to jobs when their state is within the allowed states as reflected in the bitmask, so when they are moved to another state they are also no longer considered in uniqueness guarantees. (Side note: GPT-o1 was actually quite helpful in ironing out some aspects of this design and helping me to reason about whether I had considered everything. It's super impressive ✨)

Importantly, this design does not rely on advisory locks in any way, and in this PR the new path has been expanded to also work on the bulk insertion paths. Additionally, if for some reason users wish to ensure uniqueness across different job kinds, they can do that here using the `ExcludeKind` field on `UniqueOpts`. `kind` is no longer part of the new unique index to allow for this flexibility.

Another nicety: this gets us very close to having a unified code path for all inserts without any caveats depending on how the job was inserted. 🔜 We should be able to get there as soon as we remove the old deprecated unique code paths.

Restrictions on state

This approach can be made to work for a totally custom list of states. However, there's some complexity and undefined edge cases with doing that, particularly for allowing people to remove `pending`, `scheduled`, `available`, or `running` from the list. To reason this through, I prepared a mermaid state transition diagram (included in the PR) which I believe illustrates all possible job state transitions. On a printed version of this I was able to categorize all the state changes to note which queries are responsible for them.

The analysis of this is verbose, but if you're intrigued feel free to read it through.

Detailed analysis of states list and reasons for customizing

Default unique states

Since we first introduced unique jobs, this has been the default list of states that are checked for uniqueness: `available`, `completed`, `pending`, `retryable`, `running`, and `scheduled`.

These states ensure that uniqueness is enforced unless a job has been explicitly cancelled or has exhausted its errors.

The omission of `discarded` and `cancelled` makes sense as a default (those states indicate something has gone wrong and maybe you want to be able to retry with a new job). However, those are not the only scenarios.

Reasons for customizing this list

For the various states, there are reasons someone might want to add or remove a state from the list. I'll go through each state here and talk about why someone might want to add/remove it from the list, or what complications that would introduce.

Pending

Included by default. I can't think of a use case that would benefit from removing this from the list.

If you remove it, we could have trouble moving jobs out of `pending` into `available` or `scheduled` (if they're enabled). Would need to have those workflow/sequence queries be able to skip rows that have conflicts, but they would just keep conflicting and being skipped on every run.

Scheduled

Included by default. I can't think of a use case that would benefit from removing this from the list.

Complicated to allow removal. If you allow multiple dupes to be in the `scheduled` state, but block them from proceeding to `available`, then our scheduler maintenance process (or workflow scheduler) wouldn't be able to advance those jobs when it's time for them to run.

Available

Included by default. The main use case I can think of to remove this and earlier states from the unique list is to facilitate a poor man's version of global concurrency limiting (limited to 1).

Dangerous to remove, because we can get multiple jobs in the available state but then only one of them can move to `running` without a conflict. Allowing this will put a lot of stress on the job fetch query, which is the absolute last place we want to deal with repeated conflicts and row rewrites.

Running

Included by default. I can't think of a use case that would benefit from removing this from the list.

It's dangerous to remove, because then you can insert and start working duplicate jobs, only to be unable to complete/retry/discard them (unless those other states are removed also).

Retryable

Included by default. A user may wish to allow a job to try once and may want to enforce uniqueness up until the point at which it has errored once, at which point they could want to allow dupes.

If removed, the scheduler may have trouble moving jobs back to `available`. That could be dealt with by having the scheduler query mark the row as discarded if it can't be moved back to `available` due to a unique conflict.

Completed

Included by default. Similar to `cancelled` and `discarded`, it may be desirable to remove this if you want to ensure there is only one "live" job at a time in some non-finalized state.

No major downsides to removing it.

Cancelled (opt in)

Use case: I want to ensure that there is only one job of a given kind for a timeframe or set of args, even if it gets cancelled.

If you allow earlier states to be removed from the list, the cancellation and completer queries would need to handle this situation somehow and I'm not sure there's a "right" answer for what to do in that case.

Discarded (opt in)

Use case: I want to ensure that there is only one job of a given kind for a timeframe or set of args, even if it runs out of errors. This makes sure that job is never ever duplicated.

If you allow earlier states to be removed from the list, the cancellation and completer queries would need to handle this situation somehow and I'm not sure there's a "right" answer for what to do in that case.

A solution?

Most problems are resolved with one rule: don't allow removing the `pending`, `available`, `scheduled`, and `running` states from the list. Here's an analysis of the other states given this rule:

If one removes `retryable`, there can be dupes in the `retryable` state that can't all be moved back to another state. Manual `JobRetry` can just error in this case, whereas automated scheduling of retries can move the job to `discarded` if a conflict occurs.

If one adds `cancelled` to the default list, that does not have any adverse impact, because there can be no dupes in any of the default states plus `cancelled`. You will always still be able to retry that cancelled job, because none of the states it will go into can be occupied by a duplicate job.

Likewise, adding `discarded` is not an issue for the same reasons.

Finally, there's no issue with removing `completed`, other than somebody being blocked from manually retrying an already-completed job. In that case we can just bubble up the error to the user.

In short, I think it would make sense to restrict users from removing these 4 initial states from their `UniqueOpts.ByState`. I'm not sure of the best UX for that. We could simply error when you try to insert a job that breaks that rule (at least when using the multi insert / non-advisory code path). Or we could use a different way of customizing states besides a list.

Bonus: opt in only specific args to uniqueness

Since it was first introduced, we've heard from users who want to consider only a subset of their job args when determining uniqueness. For example, I might want to make sure there is exactly one job of a particular type for a given `customer_id`, even though there may also be other args on the job. Once I had proofed out the new design, I scratched an itch to spike out this feature.

As implemented here, when a job has been opted into uniqueness via `UniqueOpts` with `ByArgs`, it can optionally add a `river:"unique"` struct tag to the specific fields that the user wants to be considered in uniqueness. For unique jobs of this type, the uniqueness constraint will be applied on only the JSON built from those tagged fields (a hypothetical sketch follows at the end of this section).

JSON field names are respected as well if customized in the struct tags, and fields that are omitted from the encoded JSON will also be omitted from the uniqueness JSON. These arg keys are sorted alphabetically before being built into the uniqueness JSON to ensure a consistent result.
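Since the original struct and JSON examples from this section aren't reproduced above, here's a hypothetical reconstruction of the kind of example it describes; the type and field names are mine, not River's:

```go
package main

import "fmt"

// SendCustomerEmailArgs is a hypothetical job args type: only CustomerID is
// tagged river:"unique", so Body is ignored for uniqueness purposes.
type SendCustomerEmailArgs struct {
	CustomerID string `json:"customer_id" river:"unique"`
	Body       string `json:"body"`
}

func (SendCustomerEmailArgs) Kind() string { return "send_customer_email" }

func main() {
	// With UniqueOpts{ByArgs: true}, the unique key for this job would be
	// derived from a sorted JSON subset containing only the tagged field,
	// rather than from the full encoded args.
	fmt.Println(`{"customer_id":"123"}`)
}
```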
Transition Plan
While proofing out this design, I decided to avoid getting slowed down with legacy compatibility and stripped out the old uniqueness systems entirely. However, I don't think we can just merge it without considering how existing apps will transition between the systems, ideally without any downtime.

Also, the unique key calculation is necessarily different here than in the v2 design, because `kind` is now a part of the unique key instead of part of the index. So we can't easily migrate in-flight jobs from the v2 design to the v3 design. And v1 unique jobs don't even have a `unique_key` populated at all, so they can't be transitioned cleanly either.

I have some ideas:
- […] (probably in the `v0.x.y` release series).
- […] the `unique_key` value when cancelling or discarding jobs, but only do so for rows with `unique_states IS NULL`. This will allow v2 unique jobs to continue being processed in the same way.
- […] (`pending`, `scheduled`, `available`, `running`), continue to fall back to the v1 design but log a deprecation warning. Make sure this legacy mode only works for the single-job insert paths and is explicitly blocked/errored when using `InsertMany`.
- […] `unique_key` and `unique_states` populated. No new jobs will be inserted with v2 mode because v3 is strictly a superset of it.

Remaining issues
- Decide how to implement the restricted list of unique states and refine its UX
- Query changes to fix scheduler conflicts if `retryable` is removed from unique states (job won't be able to retry and must be discarded). Need tests for this too.
- Implementation changes to account for migration plan (restore deleted code)
- Rename `unique:"true"` struct tag to `river:"unique"` for a cleaner, more forward-compatible API design
- Tests to ensure that jobs using the old unique designs continue to function, for now
- Go documentation
- Website documentation
- Consistent alphabetical sort order for all encoded args (or else reordering struct fields causes uniqueness changes)
- Migration framework and tests don't play nicely with this latest migration. The reason is that migration 4 adds a `pending` state to the `river_job_state` enum, and this latest migration 6 attempts to use that value within an immutable SQL function. This results in the following error: […]

  The cause of this is that our current migration system and its tests will attempt to run all available migrations in a single transaction without committing between each one. I believe this might be part of why other migration systems (like Rails/ActiveRecord) use a separate transaction for each individual migration rather than running them all in one.

  We may be able to solve this for compatible databases by opening a DDL subtransaction for each individual migration (as I've done in this PR to get tests passing and `testdbman` to work), but we should resolve that in a standalone PR before this one gets merged. In particular, the one test still failing in this PR is one that tries to roll back and then re-run all migrations within a single test transaction. Maybe you want to take a crack at a standalone PR for migration framework changes @brandur? Fixed in "Migrations: use a txn + commit for each migration, deprecate MigrateTx" (#600).