Bulk unique insertion, uniqueness with subset of args #590

Merged 6 commits into master on Sep 22, 2024

Conversation

@bgentry (Contributor) commented on Sep 15, 2024

Background

Our first two attempts at unique insertions have been limited:

  1. The initial version relied on advisory locks, which were slow enough and prone enough to conflicts that we restricted them to the single-job insertion path. They were fairly flexible, allowing full customization of which states should be considered in the uniqueness check.

  2. The second version of unique jobs brought significantly improved performance by using a database unique index on the hash of a computed unique key. However, it was less flexible: that code path was only used when the user did not customize the list of unique states beyond our default set (everything but cancelled and discarded).

    River's queries were hardcoded to clear out the unique_key value when a job transitioned into one of those excluded states. If a user wanted to ensure that a job is never duplicated even when there's already an identical job in one of those unsupported states, River would fall back to the slower/older advisory lock uniqueness path.

    This feature was also only available when inserting one job at a time, and because its unique index was prefixed with kind, it couldn't be used across job kinds.

An improved approach

In this PR, I've proved out an improved design for unique jobs that I believe addresses all of the shortcomings of the previous designs, restoring full flexibility while keeping the performance benefits of the database-level unique constraint. The new design avoids design 2's need for a hardcoded list of allowed states by moving the list of unique states into the database on a per-row basis.

I initially looked at doing this with a river_job_state[] column, but that was both verbose and not particularly space-efficient. Instead I landed on a bit(8) bitmask column, which is included as part of the compound partial index for uniqueness. A custom SQL function allows that unique index to apply only to jobs whose state is within the allowed states reflected in the bitmask, so when a job moves to a non-unique state it is no longer considered in uniqueness guarantees. (Side note: GPT-o1 was actually quite helpful in ironing out some aspects of this design and helping me to reason about whether I had considered everything. It's super impressive ✨)
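
To illustrate the encoding, here's a minimal Go sketch (the helper name is hypothetical, not the actual implementation) that packs a list of unique states into a byte suitable for the bit(8) column. The bit positions mirror the river_job_state_in_bitmask SQL function quoted further down, where Postgres numbers bits left-to-right:

package main

import (
	"fmt"

	"github.com/riverqueue/river/rivertype"
)

// Postgres' get_bit numbers positions left-to-right, so 'scheduled' occupies
// the leftmost bit (position 0) and 'available' the rightmost (position 7).
var bitPositions = map[rivertype.JobState]uint{
	rivertype.JobStateAvailable: 7,
	rivertype.JobStateCancelled: 6,
	rivertype.JobStateCompleted: 5,
	rivertype.JobStateDiscarded: 4,
	rivertype.JobStatePending:   3,
	rivertype.JobStateRetryable: 2,
	rivertype.JobStateRunning:   1,
	rivertype.JobStateScheduled: 0,
}

// uniqueStatesToBitmask is a hypothetical helper that packs the unique states
// into a single byte for storage in a bit(8) unique_states column.
func uniqueStatesToBitmask(states []rivertype.JobState) byte {
	var mask byte
	for _, state := range states {
		if pos, ok := bitPositions[state]; ok {
			// Convert the left-to-right Postgres position into a shift amount.
			mask |= 1 << (7 - pos)
		}
	}
	return mask
}

func main() {
	defaults := []rivertype.JobState{
		rivertype.JobStatePending, rivertype.JobStateAvailable,
		rivertype.JobStateScheduled, rivertype.JobStateRunning,
		rivertype.JobStateRetryable, rivertype.JobStateCompleted,
	}
	fmt.Printf("%08b\n", uniqueStatesToBitmask(defaults)) // prints 11110101
}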

Importantly, this design does not rely on advisory locks in any way, and in this PR the new path has been expanded to also work on the bulk insertion paths. Additionally, if for some reason users wish to ensure uniqueness across different job kinds, they can do that here using the ExcludeKind field on UniqueOpts. kind is no longer part of the new unique index to allow for this flexibility.

Another nicety: this gets us very close to having a unified code path for all inserts without any caveats depending on how the job was inserted. 🔜 We should be able to get there as soon as we remove the old deprecated unique code paths.

Restrictions on state

This approach can be made to work for a totally custom list of states. However, there are some complexities and undefined edge cases in doing that, particularly around allowing people to remove pending, scheduled, available, or running from the list. To reason this through, I prepared a mermaid state transition diagram (included in the PR) which I believe illustrates all possible job state transitions. On a printed version of this I was able to categorize all the state changes and note which queries are responsible for them.

The analysis of this is verbose, but if you're intrigued feel free to read it through.

Detailed analysis of states list and reasons for customizing

Default unique states

Since we first introduced unique jobs, this has been the default list of states that are checked for uniqueness:

  • Pending
  • Available
  • Scheduled
  • Running
  • Retryable
  • Completed

These states ensure that uniqueness is enforced unless a job has been explicitly cancelled or has exhausted its errors.

The omission of discarded and cancelled makes sense as a default (those states indicate something has gone wrong and you may want to be able to retry with a new job). However, that's not the only scenario users care about.
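
For reference, a job could spell this default set out explicitly via UniqueOpts.ByState (a sketch assuming the river and rivertype packages are imported; leaving ByState unset gives the same behavior):

uniqueOpts := river.UniqueOpts{
	ByState: []rivertype.JobState{
		rivertype.JobStatePending,
		rivertype.JobStateAvailable,
		rivertype.JobStateScheduled,
		rivertype.JobStateRunning,
		rivertype.JobStateRetryable,
		rivertype.JobStateCompleted,
	},
}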

Reasons for customizing this list

For the various states, there are reasons someone might want to add or remove a state from the list. I’ll go through each state here and talk about why someone might want to add/remove it from the list, or what complications that would introduce.

Pending

Included by default. I can't think of a use case that would benefit from removing this from the list.

If you remove it, we could have trouble moving jobs out of pending into available or scheduled (if those features are enabled). The workflow/sequence queries would need to be able to skip rows that have conflicts, but those rows would just keep conflicting and being skipped on every run.

Scheduled

Included by default. I can't think of a use case that would benefit from removing this from the list.

Complicated to allow removal. If you allow multiple dupes to be in scheduled state, but block them from proceeding to available, then our scheduler maintenance process (or workflow scheduler) wouldn't be able to advance those jobs when it's time for them to run.

Available

Included by default. The main use case I can think of to remove this and earlier states from the unique list is to facilitate a poor man's version of global concurrency limiting (limited to 1).

Dangerous to remove, because we can get multiple jobs in the available state but then only one of them can move to running without a conflict. Allowing this would put a lot of stress on the job fetch query, which is the absolute last place we want to deal with repeated conflicts and row rewrites.

Running

Included by default. I can't think of a use case that would benefit from removing this from the list.

It's dangerous to remove, because then you can insert and start working duplicate jobs, only to be unable to complete/retry/discard them (unless those other states are removed also).

Retryable

Included by default. A user may wish to allow a job to try only once, enforcing uniqueness up until the point at which it has errored, after which they may want to allow dupes.

If removed, the scheduler may have trouble moving a retryable job back to available. That could be dealt with by having the scheduler query mark the row as discarded if it can't be moved back to available due to a unique conflict.

Completed

Included by default. Similar to cancelled and discarded it may be desirable to remove this if you want to ensure there is only one “live” job at a time in some non-finalized state.

No major downsides to removing it.

Cancelled (opt in)

Use case: I want to ensure that there is only one job of a given kind for a timeframe or set of args, even if it gets cancelled.

If you allow earlier states to be removed from the list, the cancellation and completer queries would need to handle this situation somehow and I'm not sure there's a "right" answer for what to do in that case.

Discarded (opt in)

Use case: I want to ensure that there is only one job of a given kind for a timeframe or set of args, even if it runs out of errors. This makes sure that job is never ever duplicated.

If you allow earlier states to be removed from the list, the cancellation and completer queries would need to handle this situation somehow and I'm not sure there's a "right" answer for what to do in that case.

A solution?

Most problems are resolved with one rule: don't allow removing the pending, available, scheduled, and running states from the list. Here's an analysis of the other states given this rule:

If one removes retryable, there can be dupes in the retryable state that can’t all be moved back to another state. Manual JobRetry can just error in this case, whereas automated scheduling of retries can move the job to discarded if a conflict occurs.

If one adds cancelled to the default list, that does not have any adverse impact, because there can be no dupes in any of the default states plus cancelled. You will always still be able to retry that cancelled job, because none of the states it will go into can be occupied by a duplicate job.

Likewise, adding discarded is not an issue for the same reasons.

Finally, there’s no issue with removing completed, other than somebody being blocked from manually retrying an already-completed job. In that case we can just bubble up the error to the user.

In short, I think it would make sense to restrict users from removing these 4 initial states from their UniqueOpts.ByState. I'm not sure of the best UX for that. We could simply error when you try to insert a job that breaks that rule (at least when using the multi insert / non-advisory code path). Or we could use a different way of customizing states besides a list.
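
As a sketch of the first option (erroring at insert time), validation could look something like this (all names here are hypothetical, and fmt/rivertype imports are assumed):

// requiredUniqueStates are the states proposed above as non-removable.
var requiredUniqueStates = []rivertype.JobState{
	rivertype.JobStatePending,
	rivertype.JobStateScheduled,
	rivertype.JobStateAvailable,
	rivertype.JobStateRunning,
}

// validateUniqueStates would run at insert time on a customized ByState list.
func validateUniqueStates(byState []rivertype.JobState) error {
	if len(byState) == 0 {
		return nil // empty means "use the defaults", which already include the required states
	}
	present := make(map[rivertype.JobState]struct{}, len(byState))
	for _, state := range byState {
		present[state] = struct{}{}
	}
	for _, required := range requiredUniqueStates {
		if _, ok := present[required]; !ok {
			return fmt.Errorf("unique states must include %q", required)
		}
	}
	return nil
}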

Bonus: opt in only specific args to uniqueness

Since unique jobs were first introduced, we've heard from users who want to consider only a subset of their job args when determining uniqueness. For example, I might want to make sure there is exactly one job of a particular type for a given customer_id, even though there may also be other args on the job. Once I had proved out the new design, I scratched an itch to spike out this feature.

As implemented here, when a job has been opted into uniqueness via UniqueOpts with ByArgs, the user can optionally add a river:"unique" struct tag to the specific fields they want considered in uniqueness. For example, consider this struct:

type EmailJobArgs struct {
	EmailID   string `json:"email_id" river:"unique"`
	Recipient string `json:"recipient" river:"unique"`
	Subject   string `json:"subject"`
	Body      string `json:"body"`
}
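
To opt such a job into args-based uniqueness, it still needs UniqueOpts with ByArgs set, e.g. via InsertOpts (a sketch; the "email" kind here is just an example value):

func (EmailJobArgs) Kind() string { return "email" }

func (EmailJobArgs) InsertOpts() river.InsertOpts {
	return river.InsertOpts{
		UniqueOpts: river.UniqueOpts{ByArgs: true},
	}
}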

For unique jobs of this type, the uniqueness constraint will be applied to the following JSON:

{"email_id":"12345","recipient":"[email protected]"}

JSON field names are respected as well if customized in the struct tags, and fields that are omitted from the encoded JSON will also be omitted from the uniqueness JSON. These arg keys are sorted alphabetically before being built into the uniqueness JSON to ensure a consistent result.
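
As a rough sketch of how that subset payload could be produced with gjson/sjson (which this PR pulls in), here's a hypothetical helper that receives the already-sorted list of tagged field names (not the actual dbunique implementation):

func buildUniqueArgsJSON(encodedArgs []byte, sortedUniqueFields []string) ([]byte, error) {
	out := []byte(`{}`)
	for _, field := range sortedUniqueFields {
		value := gjson.GetBytes(encodedArgs, field)
		if !value.Exists() {
			// Fields omitted from the encoded args are also omitted from the
			// uniqueness JSON.
			continue
		}
		var err error
		out, err = sjson.SetBytes(out, field, value.Value())
		if err != nil {
			return nil, err
		}
	}
	return out, nil
}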

Transition Plan

While proving out this design, I decided to avoid getting slowed down with legacy compatibility and stripped out the old uniqueness systems entirely. However, I don't think we can just merge it without considering how existing apps will transition between the systems, ideally without any downtime.

Also, the unique key calculation is necessarily different here than in the v2 design, because kind is now a part of the unique key instead of part of the index. So we can't easily migrate in-flight jobs from the v2 design to the v3 design. And v1 unique jobs don't even have a unique_key populated at all, so they can't be transitioned cleanly either.

I have some ideas:

  • Maintain both the v1 and v2 code paths temporarily (say, for one final v0.x.y release series).
  • Retain the bits of SQL queries which clear the unique_key value when cancelling or discarding jobs, but only do so for rows with unique_states IS NULL. This will allow v2 unique jobs to continue being processed in the same way.
  • For new jobs which specify a list of unique states that don't include all the required ones shown above (pending, scheduled, available, running), continue to fall back to the v1 design but log a deprecation warning. Make sure this legacy mode only works for the single-job insert paths and is explicitly blocked/errored when using InsertMany.
  • New jobs with a compatible set of states will use the v3 path and have both unique_key and unique_states populated. No new jobs will be inserted with v2 mode because v3 is strictly a superset of it.

Remaining issues

  • Decide how to implement the restricted list of unique states and refine its UX

  • Query changes to fix scheduler conflicts if retryable is removed from unique states (job won't be able to retry and must be discarded). Need tests for this too.

  • Implementation changes to account for migration plan (restore deleted code)

  • Rename unique:"true" struct tag to river:"unique" for cleaner, more forward-compatible API design

  • Tests to ensure that jobs using the old unique designs continue to function, for now.

  • Go documentation

  • Website documentation

  • Consistent alphabetical sort order for all encoded args (or else reordering struct fields causes uniqueness changes)

  • Migration framework and tests don't play nicely with this latest migration. The reason is that migration 4 adds a pending state to the river_job_state enum, and this latest migration 6 attempts to use that value within an immutable SQL function. This results in the following error:

    ERROR: unsafe use of new value "pending" of enum type river_job_state (SQLSTATE 55P04)
    

    The cause of this is that our current migration system and its tests will attempt to run all available migrations in a single transaction without committing between each one. I believe this might be part of why other migration systems (like Rails/ActiveRecord) use a separate transaction for each individual migration rather than running them all in one.

    We may be able to solve this for compatible databases by opening a DDL subtransaction for each individual migration (as I've done in this PR to get tests passing and testdbman to work), but we should resolve that in a standalone PR before this one gets merged. In particular, the one test still failing in this PR is one that tries to roll back and then re-run all migrations within a single test transaction. Maybe you want to take a crack at a standalone PR for migration framework changes @brandur? Fixed in "Migrations: use a txn + commit for each migration, deprecate MigrateTx" (#600).

rivermigrate/river_migrate.go (outdated)
Comment on lines +20 to +27
// JobArgs is an interface that should be implemented by the arguments to a job.
// This definition duplicates the JobArgs interface in the river package so that
// it can be used in other packages without creating a circular dependency.
type JobArgs interface {
	// Kind returns a unique string that identifies the type of job. It's used to
	// determine which worker should work the job.
	Kind() string
}
@bgentry (author) commented on Sep 15, 2024:

Needed to use this from dbunique so I copied the interface into here. But then I ended up adding an unexported jobArgs within that package anyway—maybe that's better than exposing a duplicate in a user-facing package like this?

Edit: Actually, I did need this from other places as well. It has to be available as part of riverdriver.JobInsertFastParams as well as internal/dbunique.

Can keep it here in rivertype, or move to rivershared. Lmk what seems better.

internal/dbunique/unique_fields.go (outdated)
Comment on lines +18 to +19
github.com/tidwall/gjson v1.17.3
github.com/tidwall/sjson v1.2.5
@bgentry (author) commented:

gjson and sjson were super handy for being able to efficiently extract a subset of keys from a JSON payload and to reconstruct them into a new sorted subset payload. We could figure out a way to do this without them, but it would likely be slower (if using encoding/json) or a lot more verbose (if building up our own JSON bytes).

They're also super small dependencies with minimal indirects, so I feel like it's not a big deal to pull them in. Plus they're also used in riverpro.

// Advisory locks are currently only used for the fallback/slow path of
// unique job insertion where finalized states are included in a ByState
// configuration.
AdvisoryLockPrefix int32
@bgentry (author) commented:

Once we complete the deprecation and eventual removal of the advisory lock based unique jobs, would you want to just nuke this config altogether? It's possible we could want to use advisory locks again at some point but it'd be easy enough to re-add.

Comment on lines +118 to +119
JobInsertFast(ctx context.Context, params *JobInsertFastParams) (*JobInsertFastResult, error)
JobInsertFastMany(ctx context.Context, params []*JobInsertFastParams) ([]*JobInsertFastResult, error)
@bgentry (author) commented:

This relates to the API changes from #584 so we may want to align on the right change to make here.

Comment on lines +251 to +254
// Args contains the raw underlying job arguments struct. It has already been
// encoded into EncodedArgs, but the original is kept here to leverage its
// struct tags and interfaces, such as for use in unique key generation.
Args rivertype.JobArgs
@bgentry (author) commented:

Related to discussion from #584, this was added to facilitate tag reflection on the JobArgs struct, even though it's otherwise unused (EncodedArgs are already encoded at this point).

@@ -5,6 +5,7 @@ go 1.21
toolchain go1.23.0

require (
github.com/jackc/pgx/v5 v5.7.1
@bgentry (author) commented:

Hmm, I guess this is what I get for using pgtype.Bits for the bitmask column. I'd be shocked if pq properly supports that; thoughts on what to do?

Comment on lines +230 to +238
ON CONFLICT (unique_key)
WHERE unique_key IS NOT NULL
AND unique_states IS NOT NULL
AND river_job_state_in_bitmask(unique_states, state)
@bgentry (author) commented on Sep 15, 2024:

I wish there were a better way but AFAICT this is the only way to indicate the conflict given the specific partial index in use here. It can't be a named CONSTRAINT because it's a partial index.

Comment on lines +8 to +15
WHEN 'available' THEN get_bit(bitmask, 7)
WHEN 'cancelled' THEN get_bit(bitmask, 6)
WHEN 'completed' THEN get_bit(bitmask, 5)
WHEN 'discarded' THEN get_bit(bitmask, 4)
WHEN 'pending' THEN get_bit(bitmask, 3)
WHEN 'retryable' THEN get_bit(bitmask, 2)
WHEN 'running' THEN get_bit(bitmask, 1)
WHEN 'scheduled' THEN get_bit(bitmask, 0)
@bgentry (author) commented:

Per the note on the Go bitmask code, Postgres bit positions are numbered left-to-right instead of a more conventional right-to-left.

bgentry added a commit that referenced this pull request Sep 18, 2024
As detailed in #600, there are certain combinations of schema changes
which are not allowed to be run within the same transaction. The example
we encountered with #590 is adding a new enum value, then using it in an
immutable function during a subsequent migration. In Postgres, these
must be separated by a commit.

There are other examples of things which cannot be run in a transaction,
such as `CREATE INDEX CONCURRENTLY`. While that specific one isn't
solved here, moving away from a migrator that bundles migrations into a
single transaction will also allow us to update our migration system to
exclude certain migrations from transactions and, for example, add indexes
concurrently.
@bgentry bgentry changed the base branch from master to bg-migration-framework-ddl-tx-per-migration September 19, 2024 15:31
@bgentry bgentry force-pushed the bg-migration-framework-ddl-tx-per-migration branch from 3c932f6 to c0de0ae on September 20, 2024 18:27
@bgentry bgentry force-pushed the bg-bulk-unique-inserts branch 2 times, most recently from 825d9ea to c034b7c on September 20, 2024 20:03
client_test.go Outdated
Comment on lines 5301 to 5442
Included bool `json:"included" unique:"true"`
Excluded bool `json:"excluded"`
@bgentry (author) commented:

I had another design idea on this. Instead of unique:"true", what if this was something like river:"unique"? There would be room to apply additional comma-separated options in that list, such as for batching or sequence keys.

I think I like that more than having a bunch of separate tags with vague names like unique, batch, etc. Those could be prefixed with river_ of course but that'd be long and redundant, plus not a particularly efficient way to declare boolean attrs.

@bgentry (author) commented:

Changed to river:"unique" to better and more cleanly support future options. Seems less vague/generic too, which is nice.

@bgentry bgentry force-pushed the bg-bulk-unique-inserts branch 6 times, most recently from 1d857e1 to 0b2268d on September 21, 2024 18:49
@bgentry bgentry marked this pull request as ready for review September 21, 2024 18:55
@@ -139,6 +148,50 @@ func TestJobScheduler(t *testing.T) {
requireJobStateUnchanged(t, bundle.exec, retryableJob3) // still retryable
})

t.Run("MovesUniqueKeyConflictingJobsToDiscarded", func(t *testing.T) {
@bgentry (author) commented:

I realized I don't have corresponding test changes for the underlying query in the riverdrivertest, lmk if you think these details should be duplicated over there.

Comment on lines 423 to 454
+conflicting_jobs AS (
+    SELECT DISTINCT unique_key
+    FROM river_job
+    WHERE unique_key IN (
+        SELECT unique_key
+        FROM jobs_to_schedule
+        WHERE unique_key IS NOT NULL
+            AND unique_states IS NOT NULL
+    )
+        AND unique_states IS NOT NULL
+        AND river_job_state_in_bitmask(unique_states, state)
+),
 updated_jobs AS (
     UPDATE river_job
-    SET state = 'available'
-    FROM jobs_to_schedule
-    WHERE river_job.id = jobs_to_schedule.id
-    RETURNING river_job.id
+    SET
+        state = CASE WHEN cj.unique_key IS NULL THEN 'available'::river_job_state
+                ELSE 'discarded'::river_job_state END,
+        finalized_at = CASE WHEN cj.unique_key IS NOT NULL THEN @now::timestamptz
+                ELSE finalized_at END
+    FROM jobs_to_schedule jts
+    LEFT JOIN conflicting_jobs cj ON jts.unique_key = cj.unique_key
+    WHERE river_job.id = jts.id
+    RETURNING river_job.id, state = 'discarded'::river_job_state AS conflict_discarded
 )
-SELECT *
+SELECT
+    sqlc.embed(river_job),
+    updated_jobs.conflict_discarded
 FROM river_job
-WHERE id IN (SELECT id FROM river_job_scheduled);
+JOIN updated_jobs ON river_job.id = updated_jobs.id;
@bgentry (author) commented:

Ok, so this is one way of handling the scenario where somebody removes retryable from their ByStates list, and then inserts a new job while there's still a would-be dupe in the retryable state. Essentially I tried to make this one query able to detect and handle that case, moving the conflicting jobs to discarded rather than available.

Two questions:

  1. Instead of doing this within a larger query like this, we could attempt to detect this conflict in application code and make follow-up queries to (a) clear out/update/discard the conflicting jobs separately, and (b) try the JobSchedule query again.
  2. Should this set a metadata value on the discarded job so the transition is detectable?

Also open to other ways to handle this. If we wanted to allow other states to be removed from that list, it would involve similar handling and needing to define what the behavior would be when such a conflict arises.

@bgentry (author) commented:

Added a metadata value of "unique_key_conflict": "scheduler_discarded" so we can at least see when this case has been triggered.

bgentry added a commit that referenced this pull request Sep 22, 2024: Migrations: use a txn + commit for each migration, deprecate MigrateTx (#600)

This is partly extracted from #590. As detailed in that PR, there are certain
combinations of schema changes which are not allowed to be run within the same
transaction. The example we encountered with #590 is adding a new enum value,
then using it in an immutable function during a subsequent migration. In
Postgres, these must be separated by a commit.

The migration in that PR exposes an issue with our migration framework, which is
that there are certain schema changes which must be committed before they can be
referenced by subsequent schema changes. The example of this I found is that if
you add a new value to an enum, you cannot later create a function that relies
on that new enum value unless the new value has first been committed. Committing
it within a DDL transaction does _not_ count—it must be a full commit.

IMO this might mean that the entire idea of a `MigrateTx` API is not workable
with certain schema change combinations. At worst, it can result in
unpredictable failures depending on the exact sequence of changes and how many
migrations are being run at once.

As such, in this PR I've deprecated `MigrateTx` and adjusted the migrator so
that it opens a new transaction for each individual migration with a commit
between them. Migrator tests were changed to move away from `MigrateTx` and to a
setup where they get a new clean schema for each test that's disposed at the
end. This makes it possible to test the full sequence of database migrations
with a clean slate each time.

I believe this is the right long-term direction because it's the approach that
other migration libraries use (Rails/ActiveRecord, [Sequel][1], Goose, etc). It
also enables the potential for us to add the ability for individual migrations
to opt _out_ of having a wrapping transaction, which is essential if we ever
want our default to be `CREATE INDEX CONCURRENTLY` rather than synchronous
indexing (as it really should be for any at-scale system).

[1]: https://sequel.jeremyevans.net/rdoc/files/doc/migration_rdoc.html#label-Transactions
Base automatically changed from bg-migration-framework-ddl-tx-per-migration to master September 22, 2024 19:33
@bgentry (author) commented on Sep 22, 2024:

I believe this is all set. I've updated the changelog with notes on how this PR impacts users and what to watch out for during migration. Going to merge it and aim for a v0.12.0-rc.1 prerelease to shake out any issues.

@bgentry bgentry merged commit d977e10 into master Sep 22, 2024
14 checks passed
@bgentry bgentry deleted the bg-bulk-unique-inserts branch September 22, 2024 20:11
bgentry added a commit that referenced this pull request Sep 26, 2024
PR #590 introduced a more flexible unique job mechanism that allowed for the
unique states to be customized, all while still benefiting from the performance
improvements of using a unique index rather than an advisory lock. The retryable
state is included in the default list of states, but can be removed if the user
doesn't want to prevent an erroring job from blocking duplicate inserts.

However this created an issue: when trying to schedule a retryable job (move it
to available) it could potentially have a conflict with a duplicate unique job.
To handle this, special logic was added to try to deal with this scenario for
unique jobs, moving the conflicting row to discarded rather than available.
Unfortunately this logic had issues and was insufficiently tested. There were a
couple specific scenarios that caused issues:

1. A unique job that was being scheduled (because it was either inserted as
   scheduled or had errored and become retryable) would actually be considered a
   conflict with itself, because the query didn't properly exclude the row being
   scheduled.
2. Attempting to schedule two duplicate retryable unique jobs at the same time
   would lead to a unique conflict, because there was no mechanism to prevent
   this.

The query changes in this PR address both of the above cases, along with test
coverage.

The increased complexity is unfortunate, and we're probably nearing the limit of
what should be dealt with in a single SQL query. If this still isn't complete
I'm more inclined to fix the issues by catching these conflicts at the
application level, explicitly moving the conflicting row(s) to discarded, and
trying again. This can be looped with a backoff or recursed to ensure that
progress keeps being made as individual conflicts get resolved. But hopefully
that won't be necessary.

Fixes #618.