-
Notifications
You must be signed in to change notification settings - Fork 177
EP07 Database transaction API
Database transactions are currently handled internally to Index Drivers and not exposed to higher architectural levels.
This EP proposes an API to allow user-managed transactions. The API consists solely of new methods, and new optional arguments to existing methods where the default (i.e. not supplying the new arguments) corresponds to existing behaviour. This proposal is therefore fully backwards compatible.
This proposal is a more detailed writeup of an enhancement already proposed in the ODCv2 Road Map.
Paul Haesler (@SpacemanPaul)
- Under Discussion
- In Progress
- Completed
- Rejected
- Deferred
Currently, high-level indexing tools (e.g. in odc-tools
) cannot make use of database transactions to ensure index consistency and have limited efficiency of bulk indexing because each dataset must be indexed in a separate database transaction.
-
A new abstract
Transaction
class.- Has abstract methods
begin_transaction
,commit
, androllback
(intended for internal use - use as context manager preferred). - SHALL implement ContextHandler semantics (i.e.
__enter__
and__exit__
functions.) - SHALL carry awareness of whether the transaction it represents is active, committed or rolledback.
- SHALL carry methods to construct special exceptions that will be caught in
__exit__
and trigger a commit or rollback:commit_exception
androllback_exception
. - Implemented for
postgres
andpostgis
drivers (as wrappers around the existingPostgresDbAPI
/PostgisDbAPI
classes)
- Has abstract methods
-
A new abstract method on
AbstractIndex
:transaction()
- Returns a Transaction context handler with an active transaction.
- Implemented for
postgres
andpostgis
drivers - Raises an exception on Index Drivers that do not support transactions.
- A link to the Transaction is maintained in thread-local storage by the index driver.
- If there is an active transaction in thread-local storage, then all database access through that index driver from that thread is performed within the context of that transaction. (There is already a helper method) Code Examples:
# Context manager
with dc.index.transaction() as trans:
# Archive old datasets and add new ones in single transaction
dc.index.datasets.archive([old_ds1.id, old_ds2.id], transaction=trans)
dc.index.datasets.add(ds1)
dc.index.datasets.add(ds2)
# If execution gets to here, the transaction is committed.
# If an exception was raised by any of the above methods, the transaction is rolled back.
# Handling commits and rollbacks with exceptions
with dc.index.transaction() as trans:
# Archive old datasets and add new ones in single transaction
dc.index.datasets.archive([old_ds1.id, old_ds2.id])
dc.index.datasets.add(ds1)
if dont_need_ds2(ds1, ds2):
# Commit immediately and exit context handler.
raise trans.commit_exception("Shortcut commit")
elif problematic(ds1, ds2):
# Rollback immediately and exit context handler
raise trans.rollback_exception("Shortcut rollback")
elif really_problematic(ds1, ds2):
# Other exceptions trigger rollback as well, but
# are not caught by `__exit__` and so are propagated on
# up the stack.
raise ValueError("Uncaught exceptions trigger rollback too.")
dc.index.datasets.add(ds2)
# If execution gets to here, the transaction is committed.
# Manual low-level transaction management (not recommended, but supported)
trans = dc.index.transaction()
# Transaction is activated by ContextManager semantics - No `with` means we need to call begin manually.
trans.begin()
dc.index.datasets.update(ds1)
dc.index.datasets.update(ds2)
trans.rollback()
# ds1 and ds2 are NOT updated in the database
# No active thread-local transaction, so update() manages it's own internal transaction.
# (ds3 IS updated in the database)
dc.index.datasets.update(ds3)
# Manually start a new transaction:
trans.begin()
dc.index.datasets.update(ds4)
dc.index.datasets.update(ds5)
trans.commit()
# ds4 and ds5 ARE updated in the database
# Transaction is now inactive. No further cleanup required.
-
Robert Woodcock (@woodcockr)
- This looks good to me. Straightforward context and if need be manual handling. Provides the opportunity for the transaction benefit. Further optimisation can be performed through incremental changes to other code. I'm not overly familiar with all the database code in ODC these days but I am familiar with the impact of not having the transaction handling. Get's my vote.
-
@Kirill888 suggests using thread-local storage instead of the user managing the Transaction manually.
- merged
Approved (with @Kirill888's amendment, now merged into this EP) in Steering Council meeting September 2022.
- Paul Haesler (@SpacemanPaul)
Welcome to the Open Data Cube