Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mongo Server Side Transaction handles #6

Merged
merged 14 commits into from
Apr 10, 2019
Merged

Conversation

jameinel
Copy link
Member

This introduces the ability to StartTransaction/CommitTransaction/AbortTransaction from a mgo.Session object.

I haven't done any multithreaded testing yet, but I have confirmed the basic behavior around transactions and visibility with other sessions.

This is built on
#5
So the diff is a bit noisy right now.

This should allow supporting more versions of Mongo than the existing
mgo driver.
Copying over the filterDBs changes and EnsureIndex fixes from
globalsign/mgo.
Currently Abort is not supported, and it only supports Insert
operations, but the basic layout is plausible and seems like it will
work.
Interestingly, you can't see the documents that you've inserted in the
session that inserted them...
Need to pass in lsid and txnNumber to enable visibility of queries for
things that you just created.
Looks like we finally found the right attributes for where to put lsid
and txnNumber. We now see the right visibility for updates and
insertions. Still need to do Upsert/UpdateAll, etc.
Instead, all the places that were putting a transaction on the op before
submitting it, just use the code path that submits the request to deal
with the transaction.
session.go Outdated

func (s *Session) CommitTransaction() error {
if len(s.sessionId.Data) == 0 {
// XXX: error?, shouldn't commit if we never called s.ensureSessionId
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd agree with this comment, you're in a wrong state if you hit this..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was waiting until we have actual tests covering these cases before doing the work, since it is often easy to accidentally not test the error paths.

session.go Outdated
return nil
}
if s.transaction == nil || !s.transaction.started {
// XXX: error?, no active transaction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

session.go Outdated
return nil
}
if s.transaction.finished {
// XXX: logic error, we shouldn't be able to get here
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

session.go Outdated
// XXX: logic error, we shouldn't be able to get here
return nil
}
// XXX: Python has a retry tracking 'retryable' errors around finishTransaction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, sounds reasonable.

session.go Outdated
return nil
}
// XXX: do we want to track a transaction STATE, like they do in Python with states of:
// NONE, STARTING, IN_PROGRESS, ABORTED, COMMITTED, COMMITTED_EMPTY
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does that give us?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is mostly a case of "is it more understandable if you have a sequence of states" vs just booleans.

That said, I've even taken out the 'finished' state. Because while I could track that in the driver, there really isn't a reason to. The source of truth is going to be the Mongo Server, and it will happily tell you if you're using it wrong.
And all the cases where the driver was trying to 'be helpful' looking at them, were around misusing the driver (eg, concurrent requests during a transaction, which
a) Juju itself doesn't need
b) Doesn't feel very safe anyway, as it is messing with the concept of transaction
)

@@ -3222,6 +3256,10 @@ type findCmd struct {
OplogReplay bool `bson:"oplogReplay,omitempty"`
NoCursorTimeout bool `bson:"noCursorTimeout,omitempty"`
AllowPartialResults bool `bson:"allowPartialResults,omitempty"`
LSID bson.D `bson:"lsid,omitempty"`
TXNNumber int64 `bson:"txnNumber,omitempty"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use a uint64 here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual number is being transmitted as a bson.Long:
https://docs.mongodb.com/manual/reference/bson-types/
Which is a 64-bit integer. I don't know if uint64 maps correctly to that type. I do know that plain 'int' doesn't (I believe it maps to type 16, 32-bit integer).
I'd also say, if you ever have more than 9e18 transactions, other stuff is probably breaking. :)

if err := s.ensureSessionId(); err != nil {
return err
}
if s.transaction != nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a potential data race here when StartTransaction gets called concurrently and you don't have an active txn: if two competing go-routines reach this if block at the same time they may both keep going and overwrite s.transaction.

A lock or CompareAndSwapPointers is probably needed here to be safe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Session already has some Mutex objects, so I think I'll just use that. I was waiting until I felt comfortable with a test case that helps prove that it is threadsafe.

I think that fundamentally, StartTransaction is not a threadsafe action, because you aren't making any guarantees to the other threads as to whether they are in a transaction or not. but I want to make sure not to corrupt internal data structures if it ever did get called from multiple threads.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to run multiple transactions within a single session (concurrently)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't possible in the 'mongo' driver, as the transaction start/stop is hidden inside the transaction (which is what I'm doing in the Go driver).
The Python driver says it isn't possible.
I cheated a bit with the Go driver and hacked some stuff together and you get this:

... value *mgo.QueryError = &mgo.QueryError{Code:225, Message:"Cannot start transaction 1 on session 64b9481a-b555-41f8-a0a0-15e3ab449c60 - 47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU= because a newer transaction 2 has already started.", Assertion:false} ("Cannot start transaction 1 on session 64b9481a-b555-41f8-a0a0-15e3ab449c60 - 47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU= because a newer transaction 2 has already started.")

FWIW, even if it was possible to have multiple txns on a single session, having the txn at the session level would mean that if you need parallel txns, you can just create another session. (Session.Copy()).

The test case looked like:

func (s *S) TestStartTransactionTwice(c *C) {
	session, coll := s.setupTxnSessionAndCollection(c)
	defer session.Close()
	c.Assert(session.StartTransaction(), IsNil)
	txn1 := session.GetTransaction()
	c.Assert(coll.Insert(bson.M{"a": "a"}), IsNil)
	// c.Assert(session.StartTransaction(), ErrorMatches, "transaction already started")
	c.Assert(session.StartTransaction(), IsNil)
	c.Assert(coll.Insert(bson.M{"b": "b"}), IsNil)
	txn2 := session.GetTransaction()
	session.SetTransaction(txn2)
	c.Assert(session.CommitTransaction(), IsNil)
	session.SetTransaction(txn1)
	c.Assert(session.CommitTransaction(), IsNil)
	var res bson.M
	c.Assert(coll.Find(bson.M{"a": "a"}).Select(bson.M{"a": 1, "b": 1, "_id": 0}).One(&res), IsNil)
	c.Check(res, DeepEquals, bson.M{"a": "a"})
	c.Assert(coll.Find(bson.M{"b": "b"}).Select(bson.M{"a": 1, "b": 1, "_id": 0}).One(&res), IsNil)
	c.Check(res, DeepEquals, bson.M{"b": "b"})
}

session.go Outdated
}
// XXX: Python has a retry tracking 'retryable' errors around finishTransaction
err := s.finishTransaction("commitTransaction")
// TODO(jam): 2019-02-28 do we need a mutex around this since Insert/Update/etc are potentially different goroutines?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to hold a lock within the method just to be safe.

Here is an example that may cause things to blow up: for whatever (e.g logic bug) reason we get 2 go-routines calling Commit. One of them may reach L4930 (if s.transaction.finished) and get context-switched while the second one actually commits the txn and sets s.transaction = nil. This will cause an NPE when the first go-routine attempts to evaluate the is-finished predicate.

return nil
}

func (s *Session) AbortTransaction() error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about holding locks while code is executing in this method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current code, there is a mutex around the operation, and all the logic is pulled into the common 'finishTransaction', which just takes whether it is finishing with a Commit or with an Abort.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also got rid of 'txn.finished'. I considered doing a mutex on Transaction, so that we get Start exactly right (if you call StartTransaction, and then 2 threads each try to do the first operation, you might get 2 'startTransaction' messages sent to the Mongo server). But I also realized, we don't actually care, because we won't be doing that, and I'd rather have people get told 'you're doing it wrong' than trying to make it work.

Also handle if a Session is Closed with an active transaction.
Rework a lot of the Commit/Abort code paths, and make use of the
Session.m object to properly handle Start/Abort/Commit.

Needs tests around Insert/Update/Upsert/Remove/Find and
Start/Commit/Abort sequencing.
These don't actually work because while there may not be a *data* race
in memory, they race eachother for things like 'who is the first to
start the transaction'. They also potentially race around 'is the
transaction finished'.

This might be good enough, as we don't really want to try to make it
possible to do things multithreaded, we just don't want the memory
structures to get corrupted.

One caveat is that the TXN isn't actually started until the first
operation, so it may be that we want to support an outer routine calling
StartTransaction, but it forks off a bunch of helper routines that
are all issued inside the transaction. If we want that, then the best
thing is to probably put a Mutex on the 'transaction' object itself. And
then we need to hold the mutex while the query is being performed, so
that we know we really have issued 'startTransaction' on the remote
side.
@@ -1609,12 +1628,23 @@ func (s *Session) Clone() *Session {
// Close terminates the session. It's a runtime error to use a session
// after it has been closed.
func (s *Session) Close() {
s.m.RLock()
txnActive := s.transaction != nil && s.transaction.started

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latching txnActive here will reduce but not entirely prevent the possibility for 2 go-routines obtaining the same value and then both trying to call AbortTransaction().

How about changing this to use a write lock instead, then move the if txnActive block inside the critical section and call out to something like s.abortTransactionWhileHoldingLock? AbortTransaction can be modified to just grab the write lock and call the private method which does the real abort.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is currently non-trivial to hold the actual lock while doing this. Because calling Session.Run() needs to grab the write lock to set up the connection to the backend.
Session.Run() calls DB.Run() calls Session.acquireSocket taking the write lock and doing a lot of logic about what socket to talk to,
DB.Run then also calls DB.run() which again grabs a read-lock to handle some of the parameters.
In general, it isn't meant to be re-entrant and holding the lock for the whole time.
I could certainly do all the work to get it right, but for our purposes, we don't actually care. We don't really want to be doing multithreaded transaction work. All the code that is dealing with transactions that we will be setting up right now is going to be single-threaded. (If you need concurrency, then create another Session, and you'll even have separate TXNs available.)

This is mostly about making it nice for when you accidentally mess up your test case, and you have a proper defer Session.Close() but you forget to abort the transaction. Without this sort of patch, the test hangs. I assume because the socket doesn't close because it has an active transaction.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks for the clarification.

Clone doesn't copy the session or transaction, that way we don't have to
worry about txn updates, but also we certainly think that Copy() should
have its own transaction identity. And it makes sense that Clone should
too. Especially we don't want Copy during a txn to behave differently
than Copy outside of a transaction.

Get rid of txn.finished. Let the server decide if the transaction is
active.
@jameinel
Copy link
Member Author

!!build!!

12 similar comments
@jameinel
Copy link
Member Author

!!build!!

@jameinel
Copy link
Member Author

!!build!!

@jameinel
Copy link
Member Author

!!build!!

@jameinel
Copy link
Member Author

!!build!!

@jameinel
Copy link
Member Author

!!build!!

@jameinel
Copy link
Member Author

!!build!!

@jameinel
Copy link
Member Author

!!build!!

@jameinel
Copy link
Member Author

!!build!!

@jameinel
Copy link
Member Author

!!build!!

@jameinel
Copy link
Member Author

!!build!!

@jameinel
Copy link
Member Author

!!build!!

@jameinel
Copy link
Member Author

!!build!!

@jameinel
Copy link
Member Author

!!build!!

3 similar comments
@jameinel
Copy link
Member Author

!!build!!

@jameinel
Copy link
Member Author

!!build!!

@jameinel
Copy link
Member Author

!!build!!

@jameinel
Copy link
Member Author

I'm going to go ahead and merge this to master, since it shouldn't affect you if you aren't using txns.
$$merge$$

@jameinel
Copy link
Member Author

!!build!!

@jujubot jujubot merged commit ece5c27 into juju:master Apr 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants