Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: lock static query map [DET-3633, DET-4011] #1216

Merged
merged 2 commits into from
Sep 2, 2020
Merged

chore: lock static query map [DET-3633, DET-4011] #1216

merged 2 commits into from
Sep 2, 2020

Conversation

stoksc
Copy link
Contributor

@stoksc stoksc commented Sep 1, 2020

Description

This change locks concurrent access to the queries map in db.PgDB. Technically two concurrent queries could come in, both see the value doesn't exist, and both populate it (the entire critical section isn't locked), but they would overwrite each other with the same value and the write lock stops the go runtime from barfing over concurrent map writes. I think being able to use RLock for the read portion is worth this, since we'll almost never have writes.

Test Plan

Commentary (optional)

Checklist

  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.

Copy link
Member

@rb-determined-ai rb-determined-ai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed before, this is kind of a bandaid solution for what really is sort of a silly and unnecessary lazy-loading strategy that we have. I'm ok with doing it this way for now since it brings us to being functionally correct very quickly, but what is the long-term plan?

You mentioned using postgres stored procedures, how much work would that be? Do you think it would be worth it?

func (q *staticQueryMap) getOrLoad(queryName string) string {
q.RLock()
query, ok := q.queries[queryName]
q.RUnlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: should we be using defer to avoid situations where a panic of some sort happens, causing the unlock to never happen? (the question applies here and below)

Copy link
Contributor Author

@stoksc stoksc Sep 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we defer q.RUnlock() then I think if !ok we could have a deadlock trying to acquire q.Lock(). We could defer q.Unlock() a couple lines below, but I don't see any way a panic could happen (since we're using the ok return from the map).

@stoksc
Copy link
Contributor Author

stoksc commented Sep 1, 2020

@rb-determined-ai

this is kind of a bandaid solution for what really is sort of a silly and unnecessary lazy-loading strategy that we have

yep.

I'm ok with doing it this way for now since it brings us to being functionally correct very quickly, but what is the long-term plan?

I'm doing this too, because since the new api uses this code path, it seems to be occurring more and more often. Going to bring the long term solution up for discussion.

You mentioned using postgres stored procedures, how much work would that be? Do you think it would be worth it?

Maybe I can take my tech debt week to clean up postgres.go, I'll run some ideas by #infra-ag.

@stoksc stoksc assigned rb-determined-ai and unassigned stoksc Sep 1, 2020
Copy link
Contributor

@neilconway neilconway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: do we really think that this is simpler than eager-loading all the queries into the map during master startup?


func (q *staticQueryMap) getOrLoad(queryName string) string {
q.RLock()
query, ok := q.queries[queryName]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, as a practical matter, I'm not sure this optimization is warranted -- i.e.,

q.Lock()
defer q.Unlock()
query, ok := q.queries[queryName]
if !ok {
    query = string(etc.MustStaticFile(...))
    q.queries[queryName] = query
}
return query

would probably be fine in practice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that for this specific case with 10-20 writes and unbounded reads, this would be better. But I did benchmark them for fun and they were very close. I can change it since it's so much simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went ahead and changed this, since it's impact is minimal in comparison to the db query associated with each call.

Copy link
Member

@rb-determined-ai rb-determined-ai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Neil that it seems better to eagerly load the postgres queries when we create the PgDB, but if you have a lot on your plate, I'm not terribly concerned.

In any case, I think we should file a ticket for a longer-term solution and we should probably land this today, so this bug is fixed for the release.

@stoksc
Copy link
Contributor Author

stoksc commented Sep 1, 2020

question: do we really think that this is simpler than eager-loading all the queries into the map during master startup?

@neilconway I do, just because if we need to load them all at startup, we need to be supplied a list of all the queries we preload. So the initial change would require finding all the queries that are loaded this way and then adding any future queries to a list to preload. Forgetting this would be a runtime error; I'd expect to hit it in testing but also I'd rather not rely on that.

That said, I really don't like this pattern, I'm just trying to fix this as simply as possible, saving a larger refactor for after a discussion.

@neilconway
Copy link
Contributor

I'm just trying to fix this as simply as possible, saving a larger refactor for after a discussion.

sgtm.

not much perf is gained with the complicated rwmutex usage, so just chage to regular mutex
@stoksc stoksc merged commit 4b4449f into determined-ai:master Sep 2, 2020
@stoksc stoksc deleted the lock-static-query-map branch September 2, 2020 15:01
@stoksc stoksc changed the title chore: lock static query map [DET-3633] chore: lock static query map [DET-3633, DET-4011] Sep 3, 2020
@dannysauer dannysauer added this to the 0.13.2 milestone Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants