-
Notifications
You must be signed in to change notification settings - Fork 1.3k
There may be a problem with how we do DB migrations concurrently? #1491
Comments
I don't know what causes this, but I have seen this once and also could not reproduce it. However, I suspect this is a more general issue: there is something wrong with the way we handle migrations running concurrently. The management console doesn't do anything fancy or special here, it just acts as another frontend instance, effectively. The only difference is that we used to run one frontend instance in dev mode, but we're now running "two" (i.e., the frontend + the management console, each of which try to migrate the DB). This could be a larger issue / something we just haven't caught in prod / dogfood, or it could be something specific to local dev. I'm not sure, and also unsure how I would reproduce. I am rewording title + removing my direct assignment because anyone can look into this (I don't have any special knowledge here). |
I am tagging as release blocker because of my own uncertainty about what is causing this. If we can identify the cause or likely cause, we can remove release-blocker. |
I was just able to reproduce a few times (unreliably) so I do think it is a race condition somewhere. All the times I reproduced it were when I was starting with a clean DB (as expected since I don't have any migrations running locally otherwise). I do wonder if this is specifically a race condition with the first migration, which might explain why we are seeing this on dev now but have never seen it on prod before. It seems like the migrate tool should correctly handle race conditions in a steady state migration, but maybe it doesn't handle the first one well. cc @beyang @keegancsmith as potential owners to investigate this |
@keegancsmith since we dropped Postgres 11 upgrade, I am assigning this to you. |
Ok, I'll take a look. FYI I just commented on the postgres issue with motivation of why we should do it. |
I suspect that this is just a race condition with the initial creation of the schema migrations table on a fresh DB. The schema migrations table is exactly what prevents race conditions in the steady state, so maybe the two services are racing to create it? I am kind of surprised that this doesn't appear to be an issue when testing docker images. If this turns out to only impact dev and not customers then it is not release blocking. |
I am going to remove the release-blocker tag since nobody has seen this issue while testing our docker images. I can't explain why it would be an issue in dev and not in the docker image. |
It happened to me yesterday when testing the docker image. Maybe people aren't clearing state. Its a pretty bad issue. |
Happened to me just now after running |
Thanks for confirming. This is a bad release blocker then. |
I had a I then ran |
@keegancsmith: Can I take this off your plate? |
Related issue with Concourse who used the same migration library we do: concourse/concourse#2074 |
This issue seems like an exact match of what we're seeing: golang-migrate/migrate#55 |
This commit introduces an isolated atomic migrations test command to reliably reproduce these issues: - golang-migrate/migrate#55 - https://github.com/sourcegraph/sourcegraph/issues/1491
OK, I opened a PR with a test case upstream. It works with a high number of concurrent processes attempting to create the While we wait for the upstream contribution to be finalised, I'll change sourcegraph to use my branch. |
Note from Stephen: The issue here is caused by the fact that we now have two services in local dev trying to migrate the DB at the same time. This is in fact already happening in prod/dogfood, because multiple frontend instances are running. This could indicate something wrong with the way we do DB migrations concurrently but I don't know for sure.
Management console migration issue.
This happens first (when running ./dev/launch.sh)
Then next time I run ./dev/launch.sh I see
Dropping the database fixes the issue. This has happened to me twice this week but I don't know what steps reproduce it.
Full log
$ ./dev/launch.sh
Concurrently installing Yarn and Go dependencies...
Running
go install
from /Users/nick/dev/sourcegraphGo race detector disabled. You can enable it for specific commands by setting GORACED (e.g. GORACED=frontend,searcher or GORACED=all for all commands)
yarn install v1.12.3
warning ../../package.json: No license field
[1/5] 🔍 Validating package.json...
[2/5] 🔍 Resolving packages...
⠂ sourcegraph@link:packages/sourcegraph-extension-apigithub.com/sourcegraph/sourcegraph/cmd/frontend/graphqlbackend
[3/5] 🚚 Fetching packages...
github.com/sourcegraph/sourcegraph/cmd/frontend/internal/app/jscontext
github.com/sourcegraph/sourcegraph/cmd/frontend/docsite
github.com/sourcegraph/sourcegraph/cmd/frontend/registry
github.com/sourcegraph/sourcegraph/cmd/frontend/internal/httpapi
warning [email protected]: The engine "code" appears to be invalid.
[4/5] 🔗 Linking dependencies...
warning " > [email protected]" has unmet peer dependency "[email protected] - 3".
warning " > [email protected]" has unmet peer dependency "popper.js@^1.14.3".
warning " > [email protected]" has incorrect peer dependency "graphql@^0.6.0 || ^0.7.0 || ^0.8.0-b || ^0.9.0 || ^0.10.0 || ^0.11.0 || ^0.12.0".
warning " > [email protected]" has unmet peer dependency "prop-types@>=15.5.0".
warning "graphiql > [email protected]" has incorrect peer dependency "graphql@^0.10.0 || ^0.11.0".
warning "graphiql > codemirror-graphql > [email protected]" has incorrect peer dependency "graphql@^0.10.0 || ^0.11.0 || ^0.12.0 || ^0.13.0".
warning "graphiql > codemirror-graphql > graphql-language-service-interface > [email protected]" has incorrect peer dependency "graphql@^0.11.0 || ^0.12.0 || ^0.13.0".
warning "graphiql > codemirror-graphql > graphql-language-service-interface > [email protected]" has incorrect peer dependency "graphql@^0.10.0 || ^0.11.0 || ^0.12.0 || ^0.13.0".
warning "graphiql > codemirror-graphql > graphql-language-service-interface > [email protected]" has incorrect peer dependency "graphql@^0.10.0 || ^0.11.0 || ^0.12.0 || ^0.13.0".
warning "graphiql > codemirror-graphql > graphql-language-service-interface > [email protected]" has incorrect peer dependency "graphql@^0.10.0 || ^0.11.0 || ^0.12.0 || ^0.13.0".
warning "graphiql > codemirror-graphql > graphql-language-service-parser > [email protected]" has incorrect peer dependency "graphql@^0.10.0 || ^0.11.0".
warning "graphiql > codemirror-graphql > graphql-language-service-interface > graphql-config > [email protected]" has incorrect peer dependency "graphql@^0.11.0 || ^0.12.0 || ^0.13.0".
warning " > [email protected]" has unmet peer dependency "prop-types@*".
warning " > [email protected]" has incorrect peer dependency "react@^16.6.3".
warning " > [email protected]" has incorrect peer dependency "typescript@>= 2.3.1 < 3".
[5/5] 📃 Building fresh packages...
$ gulp generate && yarn --cwd shared -s run tslint:build-rules
[09:33:54] Requiring external module ts-node/register
Using mode development
[09:33:56] Using gulpfile ~/dev/sourcegraph/gulpfile.ts
[09:33:56] Starting 'generate'...
[09:33:56] Starting 'schema'...
[09:33:56] Starting 'graphQLTypes'...
[09:33:56] Finished 'schema' after 556 ms
[09:33:57] Finished 'graphQLTypes' after 1.16 s
[09:33:57] Finished 'generate' after 1.16 s
warning ../../../package.json: No license field
✨ Done in 13.27s.
~/dev/sourcegraph/cmd/management-console/web ~/dev/sourcegraph
Love Parcel? You can now donate to our open collective:
added 20 packages from 43 contributors, removed 4 packages, updated 134 packages and audited 9799 packages in 11.444s
found 0 vulnerabilities
~/dev/sourcegraph
Starting all binaries...
09:34:12 repo-updater | 2018/12/18 09:34:12 received error during background config update, err: unable to fetch new configuration: Post http://localhost:3090/.internal/configuration: dial tcp [::1]:3090: connect: connection refused
09:34:12 searcher | 2018/12/18 09:34:12 received error during background config update, err: unable to fetch new configuration: Post http://localhost:3090/.internal/configuration: dial tcp [::1]:3090: connect: connection refused
09:34:12 symbols | 2018/12/18 09:34:12 received error during background config update, err: unable to fetch new configuration: Post http://localhost:3090/.internal/configuration: dial tcp [::1]:3090: connect: connection refused
09:34:12 query-runner | 2018/12/18 09:34:12 received error during background config update, err: unable to fetch new configuration: Post http://localhost:3090/.internal/configuration: dial tcp [::1]:3090: connect: connection refused
09:34:12 github-proxy | 2018/12/18 09:34:12 received error during background config update, err: unable to fetch new configuration: Post http://localhost:3090/.internal/configuration: dial tcp [::1]:3090: connect: connection refused
09:34:12 gitserver | 2018/12/18 09:34:12 received error during background config update, err: unable to fetch new configuration: Post http://localhost:3090/.internal/configuration: dial tcp [::1]:3090: connect: connection refused
09:34:12 watch | Using chokidar.
09:34:12 web | [09:34:12] Requiring external module ts-node/register
09:34:13 management-console | 2018/12/18 09:34:13 pq: duplicate key value violates unique constraint "pg_type_typname_nsp_index" in line 0: CREATE TABLE "schema_migrations" (version bigint not null primary key, dirty boolean not null)
09:34:13 management-console | Terminating management-console
09:34:13 web | Terminating web
09:34:13 zoekt-indexserver | Terminating zoekt-indexserver
09:34:13 management-console-web | Terminating management-console-web
09:34:13 searcher | Terminating searcher
09:34:13 github-proxy | Terminating github-proxy
09:34:13 frontend | Terminating frontend
09:34:13 watch | Terminating watch
09:34:13 zoekt-webserver | Terminating zoekt-webserver
09:34:13 symbols | Terminating symbols
09:34:13 gitserver | Terminating gitserver
09:34:13 query-runner | Terminating query-runner
09:34:13 repo-updater | Terminating repo-updater
09:34:13 syntect_server | Terminating syntect_server
goreman: exit status 1
nick@Nicks-MacBook-Pro:sourcegraph $ ./dev/launch.sh
Concurrently installing Yarn and Go dependencies...
Running
go install
from /Users/nick/dev/sourcegraphGo race detector disabled. You can enable it for specific commands by setting GORACED (e.g. GORACED=frontend,searcher or GORACED=all for all commands)
yarn install v1.12.3
warning ../../package.json: No license field
[1/5] 🔍 Validating package.json...
[2/5] 🔍 Resolving packages...
success Already up-to-date.
$ gulp generate && yarn --cwd shared -s run tslint:build-rules
[11:28:05] Requiring external module ts-node/register
Using mode development
[11:28:07] Using gulpfile ~/dev/sourcegraph/gulpfile.ts
[11:28:07] Starting 'generate'...
[11:28:07] Starting 'schema'...
[11:28:07] Starting 'graphQLTypes'...
[11:28:08] Finished 'schema' after 584 ms
[11:28:08] Finished 'graphQLTypes' after 1.24 s
[11:28:08] Finished 'generate' after 1.24 s
warning ../../../package.json: No license field
✨ Done in 7.19s.
~/dev/sourcegraph/cmd/management-console/web ~/dev/sourcegraph
npm WARN management-console@ No description
npm WARN management-console@ No repository field.
npm WARN management-console@ No license field.
audited 9799 packages in 4.127s
found 0 vulnerabilities
╭───────────────────────────────────────────────────────────────╮
│ │
│ New minor version of npm available! 6.4.1 → 6.5.0 │
│ Changelog: https://github.com/npm/cli/releases/tag/v6.5.0 │
│ Run npm install -g npm to update! │
│ │
╰───────────────────────────────────────────────────────────────╯
~/dev/sourcegraph
Starting all binaries...
11:28:16 repo-updater | 2018/12/18 11:28:16 received error during background config update, err: unable to fetch new configuration: Post http://localhost:3090/.internal/configuration: dial tcp [::1]:3090: connect: connection refused
11:28:16 symbols | 2018/12/18 11:28:16 received error during background config update, err: unable to fetch new configuration: Post http://localhost:3090/.internal/configuration: dial tcp [::1]:3090: connect: connection refused
11:28:16 query-runner | 2018/12/18 11:28:16 received error during background config update, err: unable to fetch new configuration: Post http://localhost:3090/.internal/configuration: dial tcp [::1]:3090: connect: connection refused
11:28:16 github-proxy | 2018/12/18 11:28:16 received error during background config update, err: unable to fetch new configuration: Post http://localhost:3090/.internal/configuration: dial tcp [::1]:3090: connect: connection refused
11:28:16 gitserver | 2018/12/18 11:28:16 received error during background config update, err: unable to fetch new configuration: Post http://localhost:3090/.internal/configuration: dial tcp [::1]:3090: connect: connection refused
11:28:16 searcher | 2018/12/18 11:28:16 received error during background config update, err: unable to fetch new configuration: Post http://localhost:3090/.internal/configuration: dial tcp [::1]:3090: connect: connection refused
11:28:16 watch | Using chokidar.
11:28:16 management-console | 2018/12/18 11:28:16 Fatal error connecting to Postgres DB: Failed to migrate the DB. Please contact [email protected] for further assistance: Dirty database version 1503574972. Fix and force version.
11:28:16 management-console | Terminating management-console
11:28:16 management-console-web | Terminating management-console-web
11:28:16 query-runner | Terminating query-runner
11:28:16 searcher | Terminating searcher
11:28:16 frontend | Terminating frontend
11:28:16 syntect_server | Terminating syntect_server
11:28:16 zoekt-indexserver | Terminating zoekt-indexserver
11:28:16 zoekt-webserver | Terminating zoekt-webserver
11:28:16 repo-updater | Terminating repo-updater
11:28:16 github-proxy | Terminating github-proxy
11:28:16 web | Terminating web
11:28:16 gitserver | Terminating gitserver
11:28:16 symbols | Terminating symbols
11:28:16 watch | Terminating watch
goreman: exit status 1
The text was updated successfully, but these errors were encountered: