Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

postgres fails to start when template database with collation version mismatch is needed at startup #318777

Closed
hrdl-github opened this issue Jun 10, 2024 · 8 comments

Comments

@hrdl-github
Copy link

Describe the bug

postgresql.service fails to start when databases are created during startup while template1's has a collation version mismatch. Observed with postgresql 15.7.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Install postgresql on an older nixos version, e.g. 23.11
  2. Update to 24.05
  3. Install a service that will cause a postgres database to be created during postgres startup, e.g. by services.roundcube.enable = true.

Expected behavior

Postgresql runs and roundcube database gets created

Additional context

Jun 10 11:41:31 rack systemd[1]: Starting PostgreSQL Server...
Jun 10 11:41:31 rack postgres[80547]: [80547] LOG:  starting PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit
Jun 10 11:41:31 rack postgres[80547]: [80547] LOG:  listening on IPv6 address "::1", port 5432
Jun 10 11:41:31 rack postgres[80547]: [80547] LOG:  listening on IPv4 address "127.0.0.1", port 5432
Jun 10 11:41:31 rack postgres[80547]: [80547] LOG:  listening on Unix socket "/run/postgresql/.s.PGSQL.5432"
Jun 10 11:41:31 rack postgres[80550]: [80550] LOG:  database system was shut down at 2024-06-10 11:27:18 GMT
Jun 10 11:41:31 rack postgres[80547]: [80547] LOG:  database system is ready to accept connections
Jun 10 11:41:31 rack postgres[80564]: [80564] ERROR:  template database "template1" has a collation version mismatch
Jun 10 11:41:31 rack postgres[80564]: [80564] DETAIL:  The template database was created using collation version 2.38, but the operating system provides version 2.39.
Jun 10 11:41:31 rack postgres[80564]: [80564] HINT:  Rebuild all objects in the template database that use the default collation and run ALTER DATABASE template1 REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
Jun 10 11:41:31 rack postgres[80564]: [80564] STATEMENT:  CREATE DATABASE "roundcube"
Jun 10 11:41:31 rack postgresql-post-start[80563]: ERROR:  template database "template1" has a collation version mismatch
Jun 10 11:41:31 rack postgresql-post-start[80563]: DETAIL:  The template database was created using collation version 2.38, but the operating system provides version 2.39.
Jun 10 11:41:31 rack postgresql-post-start[80563]: HINT:  Rebuild all objects in the template database that use the default collation and run ALTER DATABASE template1 REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.
Jun 10 11:41:31 rack systemd[1]: postgresql.service: Control process exited, code=exited, status=1/FAILURE

Workaround:

  1. Disable service that causes database creation
  2. nixos-rebuild switch, allowing postgres database to start
  3. Run ALTER DATABASE template1 REFRESH COLLATION VERSION via psql
  4. Reenable service

Notify maintainers

@thoughtpolice @danbst @globin @ivan @Ma27 @wolfgangwalther

Metadata

- system: `"x86_64-linux"`
 - host os: `Linux 6.6.32, NixOS, 24.05 (Uakari), 24.05.1135.9b5328b7f761`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.2`
 - channels(root): `"nixos-24.05"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
@nixos-discourse
Copy link

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/nextcloud-failed-to-run-since-recent-update/46938/1

@SuperSandro2000
Copy link
Member

Please test #320323

@hrdl-github
Copy link
Author

I have changed the collation version, as described in my workaround, so I can no longer readily test this.

@SuperSandro2000
Copy link
Member

I basically do the same as you have described but for every database creation to make sure, it never fails.

@Ma27
Copy link
Member

Ma27 commented Jun 18, 2024

I'll close this now since I don't see anything actionable here. To quote https://www.postgresql.org/docs/current/sql-altercollation.html

[...] but it can happen in legitimate circumstances, such as when upgrading the operating system to a new major version

This is the case here. This is a manual upgrade step.

There are ideas how to do database upgrades here, but as long as we don't have such a mechanism in place, there's nothing we can do.

Hence, won't fix.

@DerDennisOP
Copy link
Contributor

DerDennisOP commented Jun 21, 2024

Why not make at least an addtional option to the postgresql package. I had this error as well and I totally would like my database to automatically REINDEX ... instead of breaking. It took me some time to fix this for something that shoudn't break on an update, thus making the system not reproducible anymore...

@DerDennisOP DerDennisOP reopened this Jun 21, 2024
@wolfgangwalther
Copy link
Contributor

It took me some time to fix this for something that shoudn't break on an update

Well, but the point is, that this should break. Otherwise you have silent corruption.

If we can find a way to do this conditionally only on an upgrade, that would be ok. But doing a REINDEX on every start of the database is not.

@Ma27
Copy link
Member

Ma27 commented Jun 21, 2024

I already explained why that's a bad idea and it was outlined why that's happening.

It took me some time to fix this for something that shoudn't break on an update

The reason why it breaks instead of giving just a warning is because of ensureDatabases which does a CREATE DATABASE on every startup of postgresql. It's btw the second time this whole ensure* antipattern is causing major headaches[1] on NixOS releases.

Anyways, there are ideas on how to properly tackle state management for postgresql[2]. As you can see re. the solutions proposed, there's no reasonable way to solve this with the current implementation which is one sign of it being an antipattern.

[1] #266270
[2] #206467 as a start

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants