Opening storage failed open DB in /home/tidb/deploy/prometheus2.0.0.data.metrics: Locked by other process #7444

fdsmax · 2018-08-20T13:41:10Z

Hi, I deployed the latest TiDB today (20 Aug 2018) on our CentOS 7 cluster through Ansible. Every step completed successfully, but at the end Prometheus did not start and reported the error 'Opening storage failed open DB in /home/tidb/deploy/prometheus2.0.0.data.metrics: Locked by other process'.

Could you please suggest a fix?

prometheus.log under /home/tidb/deploy/log:

level=info ts=2018-08-20T13:26:57.272676241Z caller=main.go:220 msg="Starting Prometheus" version="(version=2.2.1, branch=HEAD, revision=bc6058c81272a8d938c05e75607371284236aadc)"
level=info ts=2018-08-20T13:26:57.272719078Z caller=main.go:221 build_context="(go=go1.10, user=root@149e5b3f0829, date=20180314-14:15:45)"
level=info ts=2018-08-20T13:26:57.272732113Z caller=main.go:222 host_details="(Linux 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 tlyuk3 (none))"
level=info ts=2018-08-20T13:26:57.272741842Z caller=main.go:223 fd_limits="(soft=1000000, hard=1000000)"
level=info ts=2018-08-20T13:26:57.274849599Z caller=web.go:382 component=web msg="Start listening for connections" address=:9090
level=info ts=2018-08-20T13:26:57.274812159Z caller=main.go:504 msg="Starting TSDB ..."
level=info ts=2018-08-20T13:26:57.275377658Z caller=main.go:398 msg="Stopping scrape discovery manager..."
level=info ts=2018-08-20T13:26:57.27540428Z caller=main.go:411 msg="Stopping notify discovery manager..."
level=info ts=2018-08-20T13:26:57.275416378Z caller=main.go:432 msg="Stopping scrape manager..."
level=info ts=2018-08-20T13:26:57.275435374Z caller=manager.go:460 component="rule manager" msg="Stopping rule manager..."
level=info ts=2018-08-20T13:26:57.275427416Z caller=main.go:394 msg="Scrape discovery manager stopped"
level=info ts=2018-08-20T13:26:57.275447659Z caller=manager.go:466 component="rule manager" msg="Rule manager stopped"
level=info ts=2018-08-20T13:26:57.27545168Z caller=main.go:426 msg="Scrape manager stopped"
level=info ts=2018-08-20T13:26:57.275457946Z caller=notifier.go:512 component=notifier msg="Stopping notification manager..."
level=info ts=2018-08-20T13:26:57.275465096Z caller=main.go:407 msg="Notify discovery manager stopped"
level=info ts=2018-08-20T13:26:57.275471446Z caller=main.go:573 msg="Notifier manager stopped"
level=error ts=2018-08-20T13:26:57.27691684Z caller=main.go:582 err="Opening storage failed open DB in /home/tidb/deploy/prometheus2.0.0.data.metrics: Locked by other process"
level=info ts=2018-08-20T13:26:57.276949884Z caller=main.go:584 msg="See you next time!"
level=info ts=2018-08-20T13:27:12.523126812Z caller=main.go:220 msg="Starting Prometheus" version="(version=2.2.1, branch=HEAD, revision=bc6058c81272a8d938c05e75607371284236aadc)"
level=info ts=2018-08-20T13:27:12.523175156Z caller=main.go:221 build_context="(go=go1.10, user=root@149e5b3f0829, date=20180314-14:15:45)"
level=info ts=2018-08-20T13:27:12.523188617Z caller=main.go:222 host_details="(Linux 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 tlyuk3 (none))"
level=info ts=2018-08-20T13:27:12.523198563Z caller=main.go:223 fd_limits="(soft=1000000, hard=1000000)"
level=info ts=2018-08-20T13:27:12.524247793Z caller=main.go:504 msg="Starting TSDB ..."
level=info ts=2018-08-20T13:27:12.524287548Z caller=web.go:382 component=web msg="Start listening for connections" address=:9090
level=info ts=2018-08-20T13:27:12.524652316Z caller=main.go:398 msg="Stopping scrape discovery manager..."
level=info ts=2018-08-20T13:27:12.52466487Z caller=main.go:411 msg="Stopping notify discovery manager..."
level=info ts=2018-08-20T13:27:12.524670404Z caller=main.go:432 msg="Stopping scrape manager..."
level=info ts=2018-08-20T13:27:12.524677821Z caller=manager.go:460 component="rule manager" msg="Stopping rule manager..."
level=info ts=2018-08-20T13:27:12.52468619Z caller=manager.go:466 component="rule manager" msg="Rule manager stopped"
level=info ts=2018-08-20T13:27:12.524692146Z caller=notifier.go:512 component=notifier msg="Stopping notification manager..."
level=info ts=2018-08-20T13:27:12.524702167Z caller=main.go:394 msg="Scrape discovery manager stopped"
level=info ts=2018-08-20T13:27:12.524715107Z caller=main.go:426 msg="Scrape manager stopped"
level=info ts=2018-08-20T13:27:12.524714587Z caller=main.go:573 msg="Notifier manager stopped"
level=info ts=2018-08-20T13:27:12.524726938Z caller=main.go:407 msg="Notify discovery manager stopped"
level=error ts=2018-08-20T13:27:12.52529427Z caller=main.go:582 err="Opening storage failed open DB in /home/tidb/deploy/prometheus2.0.0.data.metrics: Locked by other process"
level=warn ts=2018-08-20T13:27:12.525317008Z caller=web.go:461 component=web msg="error serving gRPC" err="grpc: the server has been stopped"
level=info ts=2018-08-20T13:27:12.525323413Z caller=main.go:584 msg="See you next time!"

fdsmax · 2018-08-20T15:45:02Z

I noticed a similar issue discussed in a Prometheus thread, but I am not sure whether it is related to this one or is a different issue. Any thoughts or suggestions, @shenli @c4pt0r, on how to fix this blocking issue? Thanks and regards.

fdsmax · 2018-08-20T16:28:43Z

Hi @shenli and @c4pt0r, I deleted the lock file in /home/tidb/deploy/prometheus2.0.0.data.metrics and re-ran the Ansible installation without any other changes, and it worked perfectly. I am not sure why the lock was created on a first-time install on a clean machine with all configuration set correctly. It may be worth checking whether anyone else has come across the same issue. For now my issue is resolved, so I am OK with this issue being closed. Thanks and cheers.

LinuxGit · 2018-08-21T04:22:30Z

Thanks for your feedback.
By default, Prometheus creates a PID-based lock file in the data directory when it starts. Prometheus also has a --storage.tsdb.no-lockfile flag ("Do not create lockfile in data directory.").
Ansible uses systemd for process supervision. There are no errors if I start the prometheus service again.
If I start the prometheus binary directly, it reports the error "Error starting web server: listen tcp :9090: bind: address already in use".
Only when I start the prometheus binary directly with a different listen port and the same data directory does it report the error "Locked by other process".
If you are still experiencing this issue, please reopen this issue.
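
Before deleting a lock file by hand, it may help to check whether the process that wrote it is still running. The following is a minimal sketch, not part of TiDB Ansible or Prometheus. It assumes the lock file is named lock and contains the owning PID as plain text; the thread only says the lock is "pid-based", so both the file name and the format are assumptions. The helper names inspect_lock and pid_is_alive are hypothetical, and the check is POSIX only.

```python
import os
from pathlib import Path
from typing import Optional, Tuple


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def inspect_lock(data_dir: str, lock_name: str = "lock") -> Optional[Tuple[int, bool]]:
    """Return (pid, alive) for the lock file in data_dir.

    Returns None if the file is missing or does not contain a plain integer.
    ASSUMPTION: the file name "lock" and the plain-text PID format are not
    confirmed by the thread.
    """
    lock_path = Path(data_dir) / lock_name
    try:
        text = lock_path.read_text().strip()
    except FileNotFoundError:
        return None
    try:
        pid = int(text)
    except ValueError:
        return None
    return pid, pid_is_alive(pid)
```

Calling inspect_lock("/home/tidb/deploy/prometheus2.0.0.data.metrics") on the affected host would show whether the recorded PID still belongs to a running process. If it does, a second Prometheus instance using the same data directory is the more likely cause, and removing the file would not be safe.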

fdsmax · 2018-08-21T12:38:46Z

Thank you for your note, @LinuxGit. Just to note, it is still not clear why Prometheus got stuck when the current TiDB Ansible installation was run on fresh servers in a clean cluster. We did not start the prometheus binary manually or perform any other manual steps; it was a straightforward automated install on a clean cluster with an already tested configuration. This is just for your information. We will keep this thread closed, as we resolved the problem by manually removing the lock file so that the automated TiDB Ansible install could complete. Thanks and regards.

LinuxGit closed this as completed Aug 21, 2018
