
(Bug report) Server crashing #3240

Open

sigaloid opened this issue Oct 25, 2022 · 7 comments

@sigaloid (Contributor)

Trilium Version

0.56.1

What operating system are you using?

Other Linux

What is your setup?

Local + server sync

Operating System Version

Debian

Description

After running for a while, the server simply crashes. There's nothing in the logs; it just returns a 503 on the / route. It seems to happen to some users more than others, and a simple restart fixes it. I'm not sure how to begin debugging this - maybe with a crash log of some kind? Is there anything you can recommend to help discover the root cause? Once it occurs, what can I do to locate the error?

@zadam (Owner) commented Oct 26, 2022

Hi, this is probably something Node.js related. A crash should generally output something to stderr or stdout; redirecting that to a file might help with debugging.
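
For example, something along these lines could be added near the top of the server's entry point to capture the last error before the process dies (just a sketch; the crash.log path under the data directory is illustrative, not a Trilium convention):

// Sketch: append fatal errors to a file before the process dies.
const fs = require('fs');

function logFatal(err) {
  fs.appendFileSync('/home/node/trilium-data/crash.log',
    `${new Date().toISOString()} ${err && err.stack ? err.stack : err}\n`);
}

process.on('uncaughtException', (err) => {
  logFatal(err);
  process.exit(1); // exit anyway so the supervisor still notices the crash
});

process.on('unhandledRejection', (reason) => {
  logFatal(reason instanceof Error ? reason : new Error(String(reason)));
});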

@sigaloid (Contributor, Author)

There's nothing on stderr, and stdout looks normal - it simply stops at some point, with no errors printed at all before it stops.

@zadam (Owner) commented Oct 28, 2022

Hmm, that's strange ... could the OS perhaps be killing the process, maybe because it's out of memory or something?
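
One way to rule that in or out might be to have the process log its own memory usage periodically, so a climb toward an OOM kill would be visible in the log right before a crash (a sketch; the interval is arbitrary):

// Sketch: log this process's memory usage once a minute.
setInterval(() => {
  const m = process.memoryUsage();
  console.log(`rss=${(m.rss / 1048576).toFixed(1)}MiB ` +
    `heap=${(m.heapUsed / 1048576).toFixed(1)}/${(m.heapTotal / 1048576).toFixed(1)}MiB`);
}, 60 * 1000);

Checking the kernel log for OOM-killer messages after a crash would confirm it from the other side.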

@sigaloid (Contributor, Author) commented Nov 3, 2022

The process is still using about the same amount of memory post-crash - does that rule out OOM as a possibility?

Oddly enough, it happens to multiple containers at once, which makes me think it is an OOM somehow, but only 60% of RAM is currently in use. I don't know how this can happen to multiple containers at once - they're properly containerized, of course.

Also, it commonly seems to happen to the same few users - though I'd expect those to be the users with the largest databases, they mostly are not.

PS: There is now an error appearing consistently before the crash:

node:events:505
      throw er; // Unhandled 'error' event
      ^

RangeError: Invalid WebSocket frame: MASK must be set
    at Receiver.getInfo (/usr/src/app/node_modules/ws/lib/receiver.js:289:16)
    at Receiver.startLoop (/usr/src/app/node_modules/ws/lib/receiver.js:136:22)
    at Receiver._write (/usr/src/app/node_modules/ws/lib/receiver.js:83:10)
    at writeOrBuffer (node:internal/streams/writable:389:12)
    at _write (node:internal/streams/writable:330:10)
    at Receiver.Writable.write (node:internal/streams/writable:334:10)
    at Socket.socketOnData (/usr/src/app/node_modules/ws/lib/websocket.js:1272:35)
    at Socket.emit (node:events:527:28)
    at addChunk (node:internal/streams/readable:315:12)
    at readableAddChunk (node:internal/streams/readable:289:9)
Emitted 'error' event on WebSocket instance at:
    at Receiver.receiverOnError (/usr/src/app/node_modules/ws/lib/websocket.js:1158:13)
    at Receiver.emit (node:events:527:28)
    at emitErrorNT (node:internal/streams/destroy:157:8)
    at emitErrorCloseNT (node:internal/streams/destroy:122:3)
    at processTicksAndRejections (node:internal/process/task_queues:83:21) {
  code: 'WS_ERR_EXPECTED_MASK',
  [Symbol(status-code)]: 1002
}
// This appears after the restart - the process restarts the server, but nothing works
No USER_UID specified, leaving 1000
No USER_GID specified, leaving 1000
Slow query took 20ms: SELECT page_count * page_size / 1000 as size FROM pragma_page_count(), pragma_page_size()
DB size: X KB
Trusted reverse proxy: false
App HTTP server starting up at port X
{
  "appVersion": "0.56.2",
  "dbVersion": 197,
  "syncVersion": 26,
  "buildDate": "2022-10-27T22:41:15+02:00",
  "buildRevision": "6c37f2ce7117f5df03ce9e79897a5520e36701a0",
  "dataDirectory": "/home/node/trilium-data",
  "clipperProtocolVersion": "1.0",
  "utcDateTime": "2022-11-03T14:42:05.660Z"
}

This appears in the logs of every container that crashes.

More investigating, and it seems related to the issue below - but they mention ws 8.4.0 fixing it, which is the version used in Trilium.

websockets/ws/issues/1315
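
For what it's worth, the crash mechanism itself is visible in the trace: ws rejects an incoming frame whose MASK bit is unset (RFC 6455 requires all client-to-server frames to be masked) and emits an 'error' event; since nothing listens for 'error' on that socket, Node.js throws and the whole process dies. A sketch of how the server could survive this (assuming wss is the ws WebSocketServer instance; this is not Trilium's actual code):

// Sketch: an 'error' listener on each incoming socket keeps an invalid
// frame from killing the whole process.
wss.on('connection', (socket) => {
  socket.on('error', (err) => {
    console.error('WebSocket error, dropping client:', err.message);
    socket.terminate(); // close only this connection, not the server
  });
});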

@zadam (Owner) commented Nov 5, 2022

0.56.2 uses ws 8.9.0, so that might be worth a try. I thought it could perhaps be related to the browser people use, but if it's happening to multiple containers at the same moment, that's rather unlikely. Or perhaps some reverse proxy?

@sigaloid (Contributor, Author) commented Nov 5, 2022

These containers are indeed running 0.56.2 - I wonder if there's some dependency relying on the old version, if that's possible.
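
(A quick way to check which ws copy the server actually resolves - a sketch, run from the app directory; npm ls ws would show the same from the shell:)

// Sketch: locate the ws module the server resolves and print its version.
const fs = require('fs');
const path = require('path');

const entry = require.resolve('ws'); // e.g. /usr/src/app/node_modules/ws/index.js
const pkg = path.join(path.dirname(entry), 'package.json');
console.log(JSON.parse(fs.readFileSync(pkg, 'utf8')).version);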

My current suspicion is the reverse proxy - or maybe the Docker healthcheck. Perhaps some health check is accidentally being interpreted as a WebSocket connection, though I can't imagine how that would happen randomly.

I do suppose this is not necessarily a bug caused by Trilium, but I think the resulting behavior (i.e., the server restarts into an invalid state, unable to respond to requests) could still be considered a bug.

@sigaloid (Contributor, Author) commented Nov 8, 2022

Okay, I suspect this is unrelated to Trilium. It's definitely caused by another component in the stack.

Except for this: when a WebSocket connection is forcibly broken off, the Trilium server ends up in an invalid state that only a restart fixes. There's no data corruption or anything, though. I'm unsure whether I should report that bug, because it's like saying "the server crashes when I hit it the wrong way" - maybe I just shouldn't hit it like that. Regardless, I can say that this is not an underlying issue with Trilium crashing on its own, just a symptom of another issue. Shall I file a bug for the WebSocket problem?
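
(For completeness, a hypothetical way to reproduce the frame error from the trace above: complete the WebSocket handshake by hand, then write a frame with the MASK bit unset, which RFC 6455 forbids for client-to-server traffic. Host and port are illustrative:)

// Hypothetical repro sketch: hand-rolled client sending an unmasked frame,
// which ws rejects with WS_ERR_EXPECTED_MASK.
const net = require('net');
const crypto = require('crypto');

const sock = net.connect(8080, 'localhost', () => {
  sock.write(
    'GET / HTTP/1.1\r\n' +
    'Host: localhost:8080\r\n' +
    'Upgrade: websocket\r\n' +
    'Connection: Upgrade\r\n' +
    `Sec-WebSocket-Key: ${crypto.randomBytes(16).toString('base64')}\r\n` +
    'Sec-WebSocket-Version: 13\r\n' +
    '\r\n'
  );
});

sock.once('data', () => {
  // After the 101 response: text frame "hi" with the MASK bit unset.
  // 0x81 = FIN + text opcode; 0x02 = no mask, payload length 2.
  sock.write(Buffer.from([0x81, 0x02, 0x68, 0x69]));
});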
