fs,macos: Not possible to recover from fs.watch limit error #43267

Closed
gluxon opened this issue May 31, 2022 · 3 comments
Labels
fs Issues and PRs related to the fs subsystem / file system. libuv Issues and PRs related to the libuv dependency or the uv binding. macos Issues and PRs related to the macOS platform / OSX.

Comments

@gluxon
Contributor

gluxon commented May 31, 2022

Version

v18.2.0

Platform

Darwin hostname 21.5.0 Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:22 PDT 2022; root:xnu-8020.121.3~4/RELEASE_X86_64 x86_64

Subsystem

fs

What steps will reproduce the bug?

The program below watches every directory under the current working directory. It could be written more efficiently with the { recursive: true } fs.watch option, but this script deliberately creates one watcher per directory to keep the repro simple.

import fs from "node:fs"
import path from "node:path"

async function main() {
  let didAnyWatcherError = false;

  const watchers = [];

  for await (const directory of getDirsRecursive(process.cwd())) {
    if (didAnyWatcherError) {
      break;
    }

    console.log(`Watching: ${directory}`);
    const watcher = fs.watch(directory, { persistent: true }, logChange);
    watchers.push(watcher);

    watcher.on("error", (err) => {
      // It appears that all watchers error after the first one errors,
      // so only the first failure is reported.
      if (didAnyWatcherError) {
        return;
      }

      didAnyWatcherError = true;

      console.error(`First watch failure on: ${directory}`);
      console.error(`Attempted to create ${watchers.length} total watchers before error`);
      console.error(err);
    });
  }

  if (didAnyWatcherError) {
    console.log(`Closing ${watchers.length} watchers. This may take a few minutes.`);
    while (watchers.length > 0) {
      const watcher = watchers.pop();
      watcher.close();
    }
    console.log("All existing watchers have now been closed.");

    // Try watching again:
    console.log(`Watching: ${process.cwd()}`);
    fs.watch(process.cwd());
  }
}

main();

async function* getDirsRecursive(dir) {
  for (const dirEntry of await fs.promises.readdir(dir, { withFileTypes: true })) {
    const dirPath = path.join(dir, dirEntry.name);
    if (dirEntry.isDirectory()) {
      yield dirPath;
      yield* getDirsRecursive(dirPath);
    }
  }
}

function logChange(event, fileName) {
  console.log(`${event}: ${fileName}`);
}

On my MacBook Pro, this crashes at ~4200±200 watchers.

Attempted to create 4178 total watchers before error
Error: EMFILE: too many open files, watch
    at FSWatcher._handle.onchange (node:internal/fs/watchers:204:21)
    at FSEvent.callbackTrampoline (node:internal/async_hooks:130:17) {
  errno: -24,
  syscall: 'watch',
  code: 'EMFILE',
  filename: null
}

For an easy repro with folders already created:

git clone git@github.com:gluxon/typescript-emfile-repro
npm run repro-node

How often does it reproduce? Is there a required condition?

This issue reproduces consistently on macOS.

What is the expected behavior?

It's expected that an EMFILE: too many open files error appears eventually, but it's surprising all existing watchers fail as a result.

In addition to existing watchers failing, it seems impossible to create new watchers even after closing the existing ones. This results in an error that's impossible to recover from.

What do you see instead?

  • All existing watchers throw an EMFILE error.
  • New watchers throw an EMFILE.
  • Closing existing watchers doesn't seem to help.
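
The failure described above can be told apart from other watcher errors by the code and syscall fields shown in the error output earlier in this report. A minimal sketch, assuming a hypothetical helper name isWatchLimitError (it is not part of Node.js):

```javascript
// Hypothetical helper: returns true when an error emitted by an fs.watch
// watcher matches the EMFILE failure shown in this report.
function isWatchLimitError(err) {
  return Boolean(err) && err.code === "EMFILE" && err.syscall === "watch";
}

// Usage sketch, for a watcher created elsewhere with fs.watch:
// watcher.on("error", (err) => {
//   if (isWatchLimitError(err)) {
//     // Stop creating new watchers instead of retrying immediately.
//   }
// });
```

Checking the fields rather than the message text avoids depending on the exact wording of the error string.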

Additional information

This was first noticed over in microsoft/TypeScript#47546.

@gluxon gluxon changed the title from "Not possible to recover from fs.watch limit error on macOS" to "fs,macos: Not possible to recover from fs.watch limit error" May 31, 2022
@VoltrexKeyva VoltrexKeyva added fs Issues and PRs related to the fs subsystem / file system. macos Issues and PRs related to the macOS platform / OSX. labels May 31, 2022
@bnoordhuis
Member

I'm aware of the problem but it's probably unfixable. It's been observed that once the user-space part of macOS's FSEvents.framework gets in a bad state, it never recovers.

(And since FSEvents.framework is a no-source black box, it's impossible to debug effectively.)

If you know of a fix or a workaround, please open a libuv PR and cc me.

@bnoordhuis bnoordhuis added the libuv Issues and PRs related to the libuv dependency or the uv binding. label Jun 1, 2022
@gluxon
Contributor Author

gluxon commented Jun 3, 2022

I appreciate the fast response, Ben, thanks! I suspected a libuv change would be required, but did not expect FSEvents.framework itself to be the root cause here.

Going to close this issue for now, since I agree FSEvents.framework is difficult to debug. I may open a disassembler and poke around, but don't think I'll have the bandwidth to do that in the near future.

@gluxon gluxon closed this as completed Jun 3, 2022
@gluxon
Contributor Author

gluxon commented May 29, 2024

Coming back to this issue about 2 years later, I'm noticing 2 things.

"In addition to existing watchers failing, it seems impossible to create new watchers even after closing the existing ones. This results in an error that's impossible to recover from."

This no longer seems to be true as of macOS Sonoma 14.5. After creating 4097 watchers, the EMFILE errors are still present, but once I close enough watchers to get below 4096, I'm able to set up more watchers again.

I've made a few changes to the repro for this. Note that the watchers start erroring on the 4097th watcher, but closing them all allows a newly created watcher to work again.

[Image: Screenshot 2024-05-28 at 5 06 09 PM]

I've also noticed that the limit is exactly 4096. After adding a sleep statement between fs.watch calls, the repro consistently throws on the 4097th watcher.
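
The sleep between calls described above could look roughly like the sketch below. The updated repro itself is not shown in this thread, so the names watchWithDelay and sleep and the 10 ms default are assumptions. The watch argument is expected to behave like fs.watch and is passed in so the loop can be exercised with a stand-in. Presumably the pause matters because watcher errors are delivered asynchronously through the "error" event, so without it more watchers can be created before the first failure is observed.

```javascript
// Resolves after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Creates one watcher per directory, pausing after each call so that an
// asynchronous "error" event can arrive before the next watcher is created.
// `watch` is assumed to have the same shape as fs.watch.
async function watchWithDelay(watch, directories, delayMs = 10) {
  const watchers = [];
  let firstError = null;

  for (const directory of directories) {
    const watcher = watch(directory);
    watcher.on("error", (err) => {
      if (!firstError) firstError = err;
    });
    watchers.push(watcher);

    await sleep(delayMs);
    if (firstError) break; // Stop at the first watcher that fails.
  }

  return { watchers, firstError };
}

// Usage sketch (with `import fs from "node:fs"`):
// const { watchers, firstError } = await watchWithDelay(fs.watch, dirs);
// console.log(`Created ${watchers.length} watchers`, firstError);
```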

Hoping this comment is useful for anyone who finds this issue in the future. These errors no longer put Node.js in an unrecoverable state.
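
For readers who want to stay under the limit rather than recover from it, one option is to track the number of open watchers in the application. The sketch below assumes a hypothetical WatcherPool class; it is not a Node.js or libuv API. The 4096 default is only the value observed in this thread on macOS Sonoma 14.5, and whether the limit applies per process or more broadly is not established here. The watch argument is expected to behave like fs.watch.

```javascript
// Hypothetical wrapper that refuses to exceed a fixed watcher budget.
// The default of 4096 is the limit observed in this thread, not a
// documented constant.
class WatcherPool {
  constructor(watch, limit = 4096) {
    this.watch = watch; // Assumed to have the same shape as fs.watch.
    this.limit = limit;
    this.active = new Map(); // directory -> watcher
  }

  // Returns the watcher for `directory`, or null when the budget is used up.
  add(directory, listener) {
    if (this.active.has(directory)) return this.active.get(directory);
    if (this.active.size >= this.limit) return null;
    const watcher = this.watch(directory, listener);
    this.active.set(directory, watcher);
    return watcher;
  }

  // Closes the watcher for `directory`, freeing a slot for a later add().
  remove(directory) {
    const watcher = this.active.get(directory);
    if (!watcher) return false;
    watcher.close();
    this.active.delete(directory);
    return true;
  }

  get size() {
    return this.active.size;
  }
}

// Usage sketch (with `import fs from "node:fs"`):
// const pool = new WatcherPool(fs.watch);
// if (pool.add("/some/dir", console.log) === null) { /* over budget */ }
```

Returning null instead of throwing leaves it to the caller to decide whether to skip the directory or to remove a less important watcher first.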
