Add encoding charset sniffing #736
Conversation
Force-pushed from e0bbcbb to 7d576c9

Force-pushed from d0eceeb to 4392921
What about these changes fails in Node 10? I'm not opposed to dropping support, but I don't want to without good reason.
I see, that makes sense. I looked into it and found that the problem seemed to be that the buffer read into memory for sniffing was being reused to create the response stream; commenting out that line made everything fine. Some errors said "Cannot set headers after they are sent to the client", so there may be differences in the event loop (or something like that) between v10.x and newer versions, but I'm not sure. So I think we have two options: (1) read text files twice, once for sniffing and once for the response stream, or (2) drop support for Node 10.

I think Option 1 may be better at this time, because reading text files twice doesn't matter in most cases, and it lets us ship this as a minor update. But dropping support for unmaintained versions can itself be a good reason for a major update. What do you think?
Force-pushed from 4392921 to 373edef
Rebased onto master.
Yeah, I think I'm actually leaning toward dropping support if that's the case; I'd rather drop Node 10 than introduce reading files twice. You're right that it shouldn't matter too much, but less I/O is always better.

That error message can (but doesn't always) mean the server timed out, so that once the request finally completes, it tries to send a response that has already been sent. I've seen the same in Apache; it may mean we have some tests that are too slow.
Node 10 dropped in master! Another merge/rebase should fix the tests!
Force-pushed from 373edef to 732400c
Thanks! This helped me understand what had happened. Though I rebased this branch onto master, there are still some errors on 12.x, macOS-latest. It says:

```
394 passing (31s)
4 failing

1) test/cli.test.js setting mimeTypes via cli - directly test unfinished:
   Error: test unfinished
    at Object.<anonymous> (test/cli.test.js:84:1)

2) test/cli.test.js setting mimeTypes via cli - directly test count !== plan:
   test count !== plan
   + expected - actual
   -1
   +4

3) test/cli.test.js child test left in queue: t.test --proxy requires you to specify a protocol:
   child test left in queue: t.test --proxy requires you to specify a protocol

4) test/cli.test.js test count !== plan:
   test count !== plan
   + expected - actual
   -4
   +3
```

According to the messages (eg.
Yeah, those are some non-deterministic errors we're getting at random; usually re-running the tests fixes them. I've been trying to weed those out, including by adding more plans. I've managed to make them better recently, but they still pop up. I reran the tests for this PR and they're passing now. But there's a conflict to clear up now; we probably just need to recreate the package-lock again. Once we get this merged I'll probably make the next major release!
Force-pushed from 732400c to b0cb863
Fixed the conflict, and all tests pass! I'm looking forward to the next release!!
Love it! This is a great improvement, thank you!!
Please ensure that your pull request fulfills these requirements:

- Based on the `master` branch

What is the purpose of this pull request? (bug fix, enhancement, new feature, ...)
Resolves #426
What changes did you make?
Added charset sniffing for text files.
Is there anything you'd like reviewers to focus on?
To sniff the charset, text files need to be read synchronously (`fs.readFileSync`) rather than streamed (`fs.createReadStream`). This change causes more memory usage, and if a file exceeds the maximum buffer length (about 4 GB on 64-bit architectures[1]), Node.js will throw an error. However, I think there are not many cases where a plain text file is over 4 GB, so this is acceptable for charset sniffing. What do you think about this?

UPDATE: I added file size checking in 4392921, so the above problem is fixed.
Please provide testing instructions, if applicable:
`npm run test`
Footnotes

1. https://nodejs.org/api/buffer.html#buffer_buffer_constants_max_length