fs.readFileSync(filename, 'utf8') doesn't strip BOM markers #1918

dobesv · 2011-10-21T11:43:25Z

Environment: cloud9ide.com, node version 0.4.5

If I read a file using fs.readFileSync(filename, 'utf8') that is encoded using UTF8 with BOM, the BOM is included in the resulting string.

I think the routine to decode UTF8 is supposed to automatically strip the BOM from the start of the stream before returning the string.

dobesv · 2011-10-21T11:51:03Z

Workaround:

body = body.replace(/^\uFEFF/, '');

Apply this after reading a UTF-8 file when you are not sure whether it starts with a BOM.

koichik · 2011-10-21T13:56:58Z

If fs.readFileSync() strips the BOM automatically,

var text = fs.readFileSync('foo.txt', 'utf8');
fs.writeFileSync('foo.txt', text, 'utf8');

The BOM is lost...

dobesv · 2011-10-24T02:23:52Z

Hmm maybe it is something that was fixed in a more recent version of node.js?

koichik · 2011-10-24T13:01:43Z

No, I mean the BOM was lost from a file ('foo.txt') after fs.writeFileSync().
fs.writeFileSync() cannot add the BOM automatically because it depends on the application whether the BOM is necessary.
Therefore, I think that the BOM should not be removed automatically.

dobesv · 2011-11-04T04:07:17Z

@koichik - can you clarify why you closed this issue? If I read a UTF-8 file into a string, it should not have a BOM in it. That's simply how UTF-8 decoding works: the BOM is not included in the decoded string.

Applications that expect the BOM to be present can add it back on when they write out the file, or to preserve the BOM they can read/write the file as binary.
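
As a minimal sketch of the "read/write the file as binary" option mentioned above: calling fs.readFileSync() without an encoding returns a Buffer, so the bytes (including any leading BOM) pass through untouched. The helper name copyPreservingBom and the temporary file names are hypothetical, used only for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: copy a file byte-for-byte so a leading BOM survives.
function copyPreservingBom(src, dest) {
  const bytes = fs.readFileSync(src); // no encoding argument => Buffer, no decoding
  fs.writeFileSync(dest, bytes);      // bytes are written back unchanged
}

// Demo in a throwaway temp directory (paths are illustrative only).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'bom-demo-'));
const src = path.join(dir, 'foo.txt');
const dest = path.join(dir, 'foo-copy.txt');

// EF BB BF is the UTF-8 BOM, followed by the bytes for "hi".
fs.writeFileSync(src, Buffer.from([0xef, 0xbb, 0xbf, 0x68, 0x69]));
copyPreservingBom(src, dest);

const copied = fs.readFileSync(dest);
console.log(copied.subarray(0, 3)); // the first three bytes of the copy
```

The trade-off is that the application never sees a decoded string here; if it needs to edit the text, it has to decide for itself what to do with the BOM.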

dobesv · 2011-11-04T04:34:02Z

OK, I read a huge argument about this subject on the Python mailing list and a bug report on the JVM systems, and I see that it is more controversial than I had originally thought.

So, never mind ... looks like it's up to programmers to remove the BOM from UTF-8 files themselves.

What they did in Python was interesting: they added a new encoding scheme called 'utf-8-sig', which strips the BOM if present when decoding and emits a BOM when encoding to bytes. This allows the programmer to decide whether to use a BOM or not.

See http://docs.python.org/library/codecs.html:

"On encoding the utf-8-sig codec will write 0xef, 0xbb, 0xbf as the first three bytes to the file. On decoding utf-8-sig will skip those three bytes if they appear as the first three bytes in the file."

Do you think that approach would be acceptable for use in node?
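
For illustration only, here is a rough sketch of what utf-8-sig style behaviour could look like on top of Buffer. This is not an existing node encoding; decodeUtf8Sig and encodeUtf8Sig are hypothetical names, and the logic simply follows the Python documentation quoted above.

```javascript
// The three-byte UTF-8 signature described in the Python docs.
const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

// Hypothetical helper: decode UTF-8 bytes, skipping the signature
// if (and only if) it appears as the first three bytes.
function decodeUtf8Sig(bytes) {
  const hasBom = bytes.length >= 3 && bytes.subarray(0, 3).equals(UTF8_BOM);
  return (hasBom ? bytes.subarray(3) : bytes).toString('utf8');
}

// Hypothetical helper: encode a string as UTF-8 and always
// write the signature as the first three bytes.
function encodeUtf8Sig(text) {
  return Buffer.concat([UTF8_BOM, Buffer.from(text, 'utf8')]);
}

const encoded = encodeUtf8Sig('hi');
console.log(encoded.length);         // 3 signature bytes + 2 text bytes
console.log(decodeUtf8Sig(encoded)); // signature removed again
```

With a pair like this, the caller opts in to BOM handling by choosing these functions instead of plain 'utf8', which is the point of the Python design.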

koichik · 2011-11-04T07:15:14Z

You can easily write such a utility (e.g. myfs.readUtf8FileSync()) in userland.
So... I do not think that it is necessary to include 'utf8-sig' in Node core.
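
A userland utility of the kind suggested here might look roughly like the sketch below. The function name readUtf8FileSync comes from the comment above; the body is an assumption that combines fs.readFileSync() with the BOM check from the workaround earlier in this thread.

```javascript
const fs = require('fs');

// Hypothetical userland helper: read a file as UTF-8 and drop a leading
// BOM (U+FEFF) if one is present. Files without a BOM are returned as-is.
function readUtf8FileSync(filename) {
  const text = fs.readFileSync(filename, 'utf8');
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

module.exports = { readUtf8FileSync };
```

A caller would then use readUtf8FileSync('foo.txt') wherever the BOM is unwanted, and plain fs.readFileSync() wherever it must be preserved.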

MylesPenlington · 2012-05-24T02:35:54Z

If the UTF-8 file has a BOM, then in the latest node (0.6.18) it leaves the first character of the string as Unicode 65279, which is 0xFE 0xFF. That is not what was read (isn't that the UTF-16 BOM?), since the UTF-8 signature on a UTF-8 file is 0xef, 0xbb, 0xbf. So the current file reading does not really make sense at all.

dobesv · 2012-05-24T22:16:59Z

Hi Myles,

It is confusing, but it makes sense in a way: when you decode those three bytes using the UTF-8 decoding algorithm, you get the 16-bit BOM character (U+FEFF, decimal 65279) as the first single character.
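
A small check of the point above, as a sketch: the bytes 0xef, 0xbb, 0xbf are just the UTF-8 encoding of the code point U+FEFF, so decoding them yields one character with code 65279 (0xFEFF). The sample bytes and variable names are made up for the demonstration.

```javascript
// UTF-8 signature (EF BB BF) followed by the byte for "A".
const bytes = Buffer.from([0xef, 0xbb, 0xbf, 0x41]);

// Buffer#toString('utf8') decodes the bytes and keeps the BOM character.
const text = bytes.toString('utf8');

console.log(text.length);                     // 2: the BOM character plus "A"
console.log(text.charCodeAt(0).toString(16)); // "feff", i.e. decimal 65279

// Going the other way, U+FEFF encodes back to the same three bytes.
console.log(Buffer.from('\uFEFF', 'utf8'));   // <Buffer ef bb bf>
```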

…s with fs.readFileSync with utf-8 encoding, but node.js keeps BOM in the returned string (see nodejs/node-v0.x-archive#1918) which is detected by jshint as unsafe characters.

tracker1 · 2014-03-24T19:46:04Z

https://www.npmjs.org/package/bomstrip

TimothyGu · 2015-02-03T20:45:43Z

Also https://www.npmjs.org/package/strip-bom

Taking the workaround specified here nodejs/node-v0.x-archive#1918

koichik closed this as completed Oct 24, 2011

koichik mentioned this issue Sep 24, 2012

Add 'utf8-sig' encoding option. #4039

Closed

peol mentioned this issue Oct 8, 2012

Make sure to remove BOM from file on compile SlexAxton/require-handlebars-plugin#61

Merged

nrkn mentioned this issue Mar 8, 2013

When bundling files encoded as UTF-8 with a byte order marker, BOMs are retained browserify/browserify#313

Closed

swijnands mentioned this issue Aug 9, 2013

Extraneous UTF-8 BOM markers are still visible after partials are loaded from disk. tj/consolidate.js#124

Closed

SLaks mentioned this issue May 14, 2014

Errors on byte order mark ASCII code 65279 postcss/postcss#46

Closed

palamccc mentioned this issue Jun 23, 2014

Byte Order Mark should be stripped. jonkemp/gulp-useref#34

Closed

akkumanova mentioned this issue Jul 29, 2014

Remove potential Unicode BOM spalger/gulp-jshint#65

Merged

studiochris mentioned this issue Aug 19, 2014

Add support for docx xml containing a BOM mwilliamson/mammoth.js#27

Closed

jimnoble mentioned this issue Sep 4, 2014

Strip BOM character from templates saved as 'UTF-8 with signature' ericf/express-handlebars#77

Closed

ashtuchkin mentioned this issue Sep 9, 2014

utf-8 to gbk is error ashtuchkin/iconv-lite#78

Closed

shinnn mentioned this issue Sep 27, 2014

Support fs.readFile options / Strip byte order mark azer/read-json#1

Merged

shinnn mentioned this issue Dec 16, 2014

mv try catch out of main function azer/read-json#3

Closed

dervus mentioned this issue Apr 9, 2015

Junk at the beginning of first tag nodeca/js-yaml#179

Closed

mousetraps mentioned this issue May 3, 2015

Problem reading JSON files microsoft/nodejstools#93

Closed

johndkane mentioned this issue May 8, 2015

Handling Unicode BOM node-config/node-config#215

Closed

exos mentioned this issue May 28, 2015

Save files with BOM? exos/node-webkit-fdialogs#3

Closed

jonoward mentioned this issue Jun 3, 2015

Byte Order Mark included in fulfilled string from SystemLoader.fetch when executed in node ModuleLoader/es-module-loader#388

Closed

dufrannea mentioned this issue Apr 15, 2016

Cannot read configuration when encoding is UTF8 with BOM. JoshuaKGoldberg/TSLint.MSBuild#17

Closed

This was referenced Apr 28, 2016

Including a file doesn't remove its unicode BOM haoxins/gulp-file-include#102

Open

Prefixing unicode byte order marks are stripped from included files wiledal/gulp-include#62

Merged

Tragetaschen mentioned this issue Jul 13, 2016

How is Universal going to handle UTF-8 with a BOM? angular/universal#476

Closed

rajkumar42 mentioned this issue Jul 18, 2016

VS Code fails to generate launch.json when there are ignorable errors in project.json dotnet/vscode-csharp#577

Closed

rajkumar42 added a commit to rajkumar42/omnisharp-vscode that referenced this issue Jul 18, 2016

fs.readFileSync(filename, 'utf8') doesn't strip BOM markers

2dff400

Taking the workaround specified here nodejs/node-v0.x-archive#1918

rajkumar42 mentioned this issue Jul 18, 2016

fs.readFileSync(filename, 'utf8') doesn't strip BOM markers dotnet/vscode-csharp#580

Merged

rajkumar42 added a commit to dotnet/vscode-csharp that referenced this issue Jul 18, 2016

fs.readFileSync(filename, 'utf8') doesn't strip BOM markers (#580)

2801e0f

Taking the workaround specified here nodejs/node-v0.x-archive#1918

mk-pmb mentioned this issue Jul 19, 2016

UTF with BOM issue browserify/brfs#69

Closed

art-in mentioned this issue Sep 5, 2016

Failed to parse tutorial config.json in utf8 BOM jsdoc/jsdoc#1256

Closed

bajtos mentioned this issue Sep 15, 2016

Remove Unicode BOM strongloop/strong-globalize#95

Merged

MFry mentioned this issue Dec 12, 2016

Cannot parse the config file jsdoc/jsdoc#1297

Closed

abernix mentioned this issue Dec 14, 2016

settings.json cannot parse a file with BOM (Byte-Of-Mark) e.g. UTF-8 meteor/meteor#5180

Closed

Blackbaud-BrandonJones mentioned this issue May 16, 2017

added a replace string to remove BOM from json file before parse blackbaud/skyux-builder#153

Merged

domenic mentioned this issue Jul 3, 2017

jsdom bug when a BOM is a present in an HTML file jsdom/jsdom#1898

Closed

itskdog mentioned this issue Jul 19, 2017

Detect and remove BOM on UTF-8 settings files CatBlock/catbot#1

Open

jheeffer mentioned this issue Apr 26, 2018

Default encoding UTF-8: with or without BOM? frictionlessdata/datapackage#613

Closed

riddla mentioned this issue May 11, 2018

Parsed data has keys in quotations? mholt/PapaParse#407

Closed

richmahn mentioned this issue Aug 24, 2018

Fix for parsing CEB ULB manifest.yaml file which has BOM marker unfoldingWord/tc-source-content-updater#21

Merged

Indigo744 mentioned this issue Dec 4, 2018

Render fails for SCSS file encoded in UTF8-BOM jgranstrom/sass-extract#40

Open

gardnerjr mentioned this issue Jan 16, 2019

Update JSON validation test to handle UTF-8 Byte-Order Mark (BOM) error microsoft/Application-Insights-Workbooks#89

Closed

gardnerjr mentioned this issue Feb 23, 2019

Users/echong/readme microsoft/Application-Insights-Workbooks#125

Merged

pachi mentioned this issue May 14, 2019

Eliminar BOM antes de validar energiacte/visorxml#79

Open

JuergenRB mentioned this issue Sep 27, 2021

Remove BOM if necessary thibault-vanderseypen/vsce-i18n-json-editor#5

Merged

duhmojo mentioned this issue Jan 19, 2022

UTF-8-BOM string parsing - header first name incorrectly enclosed in a double quote mholt/PapaParse#840

Closed

vjekob mentioned this issue May 27, 2022

Workspace issues vjekob/al-objid#36

Closed

Clonkex mentioned this issue Jul 6, 2022

skipEmptyLines unable to remove ZERO WIDTH SPACE mholt/PapaParse#917

Closed

richfitz mentioned this issue Aug 9, 2023

Allow bom markers mrc-ide/wodin#168

Merged

t3chguy mentioned this issue Jul 19, 2024

Element Desktop does not understand config.json saved with UTF8 BOM encoding element-hq/element-desktop#1788

Open

lyonsil mentioned this issue Aug 9, 2024

loading json contributions files fails if the file has a BOM at the start paranext/paranext-core#1039

Open
