fs.readFileSync(filename, 'utf8') doesn't strip BOM markers #1918
Comments
Workaround: body = body.replace(/^\uFEFF/, ''); Apply this after reading a UTF-8 file when you are not sure whether it starts with a BOM.
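A minimal sketch of that workaround wrapped in a helper. The names `stripBom` and `readUtf8NoBom` are made up for illustration and are not part of Node's `fs` API; the demo file in the temp directory is also just an assumption to keep the snippet self-contained.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: drop a single leading U+FEFF (the decoded BOM).
// A U+FEFF anywhere else in the string is left untouched.
function stripBom(text) {
  return text.replace(/^\uFEFF/, '');
}

// Hypothetical helper: read a file as UTF-8 and strip the BOM if present.
function readUtf8NoBom(filename) {
  return stripBom(fs.readFileSync(filename, 'utf8'));
}

// Demo: write "hi" preceded by the UTF-8 signature EF BB BF, then read it back.
const demoFile = path.join(os.tmpdir(), 'bom-demo-' + process.pid + '.txt');
fs.writeFileSync(demoFile, Buffer.from([0xEF, 0xBB, 0xBF, 0x68, 0x69]));

const withBom = fs.readFileSync(demoFile, 'utf8'); // still starts with U+FEFF
const body = readUtf8NoBom(demoFile);              // BOM removed
fs.unlinkSync(demoFile);

console.log(withBom.length, body.length);
```

Because the regular expression is anchored at the start, calling `stripBom` on text that has no BOM returns it unchanged, so it is safe to apply unconditionally.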
If var text = fs.readFileSync('foo.txt', 'utf8');
fs.writeFileSync('foo.txt', text, 'utf8'); The BOM is lost...
Hmm, maybe it is something that was fixed in a more recent version of node.js?
No, I mean the BOM was lost from a file ('foo.txt') after
@koichik - can you clarify why you closed this issue? If I read a UTF-8 file into a string, it should not have a BOM in it. That's simply how UTF-8 decoding works: the BOM is not included in the decoded string. Applications that expect the BOM to be present can add it back when they write out the file, or, to preserve the BOM, they can read/write the file as binary.
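The "read/write the file as binary" option mentioned above could look roughly like this. It is only a sketch: the temp-file names are invented for the demo, and the point is simply that omitting the encoding argument makes `fs.readFileSync` return a `Buffer`, so no decoding happens and the signature bytes survive.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Demo files (hypothetical names) in the temp directory.
const src = path.join(os.tmpdir(), 'bom-src-' + process.pid + '.txt');
const dst = path.join(os.tmpdir(), 'bom-dst-' + process.pid + '.txt');

// "hi" preceded by the UTF-8 signature EF BB BF.
fs.writeFileSync(src, Buffer.from([0xEF, 0xBB, 0xBF, 0x68, 0x69]));

// No encoding argument: the result is a Buffer of raw bytes.
const raw = fs.readFileSync(src);
fs.writeFileSync(dst, raw);

// Check that the copy still starts with the signature.
const copied = fs.readFileSync(dst);
const bomPreserved =
  copied[0] === 0xEF && copied[1] === 0xBB && copied[2] === 0xBF;

fs.unlinkSync(src);
fs.unlinkSync(dst);

console.log(bomPreserved);
```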
OK, I read a huge argument about this subject on the Python mailing list and a bug report on the JVM systems, and I see that it is more controversial than I had originally thought. So, never mind ... it looks like it's up to programmers to remove the BOM from UTF-8 files themselves. What they did in Python was interesting: they added a new encoding scheme called 'utf-8-sig', which strips the BOM if present when decoding and emits a BOM when encoding to bytes. This allows the programmer to decide whether to use a BOM or not. See http://docs.python.org/library/codecs.html:
Do you think that approach would be acceptable for use in node?
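For illustration, a userland approximation of that idea in Node might look like the sketch below. Node has no such encoding built in; `decodeUtf8Sig` and `encodeUtf8Sig` are hypothetical names, and the behaviour (strip one leading BOM on decode, always emit one on encode) is modelled on the description of the Python codec above.

```javascript
// Hypothetical "utf-8-sig"-style helpers; not part of Node's API.
const BOM = '\uFEFF';

// Decode UTF-8 bytes to a string, dropping one leading BOM if present.
function decodeUtf8Sig(buffer) {
  const text = buffer.toString('utf8');
  return text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;
}

// Encode a string to UTF-8 bytes, always prefixed with the signature.
function encodeUtf8Sig(text) {
  return Buffer.from(BOM + text, 'utf8');
}

const encoded = encodeUtf8Sig('hi');       // EF BB BF 68 69
const roundTrip = decodeUtf8Sig(encoded);  // 'hi'
const noBom = decodeUtf8Sig(Buffer.from('hi', 'utf8')); // 'hi'

console.log(encoded, roundTrip, noBom);
```

With helpers like these the caller chooses per file: plain 'utf8' keeps whatever is there, while the "sig" pair normalizes to BOM-free strings in memory and BOM-prefixed bytes on disk.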
You can easily write the utility (e.g.
If the utf8 file has a BOM, then in the latest node (0.6.18) it leaves the first character of the string as Unicode 65279, which is 0xFE 0xFF. That is not what was read (that is the utf16 BOM?), as the utf8 signature on a utf8 file is 0xEF, 0xBB, 0xBF, so the current file reading does not really make sense at all.
Hi Myles, It is confusing but it makes sense in a way; when you decode those three
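A small sketch that makes the relationship between the bytes and the character concrete. The three bytes EF BB BF are the UTF-8 encoding of the single code point U+FEFF (decimal 65279), so the decoded string holds the code point, not the bytes; the variable names here are only for illustration.

```javascript
// The UTF-8 signature as it appears on disk.
const utf8Bom = Buffer.from([0xEF, 0xBB, 0xBF]);

// Decoding the three bytes yields one character: U+FEFF.
const decoded = utf8Bom.toString('utf8');
const codePoint = decoded.charCodeAt(0);

console.log(decoded.length);        // 1
console.log(codePoint);             // 65279
console.log(codePoint === 0xFEFF);  // true

// The same code point serialized in other encodings gives other bytes.
const asUtf8 = Buffer.from('\uFEFF', 'utf8');       // EF BB BF
const asUtf16le = Buffer.from('\uFEFF', 'utf16le'); // FF FE
console.log(asUtf8, asUtf16le);
```

So 0xFEFF in the string is the character's code point, which only coincidentally looks like the UTF-16 big-endian byte sequence FE FF.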
…s with fs.readFileSync with utf-8 encoding, but node.js keeps BOM in the returned string (see nodejs/node-v0.x-archive#1918) which is detected by jshint as unsafe characters.
Taking the workaround specified here nodejs/node-v0.x-archive#1918
Environment: cloud9ide.com, node version 0.4.5
If I read a file using fs.readFileSync(filename, 'utf8') that is encoded using UTF8 with BOM, the BOM is included in the resulting string.
I think the routine to decode UTF8 is supposed to automatically strip the BOM from the start of the stream before returning the string.