Handle seeds with utf-8 BOM (#1177) #1452
Fixes some neat Windows behavior where the default code page can be cp1252
Add BOM unit test
5803728 to d74e37d
core/dbt/clients/agate_helper.py
Outdated
@@ -41,4 +42,7 @@ def as_matrix(table):


 def from_csv(abspath):
-    return agate.Table.from_csv(abspath, column_types=DEFAULT_TYPE_TESTER)
+    with dbt.compat.open_file(abspath) as fp:
+        if fp.read(1) != u'\ufeff':
Did you try using codecs.BOM_UTF8 instead of the hardcoded value here?
https://docs.python.org/2/library/codecs.html#codecs.BOM_UTF8
https://docs.python.org/3/library/codecs.html#codecs.BOM_UTF8
I'm explicitly opening the file as unicode (that's what open_file does), so examining the unicode code point rather than the byte sequence is correct.
Not suggesting that your comparison is wrong, just that you could replace the magic value u'\ufeff' with codecs.BOM_UTF8.decode('utf-8').
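For reference, a quick sketch of the equivalence being discussed here. The names BOM_BYTES and BOM_CHAR are made up for illustration and are not from the PR; only the standard-library codecs module is used.

```python
import codecs

# codecs.BOM_UTF8 is the raw byte sequence that a UTF-8 BOM occupies on disk.
BOM_BYTES = codecs.BOM_UTF8

# Decoding those bytes as plain 'utf-8' (not 'utf-8-sig') yields the single
# code point U+FEFF, which is what a file opened as unicode text would return
# from read(1) when it starts with a BOM.
BOM_CHAR = codecs.BOM_UTF8.decode('utf-8')

# The named constant and the hardcoded literal compare equal.
same_as_literal = (BOM_CHAR == u'\ufeff')
```

So the comparison in the diff behaves the same either way; the suggestion only swaps the literal for a named constant.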
Fixes #1177
Instead of passing agate a file path, open the file at that path, read the first character, and seek back to the start if it is not a BOM. Either way, pass the file handle to agate rather than the path.
That way, if the file starts with a BOM, it has already been consumed by the time agate gets the handle, so agate never sees it.
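The approach described above could be sketched roughly as follows. This is not the code from the PR: open_skipping_bom and read_header are hypothetical helpers, io.open with encoding='utf-8' is assumed as a stand-in for dbt.compat.open_file, and the standard-library csv.reader stands in for agate so the example runs without extra dependencies.

```python
import codecs
import csv
import io
import os
import tempfile

BOM = codecs.BOM_UTF8.decode('utf-8')  # the single code point U+FEFF


def open_skipping_bom(path):
    """Hypothetical helper: open path as UTF-8 text, positioned past any BOM."""
    fp = io.open(path, 'r', encoding='utf-8', newline='')
    if fp.read(1) != BOM:
        # No BOM: rewind so the first real character is not lost.
        fp.seek(0)
    return fp


def read_header(path):
    # csv.reader stands in for agate here; it is handed the file handle,
    # not the path, so it never sees the BOM.
    with open_skipping_bom(path) as fp:
        return next(csv.reader(fp))


# Usage: one seed file written with a BOM and one without.
tmp_dir = tempfile.mkdtemp()
bom_path = os.path.join(tmp_dir, 'seed_with_bom.csv')
plain_path = os.path.join(tmp_dir, 'seed_plain.csv')
with io.open(bom_path, 'wb') as f:
    f.write(codecs.BOM_UTF8 + b'id,name\n1,alice\n')
with io.open(plain_path, 'wb') as f:
    f.write(b'id,name\n1,alice\n')

header_with_bom = read_header(bom_path)
header_plain = read_header(plain_path)
```

Both files should produce the same header row, which is the behavior the BOM unit test is presumably checking for.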