Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip BOM from Excel-generated UTF-8 CSV files #1177

Closed
mplovepop opened this issue Dec 6, 2018 · 2 comments · Fixed by #1452
Closed

Strip BOM from Excel-generated UTF-8 CSV files #1177

mplovepop opened this issue Dec 6, 2018 · 2 comments · Fixed by #1452
Assignees

Comments

@mplovepop
Copy link

Issue

UTF-8 CSV files generated by Excel have a byte order mark (BOM) which is included in the column title when using dbt seed.

Issue description

Save Excel file as UTF-8 CSV on Microsoft Excel for Mac. Run dbt seed on the resulting file.

Results

The first column name includes the unprintable BOM character which makes queries very difficult to write. I would expect the BOM to be stripped from the column name.

I was able to work around this by changing the file encoding in Emacs and re-saving the file.

System information

The output of dbt --version:

installed version: 0.12.1
   latest version: 0.12.1

Up to date!

The operating system you're running on: MacOS Mojave 10.14.1

The python version you're using (probably the output of python --version): 3.7.1

Steps to reproduce

  1. Using Excel, save file as UTF-8 CSV.
  2. Run dbt seed
  3. select <column> from <table>, column cannot be found because of the BOM.
@ryantuck
Copy link

ryantuck commented Dec 6, 2018

I (fortunately) don't have an excel-generated csv lying around to test this on, but had written down some notes about this before.

I think the following enables you to read excel-generated csv files as expected, for what it's worth:

>>> import codecs
>>> import csv
>>> with codecs.open('excel_output.csv', encoding='utf-8-sig') as f:
...     reader = csv.reader(f)
...     rows = [row for row in reader]

@sarinb
Copy link

sarinb commented Jul 10, 2020

I (fortunately) don't have an excel-generated csv lying around to test this on, but had written down some notes about this before.

I think the following enables you to read excel-generated csv files as expected, for what it's worth:

>>> import codecs
>>> import csv
>>> with codecs.open('excel_output.csv', encoding='utf-8-sig') as f:
...     reader = csv.reader(f)
...     rows = [row for row in reader]

@ryantuck - Thanks a bunch! I was stuck on a similar issue for hours and your comment solved it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants