#1832 fix load csv option encoding. #1833

Merged
merged 1 commit into from
Sep 1, 2022


AdmondGuo
Contributor

root cause

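When a CSV file is loaded, it is written to disk a second time in order to implement the skipNLines option. For example, with skipNLines=1, the name of the file written to disk is
[image]
Every later load with skipNLines=1 reads this intermediate file instead of the source file.
This process uses InputStreamReader and FSDataOutputStream, which read and write the file using the system default charset.
As a result, the encoding option has no effect.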
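
The effect described above can be reproduced without any file system. The following is a minimal sketch, not the project's code: the class and method names are hypothetical, and UTF-8 stands in for a system default charset that differs from the file's real encoding (ISO-8859-1 here).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the PR.
class CharsetMismatchDemo {
    // Decode the first line of `bytes` using the given charset.
    static String firstLine(byte[] bytes, Charset charset) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00e9,1";
        // The "file" is stored as ISO-8859-1.
        byte[] fileBytes = original.getBytes(StandardCharsets.ISO_8859_1);

        // Reading with the file's own charset round-trips the text.
        System.out.println(original.equals(firstLine(fileBytes, StandardCharsets.ISO_8859_1)));
        // Reading with a different charset (standing in for the system default) does not.
        System.out.println(original.equals(firstLine(fileBytes, StandardCharsets.UTF_8)));
    }
}
```

Once the text has been decoded with the wrong charset and written back out, the intermediate file no longer contains the original bytes, so passing the right encoding at a later stage cannot recover them.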

dev design

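Handle encoding explicitly.

  1. Specify the charset (encoding) when reading and writing the file.
  2. Include encoding in the file name as well, so that skipNLines and encoding together determine the name of the file written to disk.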
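
The two points above could be sketched roughly as follows. This is an illustration under stated assumptions rather than the merged change: `SkipLinesSketch`, `intermediateName`, and `copySkippingLines` are hypothetical names, the file name pattern is invented for the example (the real one appears only in the screenshots), and plain `java.io` streams are used in place of the Hadoop file system API.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical sketch class, not part of the PR.
class SkipLinesSketch {
    // Assumed naming scheme: both options are part of the name, so loads with
    // different encodings never share one intermediate file.
    static String intermediateName(String sourceName, int skipNLines, String encoding) {
        return sourceName + "_skip" + skipNLines + "_" + encoding.toLowerCase(Locale.ROOT);
    }

    // Copy `in` to `out`, dropping the first `skipNLines` lines. The same explicit
    // charset is used for decoding and encoding; line endings are normalized to '\n'.
    static void copySkippingLines(InputStream in, OutputStream out,
                                  int skipNLines, Charset charset) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
             BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out, charset))) {
            String line;
            int lineNo = 0;
            while ((line = reader.readLine()) != null) {
                if (lineNo++ >= skipNLines) {
                    writer.write(line);
                    writer.write('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        byte[] source = "header\ncaf\u00e9,1\n".getBytes(latin1);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copySkippingLines(new ByteArrayInputStream(source), out, 1, latin1);

        // The header line is gone and the remaining text is still valid ISO-8859-1.
        System.out.println(new String(out.toByteArray(), latin1).equals("caf\u00e9,1\n"));
        System.out.println(intermediateName("data.csv", 1, "ISO-8859-1"));
    }
}
```

Because `FSDataOutputStream` is an `OutputStream`, the same `OutputStreamWriter` wrapping would apply to it. Putting the encoding in the name matters because an intermediate file produced under one encoding must not be reused by a later load that requests another.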

test evidence

Common file encodings were tested:
[image]
The following intermediate files were generated:
[image]
The files are read correctly:
[image]

@allwefantasy allwefantasy merged commit 689e08f into byzer-org:master Sep 1, 2022