
[FEA] Validate nvcomp-3.0 with spark rapids plugin #9461

Closed
jbrennan333 opened this issue Oct 17, 2023 · 5 comments
Assignees: jbrennan333
Labels: feature request, task


@jbrennan333 (Collaborator)

In 23.10, cuDF is using nvcomp-2.6.1. In 23.12, we would like to move to nvcomp-3.0.x.
We need to run tests with the spark rapids plugin to verify that the updated snappy/zstd compressors and decompressors still produce correct data, that compression ratios are equal to or better than with 2.6.1, and to measure any performance impact when running NDS benchmarks.

A sample validation plan is in issue #3037.

PR in cuDF for testing with nvcomp-3.0.x: rapidsai/cudf#13815
Rapids-CMake PR: rapidsai/rapids-cmake#451
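
For context, the CPU-vs-GPU variants in the plan below are switched via spark-rapids configuration. A minimal PySpark sketch, assuming the spark-rapids jar is on the classpath (the app name and codec choice are illustrative, not from this issue):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("nvcomp-3.0-validation")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")    # load the spark-rapids plugin
    .config("spark.rapids.sql.enabled", "true")               # "false" gives the CPU baseline
    .config("spark.sql.parquet.compression.codec", "zstd")    # or "snappy" / "uncompressed"
    .getOrCreate()
)
```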

jbrennan333 added the feature request label Oct 17, 2023
jbrennan333 self-assigned this Oct 17, 2023
@jbrennan333 (Collaborator, Author) commented Oct 24, 2023

Initial testing on desktop.

  • Run compression/decompression integration tests

  • NDS2.0 Data Conversion PARQUET/SNAPPY - scale 100 (desktop)

    • convert from raw data to parquet with no compression
    • convert from raw data to parquet/snappy using CPU
    • convert from raw data to parquet/snappy using GPU
    • compare sizes of all three
    • verify data matches between CPU parquet/snappy and GPU parquet/snappy (a sketch of this convert-and-compare step follows this list)
  • NDS2.0 Power Run - scale 100 (desktop) on CPU using parquet/snappy data generated by CPU.

  • NDS2.0 Power Run - scale 100 (desktop) on CPU using parquet/snappy data generated by GPU.

  • NDS2.0 Power Run - scale 100 (desktop) on GPU using parquet/snappy data generated by CPU.

  • NDS2.0 Power Run - scale 100 (desktop) on GPU using parquet/snappy data generated by GPU.

    • Compare results from these four runs.
  • NDS2.0 Data Conversion ORC/SNAPPY - scale 100 (desktop)

    • convert from raw data to orc with no compression
    • convert from raw data to orc/snappy using CPU
    • convert from raw data to orc/snappy using GPU
    • compare sizes of all three
    • verify data matches between CPU snappy and GPU snappy
  • NDS2.0 Power Run - scale 100 (desktop) on CPU using orc/snappy data generated by CPU.

  • NDS2.0 Power Run - scale 100 (desktop) on CPU using orc/snappy data generated by GPU.

  • NDS2.0 Power Run - scale 100 (desktop) on GPU using orc/snappy data generated by CPU.

  • NDS2.0 Power Run - scale 100 (desktop) on GPU using orc/snappy data generated by GPU.

    • Compare results from these four runs
  • NDS2.0 Data Conversion PARQUET/ZSTD - scale 100 (desktop)

    • convert from raw data to parquet with no compression
    • convert from raw data to parquet/zstd using CPU
    • convert from raw data to parquet/zstd using GPU
    • compare sizes of all three
    • verify data matches between CPU parquet/zstd and GPU parquet/zstd

  • NDS2.0 Power Run - scale 100 (desktop) on CPU using parquet/zstd data generated by CPU.

  • NDS2.0 Power Run - scale 100 (desktop) on CPU using parquet/zstd data generated by GPU.

  • NDS2.0 Power Run - scale 100 (desktop) on GPU using parquet/zstd data generated by CPU.

  • NDS2.0 Power Run - scale 100 (desktop) on GPU using parquet/zstd data generated by GPU.

    • Compare results from these four runs.
  • NDS2.0 Data Conversion ORC/ZSTD - scale 100 (desktop)

    • convert from raw data to orc with no compression
    • convert from raw data to orc/zstd using CPU
    • convert from raw data to orc/zstd using GPU
    • compare sizes of all three
    • verify data matches between CPU zstd and GPU zstd
  • NDS2.0 Power Run - scale 100 (desktop) on CPU using orc/zstd data generated by CPU.

  • NDS2.0 Power Run - scale 100 (desktop) on CPU using orc/zstd data generated by GPU.

  • NDS2.0 Power Run - scale 100 (desktop) on GPU using orc/zstd data generated by CPU.

  • NDS2.0 Power Run - scale 100 (desktop) on GPU using orc/zstd data generated by GPU.

    • Compare results from these four runs
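
The convert-and-compare steps above all follow the same pattern. Below is a hedged sketch in PySpark, assuming local paths and a pipe-delimited customer table (both illustrative); the real conversions go through nds_transcode.py, and the power runs toggle CPU vs. GPU execution with the same spark.rapids.sql.enabled switch:

```python
import subprocess

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # session configured as in the sketch above

def write_variant(df, path, codec, use_gpu):
    """Write df as parquet with the given codec, on the GPU or CPU plan."""
    spark.conf.set("spark.rapids.sql.enabled", str(use_gpu).lower())
    df.write.mode("overwrite").option("compression", codec).parquet(path)

raw = spark.read.csv("raw/customer.dat", sep="|", inferSchema=True)  # illustrative path
write_variant(raw, "out/customer_none", "none", use_gpu=False)
write_variant(raw, "out/customer_cpu_snappy", "snappy", use_gpu=False)
write_variant(raw, "out/customer_gpu_snappy", "snappy", use_gpu=True)

# Compare on-disk sizes: the snappy output should be no larger than with nvcomp-2.6.1.
for path in ("out/customer_none", "out/customer_cpu_snappy", "out/customer_gpu_snappy"):
    print(path, subprocess.check_output(["du", "-sh", path]).split()[0].decode())

# Verify the decoded rows match between the CPU- and GPU-written files.
cpu = spark.read.parquet("out/customer_cpu_snappy")
gpu = spark.read.parquet("out/customer_gpu_snappy")
assert cpu.exceptAll(gpu).count() == 0 and gpu.exceptAll(cpu).count() == 0
```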

sameerz added the task label Oct 24, 2023
@jbrennan333 (Collaborator, Author)

After converting the NDS raw data to parquet/snappy with both CPU and GPU and comparing the resulting data, I found differences in one of the tables (customer). This was using a 23.12 snapshot build with nvcomp-3.0.3. I am going to see if I can reproduce it with the same build using nvcomp-2.6.1, to indicate whether this is an issue in cudf or in nvcomp.

@jbrennan333 (Collaborator, Author)


This turned out to be caused by a bug in nds_transcode.py, which was reading ISO-8859-encoded files as UTF-8. The international characters were coming through as invalid UTF-8, and the GPU handled writing these invalid characters differently than the CPU (passing them through vs. converting them to a replacement character code).
NVIDIA/spark-rapids-benchmarks#170
#9560
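
A minimal sketch of the class of fix, assuming a pipe-delimited source file (the path is illustrative; the actual change is in nds_transcode.py, linked above):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the raw NDS text data with its actual encoding instead of assuming UTF-8,
# so international characters no longer arrive as invalid UTF-8 bytes.
raw = (
    spark.read
    .option("encoding", "ISO-8859-1")
    .option("sep", "|")
    .csv("raw/customer.dat")
)
```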

@jbrennan333 (Collaborator, Author) commented Oct 27, 2023

During initial testing on desktop, I found that the output produced for query98 using parquet/zstd was unreadable by the CPU in Spark. In spark-3.2.1 it was reporting a corrupted page, and in spark-3.4.1 it was reading a bogus length, leading it to read beyond the limits of the file. I was able to isolate the bad page and share it with Eric Schmidt, who found the bug in nvcomp. I have verified that his fix resolves the problem. He is going to include it in a 3.0.4 release.
Note that this was appearing as a compatibility issue, because newer versions of the zstd command-line utility were decompressing the bad page successfully.

https://gitlab-master.nvidia.com/GPUDB/nvcomp/-/issues/541
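
A minimal sketch of the readability check that surfaced this, assuming the query98 output was already written as parquet/zstd on the GPU (the path is illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Force the CPU Parquet reader to decode the GPU-written zstd pages.
spark.conf.set("spark.rapids.sql.enabled", "false")
spark.read.parquet("out/q98_zstd").count()  # before nvcomp-3.0.4 this hit a corrupted-page error
```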

@jbrennan333 (Collaborator, Author)

nvcomp-3.0.4 has been pulled into the cudf/spark-rapids builds, and additional validation work is being done by another team, so I am going to close this.
