
Use cuco::static_set in JSON tree algorithm #13928

Merged


karthikeyann
Contributor

Description

In the JSON tree algorithms of the JSON reader, a cuco static_map is used as a set. This PR replaces it with a cuco static_set.
No tests are changed. There are no significant runtime changes.
Addresses part of #12261
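The tree algorithm only needs set semantics: each entry is a node index, and the hash and equality functions decide which nodes count as "the same". As a rough illustration, here is a minimal host-side C++ sketch, not cuco code: a hypothetical `NodeIndexSet` keeps node indices in an open-addressing table with linear probing, marks unused slots with a sentinel, reads each node's hash from a precomputed cache, and treats two nodes as equal when their type and field name match. The `Node` layout, the hash function, and all names here are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical node record: a node category plus a field name (empty if not a field).
struct Node {
  int type;
  std::string name;
};

// Hypothetical open-addressing set of node indices with linear probing.
// Each slot stores only a key (a node index); no value is kept per slot.
class NodeIndexSet {
 public:
  static constexpr std::int32_t kEmpty = -1;  // sentinel marking an unused slot

  NodeIndexSet(std::size_t capacity, const std::vector<Node>& nodes,
               const std::vector<std::size_t>& hash_cache)
    : slots_(capacity, kEmpty), nodes_(nodes), hash_cache_(hash_cache) {}

  // Inserts idx unless an equal node is already stored. Returns the stored index
  // for idx's equivalence class (the first inserted node with that type and name),
  // or kEmpty if the table is full.
  std::int32_t insert(std::int32_t idx) {
    std::size_t slot = hash_cache_[idx] % slots_.size();
    for (std::size_t probe = 0; probe < slots_.size(); ++probe) {
      std::int32_t& s = slots_[slot];
      if (s == kEmpty) { s = idx; return idx; }
      if (equal(s, idx)) return s;
      slot = (slot + 1) % slots_.size();  // linear probing: try the next slot
    }
    return kEmpty;
  }

 private:
  bool equal(std::int32_t a, std::int32_t b) const {
    return nodes_[a].type == nodes_[b].type && nodes_[a].name == nodes_[b].name;
  }

  std::vector<std::int32_t> slots_;
  const std::vector<Node>& nodes_;
  const std::vector<std::size_t>& hash_cache_;
};

// Precompute one hash per node from (type, name); equal nodes get equal hashes.
std::vector<std::size_t> hash_nodes(const std::vector<Node>& nodes) {
  std::vector<std::size_t> out;
  out.reserve(nodes.size());
  for (const Node& n : nodes)
    out.push_back(std::hash<std::string>{}(n.name) * 31u + static_cast<std::size_t>(n.type));
  return out;
}
```

For example, inserting nodes `{2,"a"}, {2,"b"}, {2,"a"}` in order yields representatives 0, 1, 0. Because each slot holds only a key, there is no unused per-slot value to allocate, which is what distinguishes a set from a map used as a set.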

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@karthikeyann added the labels 3 - Ready for Review, libcudf, 4 - Needs Review, 4 - Needs cuIO Reviewer, improvement, and non-breaking on Aug 21, 2023
@karthikeyann requested a review from a team as a code owner on August 21, 2023 14:52
@karthikeyann self-assigned this on Aug 21, 2023
@karthikeyann
Contributor Author

Benchmark Comparison:
['before.json', 'after.json']

nested_json_gpu_parser

[0] Quadro GV100

| string_size | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|
| 2^20 | 4.880 ms | 4.49% | 4.898 ms | 5.69% | 18.864 us | 0.39% | PASS |
| 2^21 | 4.895 ms | 9.06% | 4.913 ms | 4.03% | 17.902 us | 0.37% | PASS |
| 2^22 | 5.317 ms | 10.41% | 5.321 ms | 12.66% | 3.876 us | 0.07% | PASS |
| 2^23 | 6.422 ms | 0.50% | 6.412 ms | 0.34% | -10.405 us | -0.16% | PASS |
| 2^24 | 9.930 ms | 14.53% | 9.787 ms | 14.77% | -143.676 us | -1.45% | PASS |
| 2^25 | 15.002 ms | 11.11% | 15.006 ms | 11.65% | 3.727 us | 0.02% | PASS |
| 2^26 | 22.675 ms | 6.45% | 22.183 ms | 9.47% | -492.229 us | -2.17% | PASS |
| 2^27 | 40.288 ms | 4.58% | 40.130 ms | 3.07% | -158.362 us | -0.39% | PASS |
| 2^28 | 76.542 ms | 1.90% | 75.338 ms | 2.95% | -1204.252 us | -1.57% | PASS |
| 2^29 | 145.540 ms | 2.20% | 141.689 ms | 1.44% | -3851.651 us | -2.65% | FAIL |
| 2^30 | 280.676 ms | 1.14% | 273.563 ms | 0.40% | -7112.382 us | -2.53% | FAIL |

nested_json_gpu_parser_depth

[0] Quadro GV100

| depth | string_size | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|---|
| 2^1 | 2^20 | 4.794 ms | 3.81% | 4.844 ms | 13.07% | 50.668 us | 1.06% | PASS |
| 2^2 | 2^20 | 4.806 ms | 13.63% | 4.821 ms | 0.32% | 15.013 us | 0.31% | PASS |
| 2^3 | 2^20 | 9.694 ms | 0.28% | 9.728 ms | 0.26% | 33.102 us | 0.34% | FAIL |
| 2^4 | 2^20 | 11.520 ms | 0.38% | 11.549 ms | 0.32% | 29.233 us | 0.25% | PASS |
| 2^1 | 2^22 | 5.921 ms | 0.33% | 5.944 ms | 0.37% | 23.099 us | 0.39% | FAIL |
| 2^2 | 2^22 | 5.932 ms | 0.39% | 5.930 ms | 0.64% | -1.492 us | -0.03% | PASS |
| 2^3 | 2^22 | 10.719 ms | 0.27% | 10.749 ms | 0.29% | 30.260 us | 0.28% | FAIL |
| 2^4 | 2^22 | 12.800 ms | 0.27% | 12.836 ms | 0.23% | 36.276 us | 0.28% | FAIL |
| 2^1 | 2^24 | 11.261 ms | 10.01% | 11.321 ms | 10.55% | 59.724 us | 0.53% | PASS |
| 2^2 | 2^24 | 11.306 ms | 10.39% | 11.296 ms | 10.66% | -9.278 us | -0.08% | PASS |
| 2^3 | 2^24 | 15.109 ms | 0.90% | 15.232 ms | 4.67% | 122.651 us | 0.81% | PASS |
| 2^4 | 2^24 | 18.559 ms | 3.69% | 18.820 ms | 5.51% | 260.069 us | 1.40% | PASS |
| 2^1 | 2^26 | 30.345 ms | 4.05% | 30.058 ms | 3.64% | -287.165 us | -0.95% | PASS |
| 2^2 | 2^26 | 30.467 ms | 4.22% | 30.053 ms | 3.81% | -413.853 us | -1.36% | PASS |
| 2^3 | 2^26 | 35.087 ms | 2.81% | 35.038 ms | 3.47% | -49.738 us | -0.14% | PASS |
| 2^4 | 2^26 | 44.072 ms | 3.11% | 43.698 ms | 3.43% | -373.460 us | -0.85% | PASS |
| 2^1 | 2^28 | 105.235 ms | 2.31% | 103.592 ms | 1.38% | -1643.400 us | -1.56% | FAIL |
| 2^2 | 2^28 | 105.108 ms | 1.54% | 103.375 ms | 1.34% | -1732.831 us | -1.65% | FAIL |
| 2^3 | 2^28 | 131.532 ms | 1.10% | 126.514 ms | 1.18% | -5017.671 us | -3.81% | FAIL |
| 2^4 | 2^28 | 163.348 ms | 1.29% | 160.052 ms | 1.21% | -3296.169 us | -2.02% | FAIL |
| 2^1 | 2^30 | 402.572 ms | 0.74% | 396.564 ms | 0.50% | -6008.528 us | -1.49% | FAIL |
| 2^2 | 2^30 | 403.850 ms | 0.71% | 396.296 ms | 0.32% | -7553.944 us | -1.87% | FAIL |
| 2^3 | 2^30 | 511.063 ms | 0.19% | 489.892 ms | 0.49% | -21171.784 us | -4.14% | FAIL |
| 2^4 | 2^30 | 631.286 ms | 0.30% | 610.291 ms | 0.44% | -20995.106 us | -3.33% | FAIL |

json_read_data_type

[0] Quadro GV100

| data_type | io | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|---|
| FLOAT | DEVICE_BUFFER | 769.726 ms | 0.21% | 741.506 ms | 0.07% | -28219.922 us | -3.67% | FAIL |
| DECIMAL | DEVICE_BUFFER | 848.174 ms | 0.16% | 826.489 ms | 0.14% | -21684.851 us | -2.56% | FAIL |
| STRING | DEVICE_BUFFER | 334.214 ms | 0.48% | 321.141 ms | 0.36% | -13072.504 us | -3.91% | FAIL |
| LIST | DEVICE_BUFFER | 246.600 ms | 0.79% | 247.035 ms | 0.76% | 435.057 us | 0.18% | PASS |
| STRUCT | DEVICE_BUFFER | 936.482 ms | 0.25% | 905.342 ms | 0.35% | -31139.343 us | -3.33% | FAIL |

json_read_io

[0] Quadro GV100

| io | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|
| FILEPATH | 549.848 ms | 1.97% | 554.247 ms | 1.72% | 4.398 ms | 0.80% | PASS |
| HOST_BUFFER | 427.906 ms | 0.05% | 426.857 ms | 0.11% | -1049.323 us | -0.25% | FAIL |
| DEVICE_BUFFER | 278.015 ms | 0.50% | 280.157 ms | 0.21% | 2.142 ms | 0.77% | FAIL |

Summary

  • Total Matches: 43
    • Pass (diff <= min_noise): 23
    • Unknown (infinite noise): 0
    • Failure (diff > min_noise): 20
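The status rule in the summary can be made concrete. This is a small sketch, assuming `min_noise` is the smaller of the Ref Noise and Cmp Noise columns and `%Diff` is (Cmp Time - Ref Time) / Ref Time; the function names are hypothetical, not taken from the benchmark tooling. Applied to the %Diff and noise columns of the tables above, this rule reproduces every Status entry and the 23/20 split. Recomputing %Diff from the rounded millisecond values can differ slightly from the reported column.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <string>

// Relative change of the compared time versus the reference, in percent.
double percent_diff(double ref_time, double cmp_time) {
  return (cmp_time - ref_time) / ref_time * 100.0;
}

// Assumed rule: UNKNOWN if min_noise is infinite, PASS if |%Diff| <= min_noise,
// FAIL otherwise. Noise arguments are percentages.
std::string classify(double ref_time, double ref_noise, double cmp_time, double cmp_noise) {
  const double min_noise = std::min(ref_noise, cmp_noise);
  if (std::isinf(min_noise)) return "UNKNOWN";
  return std::fabs(percent_diff(ref_time, cmp_time)) <= min_noise ? "PASS" : "FAIL";
}
```

For instance, the 2^29 row has a diff of about -2.65%, which exceeds the smaller noise (1.44%), so it is reported as FAIL even though the change is an improvement. FAIL here means "outside the noise band", not "regression".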

@karthikeyann changed the title from "Use static_set in JSON tree algorithm" to "Use cuco::static_set in JSON tree algorithm" on Aug 21, 2023
@bdice
Contributor

bdice commented Aug 21, 2023

The memory usage should change as a result of this, right? (By not having to store both keys and values)

@karthikeyann
Contributor Author

karthikeyann commented Aug 21, 2023

> The memory usage should change as a result of this, right? (By not having to store both keys and values)

It makes no difference to the peak memory usage of the JSON reader, because peak usage does not occur in this part of the code.
Within that particular section of code, however, memory usage may be reduced.
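To put the key/value point in numbers: the sketch below compares slot storage for a table that keeps only keys against one that keeps a key and an unused value per slot. It assumes both the key and the old map's value were 32-bit (like `cudf::size_type`); the thread does not show the old value type, and the function names are hypothetical. Other per-table overhead is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Assumption: keys (and, for the map, values) are 32-bit like cudf::size_type.
using size_type = std::int32_t;

// Slot storage when only keys are kept (set semantics).
constexpr std::size_t set_slot_bytes(std::size_t capacity) {
  return capacity * sizeof(size_type);
}

// Slot storage when a map is used as a set: every slot carries a key and an unused value.
constexpr std::size_t map_slot_bytes(std::size_t capacity) {
  return capacity * sizeof(std::pair<size_type, size_type>);
}

// On common platforms a pair of two 32-bit integers has no padding.
static_assert(sizeof(std::pair<size_type, size_type>) == 2 * sizeof(size_type),
              "unexpected padding in key/value pair");
```

Under these assumptions a table with 2^20 slots shrinks from 8 MiB to 4 MiB. This is consistent with the reply above: the saving is local to this hash table and does not move the reader's overall peak.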


Benchmark Results

nested_json_gpu_parser

[0] Quadro GV100

| string_size | Samples | CPU Time | Noise | GPU Time | Noise | Elem/s | bytes_per_second | peak_memory_usage |
|---|---|---|---|---|---|---|---|---|
| 2^20 = 1048576 | 1x | 5.325 ms | inf% | 5.319 ms | inf% | 197.138M | 197137516 | 8.201 MiB |
| 2^21 = 2097152 | 1x | 5.204 ms | inf% | 5.198 ms | inf% | 403.472M | 403472256 | 16.164 MiB |
| 2^22 = 4194304 | 1x | 6.203 ms | inf% | 6.196 ms | inf% | 676.969M | 676968835 | 32.090 MiB |
| 2^23 = 8388608 | 1x | 7.006 ms | inf% | 7.000 ms | inf% | 1.198G | 1198427349 | 63.942 MiB |
| 2^24 = 16777216 | 1x | 9.618 ms | inf% | 9.612 ms | inf% | 1.745G | 1745433173 | 127.645 MiB |
| 2^25 = 33554432 | 1x | 14.918 ms | inf% | 14.912 ms | inf% | 2.250G | 2250143707 | 255.051 MiB |
| 2^26 = 67108864 | 1x | 24.056 ms | inf% | 24.049 ms | inf% | 2.790G | 2790480400 | 509.864 MiB |
| 2^27 = 134217728 | 1x | 43.231 ms | inf% | 43.225 ms | inf% | 3.105G | 3105067902 | 1019.489 MiB |
| 2^28 = 268435456 | 1x | 80.947 ms | inf% | 80.942 ms | inf% | 3.316G | 3316411689 | 1.991 GiB |
| 2^29 = 536870912 | 1x | 150.893 ms | inf% | 150.888 ms | inf% | 3.558G | 3558083870 | 3.982 GiB |
| 2^30 = 1073741824 | 1x | 280.867 ms | inf% | 280.863 ms | inf% | 3.823G | 3823012817 | 7.963 GiB |

nested_json_gpu_parser_depth

[0] Quadro GV100

| depth | string_size | Samples | CPU Time | Noise | GPU Time | Noise | Elem/s | bytes_per_second | peak_memory_usage |
|---|---|---|---|---|---|---|---|---|---|
| 2^1 = 2 | 2^20 = 1048576 | 1x | 4.872 ms | inf% | 4.867 ms | inf% | 215.464M | 215461301 | 9.353 MiB |
| 2^2 = 4 | 2^20 = 1048576 | 1x | 4.818 ms | inf% | 4.812 ms | inf% | 217.920M | 217917251 | 9.353 MiB |
| 2^3 = 8 | 2^20 = 1048576 | 1x | 9.756 ms | inf% | 9.751 ms | inf% | 107.542M | 107537957 | 8.883 MiB |
| 2^4 = 16 | 2^20 = 1048576 | 1x | 11.506 ms | inf% | 11.500 ms | inf% | 91.187M | 91180015 | 9.301 MiB |
| 2^1 = 2 | 2^22 = 4194304 | 1x | 5.917 ms | inf% | 5.911 ms | inf% | 709.560M | 709552031 | 36.695 MiB |
| 2^2 = 4 | 2^22 = 4194304 | 1x | 10.150 ms | inf% | 10.144 ms | inf% | 413.475M | 413469808 | 36.695 MiB |
| 2^3 = 8 | 2^22 = 4194304 | 1x | 15.130 ms | inf% | 15.124 ms | inf% | 277.333M | 277327103 | 34.815 MiB |
| 2^4 = 16 | 2^22 = 4194304 | 1x | 12.886 ms | inf% | 12.881 ms | inf% | 325.623M | 325622078 | 36.487 MiB |
| 2^1 = 2 | 2^24 = 16777216 | 1x | 10.795 ms | inf% | 10.789 ms | inf% | 1.555G | 1554989405 | 146.063 MiB |
| 2^2 = 4 | 2^24 = 16777216 | 1x | 10.893 ms | inf% | 10.888 ms | inf% | 1.541G | 1540890587 | 146.063 MiB |
| 2^3 = 8 | 2^24 = 16777216 | 1x | 15.193 ms | inf% | 15.188 ms | inf% | 1.105G | 1104654900 | 138.542 MiB |
| 2^4 = 16 | 2^24 = 16777216 | 1x | 18.558 ms | inf% | 18.553 ms | inf% | 904.319M | 904309708 | 145.234 MiB |
| 2^1 = 2 | 2^26 = 67108864 | 1x | 29.362 ms | inf% | 29.356 ms | inf% | 2.286G | 2286068158 | 583.537 MiB |
| 2^2 = 4 | 2^26 = 67108864 | 1x | 29.405 ms | inf% | 29.399 ms | inf% | 2.283G | 2282669096 | 583.537 MiB |
| 2^3 = 8 | 2^26 = 67108864 | 1x | 33.626 ms | inf% | 33.621 ms | inf% | 1.996G | 1996061478 | 553.448 MiB |
| 2^4 = 16 | 2^26 = 67108864 | 1x | 42.606 ms | inf% | 42.601 ms | inf% | 1.575G | 1575302932 | 580.215 MiB |
| 2^1 = 2 | 2^28 = 268435456 | 1x | 117.928 ms | inf% | 117.923 ms | inf% | 2.276G | 2276371219 | 2.279 GiB |
| 2^2 = 4 | 2^28 = 268435456 | 1x | 112.866 ms | inf% | 112.861 ms | inf% | 2.378G | 2378465548 | 2.279 GiB |
| 2^3 = 8 | 2^28 = 268435456 | 1x | 132.348 ms | inf% | 132.343 ms | inf% | 2.028G | 2028335758 | 2.161 GiB |
| 2^4 = 16 | 2^28 = 268435456 | 1x | 157.249 ms | inf% | 157.245 ms | inf% | 1.707G | 1707121026 | 2.266 GiB |
| 2^1 = 2 | 2^30 = 1073741824 | 1x | 396.339 ms | inf% | 396.336 ms | inf% | 2.709G | 2709167443 | 9.114 GiB |
| 2^2 = 4 | 2^30 = 1073741824 | 1x | 395.608 ms | inf% | 395.606 ms | inf% | 2.714G | 2714168334 | 9.114 GiB |
| 2^3 = 8 | 2^30 = 1073741824 | 1x | 489.010 ms | inf% | 489.009 ms | inf% | 2.196G | 2195751585 | 8.644 GiB |
| 2^4 = 16 | 2^30 = 1073741824 | 1x | 607.419 ms | inf% | 607.419 ms | inf% | 1.768G | 1767712462 | 9.062 GiB |

json_read_data_type

[0] Quadro GV100

| data_type | io | Samples | CPU Time | Noise | GPU Time | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|---|---|---|---|---|---|---|---|---|---|
| FLOAT | DEVICE_BUFFER | 1x | 736.726 ms | inf% | 736.726 ms | inf% | 728725276 | 15.066 GiB | 1.796 GiB |
| DECIMAL | DEVICE_BUFFER | 1x | 828.822 ms | inf% | 828.823 ms | inf% | 647750935 | 14.779 GiB | 1.791 GiB |
| STRING | DEVICE_BUFFER | 1x | 325.747 ms | inf% | 325.742 ms | inf% | 1648147628 | 7.610 GiB | 944.307 MiB |
| LIST | DEVICE_BUFFER | 1x | 246.325 ms | inf% | 246.320 ms | inf% | 2179565997 | 9.880 GiB | 1.158 GiB |
| STRUCT | DEVICE_BUFFER | 1x | 905.458 ms | inf% | 905.459 ms | inf% | 592926575 | 15.515 GiB | 1.688 GiB |

json_read_io

[0] Quadro GV100

| io | Samples | CPU Time | Noise | GPU Time | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|---|---|---|---|---|---|---|---|---|
| FILEPATH | 1x | 548.927 ms | inf% | 548.927 ms | inf% | 978037858 | 10.416 GiB | 1.224 GiB |
| HOST_BUFFER | 1x | 423.777 ms | inf% | 423.775 ms | inf% | 1266877007 | 10.416 GiB | 1.224 GiB |
| DEVICE_BUFFER | 1x | 277.196 ms | inf% | 277.192 ms | inf% | 1936819745 | 10.416 GiB | 1.224 GiB |

cuco::experimental::extent{compute_hash_table_size(num_nodes)},
cuco::empty_key<cudf::size_type>{empty_node_index_sentinel},
d_equal,
cuco::experimental::linear_probing<1, hasher_type>{d_hashed_cache},
Contributor

Why use linear_probing here? Do we have other options?

Contributor Author

I used the same probing scheme as distinct_count. What other probing options are there?

Member

PointKernel commented Aug 21, 2023

We can switch between linear_probing and double_hashing. In general, linear_probing delivers better performance for set/map and double_hashing is preferred for multiset/multimap.
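The difference between the two schemes is in which slot is tried next after a collision. The sketch below shows only the textbook index arithmetic; it is not cuco's implementation (cuco's `linear_probing<1, ...>` also takes a cooperative-group size and a hasher), and the function names are hypothetical. Linear probing steps by one slot; double hashing steps by a second hash value, here forced into [1, n-1] and used with a prime table size so every slot is eventually visited.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear probing: start at h % n and advance one slot per attempt.
std::vector<std::size_t> linear_probe_sequence(std::size_t h, std::size_t n, std::size_t count) {
  std::vector<std::size_t> seq;
  for (std::size_t i = 0; i < count; ++i) seq.push_back((h + i) % n);
  return seq;
}

// Double hashing: start at h1 % n and advance by a step derived from a second hash h2.
// The step lies in [1, n-1]; with n prime it is coprime with n, so all slots are reachable.
std::vector<std::size_t> double_hash_sequence(std::size_t h1, std::size_t h2, std::size_t n,
                                              std::size_t count) {
  const std::size_t step = 1 + h2 % (n - 1);
  std::vector<std::size_t> seq;
  for (std::size_t i = 0; i < count; ++i) seq.push_back((h1 + i * step) % n);
  return seq;
}
```

With a table of 7 slots and h = 5, linear probing tries 5, 6, 0, 1, while double hashing with h2 = 10 (step 5) tries 5, 3, 1, 6. A common intuition, not stated in the thread, is that adjacent probes give linear probing good memory locality for sets and maps, while double hashing scatters keys that share a home slot, which helps when many duplicate keys are stored as in a multiset or multimap.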

@karthikeyann
Contributor Author

Benchmark comparison after replacing static_map with static_set in the "node type + field name hashing" step:
['before.json', 'after2.json']

nested_json_gpu_parser

[0] Quadro GV100

| string_size | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|
| 2^20 | 4.880 ms | 4.49% | 4.611 ms | 0.37% | -268.940 us | -5.51% | FAIL |
| 2^21 | 4.895 ms | 9.06% | 4.774 ms | 0.32% | -120.961 us | -2.47% | FAIL |
| 2^22 | 5.317 ms | 10.41% | 5.190 ms | 11.42% | -126.517 us | -2.38% | PASS |
| 2^23 | 6.422 ms | 0.50% | 6.204 ms | 0.38% | -217.953 us | -3.39% | FAIL |
| 2^24 | 9.930 ms | 14.53% | 9.615 ms | 18.52% | -315.873 us | -3.18% | PASS |
| 2^25 | 15.002 ms | 11.11% | 13.463 ms | 11.93% | -1538.816 us | -10.26% | PASS |
| 2^26 | 22.675 ms | 6.45% | 20.569 ms | 5.89% | -2105.785 us | -9.29% | FAIL |
| 2^27 | 40.288 ms | 4.58% | 37.852 ms | 5.66% | -2436.356 us | -6.05% | FAIL |
| 2^28 | 76.542 ms | 1.90% | 67.596 ms | 2.47% | -8946.637 us | -11.69% | FAIL |
| 2^29 | 145.540 ms | 2.20% | 129.453 ms | 0.18% | -16087.117 us | -11.05% | FAIL |
| 2^30 | 280.676 ms | 1.14% | 254.863 ms | 0.24% | -25813.133 us | -9.20% | FAIL |

nested_json_gpu_parser_depth

[0] Quadro GV100

| depth | string_size | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|---|
| 2^1 | 2^20 | 4.794 ms | 3.81% | 4.758 ms | 1.03% | -35.237 us | -0.74% | PASS |
| 2^2 | 2^20 | 4.806 ms | 13.63% | 4.759 ms | 0.33% | -47.073 us | -0.98% | FAIL |
| 2^3 | 2^20 | 9.694 ms | 0.28% | 9.639 ms | 0.50% | -55.950 us | -0.58% | FAIL |
| 2^4 | 2^20 | 11.520 ms | 0.38% | 11.431 ms | 0.21% | -89.160 us | -0.77% | FAIL |
| 2^1 | 2^22 | 5.921 ms | 0.33% | 5.803 ms | 0.35% | -117.992 us | -1.99% | FAIL |
| 2^2 | 2^22 | 5.932 ms | 0.39% | 5.807 ms | 0.33% | -124.324 us | -2.10% | FAIL |
| 2^3 | 2^22 | 10.719 ms | 0.27% | 10.606 ms | 0.46% | -113.268 us | -1.06% | FAIL |
| 2^4 | 2^22 | 12.800 ms | 0.27% | 12.689 ms | 0.33% | -110.459 us | -0.86% | FAIL |
| 2^1 | 2^24 | 11.261 ms | 10.01% | 10.507 ms | 1.91% | -754.056 us | -6.70% | FAIL |
| 2^2 | 2^24 | 11.306 ms | 10.39% | 10.493 ms | 1.58% | -812.631 us | -7.19% | FAIL |
| 2^3 | 2^24 | 15.109 ms | 0.90% | 15.620 ms | 10.60% | 511.498 us | 3.39% | FAIL |
| 2^4 | 2^24 | 18.559 ms | 3.69% | 18.562 ms | 5.30% | 2.206 us | 0.01% | PASS |
| 2^1 | 2^26 | 30.345 ms | 4.05% | 29.219 ms | 4.78% | -1126.209 us | -3.71% | PASS |
| 2^2 | 2^26 | 30.467 ms | 4.22% | 29.164 ms | 4.85% | -1303.629 us | -4.28% | FAIL |
| 2^3 | 2^26 | 35.087 ms | 2.81% | 34.047 ms | 4.49% | -1040.550 us | -2.97% | FAIL |
| 2^4 | 2^26 | 44.072 ms | 3.11% | 41.409 ms | 3.47% | -2662.539 us | -6.04% | FAIL |
| 2^1 | 2^28 | 105.235 ms | 2.31% | 100.108 ms | 0.65% | -5127.800 us | -4.87% | FAIL |
| 2^2 | 2^28 | 105.108 ms | 1.54% | 100.133 ms | 0.50% | -4975.085 us | -4.73% | FAIL |
| 2^3 | 2^28 | 131.532 ms | 1.10% | 120.932 ms | 0.74% | -10599.756 us | -8.06% | FAIL |
| 2^4 | 2^28 | 163.348 ms | 1.29% | 152.227 ms | 1.03% | -11120.952 us | -6.81% | FAIL |
| 2^1 | 2^30 | 402.572 ms | 0.74% | 377.403 ms | 0.33% | -25169.766 us | -6.25% | FAIL |
| 2^2 | 2^30 | 403.850 ms | 0.71% | 377.667 ms | 0.33% | -26182.859 us | -6.48% | FAIL |
| 2^3 | 2^30 | 511.063 ms | 0.19% | 461.863 ms | 0.39% | -49200.109 us | -9.63% | FAIL |
| 2^4 | 2^30 | 631.286 ms | 0.30% | 585.737 ms | 0.50% | -45548.761 us | -7.22% | FAIL |

json_read_data_type

[0] Quadro GV100

| data_type | io | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|---|
| FLOAT | DEVICE_BUFFER | 769.726 ms | 0.21% | 710.914 ms | 0.24% | -58812.134 us | -7.64% | FAIL |
| DECIMAL | DEVICE_BUFFER | 848.174 ms | 0.16% | 811.799 ms | 0.22% | -36374.756 us | -4.29% | FAIL |
| STRING | DEVICE_BUFFER | 334.214 ms | 0.48% | 313.744 ms | 0.68% | -20469.883 us | -6.12% | FAIL |
| LIST | DEVICE_BUFFER | 246.600 ms | 0.79% | 241.710 ms | 0.49% | -4889.477 us | -1.98% | FAIL |
| STRUCT | DEVICE_BUFFER | 936.482 ms | 0.25% | 854.513 ms | 0.48% | -81968.657 us | -8.75% | FAIL |

json_read_io

[0] Quadro GV100

| io | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|---|---|---|---|---|---|---|---|
| FILEPATH | 549.848 ms | 1.97% | 549.053 ms | 2.05% | -795.637 us | -0.14% | PASS |
| HOST_BUFFER | 427.906 ms | 0.05% | 420.848 ms | 0.05% | -7058.240 us | -1.65% | FAIL |
| DEVICE_BUFFER | 278.015 ms | 0.50% | 272.230 ms | 0.49% | -5784.574 us | -2.08% | FAIL |

Summary

  • Total Matches: 43
    • Pass (diff <= min_noise): 7
    • Unknown (infinite noise): 0
    • Failure (diff > min_noise): 36

Member

@PointKernel left a comment

Looks great! Thanks!

@karthikeyann changed the title "Use cuco::static_set in JSON tree algorithm" again on Aug 22, Aug 25, and Aug 27, 2023
@karthikeyann added the label 5 - Ready to Merge and removed the labels 3 - Ready for Review, 4 - Needs Review, and 4 - Needs cuIO Reviewer on Aug 27, 2023
@karthikeyann
Contributor Author

/merge

@rapids-bot merged commit aba001c into rapidsai:branch-23.10 on Aug 28, 2023
54 checks passed
Labels: 5 - Ready to Merge (Testing and reviews complete, ready to merge), improvement (Improvement / enhancement to an existing function), libcudf (Affects libcudf (C++/CUDA) code), non-breaking (Non-breaking change)
Projects
Archived in project

4 participants