[DOCS] Partitioning user guide and small doc fixes #2717

jaychia · 2024-08-23T21:55:27Z

Closes: #840

codspeed-hq · 2024-08-23T22:10:15Z

CodSpeed Performance Report

Merging #2717 will not alter performance

Comparing jay/partitioning-guide (670e564) with main (9df6beb)

Summary

✅ 10 untouched benchmarks

kevinzwang

Looks great! Just had two small comments on the wording.

kevinzwang · 2024-08-23T22:58:48Z

docs/source/user_guide/poweruser/partitioning.rst

+
+1. **Have Enough Partitions**: our general recommendation for high throughput and maximal resource utilization is to have *at least* ``2 x TOTAL_NUM_CPUS`` partitions, which allows Daft to fully saturate your CPUs.
+2. **More Partitions**: if you are observing memory issues (excessive spilling or out-of-memory (OOM) issues) then you may choose to increase the number of partitions. This increases the amount of overhead in your system, but improves overall memory stability (since each partition will be smaller).
+3. **Fewer Partitions**: if you are observing a large amount of overhead (especially during shuffle operations such as joins and sorts), then you may choose to decrease the number of partitions. This decreases the amount of overhead in the system, at the cost of using more memory (since each partition will be larger).
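
The three recommendations in this hunk can be sketched as a small tuning helper. This is an illustrative sketch only: `recommended_min_partitions` and `adjust_partitions` are hypothetical names, and the doubling/halving step sizes are assumptions, not values from the guide. Only the `2 x TOTAL_NUM_CPUS` lower bound comes from the text above.

```python
import os


def recommended_min_partitions(total_cpus: int, factor: int = 2) -> int:
    """Lower bound on partition count: factor x total CPUs (the guide suggests 2x)."""
    if total_cpus < 1:
        raise ValueError("total_cpus must be >= 1")
    return factor * total_cpus


def adjust_partitions(
    current: int, *, memory_pressure: bool, high_overhead: bool, floor: int
) -> int:
    """Hypothetical tuning step (step sizes are assumptions, not from the guide).

    - memory pressure (spilling / OOM): double the count, so each partition is smaller
    - high overhead (e.g. slow shuffles): halve the count, but never go below `floor`
    """
    if memory_pressure:
        return current * 2
    if high_overhead:
        return max(floor, current // 2)
    return max(floor, current)


# On a single machine, os.cpu_count() stands in for TOTAL_NUM_CPUS;
# on a cluster, this would be the sum of CPUs across all workers.
local_cpus = os.cpu_count() or 1
target = recommended_min_partitions(local_cpus)
print(f"start with at least {target} partitions")

# With a Daft DataFrame `df`, the chosen count could then be applied with
# something like df.into_partitions(target); that call is not run here.
```

The floor keeps the "fewer partitions" adjustment from undoing the "have enough partitions" rule; whether that trade-off is right depends on the workload.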


Maybe add some description of how to measure overhead (as opposed to an operation that is simply expensive)?

kevinzwang · 2024-08-23T22:59:56Z

docs/source/user_guide/poweruser/partitioning.rst

+---------------------------
+
+Daft will automatically use certain heuristics to determine the number of partitions for you when you create a DataFrame. When reading data from files (e.g. Parquet, CSV or JSON),
+each file is by default one partition on its own, but Daft will also perform splitting of partitions (for files that are egregiously large) and coalescing of partitions (for small files)


I think "by default" is a little misleading here as Daft by default does scan task splitting and merging.
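
A toy model of why "one file, one partition" does not hold once splitting and merging are applied. This is not Daft's actual scan task algorithm: `plan_partitions`, `min_size`, and `max_size` are hypothetical names and thresholds chosen for illustration.

```python
def plan_partitions(file_sizes, min_size, max_size):
    """Toy model: map file sizes (bytes) to partition sizes (bytes).

    Files larger than `max_size` are split into near-equal chunks;
    adjacent pieces smaller than `min_size` are merged until the
    merged piece reaches `min_size`.
    """
    # Step 1: split oversized files into ceil(size / max_size) chunks.
    pieces = []
    for size in file_sizes:
        if size > max_size:
            n_chunks = -(-size // max_size)  # ceiling division
            base, rem = divmod(size, n_chunks)
            pieces.extend(base + (1 if i < rem else 0) for i in range(n_chunks))
        else:
            pieces.append(size)

    # Step 2: coalesce runs of small adjacent pieces.
    partitions = []
    acc = 0
    for piece in pieces:
        if piece >= min_size:
            if acc:  # flush any pending small pieces first
                partitions.append(acc)
                acc = 0
            partitions.append(piece)
        else:
            acc += piece
            if acc >= min_size:
                partitions.append(acc)
                acc = 0
    if acc:
        partitions.append(acc)
    return partitions


MB = 1024**2
files = [1000 * MB, 10 * MB, 20 * MB, 80 * MB, 200 * MB]
# Thresholds below are assumptions for the example only.
parts = plan_partitions(files, min_size=96 * MB, max_size=384 * MB)
print(len(files), "files ->", len(parts), "partitions")
```

In this example the large file becomes three partitions and the three small files become one, so the partition count matches the file count only by coincidence.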

Jay Chia added 4 commits August 23, 2024 11:37

[DOCS] Partitioning docs

57cfb56

Add more info

c040b97

Add small doc fixes

e0752a9

Fix more issues

67204fd

github-actions bot added the documentation label Aug 23, 2024

Finish up

5f23111

Fix up docs

fd7175b

jaychia requested a review from kevinzwang August 23, 2024 22:23

kevinzwang approved these changes Aug 23, 2024

Update partitioning.rst

670e564

jaychia enabled auto-merge (squash) August 23, 2024 23:22

jaychia merged commit 3647b26 into main Aug 23, 2024
46 checks passed

jaychia deleted the jay/partitioning-guide branch August 23, 2024 23:38
