
[docs] Add PyTorch loaders article release #1214

Merged: 13 commits merged into main on Jul 9, 2024

Conversation

@pablo-gar (Contributor) commented Jun 28, 2024

Images don't render in Markdown; they are encoded for the MyST parser.

@pablo-gar requested a review from ebezzi, June 28, 2024 20:16
pablo-gar and others added 2 commits June 28, 2024 13:48
Co-authored-by: Emanuele Bezzi <[email protected]>
docs/articles/2024/20240702-pytorch.md (review threads, outdated/resolved)

We have made improvements to the loaders to reduce the number of data transformations required between data fetching and model training. One important change is to encode the expression data as a dense matrix immediately after it is retrieved from disk/cloud.

In our benchmarks, we found that densifying the data increases training speed ~3X while keeping memory usage relatively constant (Figure 3). However, we still allow users to decide whether to process the expression data in sparse or dense format via the #TODO ask ebezzi to include name of parameter.
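To illustrate the densify-right-after-fetch idea in the paragraph above, here is a minimal, self-contained sketch. It is not the census loader's actual code: `fetch_sparse_chunk` is a hypothetical stand-in (using synthetic random data) for reading one sparse expression chunk from disk/cloud, and the generator simply converts each chunk to a dense tensor before anything downstream touches it.

```python
# Illustrative sketch only: densify each expression chunk immediately after
# retrieval, so that all later steps (shuffling, batching, collation, GPU
# transfer) operate on plain dense arrays.
import numpy as np
import scipy.sparse as sp
import torch


def fetch_sparse_chunk(n_cells: int = 64, n_genes: int = 1000) -> sp.csr_matrix:
    """Hypothetical stand-in for reading one sparse chunk from disk/cloud."""
    rng = np.random.default_rng(0)
    counts = rng.poisson(0.1, size=(n_cells, n_genes)).astype(np.float32)
    return sp.csr_matrix(counts)


def dense_chunks(n_chunks: int = 4):
    """Yield dense torch tensors, densifying right after each fetch."""
    for _ in range(n_chunks):
        chunk = fetch_sparse_chunk()
        yield torch.from_numpy(chunk.toarray())


for batch in dense_chunks():
    print(batch.shape)  # torch.Size([64, 1000])
```

The point of the sketch is only the ordering: the sparse-to-dense conversion happens once, immediately after the data is read, rather than repeatedly inside the training loop.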
Member commented:
The parameter is `method`, but I believe @ryan-williams wanted to change it?

pablo-gar and others added 3 commits June 28, 2024 14:13
codecov bot commented Jun 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.17%. Comparing base (fc0281b) to head (a33479a).
Report is 10 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1214      +/-   ##
==========================================
+ Coverage   91.11%   91.17%   +0.06%     
==========================================
  Files          77       77              
  Lines        5922     5963      +41     
==========================================
+ Hits         5396     5437      +41     
  Misses        526      526              
| Flag | Coverage Δ |
| --- | --- |
| unittests | 91.17% <ø> (+0.06%) ⬆️ |


@pablo-gar enabled auto-merge (squash), July 9, 2024 00:27
@pablo-gar requested a review from ebezzi, July 9, 2024 00:27

*Published:* *July 9th, 2024*

*By:* *[Emanuele Bezzi](mailto:[email protected]), [Pablo Garcia-Nieto](mailto:[email protected]), [Prathap Sridharan](mailto:[email protected]), [Ryan Williams](mailto:[email protected])*
Member commented:
Ryan's email is wrong. Worth checking if they're ok with adding the email here though?

@ryan-williams (Contributor) commented Jul 10, 2024:
Thanks, I saw that #1228 addresses this 🙏 (and yes, I'm ok / appreciate being listed!)

@pablo-gar merged commit b83055a into main on Jul 9, 2024
14 of 15 checks passed
@pablo-gar deleted the pablo-gar/add-loaders-article branch, July 9, 2024 01:03
@ryan-williams (Contributor) commented:
Thanks for this! One note: the wrong circle is labeled "default" in docs/articles/2024/20240709-pytorch-fig-benchmark.png:

[screenshot: docs/articles/2024/20240709-pytorch-fig-benchmark.png]

From the interactive plot, here's a gif showing the 2 data points corresponding to 2048 chunks of size 64 (one from a g4dn.4xlarge, one from a g4dn.8xlarge):

[gif: the two default-configuration data points, 2048 chunks of size 64]

The circle currently labeled "default" actually corresponds to 1024 chunks of size 128.

[screenshot: the circle currently labeled "default", corresponding to 1024 chunks of size 128]

I'm not sure how significant the speed difference between the g4dn.{4x,8x}large nodes is with the default configuration, given N=1. I can generate a few more samples if you like.
