PyTorch DataLoader #499

Merged
merged 35 commits into from
May 30, 2023
Changes from 20 commits
Commits
fa1024c
add PyTorch DataLoader support
atolopko-czi May 18, 2023
2080d40
add dataloader factory method
atolopko-czi May 18, 2023
95ba2bb
tests for experiment_dataloader() and fixes
atolopko-czi May 18, 2023
abcc380
instance var init fix
atolopko-czi May 19, 2023
cb37af4
rm explicit timeout param handling
atolopko-czi May 19, 2023
2e4b0bd
minor fixes, comments
atolopko-czi May 19, 2023
895fcb5
initial pytorch notebook for LR model training, using AnnData
atolopko-czi May 22, 2023
e33af14
swap order of obs and labels in DataLoader rows/batches
atolopko-czi May 22, 2023
dd55c7d
support access to ExperimentAccessQuery from ExperimentDataPipe
atolopko-czi May 23, 2023
60e0b9d
pytorch_lr_classifier example now functional
atolopko-czi May 23, 2023
1084482
refactoring
atolopko-czi May 23, 2023
d766785
take Experiment obj instead of URI
atolopko-czi May 23, 2023
fdeb52c
fix multiprocessing pickling
atolopko-czi May 23, 2023
c66bdb3
pytorch example code tweaks
atolopko-czi May 23, 2023
080e5ec
use logging instead of prints
atolopko-czi May 23, 2023
70aec84
use real census in pytorch example
atolopko-czi May 23, 2023
e0968b1
add pip requirements for PyTorch DataLoader
atolopko-czi May 23, 2023
ad0356e
support split and shuffle via PyTorch DataPipes
atolopko-czi May 23, 2023
891684c
TODO
atolopko-czi May 23, 2023
503a7dc
install experimental pip requirements optionally
atolopko-czi May 23, 2023
b51b4a9
move pytorch.py to `experimental.ml` sub-package
atolopko-czi May 23, 2023
94778ea
refactor var name
atolopko-czi May 23, 2023
7f617f2
fix GHA builds
atolopko-czi May 24, 2023
f6c0f78
invert sparse_X option to dense_X
atolopko-czi May 24, 2023
6313ee9
update example pytest code for sparse_X option
atolopko-czi May 24, 2023
4bbb3b8
run experimental tests conditionally (#501)
atolopko-czi May 24, 2023
c027b7d
Merge branch 'main' into atol/pytorch-dataloader
atolopko-czi May 24, 2023
1c3a736
docs
atolopko-czi May 25, 2023
8cfc59b
use API's default buffer_bytes value
atolopko-czi May 25, 2023
8ce019f
refactoring
atolopko-czi May 25, 2023
553423d
refactoring
atolopko-czi May 25, 2023
51ba4bb
add explicit scikit-learn pip dependency
atolopko-czi May 25, 2023
ce3abe7
update TODO
atolopko-czi May 25, 2023
067893b
Merge branch 'main' into atol/pytorch-dataloader
atolopko-czi May 25, 2023
78040c5
Merge branch 'main' into atol/pytorch-dataloader
atolopko-czi May 30, 2023
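
The commit "fix multiprocessing pickling" (fdeb52c) comes right after the change to accept an Experiment object instead of a URI. Its diff is not shown here, so the following is only a generic sketch of one common way to keep an object picklable for worker processes, not the PR's actual fix. The class `LazyResource`, the helper `round_trip`, and the use of `threading.Lock` as a stand-in for an unpicklable handle are all hypothetical.

```python
import pickle
import threading


class LazyResource:
    """Hypothetical dataset-like object that opens its resource on first use."""

    def __init__(self, uri):
        self.uri = uri
        self._handle = None  # live handle, created lazily

    @property
    def handle(self):
        if self._handle is None:
            # Stand-in for an unpicklable resource (open connection, file handle, ...).
            self._handle = threading.Lock()
        return self._handle

    def __getstate__(self):
        # Copy the instance state but drop the live handle, so only the
        # picklable description (the URI) crosses the process boundary.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state


def round_trip(obj):
    """Pickle and unpickle an object, as happens when it is sent to a worker process."""
    return pickle.loads(pickle.dumps(obj))
```

A copy produced by `round_trip` starts with `_handle` set to `None` and reopens the resource the first time `handle` is accessed, while the original object keeps its own handle.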
6 changes: 6 additions & 0 deletions api/python/cellxgene_census/pyproject.toml
@@ -44,6 +44,12 @@ dependencies= [
     "tiledb",
 ]

+[project.optional-dependencies]
+experimental = [
+    "torch==2.0.1",
+    "torchdata==0.6.1"
+]
+
 [project.urls]
 homepage = "https://github.com/chanzuckerberg/cellxgene-census"
 repository = "https://github.com/chanzuckerberg/cellxgene-census"
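
Because `torch` and `torchdata` are declared under the `experimental` extra, they are installed only when that extra is requested (with pip's extras syntax, something like `pip install "cellxgene-census[experimental]"`). Code that depends on them may want to fail with a clear message when they are absent. The sketch below shows one way to do that with the standard library; the names `EXPERIMENTAL_MODULES`, `missing_modules`, and `require_experimental` are illustrative and do not appear in this PR.

```python
from importlib.util import find_spec

# Top-level modules provided by the `experimental` extra in the diff above.
EXPERIMENTAL_MODULES = ("torch", "torchdata")


def missing_modules(names):
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if find_spec(name) is None]


def require_experimental():
    """Hypothetical guard: raise a descriptive error if the optional dependencies are absent."""
    missing = missing_modules(EXPERIMENTAL_MODULES)
    if missing:
        raise ImportError(
            "Missing optional dependencies: "
            + ", ".join(missing)
            + '. Try: pip install "cellxgene-census[experimental]"'
        )
```

Calling `require_experimental()` at the top of an experimental module would surface the problem at import time rather than deep inside a training loop.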
15 changes: 12 additions & 3 deletions api/python/cellxgene_census/src/cellxgene_census/_open.py
@@ -33,7 +33,17 @@

 def _open_soma(locator: CensusLocator, context: Optional[soma.options.SOMATileDBContext] = None) -> soma.Collection:
     """Private. Merge config defaults and return open census as a soma Collection/context."""
-    s3_region = locator.get("s3_region")
+    context = _build_soma_tiledb_context(locator.get("s3_region"), context)
+    return soma.open(locator["uri"], mode="r", soma_type=soma.Collection, context=context)
+
+
+def _build_soma_tiledb_context(
+    s3_region: Optional[str] = None, context: Optional[soma.options.SOMATileDBContext] = None
+) -> soma.options.SOMATileDBContext:
+    """
+    Private. Build a SOMATileDBContext with sensible defaults. If a user-defined context is provided, update only
+    `vfs.s3.region`.
+    """

     if not context:
         # if no user-defined context, cellxgene_census defaults take precedence over SOMA defaults
@@ -50,8 +60,7 @@ def _open_soma(locator: CensusLocator, context: Optional[soma.options.SOMATileDB
             tiledb_config = context.tiledb_ctx.config()
             tiledb_config["vfs.s3.region"] = s3_region
             context = context.replace(tiledb_config=tiledb_config)
-
-    return soma.open(locator["uri"], mode="r", soma_type=soma.Collection, context=context)
+    return context


 def open_soma(
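
The refactor above moves the context-building rule out of `_open_soma` so it can be reused: with no user context, package defaults are used; with a user context, only `vfs.s3.region` is overridden. The sketch below illustrates that precedence with plain dictionaries. It does not use the real SOMA API: `FakeContext` is a hypothetical stand-in for `soma.options.SOMATileDBContext`, `DEFAULTS` holds placeholder values, and it assumes the region is applied in both branches, which the visible hunks do not fully show.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Placeholder defaults; the real cellxgene_census defaults are not shown in the diff.
DEFAULTS: Dict[str, str] = {"example.option": "census-default"}


@dataclass(frozen=True)
class FakeContext:
    """Hypothetical stand-in for soma.options.SOMATileDBContext."""

    tiledb_config: Dict[str, str] = field(default_factory=dict)


def build_context(s3_region: Optional[str] = None, context: Optional[FakeContext] = None) -> FakeContext:
    if context is None:
        # No user context: start from the package defaults.
        config = dict(DEFAULTS)
    else:
        # User context: keep every user setting as-is (copy, do not mutate).
        config = dict(context.tiledb_config)
    if s3_region is not None:
        # Assumed here: the locator's region wins for vfs.s3.region in both cases.
        config["vfs.s3.region"] = s3_region
    return FakeContext(tiledb_config=config)
```

Returning a new object instead of mutating the caller's context mirrors the `context.replace(...)` call in the diff, so a user-supplied context is never modified in place.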