
Feat: adding support for sharded precomputed #35

Merged: 29 commits, Jan 10, 2024
1bb974d
WIP
xgui3783 Sep 24, 2023
35d7e7f
fixed minishard order. MVP status
xgui3783 Oct 6, 2023
cd37194
maint: minor refactor of sharded precomp
xgui3783 Oct 6, 2023
ae3ce57
feat: add read capability to shardedfileaccessor
xgui3783 Oct 9, 2023
cf813c3
feat: preshift bits
xgui3783 Oct 10, 2023
5628fd2
fix flake8
xgui3783 Oct 10, 2023
a17f96c
maint: moved shardspec validation
xgui3783 Oct 10, 2023
ded4199
fix: Literal type for < py3.8
xgui3783 Oct 10, 2023
10b8640
test: fixed py3.5 namedtuple
xgui3783 Oct 10, 2023
fc6d8c4
maint: drop py3.5
xgui3783 Oct 11, 2023
2b0ae91
add tests
xgui3783 Oct 30, 2023
22916f2
fix test
xgui3783 Oct 30, 2023
d95c3f8
fix tests for <=py3.8
xgui3783 Oct 30, 2023
b622ebf
added more tests
xgui3783 Oct 30, 2023
33f34fc
added more tests coverage
xgui3783 Oct 30, 2023
5e4f2a3
fix: r/w shard, doc
xgui3783 Nov 2, 2023
adee8d9
docs: added examples, ack, server
xgui3783 Nov 2, 2023
12e501a
fix: in_is_sharded needs explicit shard spec
xgui3783 Nov 2, 2023
22711ae
fix: sharding path decision
xgui3783 Nov 2, 2023
c3bd3d0
update tests, simplified scale construction
xgui3783 Nov 6, 2023
7da6e0f
fix: tests
xgui3783 Nov 6, 2023
8b50048
fix script test
xgui3783 Nov 6, 2023
209bce9
fix test
xgui3783 Nov 6, 2023
2ab8f19
fix test
xgui3783 Nov 6, 2023
1487ab1
feat: add kwargs to allow for shard on disk/in mem
xgui3783 Nov 27, 2023
53e8042
fix: test
xgui3783 Nov 27, 2023
d7f324f
misc fixes
xgui3783 Jan 10, 2024
38ba143
fix copyright headers
ylep Jan 10, 2024
7f1d4b1
fix a Sphinx warning
ylep Jan 10, 2024
6 changes: 2 additions & 4 deletions .github/workflows/tox.yaml
@@ -15,15 +15,13 @@ jobs:
include:
- runs-on: 'ubuntu-20.04'
python-version: '3.6'
- runs-on: 'ubuntu-20.04'
python-version: '3.5'
runs-on: ${{ matrix.runs-on }}
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
with:
lfs: true
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: PIP cache
6 changes: 6 additions & 0 deletions README.rst
@@ -74,3 +74,9 @@ This repository uses `pre-commit`_ to ensure that all committed code follows min

.. _Neuroglancer: https://github.com/google/neuroglancer
.. _pre-commit: https://pre-commit.com/


Acknowledgments
===============

`cloud-volume <https://github.com/seung-lab/cloud-volume>`_ (BSD 3-Clause licensed), for the compressed morton code and shard/minishard mask implementation.
82 changes: 81 additions & 1 deletion docs/examples.rst
@@ -15,7 +15,7 @@ two Nifti files based on the JuBrain human brain atlas, as published in version
Note that you need to use `git-lfs <https://git-lfs.github.com/>`_ in order to
see the contents of the NIfTI files (otherwise you can download them `from the
repository on Github
<https://github.com/HumanBrainProject/neuroglancer-scripts/tree/master/JuBrain>`_.
<https://github.com/HumanBrainProject/neuroglancer-scripts/tree/master/examples>`_).

Conversion of the grey-level template image (MNI Colin27 T1 MRI)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
@@ -152,3 +152,83 @@ BigBrain is a very large image (6572 × 7404 × 5711 voxels) reconstructed from
white_right_327680.gii \
classif/
link-mesh-fragments --no-colon-suffix mesh_labels.csv classif/


Conversion of the grey-level template image (sharded precomputed)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

.. code-block:: sh

volume-to-precomputed \
--generate-info \
--sharding 1,1,0 \
colin27T1_seg.nii.gz \
colin27T1_seg_sharded
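
Here ``--sharding 1,1,0`` requests 1 minishard encoding bit, 1 shard encoding
bit and 0 preshift bits, in the order documented by ``volume-to-precomputed
--help``.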

At this point, you need to edit ``colin27T1_seg_sharded/info_fullres.json`` to set
``"data_type": "uint8"``. This is needed because ``colin27T1_seg.nii.gz`` uses
a peculiar encoding, with slope and intercept set in the NIfTI header, even
though only integers between 0 and 255 are encoded.
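
If you prefer to script that edit, here is a minimal Python sketch (mirroring
what the test suite in this pull request does):

.. code-block:: python

   import json

   # Rewrite info_fullres.json with the corrected data type
   with open("colin27T1_seg_sharded/info_fullres.json") as fp:
       fullres_info = json.load(fp)
   fullres_info["data_type"] = "uint8"
   with open("colin27T1_seg_sharded/info_fullres.json", "w") as fp:
       json.dump(fullres_info, fp, indent="\t")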

.. code-block:: sh

generate-scales-info colin27T1_seg_sharded/info_fullres.json colin27T1_seg_sharded/
volume-to-precomputed \
--sharding 1,1,0 \
colin27T1_seg.nii.gz \
colin27T1_seg_sharded/
compute-scales colin27T1_seg_sharded/


.. _Conversion of Big Brain to sharded precomputed format:

Big Brain (20um) has been converted to Neuroglancer precomputed format and is
accessible at
https://neuroglancer.humanbrainproject.eu/precomputed/BigBrainRelease.2015/8bit.
Using this volume as the source, we will create a sharded copy of it.

.. code-block:: sh

mkdir sharded_bigbrain/
curl --output sharded_bigbrain/info \
https://neuroglancer.humanbrainproject.eu/precomputed/BigBrainRelease.2015/8bit/info

At this point, edit ``sharded_bigbrain/info`` to contain the desired sharding
specification, as shown in the diff below. For a smaller-scale test run, the
20um and 40um scales can be removed.

.. code-block:: diff

{
"type": "image",
"data_type": "uint8",
"num_channels": 1,
"scales": [
{
"chunk_sizes": [[64,64,64]],
"encoding": "raw",
"key": "20um",
"resolution": [21166.6666666666666, 20000, 21166.6666666666666],
"size": [6572, 7404, 5711],
- "voxel_offset": [0, 0, 0]
+ "voxel_offset": [0, 0, 0],
+ "sharding": {
+ "@type": "neuroglancer_uint64_sharded_v1",
+ "data_encoding": "gzip",
+ "hash": "identity",
+ "minishard_bits": 2,
+ "minishard_index_encoding": "gzip",
+ "preshift_bits": 0,
+ "shard_bits": 2
+ }
},
// ...truncated for brevity
]
}
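
To make the sharding parameters concrete, the following is an illustrative
sketch (not part of the library) of how a chunk identifier is split under the
``identity`` hash, following the Neuroglancer sharded format:

.. code-block:: python

   preshift_bits, minishard_bits, shard_bits = 0, 2, 2

   def split_chunk_id(chunk_id):
       """Return (shard, minishard) numbers for a chunk identifier."""
       hashed = chunk_id >> preshift_bits  # "identity" hash
       minishard = hashed & ((1 << minishard_bits) - 1)
       shard = (hashed >> minishard_bits) & ((1 << shard_bits) - 1)
       return shard, minishard

   # 2**shard_bits = 4 shard files, each holding 2**minishard_bits = 4
   # minishards:
   print(split_chunk_id(0b1101))  # -> (3, 1)

With ``"shard_bits": 2``, each scale is therefore stored in at most 4 shard
files.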

Start the conversion process.

.. code-block:: sh

convert-chunks \
https://neuroglancer.humanbrainproject.eu/precomputed/BigBrainRelease.2015/8bit \
./sharded_bigbrain/
2 changes: 1 addition & 1 deletion docs/script-usage.rst
@@ -21,7 +21,7 @@ OUTSIDE_VALUE] volume_filename dest_url``.

You may want to use :ref:`convert-chunks <convert-chunks>` in a second step, to
further compress your dataset with JPEG or ``compressed_segmentation``
encoding).
encoding.


Converting image volumes
47 changes: 47 additions & 0 deletions docs/serving-data.rst
@@ -99,3 +99,50 @@ following Apache configuration (e.g. put it in a ``.htaccess`` file):
AddEncoding x-gzip .gz
AddType application/octet-stream .gz
</IfModule>


Serving sharded data
====================


Content-Encoding
----------------

Sharded data must be served without any `Content-Encoding header
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding>`_.


HTTP Range request
------------------

Sharded data must be served by a web server that supports the `Range header
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range>`_.

For development use, Python's bundled SimpleHTTPServer `does not support
this <https://github.com/python/cpython/issues/86809>`_. Recommended
alternatives are:

- `http-server (NodeJS) <https://www.npmjs.com/package/http-server>`_

- `RangeHTTPServer (Python) <https://github.com/danvk/RangeHTTPServer>`_
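
For example, a quick local test (assuming the ``rangehttpserver`` package from
PyPI, serving the current directory) might look like:

.. code-block:: sh

   pip install rangehttpserver
   python -m RangeHTTPServer 8000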

For production use, most modern static web servers support range requests.
Below is a list of web servers that were tested and work with sharded
volumes:

- nginx 1.25.3

- httpd 2.4.58

- caddy 2.7.5

In addition, most object storage services also support range requests without
additional configuration.


Enable Access-Control-Allow-Origin header
-----------------------------------------

`Access-Control-Allow-Origin
<https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Allow-Origin>`_
will need to be enabled if the volume is to be accessed cross-origin.
5 changes: 1 addition & 4 deletions pyproject.toml
@@ -1,9 +1,6 @@
[build-system]
# We need support for entry_points in setup.cfg, which needs setuptools>=51.0.0
# according to the setuptools documentation. However, in my testing it works
# with version 50.3.2 which is the last to retain Python 3.5 compatibility.
requires = [
"setuptools>=50.3.2",
"setuptools>=51.0.0",
"wheel",
]
build-backend = "setuptools.build_meta"
50 changes: 50 additions & 0 deletions script_tests/test_scripts.py
@@ -97,6 +97,56 @@ def test_all_in_one_conversion(examples_dir, tmpdir):
# with --mmap / --load-full-volume


def test_sharded_conversion(examples_dir, tmpdir):
input_nifti = examples_dir / "JuBrain" / "colin27T1_seg.nii.gz"
# The file may be present but be a git-lfs pointer file, so we need to open
# it to make sure that it is the actual correct file.
try:
gzip.open(str(input_nifti)).read(348)
except OSError as exc:
pytest.skip("Cannot find a valid example file {0} for testing: {1}"
.format(input_nifti, exc))

output_dir = tmpdir / "colin27T1_seg_sharded"
assert subprocess.call([
"volume-to-precomputed",
"--generate-info",
"--sharding", "1,1,0",
str(input_nifti),
str(output_dir)
], env=env) == 4 # datatype not supported by neuroglancer

with open(output_dir / "info_fullres.json", "r") as fp:
fullres_info = json.load(fp=fp)
with open(output_dir / "info_fullres.json", "w") as fp:
fullres_info["data_type"] = "uint8"
json.dump(fullres_info, fp=fp, indent="\t")

assert subprocess.call([
"generate-scales-info",
str(output_dir / "info_fullres.json"),
str(output_dir)
], env=env) == 0
assert subprocess.call([
"volume-to-precomputed",
"--sharding", "1,1,0",
str(input_nifti),
str(output_dir)
], env=env) == 0
assert subprocess.call([
"compute-scales",
"--downscaling-method=stride", # for test speed
str(output_dir)
], env=env) == 0

all_files = [f"{dirpath}/{filename}" for dirpath, _, filenames
in os.walk(output_dir)
for filename in filenames]

assert len(all_files) == 7, ("Expecting 7 files, but got "
f"{len(all_files)}.\n{all_files}")


def test_slice_conversion(tmpdir):
# Prepare dummy slices
path_to_slices = tmpdir / "slices"
3 changes: 1 addition & 2 deletions setup.cfg
@@ -13,7 +13,6 @@ classifiers =
Intended Audience :: Science/Research
License :: OSI Approved :: MIT License
Programming Language :: Python :: 3
Programming Language :: Python :: 3.5
Programming Language :: Python :: 3.6
Programming Language :: Python :: 3.7
Programming Language :: Python :: 3.8
@@ -28,7 +27,7 @@ keywords = neuroimaging
package_dir =
= src
packages = find:
python_requires = ~=3.5
python_requires = ~=3.6
install_requires =
nibabel >= 2
numpy >= 1.11.0
49 changes: 46 additions & 3 deletions src/neuroglancer_scripts/accessor.py
@@ -10,6 +10,7 @@
"""

import urllib.parse
import json

__all__ = [
"get_accessor_for_url",
@@ -35,15 +36,57 @@
r = urllib.parse.urlsplit(url)
if r.scheme in ("", "file"):
from neuroglancer_scripts import file_accessor
from neuroglancer_scripts import sharded_base
flat = accessor_options.get("flat", False)
gzip = accessor_options.get("gzip", True)
compresslevel = accessor_options.get("compresslevel", 9)
pathname = _convert_split_file_url_to_pathname(r)
return file_accessor.FileAccessor(pathname, flat=flat, gzip=gzip,
compresslevel=compresslevel)

accessor = file_accessor.FileAccessor(pathname, flat=flat, gzip=gzip,
compresslevel=compresslevel)
is_sharding = False
if accessor_options.get("sharding"):
is_sharding = True
if not is_sharding:
try:
info = json.loads(accessor.fetch_file("info"))
if sharded_base.ShardedAccessorBase.info_is_sharded(info):
is_sharding = True
except (DataAccessError, json.JSONDecodeError):
# info may be missing or malformed; fall back to the
# default (unsharded) behavior
...

if is_sharding:
from neuroglancer_scripts import sharded_file_accessor
return sharded_file_accessor.ShardedFileAccessor(pathname)

return accessor

elif r.scheme in ("http", "https"):
from neuroglancer_scripts import http_accessor
return http_accessor.HttpAccessor(url)
from neuroglancer_scripts import sharded_base
accessor = http_accessor.HttpAccessor(url)

is_sharding = False
if "sharding" in accessor_options:
is_sharding = True

if not is_sharding:
try:
info = json.loads(accessor.fetch_file("info"))
if sharded_base.ShardedAccessorBase.info_is_sharded(info):
is_sharding = True
except (DataAccessError, json.JSONDecodeError):
# info may be missing or malformed; fall back to the
# default (unsharded) behavior
...

if is_sharding:
from neuroglancer_scripts import sharded_http_accessor
return sharded_http_accessor.ShardedHttpAccessor(url)
return accessor
else:
raise URLError("Unsupported URL scheme {0} (must be file, http, or "
"https)".format(r.scheme))
4 changes: 4 additions & 0 deletions src/neuroglancer_scripts/dyadic_pyramid.py
@@ -148,10 +148,14 @@ def downscale_info(scale_level):


def compute_dyadic_scales(precomputed_io, downscaler):
from neuroglancer_scripts import sharded_file_accessor
for i in range(len(precomputed_io.info["scales"]) - 1):
compute_dyadic_downscaling(
precomputed_io.info, i, downscaler, precomputed_io, precomputed_io
)
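# ShardedFileAccessor buffers shard data; close() flushes any
# pending shards to disk after each scale is computed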
if isinstance(precomputed_io.accessor,
sharded_file_accessor.ShardedFileAccessor):
precomputed_io.accessor.close()


def compute_dyadic_downscaling(info, source_scale_index, downscaler,
18 changes: 15 additions & 3 deletions src/neuroglancer_scripts/scripts/scale_stats.py
@@ -23,15 +23,27 @@
for scale in info["scales"]:
scale_name = scale["key"]
size = scale["size"]

shard_info = "Unsharded"
shard_spec = scale.get("sharding")
sharding_num_directories = None
if shard_spec:
shard_bits = shard_spec.get("shard_bits")
shard_info = f"Sharded: {shard_bits}bits"
sharding_num_directories = 2 ** shard_bits + 1


for chunk_size in scale["chunk_sizes"]:
size_in_chunks = [(s - 1) // cs + 1 for s,
cs in zip(size, chunk_size)]
num_chunks = np.prod(size_in_chunks)
num_directories = size_in_chunks[0] * (1 + size_in_chunks[1])
num_directories = (
sharding_num_directories
if sharding_num_directories is not None
else size_in_chunks[0] * (1 + size_in_chunks[1]))
size_bytes = np.prod(size) * dtype.itemsize * num_channels
print("Scale {}, chunk size {}:"
print("Scale {}, {}, chunk size {}:"
" {:,d} chunks, {:,d} directories, raw uncompressed size {}B"
.format(scale_name, chunk_size,
.format(scale_name, shard_info, chunk_size,
num_chunks, num_directories,
readable_count(size_bytes)))
total_size += size_bytes
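
As a worked example of the directory estimates above, take the BigBrain 20um
scale (6572 × 7404 × 5711 voxels in 64³ chunks):

.. code-block:: python

   # Chunk grid: ceil(size / chunk_size) along each axis
   size_in_chunks = [103, 116, 90]

   # Unsharded layout: one directory per (x, y) column of chunks
   unsharded_dirs = size_in_chunks[0] * (1 + size_in_chunks[1])  # 12,051

   # Sharded layout with shard_bits=2: 2**shard_bits shard files, plus one
   sharded_dirs = 2 ** 2 + 1  # 5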
5 changes: 5 additions & 0 deletions src/neuroglancer_scripts/scripts/volume_to_precomputed.py
@@ -57,6 +57,11 @@ def parse_command_line(argv):
help="input value that will be mapped to the maximum "
"output value")

group.add_argument("--sharding", type=str, default=None,
help="enable sharding. Value must be int,int,int, "
"representing minishard encoding bits, shard encoding"
"bits and preshift bits respectively.")

neuroglancer_scripts.accessor.add_argparse_options(parser)
neuroglancer_scripts.chunk_encoding.add_argparse_options(parser,
allow_lossy=False)