Skip to content

Volume contrast limits

Sean Martin edited this page Oct 17, 2024 · 2 revisions

Volume contrast limits

Overview

See precompute/contrast_limits.py for the main code. The overall process is as follows:

  1. Load the data from the zarr file.
  2. Optionally clip a small portion of the data along the z-axis around the central slice.
    1. By default, this is used for the gradient of the cumulative distribution function (CDF) method.
    2. But for the Gaussian mixture model (GMM) method, the full volume is used.
  3. Randomly sample a number of voxels from the clipped data. This is useful to speed up the calculation of the contrast limits, and also to alleviate the impact of very high intensity outliers.
  4. Calculate the contrast limits using this optionally clipped and sampled data as input.

Public API

The public API is available in the contrast_limits.py file. The main function is compute_contrast_limits, which takes the following parameters:

def compute_contrast_limits(
    data: da.Array | np.ndarray,
    method: Literal["gmm", "cdf"] = "gmm",
    z_radius: int | None | Literal["auto", "computed"] = "auto",
    downsampling_ratio: float | None = 0.3,
) -> tuple[float, float]:
    ...
  • data: The input data, which can be a Dask array or a NumPy array.
  • method: The method to use for calculating the contrast limits. Can be either "gmm" or "cdf".
  • z_radius: The radius to clip the data along the z-axis from the centre. Currently set to "auto" which allows more slices to be used for the GMM than the CDF method.
  • downsampling_ratio: The ratio to downsample the data by. This is used to speed up the calculation of the contrast limits. The default is 0.3, which means that 30% of the data is randomly sampled. The sampling is performed after the optional clipping along the z-axis.

Example

Download a tomogram from the API, calculate contrast limits and generate a Neuroglancer state file.

import argparse
import json
from pathlib import Path

from cryoet_data_portal import Client, Tomogram
from neuroglancer import viewer_state
from neuroglancer.url_state import to_url
from scipy.spatial.transform import Rotation

from cryoet_data_portal_neuroglancer.io import load_omezarr_data
from cryoet_data_portal_neuroglancer.precompute.contrast_limits import (
    compute_contrast_limits,
)
from cryoet_data_portal_neuroglancer.state_generator import combine_json_layers, generate_image_layer

# Set up logging - level is info
# logging.basicConfig(level=logging.INFO, force=True)

OUTPUT_FOLDER = "/media/starfish/LargeSSD/data/cryoET/data/FromAPI"

id_to_path_map = {
    773: "773/Position_513.zarr",
}

id_to_source_map = {
    773: (
        "https://files.cryoetdataportal.cziscience.com/10004/Position_513/Tomograms/VoxelSpacing7.560/CanonicalTomogram/Position_513.zarr",
        (756e-10, 756e-10, 756e-10),
    ),
}


def grab_tomogram(id_: int, zarr_path: Path):
    client = Client()
    if not zarr_path.exists():
        zarr_path.mkdir(parents=True, exist_ok=True)
        tomogram = Tomogram.get_by_id(client, id_)
        tomogram.download_omezarr(str(zarr_path.parent.resolve()))


def run_contrast_limit_calculations_from_api(input_data_path):
    data = load_omezarr_data(input_data_path, resolution_level=-1, persist=False)
    data_shape = data.shape
    data_size_dict = {"z": data_shape[0], "y": data_shape[1], "x": data_shape[2]}

    limits_dict = {}
    # Ensure the main public API is working
    gmm_limits = compute_contrast_limits(
        data,
        method="gmm",
    )
    cdf_limits = compute_contrast_limits(
        data,
        method="cdf",
    )
    limits_dict["gmm"] = gmm_limits
    limits_dict["cdf"] = cdf_limits
    limits_dict["2d"] = gmm_limits
    limits_dict["size"] = data_size_dict
    return limits_dict


def create_state(id_, contrast_limit_dict, output_folder):
    source, scale = id_to_source_map[id_]
    # One state layer for each contrast limit method
    layers_list = []
    ignored_keys = ["size", "closest_method", "distance_to_human", "2d"]

    # set the most promising contrast limits as visible, rest not
    for key, limits in contrast_limit_dict.items():
        if key in ignored_keys:
            continue
        is_visible = key == "gmm"
        layer_info = generate_image_layer(
            source=source,
            scale=scale,
            size=contrast_limit_dict["size"],
            name=f"{id_}_{key}",
            twodee_contrast_limits=limits,
            threedee_contrast_limits=limits,
            has_volume_rendering_shader=True,
            volume_rendering_is_visible=True,
            is_visible=is_visible,
        )
        layer_info["_projectionScale"] = 2000
        layers_list.append(layer_info)
    json_state = combine_json_layers(
        layers_list,
        scale,
        projection_quaternion=Rotation.from_euler(seq="xyz", angles=(0, 0, 0), degrees=True).as_quat(),
        show_axis_lines=False,
    )
    with open(output_folder / f"{id_}_state.json", "w") as f:
        json.dump(json_state, f, indent=4)
    return json_state


def main(output_folder):
    url_list = []
    for id_, path in id_to_path_map.items():
        path = Path(output_folder) / path
        grab_tomogram(id_, path)
        limits = run_contrast_limit_calculations_from_api(path)
        state = create_state(id_, limits, Path(output_folder) / "results")
        viewer_state_obj = viewer_state.ViewerState(state)
        url_from_json = to_url(
            viewer_state_obj,
            prefix="https://neuroglancer-demo.appspot.com/",
        )
        url_list.append(url_from_json)
    print(f"Wrote {len(url_list)} urls to {Path(output_folder) / 'urls.txt'}")
    with open(Path(output_folder) / "urls.txt", "w") as f:
        f.write("\n\n\n".join(url_list))


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_folder", type=str, default=OUTPUT_FOLDER)
    args, _ = parser.parse_known_args()
    main(args.output_folder)
See JSON state for this setup
{
  "dimensions": {
    "x": [
      7.56e-8,
      "m"
    ],
    "y": [
      7.56e-8,
      "m"
    ],
    "z": [
      7.56e-8,
      "m"
    ]
  },
  "position": [
    128,
    128,
    56
  ],
  "crossSectionScale": 1,
  "projectionOrientation": [
    0.3826834261417389,
    0,
    0,
    0.9238795042037964
  ],
  "projectionScale": 2000,
  "layers": [
    {
      "type": "image",
      "source": {
        "url": "zarr://https://files.cryoetdataportal.cziscience.com/10004/Position_513/Tomograms/VoxelSpacing7.560/CanonicalTomogram/Position_513.zarr",
        "transform": {
          "outputDimensions": {
            "z": [
              7.56e-8,
              "m"
            ],
            "y": [
              7.56e-8,
              "m"
            ],
            "x": [
              7.56e-8,
              "m"
            ]
          },
          "inputDimensions": {
            "z": [
              7.56e-8,
              "m"
            ],
            "y": [
              7.56e-8,
              "m"
            ],
            "x": [
              7.56e-8,
              "m"
            ]
          }
        }
      },
      "tab": "rendering",
      "opacity": 0.51,
      "shader": "#uicontrol invlerp contrast\n#uicontrol bool invert_contrast checkbox\n#uicontrol invlerp contrast_3D(clamp=false)\n#uicontrol bool invert_contrast_3D checkbox\n#uicontrol bool hide_values_outside_limits_3D checkbox\n\nfloat get_contrast() {\n  float value = invert_contrast ? 1.0 - contrast() : contrast();\n  return value;\n}\nfloat get_contrast_3D() {\n  float value = invert_contrast_3D ? 1.0 - contrast_3D() : contrast_3D();\n  value = (hide_values_outside_limits_3D && value > 1.0) ? 0.0 : clamp(value, 0.0, 1.0);\n  return value;\n}\n\nvoid main() {\n  float outputValue;\n  if (VOLUME_RENDERING) {\n    outputValue = get_contrast_3D();\n    emitIntensity(outputValue);\n  } else {\n    outputValue = get_contrast();\n  }\n  emitGrayscale(outputValue);\n}",
      "shaderControls": {
        "contrast": {
          "range": [
            -21046.951172091438,
            22181.530212800113
          ],
          "window": [
            -29692.64744906975,
            30827.226489778426
          ]
        },
        "contrast_3D": {
          "range": [
            -21046.951172091438,
            22181.530212800113
          ],
          "window": [
            -107503.91394187455,
            108638.49298258321
          ]
        },
        "invert_contrast_3D": true
      },
      "volumeRendering": "on",
      "volumeRenderingGain": -7.8,
      "volumeRenderingDepthSamples": 128,
      "name": "773_gmm"
    },
    {
      "type": "image",
      "source": {
        "url": "zarr://https://files.cryoetdataportal.cziscience.com/10004/Position_513/Tomograms/VoxelSpacing7.560/CanonicalTomogram/Position_513.zarr",
        "transform": {
          "outputDimensions": {
            "z": [
              7.56e-8,
              "m"
            ],
            "y": [
              7.56e-8,
              "m"
            ],
            "x": [
              7.56e-8,
              "m"
            ]
          },
          "inputDimensions": {
            "z": [
              7.56e-8,
              "m"
            ],
            "y": [
              7.56e-8,
              "m"
            ],
            "x": [
              7.56e-8,
              "m"
            ]
          }
        }
      },
      "tab": "rendering",
      "opacity": 0.51,
      "shader": "#uicontrol invlerp contrast\n#uicontrol bool invert_contrast checkbox\n#uicontrol invlerp contrast_3D(clamp=false)\n#uicontrol bool invert_contrast_3D checkbox\n#uicontrol bool hide_values_outside_limits_3D checkbox\n\nfloat get_contrast() {\n  float value = invert_contrast ? 1.0 - contrast() : contrast();\n  return value;\n}\nfloat get_contrast_3D() {\n  float value = invert_contrast_3D ? 1.0 - contrast_3D() : contrast_3D();\n  value = (hide_values_outside_limits_3D && value > 1.0) ? 0.0 : clamp(value, 0.0, 1.0);\n  return value;\n}\n\nvoid main() {\n  float outputValue;\n  if (VOLUME_RENDERING) {\n    outputValue = get_contrast_3D();\n    emitIntensity(outputValue);\n  } else {\n    outputValue = get_contrast();\n  }\n  emitGrayscale(outputValue);\n}",
      "shaderControls": {
        "contrast": {
          "range": [
            -31457.933670043945,
            63952.12646484375
          ],
          "window": [
            -50539.94569702148,
            83034.13849182129
          ]
        },
        "contrast_3D": {
          "range": [
            -31457.933670043945,
            63952.12646484375
          ],
          "window": [
            -222278.05393981934,
            254772.24673461914
          ]
        },
        "invert_contrast_3D": true
      },
      "volumeRendering": "on",
      "volumeRenderingGain": -7.8,
      "volumeRenderingDepthSamples": 128,
      "name": "773_cdf",
      "visible": false
    }
  ],
  "showAxisLines": false,
  "showSlices": false,
  "selectedLayer": {
    "visible": true,
    "layer": "773_gmm"
  },
  "crossSectionBackgroundColor": "#000000",
  "layout": "4panel"
}

Contrast limit methods

Two methods are available for calculating the contrast limits:

Gaussian mixture model (GMM)

The GMM method helps in determining contrast limits by fitting the data to a Gaussian Mixture Model (GMM) with varying numbers of components, then selecting the optimal component amount based on the Bayesian Information Criterion (BIC).

The overall idea is that in the ideal case, we have three groups of voxels in the data: background, signal, and noise. The GMM method helps in identifying the signal group and setting the contrast limits around it.

However, this may not always fit well, and as such BIC is used to determine the best fit among the models with 1, 2, or 3 components. The model with the lowest BIC is selected.

Steps

  1. Fitting the GMM:
    1. Perform GMM fitting on the data with 1, 2, or 3 components.
    2. Use the BIC score to determine the best fit among the models and select the one with the lowest BIC.
  2. Defining the Contrast Limits:
    1. Identify the Gaussian component nearest to the data mean, referred to as G.
    2. Move outwards from the mean of G on both sides to define the contrast limits, using the standard deviation of G.
  3. Tuning for Multiple Components:
    1. If the model has only one Gaussian, use a smaller range around the mean for contrast limits, as the data may lack discernible patterns.
    2. For models with more components, you can expand the range because the signal is better isolated, allowing you to include more of the data.
    3. The exact tuning values are ideally obtained through further experimentation. This has been performed already, though limited labeled examples restrict tuning precision. See the optimize methods in the contrast_limits.py file for ideas.

Cumulative distribution function (CDF) gradient

This method combines two approaches to calculate contrast limits based on the cumulative distribution function (CDF) and its gradient.

The overall idea here is that the centre of the contrast limits is set at the peak of the gradient of the CDF. The gradient is used to determine the start and end regions of the data, and the contrast limits are set based on the gradient values in these regions. This involves estimating the gradient thresholds based on the mean gradient values in the start and end regions. And then looking for the first points where the gradient exceeds the starting threshold and falls below the ending threshold after reaching the peak.

Steps

  1. Compute CDF and Gradient:
    1. Calculate the CDF of the data and compute its gradient.
    2. Identify the first Nf number of samples (start region) and the last Ne number of samples (end region) where the gradient rises/falls above or below 30% of the peak value.
  2. Gradient Threshold Calculation:
    1. Compute the mean of the gradient over the identified start and end regions (Nf and Ne).
    2. Set the starting and ending gradient thresholds based on these mean values.
  3. Determine Contrast Limits, the contrast limits are defined as:
    1. The first point where the gradient exceeds the starting threshold.
    2. The first point where the gradient falls below the ending threshold after reaching the gradient peak.
  4. The 30% Peak Threshold: Hyper-parameter tuning suggests 30% as a good threshold for the gradient peak. In the future, this value could potentially be automatically determined based on the data distribution.