Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversion factors from um to nanometre added in load_gwy in io.py #659

Closed
wants to merge 11 commits into from
Closed

Conversation

iobataya
Copy link

@iobataya iobataya commented Sep 9, 2023

Some of .gwy files raise the following ValueError in io.py to fail to read.

ValueError: Units 'um' have not been added for .gwy files
Please add an SI to nanometre conversion factor for these units in _gwy_read_component in io.py

Conversion factors for 'mm' and 'um' were added. (My files contain only 'um' unit, but 'mm' was also added just in case.)

Some of .gwy files raise ValueError in io.py.

ValueError: Units 'um' have not been added for .gwy files
# XY, Z units correction
* units for px_to_nm to multiply data array should be picked up from 'si_unit_z' instetad of 'si_unit_xy'.
* Non-square image should be considered. Probably, px_to_nm_z, px_to_nm_x, px_to_nm_y will be necessary.

# Channel index
Data key of gwy sometimes starts from 1 or larger. (ex. '/4/data') It happens when image is cropped on Gwyddion, and it remains after saving even other channels are removed.

A loop to scan indices was added. And it guesses whether if the channel is topography or not by si_unit_z.
Unit is set during scanning of channel index.
@SylviaWhittle
Copy link
Collaborator

Hi @iobataya , thank you for looking into this issue.
My initial go at writing code to load .gwy files was rushed and I had planned to improve it when I had some free time, but you seem to have done it for me, thank you!

If you need any assistance with getting the tests to work, then let me know :)

@iobataya
Copy link
Author

Hi @SylviaWhittle, thank you for the comment. That was my issue with the Jupyter Notebook. I'm going to work on it within my forked repository for a while. I'd like to see if we can get an np.ndarray from a .gwy file on the Notebook smoothly.

For now, I'm trying,

  1. To implement auto detection of a topography channel when it has an unexpected key index like '/4/data'. This needs a Gwyddion file after its channels were edited/arranged. Saved data has such keys even if it is the first channel.
  2. To handle non-square images. This may require two scalings, x_pixel_to_nm_scaling and y_pixel_to_nm_scaling.

iobataya and others added 2 commits September 12, 2023 06:43
# transposed np.ndarray corrected
_gwy_read_component returns transposed dimensions of (xres,yres), this corrected to (yres, xres)

# load_gwy updated
* load_gwy can find multichannel data starting from any index (ex. 4/data)
* load_gwy can guess whether if it is height channel by unitstr
* load_gwy supports m,mm,um for z-unit
* px_to_nm_x, px_to_nm_y were added for future usage, stored in img_dict
Copy link
Collaborator

@ns-rse ns-rse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @iobataya really appreciate it.

I've made some comments in line that are perhaps helpful.

I had a thought that making the conv_factors into a dictionary that is read from the topostats/default_config.yaml which would have the benefit of making it easy for users to add additional conversion factors if required. This shouldn't block merging though and we can make this a subsequent issue to make it configurable.

Currently though the Continuous Integration that runs the test suite fails because of the reliance on the regex package that I've highlighted and there are two options for dealing with this that I've suggested.

On the topic of tests I was wondering if you are familiar with writing unittests? We use the pytest framework (rather than the standard library unittest) and try and keep our test coverage > 80%.

It would be nice to have some tests covering the use of these different conversion factors if you are familiar with writing tests? If not don't worry we can create a separate issue for this.

topostats/io.py Outdated Show resolved Hide resolved
topostats/io.py Outdated Show resolved Hide resolved
topostats/io.py Outdated

re = r"\/(\d+)\/data$"
components = list(image_data_dict.keys())
conv_factors = {"m": 1e9, "um": 1e6, "mm": 1e3}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could perhaps be defined as a dictionary outside of the function, or perhaps even moved to default_config.yaml which may make it easier for users to add additional conversion factors without having to hack at the code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree that such the dict should not be hard-coded. If default_config.yaml can manage units conversion, TopoStats can support additional units by other SPM-techniques.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Externalized Scaling units to configuration with Scale class

There are hard-coded strings of unit and conversion factors in io.py. I tried to externalize the strings and factors. The values are configurable by default_configuration.yaml. User can also add other units like "V" or "N" in configuration.yaml. Such conversion factors can be used for various signals from SPM in the future. (ex. invOLS or spring constants)

loading:
  channel: Height # Channel to pull data from in the data files.
  scale:
    nm:
      {m: 1e9, mm: 1e6, um: 1e3, nm: 1.0} # factors to convert to nm

The configuration is read by a created small class "Scale". The class has conversion factors accessible by two str keys, unit_to and unit_from.

self.scale = Scale(config["loading"]["scale"])
unit_from = "um"
factor = self.scale.get_factor("nm", unit_from)

This scale object can hold scaling factors of image as well as that of physical units. So it's useful to support non-square images (ex. 512 x 256). In load_gwy, the scale object holds "px_to_nm_x" and "px_to_nm_y" in the scale object, and the object is added as img_dict["scale"]. Users can access the scale object after loading. (see also add_to_dict)

# example (simplified from actual code)
m2nm = self.scale.get_factor("nm", "m")
self.scale.add_factor("nm", "px_to_nm", xreal * m2nm / image.shape[1])
self.scale.add_factor("nm", "px_to_nm_x", xreal * m2nm / image.shape[1])
self.scale.add_factor("nm", "px_to_nm_y", yreal * m2nm / image.shape[0])

Currently, load function returns a tuple of 'image' and 'pixel_to_nm'. When other functions (load_jpk, load_spm,..) implement this img_dict["scale"], load function can just a single variable, 'image' and the scale object hold scaling factors for non-square images and more.

topostats/io.py Outdated Show resolved Hide resolved
iobataya and others added 5 commits September 15, 2023 18:58
io.py modified
 - Conversion from gwy.DataField to np.ndarray was corrected.
   (dimension was transposed)
 - Scale class added to conversion of units and holds conversion factors
   - Initializable by configuration YAML file
   - This object is copied to img_dict of data
      accessible scan.img_dict[filename]["scale"]
      px_to_nm_x, px_to_nm_y values in img_dict from Gwy

Testing and Configurations
 - Resource files (file_landscape.gwy, file.square.gy) added.
 - conftest.py modified to read file.gwy and file_landscape.gwy
 - Test for Scale class
    - test_scale_config was added.
    - test_scale in test_io.py was added.
- "scale" configuration was added to default_config.yaml
@iobataya
Copy link
Author

On the topic of tests I was wondering if you are familiar with writing unittests? We use the pytest framework (rather than the standard library unittest) and try and keep our test coverage > 80%.

It would be nice to have some tests covering the use of these different conversion factors if you are familiar with writing tests? If not don't worry we can create a separate issue for this.

I added codes with tests for pytest. I slightly changed conftest.py. And I added synthetic gwy files in resources for tests. In my environment, pytest of test_io.py passes.

However, I have a few things unclear for testing.
#1. I'm not sure how to estimate whole coverage containing the modified io.py. I tried some, I got errors from test_validation.py.
#2. I couldn't access a configuration dict of topostats in the test framework. I wrote as follows in order to pass pytest, but yaml should be read multiple times. Probably, good fixture definition can solve, but I couldn't find the way.

default_config = pkg_resources.open_text(__package__, "default_config.yaml").read()
config = yaml.safe_load(default_config)
self.scale = Scale(config["loading"]["scale"])

I'd appreciate your advice for those.

@ns-rse
Copy link
Collaborator

ns-rse commented Sep 18, 2023

Hi @iobataya

Thanks for taking on the task of addressing the feedback really appreciate it.

To address your questions....

Coverage

Coverage if reported after running tests as we have the pytest-cov included in the dev dependencies and so if you were to pip install -e /[dev] it would install the plugin and coverage would be reported. What I should really do is add codecov to the CI/CD pipeline so its reported in PRs too (I've created #668 and will address that).

I'll look at why tests are currently failing on this branch later this week.

Configuration

You've correctly clocked that configurations are loaded by pytest from tests/conftest.py, the pytest framework looks for this file and loads fixtures and other objects automatically.

In conftest.py the topostats/default_config.yaml gets loaded once. We then create a series of fixtures, one for each major section of the configuration. The loading section gets subset out into its own dictionary on line 102-106 and is then available to use as a fixture called loading_config in tests.

This means you don't need to repeat the snippet of code that loads the topostats/default_config.yaml and can instead, as you've commented, use the loading_config fixture.

I'll leave more comments in line.

if match == None:
# How can we get current configuration["loading"] ?
# default_config.yaml is used to get conf["loading"]["scale"]
default_config = pkg_resources.open_text(__package__, "default_config.yaml").read()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than loading the config again we have structured classes to take arguments and use the same argument name in the YAML configuration and the parameters are passed in using Pythons **kwargs functionality as the names match.

Thus to get the loading dictionary we need to add an argument to the __init__ method of the LoadScans and on line 123 of topostats/run_topostats.py this is loaded automatically and instead the line below can be modified to extract the scale.

# default_config.yaml is used to get conf["loading"]["scale"]
default_config = pkg_resources.open_text(__package__, "default_config.yaml").read()
config = yaml.safe_load(default_config)
self.scale = Scale(config["loading"]["scale"])
Copy link
Collaborator

@ns-rse ns-rse Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the above suggestion we can move this to the __init__ method (as PyLint will complain about self. being defined outside of __init__.

The values will be in the argument scale so something like...

    def __init__(
        self,
        img_paths: list,
        channel: str,
        scale: dict,
    ):
    ...
    self.scale = scale

...would suffice and the value will cascade through.

Would need to add details of the additional parameter to the docstring too.

@@ -449,6 +452,49 @@ def convert_basename_to_relative_paths(df: pd.DataFrame):
return df


class Scale:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about this and am wondering if making this a class is perhaps over-complicating things?

Scaling is a simple multiplication really, its just a case of selecting the correct scaling factor based on the units used in the scans and what is desired.

We can have sensible defaults defined in the configuration which also allows users to extend the possible scalings and could also be setup to allow the selection of the desired scaling using a simple function to select the correct options from the dictionary.

I realise this is a fairly poorly formed idea and lacks code but what do you think @iobataya ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ns-rse ,
Since the loading dict is available in LoadScans by receiving it in its constructor now, the config dict will cover the factors without the Scale class. I agree that Scale class is too much and not necessary.

According to that direction, I'm planning to make change next as follows.

  • to make LoadScans receives loading_config dict.
  • to make load_gwy get the factors as dict from the configuration YAML
  • to eliminate Scale class.
  • to handle xres/yres for gwy correctly (square, landscape)
  • to copy the scaling factors (units, px_in_nm, px_in_nm_x, px_in_nm_y) in img_dict of image in add_dict method

I'll apply codes and test codes in my forked repository to check at first in this month. I'll try run tests with cov module of pytest.

I'd appreciate your ideas and advice.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @iobataya

This all sounds eminently reasonable and bulks out my above comments with more solid ideas, thank you.

I appreciate this is somewhat divergent to your original PR and am grateful of you taking the time to work through these things.

A couple of things that might be useful.

Merge conflicts

As we've had a couple of merges recently there are now merge conflicts, I'll resolve these on GitHub but you will need to update your local branch with git pull to pull those changes down.

Development tools

You may want to install some of the development tools we have listed as optional dependencies and make sure pre-commit is installed and running.

Dev tools can be installed if you do the following in your virtual environment (assuming you are using one, if not this might be a useful resource Conda environments for effective and reproducible research)...

cd ~/path/to/TopoStats
pip install -e .[dev]

This will pull in pytest-cov and tell you what the coverage change is although don't worry too much we have GitHub working with CodeCov to report these things.

However, we are increasingly trying to adhere to PEP8 code style and NumpyDocstyle for docstrings and have pre-commit hooks in place which when enabled will check changes are compliant prior to allowing permits (we also use pre-commit.ci to check pull requests but its a faster feedback loop to enable pre-commit locally).

Once you've installed the dev requirements you can in your VirtualEnv install the pre-commit hooks with pre-commit install and then when you make commits the checks will run and tell you if there are any problems (some will be automatically corrected if possible). I've a blog post I've written on pre-commit that you may find useful pre-commit: Protecting your future self if you want to find out more.

Please do let me know if you've any problems, more than happy to help and could even have a chat over video if you think that would be useful.

@ns-rse

Copy link
Author

@iobataya iobataya Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @ns-rse,

I appreciate your info about git pull and pip install -e .. Especially, pip install -e . saved tons of time. And pytest-cov works fine to show coverage of codes. The test_io.py covers 84%.

I'm happy that I've reached the topostats in my branch can load gwy of any sizes of image (square and non-square) with arbitary pixel/lines aspect ratio. Now this makes me to proceed my project. I'll continue to improve the codes, and format codes to the expected style.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant, that is great to hear @iobataya 👍

You'll notice below that there are a couple of merge conflicts that have arisen. This arises when changes have been made to files after you forked the repository and Git doesn't know which should be kept.

They need manually resolving, probably the easiest is to do that here on GitHub using the links/buttons below. You then choose which bit of code should be retained and a commit will be made to your branch here on GitHub which you then git pull down to your local copy.

Let me know if you've any problems or would like help with this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ns-rse Sorry for my silence for a while. I prepared for domestic conference commencing tomorrow. My program using TopoStats is public now (https://github.com/iobataya/line_detection). Once the conference and other events are over and I take a break, I will start organizing the codes here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @iobataya I hope the conference went well.

@iobataya iobataya closed this by deleting the head repository Dec 25, 2023
iobataya added a commit to iobataya/TopoStats that referenced this pull request Dec 26, 2023
This is discussed before in another pull request AFM-SPM#659. Tested in a new branch in a forked Topostats because of some conflicts with current Topostats.

# Issues found in loading Gwyddion file.
 - only /0/, /1/ channels were read.
 - X,Y are flipped in Numpy.ndarray
  - more SI units were necessary (depends on files)
 - No support for non-square image

# Fixed by this modifications
 - load_gwy can recognize channel name and unit for z value to load
 - Channel name/channel unit can be configurable in YAML
 - Flipped XY was fixed.
 - Non-square image can be loaded with its scaling params.
 - Loading conf and image scaling parameters are stored in .img_dict["loading_conf"]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants