Conversion factors from um to nanometre added in load_gwy in io.py #659

iobataya · 2023-09-09T08:03:07Z

Some of .gwy files raise the following ValueError in io.py to fail to read.

ValueError: Units 'um' have not been added for .gwy files
Please add an SI to nanometre conversion factor for these units in _gwy_read_component in io.py

Conversion factors for 'mm' and 'um' were added. (My files contain only 'um' unit, but 'mm' was also added just in case.)

Some of .gwy files raise ValueError in io.py. ValueError: Units 'um' have not been added for .gwy files

# XY, Z units correction * units for px_to_nm to multiply data array should be picked up from 'si_unit_z' instetad of 'si_unit_xy'. * Non-square image should be considered. Probably, px_to_nm_z, px_to_nm_x, px_to_nm_y will be necessary. # Channel index Data key of gwy sometimes starts from 1 or larger. (ex. '/4/data') It happens when image is cropped on Gwyddion, and it remains after saving even other channels are removed. A loop to scan indices was added. And it guesses whether if the channel is topography or not by si_unit_z.

Unit is set during scanning of channel index.

SylviaWhittle · 2023-09-11T00:25:27Z

Hi @iobataya , thank you for looking into this issue.
My initial go at writing code to load .gwy files was rushed and I had planned to improve it when I had some free time, but you seem to have done it for me, thank you!

If you need any assistance with getting the tests to work, then let me know :)

iobataya · 2023-09-11T03:00:11Z

Hi @SylviaWhittle, thank you for the comment. That was my issue with the Jupyter Notebook. I'm going to work on it within my forked repository for a while. I'd like to see if we can get an np.ndarray from a .gwy file on the Notebook smoothly.

For now, I'm trying,

To implement auto detection of a topography channel when it has an unexpected key index like '/4/data'. This needs a Gwyddion file after its channels were edited/arranged. Saved data has such keys even if it is the first channel.
To handle non-square images. This may require two scalings, x_pixel_to_nm_scaling and y_pixel_to_nm_scaling.

# transposed np.ndarray corrected _gwy_read_component returns transposed dimensions of (xres,yres), this corrected to (yres, xres) # load_gwy updated * load_gwy can find multichannel data starting from any index (ex. 4/data) * load_gwy can guess whether if it is height channel by unitstr * load_gwy supports m,mm,um for z-unit * px_to_nm_x, px_to_nm_y were added for future usage, stored in img_dict

ns-rse

Thanks for the PR @iobataya really appreciate it.

I've made some comments in line that are perhaps helpful.

I had a thought that making the conv_factors into a dictionary that is read from the topostats/default_config.yaml which would have the benefit of making it easy for users to add additional conversion factors if required. This shouldn't block merging though and we can make this a subsequent issue to make it configurable.

Currently though the Continuous Integration that runs the test suite fails because of the reliance on the regex package that I've highlighted and there are two options for dealing with this that I've suggested.

On the topic of tests I was wondering if you are familiar with writing unittests? We use the pytest framework (rather than the standard library unittest) and try and keep our test coverage > 80%.

It would be nice to have some tests covering the use of these different conversion factors if you are familiar with writing tests? If not don't worry we can create a separate issue for this.

topostats/io.py

ns-rse · 2023-09-12T13:11:46Z

topostats/io.py

+
+            re = r"\/(\d+)\/data$"
+            components = list(image_data_dict.keys())
+            conv_factors = {"m": 1e9, "um": 1e6, "mm": 1e3}


This could perhaps be defined as a dictionary outside of the function, or perhaps even moved to default_config.yaml which may make it easier for users to add additional conversion factors without having to hack at the code.

I totally agree that such the dict should not be hard-coded. If default_config.yaml can manage units conversion, TopoStats can support additional units by other SPM-techniques.

Externalized Scaling units to configuration with Scale class

There are hard-coded strings of unit and conversion factors in io.py. I tried to externalize the strings and factors. The values are configurable by default_configuration.yaml. User can also add other units like "V" or "N" in configuration.yaml. Such conversion factors can be used for various signals from SPM in the future. (ex. invOLS or spring constants)

loading: channel: Height # Channel to pull data from in the data files. scale: nm: {m: 1e9, mm: 1e6, um: 1e3, nm: 1.0} # factors to convert to nm

The configuration is read by a created small class "Scale". The class has conversion factors accessible by two str keys, unit_to and unit_from.

self.scale = Scale(config["loading"]["scale"]) unit_from = "um" factor = self.scale.get_factor("nm", unit_from)

This scale object can hold scaling factors of image as well as that of physical units. So it's useful to support non-square images (ex. 512 x 256). In load_gwy, the scale object holds "px_to_nm_x" and "px_to_nm_y" in the scale object, and the object is added as img_dict["scale"]. Users can access the scale object after loading. (see also add_to_dict)

# example (simplified from actual code) m2nm = self.scale.get_factor("nm", "m") self.scale.add_factor("nm", "px_to_nm", xreal * m2nm / image.shape[1]) self.scale.add_factor("nm", "px_to_nm_x", xreal * m2nm / image.shape[1]) self.scale.add_factor("nm", "px_to_nm_y", yreal * m2nm / image.shape[0])

Currently, load function returns a tuple of 'image' and 'pixel_to_nm'. When other functions (load_jpk, load_spm,..) implement this img_dict["scale"], load function can just a single variable, 'image' and the scale object hold scaling factors for non-square images and more.

topostats/io.py

io.py modified - Conversion from gwy.DataField to np.ndarray was corrected. (dimension was transposed) - Scale class added to conversion of units and holds conversion factors - Initializable by configuration YAML file - This object is copied to img_dict of data accessible scan.img_dict[filename]["scale"] px_to_nm_x, px_to_nm_y values in img_dict from Gwy Testing and Configurations - Resource files (file_landscape.gwy, file.square.gy) added. - conftest.py modified to read file.gwy and file_landscape.gwy - Test for Scale class - test_scale_config was added. - test_scale in test_io.py was added. - "scale" configuration was added to default_config.yaml

iobataya · 2023-09-17T13:52:05Z

On the topic of tests I was wondering if you are familiar with writing unittests? We use the pytest framework (rather than the standard library unittest) and try and keep our test coverage > 80%.

It would be nice to have some tests covering the use of these different conversion factors if you are familiar with writing tests? If not don't worry we can create a separate issue for this.

I added codes with tests for pytest. I slightly changed conftest.py. And I added synthetic gwy files in resources for tests. In my environment, pytest of test_io.py passes.

However, I have a few things unclear for testing.
#1. I'm not sure how to estimate whole coverage containing the modified io.py. I tried some, I got errors from test_validation.py.
#2. I couldn't access a configuration dict of topostats in the test framework. I wrote as follows in order to pass pytest, but yaml should be read multiple times. Probably, good fixture definition can solve, but I couldn't find the way.

default_config = pkg_resources.open_text(__package__, "default_config.yaml").read()
config = yaml.safe_load(default_config)
self.scale = Scale(config["loading"]["scale"])

I'd appreciate your advice for those.

ns-rse · 2023-09-18T18:33:41Z

Hi @iobataya

Thanks for taking on the task of addressing the feedback really appreciate it.

To address your questions....

Coverage

Coverage if reported after running tests as we have the pytest-cov included in the dev dependencies and so if you were to pip install -e /[dev] it would install the plugin and coverage would be reported. What I should really do is add codecov to the CI/CD pipeline so its reported in PRs too (I've created #668 and will address that).

I'll look at why tests are currently failing on this branch later this week.

Configuration

You've correctly clocked that configurations are loaded by pytest from tests/conftest.py, the pytest framework looks for this file and loads fixtures and other objects automatically.

In conftest.py the topostats/default_config.yaml gets loaded once. We then create a series of fixtures, one for each major section of the configuration. The loading section gets subset out into its own dictionary on line 102-106 and is then available to use as a fixture called loading_config in tests.

This means you don't need to repeat the snippet of code that loads the topostats/default_config.yaml and can instead, as you've commented, use the loading_config fixture.

I'll leave more comments in line.

ns-rse · 2023-09-18T18:39:37Z

topostats/io.py

-                if match == None:
+            # How can we get current configuration["loading"] ?
+            # default_config.yaml is used to get conf["loading"]["scale"]
+            default_config = pkg_resources.open_text(__package__, "default_config.yaml").read()


Rather than loading the config again we have structured classes to take arguments and use the same argument name in the YAML configuration and the parameters are passed in using Pythons **kwargs functionality as the names match.

Thus to get the loading dictionary we need to add an argument to the __init__ method of the LoadScans and on line 123 of topostats/run_topostats.py this is loaded automatically and instead the line below can be modified to extract the scale.

ns-rse · 2023-09-18T18:40:38Z

topostats/io.py

+            # default_config.yaml is used to get conf["loading"]["scale"]
+            default_config = pkg_resources.open_text(__package__, "default_config.yaml").read()
+            config = yaml.safe_load(default_config)
+            self.scale = Scale(config["loading"]["scale"])


With the above suggestion we can move this to the __init__ method (as PyLint will complain about self. being defined outside of __init__.

The values will be in the argument scale so something like...

def __init__( self, img_paths: list, channel: str, scale: dict, ): ... self.scale = scale

...would suffice and the value will cascade through.

Would need to add details of the additional parameter to the docstring too.

ns-rse · 2023-09-19T15:54:24Z

topostats/io.py

@@ -449,6 +452,49 @@ def convert_basename_to_relative_paths(df: pd.DataFrame):
    return df


+class Scale:


I've been thinking about this and am wondering if making this a class is perhaps over-complicating things?

Scaling is a simple multiplication really, its just a case of selecting the correct scaling factor based on the units used in the scans and what is desired.

We can have sensible defaults defined in the configuration which also allows users to extend the possible scalings and could also be setup to allow the selection of the desired scaling using a simple function to select the correct options from the dictionary.

I realise this is a fairly poorly formed idea and lacks code but what do you think @iobataya ?

Hi @ns-rse ,
Since the loading dict is available in LoadScans by receiving it in its constructor now, the config dict will cover the factors without the Scale class. I agree that Scale class is too much and not necessary.

According to that direction, I'm planning to make change next as follows.

to make LoadScans receives loading_config dict.

to make load_gwy get the factors as dict from the configuration YAML

to eliminate Scale class.

to handle xres/yres for gwy correctly (square, landscape)

to copy the scaling factors (units, px_in_nm, px_in_nm_x, px_in_nm_y) in img_dict of image in add_dict method

I'll apply codes and test codes in my forked repository to check at first in this month. I'll try run tests with cov module of pytest.

I'd appreciate your ideas and advice.

Hi @iobataya

This all sounds eminently reasonable and bulks out my above comments with more solid ideas, thank you.

I appreciate this is somewhat divergent to your original PR and am grateful of you taking the time to work through these things.

A couple of things that might be useful.

Merge conflicts

As we've had a couple of merges recently there are now merge conflicts, I'll resolve these on GitHub but you will need to update your local branch with git pull to pull those changes down.

Development tools

You may want to install some of the development tools we have listed as optional dependencies and make sure pre-commit is installed and running.

Dev tools can be installed if you do the following in your virtual environment (assuming you are using one, if not this might be a useful resource Conda environments for effective and reproducible research)...

cd ~/path/to/TopoStats pip install -e .[dev]

This will pull in pytest-cov and tell you what the coverage change is although don't worry too much we have GitHub working with CodeCov to report these things.

However, we are increasingly trying to adhere to PEP8 code style and NumpyDocstyle for docstrings and have pre-commit hooks in place which when enabled will check changes are compliant prior to allowing permits (we also use pre-commit.ci to check pull requests but its a faster feedback loop to enable pre-commit locally).

Once you've installed the dev requirements you can in your VirtualEnv install the pre-commit hooks with pre-commit install and then when you make commits the checks will run and tell you if there are any problems (some will be automatically corrected if possible). I've a blog post I've written on pre-commit that you may find useful pre-commit: Protecting your future self if you want to find out more.

Please do let me know if you've any problems, more than happy to help and could even have a chat over video if you think that would be useful.

@ns-rse

Hello @ns-rse,

I appreciate your info about git pull and pip install -e .. Especially, pip install -e . saved tons of time. And pytest-cov works fine to show coverage of codes. The test_io.py covers 84%.

I'm happy that I've reached the topostats in my branch can load gwy of any sizes of image (square and non-square) with arbitary pixel/lines aspect ratio. Now this makes me to proceed my project. I'll continue to improve the codes, and format codes to the expected style.

Brilliant, that is great to hear @iobataya 👍

You'll notice below that there are a couple of merge conflicts that have arisen. This arises when changes have been made to files after you forked the repository and Git doesn't know which should be kept.

They need manually resolving, probably the easiest is to do that here on GitHub using the links/buttons below. You then choose which bit of code should be retained and a commit will be made to your branch here on GitHub which you then git pull down to your local copy.

Let me know if you've any problems or would like help with this.

@ns-rse Sorry for my silence for a while. I prepared for domestic conference commencing tomorrow. My program using TopoStats is public now (https://github.com/iobataya/line_detection). Once the conference and other events are over and I take a break, I will start organizing the codes here.

Thanks for the update @iobataya I hope the conference went well.

This is discussed before in another pull request AFM-SPM#659. Tested in a new branch in a forked Topostats because of some conflicts with current Topostats. # Issues found in loading Gwyddion file. - only /0/, /1/ channels were read. - X,Y are flipped in Numpy.ndarray - more SI units were necessary (depends on files) - No support for non-square image # Fixed by this modifications - load_gwy can recognize channel name and unit for z value to load - Channel name/channel unit can be configurable in YAML - Flipped XY was fixed. - Non-square image can be loaded with its scaling params. - Loading conf and image scaling parameters are stored in .img_dict["loading_conf"]

iobataya added 3 commits September 9, 2023 16:54

SI Unit of um and mm for gwy added

e51f9c4

Some of .gwy files raise ValueError in io.py. ValueError: Units 'um' have not been added for .gwy files

Update io.py

a68de1c

Unit is set during scanning of channel index.

iobataya and others added 2 commits September 12, 2023 06:43

[pre-commit.ci] Fixing issues with pre-commit

0b7ef85

ns-rse requested changes Sep 13, 2023

View reviewed changes

iobataya and others added 5 commits September 15, 2023 18:58

[pre-commit.ci] Fixing issues with pre-commit

5b8a5cc

Format corrected

de1ac76

Merge branch 'main' of https://github.com/iobataya/TopoStats

07fec64

[pre-commit.ci] Fixing issues with pre-commit

9ac1a92

ns-rse reviewed Sep 18, 2023

View reviewed changes

ns-rse reviewed Sep 19, 2023

View reviewed changes

Uploaded /gwy files for tests

d8103c8

iobataya closed this by deleting the head repository Dec 25, 2023

iobataya mentioned this pull request Jan 14, 2024

fixed load_gwy to read channels correctly and fixed flipped XY ndarray #779

Closed

ns-rse mentioned this pull request Feb 8, 2024

Added 'pm' and 'mm' for very small and large scans #796

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Conversion factors from um to nanometre added in load_gwy in io.py #659

Conversion factors from um to nanometre added in load_gwy in io.py #659

iobataya commented Sep 9, 2023

SylviaWhittle commented Sep 11, 2023

iobataya commented Sep 11, 2023

ns-rse left a comment

ns-rse Sep 12, 2023

iobataya Sep 13, 2023

iobataya Sep 17, 2023

iobataya commented Sep 17, 2023

ns-rse commented Sep 18, 2023

ns-rse Sep 18, 2023

ns-rse Sep 18, 2023 •

edited

Loading

ns-rse Sep 19, 2023

iobataya Sep 21, 2023

ns-rse Sep 27, 2023

iobataya Sep 28, 2023 •

edited

Loading

ns-rse Oct 4, 2023

iobataya Nov 13, 2023

ns-rse Nov 15, 2023

		@@ -449,6 +452,49 @@ def convert_basename_to_relative_paths(df: pd.DataFrame):
		return df


		class Scale:

Conversion factors from um to nanometre added in load_gwy in io.py #659

Conversion factors from um to nanometre added in load_gwy in io.py #659

Conversation

iobataya commented Sep 9, 2023

SylviaWhittle commented Sep 11, 2023

iobataya commented Sep 11, 2023

ns-rse left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Externalized Scaling units to configuration with Scale class

iobataya commented Sep 17, 2023

ns-rse commented Sep 18, 2023

Coverage

Configuration

Choose a reason for hiding this comment

ns-rse Sep 18, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Merge conflicts

Development tools

iobataya Sep 28, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ns-rse Sep 18, 2023 •

edited

Loading

iobataya Sep 28, 2023 •

edited

Loading