fixed load_gwy to read channels correctly and fixed flipped XY ndarray #779

iobataya · 2024-01-14T23:58:02Z

This is discussed before in another pull request #659. Tested in a new branch in a forked Topostats because of some conflicts with current Topostats.

Issues found in loading Gwyddion file.

only /0/, /1/ channels were read.
X,Y are flipped in Numpy.ndarray
more SI units were necessary (depends on files)
No support for non-square image

Fixed by this modifications

load_gwy can recognize channel name and unit for z value to load
Channel name/channel unit can be configurable in YAML
Flipped XY was fixed.
Non-square image can be loaded with its scaling params.
Loading conf and image scaling parameters are stored in resulting img_dict["loading_conf"]

This is discussed before in another pull request AFM-SPM#659. Tested in a new branch in a forked Topostats because of some conflicts with current Topostats. # Issues found in loading Gwyddion file. - only /0/, /1/ channels were read. - X,Y are flipped in Numpy.ndarray - more SI units were necessary (depends on files) - No support for non-square image # Fixed by this modifications - load_gwy can recognize channel name and unit for z value to load - Channel name/channel unit can be configurable in YAML - Flipped XY was fixed. - Non-square image can be loaded with its scaling params. - Loading conf and image scaling parameters are stored in .img_dict["loading_conf"]

ns-rse · 2024-01-15T14:07:51Z

Thanks for the PR @iobataya

The pre-commit Continuous Integration in place has highlighted a few linting errors that need tidying. Are you able to clear these up and add the commit to this PR please? You may want to install pre-commit locally and run it (I wrote a hopefully useful blog post on this which can be read here).

With TopoStats pre-commit is an optional dependency of the dev subset so in your virtual environment you can install it along with the test requirements with...

cd path/to/TopoStats
pip install -e .[dev,tests]
pre-commit install

Thanks for adding tests for your additions, really useful and appreciated.

Unfortunately some of the other tests now fail and looking through the examples that have run here I think its because you have made additions to the topostats/default_config.yaml and the scale key and its associated value are not recognised when the configuration is validated. You'd need to add to line 57 of validation.py (splitting over lines as necessary) details of the expected value for factor_to_nm that can be validated.

Thinking about it though these are fairly standard values that may be required in other places so maybe they could be stored elsewhere. 🤔 I'll give that some thought and get back to you.

iobataya · 2024-01-16T23:41:28Z

Thanks for the instruction for pre-commit. I'm preparing environment on linux, hooking my cloned repo by pre-commit, I'll check by myself again.

IMHO, as for the 'too standard' scaling values, I expect program manage difference between file formats. For example some file format may use 'u' as micrometer, others may use Unicode U+00B5. I've been checked which is which, but there's possibility of the case in new formats. I thought there's two ways In order to deal with that,

Config file defines the scalings according to each file format
Loading methods of each spm, jpk, gwy, etc. has the scalings in it.

Please let me know when your teams' direction is decided. If the latter seems better, I'll change my codes to adapt.

* Format of codes checked by pre-commit. * Long method was refactoring into external methods * Tests were added for the additional methods

codecov · 2024-01-17T07:47:24Z

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (cb03e38) 84.30% compared to head (72d993c) 84.65%.
Report is 3 commits behind head on main.

Files	Patch %	Lines
topostats/io.py	96.66%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #779      +/-   ##
==========================================
+ Coverage   84.30%   84.65%   +0.35%     
==========================================
  Files          21       21              
  Lines        3128     3174      +46     
==========================================
+ Hits         2637     2687      +50     
+ Misses        491      487       -4

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

…channel_num_and_flipped_xy

ns-rse

Thanks for your work on this @iobataya I've (finally) spent some time going through it and made some comments.

Its most likely my misunderstanding but I'm not sure what the scale_dict that are defined within gwy_process_xy_values() function and gwy_process_z_values() are used for since they are not returned by either function. Perhaps they should be defined when the class is initialised if they are values that are to be used as empty dictionaries and then updated by each method. 🤷

I assume that x/y and z can have different scaling which is why there are separate gwy_process_xy_values() and gwy_process_z_values() (sorry, perhaps surprisingly I'm not very familiar with AFM images). If they'll always have the scaling values then it would be possible to have a single function that scales them.

I've added #783 to remind me to add rows in the docs/configuration.md once this is merged so that we keep our documentation up-to-date.

topostats/io.py

ns-rse · 2024-01-19T16:58:54Z

topostats/io.py

+        LOGGER.info(f"DataField {unit_z} was multiplied by z-factor: {factor_z}")
+        return image
+
+    def gwy_process_xy_values(self, unit_xy, xreal, yreal, xres, yres) -> None:


Its not clear to me where this function is being used. I can see a test has been written for it and it is called on line 949 but the function returns nothing and only updates scale_dict which is a variable that is only defined and present within this function

ns-rse · 2024-01-19T17:02:07Z

topostats/validation.py

@@ -54,7 +54,7 @@ def validate_config(config: dict, schema: Schema, config_type: str) -> None:
            ".topostats",
            error="Invalid value in config for 'file_ext', valid values are '.spm', '.jpk', '.ibw', '.gwy', '.topostats', or '.asd'.",
        ),
-        "loading": {"channel": str},
+        "loading": {"channel": str, "scale": dict},


I'll have to give this some thought as to how to do it, but it might be worth validating that the values of the dictionary are numbers (typically I think they'll be float) so that we capture early any invalid configuration options.

Co-authored-by: Neil Shephard <[email protected]>

iobataya · 2024-01-25T04:01:29Z

At first, I just needed my software to be able to read my own gwy file format, but now I'm thinking about making it flexible enough to handle non-square images in the future, without affecting other methods or the config file. To clarify my understanding, I've diagrammed the file loading process (please zoom in. it's scalable). The changes related to the load_gwy() method can be considered from two perspectives.

(i) Unit Conversion: File Format Dependent
In the process of loading from gwy files, it was necessary to convert units given as strings (like micrometer, nanometer) into a float-type scaling factor. Some file formats (like .jpk or .ibw) don't require this step, while others (like .spm or .gwy) do.

In the previous commit, I had inserted a dictionary of strings and coefficients into the config.yaml file, but now I'm beginning to think that's not appropriate. Since this process depends on the file format, it seems better to confine it within the load_gwy() method. Currently, it's held as a config value for the entire topostats, but it's only used when loading gwy file.

(ii) Scaling Factors for Y as Well as X
Currently, Topostats only supports square AFM data. This is because the get_data() method of the LoadScans class only returns a 2D array of Z-values (heights) and a single conversion coefficient, pixel_to_nm.

However, actual AFM measurement data is not always a square matrix. Furthermore, even if the measured size is square, the number of pixels may differ. For example, a square area of 1um x 1um might be measured with 512px x 256px. In such cases, returning only a single conversion coefficient, pixel_to_nm, is insufficient for subsequent data processing and visualization. Although the number of pixels can be known from the shape of the ndarray, the physical dimensions (width and height) of the original measured image are unknown. Therefore, for instance, when plotting with imshow(), the image may not be displayed correctly (as shown in the top right of the attached figure). I believe variables like pixel_to_nm_x, pixel_to_nm_y should be returned.

However, changing to return a tuple like (image, pixel_to_nm_x, pixel_to_nm_y) seemed like it would require many changes to other methods and test codes. So, I embedded values corresponding to X and Y in the loading_config, which is a dict, and added a reference to loading_config at the end of get_data() in the check_image_size_and_add_to_dict() method. (That's why there are methods without return values).

Firstly, for (i), I will commit a change this week that performs the string to number conversion inside load_gwy(). This will eliminate the need for changes in default.yaml and validation.py. As for (ii), I will leave the current code as it is. This is because my own project's AFM measurement data is not a square matrix, and I absolutely need conversion coefficients for pixel vs physical size on both X and Y axes.

Please let me know once the Topostats team has decided on a policy regarding these processes. Depending on that policy, I will either modify the code or recode it in a new branch within a forked repository.

ns-rse · 2024-01-25T09:40:33Z

Thanks for your thoughts and on-going work on this @iobataya really useful.

For 1) I would recommend not hard coding values in the load_gwy() methods, it means that if other values are to be used they are hard to modify. We are mindful that many users find the YAML configuration file overwhelming already and so rather than add another section to topostats/default_config.yaml that to contained all possible conversions conditional on file type we could parametrise the conversions with a separate configuration file that is loaded
by default but users could over-ride if required (we already do this with the topostats/plotting_dictionary.yaml.

To avoid scope drift I would not try addressing that in this Pull Request but address it in a separate one. Another reason for doing so is that we are also looking at moving much of the io.py classes and functions to their own package. Some of this work has already been started by @SylviaWhittle and we have the AFM-SPM/topofileformats repository for that work.

With regards to 2) that is something I can't really comment on and would need to discuss with others. Having different scaling for x and y axes does sound as though it could be complicated!

iobataya · 2024-01-25T23:09:05Z

Thanks for your comment, @ns-rse .

It seems that there are significant changes to the io.py class, so it's probably a good idea for me to wait for those changes for a while. I look forward to the update!

Formatting by pre-commit

ab1ff5a

* Format of codes checked by pre-commit. * Long method was refactoring into external methods * Tests were added for the additional methods

Merge remote-tracking branch 'upstream/main' into PR659-gwy_load_arb_…

d1761de

…channel_num_and_flipped_xy

ns-rse mentioned this pull request Jan 19, 2024

Add scale and factor_to_nm to documentation #783

Open

ns-rse requested changes Jan 19, 2024

View reviewed changes

ns-rse added the IO Input and Output label Jan 19, 2024

iobataya and others added 3 commits January 21, 2024 10:38

Update topostats/io.py

b0e48c2

Co-authored-by: Neil Shephard <[email protected]>

[pre-commit.ci] Fixing issues with pre-commit

40c26b6

added comments, rearranged methods

b8bcfa7

Unit str conversion confined within load_gwy

72d993c

iobataya closed this Jan 25, 2024

ns-rse mentioned this pull request Feb 8, 2024

Added 'pm' and 'mm' for very small and large scans #796

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fixed load_gwy to read channels correctly and fixed flipped XY ndarray #779

fixed load_gwy to read channels correctly and fixed flipped XY ndarray #779

iobataya commented Jan 14, 2024

ns-rse commented Jan 15, 2024

iobataya commented Jan 16, 2024

codecov bot commented Jan 17, 2024 •

edited

Loading

ns-rse left a comment

ns-rse Jan 19, 2024

ns-rse Jan 19, 2024

iobataya commented Jan 25, 2024 •

edited

Loading

ns-rse commented Jan 25, 2024

iobataya commented Jan 25, 2024

fixed load_gwy to read channels correctly and fixed flipped XY ndarray #779

fixed load_gwy to read channels correctly and fixed flipped XY ndarray #779

Conversation

iobataya commented Jan 14, 2024

Issues found in loading Gwyddion file.

Fixed by this modifications

ns-rse commented Jan 15, 2024

iobataya commented Jan 16, 2024

codecov bot commented Jan 17, 2024 • edited Loading

Codecov Report

ns-rse left a comment

Choose a reason for hiding this comment

ns-rse Jan 19, 2024

Choose a reason for hiding this comment

ns-rse Jan 19, 2024

Choose a reason for hiding this comment

iobataya commented Jan 25, 2024 • edited Loading

ns-rse commented Jan 25, 2024

iobataya commented Jan 25, 2024

codecov bot commented Jan 17, 2024 •

edited

Loading

iobataya commented Jan 25, 2024 •

edited

Loading