Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed load_gwy to read channels correctly and fixed flipped XY ndarray #779

Conversation

iobataya
Copy link

This is discussed before in another pull request #659. Tested in a new branch in a forked Topostats because of some conflicts with current Topostats.

Issues found in loading Gwyddion file.

  • only /0/, /1/ channels were read.
  • X,Y are flipped in Numpy.ndarray
  • more SI units were necessary (depends on files)
  • No support for non-square image

Fixed by this modifications

  • load_gwy can recognize channel name and unit for z value to load
  • Channel name/channel unit can be configurable in YAML
  • Flipped XY was fixed.
  • Non-square image can be loaded with its scaling params.
  • Loading conf and image scaling parameters are stored in resulting img_dict["loading_conf"]

This is discussed before in another pull request AFM-SPM#659. Tested in a new branch in a forked Topostats because of some conflicts with current Topostats.

# Issues found in loading Gwyddion file.
 - only /0/, /1/ channels were read.
 - X,Y are flipped in Numpy.ndarray
  - more SI units were necessary (depends on files)
 - No support for non-square image

# Fixed by this modifications
 - load_gwy can recognize channel name and unit for z value to load
 - Channel name/channel unit can be configurable in YAML
 - Flipped XY was fixed.
 - Non-square image can be loaded with its scaling params.
 - Loading conf and image scaling parameters are stored in .img_dict["loading_conf"]
@ns-rse
Copy link
Collaborator

ns-rse commented Jan 15, 2024

Thanks for the PR @iobataya

The pre-commit Continuous Integration in place has highlighted a few linting errors that need tidying. Are you able to clear these up and add the commit to this PR please? You may want to install pre-commit locally and run it (I wrote a hopefully useful blog post on this which can be read here).

With TopoStats pre-commit is an optional dependency of the dev subset so in your virtual environment you can install it along with the test requirements with...

cd path/to/TopoStats
pip install -e .[dev,tests]
pre-commit install

Thanks for adding tests for your additions, really useful and appreciated.

Unfortunately some of the other tests now fail and looking through the examples that have run here I think its because you have made additions to the topostats/default_config.yaml and the scale key and its associated value are not recognised when the configuration is validated. You'd need to add to line 57 of validation.py (splitting over lines as necessary) details of the expected value for factor_to_nm that can be validated.

Thinking about it though these are fairly standard values that may be required in other places so maybe they could be stored elsewhere. 🤔 I'll give that some thought and get back to you.

@iobataya
Copy link
Author

Thanks for the instruction for pre-commit. I'm preparing environment on linux, hooking my cloned repo by pre-commit, I'll check by myself again.

IMHO, as for the 'too standard' scaling values, I expect program manage difference between file formats. For example some file format may use 'u' as micrometer, others may use Unicode U+00B5. I've been checked which is which, but there's possibility of the case in new formats. I thought there's two ways In order to deal with that,

  • Config file defines the scalings according to each file format
  • Loading methods of each spm, jpk, gwy, etc. has the scalings in it.

Please let me know when your teams' direction is decided. If the latter seems better, I'll change my codes to adapt.

 * Format of codes checked by pre-commit.
 * Long method was refactoring into external methods
 * Tests were added for the additional methods
Copy link

codecov bot commented Jan 17, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (cb03e38) 84.30% compared to head (72d993c) 84.65%.
Report is 3 commits behind head on main.

Files Patch % Lines
topostats/io.py 96.66% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #779      +/-   ##
==========================================
+ Coverage   84.30%   84.65%   +0.35%     
==========================================
  Files          21       21              
  Lines        3128     3174      +46     
==========================================
+ Hits         2637     2687      +50     
+ Misses        491      487       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@ns-rse ns-rse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this @iobataya I've (finally) spent some time going through it and made some comments.

Its most likely my misunderstanding but I'm not sure what the scale_dict that are defined within gwy_process_xy_values() function and gwy_process_z_values() are used for since they are not returned by either function. Perhaps they should be defined when the class is initialised if they are values that are to be used as empty dictionaries and then updated by each method. 🤷

I assume that x/y and z can have different scaling which is why there are separate gwy_process_xy_values() and gwy_process_z_values() (sorry, perhaps surprisingly I'm not very familiar with AFM images). If they'll always have the scaling values then it would be possible to have a single function that scales them.

I've added #783 to remind me to add rows in the docs/configuration.md once this is merged so that we keep our documentation up-to-date.

topostats/io.py Show resolved Hide resolved
topostats/io.py Show resolved Hide resolved
topostats/io.py Outdated Show resolved Hide resolved
topostats/io.py Outdated Show resolved Hide resolved
topostats/io.py Outdated Show resolved Hide resolved
topostats/io.py Outdated Show resolved Hide resolved
topostats/io.py Outdated Show resolved Hide resolved
topostats/io.py Outdated Show resolved Hide resolved
LOGGER.info(f"DataField {unit_z} was multiplied by z-factor: {factor_z}")
return image

def gwy_process_xy_values(self, unit_xy, xreal, yreal, xres, yres) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not clear to me where this function is being used. I can see a test has been written for it and it is called on line 949 but the function returns nothing and only updates scale_dict which is a variable that is only defined and present within this function

@@ -54,7 +54,7 @@ def validate_config(config: dict, schema: Schema, config_type: str) -> None:
".topostats",
error="Invalid value in config for 'file_ext', valid values are '.spm', '.jpk', '.ibw', '.gwy', '.topostats', or '.asd'.",
),
"loading": {"channel": str},
"loading": {"channel": str, "scale": dict},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to give this some thought as to how to do it, but it might be worth validating that the values of the dictionary are numbers (typically I think they'll be float) so that we capture early any invalid configuration options.

@ns-rse ns-rse added the IO Input and Output label Jan 19, 2024
@iobataya
Copy link
Author

iobataya commented Jan 25, 2024

topostats_loading_scheme

At first, I just needed my software to be able to read my own gwy file format, but now I'm thinking about making it flexible enough to handle non-square images in the future, without affecting other methods or the config file. To clarify my understanding, I've diagrammed the file loading process (please zoom in. it's scalable). The changes related to the load_gwy() method can be considered from two perspectives.

(i) Unit Conversion: File Format Dependent
In the process of loading from gwy files, it was necessary to convert units given as strings (like micrometer, nanometer) into a float-type scaling factor. Some file formats (like .jpk or .ibw) don't require this step, while others (like .spm or .gwy) do.

In the previous commit, I had inserted a dictionary of strings and coefficients into the config.yaml file, but now I'm beginning to think that's not appropriate. Since this process depends on the file format, it seems better to confine it within the load_gwy() method. Currently, it's held as a config value for the entire topostats, but it's only used when loading gwy file.

(ii) Scaling Factors for Y as Well as X
Currently, Topostats only supports square AFM data. This is because the get_data() method of the LoadScans class only returns a 2D array of Z-values (heights) and a single conversion coefficient, pixel_to_nm.

However, actual AFM measurement data is not always a square matrix. Furthermore, even if the measured size is square, the number of pixels may differ. For example, a square area of 1um x 1um might be measured with 512px x 256px. In such cases, returning only a single conversion coefficient, pixel_to_nm, is insufficient for subsequent data processing and visualization. Although the number of pixels can be known from the shape of the ndarray, the physical dimensions (width and height) of the original measured image are unknown. Therefore, for instance, when plotting with imshow(), the image may not be displayed correctly (as shown in the top right of the attached figure). I believe variables like pixel_to_nm_x, pixel_to_nm_y should be returned.

However, changing to return a tuple like (image, pixel_to_nm_x, pixel_to_nm_y) seemed like it would require many changes to other methods and test codes. So, I embedded values corresponding to X and Y in the loading_config, which is a dict, and added a reference to loading_config at the end of get_data() in the check_image_size_and_add_to_dict() method. (That's why there are methods without return values).

Firstly, for (i), I will commit a change this week that performs the string to number conversion inside load_gwy(). This will eliminate the need for changes in default.yaml and validation.py. As for (ii), I will leave the current code as it is. This is because my own project's AFM measurement data is not a square matrix, and I absolutely need conversion coefficients for pixel vs physical size on both X and Y axes.

Please let me know once the Topostats team has decided on a policy regarding these processes. Depending on that policy, I will either modify the code or recode it in a new branch within a forked repository.

@ns-rse
Copy link
Collaborator

ns-rse commented Jan 25, 2024

Thanks for your thoughts and on-going work on this @iobataya really useful.

For 1) I would recommend not hard coding values in the load_gwy() methods, it means that if other values are to be used they are hard to modify. We are mindful that many users find the YAML configuration file overwhelming already and so rather than add another section to topostats/default_config.yaml that to contained all possible conversions conditional on file type we could parametrise the conversions with a separate configuration file that is loaded
by default but users could over-ride if required (we already do this with the topostats/plotting_dictionary.yaml.

To avoid scope drift I would not try addressing that in this Pull Request but address it in a separate one. Another reason for doing so is that we are also looking at moving much of the io.py classes and functions to their own package. Some of this work has already been started by @SylviaWhittle and we have the AFM-SPM/topofileformats repository for that work.

With regards to 2) that is something I can't really comment on and would need to discuss with others. Having different scaling for x and y axes does sound as though it could be complicated!

@iobataya
Copy link
Author

Thanks for your comment, @ns-rse .

It seems that there are significant changes to the io.py class, so it's probably a good idea for me to wait for those changes for a while. I look forward to the update!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Input and Output
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants