-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversion factors from um to nanometre added in load_gwy in io.py #659
Conversation
Some of .gwy files raise ValueError in io.py. ValueError: Units 'um' have not been added for .gwy files
# XY, Z units correction * units for px_to_nm to multiply data array should be picked up from 'si_unit_z' instetad of 'si_unit_xy'. * Non-square image should be considered. Probably, px_to_nm_z, px_to_nm_x, px_to_nm_y will be necessary. # Channel index Data key of gwy sometimes starts from 1 or larger. (ex. '/4/data') It happens when image is cropped on Gwyddion, and it remains after saving even other channels are removed. A loop to scan indices was added. And it guesses whether if the channel is topography or not by si_unit_z.
Unit is set during scanning of channel index.
Hi @iobataya , thank you for looking into this issue. If you need any assistance with getting the tests to work, then let me know :) |
Hi @SylviaWhittle, thank you for the comment. That was my issue with the Jupyter Notebook. I'm going to work on it within my forked repository for a while. I'd like to see if we can get an np.ndarray from a .gwy file on the Notebook smoothly. For now, I'm trying,
|
# transposed np.ndarray corrected _gwy_read_component returns transposed dimensions of (xres,yres), this corrected to (yres, xres) # load_gwy updated * load_gwy can find multichannel data starting from any index (ex. 4/data) * load_gwy can guess whether if it is height channel by unitstr * load_gwy supports m,mm,um for z-unit * px_to_nm_x, px_to_nm_y were added for future usage, stored in img_dict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @iobataya really appreciate it.
I've made some comments in line that are perhaps helpful.
I had a thought that making the conv_factors
into a dictionary that is read from the topostats/default_config.yaml
which would have the benefit of making it easy for users to add additional conversion factors if required. This shouldn't block merging though and we can make this a subsequent issue to make it configurable.
Currently though the Continuous Integration that runs the test suite fails because of the reliance on the regex
package that I've highlighted and there are two options for dealing with this that I've suggested.
On the topic of tests I was wondering if you are familiar with writing unittests? We use the pytest
framework (rather than the standard library unittest
) and try and keep our test coverage > 80%.
It would be nice to have some tests covering the use of these different conversion factors if you are familiar with writing tests? If not don't worry we can create a separate issue for this.
topostats/io.py
Outdated
|
||
re = r"\/(\d+)\/data$" | ||
components = list(image_data_dict.keys()) | ||
conv_factors = {"m": 1e9, "um": 1e6, "mm": 1e3} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could perhaps be defined as a dictionary outside of the function, or perhaps even moved to default_config.yaml
which may make it easier for users to add additional conversion factors without having to hack at the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally agree that such the dict should not be hard-coded. If default_config.yaml
can manage units conversion, TopoStats can support additional units by other SPM-techniques.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Externalized Scaling units to configuration with Scale class
There are hard-coded strings of unit and conversion factors in io.py. I tried to externalize the strings and factors. The values are configurable by default_configuration.yaml. User can also add other units like "V" or "N" in configuration.yaml. Such conversion factors can be used for various signals from SPM in the future. (ex. invOLS or spring constants)
loading:
channel: Height # Channel to pull data from in the data files.
scale:
nm:
{m: 1e9, mm: 1e6, um: 1e3, nm: 1.0} # factors to convert to nm
The configuration is read by a created small class "Scale". The class has conversion factors accessible by two str keys, unit_to and unit_from.
self.scale = Scale(config["loading"]["scale"])
unit_from = "um"
factor = self.scale.get_factor("nm", unit_from)
This scale object can hold scaling factors of image as well as that of physical units. So it's useful to support non-square images (ex. 512 x 256). In load_gwy
, the scale object holds "px_to_nm_x" and "px_to_nm_y" in the scale object, and the object is added as img_dict["scale"]. Users can access the scale object after loading. (see also add_to_dict
)
# example (simplified from actual code)
m2nm = self.scale.get_factor("nm", "m")
self.scale.add_factor("nm", "px_to_nm", xreal * m2nm / image.shape[1])
self.scale.add_factor("nm", "px_to_nm_x", xreal * m2nm / image.shape[1])
self.scale.add_factor("nm", "px_to_nm_y", yreal * m2nm / image.shape[0])
Currently, load function returns a tuple of 'image' and 'pixel_to_nm'. When other functions (load_jpk, load_spm,..) implement this img_dict["scale"], load function can just a single variable, 'image' and the scale object hold scaling factors for non-square images and more.
io.py modified - Conversion from gwy.DataField to np.ndarray was corrected. (dimension was transposed) - Scale class added to conversion of units and holds conversion factors - Initializable by configuration YAML file - This object is copied to img_dict of data accessible scan.img_dict[filename]["scale"] px_to_nm_x, px_to_nm_y values in img_dict from Gwy Testing and Configurations - Resource files (file_landscape.gwy, file.square.gy) added. - conftest.py modified to read file.gwy and file_landscape.gwy - Test for Scale class - test_scale_config was added. - test_scale in test_io.py was added. - "scale" configuration was added to default_config.yaml
I added codes with tests for pytest. I slightly changed conftest.py. And I added synthetic gwy files in resources for tests. In my environment, pytest of test_io.py passes. However, I have a few things unclear for testing.
I'd appreciate your advice for those. |
Hi @iobataya Thanks for taking on the task of addressing the feedback really appreciate it. To address your questions.... CoverageCoverage if reported after running tests as we have the I'll look at why tests are currently failing on this branch later this week. ConfigurationYou've correctly clocked that configurations are loaded by In This means you don't need to repeat the snippet of code that loads the I'll leave more comments in line. |
if match == None: | ||
# How can we get current configuration["loading"] ? | ||
# default_config.yaml is used to get conf["loading"]["scale"] | ||
default_config = pkg_resources.open_text(__package__, "default_config.yaml").read() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than loading the config again we have structured classes to take arguments and use the same argument name in the YAML configuration and the parameters are passed in using Pythons **kwargs
functionality as the names match.
Thus to get the loading
dictionary we need to add an argument to the __init__
method of the LoadScans
and on line 123 of topostats/run_topostats.py
this is loaded automatically and instead the line below can be modified to extract the scale.
# default_config.yaml is used to get conf["loading"]["scale"] | ||
default_config = pkg_resources.open_text(__package__, "default_config.yaml").read() | ||
config = yaml.safe_load(default_config) | ||
self.scale = Scale(config["loading"]["scale"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the above suggestion we can move this to the __init__
method (as PyLint will complain about self.
being defined outside of __init__
.
The values will be in the argument scale
so something like...
def __init__(
self,
img_paths: list,
channel: str,
scale: dict,
):
...
self.scale = scale
...would suffice and the value will cascade through.
Would need to add details of the additional parameter to the docstring too.
@@ -449,6 +452,49 @@ def convert_basename_to_relative_paths(df: pd.DataFrame): | |||
return df | |||
|
|||
|
|||
class Scale: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking about this and am wondering if making this a class is perhaps over-complicating things?
Scaling is a simple multiplication really, its just a case of selecting the correct scaling factor based on the units used in the scans and what is desired.
We can have sensible defaults defined in the configuration which also allows users to extend the possible scalings and could also be setup to allow the selection of the desired scaling using a simple function to select the correct options from the dictionary.
I realise this is a fairly poorly formed idea and lacks code but what do you think @iobataya ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ns-rse ,
Since the loading dict is available in LoadScans
by receiving it in its constructor now, the config dict will cover the factors without the Scale class. I agree that Scale class is too much and not necessary.
According to that direction, I'm planning to make change next as follows.
- to make LoadScans receives loading_config dict.
- to make load_gwy get the factors as dict from the configuration YAML
- to eliminate Scale class.
- to handle xres/yres for gwy correctly (square, landscape)
- to copy the scaling factors (units, px_in_nm, px_in_nm_x, px_in_nm_y) in img_dict of image in
add_dict
method
I'll apply codes and test codes in my forked repository to check at first in this month. I'll try run tests with cov module of pytest.
I'd appreciate your ideas and advice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @iobataya
This all sounds eminently reasonable and bulks out my above comments with more solid ideas, thank you.
I appreciate this is somewhat divergent to your original PR and am grateful of you taking the time to work through these things.
A couple of things that might be useful.
Merge conflicts
As we've had a couple of merges recently there are now merge conflicts, I'll resolve these on GitHub but you will need to update your local branch with git pull
to pull those changes down.
Development tools
You may want to install some of the development tools we have listed as optional dependencies and make sure pre-commit
is installed and running.
Dev tools can be installed if you do the following in your virtual environment (assuming you are using one, if not this might be a useful resource Conda environments for effective and reproducible research)...
cd ~/path/to/TopoStats
pip install -e .[dev]
This will pull in pytest-cov
and tell you what the coverage change is although don't worry too much we have GitHub working with CodeCov to report these things.
However, we are increasingly trying to adhere to PEP8 code style and NumpyDocstyle for docstrings and have pre-commit hooks in place which when enabled will check changes are compliant prior to allowing permits (we also use pre-commit.ci to check pull requests but its a faster feedback loop to enable pre-commit
locally).
Once you've installed the dev
requirements you can in your VirtualEnv install the pre-commit
hooks with pre-commit install
and then when you make commits the checks will run and tell you if there are any problems (some will be automatically corrected if possible). I've a blog post I've written on pre-commit that you may find useful pre-commit: Protecting your future self if you want to find out more.
Please do let me know if you've any problems, more than happy to help and could even have a chat over video if you think that would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @ns-rse,
I appreciate your info about git pull
and pip install -e .
. Especially, pip install -e .
saved tons of time. And pytest-cov
works fine to show coverage of codes. The test_io.py covers 84%.
I'm happy that I've reached the topostats in my branch can load gwy of any sizes of image (square and non-square) with arbitary pixel/lines aspect ratio. Now this makes me to proceed my project. I'll continue to improve the codes, and format codes to the expected style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brilliant, that is great to hear @iobataya 👍
You'll notice below that there are a couple of merge conflicts that have arisen. This arises when changes have been made to files after you forked the repository and Git doesn't know which should be kept.
They need manually resolving, probably the easiest is to do that here on GitHub using the links/buttons below. You then choose which bit of code should be retained and a commit will be made to your branch here on GitHub which you then git pull
down to your local copy.
Let me know if you've any problems or would like help with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ns-rse Sorry for my silence for a while. I prepared for domestic conference commencing tomorrow. My program using TopoStats is public now (https://github.com/iobataya/line_detection). Once the conference and other events are over and I take a break, I will start organizing the codes here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update @iobataya I hope the conference went well.
This is discussed before in another pull request AFM-SPM#659. Tested in a new branch in a forked Topostats because of some conflicts with current Topostats. # Issues found in loading Gwyddion file. - only /0/, /1/ channels were read. - X,Y are flipped in Numpy.ndarray - more SI units were necessary (depends on files) - No support for non-square image # Fixed by this modifications - load_gwy can recognize channel name and unit for z value to load - Channel name/channel unit can be configurable in YAML - Flipped XY was fixed. - Non-square image can be loaded with its scaling params. - Loading conf and image scaling parameters are stored in .img_dict["loading_conf"]
Some of .gwy files raise the following ValueError in io.py to fail to read.
ValueError: Units 'um' have not been added for .gwy files
Please add an SI to nanometre conversion factor for these units in _gwy_read_component in io.py
Conversion factors for 'mm' and 'um' were added. (My files contain only 'um' unit, but 'mm' was also added just in case.)