-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support Dirac GOS #72
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #72 +/- ##
==========================================
+ Coverage 88.47% 88.55% +0.08%
==========================================
Files 67 67
Lines 7287 7341 +54
Branches 1179 1187 +8
==========================================
+ Hits 6447 6501 +54
Misses 571 571
Partials 269 269 ☔ View full report in Codecov by Sentry. |
Indeed, as you said, the duplication should be removed. Two approaches can be used:
@gguzzina, @francisco-dlp, any preferences? On a side note, saving a model |
The registry approach seems more sensible to me. If implemented in a sensible way it allows two things:
The new GOS data could include even revisions of the two datasets above, which might be at different URLs or have different hashes. |
The first approach can be easily done, but have the disadvantage of update the url and hash in the code everytime when update the databases. For the registry approach, do we have an mini example of how it works? |
It is already possible to specify a file path for a gosh file: https://hyperspy.org/exspy/user_guide/eels.html#generalised-oscillator-strengths. In the interest of simplicity and because I don't think that there is a use case for user changing the registry dynamically (the cross section are expected to be stable at some point and change rarely and user can use other gosh file easily), I would suggest to use a registry that is simply a python dictionary hardcoded in exspy, that would need to updated one in a while. |
To check if I understand it correctly, so we use the GoshGOS and add a dictionary of registry, instead of adding subclasses. The GoshGOS will then need to have a new init parameter (call it "source"?) to specify if use DFT or Dirac, default to DFT. The use of the GOS will be m = s.create_model(low_loss=ll, GOS="dirac")
m = s.create_model(low_loss=ll, GOS="dft") # currently called gosh
m = s.create_model(low_loss=ll) # this will use dft-gosh by default Currently, the pooch download the database to a cache directory by default, might be better to put in the database directory within the package. |
This looks good to me. I agree that
It is good practise to keep the distribution package as small as reasonable and therefore it is better to download and cache the database. This is only be download only once. |
What I mean is that the package indeed does not distribute those files and still do the download, but to define a directory that holds the database for downloading, instead of putting in the cache directory by default which user hard to find and likely to purge. The Dirac GOS database has all the elements and all the edges (over 2000 entries). In each entry, we have relatively dense sampling (256 in momentum and 128 in energy). Due to spin-orbit coupling, the L2 and L3 for instance are two different entries, instead of single array in the Schrodinger approach. We also used double precision which doubles the size. If preferred, we can give the single precision version but will still be ~200 MB.
|
Writing to
Could the file size be reduced by using compression? |
from exspy.misc.eels.gosh_gos import GoshGOS
GoshGOS('Ti_L2', source='dirac')
from exspy.models.eelsmodel import EELSModel
EELSModel(signal_1d, GOS='dirac')
from exspy.signals.eels import EELSSpectrum
s = EELSSpectrum(data)
s.create_model(GOS='dirac')
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks in very good shape.
@@ -146,7 +147,7 @@ class EELSCLEdge(Component): | |||
_fine_structure_coeff_free = True | |||
_fine_structure_spline_active = True | |||
|
|||
def __init__(self, element_subshell, GOS="gosh", gos_file_path=None): | |||
def __init__(self, element_subshell, GOS="dft", gos_file_path=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A side note as this is outside of the scope of this PR: I find it confusing to have both GOS
and gos_file_path
. I am wondering if these should be merged together to make GOS
accept one of the supported string ("dft", "dirac", etc.) or a path (which could a str or
pathlib.Path`) in which case, it will try to read the file assuming this is gosh file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, this makes sense to me.
Co-authored-by: Eric Prestat <[email protected]>
Co-authored-by: Eric Prestat <[email protected]>
redirect "gosh" to "dft" to avoid the name breaking Co-authored-by: Eric Prestat <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ericpre that's fine by me. In my personal use I've started to use author names. But until there's an invasion of competing DFT and Dirac datasets I don't think we have a problem. |
Co-authored-by: Eric Prestat <[email protected]>
Co-authored-by: Eric Prestat <[email protected]>
@gguzzina Indeed the author names would be more specific, but the users probably would not have a instinct guess (can add in the doc string which corresponds the author name with the method). Let's keep this option open when the next new GOS added to the list. |
Description of the change
exspy/misc/eels/gosh_gos.py
toexspy/misc/eels/dirac_gos.py
and replace the links. I think it might be better to reduce the redundancy and combine into one. But I hope to hear your preference.Progress of the PR
upcoming_changes
folder (seeupcoming_changes/README.rst
),docs/readthedocs.org:exspy
build of this PR (link in github checks)Minimal example of the bug fix or the new feature