Integrating _diffrn.id and _structure.id with the powder dictionary #171

Open
jamesrhester opened this issue Oct 21, 2024 · 16 comments

@jamesrhester
Contributor

Core CIF has recently added a few data names for handling more complex datasets that include data collected under different conditions, potentially yielding a variety of structures. These new data names are provided in the multi-block dictionary. The powder dictionary can make use of these.

_diffrn.id

First: new data name _diffrn.id (also found in mmCIF) labels a particular set of experimental conditions (ambient environment, radiation source, crystal specimen). Previously, this information was implicitly linked to a diffractogram by DIFFRN data names appearing in the same data block as the diffractogram. We should make this link explicit by defining _pd_diffractogram.diffrn_id, whose value would refer to the set of diffraction conditions relevant to the diffractogram identified by _pd_diffractogram.id.
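As a rough illustration of what an explicit link buys, here is a minimal Python sketch, not part of any existing CIF tool. It assumes each parsed data block is a flat dict of data name to value. The helper `conditions_for` and this in-memory layout are hypothetical, and the identifiers are taken from the PbSO4 example later in this thread.

```python
from typing import Dict, List

# Hypothetical in-memory form of a parsed data block: data name -> value.
Block = Dict[str, str]


def conditions_for(diffractogram: Block, blocks: List[Block]) -> Block:
    """Return the block whose _diffrn.id matches the diffractogram's
    (proposed) _pd_diffractogram.diffrn_id."""
    wanted = diffractogram["_pd_diffractogram.diffrn_id"]
    matches = [b for b in blocks if b.get("_diffrn.id") == wanted]
    if len(matches) != 1:
        raise LookupError(
            f"expected one block with _diffrn.id {wanted!r}, found {len(matches)}"
        )
    return matches[0]


pattern = {
    "_pd_diffractogram.id": "PWDR PBSO4.CWN Bank 1",
    "_pd_diffractogram.diffrn_id": "11158",
}
blocks = [
    {"_diffrn.id": "11158", "_diffrn.ambient_temperature": "300.0",
     "_diffrn_radiation.probe": "neutron"},
    {"_diffrn.id": "11080", "_diffrn.ambient_temperature": "300.0",
     "_diffrn_radiation.probe": "x-ray"},
]
print(conditions_for(pattern, blocks)["_diffrn_radiation.probe"])  # neutron
```

The link no longer depends on which block the DIFFRN data names happen to sit in. It depends only on matching identifiers.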

_structure.id

Core CIF defines a structure as a combination of the atomic sites, a unit cell, and symmetry. Clearly this is closely related to a crystallographic phase. We should determine the nature of this relationship. Which of the following holds?

  1. One _pd_phase.id implies at most one particular _structure.id
  2. A particular _structure.id describes at most one specific phase
  3. both are true
  4. neither is true

I suggest (1) is not true, as for each temperature step in a multi-temperature experiment the phase would be considered the same (assuming no phase transitions) but the unit cell would be different. Therefore (3) cannot be true either. I believe that (2) is thus a reasonable assertion: any structure that is reported is the structure of a particular phase under particular conditions. This means that the powder dictionary should add a data name _structure.phase_id, identifying which phase the structure relates to. Note that the link between a structure and diffraction conditions is already taken care of by the core data name _structure.diffrn_id.
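Under option (2), _pd_phase.id is a function of _structure.id. Many structures may point at one phase, but no structure may point at two. The following minimal Python sketch shows that constraint. It assumes (structure, phase) pairs have already been collected from the proposed _structure.phase_id values. The function name and the identifiers `cor_300K`, `cor_1000K` and `corundum` are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def structure_to_phase(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Build the structure -> phase mapping from (structure_id, phase_id)
    pairs, rejecting any structure that is assigned to two phases."""
    mapping: Dict[str, str] = {}
    for structure_id, phase_id in pairs:
        previous = mapping.setdefault(structure_id, phase_id)
        if previous != phase_id:
            raise ValueError(
                f"structure {structure_id!r} assigned to phases "
                f"{previous!r} and {phase_id!r}"
            )
    return mapping


# Hypothetical multi-temperature case: one phase, two structures (allowed).
print(structure_to_phase([("cor_300K", "corundum"), ("cor_1000K", "corundum")]))
# {'cor_300K': 'corundum', 'cor_1000K': 'corundum'}
```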

The above suggestions also start to address the points raised in #164.

Please comment, particularly regarding my understanding of the term "phase".

@rowlesmr
Collaborator

Dammit, James. ;) I thought I had it all figured out, and then I had another think and now I don't know what I think...

I think that we've been doing with _pd_phase.id what is now supposed to be done with _structure.id.

There are a couple of ways of looking at things. One is to see how the dictionary works: can we cope with having the same _pd_phase.id in multiple blocks. The other is to think on "what is a 'phase'?".

Taking the second one first, what are some interesting edge cases to look at the limits?

I think that Ian Madsen's (I think it's his) definition is a good starting point: A phase is a crystallographically-distinct material.

  • We can get a diffraction pattern from an amorphous material; it makes no sense to have unit cell params, atomic coordinates... for such a material, so there is no structure (at least in the CIF sense). So, a phase can have no structure.
  • We can index a powder pattern; that gives us a unit cell and symmetry, but we have no atomic coordinates. With these we can do some PONKCS to do QPA-type things. So a phase can have a partial structure, and I would argue that that is enough to give a _structure.id.
  • Then there is the "normal" way of doing things, where a powder pattern represents (at least one) crystalline material which has a structure (unit cell, symmetry, atom coordinates). A phase has a structure.
  • You can have multiple phases in a specimen, consisting of any or all of amorphous, partially crystalline, or fully crystalline.

Does it make sense for a phase to have more than one structure? In the CIF sense, I think the answer has to be yes. How can you look at, say, corundum at 300 K and 1000 K, with all the concomitant changes in unit cell parameters, atomic coordinates, and displacement parameters? It's still corundum. It's just at a higher temperature.

What about the inverse question: can a structure have more than one phase? I think the answer has to be no. If multiple phases have the same structure, then they aren't crystallographically distinct.

Now, what about multi-diffractogram experiments?

  • Pressure is applied to a (single-phase) material, causing it to gradually orient. There is no crystallographic distortion.
    • PO/Texture is an extrinsic property, and is dependent on the specimen prep and diffractogram. PO is keyed on phase id and diffractogram id, so you could probably have many patterns with a single phase. Even the same specimen diffracted from by different instruments could give different PO; cf. X-ray vs neutron and the difference in irradiated volume and distribution of PO.
  • A (single-phase) sample is held at an elevated temperature. The unit cell, symmetry, and atomic coordinates of the initial phase do not change. The sample decomposes to a different phase whose unit cell, symmetry, and atomic coordinates do not change.
    • I think this is a single phase with a single structure going to another single phase with a different single structure.
  • A (single-phase) sample is heated from room temperature to some higher temperature. The unit cell expands, the symmetry remains constant, the atomic coordinates alter slightly, and the atomic displacement parameters embiggen.
    • This is still single-phase, but it has many structures.
  • Through some sort of magic process, an end-member of a solid solution series (eg fayalite, Fe2SiO4) is slowly transformed into the opposite end-member (eg forsterite, Mg2SiO4). The unit cell alters as per Vegard's Law, the symmetry is unchanged, and the atomic coords change to accommodate the change in cation size. The site occupancy changes with each diffractogram.
    • Here we come up against how to define the difference between phases. My initial thought is that we must have many phases, each with one structure. This is because the site occupancies are changing, even though the symmetry is remaining constant. Different elemental composition = different phase. Even if each consecutive pair of diffractograms contain essentially the same phase, over the entire dataset, you start/finish with entirely different phases.

From this last one: do there exist two structures with (essentially) the same unit cell parameters, the same site occupancies, and the same symmetry, but with different atomic coordinates, that are considered to be two different phases? i.e. what is required to be different for two structures to be different phases? Unit cell params, not necessarily. Symmetry, yes. Site occupancies, probably yes (again, is Mg1.95Fe0.05SiO4 a different phase to Mg1.96Fe0.06SiO4?). Atomic coordinates, probably yes.

I think I'll stop there for now. It's getting late and I need sleep. I'll come back to this later.

@briantoby
Collaborator

A couple quick comments:

Pressure is applied to a (single-phase) material, causing it to gradually orient. There is no crystallographic distortion.

Any physical change changes the structure and possibly the microstructure; how that is modeled, OTOH, is discretionary. Pressure is going to change the lattice parameters for sure.

What about the inverse question: can a structure have more than one phase? I think the answer has to be no.

There is one exception that I can think of for this -- which is more of a nomenclature issue, than a real one -- but in describing a magnetic material, one presents a structure for the atoms and one for the spins. Breaking this into two views of a single entity makes the description more compact, so there is still really only one phase, but CIF sees this as two.

@jamesrhester
Contributor Author

I think @rowlesmr's comment confirms I'm on the right track. I have proposed defining _structure.phase_id. Therefore, mathematically, _pd_phase.id is a function of _structure.id. This is equivalent to stating that, given a particular structure, a unique phase can be identified (but doesn't have to be).

Taking @rowlesmr 's cases from the top:

We can get a diffraction pattern from an amorphous material; it makes no sense to have unit cell params, atomic coordinates... for such a material, so there is no structure (at least in the CIF sense). So, a phase can have no structure.

i.e. there is no mapping from phase to structure. I am only asserting a mapping from structure to phase, so that's fine.

We can index a powder pattern; that gives us a unit cell and symmetry, but we have no atomic coordinates. With these we can do some PONKCS to do QPA-type things. So a phase can have a partial structure, and I would argue that that is enough to give a _structure.id.

A structure can be partially defined, just provide values for _cell.structure_id and _structure.space_group_id.

Then there is the "normal" way of doing things, where a powder pattern represents (at least one) crystalline material which has a structure (unit cell, symmetry, atom coordinates). A phase has a structure.
You can have multiple phases in a specimen, consisting of any or all of amorphous, partially crystalline, or fully crystalline.

If there is a mapping from structure to phase, multiple structures can map to a single phase, or there can be a one-to-one mapping. Both these situations are covered by the proposed definition.
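Reading the mapping backwards covers all of the cases quoted above: a phase with no structure, a phase with exactly one, and a phase with many. Here is a short Python sketch. The identifiers are hypothetical, and a plain dict stands in for the proposed _structure.phase_id values.

```python
from typing import Dict, List


def structures_by_phase(
    structure_to_phase: Dict[str, str], phase_ids: List[str]
) -> Dict[str, List[str]]:
    """Invert a structure -> phase mapping. Every listed phase gets an entry,
    so a phase with no structure (e.g. an amorphous one) maps to []."""
    result: Dict[str, List[str]] = {phase_id: [] for phase_id in phase_ids}
    for structure_id, phase_id in structure_to_phase.items():
        result.setdefault(phase_id, []).append(structure_id)
    return result


# Hypothetical identifiers: a crystalline phase seen at two temperatures,
# a phase with only an indexed cell, and an amorphous phase.
mapping = {"cor_300K": "corundum", "cor_1000K": "corundum", "unk_cell": "unknown"}
print(structures_by_phase(mapping, ["corundum", "unknown", "glass"]))
# {'corundum': ['cor_300K', 'cor_1000K'], 'unknown': ['unk_cell'], 'glass': []}
```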

Pressure is applied to a (single-phase) material, causing it to gradually orient. There is no crystallographic distortion.

In this situation both diffractogram and phase are important to describe PO. Structure is not directly involved. So structure maps to phase, and phase together with diffractogram determine a particular set of PO parameters. This shows the importance of phase as a concept separate from structure (as do many of the other examples).

A (single-phase) sample is held at an elevated temperature. The unit cell, symmetry, and atomic coordinates of the initial phase do not change. The sample decomposes to a different phase whose unit cell, symmetry, and atomic coordinates do not change.

Not sure how you can raise the temperature and have no changes to the structure? In any case, under the proposed definitions a single phase can have multiple structures, and if you want to name the phase differently at some point, that works as well.

Through some sort of magic process, an end-member of a solid solution series (eg fayalite, Fe2SiO4) is slowly transformed into the opposite end-member (eg forsterite, Mg2SiO4). The unit cell alters as per Vegard's Law, the symmetry is unchanged, and the atomic coords change to accommodate the change in cation size. The site occupancy changes with each diffractogram.

Here we come up against how to define the difference between phases. My initial thought is that we must have many phases, each with one structure. This is because the site occupancies are changing, even though the symmetry is remaining constant. Different elemental composition = different phase. Even if each consecutive pair of diffractograms contain essentially the same phase, over the entire dataset, you start/finish with entirely different phases.

At each point in the solid solution, there is a defined structure. Under the proposed definition, you have the flexibility of assigning each structure to a different phase, or to the same phase. The important thing is that the proposed definition doesn't commit you to a particular view of when a phase is no longer the same. It does commit you to only allowing a structure to be associated with one phase.

(Of course, even the latter can be worked around by creating a new _structure.id with identical cell etc.)

@jamesrhester
Contributor Author

There is one exception that I can think of for this -- which is more of a nomenclature issue, than a real one -- but in describing a magnetic material, one presents a structure for the atoms and one for the spins. Breaking this into two views of a single entity makes the description more compact, so there is still really only one phase, but CIF sees this as two.

I'm not sure why you say that CIF sees this as two. I'm assuming that the current approach is that a separate _pd_phase.id is created and the magnetic-only structure is presented in a separate data block? I think this can be accommodated by simply assigning a different _structure.id to the magnetic structure. As noted, this is just a bit less compact. Also, part of the reason for creating _structure.id is so that the magnetic structure (and an incommensurate structure) has a way to refer to the parent structure that doesn't involve just pointing to a data block.

As an aside, we haven't exactly bedded down how we want the magnetic structure to relate to the structure as currently defined (i.e. the bundle of cell, space group, and atomic positions). We can either absorb magnetic structure into structure by making magnetic space group etc. belong to structure, or we can define a separate magnetic_structure identifier, or we can do both with a magnetic structure being associated with a particular _structure.id.

@rowlesmr
Collaborator

There are a couple of ways of looking at things. One is to see how the dictionary works: can we cope with having the same _pd_phase.id in multiple blocks. The other is to think on "what is a 'phase'?".

What about the first? Can we cope with non-unique values of _pd_phase.id in the dictionary as it currently stands?

Just FYI:

save_pd_phase.id

    _definition.id                '_pd_phase.id'
    _definition.update            2022-12-03
    _description.text
;
    Arbitrary label uniquely identifying a phase.
;
    _name.category_id             pd_phase
    _name.object_id               id
    _type.purpose                 Key
    _type.source                  Assigned
    _type.container               Single
    _type.contents                Text

save_

First, which categories use _pd_phase.id as a key?

  • PD_AMORPHOUS
    • _pd_peak.id & _pd_phase.id
  • PD_CALC_COMPONENT
    • _pd_diffractogram.id, _pd_phase.id, & _pd_data.point_id
  • PD_CALIB_WAVELENGTH
    • _pd_diffractogram.id, _diffrn.id, & _pd_phase.id
  • PD_PHASE
    • _pd_phase.id
  • PD_PHASE_MASS
    • _pd_diffractogram.id & _pd_phase.id
  • PD_PREF_ORIENT
    • _pd_diffractogram.id & _pd_phase.id
  • PD_PREF_ORIENT_MARCH_DOLLASE
    • _pd_diffractogram.id, _pd_pref_orient_March_Dollase.id, & _pd_phase.id
  • PD_PREF_ORIENT_SPHERICAL_HARMONICS
    • _pd_diffractogram.id, _pd_pref_orient_spherical_harmonics.id, & _pd_phase.id
  • PD_QPA_CALIB_FACTOR
    • _pd_phase.id
  • PD_QPA_INTENSITY_FACTOR
    • _pd_diffractogram.id & _pd_phase.id
  • PD_QPA_INTERNAL_STD
    • _pd_diffractogram.id & _pd_phase.id
  • REFLN
    • _refln.index_h, _refln.index_k, _refln.index_l, & _pd_phase.id

There is a preponderance of _pd_diffractogram.id & _pd_phase.id, so as long as _pd_diffractogram.id is globally unique, and there are not multiple _pd_phase.ids in the same diffractogram, then we're golden.
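The condition can be checked mechanically. Pool the key values for a category such as PD_PHASE_MASS from all data blocks and look for repeated (_pd_diffractogram.id, _pd_phase.id) pairs. A repeated _pd_phase.id on its own is harmless; only a repeated pair is a clash. This is a minimal Python sketch. The function name and the representation of rows as tuples are assumptions, not part of any existing tool.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def duplicate_keys(rows: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (diffractogram id, phase id) keys that occur more than once
    when rows from every data block are pooled together."""
    counts = Counter(rows)
    return [key for key, n in counts.items() if n > 1]


# PD_PHASE_MASS rows pooled from the two diffractogram blocks of the
# PbSO4 example later in this thread: same phase, different diffractograms.
rows = [("PWDR PBSO4.CWN Bank 1", "pbso4"), ("PWDR PBSO4.XRA Bank 1", "pbso4")]
print(duplicate_keys(rows))  # []
```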

PD_QPA_CALIB_FACTOR is based solely on _pd_phase.id. I will have to remind myself on how it is supposed to work and maybe add _pd_diffractogram.id to it.

PD_QPA_INTERNAL_STD uses _pd_phase.id to identify the material used as an internal standard in the given diffractogram. As long as it is able to uniquely identify the phase and structure, then it should be good.

REFLN is potentially interesting, as it can also be used to list d-spacings, but you need the structure to do that, not just the phase id.

Second, which categories use _pd_phase.id, but not as a key?

  • _pd_calib_detected_intensity.phase_id
    • A code which identifies the particular phase from which this intensity was taken, if it was calibrated by a specimen.
  • _pd_calib_incident_intensity.phase_id
    • A code which identifies the particular phase from which this intensity was taken, if it was calibrated by a specimen.
  • _pd_calib_xcoord_overall.phase_id
    • A code which identifies the particular phase used in calibrating the X-coordinate, if it was calibrated by a specimen. The phase can be an internal or external standard.
  • _pd_qpa_external_std.phase_id
    • The phase (see _pd_phase.id) used as the external standard.

These are fine, as long as they are enough to uniquely identify the phase and structure.

@rowlesmr
Collaborator

rowlesmr commented Oct 26, 2024

We're going to have to beef up the definitions in PD_PHASE and give examples of how it is supposed to interact with STRUCTURE.*

I'll try and draw up an example CIF.

* Even if it is just to get it right in my head.

@jamesrhester
Contributor Author

I'm also working on some full examples generated from GSAS-II tutorial data. If the QPA standard is given as a phase id, then you'd have to associate only a single structure with that phase. You could instead give a structure id rather than a phase id, and that would be associated with a particular phase.
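In other words, a phase id is only an unambiguous pointer to a standard's structure when exactly one structure maps to that phase. This minimal Python sketch shows that rule. It assumes the structure-to-phase links are available as a dict, and all names in it are hypothetical.

```python
from typing import Dict


def structure_for_standard(phase_id: str, structure_to_phase: Dict[str, str]) -> str:
    """Return the single structure linked to a standard's phase, or raise if
    the phase id does not pin down exactly one structure."""
    candidates = [s for s, p in structure_to_phase.items() if p == phase_id]
    if len(candidates) != 1:
        raise ValueError(
            f"phase {phase_id!r} has {len(candidates)} structures; "
            "cite a structure id instead"
        )
    return candidates[0]


# Hypothetical identifiers.
mapping = {"si_rt": "si_std", "cor_300K": "corundum", "cor_1000K": "corundum"}
print(structure_for_standard("si_std", mapping))  # si_rt
```

Calling it with `corundum` would raise. That is the case where citing a structure id directly avoids the ambiguity.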

@briantoby
Collaborator

I think there are the following types of "project CIFs" generated in GSAS-II:

  1. single-block CIFs (one phase & one histogram)
  2. multi-block combined fits: >1 phase and/or >1 histogram
  3. sequential fits w/1 block per histogram, plus overall blocks (1 phase)
  4. sequential fits w/multiple blocks per histogram, plus overall blocks (>1 phase)

Not sure we have tutorials covering all of these. Probably all but 3, but that can be generated from the sequential fit tutorial if one only includes the majority phase.

There are probably quite a few subcases for 2 if one considers one phase with multiple histograms as distinct from one histogram with multiple phases, and then cases where not all phases are found in all histograms, and also combined powder/single crystal.

@rowlesmr
Collaborator

First: new data name _diffrn.id (also found in mmCIF) labels a particular set of experimental conditions (ambient environment, radiation source, crystal specimen).

AcTuAlLy, it's defined as a label for a diffraction data set collected under particular diffraction conditions (see COMCIFS/MultiBlock_Dictionary#17).

I think it should label the conditions, so that if many diffractograms are collected under the same set of conditions, then you don't need to repeat yourself.
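If _diffrn.id labels the conditions rather than the data set, conditions shared by many diffractograms need to be written only once. This minimal Python sketch shows that de-duplication. It assumes conditions are compared as exact string values, and the `diffrn_N` naming scheme and the scan identifiers are hypothetical.

```python
from typing import Dict, Tuple

Conditions = Dict[str, str]


def assign_diffrn_ids(
    patterns: Dict[str, Conditions],
) -> Tuple[Dict[str, str], Dict[str, Conditions]]:
    """Give diffractograms measured under identical conditions a shared
    diffrn id. Returns (diffractogram id -> diffrn id, diffrn id -> conditions)."""
    id_for_key: Dict[Tuple[Tuple[str, str], ...], str] = {}
    link: Dict[str, str] = {}
    conditions_by_id: Dict[str, Conditions] = {}
    for pattern_id, conditions in patterns.items():
        key = tuple(sorted(conditions.items()))  # order-independent identity
        if key not in id_for_key:
            new_id = f"diffrn_{len(id_for_key) + 1}"  # hypothetical naming scheme
            id_for_key[key] = new_id
            conditions_by_id[new_id] = dict(conditions)
        link[pattern_id] = id_for_key[key]
    return link, conditions_by_id


patterns = {
    "scan_1": {"_diffrn.ambient_temperature": "300.0", "_diffrn_radiation.probe": "x-ray"},
    "scan_2": {"_diffrn.ambient_temperature": "300.0", "_diffrn_radiation.probe": "x-ray"},
    "scan_3": {"_diffrn.ambient_temperature": "300.0", "_diffrn_radiation.probe": "neutron"},
}
link, conditions = assign_diffrn_ids(patterns)
print(link)  # {'scan_1': 'diffrn_1', 'scan_2': 'diffrn_1', 'scan_3': 'diffrn_2'}
```

Three diffractograms end up referring to two sets of conditions, so the shared set is written once.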

@rowlesmr
Collaborator

There are probably quite a few subcases for 2 if one considers one phase with multiple histograms as distinct from one histogram with multiple phases, and then cases where not all phases are found in all histograms, and also combined powder/single crystal.

Definitely.

I think this is where the stress test lies. Taking a temperature-dependent experiment as a baseline (could be time, pressure, magnetic field, any other combination you'd like...)

  • Multi-diffractogram data set
    • can also include neutron CW, neutron TOF, and multiple CW X-ray diffractograms at each temperature
  • Multiple phases over all diffractograms
    • The same phases may exist in many diffractograms, may not appear in some, may appear after disappearing...
  • Multiple structures per phase
    • you have structural changes within a phase as you heat it up, but it is still (for instance) corundum.

I don't think we currently have the ability to define a structure or phase that has been co-refined over multiple diffractograms. Is this a thing we want to look at? (PD_DIFFRACTOGRAM_GROUP, anyone?)

Does core CIF worry about a structure being determined from multiple data sets?

@jamesrhester
Contributor Author

I don't think we currently have the ability to define a structure or phase that has been co-refined over multiple diffractograms. Is this a thing we want to look at? (PD_DIFFRACTOGRAM_GROUP, anyone?)

Well, using _structure.id and the new STRUCTURE category, a structure is identified using _structure.id. In the case of multiple diffractograms being used to refine a single structure, the structure is associated with a phase using the proposed _structure.phase_id. Then, for example, PD_PHASE_MASS looks like it lists the phases modelled as being present in a given diffractogram, so that's one avenue to express a single phase in multiple diffractograms, and perhaps there are others depending on how the knowledge of which phase is in which sample has come about.

The concept of a refinement has not yet been added to core CIF (which means that implicitly the results in a CIF are from a single refinement), so that's the next frontier. You could imagine a pointer in the structure category to a _refinement.id to indicate that this structure resulted from the indicated refinement.

@jamesrhester
Contributor Author

jamesrhester commented Nov 6, 2024

Please see below draft of first example: one phase, two measurements.

Each measurement is in a separate data block, each set of diffraction conditions is also in a separate data block, and all other information is in a single data block. Data blocks are linked through data names that refer to _diffrn.id and _pd_phase.id. I've used PD_PHASE_MASS to link a phase to a measurement.

Key issue: there is no well-defined value for _structure.diffrn_id, as we unfortunately have historically mixed environmental conditions and probe into a single category. Not a show-stopper, as it is an optional value. The remaining data names allow deduction of the environmental conditions for the structure by going structure.id -> phase_id, and noting that both diffractograms contain that phase_id, then determining that their diffrn.id has the same conditions.
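The deduction in the previous paragraph can be written out step by step. This Python sketch uses identifiers and values from the example that follows. The function, its argument layout, and the choice of which fields count as "environment" are assumptions for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def structure_conditions(
    structure_id: str,
    structure_to_phase: Dict[str, str],
    phase_mass_rows: List[Tuple[str, str]],  # (diffractogram id, phase id)
    diffractogram_to_diffrn: Dict[str, str],
    diffrn_conditions: Dict[str, Dict[str, str]],
    fields: Sequence[str],
) -> Dict[str, str]:
    """Follow structure -> phase -> diffractograms -> diffrn conditions, and
    require every diffractogram containing the phase to agree on `fields`."""
    phase_id = structure_to_phase[structure_id]
    diffractograms = [d for d, p in phase_mass_rows if p == phase_id]
    seen = {
        tuple(diffrn_conditions[diffractogram_to_diffrn[d]][f] for f in fields)
        for d in diffractograms
    }
    if len(seen) != 1:
        raise ValueError(f"no single set of {list(fields)} for {structure_id!r}")
    return dict(zip(fields, seen.pop()))


cwn, xra = "PWDR PBSO4.CWN Bank 1", "PWDR PBSO4.XRA Bank 1"
args = (
    {"pbso4_rt": "pbso4"},
    [(cwn, "pbso4"), (xra, "pbso4")],
    {cwn: "11158", xra: "11080"},
    {
        "11158": {"_diffrn.ambient_temperature": "300.0",
                  "_diffrn.ambient_pressure": "0.1",
                  "_diffrn_radiation.probe": "neutron"},
        "11080": {"_diffrn.ambient_temperature": "300.0",
                  "_diffrn.ambient_pressure": "0.1",
                  "_diffrn_radiation.probe": "x-ray"},
    },
)
env = ("_diffrn.ambient_temperature", "_diffrn.ambient_pressure")
print(structure_conditions("pbso4_rt", *args, env))
# {'_diffrn.ambient_temperature': '300.0', '_diffrn.ambient_pressure': '0.1'}
```

Adding `_diffrn_radiation.probe` to the requested fields would raise, because the two diffractograms differ in probe. This is the mixing of environment and probe in one category that makes _structure.diffrn_id ill-defined here.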

A better solution is for us to deprecate _diffrn_radiation.diffrn_id (it's already in mmCIF, unfortunately) and instead define _diffrn_radiation.id, so that an experiment can be cobbled together from a _diffrn.id, a _diffrn_radiation.id, and an _exptl_crystal.id.

Notes:

  1. Example assumes data names at beginning of this issue have been defined
  2. Example implicitly assumes _diffrn_radiation_wavelength.diffrn_id has been defined (see here)
  3. _structure.id and _pd_phase.id are in the same data block as there is only one of each. A multi-temperature or multi-phase example would require splitting them apart.
#\#CIF_2.0
#
# Example of using CIF to describe two data sets, one phase
#
# Assumes use of proposed data names.
#
# There are five data blocks:
# 2 x diffraction experimental conditions
# 2 x raw powder data using `_pd_diffractogram.diffrn_id` to
#     refer to the relevant diffraction conditions
# 1 x data block for everything else
#
data_PWDR_PBSO4.CWN_Bank_1

_pd_diffractogram.id	'PWDR PBSO4.CWN Bank 1'
_pd_diffractogram.diffrn_id   11158    # <-proposed
_pd_phase_mass.phase_id  pbso4
_pd_phase_mass.percent   100

    loop_
      _pd_meas.2theta_scan
      _pd_meas.intensity_total
      _pd_meas.intensity_total_su
         10.0                          220.0             0.004
         10.05                         214.0             0.004
         10.1                          219.0             0.004
         10.15                         224.0             0.004
         10.2                          198.0             0.005
         10.25                         229.0             0.004
         10.3                          224.0             0.004

#...

data_PWDR_PBSO4.XRA_Bank_1

_pd_diffractogram.id	'PWDR PBSO4.XRA Bank 1'
_pd_diffractogram.diffrn_id   11080    # <-proposed
_pd_phase_mass.phase_id  pbso4
_pd_phase_mass.percent   100

    loop_
      _pd_meas.2theta_scan
      _pd_meas.intensity_total
      _pd_meas.intensity_total_su
         10.0                         179.0             0.005
         10.025                       147.0             0.006
         10.05                        165.0             0.006
         10.075                       172.0             0.005
         10.1                         150.0             0.006
         10.125                       165.0             0.006
#...

data_11158

_diffrn.id	11158

_diffrn.ambient_pressure	0.1
_diffrn.ambient_temperature	300.0
_diffrn_radiation.probe	        neutron
_diffrn_radiation_wavelength.value    1.909

data_11080

_diffrn.id	11080

_diffrn.ambient_pressure	0.1
_diffrn.ambient_temperature	300.0
_diffrn_radiation.probe	x-ray

loop_
      _diffrn_radiation_wavelength.id
      _diffrn_radiation_wavelength.value
     1   1.5405
     2   1.5443

data_classic

_pd_phase.id          pbso4

# Following two could be elided as no ambiguity
_structure.id         pbso4_rt
_structure.phase_id   pbso4  # <- Proposed

_cell.angle_alpha	90.0
_cell.angle_beta	90.0
_cell.angle_gamma	90.0
_cell.length_a	         8.485
_cell.length_b	         5.402
_cell.length_c	         6.965
_cell.volume           319.305

_space_group.crystal_system	orthorhombic
_space_group.laue_class	        mmm
_space_group.name_h-m_ref	'P n m a'

loop_
      _atom_site.label
      _atom_site.fract_x
      _atom_site.fract_y
      _atom_site.fract_z
      _atom_site.type_symbol
   Pb1       0.1882            0.25             0.167       Pb
   S2        0.063             0.25             0.686       S
   O3        -0.095            0.25             0.6         O
   O4        0.181             0.25             0.543       O
   O5        0.085             0.026            0.806       O

@briantoby
Collaborator

I dislike putting the measurement conditions in a separate block from the diffraction pattern data itself. To me they are very much linked and I see little advantage from separating them, so I would go with three blocks here rather than five. Perhaps four, since I like to have something that serves as a TOC. In this case the TOC info can be combined with the Phase block, but with multiple phases, that would need to be free-standing.

@rowlesmr
Collaborator

rowlesmr commented Nov 7, 2024

Please see below draft of first example: one phase, two measurements.

@jamesrhester I think you have a typo, as the X-ray diffrn.id isn't referenced anywhere

@jamesrhester
Contributor Author

Fixed

@jamesrhester
Contributor Author

I dislike putting the measurement conditions in a separate block from the diffraction pattern data itself. To me they are very much linked and I see little advantage from separating them, so I would go with three blocks here rather than five. Perhaps four, since I like to have something that serves as a TOC. In this case the TOC info can be combined with the Phase block, but with multiple phases, that would need to be free-standing.

There is indeed no technical reason that values corresponding to the _diffrn.id under which a particular diffractogram was measured couldn't be put together into a single block with the diffractogram. This approach just becomes repetitive if many diffractograms are collected under identical conditions. I suggest that when we come to draft recommendations for presenting complex PD datasets, putting measurement conditions and diffractogram in one block can be one of them.
