Can the content type of _journal_index.id be changed from "Integer" to "Word"? #345

Closed
vaitkus opened this issue Feb 2, 2023 · 6 comments
Labels: janitorial (small editing tasks), question

@vaitkus
Collaborator

vaitkus commented Feb 2, 2023

As discussed in issue #316, the content type of arbitrary loop identifiers in DDLm is being changed to Word, which covers all case-sensitive CIF2 strings without whitespace symbols (e.g. 1, a, c7A). In most cases this migration was quite straightforward; however, the definition of the _journal_index.id item explicitly states that it is an "Index number identifier" and restricts all of the values to positive integers.
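To make the difference between the two content types concrete, here is a minimal Python sketch. It assumes a simplified reading of Word (any non-empty string without whitespace) and models the current restriction as "positive integer"; the function names `is_word` and `is_positive_integer` are hypothetical and not part of any CIF library, and the real DDLm definitions may impose further constraints.

```python
import re

# Assumption: simplified stand-in for the DDLm "Word" content type,
# i.e. a non-empty string that contains no whitespace characters.
WORD_RE = re.compile(r"\S+")

# Assumption: the current restriction on _journal_index.id modeled as
# an optionally signed run of ASCII digits whose value is greater than zero.
INT_RE = re.compile(r"\+?[0-9]+")


def is_word(value: str) -> bool:
    """Return True if the value would pass the simplified Word check."""
    return WORD_RE.fullmatch(value) is not None


def is_positive_integer(value: str) -> bool:
    """Return True if the value is a positive integer written in ASCII digits."""
    return INT_RE.fullmatch(value) is not None and int(value) > 0


if __name__ == "__main__":
    # Every value accepted by the integer check is also accepted as a Word,
    # so existing integer identifiers would remain valid after the change.
    for candidate in ["1", "a", "c7A", "a b"]:
        print(candidate, is_word(candidate), is_positive_integer(candidate))
```

Under these assumptions the change only widens the set of accepted values: identifiers such as `a` or `c7A` become valid, while values that were already valid integers stay valid.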

The _journal_index.id data item seems to have only been added in the DDLm version of the CIF_CORE dictionary and should therefore have no established usage practices. Are there any objections to changing its content type to Word?

Also, note that the _journal_index_id alias does not seem to have ever been used before and so this alias could potentially be completely removed.

Based on comments left in #341 I assume that @publcif and @nautolycus may have a stake in this.

@vaitkus
Collaborator Author

vaitkus commented Jul 8, 2023

Some time has passed, so I assume it is reasonable to send out a reminder. @publcif, @nautolycus, do you foresee any issue with changing the content type of _journal_index.id from Integer to Word? (See the previous comment for more context.)

@nautolycus
Collaborator

Sorry for the delay. I see no problem with the suggested change. If effected, the _description.text should be changed from "Index number identifier of the JOURNAL_INDEX category" to something like "Unique identifier for a journal index entry term" (and of course the enumeration range omitted). I'm also happy if the
_alias.definition_id '_journal_index_id'
is dropped.

However,
[1] I don't know the history of this term and am curious why it was introduced when other similar categories (JOURNAL, PUBL, PUB_SECTION etc.) don't have a similar synthetic key.
[2] The JOURNAL_INDEX category catered for the 1990s practice of typesetting separate subject and formulae indexes, and I don't think this has been done (at least for IUCr journals) for ages.

@vaitkus
Collaborator Author

vaitkus commented Jul 13, 2023

@nautolycus, thank you for the information, I have created PR #449 based on your comments.

> However,
> [1] I don't know the history of this term and am curious why it was introduced when other similar categories (JOURNAL, PUBL, PUB_SECTION etc.) don't have a similar synthetic key.

My guess would be that _journal_index.id was initially introduced in an attempt to normalize the JOURNAL_INDEX category when migrating from DDL1 to DDLm (neither _journal_index.id nor _journal_index_id is in the DDL1 version of the dictionary). Given that, it is very unlikely that this item is currently in use anywhere, so it should be safe to modify it in any way we want.

> [2] The JOURNAL_INDEX category catered for the 1990s practice of typesetting separate subject and formulae indexes, and I don't think this has been done (at least for IUCr journals) for ages.

From the definitions in the DDL1 dictionary, it seems that items from the JOURNAL_INDEX category were only intended to be used internally by publishers: "The creator of a CIF will not normally specify these data items." I checked the COD entries and indeed none of them contain data items from the JOURNAL_INDEX category. If they are indeed no longer in use, it might make sense to move them and similar publisher-specific items into a separate dictionary for clarity and conciseness of CIF_CORE. Is this something worth undertaking?
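For anyone wanting to repeat a similar check on their own files, the following is a rough Python sketch and not the method actually used on the COD. It assumes a naive line-based scan rather than a real CIF parser, so it ignores quoting and multi-line text fields; the function name `find_journal_index_tags` is hypothetical.

```python
import re
from typing import Set

# Matches data names of the JOURNAL_INDEX category at the start of a
# whitespace-separated token, in either DDL1 style (_journal_index_term)
# or DDLm style (_journal_index.term). CIF data names are case-insensitive.
TAG_RE = re.compile(r"(?<!\S)(_journal_index[._]\S*)", re.IGNORECASE)


def find_journal_index_tags(cif_text: str) -> Set[str]:
    """Return the lower-cased JOURNAL_INDEX data names found in the text.

    Assumption: a naive scan that only skips whole-line comments; a data
    name appearing inside a quoted string or a text field would still be
    reported, so hits should be checked by hand or with a proper parser.
    """
    found: Set[str] = set()
    for line in cif_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # whole-line comment
        for match in TAG_RE.finditer(line):
            found.add(match.group(1).lower())
    return found


if __name__ == "__main__":
    sample = (
        "data_example\n"
        "_journal_name_full 'Acta Cryst.'\n"
        "loop_\n"
        "_journal_index_type\n"
        "_journal_index_term\n"
        "O 'some term'\n"
    )
    print(sorted(find_journal_index_tags(sample)))
```

An empty result for every file in a collection would be consistent with the category being unused there, subject to the limitations of the naive scan.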

@nautolycus
Collaborator

> If they are indeed no longer in use, it might make sense to move them and similar publisher-specific items into a separate dictionary for clarity and conciseness of CIF_CORE.

I prefer not to do this at the moment. There are other categories in the "PUBLICATION" category of CIF_CORE that are really rather generic metadata, so the question then arises of whether to move more (or all) of these into a separate dictionary (or dictionaries) not specific to crystallography. One thing I will look at when we review the content of CIF_CORE for Volume G is whether it would be useful to amplify the definitions for the JOURNAL_* categories. Although they were designed with IUCr journals specifically in mind, giving more detailed definitions might in the long term encourage their use by other publishers.

@jamesrhester
Contributor

Closing this issue as the original concern is solved.

@vaitkus
Collaborator Author

vaitkus commented Sep 11, 2023

@nautolycus Thank you for the response. A lot of publishers that provide crystal structures in CIF format as supplementary material are already using these data names to relate crystal structures to specific publications (good!), and so do some databases (e.g. the COD). I know that there is also the CITATION category, which allows more than one source to be provided, etc., but this one seems to be much less used.

@jamesrhester Thank you for closing the issue, I had somehow forgotten about it.
