Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAC Codelist update from 28-02-2020 #326

Merged
merged 9 commits into from
Apr 9, 2020
Merged

DAC Codelist update from 28-02-2020 #326

merged 9 commits into from
Apr 9, 2020

Conversation

akmiller01
Copy link
Contributor

No description provided.

@andylolz
Copy link
Contributor

There are lots of instances of withdrawn codes being removed completely in this PR. Withdrawn codes should not be removed

@akmiller01
Copy link
Contributor Author

Thank you, @andylolz ! Working with a bit of inherited conversion code and the above was literally the result of a misplaced indentation in a bit of code that was trying to add the "withdrawn" status where a code was omitted:

From:

for key, element in iati_codes.items():
        if key not in dac_codes.keys():
            if element.attrib['status'] != 'withdrawn':
                element.attrib['status'] = 'withdrawn'
                dac_codes[key] = element

To:

for key, element in iati_codes.items():
        if key not in dac_codes.keys():
            if element.attrib['status'] != 'withdrawn':
                element.attrib['status'] = 'withdrawn'
            dac_codes[key] = element

@andylolz
Copy link
Contributor

andylolz commented Mar 10, 2020

Fix sounds good to me.

Working with a bit of inherited conversion code

Interesting – is the conversion code public, at all?

In case it’s of use, the code we are using to do this same task for codelists.codeforiati.org is here:
https://github.com/codeforIATI/codelist-updater/blob/master/importers/helpers/__init__.py

@akmiller01
Copy link
Contributor Author

Fix sounds good to me.

Working with a bit of inherited conversion code

Interesting – is the conversion code public, at all?

In case it’s of use, the code we are using to do this same task for codelists.codeforiati.org is here:
https://github.com/codeforIATI/codelist-updater/blob/master/importers/helpers/__init__.py

Yep absolutely. I've hosted it here https://github.com/akmiller01/DAC-Codelists but I don't know if it would be more appropriate to place it elsewhere.

It does seem like parts of it have already been sourced from your Codelist Updater.

@andylolz
Copy link
Contributor

andylolz commented Mar 11, 2020

Great – many thanks for sharing this!

It does seem like parts of it have already been sourced from your Codelist Updater.

Seems very possible that they have a shared ancestor. The codelist-updater code is based on the code in this pull request, which in turn is based on the code in this pull request.

I don't know if it would be more appropriate to place it elsewhere.

I’d suggest moving to an IATI owned repo.

@stevieflow
Copy link
Contributor

Don't forget all the previous work around this:

#51
#172

xml/Sector.xml Outdated
@@ -81,7 +81,7 @@
</description>
<category>112</category>
</codelist-item>
<codelist-item status="active">
<codelist-item status="withdrawn">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akmiller01 let's check this. There are a number of codes that have been changed from active to withdrawn and this looks incorrect to me:

  1. Is there a "withdrawn-date" that you can identify in the source XML?
  2. This is a "voluntary code" in OECD DAC terms. In IATI we don't differentiate between voluntary and non-voluntary, so my thinking is there is some error there. I think the status should still be "active".
    image
    Let's cross-check this for other codes.

xml/Sector.xml Outdated
@@ -249,6 +249,18 @@
</description>
<category>121</category>
</codelist-item>
<codelist-item status="withdrawn" activation-date="2019-01-01">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akmiller01 this looks like another one where codelist has been listed as "withdrawn", while it should be "active". Maybe there was an error in the code replacing "Vonlontary basis" with "withdrawn" which is incorrect?
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I was recoding "Vonlontary basis" as withdrawn. This will be corrected in the next commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @akmiller01 !

@@ -810,10 +602,6 @@
<narrative>Purchase of securities from issuing agencies</narrative>
<narrative xml:lang="fr">Titres et autres instruments émis par les agences multilatérales</narrative>
</name>
<description>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akmiller01 can you see a reason why this was removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was code to remove descriptions where narratives are empty; would we rather not do that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes logical sense. I just did not understand why they were removed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to remove this stuff, but I wonder if it should be a separate PR. It creates a lot of noise in this one, which also contains important changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Reversed for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah – that’s much better! Thanks

@@ -1,4 +1,4 @@
<codelist name="CRSChannelCode" xml:lang="en" complete="1" embedded="0">
<codelist name="Channel-category" xml:lang="en" complete="1" embedded="0">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t know if the name of the codelist should be changed.

</codelist-item>
<codelist-item status="active" activation-date="2015-12-07">
<codelist-item status="withdrawn" activation-date="2010-01-01" withdrawal-date="2010-12-31">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think this codelist item has been withdrawn. There are two withdrawn entries for this item, but also one active one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the source XML, code 21011 does indeed appear three times. Active from January 1, 2003, withdrawn on December 31, 2003; reactivated January 1, 2004 (without an in-line withdrawal); and then reactivated again on January 1, 2010, withdrawn December 31, 2010.

Maybe it's worth asking the OECD about the intended meaning, @PetyaKangalova ? I think it would be fair to concatenate codelist items when codes and narratives match verbatim (which would ultimately make this item withdrawn, with an activate date of 2003 and withdrawal date at the end of 2010?), but I don't want to make that determination for them. I'm also assuming that we don't want codelist items to have noncontiguous active time periods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @akmiller01 ! Will check on this one.

Copy link
Contributor

@andylolz andylolz Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which would ultimately make this item withdrawn, with an activate date of 2003 and withdrawal date at the end of 2010?

I don’t think that’s necessarily the case. It completely depends on the rules you decide to apply when merging codelist items.

It’s also worth noting that in this instance, this code is present in the XLS:
Screenshot 2020-03-13 at 10 39 57

Anyway – thanks for checking with the DAC on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andylolz this code has been reactivated.

</codelist-item>
<codelist-item status="active" activation-date="2015-12-07">
<codelist-item status="withdrawn" activation-date="2010-01-01" withdrawal-date="2010-12-31">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think this codelist item has been withdrawn. There are two withdrawn entries for this item, but also one active one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reactivated.

@@ -458,7 +458,7 @@
<narrative xml:lang="fr"/>
</description>
</codelist-item>
<codelist-item status="withdrawn">
<codelist-item status="withdrawn" withdrawal-date="2020-02-28">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think this withdrawal-date is correct. This code was withdrawn a long time ago – it wasn’t present in March 2017, so it was withdrawn earlier than that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andylolz I already emailed yesterday the OECD to check about all the withdrawn dates that were missing and ask them if they can specify the dates :) I'll update here once I hear back. Thanks for cross-checking the date for this one.

@@ -163,7 +163,7 @@
<narrative xml:lang="fr"/>
</description>
</codelist-item>
<codelist-item status="withdrawn">
<codelist-item status="withdrawn" withdrawal-date="2020-02-28">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code has been withdrawn since at least June 2017.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Checking with the OECD and will be updating the dates once we hear back from them.

@andylolz
Copy link
Contributor

It appears as though missing withdrawal dates have been set to 28th Feb 2020 throughout. I think that could potentially cause confusion. Many of these codes were withdrawn long before that date. Where the withdrawal date is not known, I think it’s probably safer to leave it blank.

@PetyaKangalova
Copy link
Contributor

@andylolz thanks for all your comments. On the dates, we indeed discussed this with Alex and now waiting to hear back from the OECD if they can specify withdrawn dates where they were missing. Will provide an update here once I get a response.

@akmiller01
Copy link
Contributor Author

Attaching documents from the OECD. One spreadsheet which has activation/withdrawn years for specific sector codes. And two PDFs with withdraw decisions for the 151 and 920 series:

oldPurposecodes.xlsx

https://one.oecd.org/document/DCD/DAC/STAT(2010)6/en/pdf
https://one.oecd.org/document/DCD/DAC/STAT(2012)3/en/pdf

Copy link
Contributor

@PetyaKangalova PetyaKangalova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akmiller01 Thanks for the latest updates. I have now checked the updated withdrawn dates and looks good to me so happy to approve the full list of changes !

Copy link

@alexlydiate alexlydiate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good from my perspective.

@akmiller01 akmiller01 merged commit 394e449 into master Apr 9, 2020
@akmiller01 akmiller01 deleted the Feb-28-CRS-Update branch April 9, 2020 15:50
@andylolz
Copy link
Contributor

This is great! Great work getting the activation and withdrawal dates from the DAC 🎉

It looks like those dates were added to the source XML by the DAC – is that correct?

With these latest improvements, it looks like the XML may have overtaken the XLS as the canonical source for DAC codelist data!

Worth noting that some codelists are still only present in the DAC XLS, and not the XML. I think the only one that IATI replicates is Region. It would be great to get this added to the XML.

@akmiller01
Copy link
Contributor Author

Hi Andy, the OECD did not have time to add the new dates to their source XML, so I manually went into the output and added them according to the XLSX and PDF documents attached in a comment above. We'll reach out to them about the French translation you noted.

You can preview some of the new codes in the SSOT here: http://dev.reference.iatistandard.org/203/codelists/Sector/

I believe all looks good on our end, so shortly we'll be deploying that to http://reference.iatistandard.org as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants