January 2019 DAC codelist updates #283

Merged
merged 8 commits into from
Feb 27, 2019

@andylolz andylolz commented Jan 24, 2019

These updates are all from the DAC XML source, available here:

https://webfs.oecd.org/crs-iati-xml/Lookup/

This replaces #249.


To summarise the changes here:

Aid Type

Codes added:

  • H03 - Asylum-seekers ultimately accepted
  • H04 - Asylum-seekers ultimately rejected
  • H05 - Recognised refugees

Sector Category

Codes added:

  • 123 - Non-communicable diseases (NCDs)

Sector

Codes added:

  • 11250 - School feeding
  • 12310 - NCDs control, general
  • 12320 - Tobacco use control
  • 12330 - Control of harmful use of alcohol and drugs
  • 12340 - Promotion of mental health and well-being
  • 12350 - Other prevention and treatment of NCDs
  • 12382 - Research for prevention and control of NCDs
  • 15190 - Facilitation of orderly, safe, regular and responsible migration and mobility
  • 16070 - Labour Rights
  • 16080 - Social Dialogue
  • 24050 - Remittance facilitation, promotion and optimisation
  • 25030 - Business development services
  • 25040 - Responsible Business Conduct
  • 43060 - Disaster Risk Reduction
  • 43071 - Food security policy and administrative management
  • 43072 - Household food security programmes
  • 43073 - Food safety and quality
  • 74020 - Multi-hazard response preparedness
  • 93011 - Refugees/asylum seekers in donor countries - food and shelter
  • 93012 - Refugees/asylum seekers in donor countries - training
  • 93013 - Refugees/asylum seekers in donor countries - health
  • 93014 - Refugees/asylum seekers in donor countries - other temporary sustenance
  • 93015 - Refugees/asylum seekers in donor countries - voluntary repatriation
  • 93016 - Refugees/asylum seekers in donor countries - transport
  • 93017 - Refugees/asylum seekers in donor countries - rescue at sea
  • 93018 - Refugees/asylum seekers in donor countries - administrative costs

Codes withdrawn:

  • 41050 - Flood prevention/control
  • 74010 - Disaster prevention and preparedness

Finance Type Category

Codes added:

  • 0 - NON FLOW ITEMS

Finance Type

Codes added:

  • 1 - GNI: Gross National Income
  • 2 - ODA % GNI
  • 3 - Total Flows % GNI
  • 4 - Population

@samuele-mattiuzzo
Contributor

@andylolz for the BAs' benefit, would you mind providing a diff between your import and the original XML file (if you have something like that available that is)? Do you use the script you suggested we'd use in another PR?

Thank you!

@andylolz
Contributor Author

andylolz commented Jan 25, 2019

Do you use the script you suggested we'd use in another PR?

Nope – I did this manually :) The script in #172 processes the DAC Excel file (well, it processes some CSV on datahub.io, but that comes from the DAC Excel file). @bill-anderson appears to suggest the Excel file should not be used ("more sustainable solution" etc) so this PR uses the XML instead.

would you mind providing a diff between your import and the original XML file (if you have something like that available that is)?

A diff is maybe tricky because the DAC XML file (available here) is just one big file. But I can explain the steps I went through. I did the following bits of cleanup:

  • Pretty print (with 4 space indentation)
  • Remove DAC namespace elements, because these aren’t valid
  • Remove trailing whitespace in text nodes
  • Replace bad @statuses with either "active" or "withdrawn" (The @status attribute in the DAC XML sometimes contains "active-MCD", "active- Pilot" or "Vonlontary basis" [sic], which are not valid statuses. I’ve flagged this issue with your colleagues and with Valerie by email, but for the purposes of this PR I’ve fixed these manually.)
  • Split into constituent files
  • Add the IATI metadata back (e.g. DAC calls the channel code codelist "Channelcode", whereas IATI calls it "CRSChannelCode")
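
The steps above were done by hand. As a rough sketch only, the whitespace, status and pretty-printing steps might be scripted along these lines. The element names assume the IATI codelist layout (`codelist-item` with a `status` attribute), and the `STATUS_FIXES` mapping is an assumption: the thread does not say which replacement each bad value received. Namespace removal, splitting into files and restoring the IATI metadata are left out.

```python
import xml.etree.ElementTree as ET

VALID_STATUSES = {"active", "withdrawn"}

# Assumed mapping for the bad statuses mentioned above; check with the DAC
# before relying on it.
STATUS_FIXES = {
    "active-MCD": "active",
    "active- Pilot": "active",
    "Vonlontary basis": "active",
}


def clean_codelist(xml_text: str) -> str:
    """Return cleaned, 4-space-indented XML for one codelist."""
    root = ET.fromstring(xml_text)

    # Strip surrounding whitespace from text nodes that carry real content.
    for element in root.iter():
        if element.text and element.text.strip():
            element.text = element.text.strip()

    # Replace known bad statuses; fail loudly on anything unexpected.
    for item in root.iter("codelist-item"):
        status = item.get("status")
        if status is None or status in VALID_STATUSES:
            continue
        if status not in STATUS_FIXES:
            raise ValueError(f"Unrecognised status: {status!r}")
        item.set("status", STATUS_FIXES[status])

    ET.indent(root, space="    ")  # requires Python 3.9+
    return ET.tostring(root, encoding="unicode")
```

Raising on an unrecognised status, rather than silently defaulting, keeps any new bad value in the source visible to whoever runs the import.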

I think that’s everything. Here’s what I haven’t done:

  • Reordered codes to match the previous order (which would probably make diffs a bit easier to read)
  • Gone through and checked for removed codes (anything removed is a problem, since it should instead be marked as status="withdrawn")
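
The second check could be automated. A minimal sketch, assuming both the previous IATI codelist and the new import use `codelist-item` elements containing a `code` child (the function names here are illustrative, not part of any existing IATI tooling):

```python
import xml.etree.ElementTree as ET


def codes_in(xml_text: str) -> set[str]:
    """Collect every <code> value found under a <codelist-item>."""
    root = ET.fromstring(xml_text)
    codes = set()
    for item in root.iter("codelist-item"):
        code = item.findtext("code")
        if code and code.strip():
            codes.add(code.strip())
    return codes


def removed_codes(old_xml: str, new_xml: str) -> list[str]:
    """Codes present in the old codelist but absent from the new one."""
    return sorted(codes_in(old_xml) - codes_in(new_xml))
```

Any code this returns would need to be restored with status="withdrawn" rather than dropped.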

The diff in this PR shows that quite a lot of stuff has changed. I guess that’s mostly because the source has changed from Excel to XML, and there are some mismatches between the two. I think it will be difficult to verify and merge this PR for that reason. If the goal is to eventually use XML from the DAC as the source for these replicated codelists, then I’d be tempted to go back to the DAC technical team with a list of stuff to fix at their end, and use the Excel file as the source in the interim.


I’m very pleased you’re looking at this, because it’s really important that these replicated codelists are kept in sync with source. For instance, a validator might say that a dataset is invalid because a bad sector code is used, when in fact the problem might be that the IATI replicated Sector codelist is out of sync, and doesn’t include a complete list of sector codes. A publisher could also be scored down on the Aid Transparency Index for the same reason. Or an aid management system might rely on these codelists for interpreting published IATI data.
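
To make the validator case concrete, here is a toy sketch (not the real IATI validator; the function name and the two older codes are illustrative) of how a stale replicated codelist flags a perfectly good code:

```python
def unknown_sector_codes(reported: list[str], codelist: set[str]) -> list[str]:
    """Return reported codes that the validator's codelist does not contain."""
    return [code for code in reported if code not in codelist]


# A stale copy of the codelist, taken before 11250 (School feeding) was added.
stale_codelist = {"11110", "11220"}
current_codelist = stale_codelist | {"11250"}

reported = ["11220", "11250"]
print(unknown_sector_codes(reported, stale_codelist))    # ['11250']
print(unknown_sector_codes(reported, current_codelist))  # []
```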

Anyway – I’d be happy to discuss next steps.

@samuele-mattiuzzo
Contributor

@andylolz fab, thanks! Petya and the BAs have this to check on their todo list, it'll be checked during this week!

@PetyaKangalova
Contributor

@andylolz thanks so much for your work on this and for clarifying the steps you have undertaken. The crucial bit here is to again get confirmation from the OECD DAC that the Excel and XML include exactly the same content, which at the moment is not the case! We were promised that the XML would be in sync with the source file. I have copied you in on the email I sent to Valerie from the DAC so that we get an answer from them and are able to proceed with the changes as soon as possible. Thanks again!

@PetyaKangalova
Contributor

We have now received a response from the OECD that the XML file has been updated and that both the Excel and XML files have been pulled from the same source.

Both xml and xl file have been regenerated (from SQL as unique source, except for Channel codes) and are available on our website http://www.oecd.org/dac/financing-sustainable-development/development-finance-standards/dacandcrscodelists.htm.

From a quick look at the differences I had identified before, the codelists are now identical in the Excel and XML files, so I think we can use the updated XML to update the codelists on the IATI website.

@andylolz Is there a way of easily re-doing what you have done so far with the updated XML file? Then I can review the pull request. If it requires a lot of manual work for you, then I can look into making the comparison and adding the pull requests.

@andylolz
Contributor Author

andylolz commented Feb 1, 2019

@PetyaKangalova no problem – I’ll try and get this sorted today.

@samuele-mattiuzzo
Contributor

Thank you Andy!

@PetyaKangalova
Contributor

Thanks @andylolz ! I am off on Monday and in meetings all of Tuesday but should be able to review mid-next week! Thanks again!

@andylolz
Contributor Author

andylolz commented Feb 1, 2019

Okay – PR updated using the latest (updated) version of DAC XML. I followed the same steps described above.

@PetyaKangalova
Contributor

@andylolz thanks again for redoing the commit. Really appreciate it! It took me a while to review all the changes as there are quite a lot of them! See summary below:

  • Aid Type Category- ready to approve changes
  • Aid Type- ready to approve changes
  • CRS Channel Code- Valerie confirmed that CRS Channel Code is the only one not created from source. In the XML you have used, some existing codes are missing, and I don’t think that is on purpose
  • Collaboration Type- ready to approve changes
  • Finance Type Category- ready to approve changes
  • Finance Type- ready to approve changes
  • Flow Type- ready to approve changes
  • Sector- ready to approve changes except for code 74010 and 74020
  • Sector Category- need to understand why descriptions have been removed before approving

Next steps:

  1. @andylolz , would you be able to remove the CRS Channel Code section from the pull request for now? I will contact Valerie to get confirmation, but I feel this one will take some time, as the XML has not been created from their source database, and I don’t want to hold up the other changes.
  2. I will contact Valerie to get confirmation on whether sector code 74010 has been withdrawn and why the descriptions for sector categories have been removed.
  3. Once I get confirmation on 2, I will make the necessary changes and approve the pull request. As I was reviewing the changes I kept track of all of them (whether it was a code addition or a change of name or description). I will then work on adding all changes to the non-embedded codelist changelog (just for the sector codelist there are more than 40 changes so might take us some time)
  4. Once codelist changes and changelog have been approved and deployed, we will add a post on IATI Discuss.
  5. We will also contact publishing tool providers to make them aware of the changes.

@andylolz
Contributor Author

andylolz commented Feb 13, 2019

It took me a while to review all the changes as there are quite a lot of them!

There are indeed! Great work reviewing!

  1. @andylolz , would you be able to remove the CRS Channel Code section from the pull request for now?

Kk, done.

  • Sector- ready to approve changes except for code 74010 and 74020

Oh, good spot! The same applies to 41050 (Flood prevention/control), which has also disappeared.

Also, the following withdrawn sector codes have disappeared:

  • 15120: Public sector financial management
  • 15140: Government administration
  • 15161: Elections
  • 15162: Human rights
  • 15163: Free flow of information
  • 15164: Women's equality organisations and institutions
  • 23010: Energy policy and administrative management
  • 23020: Power generation/non-renewable sources
  • 23030: Power generation/renewable sources
  • 23040: Electrical transmission/ distribution
  • 23050: Gas distribution
  • 23061: Oil-fired power plants
  • 23062: Gas-fired power plants
  • 23063: Coal-fired power plants
  • 23064: Nuclear power plants
  • 23065: Hydro-electric power plants
  • 23066: Geothermal energy
  • 23067: Solar energy
  • 23068: Wind power
  • 23069: Ocean power
  • 23070: Biomass
  • 23081: Energy education/training
  • 23082: Energy research
  • 92010: Support to national NGOs
  • 92020: Support to international NGOs
  • 92030: Support to local and regional NGOs

They may have been replaced by other codes, but the idea is they’re supposed to remain in perpetuity as status="withdrawn".
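
One way to enforce this during an import, sketched under the assumption that `codelist-items` is a direct child of the root element in both files (the function name is hypothetical), is to copy any item missing from the new XML across from the previous codelist and mark it withdrawn:

```python
import copy
import xml.etree.ElementTree as ET


def _code(item: ET.Element) -> str:
    """Return the trimmed <code> text of a codelist item ('' if absent)."""
    return (item.findtext("code") or "").strip()


def carry_over_as_withdrawn(old_xml: str, new_xml: str) -> str:
    """Append items missing from new_xml, copied from old_xml, as withdrawn."""
    old_root = ET.fromstring(old_xml)
    new_root = ET.fromstring(new_xml)

    # Assumes <codelist-items> is a direct child of the root element.
    container = new_root.find("codelist-items")
    if container is None:
        raise ValueError("new codelist has no <codelist-items> element")

    present = {_code(item) for item in container.iter("codelist-item")}
    for item in old_root.iter("codelist-item"):
        if _code(item) not in present:
            restored = copy.deepcopy(item)
            restored.set("status", "withdrawn")
            container.append(restored)
    return ET.tostring(new_root, encoding="unicode")
```

Restored items are appended at the end, so a separate sort would be needed if the previous ordering matters for readable diffs.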

@andylolz
Contributor Author

3. just for the sector codelist there are more than 40 changes so might take us some time

I’ve mentioned elsewhere that I’m in favour of scrapping this changelog. I’m unconvinced it’s worth your time. It wasn’t updated for the last DAC codelist update (see: IATI/IATI-Guidance#312) so it’s only a partial list of changes anyway.

5. We will also contact publishing tool providers to make them aware of the changes.

Okay – this is very generous of you, but again I don’t think this should be standard practice. Tool providers should be keeping an eye on discuss, or routinely pulling from source. That’s the system as documented. If they start relying on updates from you then that just becomes an extra overhead for you.

@PetyaKangalova
Contributor

@andylolz

Kk, done.

Thank you!

Oh, good spot! The same applies to 41050 (Flood prevention/control), which has also disappeared.

Thank you for flagging. I missed this one!

Also, the following withdrawn sector codes have disappeared:

  • 15120: Public sector financial management
  • 15140: Government administration
  • 15161: Elections
  • 15162: Human rights
  • 15163: Free flow of information
  • 15164: Women's equality organisations and institutions
  • 23010: Energy policy and administrative management
  • 23020: Power generation/non-renewable sources
  • 23030: Power generation/renewable sources
  • 23040: Electrical transmission/ distribution
  • 23050: Gas distribution
  • 23061: Oil-fired power plants
  • 23062: Gas-fired power plants
  • 23063: Coal-fired power plants
  • 23064: Nuclear power plants
  • 23065: Hydro-electric power plants
  • 23066: Geothermal energy
  • 23067: Solar energy
  • 23068: Wind power
  • 23069: Ocean power
  • 23070: Biomass
  • 23081: Energy education/training
  • 23082: Energy research
  • 92010: Support to national NGOs
  • 92020: Support to international NGOs
  • 92030: Support to local and regional NGOs

Yes, I agree! I also noticed that there were a few new 'withdrawn' as of 2015 that were not on the IATI list. As they are already withdrawn I was not so concerned but it means the XML is not consistent.

On your point about the changelog, I agree that it is a lot of effort. However, this time round there are quite a lot of new codes, and it will be important to alert people to which ones those are and also make sure organisations can start using them via the various publishing tools. Hence dropping them a quick email to speed up the process, but it is indeed the responsibility of the tool providers to keep them up to date.

Waiting to hear from Valerie and will then action the changes!

@andylolz
Contributor Author

andylolz commented Feb 13, 2019

Excellent – all good!

Yes, I agree! I also noticed that there were a few new 'withdrawn' as of 2015 that were not on the IATI list. As they are already withdrawn I was not so concerned but it means the XML is not consistent.

Yes that’s true, but I’d expect DAC to have a better record of withdrawn codes than IATI (since IATI only started recording these relatively recently). So withdrawn codes in the XML that were not previously known to IATI are probably a good thing :)

Contributor

@PetyaKangalova PetyaKangalova left a comment

Approving the full pull request following a few revisions and updated commits. Many thanks @andylolz

Leaving for @IATI/devs to merge and deploy next week.

<codelist-item status="active">
<code>0</code>
<name>
<narrative>NON FLOW ITEMS</narrative>
Contributor Author

I have a feeling this code is deliberately excluded from the IATI replicated codelist. @bill-anderson can confirm or deny.

Contributor

@andylolz @PetyaKangalova is this above comment here holding up the merge or can it still go?

Contributor Author

Aha – here’s why I think this is deliberately excluded: #16 (comment)

Contributor Author

I have no idea whether that means they should or shouldn’t be included. I shall leave it with you to decide!

Contributor

I would agree that finance type '0 - non flow items' does not fit within the activity standard, as it does not describe a type of finance at activity level. It is much wider than that. I would propose that we do not replicate it. Waiting for confirmation from @bill-anderson, so @samuele-mattiuzzo please hold off merging until then.

Contributor

I just talked to @bill-anderson and the view is that we keep this code in, the reason being that we replicate all 'OECD DAC' codelists exactly as they are in the table provided by the DAC. @samuele-mattiuzzo this means we can continue merging this request.

However, it would be useful to add a note to the Finance Type and Finance Type (Category) codelists stating that non-flow items are not activity-specific and that we do not expect these codes to be used when reporting finance types in the IATI activity standard.

Contributor Author

Not really sure about the note idea.

If it’s important enough that it’s a problem, then you can either add a rule to the ruleset, or leave the codes out.

I guess in general, I think it’s best to avoid situations where the standard strongly advises something, but doesn’t enforce in any way.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: "or leave the codes out."
As already mentioned the standard replicates third-party code lists. This is a well-established principle.

Contributor Author

@andylolz andylolz Feb 27, 2019

As already mentioned the standard replicates third-party code lists

Not for the past year or so! That’s what this PR aims to fix.

In fact in the case of non-flow items, these codes have been missing from the replicated codelists since 2014 (see: #16).

Happy to go with whatever to get this moving forward this week.

@andylolz
Contributor Author

I’ve added a summary of changes in the PR description.

@andylolz
Contributor Author

andylolz commented Feb 27, 2019

Seems like the conclusion is: Merge this, and then add a note to the FinanceType codelist (about non-flow items) in a new pull request.

@PetyaKangalova is that right?

@samuele-mattiuzzo samuele-mattiuzzo merged commit d6f50c6 into IATI:master Feb 27, 2019
@andylolz
Contributor Author

⭐ 🌟 ⭐
