Iron out association asset names #3030

Closed
13 tasks done
Tracked by #2765
bendnorman opened this issue Nov 8, 2023 · 4 comments · Fixed by #3035

Comments

@bendnorman
Member

bendnorman commented Nov 8, 2023

@zaneselvans suggested we alphabetize multiple source names when they appear in an asset name. I went through all of our association tables and realized we're not super consistent with how we name these types of assets:

  • Association assets that link multiple sources and need to be alphabetized
    • core_epa__assn_epacamd_eia
    • core_epa__assn_epacamd_eia_subplant_ids
    • core_pudl__assn_raw_epacamd_eia
    • _core_epa__assn_epacamd_eia_unique
  • Mostly, our association assets follow this naming convention: {layer}_{source of association asset}__assn_{datasets being linked}_{entity being linked}. However, we aren't consistent about the order of {datasets being linked} and {entity being linked}. For example:
    • core_pudl__assn_plants_eia
    • core_pudl__assn_plants_ferc1
    • core_pudl__assn_utilities_eia
    • core_pudl__assn_utilities_ferc1
    • core_pudl__assn_utilities_ferc1_dbf
    • core_pudl__assn_utilities_ferc1_xbrl
  • I found one association asset that still has assn at the end of the asset name.
    • core_eia860__yearly_boiler_emissions_control_equipment_assn

I propose we rename these assets to follow this convention: {layer}_{source of association asset}__assn_{datasets being linked}_{entity being linked}
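A minimal sketch of how a name could be assembled under this convention, with the linked datasets alphabetized as suggested above. The function `assn_asset_name` and its parameters are hypothetical, made up for illustration, and are not part of PUDL:

```python
def assn_asset_name(layer, source, datasets=(), entity="", private=False):
    """Build an association asset name (hypothetical helper, not a PUDL API).

    Pattern: {layer}_{source}__assn_{datasets being linked}_{entity being linked}
    """
    # Alphabetize the linked datasets so the order is deterministic.
    parts = sorted(datasets)
    if entity:
        parts.append(entity)
    if not parts:
        raise ValueError("need at least one linked dataset or an entity")
    name = f"{layer}_{source}__assn_" + "_".join(parts)
    # A leading underscore marks a private/intermediate asset.
    return f"_{name}" if private else name


print(assn_asset_name("core", "epa", ["epacamd", "eia"]))
# core_epa__assn_eia_epacamd
print(assn_asset_name("core", "epa", ["epacamd", "eia"], "subplant_ids"))
# core_epa__assn_eia_epacamd_subplant_ids
```

Sorting inside the helper means callers can't accidentally reintroduce an inconsistent dataset order.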

@cmgosnell
Member

I propose we rename these assets to follow this convention: {layer}_{source of association asset}__assn_{datasets being linked}_{entity being linked}

I think this is the way!! It does, somewhat annoyingly, mean that we will have some instances where the source dataset is also included in the datasets being linked, but I think that is okay and will make everything more consistent!!

@cmgosnell
Member

cmgosnell commented Nov 9, 2023

Which i thiiiink would mean this:

  • core_epa__assn_eia_epacamd
  • core_epa__assn_eia_epacamd_subplant_ids
  • core_pudl__assn_raw_eia_epacamd (I actually don't know where the raw should go here)
  • _core_epa__assn_epacamd_eia_unique
  • core_pudl__assn_eia_pudl_plants (assuming pudl is also a dataset here & below)
  • core_pudl__assn_ferc1_eia_plants
  • core_pudl__assn_eia_pudl_utilities
  • core_pudl__assn_ferc1_pudl_utilities
  • core_pudl__assn_ferc1_dbf_pudl_utilities (assuming ferc1_dbf is considered a dataset)
  • core_pudl__assn_ferc1_xbrl_pudl_utilities

If the assn is within one dataset, should it still get the {datasets being linked}?

  • core_eia860__assn_yearly_eia860_boiler_emissions_control_equipment
  • OR just core_eia860__assn_yearly_boiler_emissions_control_equipment

@bendnorman
Member Author

bendnorman commented Nov 9, 2023

These look good to me! I think if the assn is within one dataset we shouldn't include {datasets being linked} just so there isn't too much duplication. So boiler emissions would be core_eia860__assn_yearly_boiler_emissions_control_equipment
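For illustration, a rough sketch of how renames like these could be applied to text that references the assets. The mapping holds only a few of the names proposed in this thread (not necessarily what was merged in #3035), and `PROPOSED_RENAMES` and `rename_assets` are hypothetical names, not anything in PUDL. Because some old names are prefixes of others, the sketch matches whole identifiers only:

```python
import re

# Old name -> name proposed in this thread (an illustrative subset).
PROPOSED_RENAMES = {
    "core_epa__assn_epacamd_eia": "core_epa__assn_eia_epacamd",
    "core_epa__assn_epacamd_eia_subplant_ids": "core_epa__assn_eia_epacamd_subplant_ids",
    "core_pudl__assn_utilities_eia": "core_pudl__assn_eia_pudl_utilities",
    "core_eia860__yearly_boiler_emissions_control_equipment_assn": (
        "core_eia860__assn_yearly_boiler_emissions_control_equipment"
    ),
}

# "_" counts as a word character, so \b only matches at the edges of a whole
# identifier. That keeps a short name from matching inside a longer one.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(n) for n in sorted(PROPOSED_RENAMES, key=len, reverse=True))
    + r")\b"
)


def rename_assets(text: str) -> str:
    """Replace every old asset name in `text` with its proposed new name."""
    return _PATTERN.sub(lambda m: PROPOSED_RENAMES[m.group(0)], text)


print(rename_assets("core_epa__assn_epacamd_eia, core_epa__assn_epacamd_eia_subplant_ids"))
# core_epa__assn_eia_epacamd, core_epa__assn_eia_epacamd_subplant_ids
```

Names that are not in the mapping, such as _core_epa__assn_epacamd_eia_unique, pass through unchanged.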

@bendnorman
Member Author

Closed by #3035
