Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added foreign keys #3830

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Nancy9ice
Copy link
Contributor

Overview:
Closes this Google Season of Docs Task:
Include the foreign keys of each table in the data dictionary page

What problem does this address?
The addition of a foreign key column to the data dictionary page helps users to know the column(s) in a table that is/are foreign key(s) and the columns they are related to in other tables. That way, it will be easy for them to understand how to create relationships between tables.

Testing
How did you make sure this worked? How can a reviewer verify this?

I ran the command make docs-build and checked if my changes reflected correctly locally

@aesharpe @zaneselvans @e-belfer I couldn't think of a better way to present these foreign keys so that users who don't have a programming background will understand it. Please do you have better ideas?

@e-belfer
Copy link
Member

e-belfer commented Sep 4, 2024

@Nancy9ice I think if possible, it'd be preferable to add the designation of the foreign key columns into the description field of the corresponding field in the table, rather than creating a table above. Ideally (if this is technically possible!), we'd link to the table itself.

Field Name Type Description
emission_control_equipment_type_code string Short code indicating the type of emission control equipment installed.
Foreign key: Encoded by code field of core_eia__codes_emission_control_equipment_types

@aesharpe Curious what you think! Let's land on a design before we implement anything.

@aesharpe aesharpe added the docs Documentation for users and contributors. label Sep 4, 2024
@aesharpe
Copy link
Member

aesharpe commented Sep 4, 2024

@aesharpe Curious what you think! Let's land on a design before we implement anything.

@e-belfer I think this makes sense. Another option is to add a "Foreign Key" column with the name of the table. I was also thinking of something similar for the primary keys (i.e., a boolean column saying if a value was a primary key, but I like the way we did it with them listed at the top. Easier to grasp at a glance).

@Nancy9ice
Copy link
Contributor Author

@aesharpe Curious what you think! Let's land on a design before we implement anything.

@e-belfer I think this makes sense. Another option is to add a "Foreign Key" column with the name of the table. I was also thinking of something similar for the primary keys (i.e., a boolean column saying if a value was a primary key, but I like the way we did it with them listed at the top. Easier to grasp at a glance).

Yeah, I like the idea of having a different column for the foreign key. So which are we finalizing? Having it in a different column or having it in the description column?

@aesharpe
Copy link
Member

aesharpe commented Sep 5, 2024

which are we finalizing? Having it in a different column or having it in the description column?

Hi @Nancy9ice, @e-belfer and I just chatted and we think that we should go with Ella's suggestion to have the foreign key table in the "Description" field instead of in it's own separate column. We decided this is better because we don't want users to miss a foreign_key column that they would probably have to scroll right to see.

Additional details:

  • Line break between the description and the foreign key reference
  • Bold "foreign key" like in Ella's example above.
  • For values with no foreign key, leave just the description (no need for an anecdote like "no foreign key").

@Nancy9ice
Copy link
Contributor Author

which are we finalizing? Having it in a different column or having it in the description column?

Hi @Nancy9ice, @e-belfer and I just chatted and we think that we should go with Ella's suggestion to have the foreign key table in the "Description" field instead of in it's own separate column. We decided this is better because we don't want users to miss a foreign_key column that they would probably have to scroll right to see.

Additional details:

  • Line break between the description and the foreign key reference
  • Bold "foreign key" like in Ella's example above.
  • For values with no foreign key, leave just the description (no need for an anecdote like "no foreign key").

Alright, thanks for the update. I'll work on it

@Nancy9ice
Copy link
Contributor Author

which are we finalizing? Having it in a different column or having it in the description column?

Hi @Nancy9ice, @e-belfer and I just chatted and we think that we should go with Ella's suggestion to have the foreign key table in the "Description" field instead of in it's own separate column. We decided this is better because we don't want users to miss a foreign_key column that they would probably have to scroll right to see.
Additional details:

  • Line break between the description and the foreign key reference
  • Bold "foreign key" like in Ella's example above.
  • For values with no foreign key, leave just the description (no need for an anecdote like "no foreign key").

Alright, thanks for the update. I'll work on it

I have implemented this and committed the changes. I can't make the field and the table to look like code formats probably because the whole foreign key text already appears like a code block. Also, I noticed that some Foreign keys have multiple fields as the encoders. I think it's in cases where the primary keys of the relationship table is made of multiple columns, please check if it's okay the way it appears in cases like these.

@aesharpe
Copy link
Member

Also, I noticed that some Foreign keys have multiple fields as the encoders. I think it's in cases where the primary keys of the relationship table is made of multiple columns, please check if it's okay the way it appears in cases like these.

@zaneselvans does this seem ok to you?

@zaneselvans
Copy link
Member

Also, I noticed that some Foreign keys have multiple fields as the encoders. I think it's in cases where the primary keys of the relationship table is made of multiple columns, please check if it's okay the way it appears in cases like these.

I'm not sure what situation this is describing. The encoders only apply to single columns.

Looking at the RTD output, there are two distinct things going on here that seem like they might have gotten conflated. There are the "encoders" which we define to standardize the values in categorical columns. Those standardized categorical values are defined in "coding" tables, where the standard codes are the primary key, and the standardized categorical columns in other tables have foreign key relationships that point to those coding tables (and the definitions of the codes). These should all be simple (not composite) keys.

Then there are the normal table primary keys and foreign key relationships which have nothing to do with the codes / encoders / code tables. Many of those involve composite keys, like (report_date, plant_id_eia, generator_id) I don't think it makes sense to list the entire composite foreign key in the entry for an individual field within a table. The FK relationships aren't a property of any individual field, they're a property of the table (or really, of the relationship between two tables).

So if we're including the primary and foreign key information in the data dictionary, I think they should both probably be displayed at the table level, rather than the column level.

As a side note, I think the Data Dictionary page is already overloaded, and including the PK & FK relationships will make it even moreso. I think we need a better way of presenting, visualizing, and searching the structure of the database / data warehouse, which is only going to get bigger and more complex over time, and I think it'll probably be better for us to rely on somebody else's dedicated solution since this is a widespread problem that's not specific to us / PUDL (for example). So just doing something simple for the moment probably makes sense.

@aesharpe
Copy link
Member

aesharpe commented Sep 11, 2024

I think it'll probably be better for us to rely on somebody else's dedicated solution since this is a widespread problem that's not specific to us / PUDL (for example). So just doing something simple for the moment probably makes sense.

@zaneselvans are you suggesting we implement something like this now? Could we just link to that dbdocs page instead of hosting the data dictionary on the readthedocs page?

@zaneselvans
Copy link
Member

are you suggesting we implement something like this now?

No, I'm just saying that I think it's unlikely we can really provide an appropriate interface for users to explore and understand our database schema through ReadTheDocs, and so tweaks to the Data Dictionary page should probably be a lower priority than the organization of the docs in general. Like, we could put a lot of time into the Data Dictionary page, but it ultimately just can't be that good (for hundreds of tables, thousands of columns, maybe multiple databases eventually, etc) given the constraints of the platform, so there are diminishing returns, and it's likely to get replaced with something that's more tailored to displaying and exploring this kind of information. So that same time is probably better spent on the narrative structure of the docs, making sure they're up to date, not duplicative, address both the reference and tutorial usage modes, etc. I just threw the DBDocs thing up as an example of what this might look like. It's not being updated, and it's not an open source solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation for users and contributors.
Projects
Status: New
Development

Successfully merging this pull request may close these issues.

4 participants