-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Cannot use 'Value' as the column name for observation column #857
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may find it "hacky", but I can't think of a simpler or better way to deal with this very specific bug! Just a couple of questions and something to clean up, then should be good for approval.
src/csvcubed/utils/csvdataset.py
Outdated
# parameter passed to the melt function to "Not-Value" so that we don't | ||
# trigger a pandas ValueError. | ||
value_name = "Value" | ||
if "Value" in value_cols: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the pandas ValueError trigger if the column title contains "Value" (e.g. if "Value" in
), or only if the title is "Value" exactly (== "Value)? I am assuming it is the latter, since this seems to work and I guess it wouldn't otherwise, but I am just checking because I don't know what the error looks like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to set value_name to "Value", you can just go straight to checking if "Value" is in value_cols
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NickPapONS - Changed to loop through column titles in value_cols
instead of only matching "Value" in ..
@SarahJohnsonONS - I think we need to assign a default to value_name
because its only if Value
is a title in value_cols
do we assign a new string to value_name
but we pass value_name
regardless to melt()
Kudos, SonarCloud Quality Gate passed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy that the comments have been addressed, will give approval with the knowledge of the follow-up ticket as well.
Addresses #856 in which when a column is titled
Value
and we need to use the pandasmelt()
function such as when callingtransform_dataset_to_canonical_shape()
which causes an error because pandas does not allow a melted dataframe to have a column namedValue
.This ticket adds a rather hacky workaround in which we catch any columns called
Value
and give the melt function avalue_name
ofNot-Value
(which is then used in the melted df). We then rename theNot-Value
column from the melted df and pass this back.Added a unit test to verify this functionality. Compares column names from the original dataframe are a subset of the ones in the melted df.