[BUG] Cannot use 'Value' as the column name for observation column #857

Merged
4 commits merged into main from 856-melt-bug
Jul 27, 2023

@CharlesRendle CharlesRendle commented Jul 24, 2023

Addresses #856. When a column is titled Value and we need to use the pandas melt() function, such as when calling transform_dataset_to_canonical_shape(), an error is raised because pandas does not allow the melted dataframe's value column to be named Value when the dataframe already has a column with that name.

This PR adds a rather hacky workaround: we catch any column called Value and give the melt function a value_name of Not-Value (which is then used in the melted df). We then rename the Not-Value column in the melted df and pass the result back.

Added a unit test to verify this functionality. It checks that the column names from the original dataframe are a subset of those in the melted df.
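
For readers who have not seen the change itself, the workaround described above can be sketched roughly as follows. This is an illustration under assumptions, not the code merged in this PR: the function name melt_with_value_column, the Measure column and the sample data are hypothetical, and renaming the placeholder back to Value is assumed. Only the Value and Not-Value names come from the description.

```python
import pandas as pd

# Temporary column name taken from the PR description.
PLACEHOLDER = "Not-Value"


def melt_with_value_column(df, id_vars, value_cols):
    """Melt df so the observations end up in a column named "Value".

    Hypothetical helper for illustration only.
    """
    # Default name for the melted observation column.
    value_name = "Value"
    # melt() rejects a value_name that matches an existing column,
    # so use the placeholder when a column is titled exactly "Value".
    if any(title == "Value" for title in df.columns):
        value_name = PLACEHOLDER

    melted = pd.melt(
        df,
        id_vars=id_vars,
        value_vars=value_cols,
        var_name="Measure",  # assumed name, not from csvcubed
        value_name=value_name,
    )
    # Rename the placeholder back; a no-op when it was never used.
    return melted.rename(columns={PLACEHOLDER: "Value"})


df = pd.DataFrame({"Period": ["2021", "2022"], "Value": [1.5, 2.5]})
melted = melt_with_value_column(df, ["Period"], ["Value"])
print(list(melted.columns))
```

In this sketch the original column names remain a subset of the melted ones, which is the property the new unit test is described as checking.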

@CharlesRendle CharlesRendle changed the title Hacky fix for pandas 2.0 "Value" col melt() bug Cannot use 'Value' as the column name for observation column Jul 24, 2023
@CharlesRendle CharlesRendle changed the title Cannot use 'Value' as the column name for observation column [BUG] Cannot use 'Value' as the column name for observation column Jul 24, 2023

github-actions bot commented Jul 24, 2023

ubuntu-latest-python3.11-pandas@latest test results

574 tests (+1): 574 passed (+1), 0 skipped (±0), 0 failed (±0)
10 suites (±0), 10 files (±0)
Duration: 4m 20s (+23s)

Results for commit 9174754. ± Comparison against base commit eea06d7.

♻️ This comment has been updated with latest results.


github-actions bot commented Jul 24, 2023

ubuntu-latest-python3.9- test results

574 tests: 574 passed, 0 skipped, 0 failed
10 suites, 10 files
Duration: 4m 3s

Results for commit 9174754.

♻️ This comment has been updated with latest results.


github-actions bot commented Jul 24, 2023

windows-latest-python3.11-pandas@latest test results

588 tests (+1): 588 passed (+1), 0 skipped (±0), 0 failed (±0)
601 runs (+1): 601 passed (+1), 0 skipped (±0), 0 failed (±0)
12 suites (±0), 11 files (±0)
Duration: 8m 51s (+1m 38s)

Results for commit 9174754. ± Comparison against base commit eea06d7.

♻️ This comment has been updated with latest results.

@CharlesRendle CharlesRendle marked this pull request as ready for review July 25, 2023 08:03

@NickPapONS NickPapONS left a comment

We may find it "hacky", but I can't think of a simpler or better way to deal with this very specific bug! Just a couple of questions and something to clean up, then should be good for approval.

src/csvcubed/utils/csvdataset.py
# parameter passed to the melt function to "Not-Value" so that we don't
# trigger a pandas ValueError.
value_name = "Value"
if "Value" in value_cols:

Does the pandas ValueError trigger if the column title contains "Value" (e.g. if "Value" in ...), or only if the title is exactly "Value" (== "Value")? I am assuming it is the latter, since this seems to work and I guess it wouldn't otherwise, but I am just checking because I don't know what the error looks like.
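
The distinction being asked about can be sketched in plain Python. This assumes value_cols is a list of column title strings, which this thread does not show; the helper names has_exact_title and has_title_containing are hypothetical. On a list, the in operator tests for an element equal to "Value", whereas on a single string it tests for a substring.

```python
def has_exact_title(titles, wanted="Value"):
    """True when some title is exactly equal to `wanted`."""
    return any(title == wanted for title in titles)


def has_title_containing(titles, fragment="Value"):
    """True when some title merely contains `fragment` as a substring."""
    return any(fragment in title for title in titles)


value_cols = ["Value Added", "Total Value"]  # sample titles, for illustration

# Membership on a list compares whole elements, so this is an exact match test.
print("Value" in value_cols)             # same answer as has_exact_title
print(has_exact_title(value_cols))
# Substring test on each title gives a different answer for these titles.
print(has_title_containing(value_cols))
```

As far as I understand the pandas check, only an exact match between value_name and an existing column name raises the error, so titles such as "Value Added" should not collide.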

I don't think you need to set value_name to "Value". You can just go straight to checking if "Value" is in value_cols.

@NickPapONS - Changed to loop through the column titles in value_cols instead of only matching "Value" in ...

@SarahJohnsonONS - I think we need to assign a default to value_name: we only assign a new string to value_name if Value is a title in value_cols, but we pass value_name to melt() regardless.
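
The point about the default can be shown with a small sketch, again assuming value_cols is a list of title strings. Both function names are hypothetical and neither is the merged code. Without the default, value_name is never bound when no title is exactly "Value", so using it afterwards fails.

```python
def pick_value_name(value_cols):
    """Return the value_name to pass to melt(), with a default."""
    value_name = "Value"  # default used when no title collides
    for title in value_cols:
        if title == "Value":
            value_name = "Not-Value"
    return value_name


def pick_value_name_without_default(value_cols):
    """Same loop without the default, to show why it is needed."""
    for title in value_cols:
        if title == "Value":
            value_name = "Not-Value"
    # Raises UnboundLocalError when no title is exactly "Value".
    return value_name


print(pick_value_name(["Value", "Count"]))
print(pick_value_name(["Count"]))
try:
    pick_value_name_without_default(["Count"])
except UnboundLocalError as err:
    print("no default:", type(err).__name__)
```

Since value_name is passed to melt() on every call, keeping the default avoids a second branch around the melt() call itself.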

tests/unit/inspect/test_inspectdatasetmanager.py (outdated)

sonarcloud bot commented Jul 26, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

0.0% Coverage
0.0% Duplication


@NickPapONS NickPapONS left a comment

Happy that the comments have been addressed, will give approval with the knowledge of the follow-up ticket as well.

@CharlesRendle CharlesRendle merged commit b886e59 into main Jul 27, 2023
14 checks passed
@CharlesRendle CharlesRendle deleted the 856-melt-bug branch July 27, 2023 09:08
3 participants