Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not allow NA as column name #316

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

jmbarbone
Copy link
Contributor

@jmbarbone jmbarbone commented Jan 11, 2022

Resolves #292

@JanMarvin do you know how we can tell if the NA is converted to "NA" in the sheet data?

The change is in writeData so it should have an effect in buildWorkbook(). Just couldn't get my head around what writeData() and Workbook$writeData() were doing.

@codecov-commenter
Copy link

codecov-commenter commented Jan 11, 2022

Codecov Report

Merging #316 (19f97da) into master (f4bf841) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #316   +/-   ##
=======================================
  Coverage   67.70%   67.71%           
=======================================
  Files          34       34           
  Lines        8897     8898    +1     
=======================================
+ Hits         6024     6025    +1     
  Misses       2873     2873           
Impacted Files Coverage Δ
R/writeData.R 88.39% <100.00%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5e44c5e...19f97da. Read the comment docs.

@JanMarvin
Copy link
Collaborator

JanMarvin commented Jan 11, 2022

You just want to know, if the string is imported as "NA"? If written, is is converted into a sharedString (<sst><si /></sst> in openxml). Therefore you could check if wb$sharedString contains "<si><t xml:space=\"preserve\">NA</t></si>".

The position in the shared string is a 0 index, matching the value of v + 1 in wb$worksheets[[1]]$sheet_data$v where the type is 1 in wb$worksheets[[1]]$sheet_data$t.

I know that feeling in writeData and friends. The construction of the shared string and the lists are deeply embedded in openxlsx and make it near impossible to really implement things like inlineStr.

@JanMarvin
Copy link
Collaborator

You could look into make.names() to create unique names, it is used in the package at various places, otherwise you might end up with multiple "NA"s which brings other problems.

@jmbarbone
Copy link
Contributor Author

You could look into make.names() to create unique names, it is used in the package at various places, otherwise you might end up with multiple "NA"s which brings other problems.

Considered that. Not sure if it's that's to best. There's no requirement for data in sheets to be unique, so the "NA" seemed like a less invasive fix. make.names() is used for the reading, which makes it difficult to test this (thought I could make the workbook and then read it to make sure it was okay). Technically you can still have a data.frame() in R without unique names, it's just a bad idea:

x <- data.frame(x = 1)
x <- cbind(x, x)
x
#>   x x
#> 1 1 1
str(x)
#> 'data.frame':    1 obs. of  2 variables:
#>  $ x: num 1
#>  $ x: num 1

Created on 2022-01-11 by the reprex package (v2.0.1)

I've seen plenty of workbooks with repeated column names. Maybe something a little nicer?

x <- c(NA, "a", "b", NA, "b", "a", NA)

make.names(x)
#> [1] "NA." "a"   "b"   "NA." "b"   "a"   "NA."
make.names(x, unique = TRUE)
#> [1] "NA."   "a"     "b"     "NA..1" "b.1"   "a.1"   "NA..2"

clean_na_names <- function(x) {
  w <- which(is.na(x))
  for (i in w) {
    x[w] <- paste0("NA..", w)
  }
  x
}

clean_na_names(x)
#> [1] "NA..1" "a"     "b"     "NA..4" "b"     "a"     "NA..7"

Created on 2022-01-11 by the reprex package (v2.0.1)

What I'm worried about is disharmony between reading and writing though. Maybe warnings/controls for reading a workbook as a data.frame with duplicated column names?

@github-actions
Copy link

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Jan 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Corrupt Excel file when column name is NA
3 participants