Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R] String columns read lazily from readr error when transferred to an arrow table #43349

Closed
jonkeane opened this issue Jul 20, 2024 · 3 comments
Assignees
Milestone

Comments

@jonkeane
Copy link
Member

Describe the bug, including details regarding any error messages, version, and platform.

On dev, here is a reprex:

arrow::as_arrow_table(
    readr::read_csv(readr::readr_example("chickens.csv"), lazy = TRUE), 
    sink = tempfile(fileext=".parquet")
)
#> Rows: 5 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): chicken, sex, motto
#> dbl (1): eggs_laid
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Error in Table__from_dots(dots, schema, option_use_threads()): No Set_elt found for ALTSTRING class [class: vroom_chr, pkg: vroom]

This wasn't a problem in our last release:

arrow::as_arrow_table(
    readr::read_csv(readr::readr_example("chickens.csv"), lazy = TRUE), 
    sink = tempfile(fileext=".parquet")
)
#> Rows: 5 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): chicken, sex, motto
#> dbl (1): eggs_laid
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Table
#> 5 rows x 4 columns
#> $chicken <string>
#> $sex <string>
#> $eggs_laid <double>
#> $motto <string>
#> 
#> See $metadata for additional Schema metadata

Component(s)

R

@thisisnic
Copy link
Member

thisisnic commented Jul 27, 2024

Hey, I think y'all already know this from reading the PR but I didn't understand it fully myself so wanted to add this reprex and clarify for my own understanding: this is not related to readr and is just altrep stuff, right? @jonkeane

library(arrow)
tf <- tempfile()
write.csv(mtcars, tf)
mtarrow <- read_csv_arrow(tf)

# exclude the line below and we don't have a problem
mtarrow
#>                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#> 1            Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#> 2        Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#> 3           Datsun 710 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#> 4       Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#> 5    Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#> 6              Valiant 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#> 7           Duster 360 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#> 8            Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#> 9             Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#> 10            Merc 280 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#> 11           Merc 280C 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#> 12          Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
#> 13          Merc 450SL 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
#> 14         Merc 450SLC 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
#> 15  Cadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
#> 16 Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
#> 17   Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
#> 18            Fiat 128 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
#> 19         Honda Civic 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> 20      Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
#> 21       Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
#> 22    Dodge Challenger 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
#> 23         AMC Javelin 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
#> 24          Camaro Z28 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
#> 25    Pontiac Firebird 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
#> 26           Fiat X1-9 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
#> 27       Porsche 914-2 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
#> 28        Lotus Europa 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> 29      Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
#> 30        Ferrari Dino 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
#> 31       Maserati Bora 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
#> 32          Volvo 142E 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
arrow_table(mtarrow)
#> Error in Table__from_dots(dots, schema, option_use_threads()): ALTSTRING objects of type <arrow::array_string_vector> are immutable

Created on 2024-07-27 with reprex v2.1.1

@jonkeane
Copy link
Member Author

jonkeane commented Jul 27, 2024

Aaah that's a nice even more minimal reprex. I might pull that over so we don't need to add readr to our dependencies.

I had actually assumed our altrep vectors would be ok because they already had the ELT stuff, but clearly not!

jonkeane added a commit that referenced this issue Jul 27, 2024
### Rationale for this change

To resolve the reverse dependency issue with `parquetize`

### What changes are included in this PR?

One step towards resolving the issue

### Are these changes tested?

yes

### Are there any user-facing changes?

no
* GitHub Issue: #43349

Authored-by: Jonathan Keane <[email protected]>
Signed-off-by: Jonathan Keane <[email protected]>
@jonkeane jonkeane added this to the 18.0.0 milestone Jul 27, 2024
@jonkeane
Copy link
Member Author

Issue resolved by pull request 43351
#43351

jonkeane added a commit that referenced this issue Jul 27, 2024
### Rationale for this change

To resolve the reverse dependency issue with `parquetize`

### What changes are included in this PR?

One step towards resolving the issue

### Are these changes tested?

yes

### Are there any user-facing changes?

no
* GitHub Issue: #43349

Authored-by: Jonathan Keane <[email protected]>
Signed-off-by: Jonathan Keane <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants