Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Building a rust binding R package depends on this r-polars #808

Open
jshinonome opened this issue Feb 12, 2024 · 24 comments
Open

Building a rust binding R package depends on this r-polars #808

jshinonome opened this issue Feb 12, 2024 · 24 comments
Labels
enhancement New feature or request

Comments

@jshinonome
Copy link

jshinonome commented Feb 12, 2024

First of all, thank you very much for this great R Polars project.

I would like to implement a db connection rust binding R package and convert data to polars directly and hence r-polars, but I don't know how to access r-polars robj. e.g. py-polars has https://github.com/pola-rs/pyo3-polars, which allows me to access python polars object. I was struggling to access r-polars series and dataframe from rust code, keep getting "invalid permission" or "memory not mapped" error.

Any suggestion is appreciated.

I defined a same RPolarsDataFrame struct, but apparently, this doesn't work.

if any.inherits("RPolarsDataFrame") {
        let df: ExternalPtr<RPolarsDataFrame> = any.try_into().unwrap();
        Ok(K::DataFrame(df.0.clone()))
    }
@eitsupi
Copy link
Collaborator

eitsupi commented Feb 13, 2024

Here is https://github.com/rpolars/extendr_polars written by @sorhawell, but I don't know if it currently works as it is not tested and unmaintained.

See also #732 and #776 (comment)

Ideally, I think polars::DataFrame (Rust) should implement the Arrow C Stream interface and use it for each language's binding. (pola-rs/polars#14208)

@eitsupi eitsupi added the enhancement New feature or request label Feb 13, 2024
@jshinonome
Copy link
Author

Thanks for the answers @eitsupi . I will try the repo above

@jshinonome jshinonome changed the title Any suggestions on build a rust binding R package depends on this r-polars Building a rust binding R package depends on this r-polars Feb 13, 2024
@eitsupi
Copy link
Collaborator

eitsupi commented Feb 14, 2024

Contributions to https://github.com/rpolars/extendr_polars or this repository are welcome!

@eitsupi
Copy link
Collaborator

eitsupi commented May 6, 2024

When #1078 is merged, input/output through the Arrow C stream interface of Series and DataFrame will be supported, so it should be possible to read Series and DataFrame with nanoarrow_array_stream as an intermediate format.
See also https://github.com/JosiahParry/arrow-extendr.

@ju6ge
Copy link
Contributor

ju6ge commented May 31, 2024

In the near term there is also a very easy way to make this library publishable on crates.io. All you would really needed to do to make this possible is to stop using git imports in your Cargo.toml. That way you could just publish the rust part of the code as a crate and therefor eliminate the need for https://github.com/rpolars/extendr_polars. Which is just a very hacky solution to a problem that should just not exist. The main blocker here is not polars but the used the extendr crate. There have been quite a few features added to main but not published as a version.

@ju6ge
Copy link
Contributor

ju6ge commented May 31, 2024

Oh and allowing the library to also still be built as rust library instead of staticlib only should also help.

@eitsupi
Copy link
Collaborator

eitsupi commented Jun 2, 2024

Thanks for your comment. However, I don't think there is a need to publish it to Crates.io.
For example, look at something like https://github.com/apache/datafusion-ballista/blob/04766d5fd7cbb15d583f736979c11de06c720928/python/Cargo.toml#L40.

I don't have the bandwidth to update extendr_polars, so I'd appreciate it if someone could work on updating it.

@ju6ge
Copy link
Contributor

ju6ge commented Jun 3, 2024

Well if you do not have the time to update extendr_polars then that is one good reason to publish r-polars instead. That would make extendr_polars obsolete.

But there are more reasons to do so! Let's say that somebody wants to write a rust library and export it via ffi to R. If this library has functions that expect or return a polars data frame then the function signatures need to use RPolarsDataFrame which comes from this crate. The best way to achieve this is to use r-polars directly. Currently this requires a git import and patching, because it is not published and only staticlibrary builds are enabled. Effectively this makes the rust part of this code R only. That is just not very useful when building on top of something and there really is no reason I can see that this is beneficial or required.

Allowing rust git import of this crate would be an easy first step to enable broader usage. The only change required for this is to update the supported crate-types:

[lib]
crate-type = ['staticlib', 'rlib']

This still taints any code on top to not be publishable on crates.io because of the git import. But it is better than the current state. To actually make this publishable all git imports need to be removed. Using the latest stable release of r-polars (0.16.4) removing the git import of upstream polars is a drop in replacement of the git import to polars = { version = "0.39.2", … }. For the main branch this is not working because 0.40.0 has some breaking changes, but they will need to be addressed in the future anyway. And I don't really think that using git versions of polars will make the transition easier in the long run. The only real blocker to this is extendr which has not had a stable release in a while, but that is just a question of time.

I hope you reconsider your stance on not publishing this crate, or at least enable building other R libraries on top of r-polars using rust.

@etiennebacher
Copy link
Collaborator

etiennebacher commented Jun 3, 2024

Hi @ju6ge, it seems you have a clear view of what needs to be done in this repo to make the Rust crate more usable by other packages. This is not my case since I'm not fluent in Rust. Given that @eitsupi's time is limited, could you maybe open a PR so that we can move forward on this issue?

@ju6ge
Copy link
Contributor

ju6ge commented Jun 3, 2024

@etiennebacher sure thing. I would propose the following way of moving forward.

Merge Request crate-type

  1. Update supported crate types, this one is trivial as described above. This update would be sufficient to allow other rust libraries to export R bindings with polars types in functions, via git import of the code.

Merge Request polars-0.40.0
2. Updating the dependencies to stable polars versions is more involved. I assume it would not be desirable to crate a new release of 0.16.4 with a stable polars version 0.39.2. I think it would be better to create a MR for the next release that updates the polars version to 0.40.0. All subsequent polars version updates should then always use stable polars versions. I am not sure if I am the right person for this job because i am not very deep into polars internals or r-polars. I am trying to build on top, but I am a rust developer so I can give it a shot ;)

Tracking Issue stable extendr
3. Given that the main blocker to publishing this lib as a rust library is that a new version of extendr needs to be released I would create an issue to keep track of it. As soon as a new version of extendr is released, the Issue can be closed with a MR updating the dependency.

Once 3 is completed publishing the crate to crates.io only requires a cargo publish in src/rust. Therefor enabling any downsteam libray using r-polars to also be published.

I have tested the procedure using my own private crate registry, so I am very confident that this will work.

@eitsupi
Copy link
Collaborator

eitsupi commented Jun 3, 2024

One additional point: this repository already depends on commits prior to polars-0.40.0.
0.40.0 had a fatal bug that prevented it from being an R package unless the prior commits were included
polars is mainly developed for the Python package, and the Python packages always depend on the Polars crate on GitHub, not Polars on crates.io, so we are convinced that it is very difficult to depend on Polars on crates.io.
So I don't think it is worth it to publish to crates.io with such disadvantages.

I don't see the benefit of publishing to crates.io, as it is not at all a problem to rely on Rust crates on GitHub when using Rust source code in an R package.
(If submitted to CRAN, downloads from crates.io are not allowed as like downloads from GitHub, and all crates must be vendored)

@etiennebacher
Copy link
Collaborator

etiennebacher commented Jun 3, 2024

Thanks for this detailed plan! This looks good to me, feel free to open a PR for 1) as it's the easiest one.

Regarding 2), this might take some time (which is not a big deal since we have to wait for extendr to address 3) anyway). We updated to rust-polars 0.40.0 a couple of weeks ago but we decided to go forward and to include a more bleeding-edge version of rust-polars since there were many important fixes added right after 0.40.0. Therefore, we'll see when rust-polars 0.40.1 or 0.41.0 is released, which could take a month or more.


Edit: given @eitsupi message above (published at the same time), maybe wait before proceeding with 1)

@etiennebacher
Copy link
Collaborator

polars is mainly developed for the Python package, and the Python packages always depend on the Polars crate on GitHub, not Polars on crates.io, so we are convinced that it is very difficult to depend on Polars on crates.io.

Not sure I understand @eitsupi. Most of the time, we use the Github commit of rust-polars which is associated to the Github release (and therefore to the new version on crates.io). There were a few occasions (like the current state of the repo) where we use a commit that is not associated with a github release but that is rare. So in most occasions, what we do is equivalent to using the crates.io versions, no?

@eitsupi
Copy link
Collaborator

eitsupi commented Jun 3, 2024

So in most occasions, what we do is equivalent to using the crates.io versions, no?

Yes, of course it is. But I think the patch is actually applied when it is released to crates.io.
I have tried in the past to reference crates.io instead of the GitHub commit, but for some reason I made it into the build. In short, I don't think polars on crates.io are guaranteed to be available.

@ju6ge
Copy link
Contributor

ju6ge commented Jun 3, 2024

@eitsupi I do not understand the sentiment that polars would be unavailible on crates.io. I have a code that depends on polars for at least two years now and is has not been a problem. If you always want the newest commit then of course they are not availible on crates.io. But developers version their releases for a reason and stable versions are always available on crates.io for downstream users of a library it make a lot of sense to use the stable version. If a new version breaks something critical nobody forces you to upgrade until it is fixed.

@eitsupi
Copy link
Collaborator

eitsupi commented Jun 3, 2024

@ju6ge Most of Polars tests are written on the Python side, and I feel that Rust Polars is not stable at all.

I think this is evident from the fact that Python Polars will soon be 1.0.0, whereas most developers do not pay attention to the version of Rust Polars (Breaking changes unmarked breaking changes are frequently introduced).

Also, extendr and libR-sys are rarely released on crates.io, so I think it would be very hard for developers to stick to the version on crates.io.

@eitsupi
Copy link
Collaborator

eitsupi commented Jun 3, 2024

Again, I don't understand the need to register r-polars on crates.io even if all dependencies can be downloaded from crates.io.
What's wrong with downloading from GitHub?

@ju6ge
Copy link
Contributor

ju6ge commented Jun 3, 2024

@eitsupi It is about the taint that is introduced because of the git import. Any library that is written using a git import can not be published. So if I write a library that has an optional feature flag to enable r-bindings and want to export RPolarsDataFrames. If I import r-polars via git I am prohibited from publishing it even if all other code works just with stable versions just fine. So this is a transitive requirement that gets in the way. Crates.io is the way all rust dependencies are distributed officially.

I agree that it might be annoying, if rust polars abi is unstable it does not matter if you use the git version or the crates.io version. The amount of work to stay in sync will be the same in the long run.

@eitsupi
Copy link
Collaborator

eitsupi commented Jun 3, 2024

Crates.io is the way all rust dependencies are distributed officially.

In the Rust world, yes, but not in the CRAN-centered R world.
https://cran.r-project.org/web/packages/using_rust.html

In the R world, I don't see the value in releasing to crates.io at this time.

@eitsupi
Copy link
Collaborator

eitsupi commented Jun 3, 2024

I'm sure I've argued the same thing many times, but I welcome contributions to extendr-polars.
Why not just update extendr-polars with a dependency on polars on crates.io like pyo3-polars?

@eitsupi
Copy link
Collaborator

eitsupi commented Jun 3, 2024

I agree that it might be annoying, if rust polars abi is unstable it does not matter if you use the git version or the crates.io version. The amount of work to stay in sync will be the same in the long run.

It is a really painful task to fix a large number of breaking changes in Rust Polars in one PR, which is released only once a month.
So I definitely do not want to commit to that.

@ju6ge
Copy link
Contributor

ju6ge commented Jun 3, 2024

I'm sure I've argued the same thing many times, but I welcome contributions to extendr-polars. Why not just update extendr-polars with a dependency on polars on crates.io like pyo3-polars?

So you are arguing it is easier to update two seperate code bases instead of a single codebase?

All that is needed to export to R from rust is already here. extendr_polars even depends on r-polars by requiring a working R session with r-polars installed.

It is a really painful task to fix a large number of breaking changes in Rust Polars in one PR, which is released only once a month. So I definitely do not want to commit to that.

I get that, and I get your time is limited. I do also want to express my respect for the work that has gone into this project already.

@eitsupi
Copy link
Collaborator

eitsupi commented Jun 3, 2024

So you are arguing it is easier to update two seperate code bases instead of a single codebase?

Good point.
Ideally, I would prefer to move the extendr hack in r-polars to extendr-polars and have r-polars depend on extendr-polars.
However, it takes a lot of work to do that, so for now we feel it is easier to put a copy in extendr-polars.

In long-term, I expect the codebase to be simpler with savvy than with extendr. (#1126)

Currently there is code on the Rust and R sides to use the Result type on extendr, and these are needed for extendr-polars (or something like that).

@ju6ge
Copy link
Contributor

ju6ge commented Jun 6, 2024

I just created #1135. Which addresses enabling downstream projects to import r-polars as a rust dependency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants