Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Arrow by flag + arrow docs #345

Merged
merged 7 commits into from
Jul 19, 2022
Merged

Enable Arrow by flag + arrow docs #345

merged 7 commits into from
Jul 19, 2022

Conversation

niclashedam
Copy link
Collaborator

As promised in PR #331: Some docs for setting up Arrow and compiling it.
Is there anything else I should add or is this sufficient?

@niclashedam niclashedam added the doc About some form of documentation. Could be doxygen comments in the source, user documentation, etc. label Apr 29, 2022
@niclashedam niclashedam requested a review from pdamme April 29, 2022 12:48
@niclashedam niclashedam self-assigned this Apr 29, 2022
Copy link
Collaborator

@pdamme pdamme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @niclashedam, thanks for writing it up! I can confirm that it works (with slight adjustments, see below). The changes in build.sh look good to me. However, I have a few comments on GettingStarted.md:

  1. I had to install libboost-dev first (by sudo apt install libboost-dev) to make it build. We should mention libboost-dev as an optional dependency (if Arrow is desired) in the table in GettingStarted.md.
  2. I had to use sudo make install instead of make install for Arrow.
  3. Cloning the latest commit of the Arrow repo is a bit fragile, since any change they make could break our code. Could we use a particular release of theirs?
  4. Hitherto, we either (a) mention libs/tools as dependencies in the table in GettingStarted.md (we could mention Arrow as optional there, but I also see that the particular instructions need some space), or (b) download and build the dependencies in the build script.
    We should make the instructions on installing Arrow more consistent with what we have so far. I'd say if we can go with a version of Arrow which can be installed via typical package repos, then we could mention Arrow as an optional dependency in the table. But as far as I understand, we need some newer features and therefore must build it from source. In that case, I would recommend integrating the download and build of Arrow into build.sh (to be executed only if --arrow is present). Ideally, we could omit the sudo make install (see point 2) then, so that we don't need admin rights. This would make it much easier for users.

In general, some major changes to the build script are currently in development (see #328). So it's probably easier to wait for those to be merged and extend build.sh afterwards.

Finally, I noticed that testing with Arrow support is not trivial at the moment, because test.sh does not accept --arrow. However, this is a more general problem which can be addressed later.

@EricMier
Copy link
Collaborator

EricMier commented May 6, 2022

Hi @niclashedam,

I also took a look into your changes. As Patrick already mentioned, there are upcomming changes to the script. Your current changes should fit in easily with my changes to the build script, but I can't speak for the upcomming changes of @corepointer of course.

Nevertheless if you are going to add Arrow as dependency to the build script, you already can take a look into this documentation under "Adding a dependency" and the new version of the build script.

Regards,
Eric

@corepointer
Copy link
Collaborator

Even though the changes in #328 and #359 are substantial, the changes from this PR can be adapted easily once the others are merged in. I held back separating fetch & build in #359, so the build script documentation form @EricMier should still be valid.
You could use CMake's -S and -B to avoid the directory changes and please use ninja (which is already a dependency of daphne) instead of make to have the build run at max paralellism automatically. For this use -G Ninja in the first cmake invocation and cmake --build <builddir> in the second one.

@corepointer
Copy link
Collaborator

Hi @niclashedam!
You could be the first to test the quality of the documentation of the build system changes that just landed in the main branch ;-)
Regards, Mark

@niclashedam
Copy link
Collaborator Author

@corepointer I'll give it a shot. I have a deadline next week, but I'll have time to fix this after that. I'll let you know if anything comes up!

@niclashedam
Copy link
Collaborator Author

I have updated the branch to follow the new build system. Please let me know if something needs to be modified.
As for tests, I am not sure how we can run Arrow tests if and only if Arrow is present. If you have an idea, let me know.

I looked into the documentation. I think it is not so pedagogical for developers outside of the consortium. We know how it works because we work with it often, but it is quite different from systems I've worked with before. I also think it needs to specify some edge cases, such as handling optional dependencies. Furthermore, the build script must install some of its artefacts in thirdparty/installed.

@niclashedam
Copy link
Collaborator Author

Have you had the time to look at this, @pdamme @corepointer ? 😄

@pdamme
Copy link
Collaborator

pdamme commented Jul 7, 2022

Hi @niclashedam, sorry for the delay in handling this PR. In fact, PRs have been piling up a bit recently, since I was quite busy with a few other things. Please let us know if this is blocking you. Otherwise, I could probably find some time at the end of next week. In case someone else would like to have a look and test and merge it earlier, that would be fine by me, too.

@niclashedam
Copy link
Collaborator Author

Hi @pdamme. It is not blocking me so no rush. I am just making sure it is not forgotten.
Let me know if I can help review some PRs.

@EricMier
Copy link
Collaborator

EricMier commented Jul 8, 2022

Hey @niclashedam, I took a look and it LGTM!
Your changes build successfully on a fresh clone for with & without the --arrow flag.
I just made a few changes (added one status message and exchanged static paths with predefined path variables).

Is there any particular reason to export BUILD_ARROW ((414)? I see no subscript using this variable.

Besides that I think we can merge your changes in.

@niclashedam
Copy link
Collaborator Author

Hi @EricMier. I assume you are talking about the export on line 414, yes? I don't think it is necessary as long as we forward the variable to CMake, which I believe we do anyways. I will remove it. Thanks!

By the way, do you know how I can invoke the Arrow/Parquet tests when Arrow is enabled? I don't think they are automatically invoked anymore due to the flag.

Copy link
Collaborator

@pdamme pdamme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @niclashedam and @EricMier for continuing the discussion. I've just applied some final adjustments. Most importantly, following @corepointer's advice, Arrow is now built with ninja instead of make, which reduces the build time. LGTM now.

niclashedam and others added 6 commits July 19, 2022 20:29
- Added deletion of Arrow download token with --cleanAll.
- Removed superfluous cd command.
- Changed Arrow build from make to ninja.
  - Should be faster (and is, on my system).
  - More consistent with the other dependencies.
- Mentioned libboost-dev as dependency in GettingStarted.md.
@pdamme pdamme merged commit 52f49d8 into main Jul 19, 2022
@pdamme
Copy link
Collaborator

pdamme commented Jul 19, 2022

Regarding the problem of telling test.sh to execute the Arrow-dependent test cases, I've created issue #414. Feel free to pick it up :) .

@niclashedam niclashedam deleted the docs/arrow branch August 1, 2022 10:53
@niclashedam
Copy link
Collaborator Author

Sounds good! Thanks for your time @pdamme, @corepointer and @EricMier! 💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc About some form of documentation. Could be doxygen comments in the source, user documentation, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants