Use double precision in threaded calculation of linear tree coefficients (fixes #5226) #5368

btrotta · 2022-07-12T08:40:19Z

When calculating the linear tree coefficients, we need to compute some matrix products. This calculation is multi-threaded for efficiency. However, floating-point addition can give slightly different results depending on the order in which terms are added, and this was causing the calculated coefficients to vary with the number of threads. I've resolved this by making these matrices double precision instead of single, so the inaccuracies are less significant. This will not use much additional memory, since the size of these matrices is only O(num_features ^ 2).
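To make the order dependence concrete, here is a minimal, contrived sketch in pure Python (it is not LightGBM's actual code). It emulates single precision by rounding through `struct`, splits a sum into contiguous chunks as a stand-in for per-thread partial sums, and compares the result for one versus two chunks. The helper names `to_float32` and `chunked_sum` and the input values are assumptions chosen purely for illustration.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def chunked_sum(values, n_chunks, round_fn):
    """Sum contiguous chunks separately (one per simulated thread), then
    combine the partial sums, applying round_fn after every addition."""
    size = -(-len(values) // n_chunks)  # ceiling division
    total = 0.0
    for start in range(0, len(values), size):
        partial = 0.0
        for v in values[start:start + size]:
            partial = round_fn(partial + v)
        total = round_fn(total + partial)
    return total


# Hypothetical inputs: one large term, sixteen small ones, then the large
# term cancelled out. The exact sum is 16.
values = [1e8] + [1.0] * 16 + [-1e8]

single_1 = chunked_sum(values, 1, to_float32)
single_2 = chunked_sum(values, 2, to_float32)
double_1 = chunked_sum(values, 1, float)  # float() leaves a double unchanged
double_2 = chunked_sum(values, 2, float)

print(single_1, single_2)  # 0.0 8.0   (depends on the chunk count)
print(double_1, double_2)  # 16.0 16.0 (same for both chunk counts)
```

In this particular example the double-precision sums happen to be exact. In general, double precision does not remove the order dependence; it only shrinks the rounding error per addition from roughly 1e-7 to roughly 1e-16 in relative terms.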

I also added a preprocessor directive to make Eigen calls single-threaded (since Eigen is always called inside a for-loop that is already parallelized).

jameslamb

Thanks so much for fixing this! I left a few initial suggestions for your consideration.

jameslamb · 2022-07-12T14:05:23Z

CMakeLists.txt

@@ -109,6 +109,7 @@ include_directories(${EIGEN_DIR})

 # See https://gitlab.com/libeigen/eigen/-/blob/master/COPYING.README
 add_definitions(-DEIGEN_MPL2_ONLY)
+add_definitions(-DEIGEN_DONT_PARALLELIZE)


Can you please add this to the flags used by the R package as well?

LightGBM/R-package/configure.ac

Line 38 in 44fe591

LGB_CPPFLAGS="${LGB_CPPFLAGS} -DEIGEN_MPL2_ONLY"

LightGBM/R-package/configure.win

Line 22 in 44fe591

LGB_CPPFLAGS="${LGB_CPPFLAGS} -DEIGEN_MPL2_ONLY"

This change seems to be causing CI to fail; I think the problem is that I need to regenerate the configure file in R-package. Is there a way to do this on Windows?

For non-Windows, we have a comment-triggered CI job that will update configure based on changes to configure.ac. I'll trigger that job right now.

For configure.win, nothing needs to be regenerated. configure.win is executed directly, instead of being used as a template.

These things are documented at https://github.com/microsoft/LightGBM/tree/master/R-package#changing-the-cran-package but that README is fairly large so it's easy to miss.

jameslamb · 2022-07-12T14:09:37Z

tests/python_package_test/test_engine.py

+    fd = FileLoader(EXAMPLES_DIR / 'binary_classification', 'binary',
+                    'train_linear.conf')


Would you consider just re-defining EXAMPLES_DIR at the top of this file + using load_breast_cancer() to get a binary classification dataset, the way that other tests in this file do?

I worry that having one test file import from another could cause issues for pytest in the future (even though right now I don't notice any). There are not any other places in this project's Python tests today where one test_*.py file imports from another.

I've changed it to use some randomly generated data (I was unable to reproduce the failure with the breast cancer dataset).

Ah ok, I didn't realize that it could be dataset-specific. Sorry for creating extra work for you! I guess we could have also avoided "test file importing another test file" by moving FileLoader to utils.py. That would probably be good to do anyway (in a separate PR).

Anyway, if the new randomly-generated data is sufficient to reproduce the underlying issue fixed by this PR, I'm good with it!

jameslamb · 2022-07-13T15:11:17Z

/gha run r-configure

jameslamb · 2022-07-13T15:20:50Z

hmmm the run-configure job failed (build link)

Error: fatal: couldn't find remote ref refs/heads/linear-threading

I'll try one more run, and if it still fails I'll look into this later. Sorry for the disruption 😭

jameslamb · 2022-07-13T15:20:59Z

/gha run r-configure

jameslamb · 2022-07-13T16:10:27Z

I see the issue! I believe that updating R-package/configure on a PR from a fork doesn't currently work. I've documented that in #5371 and will work on a fix, but it doesn't need to block this PR.

@btrotta I just regenerated configure locally using the dockerized steps mentioned at https://github.com/microsoft/LightGBM/tree/master/R-package#changing-the-cran-package. I tried to push those changes to your branch (which I thought I could do as a maintainer here), but unfortunately I got a "permission denied".

Thankfully, autoconf only generated a change on one line.

Can you please change this line

https://github.com/btrotta/LightGBM/blob/32f564c26017fbae7c78663046dffd7ca0eec177/R-package/configure#L1716

to

LGB_CPPFLAGS="${LGB_CPPFLAGS} -DEIGEN_MPL2_ONLY -DEIGEN_DONT_PARALLELIZE"

Sorry again for the disruption.

jameslamb · 2022-07-13T16:11:36Z

One other thing... as a maintainer you have permissions to push branches directly to LightGBM instead of using your fork. I recommend doing that in the future.

btrotta · 2022-07-16T00:16:09Z

@jameslamb Thanks for your help, I've updated R-package/configure now

guolinke

Thank you!

btrotta · 2022-07-24T06:13:41Z

@jameslamb If you're happy with the changes, could you please approve?

StrikerRUS

Thanks a lot for the fix!

jameslamb · 2022-07-25T00:31:10Z

> If you're happy with the changes, could you please approve?

I'm leaving my "request changes" review in place and not re-reviewing until the issue that has been affecting LightGBM's CI for the last few days (#5362 (comment)) is fixed. I don't want this to be merged until the CI is fixed and we run it over these changes one more time.

Sorry for the delay. Hopefully that issue will be fixed soon.

btrotta · 2022-07-25T08:24:15Z

@jameslamb Ok, no problem.

jameslamb · 2022-07-28T17:10:55Z

Ok @btrotta our CI concerns have been resolved (#5388).

Since I don't have access to push to your fork...could you please merge latest master into this branch? Once you do that and CI passes, I'll merge this fix.

Sorry for the delay, and thanks for your patience.

shiyu1994 · 2022-07-29T04:21:16Z

Maybe we can close and reopen this PR to rerun the ci? Since the ci tests are run with the merged version automatically.

jameslamb · 2022-07-29T04:36:33Z

> Maybe we can close and reopen this PR to rerun the ci? Since the ci tests are run with the merged version automatically.

I personally prefer to merge the target branch (in this repo, master) into PRs directly as a way to trigger CI, instead of just closing and re-opening, in situations like this one where the PR has changes to a central part of the codebase and when there are several other active PRs touching other related parts of the codebase.

To avoid this situation:

1. branch-1 and branch-2 are created off of master at the same time
2. changes made on branch-1 are proposed as PR-1
3. changes made on branch-2 are proposed as PR-2
4. CI passes on both PR-1 and PR-2
5. PR-1 is merged, then PR-2 is merged without re-running CI on PR-2
6. master is broken because the changes in PR-1 and PR-2 are incompatible (for example, PR-2 removes a function that was referenced by the code in PR-1)

If CI passes for this particular PR I'm ok with merging it to keep making progress on all the other PRs, but please consider that for the future.
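The scenario above can be sketched with a toy model in Python. This is only an illustration under stated assumptions: the codebase is reduced to a mapping from function names to the names they call, and `ci_passes`, `merge`, `old_helper`, and `new_feature` are hypothetical names, not anything from LightGBM or its CI.

```python
def ci_passes(codebase):
    """Toy CI check: every function that is called must also be defined."""
    defined = set(codebase)
    return all(call in defined
               for calls in codebase.values()
               for call in calls)


def merge(base, *branches):
    """Naive merge: apply each branch's deletions and additions/changes
    relative to the common base."""
    merged = dict(base)
    for branch in branches:
        for name in base.keys() - branch.keys():
            merged.pop(name, None)  # the branch deleted this function
        for name, calls in branch.items():
            if base.get(name) != calls:
                merged[name] = calls  # the branch added or changed it
    return merged


# master maps each function name to the names it calls.
master = {"train": [], "old_helper": []}

# PR-1 adds a function that calls old_helper().
branch_1 = {**master, "new_feature": ["old_helper"]}
# PR-2 removes old_helper(), which nothing on master uses.
branch_2 = {"train": []}

print(ci_passes(branch_1), ci_passes(branch_2))      # True True
print(ci_passes(merge(master, branch_1, branch_2)))  # False
```

Each branch passes on its own, but the combination does not. Merging the updated master into PR-2 and re-running CI before merging would surface the failure on the PR instead of on master.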

shiyu1994 · 2022-07-29T16:04:14Z

@jameslamb Thank you. I will follow this rule in the future.

shiyu1994 · 2022-07-29T16:05:08Z

It seems that all ci tests have passed. Maybe we can cancel the change request now?

jameslamb · 2022-07-29T16:08:19Z

> Maybe we can cancel the change request now?

oh yep, sorry about that! Meant to come back and approve yesterday.

jameslamb

🎉 thanks again for the help @btrotta !

jameslamb · 2022-07-29T16:13:48Z

> I will follow this rule in the future.

No problem! I've found it helpful. There is definitely a trade-off there. Especially since our CUDA CI jobs take 1-2 hours to run and we can only have a single job running at a time across the whole repo, extra CI runs can slow down development in the whole project.

My approach is:

- if I need to re-trigger CI anyway, merge latest master and use that as a way to re-trigger it
- if a PR recently passed CI and has very small, non-functional changes such as documentation fixes, just merge it without rebuilding

So, for example, once #5384 builds successfully, I'll just merge it even though this PR was merged to master after that build started, since #5384 just contains documentation changes.

github-actions · 2023-08-19T03:34:29Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

btrotta added 3 commits July 12, 2022 18:24

Use double for model coefficient matrices

a1099e1

Don't parallelize Eigen calls

4f4990f

Merge branch 'master' into linear-threading

97d211d

btrotta requested review from guolinke, StrikerRUS, shiyu1994, jameslamb and jmoralez as code owners July 12, 2022 08:40

Remove n_jobs workaround

edd6864

jameslamb added the fix label Jul 12, 2022

jameslamb requested changes Jul 12, 2022


btrotta added 4 commits July 13, 2022 18:24

Update preprocessor directives for R config

49f15c3

Use random data in test

27a3066

Add comment

60335e0

Set random seed

32f564c

jameslamb mentioned this pull request Jul 13, 2022

[ci] r-configure comment job fails if PR branch is from a fork #5371

Closed

Update R config

1536569

guolinke approved these changes Jul 23, 2022


StrikerRUS approved these changes Jul 24, 2022


shiyu1994 closed this Jul 29, 2022

shiyu1994 reopened this Jul 29, 2022

jameslamb self-requested a review July 29, 2022 16:08

jameslamb approved these changes Jul 29, 2022


jameslamb merged commit 44d3718 into microsoft:master Jul 29, 2022

StrikerRUS mentioned this pull request Jul 30, 2022

Fix potential overflow in linear trees #5395

Merged

jameslamb mentioned this pull request Aug 4, 2022

feature: Add true streaming APIs to reduce client-side memory usage #5299

Merged

jameslamb mentioned this pull request Oct 7, 2022

[DO NOT MERGE] Release v3.3.3 #5525

Closed

40 tasks

github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023
