
[python-package] add type hints on Booster.set_network() #4068

Merged: 3 commits from fix/machines-hint into master on Mar 15, 2021

Conversation

@jameslamb (Collaborator)
While working on lightgbm.dask, I noticed that the docstring for machines in Booster.set_network() seems to be incorrect.

It says the following:

machines : list, set or string
Names of machines.

However, machines is passed to lightgbm.basic.c_str(), which only accepts strings:

_safe_call(_LIB.LGBM_NetworkInit(c_str(machines),

If you pass a set or list, you'll get an error like this:

AttributeError: 'set' object has no attribute 'encode'

import lightgbm as lgb
machines = set(["10.1.1.1:12400", "10.1.1.2:12400"])
lgb.basic.c_str(machines)
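To see why only strings work, here is a simplified sketch of what a helper like `c_str` does (this is an illustration, not the exact `lightgbm.basic` source): it calls `.encode()` on its argument before handing it to ctypes, so anything without an `.encode()` method fails with exactly this `AttributeError`.

```python
import ctypes

def c_str(string):
    # simplified sketch: convert a Python str to a C-compatible char pointer;
    # .encode() only exists on str, so a set or list raises AttributeError
    return ctypes.c_char_p(string.encode('utf-8'))

print(c_str("10.1.1.1:12400").value)  # b'10.1.1.1:12400'
```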

This PR proposes updating that docstring to make it clear that only strings are supported. It also adds type hints on the other arguments in Booster.set_network() and a return type hint, as part of #3756.
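For illustration, a standalone sketch of what a fully hinted signature could look like. The parameter names and defaults here are assumed from the LightGBM documentation, and the return annotation is illustrative; this is not the PR's actual diff.

```python
from typing import List, Set, Union

def set_network(machines: Union[List[str], Set[str], str],
                local_listen_port: int = 12400,
                listen_time_out: int = 120,
                num_machines: int = 1) -> None:
    """Hypothetical standalone sketch of a hinted set_network() signature."""
    # (the real method lives on Booster and calls into LGBM_NetworkInit)
```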

"""Set the network configuration.

Parameters
----------
machines : list, set or string
@StrikerRUS (Collaborator) commented on Mar 14, 2021

I think adding support for the behavior from the old docstring would be better.
It would be consistent with the case where the machines param is passed during Booster creation:

machines = params["machines"]
if isinstance(machines, str):
    num_machines_from_machine_list = len(machines.split(','))
elif isinstance(machines, (list, set)):
    num_machines_from_machine_list = len(machines)
    machines = ','.join(machines)
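The suggested behavior can be captured as a small standalone helper. This is a hypothetical sketch (the name `normalize_machines` is made up for illustration, and sorting is added only to make the output deterministic for sets):

```python
def normalize_machines(machines):
    # accept str, list, or set; return (comma-separated string, machine count)
    if isinstance(machines, str):
        num_machines = len(machines.split(','))
    elif isinstance(machines, (list, set)):
        num_machines = len(machines)
        machines = ','.join(sorted(machines))  # sorted: deterministic for sets
    else:
        raise TypeError(f"Expected str, list, or set, got {type(machines).__name__}")
    return machines, num_machines
```

For example, `normalize_machines({"10.1.1.2:12400", "10.1.1.1:12400"})` yields the joined string `"10.1.1.1:12400,10.1.1.2:12400"` and a count of 2.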

@jameslamb (Collaborator, Author)

alright no problem

@StrikerRUS (Collaborator)

Thanks!

@StrikerRUS (Collaborator) commented on Mar 14, 2021

Hmm, looks like one more random Dask error in tests.

__ test_network_params_not_required_but_respected_if_given[array-regression] ___

client = <Client: 'tcp://127.0.0.1:41279' processes=2 threads=2, memory=236.37 GB>
task = 'regression', output = 'array', listen_port = 13030

...

        with pytest.raises(lgb.basic.LightGBMError, match=error_msg):
>           dask_model3.fit(dX, dy, group=dg)
E           Failed: DID NOT RAISE <class 'lightgbm.basic.LightGBMError'>

@jameslamb (Collaborator, Author)

Pretty strange! Here are the full logs:

__ test_network_params_not_required_but_respected_if_given[array-regression] ___

client = <Client: 'tcp://127.0.0.1:41279' processes=2 threads=2, memory=236.37 GB>
task = 'regression', output = 'array', listen_port = 13030

    @pytest.mark.parametrize('task', tasks)
    @pytest.mark.parametrize('output', data_output)
    def test_network_params_not_required_but_respected_if_given(client, task, output, listen_port):
        if task == 'ranking' and output == 'scipy_csr_matrix':
            pytest.skip('LGBMRanker is not currently tested on sparse matrices')
    
        if task == 'ranking':
            _, _, _, _, dX, dy, _, dg = _create_ranking_data(
                output=output,
                group=None,
                chunk_size=10,
            )
        else:
            _, _, _, dX, dy, _ = _create_data(
                objective=task,
                output=output,
                chunk_size=10,
            )
            dg = None
    
        dask_model_factory = task_to_dask_factory[task]
    
        # rebalance data to be sure that each worker has a piece of the data
        if output == 'array':
            client.rebalance()
    
        # model 1 - no network parameters given
        dask_model1 = dask_model_factory(
            n_estimators=5,
            num_leaves=5,
        )
        dask_model1.fit(dX, dy, group=dg)
        assert dask_model1.fitted_
        params = dask_model1.get_params()
        assert 'local_listen_port' not in params
        assert 'machines' not in params
    
        # model 2 - machines given
        n_workers = len(client.scheduler_info()['workers'])
        open_ports = [lgb.dask._find_random_open_port() for _ in range(n_workers)]
        dask_model2 = dask_model_factory(
            n_estimators=5,
            num_leaves=5,
            machines=",".join([
                "127.0.0.1:" + str(port)
                for port in open_ports
            ]),
        )
    
        dask_model2.fit(dX, dy, group=dg)
        assert dask_model2.fitted_
        params = dask_model2.get_params()
        assert 'local_listen_port' not in params
        assert 'machines' in params
    
        # model 3 - local_listen_port given
        # training should fail because LightGBM will try to use the same
        # port for multiple worker processes on the same machine
        dask_model3 = dask_model_factory(
            n_estimators=5,
            num_leaves=5,
            local_listen_port=listen_port
        )
        error_msg = "has multiple Dask worker processes running on it"
        with pytest.raises(lgb.basic.LightGBMError, match=error_msg):
>           dask_model3.fit(dX, dy, group=dg)
E           Failed: DID NOT RAISE <class 'lightgbm.basic.LightGBMError'>

../tests/python_package_test/test_dask.py:1086: Failed
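As context for the `machines=` construction in the test above: the usual trick for finding a free port is to bind a socket to port 0 and let the OS choose. A minimal sketch, with a hypothetical `find_random_open_port` standing in for `lgb.dask._find_random_open_port`:

```python
import socket

def find_random_open_port():
    # bind to port 0 so the OS picks a currently free port, then report it
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(('127.0.0.1', 0))
        return s.getsockname()[1]

# build a machines string like the test does, for two workers
machines = ",".join(
    "127.0.0.1:" + str(find_random_open_port()) for _ in range(2)
)
```

Note the caveat that the socket is closed before the port is actually used, so two successive calls can, in rare cases, return the same port.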

That error not being raised means that the input data partitions all ended up on one worker. I tried to explicitly control for that with

    # rebalance data to be sure that each worker has a piece of the data
    if output == 'array':
        client.rebalance()

and by setting chunk_size=10, but I guess that is not 100% reliable?

@StrikerRUS (Collaborator) left a comment:

LGTM, thanks!

@jameslamb jameslamb merged commit dc1bc23 into master Mar 15, 2021
@jameslamb jameslamb deleted the fix/machines-hint branch March 15, 2021 12:15
@StrikerRUS (Collaborator)

@jameslamb

Hmm, looks like one more random Dask error in tests.

pretty strange!

One more time:

2021-03-23T11:32:00.3599665Z __ test_network_params_not_required_but_respected_if_given[array-regression] ___
2021-03-23T11:32:00.3600265Z 
2021-03-23T11:32:00.3601496Z client = <Client: 'tcp://127.0.0.1:44705' processes=2 threads=2, memory=16.70 GB>
2021-03-23T11:32:00.3602774Z task = 'regression', output = 'array', listen_port = 13030
2021-03-23T11:32:00.3603131Z 
2021-03-23T11:32:00.3603872Z     @pytest.mark.parametrize('task', tasks)
2021-03-23T11:32:00.3604785Z     @pytest.mark.parametrize('output', data_output)
2021-03-23T11:32:00.3605598Z     def test_network_params_not_required_but_respected_if_given(client, task, output, listen_port):
2021-03-23T11:32:00.3606647Z         if task == 'ranking' and output == 'scipy_csr_matrix':
2021-03-23T11:32:00.3607853Z             pytest.skip('LGBMRanker is not currently tested on sparse matrices')
2021-03-23T11:32:00.3608415Z     
2021-03-23T11:32:00.3608814Z         client.wait_for_workers(2)
2021-03-23T11:32:00.3609214Z     
2021-03-23T11:32:00.3609975Z         _, _, _, _, dX, dy, _, dg = _create_data(
2021-03-23T11:32:00.3610490Z             objective=task,
2021-03-23T11:32:00.3610940Z             output=output,
2021-03-23T11:32:00.3611381Z             chunk_size=10,
2021-03-23T11:32:00.3611829Z             group=None
2021-03-23T11:32:00.3612267Z         )
2021-03-23T11:32:00.3612672Z     
2021-03-23T11:32:00.3613153Z         dask_model_factory = task_to_dask_factory[task]
2021-03-23T11:32:00.3613612Z     
2021-03-23T11:32:00.3614090Z         # rebalance data to be sure that each worker has a piece of the data
2021-03-23T11:32:00.3615068Z         if output == 'array':
2021-03-23T11:32:00.3615633Z             client.rebalance()
2021-03-23T11:32:00.3616057Z     
2021-03-23T11:32:00.3616765Z         # model 1 - no network parameters given
2021-03-23T11:32:00.3617345Z         dask_model1 = dask_model_factory(
2021-03-23T11:32:00.3617873Z             n_estimators=5,
2021-03-23T11:32:00.3618366Z             num_leaves=5,
2021-03-23T11:32:00.3618788Z         )
2021-03-23T11:32:00.3619384Z         dask_model1.fit(dX, dy, group=dg)
2021-03-23T11:32:00.3619926Z         assert dask_model1.fitted_
2021-03-23T11:32:00.3627880Z         params = dask_model1.get_params()
2021-03-23T11:32:00.3634170Z         assert 'local_listen_port' not in params
2021-03-23T11:32:00.3635077Z         assert 'machines' not in params
2021-03-23T11:32:00.3635494Z     
2021-03-23T11:32:00.3636048Z         # model 2 - machines given
2021-03-23T11:32:00.3636712Z         n_workers = len(client.scheduler_info()['workers'])
2021-03-23T11:32:00.3637274Z         open_ports = [lgb.dask._find_random_open_port() for _ in range(n_workers)]
2021-03-23T11:32:00.3637781Z         dask_model2 = dask_model_factory(
2021-03-23T11:32:00.3638307Z             n_estimators=5,
2021-03-23T11:32:00.3638916Z             num_leaves=5,
2021-03-23T11:32:00.3639628Z             machines=",".join([
2021-03-23T11:32:00.3640331Z                 "127.0.0.1:" + str(port)
2021-03-23T11:32:00.3641301Z                 for port in open_ports
2021-03-23T11:32:00.3645753Z             ]),
2021-03-23T11:32:00.3646302Z         )
2021-03-23T11:32:00.3646761Z     
2021-03-23T11:32:00.3647176Z         dask_model2.fit(dX, dy, group=dg)
2021-03-23T11:32:00.3647895Z         assert dask_model2.fitted_
2021-03-23T11:32:00.3648388Z         params = dask_model2.get_params()
2021-03-23T11:32:00.3649217Z         assert 'local_listen_port' not in params
2021-03-23T11:32:00.3650342Z         assert 'machines' in params
2021-03-23T11:32:00.3651161Z     
2021-03-23T11:32:00.3651958Z         # model 3 - local_listen_port given
2021-03-23T11:32:00.3652472Z         # training should fail because LightGBM will try to use the same
2021-03-23T11:32:00.3653014Z         # port for multiple worker processes on the same machine
2021-03-23T11:32:00.3653488Z         dask_model3 = dask_model_factory(
2021-03-23T11:32:00.3653902Z             n_estimators=5,
2021-03-23T11:32:00.3654276Z             num_leaves=5,
2021-03-23T11:32:00.3654694Z             local_listen_port=listen_port
2021-03-23T11:32:00.3655075Z         )
2021-03-23T11:32:00.3655481Z         error_msg = "has multiple Dask worker processes running on it"
2021-03-23T11:32:00.3656048Z         with pytest.raises(lgb.basic.LightGBMError, match=error_msg):
2021-03-23T11:32:00.3656573Z >           dask_model3.fit(dX, dy, group=dg)
2021-03-23T11:32:00.3657297Z E           Failed: DID NOT RAISE <class 'lightgbm.basic.LightGBMError'>
2021-03-23T11:32:00.3657622Z 
2021-03-23T11:32:00.3658017Z ../tests/python_package_test/test_dask.py:1060: Failed
2021-03-23T11:32:00.3658754Z ---------------------------- Captured stderr setup -----------------------------
2021-03-23T11:32:00.3659810Z distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
2021-03-23T11:32:00.3660872Z distributed.scheduler - INFO - Clear task state
2021-03-23T11:32:00.3661642Z distributed.scheduler - INFO -   Scheduler at:     tcp://127.0.0.1:44705
2021-03-23T11:32:00.3662618Z distributed.scheduler - INFO -   dashboard at:            127.0.0.1:8787
2021-03-23T11:32:00.3663603Z distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:37381
2021-03-23T11:32:00.3664626Z distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:37381
2021-03-23T11:32:00.3665468Z distributed.worker - INFO -          dashboard at:            127.0.0.1:43161
2021-03-23T11:32:00.3666187Z distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:44705
2021-03-23T11:32:00.3666860Z distributed.worker - INFO - -------------------------------------------------
2021-03-23T11:32:00.3667631Z distributed.worker - INFO -               Threads:                          1
2021-03-23T11:32:00.3668411Z distributed.worker - INFO -                Memory:                    8.35 GB
2021-03-23T11:32:00.3669360Z distributed.worker - INFO -       Local Directory: /__w/1/s/python-package/_test_worker-59578367-9a9e-4146-8046-12a28195c1e4/dask-worker-space/worker-sxa57r7x
2021-03-23T11:32:00.3670148Z distributed.worker - INFO - -------------------------------------------------
2021-03-23T11:32:00.3670830Z distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:39083
2021-03-23T11:32:00.3671559Z distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:39083
2021-03-23T11:32:00.3672321Z distributed.worker - INFO -          dashboard at:            127.0.0.1:45311
2021-03-23T11:32:00.3673083Z distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:44705
2021-03-23T11:32:00.3673740Z distributed.worker - INFO - -------------------------------------------------
2021-03-23T11:32:00.3674413Z distributed.worker - INFO -               Threads:                          1
2021-03-23T11:32:00.3675194Z distributed.worker - INFO -                Memory:                    8.35 GB
2021-03-23T11:32:00.3676065Z distributed.worker - INFO -       Local Directory: /__w/1/s/python-package/_test_worker-dab41a7a-2886-4e47-a534-0b90ff7ff829/dask-worker-space/worker-dedr43e2
2021-03-23T11:32:00.3677019Z distributed.worker - INFO - -------------------------------------------------
2021-03-23T11:32:00.3677927Z distributed.scheduler - INFO - Register worker <Worker 'tcp://127.0.0.1:37381', name: tcp://127.0.0.1:37381, memory: 0, processing: 0>
2021-03-23T11:32:00.3679032Z distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:37381
2021-03-23T11:32:00.3679706Z distributed.core - INFO - Starting established connection
2021-03-23T11:32:00.3680422Z distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:44705
2021-03-23T11:32:00.3681131Z distributed.worker - INFO - -------------------------------------------------
2021-03-23T11:32:00.3682110Z distributed.scheduler - INFO - Register worker <Worker 'tcp://127.0.0.1:39083', name: tcp://127.0.0.1:39083, memory: 0, processing: 0>
2021-03-23T11:32:00.3682928Z distributed.core - INFO - Starting established connection
2021-03-23T11:32:00.3683640Z distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:39083
2021-03-23T11:32:00.3684286Z distributed.core - INFO - Starting established connection
2021-03-23T11:32:00.3684967Z distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:44705
2021-03-23T11:32:00.3685675Z distributed.worker - INFO - -------------------------------------------------
2021-03-23T11:32:00.3686316Z distributed.core - INFO - Starting established connection
2021-03-23T11:32:00.3687035Z distributed.scheduler - INFO - Receive client connection: Client-c0e8c2b7-8bca-11eb-a01f-4f98255d250d
2021-03-23T11:32:00.3687807Z distributed.core - INFO - Starting established connection
2021-03-23T11:32:00.3688463Z ----------------------------- Captured stdout call -----------------------------
2021-03-23T11:32:00.3689032Z Finding random open ports for workers
2021-03-23T11:32:00.3689592Z [LightGBM] [Warning] Set TCP_NODELAY failed
2021-03-23T11:32:00.3690055Z [LightGBM] [Info] Trying to bind port 42351...
2021-03-23T11:32:00.3690540Z [LightGBM] [Info] Binding port 42351 succeeded
2021-03-23T11:32:00.3691003Z [LightGBM] [Warning] Set TCP_NODELAY failed
2021-03-23T11:32:00.3691495Z [LightGBM] [Warning] Connecting to rank 1 failed, waiting for 200 milliseconds
2021-03-23T11:32:00.3692052Z [LightGBM] [Warning] Set TCP_NODELAY failed
2021-03-23T11:32:00.3692442Z [LightGBM] [Info] Trying to bind port 58255...
2021-03-23T11:32:00.3692858Z [LightGBM] [Info] Listening...
2021-03-23T11:32:00.3693235Z [LightGBM] [Info] Binding port 58255 succeeded
2021-03-23T11:32:00.3693609Z [LightGBM] [Info] Listening...
2021-03-23T11:32:00.3694004Z [LightGBM] [Warning] Set TCP_NODELAY failed
2021-03-23T11:32:00.3694393Z [LightGBM] [Warning] Set TCP_NODELAY failed
2021-03-23T11:32:00.3694818Z [LightGBM] [Info] Connected to rank 1
2021-03-23T11:32:00.3695278Z [LightGBM] [Warning] Set TCP_NODELAY failed
2021-03-23T11:32:00.3695663Z [LightGBM] [Warning] Set TCP_NODELAY failed
2021-03-23T11:32:00.3696104Z [LightGBM] [Info] Connected to rank 0
2021-03-23T11:32:00.3696761Z [LightGBM] [Info] Local rank: 0, total number of machines: 2
2021-03-23T11:32:00.3697289Z [LightGBM] [Info] Local rank: 1, total number of machines: 2
2021-03-23T11:32:00.3698123Z [LightGBM] [Warning] num_threads is set=1, n_jobs=-1 will be ignored. Current value: num_threads=1
2021-03-23T11:32:00.3698951Z [LightGBM] [Warning] num_threads is set=1, n_jobs=-1 will be ignored. Current value: num_threads=1
2021-03-23T11:32:00.3699687Z Using passed-in 'machines' parameter
2021-03-23T11:32:00.3700080Z [LightGBM] [Info] Listening...
2021-03-23T11:32:00.3700448Z [LightGBM] [Info] Listening...
2021-03-23T11:32:00.3700901Z [LightGBM] [Warning] Set TCP_NODELAY failed
2021-03-23T11:32:00.3701366Z [LightGBM] [Warning] Set TCP_NODELAY failed
2021-03-23T11:32:00.3701989Z Using passed-in 'local_listen_port' for all workers
2021-03-23T11:32:00.3702654Z ----------------------------- Captured stderr call -----------------------------
2021-03-23T11:32:00.3703425Z distributed.worker - INFO - Run out-of-band function '_find_random_open_port'
2021-03-23T11:32:00.3704109Z distributed.worker - INFO - Run out-of-band function '_find_random_open_port'
2021-03-23T11:32:00.3704795Z --------------------------- Captured stderr teardown ---------------------------
2021-03-23T11:32:00.3705621Z distributed.scheduler - INFO - Remove client Client-c0e8c2b7-8bca-11eb-a01f-4f98255d250d
2021-03-23T11:32:00.3706382Z distributed.scheduler - INFO - Remove client Client-c0e8c2b7-8bca-11eb-a01f-4f98255d250d
2021-03-23T11:32:00.3707218Z distributed.scheduler - INFO - Close client connection: Client-c0e8c2b7-8bca-11eb-a01f-4f98255d250d
2021-03-23T11:32:00.3707967Z distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:39083
2021-03-23T11:32:00.3708716Z distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:37381
2021-03-23T11:32:00.3709706Z distributed.scheduler - INFO - Remove worker <Worker 'tcp://127.0.0.1:39083', name: tcp://127.0.0.1:39083, memory: 0, processing: 0>
2021-03-23T11:32:00.3710643Z distributed.core - INFO - Removing comms to tcp://127.0.0.1:39083
2021-03-23T11:32:00.3711570Z distributed.scheduler - INFO - Remove worker <Worker 'tcp://127.0.0.1:37381', name: tcp://127.0.0.1:37381, memory: 0, processing: 0>
2021-03-23T11:32:00.3712604Z distributed.core - INFO - Removing comms to tcp://127.0.0.1:37381
2021-03-23T11:32:00.3713283Z distributed.scheduler - INFO - Lost all workers
2021-03-23T11:32:00.3713862Z distributed.scheduler - INFO - Scheduler closing...
2021-03-23T11:32:00.3714539Z distributed.scheduler - INFO - Scheduler closing all comms

@StrikerRUS (Collaborator) commented on Mar 23, 2021

One more time:

Created #4099.

StrikerRUS added a commit that referenced this pull request Mar 25, 2021
* [docs]Add alt text on images

* Update docs/GPU-Windows.rst

Co-authored-by: James Lamb <[email protected]>

* Update docs/GPU-Windows.rst

Co-authored-by: James Lamb <[email protected]>

* Apply suggestions from code review

Co-authored-by: James Lamb <[email protected]>

* Apply suggestions from code review

Co-authored-by: James Lamb <[email protected]>

* Merge main branch commit updates (#1)

* [docs] Add alt text to image in Parameters-Tuning.rst (#4035)

* [docs] Add alt text to image in Parameters-Tuning.rst

Add alt text to Leaf-wise growth image, as part of #4028

* Update docs/Parameters-Tuning.rst

Co-authored-by: James Lamb <[email protected]>

Co-authored-by: James Lamb <[email protected]>

* [ci] [R-package] upgrade to R 4.0.4 in CI (#4042)

* [docs] update description of deterministic parameter (#4027)

* update description of deterministic parameter to require using with force_row_wise or force_col_wise

* Update include/LightGBM/config.h

Co-authored-by: Nikita Titov <[email protected]>

* update docs

Co-authored-by: Nikita Titov <[email protected]>

* [dask] Include support for init_score (#3950)

* include support for init_score

* use dataframe from init_score and test difference with and without init_score in local model

* revert refactoring

* initial docs. test between distributed models with and without init_score

* remove ranker from tests

* test value for root node and change docs

* comma

* re-include parametrize

* fix incorrect merge

* use single init_score and the booster_ attribute

* use np.float64 instead of float

* [ci] ignore untitle Jupyter notebooks in .gitignore (#4047)

* [ci] prevent getting incompatible dask and distributed versions (#4054)

* [ci] prevent getting incompatible dask and distributed versions

* Update .ci/test.sh

Co-authored-by: Nikita Titov <[email protected]>

* empty commit

Co-authored-by: Nikita Titov <[email protected]>

* [ci] fix R CMD CHECK note about example timings (fixes #4049) (#4055)

* [ci] fix R CMD CHECK note about example timings (fixes #4049)

* Apply suggestions from code review

Co-authored-by: Nikita Titov <[email protected]>

* empty commit

Co-authored-by: Nikita Titov <[email protected]>

* [ci] add CMake + R 3.6 test back (fixes #3469) (#4053)

* [ci] add CMake + R 3.6 test back (fixes #3469)

* Apply suggestions from code review

Co-authored-by: Nikita Titov <[email protected]>

* Update .ci/test_r_package_windows.ps1

* -Wait and remove rtools40

* empty commit

Co-authored-by: Nikita Titov <[email protected]>

* [dask] include multiclass-classification task in tests (#4048)

* include multiclass-classification task and task_to_model_factory dicts

* define centers coordinates. flatten init_scores within each partition for multiclass-classification

* include issue comment and fix linting error

* Update index.rst (#4029)

Add alt text to logo image

Co-authored-by: James Lamb <[email protected]>

* [dask] raise more informative error for duplicates in 'machines' (fixes #4057) (#4059)

* [dask] raise more informative error for duplicates in 'machines'

* uncomment

* avoid test failure

* Revert "avoid test failure"

This reverts commit 9442bdf.

* [dask] add tutorial documentation (fixes #3814, fixes #3838) (#4030)

* [dask] add tutorial documentation (fixes #3814, fixes #3838)

* add notes on saving the model

* quick start examples

* add examples

* fix timeouts in examples

* remove notebook

* fill out prediction section

* table of contents

* add line back

* linting

* isort

* Apply suggestions from code review

Co-authored-by: Nikita Titov <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nikita Titov <[email protected]>

* move examples under python-guide

* remove unused pickle import

Co-authored-by: Nikita Titov <[email protected]>

* set 'pending' commit status for R Solaris optional workflow (#4061)

* [docs] add Yu Shi to repo maintainers (#4060)

* Update FAQ.rst

* Update CODEOWNERS

* set is_linear_ to false when it is absent from the model file (fix #3778) (#4056)

* Add CMake option to enable sanitizers and build gtest (#3555)

* Add CMake option to enable sanitizer

* Set up gtest

* Address reviewer's feedback

* Address reviewer's feedback

* Update CMakeLists.txt

Co-authored-by: Nikita Titov <[email protected]>

Co-authored-by: Nikita Titov <[email protected]>

* added type hint (#4070)

* [ci] run Dask examples on CI (#4064)

* Update Parallel-Learning-Guide.rst

* Update test.sh

* fix path

* address review comments

* [python-package] add type hints on Booster.set_network() (#4068)

* [python-package] add type hints on Booster.set_network()

* change behavior

* [python-package] Some mypy fixes (#3916)

* Some mypy fixes

* address James' comments

* Re-introduce pass in empty classes

* Update compat.py

Remove extra lines

* [dask] [ci] fix flaky network-setup test (#4071)

* [tests][dask] simplify code in Dask tests (#4075)

* simplify Dask tests code

* enable CI

* disable CI

* Revert "[ci] prevent getting incompatible dask and distributed versions (#4054)" (#4076)

This reverts commit 4e9c976.

* Fix parsing of non-finite values (#3942)

* Fix index out-of-range exception generated by BaggingHelper on small datasets.

Prior to this change, the line "score_t threshold = tmp_gradients[top_k - 1];" would generate an exception, since tmp_gradients would be empty when the cnt input value to the function is zero.

* Update goss.hpp

* Update goss.hpp

* Add API method LGBM_BoosterPredictForMats which runs prediction on a data set given as of array of pointers to rows (as opposed to existing method LGBM_BoosterPredictForMat which requires data given as contiguous array)

* Fix incorrect upstream merge

* Add link to LightGBM.NET

* Fix indenting to 2 spaces

* Dummy edit to trigger CI

* Dummy edit to trigger CI

* remove duplicate functions from merge

* Fix parsing of non-finite values.  Current implementation silently returns zero when input string is "inf", "-inf", or "nan" when compiled with VS2017, so instead just explicitly check for these values and fail if there is no match.  No attempt to optimise string allocations in this implementation since it is usually rarely invoked.

* Dummy commit to trigger CI

* Also handle -nan in double parsing method

* Update include/LightGBM/utils/common.h

Remove trailing whitespace to pass linting tests

Co-authored-by: Nikita Titov <[email protected]>

Co-authored-by: matthew-peacock <[email protected]>
Co-authored-by: Guolin Ke <[email protected]>
Co-authored-by: Nikita Titov <[email protected]>

* [dask] remove unused imports from typing (#4079)

* Range check for DCG position discount lookup (#4069)

* Add check to prevent out of index lookup in the position discount table. Add debug logging to report number of queries found in the data.

* Change debug logging location so that we can print the data file name as well.

* Revert "Change debug logging location so that we can print the data file name as well."

This reverts commit 3981b34.

* Add data file name to debug logging.

* Move log line to a place where it is output even when query IDs are read from a separate file.

* Also add the out-of-range check to rank metrics.

* Perform check after number of queries is initialized.

* Update

* [ci] upgrade R CI scripts to work on Ubuntu 20.04 (#4084)

* [ci] install additional LaTeX packages in R CI jobs

* update autoconf version

* bump upper limit on package size to 100

* [SWIG] Add streaming data support + cpp tests (#3997)

* [feature] Add ChunkedArray to SWIG

* Add ChunkedArray
* Add ChunkedArray_API_extensions.i
* Add SWIG class wrappers

* Address some review comments

* Fix linting issues

* Move test to tests/test_ChunkedArray_manually.cpp

* Add test note

* Move ChunkedArray to include/LightGBM/utils/

* Declare more explicit types of ChunkedArray in the SWIG API.

* Port ChunkedArray tests to googletest

* Please C++ linter

* Address StrikerRUS' review comments

* Update SWIG doc & disable ChunkedArray<int64_t>

* Use CHECK_EQ instead of assert

* Change include order (linting)

* Rename ChunkedArray -> chunked_array files

* Change header guards

* Address last comments from StrikerRUS

* store all CMake files in one place (#4087)

* v3.2.0 release (#3872)

* Update VERSION.txt

* update appveyor.yml and configure

* fix Appveyor builds

Co-authored-by: James Lamb <[email protected]>
Co-authored-by: Nikita Titov <[email protected]>
Co-authored-by: StrikerRUS <[email protected]>

* [ci] Bump version for development (#4094)

* Update .appveyor.yml

* Update cran-comments.md

* Update VERSION.txt

* update configure

Co-authored-by: James Lamb <[email protected]>

* [ci] fix flaky Azure Pipelines jobs (#4095)

* Update test.sh

* Update setup.sh

* Update .vsts-ci.yml

* Update test.sh

* Update setup.sh

* Update .vsts-ci.yml

* Update setup.sh

* Update setup.sh

Co-authored-by: Subham Agrawal <[email protected]>
Co-authored-by: James Lamb <[email protected]>
Co-authored-by: shiyu1994 <[email protected]>
Co-authored-by: Nikita Titov <[email protected]>
Co-authored-by: jmoralez <[email protected]>
Co-authored-by: marcelonieva7 <[email protected]>
Co-authored-by: Philip Hyunsu Cho <[email protected]>
Co-authored-by: Deddy Jobson <[email protected]>
Co-authored-by: Alberto Ferreira <[email protected]>
Co-authored-by: mjmckp <[email protected]>
Co-authored-by: matthew-peacock <[email protected]>
Co-authored-by: Guolin Ke <[email protected]>
Co-authored-by: ashok-ponnuswami-msft <[email protected]>
Co-authored-by: StrikerRUS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nikita Titov <[email protected]>

Co-authored-by: James Lamb <[email protected]>
Co-authored-by: Subham Agrawal <[email protected]>
Co-authored-by: shiyu1994 <[email protected]>
Co-authored-by: Nikita Titov <[email protected]>
Co-authored-by: jmoralez <[email protected]>
Co-authored-by: marcelonieva7 <[email protected]>
Co-authored-by: Philip Hyunsu Cho <[email protected]>
Co-authored-by: Deddy Jobson <[email protected]>
Co-authored-by: Alberto Ferreira <[email protected]>
Co-authored-by: mjmckp <[email protected]>
Co-authored-by: matthew-peacock <[email protected]>
Co-authored-by: Guolin Ke <[email protected]>
Co-authored-by: ashok-ponnuswami-msft <[email protected]>
Co-authored-by: StrikerRUS <[email protected]>
@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023