Get parallel runner to recognise framework-side logging config #1558

antonymilne · 2022-05-19T12:27:14Z

The parallel runner does not seem to use the framework-side logging config. By default, if you do a kedro run --runner=ParallelRunner then the project-side logging config is picked up correctly and you will see the logs correctly in the console and in the logs/info.log file. However, if you delete conf/base/logging.yml then most messages will no longer be logged:

See this explanation for how logging config is setup for kedro run. However, with the parallel runner there's something else that happens too. Assuming that conf/base/logging.yml does exist:

kedro/kedro/framework/session/session.py

Line 190 in 676600c

session._setup_logging()

calls this function:

kedro/kedro/framework/project/__init__.py

Line 208 in 62cc27a

def configure_logging(logging_config: Dict[str, Any]) -> None:

This sets the global LOGGING to be the project-side logging config.
LOGGING is used in https://github.com/kedro-org/kedro/blob/main/kedro/runner/parallel_runner.py to _bootstrap_subprocess which means that all subprocesses have the project-side logging config.

This seems to work (although it's not very nice to have the global variable). However, when there's no conf/base/logging.yml, configure_logging isn't called, so LOGGING will be left as None, so _bootstrap_subprocess won't set the logging configuration. This is fine because there's no project-side logging.yml to use BUT the problem is that the framework-side logging doesn't make it into the subprocesses, so no log messages get shown at all.

What we need to do is:

make sure that the framework-side logging config makes it to the subprocesses. e.g. maybe the default LOGGING should not be None but instead what comes out of kedro.config.default_logger
ideally figure out a way to do all this without needing the global LOGGING so we can remove that

The text was updated successfully, but these errors were encountered:

noklam · 2022-06-20T16:25:29Z

Interestingly, the behaviour is not the same in Windows and Linux.

If I do kedro run --runner ParallelRunner, the console logs are actually there even if I delete the logging.yml.

antonymilne · 2022-06-20T17:06:51Z

The above problem should exist only for cases where the subprocesses are spawned rather than forked. I can confirm that you should see the following behaviour with no conf/base/logging.yml:

spawn, as done on Windows (all Python versions) and macOS (Python 3.8+) ➡️ subprocesses call _bootstrap_subprocess with conf_logging is None hence above bug exists (initial INFO Kedro project spaceflights gets shown in rich, then there's no output until Pipeline execution completed successfully)
fork, as done on Unix (all Python versions) and macOS (Python 3.7) ➡️ subprocesses do not call _bootstrap_subprocesses and Python logging is still there as the framework-side one, hence above bug doesn't exist (and all messages get rendered in rich as expected)

There does not seem to be a good way to get rid of LOGGING global variable. This was actually added relatively recently (#1103 and #1114) and is a lighter coupling than we used to have (which was to the kedro session). The ultimate problem boils down to how Python spawns processes (different behaviour from forking). See https://github.com/quantumblacklabs/private-kedro/pull/850.

noklam · 2022-06-21T09:33:04Z

See the results in PDF ParallelRunnerLogging.pdf

Few things that I found inconsistent.

1st round of experiment - All without `logging.yml`

I expect Mac Py3.7 should has the same behavior as Unix, but here I see the full log printed out. (with/without plugin)
With plugin installed, all logs are printed for Mac Py3.8 (Spawn process)

Seeing the log is not clear enough, so I do another set of testing with logging.yml but with console log, so it's clearly if there are logs that didn't apply the config correctly.

2nd round - with `logging.yml` + `console` handler

This time, Mac Py3.8 + without plugin logs are completely normal and format correctly.
Mac 3.7 logs are corrected - which is expected because we have provide logging.yml
Mac Py3.8 + plugin, you can clearly there a few lines of log didn't format properly.

antonymilne · 2022-06-22T11:43:36Z

Thanks very much @noklam for all this testing, it's very helpful!!

1st round of experiment - All without `logging.yml`

I expect Mac Py3.7 should has the same behavior as Unix, but here I see the full log printed out. (with/without plugin)

This is what I would expect: python 3.7 on mac should indeed be the same as Unix, which means processes are forked, and so the bug where logs don't get shown with ParallelRunner doesn't exist.

With plugin installed, all logs are printed for Mac Py3.8 (Spawn process)

This is definitely weird and not what I expected.

2nd round - with `logging.yml` + `console` handler

This time, Mac Py3.8 + without plugin logs are completely normal and format correctly.

Mac 3.7 logs are corrected - which is expected because we have provide logging.yml

These are both what I would expect.

Mac Py3.8 + plugin, you can clearly there a few lines of log didn't format properly.

This is definitely weird and not what I expected. Just to check: these are exactly the same lines you see missing from the Windows e2e test log file, right?

I will investigate further but overall the issue definitely seems to be plugin + logging + spawned process (i.e. mac + Python 3.8 or Windows e2e tests). Hopefully we can kill both this issue and your windows e2e test issue at the same time.

antonymilne · 2022-06-22T11:45:34Z

P.S. do you know why all those geopandas warnings are being thrown for Python 3.8?

noklam · 2022-06-22T11:50:22Z

Correct, they match the missing log that fail the e2etest, which suggest the logging.yml isn't applied on that few lines of log but we don't know the root cause yet.

noklam · 2022-06-28T17:30:11Z

@AntonyMilneQB Sorry just saw your comments now. I test it on my Mac and I get these warnings in Python3.7, 3.9, 3.10 but not 3.8.

The warning is due to distutils now being dropped and the latest geopandas (0.11.0) have been updated with this change.

geopandas/geopandas#2083

The warning would show up on geopandas<0.11.0, geopandas==0.11.0 only supports Python >= 3.8. It should install the latest version automatically when possible.

geopandas_require = {
"geopandas.GeoJSONDataSet": ["geopandas>=0.6.0, <1.0", "pyproj~=3.0"]
}

P.S I get an error when trying to do pip install "kedro[geopandas.geoJSONDataSet" -U, not sure why telemetry have such a strict requirements for requests, and it also fails to upgrade my geopandas. pip install geopandas -U would upgrade my library successfully.

kedro-telemetry 0.2.0 requires requests~=2.25.1, but you have requests 2.27.1 which is incompatible.

antonymilne mentioned this issue May 19, 2022

Improve logging setup in Kedro #1461

Closed

18 tasks

antonymilne added this to the Improve logging setup milestone May 19, 2022

antonymilne self-assigned this Jun 12, 2022

antonymilne mentioned this issue Jun 22, 2022

Put framework logging into LOGGING and fix import resetting logging #1644

Merged

5 tasks

antonymilne mentioned this issue Jun 29, 2022

Loosen requests requirement kedro-org/kedro-plugins#42

Merged

4 tasks

antonymilne closed this as completed in #1644 Jun 29, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Get parallel runner to recognise framework-side logging config #1558

Get parallel runner to recognise framework-side logging config #1558

antonymilne commented May 19, 2022 •

edited

Loading

noklam commented Jun 20, 2022 •

edited

Loading

antonymilne commented Jun 20, 2022 •

edited

Loading

noklam commented Jun 21, 2022 •

edited

Loading

antonymilne commented Jun 22, 2022 •

edited

Loading

antonymilne commented Jun 22, 2022

noklam commented Jun 22, 2022

noklam commented Jun 28, 2022 •

edited

Loading

Get parallel runner to recognise framework-side logging config #1558

Get parallel runner to recognise framework-side logging config #1558

Comments

antonymilne commented May 19, 2022 • edited Loading

noklam commented Jun 20, 2022 • edited Loading

antonymilne commented Jun 20, 2022 • edited Loading

noklam commented Jun 21, 2022 • edited Loading

1st round of experiment - All without logging.yml

2nd round - with logging.yml + console handler

antonymilne commented Jun 22, 2022 • edited Loading

1st round of experiment - All without logging.yml

2nd round - with logging.yml + console handler

antonymilne commented Jun 22, 2022

noklam commented Jun 22, 2022

noklam commented Jun 28, 2022 • edited Loading

antonymilne commented May 19, 2022 •

edited

Loading

noklam commented Jun 20, 2022 •

edited

Loading

antonymilne commented Jun 20, 2022 •

edited

Loading

noklam commented Jun 21, 2022 •

edited

Loading

1st round of experiment - All without `logging.yml`

2nd round - with `logging.yml` + `console` handler

antonymilne commented Jun 22, 2022 •

edited

Loading

1st round of experiment - All without `logging.yml`

2nd round - with `logging.yml` + `console` handler

noklam commented Jun 28, 2022 •

edited

Loading