Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve logging setup in Kedro #1461

Closed
15 of 18 tasks
idanov opened this issue Apr 20, 2022 · 7 comments
Closed
15 of 18 tasks

Improve logging setup in Kedro #1461

idanov opened this issue Apr 20, 2022 · 7 comments

Comments

@idanov
Copy link
Member

idanov commented Apr 20, 2022

Currently Kedro has some issues with the logging setup, which can cause some problems from minor annoyances to failures to run a Kedro project in certain environments. Here is a small list of the problems our current logging setup causes:

  • Kedro logs at root level, rather than kedro.* in many places
  • root log level is set to INFO instead of WARNING on all official starters
  • on read-only file systems Kedro projects fail to execute, rather than just failing to write the logs
  • KedroSession logs useless things at INFO level, like e.g. that the default session store doesn’t have a read() method and git describe failures…
  • maybe I am mistaken, but I think Kedro logs things before the logging config is setup (to be confirmed).
  • CLI commands output too many things, e.g. when you do kedro catalog list it should only show the output so it can be machine readable, but now there's lots of DEBUG level worthy output
  • kedro leaves info.log files everywhere for unknown reasons (easiest to notice is when runing a notebook in notebooks/)
  • Kedro-Airflow not working with Astrocloud kedro-plugins#13

In order to address those issues, we need to work on the following:

@idanov idanov changed the title Remove all root-level logging inside Kedro Improve logging setup in Kedro Apr 20, 2022
@antonymilne
Copy link
Contributor

antonymilne commented Apr 21, 2022

For reference, some PRs where I tried to fix some of these things before. I wrote my findings up in the PRs at the time, so I'll re-read these to figure out better what we should do here:

@noklam
Copy link
Contributor

noklam commented Apr 21, 2022

In general, I am fine with most of the item with some comments for these 2 items.

Ensure logging.yml is optional and remove it from all starters

I like that Kedro is getting lean but I am not sure if I like the idea that the user needs to put the file back to the corresponding location in order to customize it (Although for logging.yml I doubt there are many people customize it).

But it is the same thing for cli.py, it's not obvious that you can extend the CLI by creating a new cli.py out of nowhere.

Is this a workaround to maintain backward compatibility or do we prefer users to customize with this approach? or we should gradually move things to settings.py?

Remove the file-based logging handlers by default

Personally, I like the RotatingFile handler, and IMO it's something that people are not using enough. It could be a life-saver when I am running a lot of experiments (especially without a proper experiment tracking tool). Searching in a single file is very handy even when I have a full experiment tracking setup.

Are we removing this because we no longer think this is a good default?

@antonymilne
Copy link
Contributor

antonymilne commented Apr 22, 2022

Thanks for the comments @noklam!

I like that Kedro is getting lean but I am not sure if I like the idea that the user needs to put the file back to the corresponding location in order to customize it (Although for logging.yml I doubt there are many people customize it).
But it is the same thing for cli.py, it's not obvious that you can extend the CLI by creating a new cli.py out of nowhere.
Is this a workaround to maintain backward compatibility or do we prefer users to customize with this approach? or we should gradually move things to settings.py?

This is a very fair point, and I'm also not sure about the removal of cli.py and hooks.py for the same reasons (like you say, logging.yml seems less important though). It's a balance between:

  • the kedro template should be simple for beginner users (in the past there's generally been feedback that it's too overwhelming)
  • it should be clear for advanced users that they can customise their kedro project, e.g. by modifying cli.py and logging.yml if they want to

If we don't supply some template (even blank file with comments) logging.yml or cli.py then we need to make it very clear in the documentation that a user can customise their project this way and make it easy for them to copy and paste an example file to do so. Currently we don't emphasise this enough I think.

Another way to achieve this is to keep the default template lean and create a more advanced starter that includes hooks, customised cli.py, etc.

settings.py is the new way to customise things in kedro, but not everything belongs in there. e.g. I don't think logging configuration would go there because it's run time configuration:
image

Personally, I like the RotatingFile handler, and IMO it's something that people are not using enough. It could be a life-saver when I am running a lot of experiments (especially without a proper experiment tracking tool). Searching in a single file is very handy even when I have a full experiment tracking setup.
Are we removing this because we no longer think this is a good default?

Definitely understand this and I also like the file-based logging. I think the problems are:

  • practical: it crashes on read-only file systems like databricks repos. I don't think it's possible to just catch and ignore these errors (but we should check that)
  • philosophical: a 12 factor app shouldn't have responsibility for logging (although in practice maybe it's best to make user lives easier by logging to a file)

Again, if we do remove this then I think we need to make it very clear in documentation how a user can create a logging.yml that will do file-based logging.

Edit: I think we should get some user opinions on this and there's now a ticket for it: #1472

@antonymilne antonymilne changed the title Improve logging setup in Kedro Improve logging setup in Kedro [parent issue] Apr 22, 2022
@merelcht merelcht added this to the Improve logging setup milestone Apr 25, 2022
@antonymilne
Copy link
Contributor

antonymilne commented May 4, 2022

For those not already familiar with it, here's how the logging setup in kedro currently works:

  1. Right at the beginning, as soon as you call any kedro command from the CLI we setup the default logger:
    import kedro.config.default_logger # noqa

    This sets up logging according to the framework-side logging config in kedro/config/logging.yml
  2. Anything that is logged at this point follows the framework-side logging config
  3. When a session is created, the project-side logging config in your project's conf/base/logging.yml is loaded. This is achieved by:
    session._setup_logging()

    The reason this happens relatively late on in the process (i.e. not right at the beginning) is because it needs the config loader available (e.g. to load logging.yml from a different environment, not that anyone ever does that).
  4. Anything that is logged at this point follows the project-side logging config

The above process will not change in any of the issues mentioned here. It's just that users will by default no longer need to (but still can) provide a project-side logging.yml file. This will make steps 3 and 4 above disappear for anyone who hasn't setup custom logging by providing a project-side logging.yml file, since the framework-side logging config will be loaded and not altered.

All logging in kedro should come from a named logger like this:

logger = logging.getLogger(__name__)
logger.info(...) # or logger.debug(...) etc.

The name is what determines which loggers to use (as set in the loggers section of the logging config), e.g. anything that has module name kedro.x will use the kedro.x logger config. This works hierarchically so that a module kedro.x.y.z will use kedro.x.y.z, kedro.x.y, kedro.x and root config unless we have propagate=False for that logger. If you don't set a name then root is used.

This actually reminds me of two small issues I should create:

@antonymilne
Copy link
Contributor

antonymilne commented May 16, 2022

The outcome of #1472, which showed that people do use the log files, unfortunately makes some of the simplification that was planned here difficult:

  1. We should maintain the current behaviour in which we log to files.
  2. One way of handling this is to ensure that the appropriate file handlers are activated in the framework-side logging.yml. Users would then need to create/modify their project-side logging.yml to disable file-based logging (e.g. on databricks repos).
  3. Defining the file-based logging on the framework side is unfortunately very awkward though. Python file-based loggers are able to create a file that they log to but not a directory structure. So we need to either (a) log to info.log/errors.log in the project root (seems ugly and is different from how we currently do it) or (b) write our own custom log handler that creates the directory (easy but annoying; means one more piece to maintain) or (c) ensure that no logging takes place before the project-side logging.yml is loaded (fragile, and actually impossible if file-based logging is applied to root logger, which ideally it would be).
  4. None of these seem like good solutions, and arguably running a kedro project in package mode shouldn't create log files (or need a log directory) anyway.
  5. Hence we should define the file-based handlers in the project-side logging.yml. The downside of this is that the default template cannot immediately be simplified to remove the project-side logging.yml or the logs directory. Users who need to disable file-based logging will need to modify their project-side logging.yml as in point 2.

So the plan in #1469 changes as follows:

  • Step 1: project logging.yml is still made optional but not deleted from the project template. The logs folder isn't deleted either. Instead, we move the file-based logging so that it's in the project template only
  • Step 2: this can be done more simply and directly just by adding the relevant lines in the project-side logging config

One advantage of this approach is that it is quite minimal compared to how things currently stand. Hence future changes that could simplify the project template here are still possible, e.g. to direct logs to project root info.log/errors.log instead of the logs folder and remove the logs directory.

@antonymilne
Copy link
Contributor

antonymilne commented Jul 4, 2022

Rough notes to be written up into a new issue in due course...

from rich import reconfigure
from rich import get_console

class Hooks:
    @hook_impl
    def after_context_created(self):
        reconfigure(record=True)

    @hook_impl
    def after_pipeline_run(self):
        get_console().save_text("logs.txt")

@antonymilne
Copy link
Contributor

Milestone completed and outstanding issues moved to
Tweak logging to improve UX
Complete rich-ification of kedro

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Shipped 🚀
Development

No branches or pull requests

5 participants