Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore bad NetDefs and files via parser flags #412

Merged
merged 6 commits into from
Apr 30, 2024

Conversation

daniloegea
Copy link
Collaborator

@daniloegea daniloegea commented Oct 4, 2023

Description

This is an implementation of what I called "parser flags". The idea is to enable the user to change some parsing decisions.

The main application at the moment is to support ignoring parsing errors. Currently Netplan will not generate any configuration if a little mistake is made in one of its YAML files. It can be really bad if you're operating a remote system and end up rebooting the system with a syntax issue in one of your files. Your system will basically boot without any network configuration.

The IGNORE_ERRORS flag will allow the Netplan generator to ignore bad files and bad netdefs so it will still generate some configuration.

TESTS

Integration tests will be executed twice now, once without the ignore_errors flags and once with it enabled. It's intended to increase the confidence that the parser behaves as expected when the flag is enabled and the configuration is good.

I prepared a PPA for Ubuntu Noble with this patch: https://launchpad.net/~danilogondolfo/+archive/ubuntu/netplan-parser-flags

Suggestions of tests:

It can be easily tested in a LXD VM. You can add some broken configuration and reboot the VM or call /usr/libexec/netplan/generate -i to see the parsing process when it's ignoring errors

  1. Break the default 50-cloud-init.yaml and reboot the VM. For example, the config below should still bring enp5s0 up and with DHCP working.
network:
  version: 2
  ethernets:
    enp5s0:
      dhcp4: true
      dhcp: true
  1. Add a second file that is not a valid YAML
  2. Add a third broken file with some complex configuration and dependencies between netdefs. For example, with the configuration below eth2 should still have a backend configuration file and it should contain Bond=bond0.
network:
  ethernets:
    eth0: {}
    eth1: {}
    eth2:
      a: b

  bonds:
    bond0:
      interfaces:
        - eth0
        - eth1
        - eth2
  1. Add a fourth broken file with missing dependencies. For example, with the config below you still should have a br0 interface created:
network:
  bridges:
    br0:
      addresses:
        - 10.20.30.40/24

      interfaces:
        - aaaa
        - bbbb
        - dddd

Checklist

  • Runs make check successfully.
  • Retains 100% code coverage (make check-coverage).
  • New/changed keys in YAML format are documented.
  • (Optional) Adds example YAML for new feature.
  • (Optional) Closes an open bug in Launchpad.

@daniloegea daniloegea force-pushed the ignore_bad_netdefs_and_files branch 3 times, most recently from 148c0b5 to 907163e Compare January 29, 2024 12:00
@daniloegea daniloegea force-pushed the ignore_bad_netdefs_and_files branch 4 times, most recently from ab633f2 to e4eecd0 Compare February 20, 2024 18:26
Copy link
Contributor

@schopin-pro schopin-pro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that was a fun lunch break :)

python-cffi/netplan/_build_cffi.py Outdated Show resolved Hide resolved
include/parse.h Outdated Show resolved Hide resolved
python-cffi/netplan/parser.py Outdated Show resolved Hide resolved
python-cffi/netplan/parser.py Outdated Show resolved Hide resolved
src/parse.c Outdated Show resolved Hide resolved
Copy link
Collaborator

@slyon slyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the overall approach of this. It is a slight change in behavior, but only for the better, as a non-configured network (due to parsing errors) is always a bad thing. It could lead to unexpected network configuration if some files are ignored but other aren't, but IMO that is still better than no network configuration at all.

src/generate.c Show resolved Hide resolved
include/parse.h Outdated Show resolved Hide resolved
src/generate.c Outdated Show resolved Hide resolved
src/parse.c Outdated Show resolved Hide resolved
include/types.h Outdated Show resolved Hide resolved
src/util.c Outdated Show resolved Hide resolved
tests/generator/test_errors.py Show resolved Hide resolved
@slyon slyon changed the title Ignore bad netdefs and files Ignore bad NetDefs and files via parser flags Feb 22, 2024
@daniloegea daniloegea force-pushed the ignore_bad_netdefs_and_files branch 10 times, most recently from cb84ed4 to fe9f371 Compare April 24, 2024 16:25
@daniloegea daniloegea marked this pull request as ready for review April 24, 2024 16:42
@daniloegea
Copy link
Collaborator Author

Thanks for your comments, lads. I basically adopted all of them.

I decided to not drop bad netdefs anymore. I think it's actually better to keep them and generate configuration for what was parsed until the point it failed. There is a chance the interface will still be brought up and work if the error happened after settings IPs or DHCP for example.

To make sure (or at least increase the confidence) that these changes will not cause a change in behavior when the flag is enabled and all the configuration is good, I changed our integration/run.py script to run each test twice, with and without the flag.

The new "Configuration fuzzing" CI action is catching a memory leak! As far as I can tell, it's happening inside glib and there is a comment in the code that handles datalists saying it could happen, see FR-5666 for more details. So, this leak is not introduces by these changes.

I prepared a PPA and added to suggestions of tests in the PR comments.

@daniloegea daniloegea requested a review from slyon April 24, 2024 16:52
Copy link
Collaborator

@slyon slyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for working through all the remarks. This should be relatively safe now.

The doubling of testing-time is a though sell... OTOH more coverage is better. But at least it should be easy to disable (e.g. through ENV variable), so we can skip the double-testing for local tests.

I left a bunch of additional inline comments that need additional consideration.

tests/generator/test_errors.py Show resolved Hide resolved
tests/integration/run.py Show resolved Hide resolved
tests/integration/run.py Outdated Show resolved Hide resolved
include/parse.h Outdated Show resolved Hide resolved
include/parse.h Outdated Show resolved Hide resolved
src/types-internal.h Show resolved Hide resolved
tests/generator/test_errors.py Show resolved Hide resolved
tests/generator/test_errors.py Show resolved Hide resolved
src/generate.c Outdated Show resolved Hide resolved
src/networkd.c Outdated Show resolved Hide resolved
Parser flags are intended to be used to change the parser's behavior.
This patch introduces a single flag used to instruct the parser that
parsing errors should be ignored and it should continue to parse
netdefs.

The main goal with this flag is to make Netplan more resilient.
Currently, any mistake made in the YAML files will prevent Netplan to
generate configuration. So, rebooting the system while one of the YAML
files are broken will result in the system having no network connectivity.
@daniloegea daniloegea force-pushed the ignore_bad_netdefs_and_files branch 3 times, most recently from ce7a9fd to 711677b Compare April 29, 2024 15:00
@daniloegea daniloegea force-pushed the ignore_bad_netdefs_and_files branch 3 times, most recently from af92b75 to 9b7340d Compare April 30, 2024 09:57
@daniloegea
Copy link
Collaborator Author

Thanks Lukas, I tried to address all of your comments.

I added an error counter to the parser state instead of a boolean, that might be more useful I think.
I also added a new parser_error to the enum and exposed it to the python code.
I will squash the two extra commits that contain these changes later.

The first commit was split into the definition and implementation of flags as suggested by @schopin-pro in a conversation we had.

The config_fuzzer was changed to run the generator with the --ignore-errors flag for one YAML at a time to avoid that annoying glib memory leak. It's not the ideal test though... I need to investigate how the leak can be identified and ignored.

@daniloegea daniloegea requested a review from slyon April 30, 2024 10:58
Copy link
Collaborator

@slyon slyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the remarks! lgtm.

I like the error handling to be integrated with existing error handling logic and the .error_count field.

When the ignore_errors flag is enabled, the parser will ignore YAML
files that failed to be loaded due to syntax errors. YAMLs that are
loaded successfully but contain invalid Netplan configuration will not
be completely ignored.

Network definitions that contain errors will not be completely ignored.
The parser will ignore the error in the bad YAML mapping and continue
with the next one. By doing it, all the good definitions will produce
backend configuration and the bad ones will produce *some* backend
configuration.

Add a simple counter to the parser struct that will store the number of
errors that were ignored. This can be used by callers to check if there
were errors when the IGNORE_ERRORS flag is used.
Add a new flag --ignore-errors. It it's used mostly for tests.

IGNORE_ERRORS will be set either if --ignore-errors or the environment
variable NETPLAN_PARSER_IGNORE_ERRORS are set. These are mostly used for
tests from the unit and integration tests.

The flag will be set by default if "generate" is called as a systemd
generator. The idea is to ignore errors when the machine boots up so the
system still has a chance to have a working network configuration if an
issue is introduced in any of the YAML files.

For the time being, this flag will not be enabled by default when the
CLI is used. This will make the issue evident so the user will need to
fix it.
Add a "flags" property that can be used as a getter and setter for
flags.

Add the error_count getter to Parser.

Add some unit tests using the new properties.
config_fuzzer: run generate against the whole dataset generated by the
config fuzzer and ignore all the errors.

integration/run.py: run integration tests a second time with
ignore_errors set via the NETPLAN_PARSER_IGNORE_ERRORS environment
variable.

integration/dbus.py: call apply after set. Without that, the netplan
state will be considered dirty and the dbus tests will fail when called
a second time.
Add a new section to "Explanation" about the generator behavior.

Update netplan-generate(8) to mention the new generator behavior.
@daniloegea daniloegea merged commit 11fe6f2 into canonical:main Apr 30, 2024
15 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants