Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: fix a pile of directory and/or state retention related issues #15243

Merged
merged 19 commits into from
Jan 28, 2024

Conversation

eqvinox
Copy link
Contributor

@eqvinox eqvinox commented Jan 27, 2024

All of these are inter-related to some degree, here's the rough grouping:

  • f8eb388 build: add ax_recursive_eval.m4
  • b052b18 build: add recursion limit for AX_RECURSIVE_EVAL
  • 27f6171 build: use AX_RECURSIVE_EVAL

These 3 just replace some open-coded autoconf hacking with a proper autoconf function.

  • ff62df2 build: untangle sysconfdir & localstatedir
  • 444bc5e build: update packaging & docs for dir changes

Main change re. ./configure parameters:

after this, --sysconfdir= and --localstatedir= expect /etc and /var (previously /etc/frr and /var/run/frr). Both of these now match standard package behavior. (The frr is appended by autoconf.) The localstatedir one is necessary to correctly encode /var/lib/frr down the road. The sysconfdir one comes along for the ride to make packaging life easier.

Note: --sbindir --libdir etc. don't have the same level of standard package behavior, and different distros do different things here anyway (e.g. multi-arch libraries), that's why they're intentionally not getting the same treatment.

  • 42eeea0 *: rename frr_vtydir to frr_runstatedir

Cleanup to make things less confusing.

  • 0d5a249 lib, mgmtd: fix wrong mgmtd socket paths`

This was plain wrong, mgmtd didn't correctly do -N.

  • 80b6787 build: nuke BFDD_CONTROL_SOCKET
  • 72783ec build: nuke ZEBRA_SERV_PATH
  • 0f79e6b build: nuke LDPD_SOCKET

autoconf simplifications, there's no benefit in plumbing these through autoconf

  • a97d0c5 lib: set up frr_libstatedir

Main change: start using /var/lib/frr.

Any state-saving features in FRR need to go to /var/lib/frr, not /var/run/frr. /var/run is cleared out across reboots and therefore unsuitable for these. The daemons will create the directory (with correct permissions, which are frr:frr 0700) if it doesn't exist.

  • cd35ecc lib: create frr_daemon_state_{load,save}

Extracting common code from isisd, ospfd and ospf6d, and fixing state sync writes with proper sync-to-disk behavior.

  • 634f481 *: fix frr_daemon_info indentation

Indentation fix, here because I'm touching these in the next commits.

  • e354c6e isisd: fix overload state location
  • 110945b ospfd: fix GR state location
  • 567f570 ospf6d: fix GR & auth seqno state location

Proper state save and load using libfrr common functions and /var/lib/frr.

  • bfd6d8e lib, mgmtd: fix commit history location

Same kind of problem, mgmtd was saving things into places that get erased across reboots. Oops.

  • f1ad2c4 babeld: remove bogus config path print

Just an incorrect printout.

  • bbd8589 build: homologize path handling

This is, TBFH, mostly a guardrail against future issues of the same kind. The ./configure paths are moved into lib/config_paths.h, which is a red flag if you see it included somewhere else, as noted in the file.

  • d5dfc9b doc, build: shorten too long filename

Freebie at the end. This caused make dist to fail. Removed due to conflict.


I tried actually running FRR for a change. It did not go well. I ended up stepping in a few piles of 💩 in the process.

The specific thing that made this cross the use-a-💩-emoji threshold is in bfd6d8e which clearly never worked to begin with and therefore cannot have been tested at all.

As found in the GNU autoconf archive
https://git.savannah.gnu.org/gitweb/?p=autoconf-archive.git

Signed-off-by: David Lamparter <[email protected]>
The autoconf version can just loop forever, abort instead.

Signed-off-by: David Lamparter <[email protected]>
Replace inline expansion loop.  (Also correctly handles prefix=NONE and
exec_prefix=NONE inside the macro.)

Signed-off-by: David Lamparter <[email protected]>
`--sysconfdir` should be `/etc` and `--localstatedir` should be `/var`.
The package-specific subdirectory should be added by configure, not
given by the user, to match established behavior by other packages.

Note that `--bindir`, `--sbindir`, `--libdir` and `--libexecdir` have
different established/expected behavior due to distro specific
multi-arch support.  That's why these are left unchanged.

The reason this is getting fixed now is that we need to use
`--localstatedir` for its actual value to put things in `/var/lib`.  As
it is now, being overloaded for `/run`, the configured `/var` path
becomes inaccessible.

Signed-off-by: David Lamparter <[email protected]>
`--sysconfdir` and `--localstatedir` now align with general autoconf
practices.

Signed-off-by: David Lamparter <[email protected]>
Also remove frr_init_vtydir(), just initialize to default.

Signed-off-by: David Lamparter <[email protected]>
These paths were ignoring the `-N` namespacing option.

Signed-off-by: David Lamparter <[email protected]>
This just unnecessarily complicates things by involving autoconf.

Signed-off-by: David Lamparter <[email protected]>
This just unnecessarily complicates things by involving autoconf.

Signed-off-by: David Lamparter <[email protected]>
This just unnecessarily complicates things by involving autoconf.

Signed-off-by: David Lamparter <[email protected]>
This needs to be used for persistent state, which currently is misplaced
into `/var/run` / `/run` where it gets deleted across reboots.

Signed-off-by: David Lamparter <[email protected]>
These functions load daemon-specific persistent state from
`/var/lib/frr` and supersede open-coded variants of similar calls in
ospfd, ospf6d and isisd to save GR state and/or sequence numbers.

Unlike the open-coded variants, the save call correctly `fsync()`s the
saved data to ensure disk contents are consistent.

Signed-off-by: David Lamparter <[email protected]>
clang-format doesn't understand FRR_DAEMON_INFO is a long macro where
laying out items semantically makes sense.

(Also use only one `FRR_DAEMON_INFO(` in isisd so editors don't get
confused with the mismatching `( ( )`.

Signed-off-by: David Lamparter <[email protected]>
This belongs in `/var/lib`, not `/var/run`.  Also the filename was
typo'd (`isid-restart.json`).

Change to proper location and fall back to previous in case it's the
first restart after an FRR update from a version with the bugged path.

Signed-off-by: David Lamparter <[email protected]>
This belongs in `/var/lib`, not `/var/run`.

Use library facility to load/save, support previous path as fallback,
and do proper fsync().

Signed-off-by: David Lamparter <[email protected]>
Unfortunately, `ospf6d` is much worse than `ospfd` and `isisd` regarding
its state saving, due to the existence of the auth trailer code.

Again, this belongs in `/var/lib`, not `/var/run`.

Merge both state files into one, and add reconciliation code for the
auth seqno.

I'm gonna save my comment on the fact that `ospf6_auth_seqno_nvm_delete`
is not in fact used anywhere.  Which is now a warning because it's
`static`.  Well.  It probably should be used somewhere, so leave it in.

Signed-off-by: David Lamparter <[email protected]>
Both of these belong in `/var/lib`, not `/var/run`.

Rather hilariously, the history read in
`mgmt_history_read_cmt_record_index` was always failing, because it was
doing a `file_exists(MGMTD_COMMIT_FILE_PATH)` check.  Which is the wrong
macro - it's `.../commit-%s.json`, including the unprocessed `%s`, which
would never exist.

I guess noone ever tried if this actually works.  Cool.

On the plus side, this means I don't have to implement legacy
compatibility for this, since it never worked to begin with.

(SQLite3 DB location is also changed in this commit since it also uses
`DAEMON_DB_DIR`.)

Signed-off-by: David Lamparter <[email protected]>
This doesn't align with what the code actually loads, remove it.

Signed-off-by: David Lamparter <[email protected]>
Use consistent `e_somepath` names for expanded versions of `somepath`.
Also remove all paths from `config.h` and put them into
`lib/config_paths.h` - this is to make more obvious when someone is
doing something probably not quite properly structured.

Signed-off-by: David Lamparter <[email protected]>
@eqvinox
Copy link
Contributor Author

eqvinox commented Jan 27, 2024

… frrbot complaining about diff --git a/tests/topotests/lib/topogen.py b/tests/topotests/lib/topogen.py

… that's not something I touched (I guess I touched code nearby) …

Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@@ -0,0 +1,56 @@
# ===========================================================================
# https://www.gnu.org/software/autoconf-archive/ax_recursive_eval.html
# ===========================================================================
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any particular reason this commit doesn't replace the licensing with the spx first line while the second commit does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to keep the import 1:1 to the source from the GNU autoconf archive before changing it, so it's visible what exactly changed

@@ -28,20 +28,18 @@ source="$pkgname-$pkgver.tar.gz"
builddir="$srcdir"/$pkgname-$pkgver

_sbindir=/usr/lib/frr
_sysconfdir=/etc/frr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lots of developers probably have custom scripts to set this up for themselves. Do we need to have them change anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

configure.ac Outdated
@@ -2740,8 +2740,6 @@ AC_DEFINE_UNQUOTED([LDPD_SOCKET], ["$CFG_STATE%s%s/ldpd.sock"], [ldpd control so
AC_DEFINE_UNQUOTED([ZEBRA_SERV_PATH], ["$CFG_STATE%s%s/zserv.api"], [zebra api socket])
AC_DEFINE_UNQUOTED([BFDD_CONTROL_SOCKET], ["$CFG_STATE%s%s/bfdd.sock"], [bfdd control socket])
AC_DEFINE_UNQUOTED([OSPFD_GR_STATE], ["$CFG_STATE%s/ospfd-gr.json"], [ospfd GR state information])
AC_DEFINE_UNQUOTED([MGMTD_FE_SERVER_PATH], ["$CFG_STATE/mgmtd_fe.sock"], [mgmtd frontend server socket])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a genuine bug fix for mgmtd ( and should be backported to 9.0 and 9.1 as well right? ) Should this be in it's own PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I'm still pondering if we shouldn't just backport this entire PR; saving state to /var/run is such a weird thing to do. But fixing that needs the entire other pile of commits too, which is a bit much…

… and because we have (historically) overloaded --localstatedir in this stupid way I can't really "undo" or split off that change for a backport, without that /var/lib is kinda "inaccessible" …

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case in point: if you configure OSPFv3 with authentication trailers and then reboot your box… your sequence number is gone and all the other routers refuse you into the network due to replay protection 🤡

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(regardless, it's not going to be a clean backport ⇒ need manual backport PR anyways ⇒ can adjust/drop things if needed)

@eqvinox
Copy link
Contributor Author

eqvinox commented Jan 28, 2024

To be clear, this is a bugfix PR, not a cleanup PR. It just looks a bit like cleanup PR because I became exasperated enough to clean up the entire thing rather than fixing individual weirdness (and doing so was necessary because --localstatedir is weirdly overloaded)

@donaldsharp donaldsharp merged commit 259e3d4 into FRRouting:master Jan 28, 2024
9 checks passed
eqvinox added a commit to opensourcerouting/frr that referenced this pull request Apr 25, 2024
Adapting to FRRouting#15243 upstream FRR.

Signed-off-by: David Lamparter <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants