Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(userspace/engine): remove Lua from Falco and re-implement the rule loader #1966

Merged
merged 13 commits into from
Apr 11, 2022

Conversation

jasondellaluce
Copy link
Contributor

@jasondellaluce jasondellaluce commented Apr 6, 2022

What type of PR is this?

/kind bug

/kind cleanup

/kind design

Any specific area of the project related to this PR?

/area build

/area engine

What this PR does / why we need it:

This PR follows up the recent refactoring of falcosecurity/libs#217 and #1947 with the end goal of having a single language parser inside Falco and libsinsp written in C++. Currently, Falco successfully uses the new language parser and compiler introduced in libsinsp. The only missing piece in the puzzle is to port the last portion of Lua code into C++, which right now represents the YAML ruleset file loading and runtime stats counting.

The contributions of this PR are:

  • Dropping legacy dependencies: chisels, luajit, lyaml, libyaml
  • Remove all the Lua code present in Falco
  • Re-implementing the YAML rule loader logic in C++
  • Fix some high-level issues related to the rule loader logic
  • Improve the documentation of the codebase

Since this work is big and non-trivial, the end goal of this is to not introduce any breaking change. As such, the rule loader re-implementation is a 1:1 porting of the legacy one written in Lua. The only divergences involve fixing a couple of high-level bugs and an improved codebase management. This is also the reason why the rule loader is implemented as a single class.

Goals:

  • Polishing the codebase making it more readable and easier to modify in the future by all contributors
  • Simplify the state management of the Rule Engine, which right now is split between the two Lua and C++ worlds
  • Improving the overall performance and memory footprint by having the whole codebase written in C++
  • Removing unhandy and unnecessary dependencies, and speed-up the build time
  • Fixing high-level notorious rule loading bugs
  • Opening the door for future refactorings and features, that could improve the quality of the Rule Engine as a whole

Non-goals:

  • Improving the logic and modularity of the rule loader. To make this easier to review, and to help everyone spotting potential regressions and prevent breaking changes, we want to achieve a 1:1 C++ porting of the legacy Lua rule loader. Once we are confident with these changes, I would like to work on another PR to refactor the code to a more modular fashion.

Which issue(s) this PR fixes:

Fixing bugs is not a goal of this PR, but after some testing I realized that the following have been solved:

Fixes #706
Fixes #1890
Fixes #1785

Special notes for your reviewer:

Considering how sticky the whole Lua codebase were, I could not find a way to break down this contribution in multiple PRs. Although the number of LOC is high, I've put special attention in developing the new rule loader as a ~1:1 of the existing one written in Lua. In that way, a reviewer could follow the code flow of the two versions, and easily understand if there is any point of divergence. Additionally, I spent quite some time reproducing error codes and behaviors as the one expected by the integration tests. Having the automatic testing passing this PR is a good helper and indicator of the correctness of the implementation. Do not esitate to contact me for further discussions or questions, either here below or through PM/email.

Does this PR introduce a user-facing change?:

refactor(userspace/engine): remove Lua from Falco and re-implement the rule loader

…ning common static utilities

Signed-off-by: Jason Dellaluce <[email protected]>
…o_utils

The function implementation was removed, however it was still defined in the .h header. Moreover,
this will now be required in order to replace its lua equivalent.

Signed-off-by: Jason Dellaluce <[email protected]>
…er to support shared_ptrs

This change allows working with safety with AST nodes wrapped into shared pointers.

Signed-off-by: Jason Dellaluce <[email protected]>
…-based O(1) access in vectors

Signed-off-by: Jason Dellaluce <[email protected]>
This is a porting of what we had inside the Lua codebase. This now handles the single responsibility
of gathering stats about rule-event matching, and of formatting them to print them to the user.

Signed-off-by: Jason Dellaluce <[email protected]>
Copy link
Contributor

@mstemm mstemm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for tackling this!

I think the biggest feedback is to consider more strongly separating the yaml parsing from the other actions that work on rule/macro/etc objects, like expanding macros/lists, assembling exceptions, parsing conditions, checking for compatible sources, etc. I think it would result in something where we could swap out the yaml library if we wanted to (most important if we can find one that preserves wrapping/line numbers for context reporting).

It also avoids the ugly-looking .as<type>s which are all over.

test/falco_tests.yaml Show resolved Hide resolved
userspace/engine/falco_common.cpp Show resolved Hide resolved
userspace/engine/falco_engine.cpp Show resolved Hide resolved
userspace/engine/falco_utils.cpp Outdated Show resolved Hide resolved
userspace/engine/rule_loader.cpp Outdated Show resolved Hide resolved
userspace/engine/rule_loader.cpp Show resolved Hide resolved
userspace/engine/rule_loader.cpp Show resolved Hide resolved
}

// todo(jasondellaluce): this breaks string escaping in lists
static bool resolve_list(string& cnd, YAML::Node& list)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should try to separate the yaml parsing from the rest of the rule loading more strongly. For example, you could define objects for rules, macros, lists, etc and have functions like this work on those types instead of YAML::Nodes. Might result in slightly cleaner access (all the list["list"].as<string>s would become list.name() or similar). It would also allow future code to work on the objects in the file without having to know what yaml library is being used to represent them.

userspace/engine/rule_loader.cpp Show resolved Hide resolved
userspace/engine/rule_loader.cpp Show resolved Hide resolved
@jasondellaluce
Copy link
Contributor Author

Thanks for the great feedback Mark! I agree with you that we must create some separation between YAML parsing and rules building. Fun fact, this is also the approach I was undertaking when first implementing this (by also splitting the responsibilities between different classes).

The downside of this is that this PR is already "bulky" enough with the single scope of porting the rule loader from Lua to C++, and I quickly realized that it would be impossible to review this PR to spot potential regressions if I went on for that route.

Overall, the goal of this PR is to remove Lua from the codebase and port the rule loader as a 1:1 copy of what we had before. Refactoring the rule loader logic to a more modular design is a must-do IMHO, but I think this should be the goal of a successive PR once we are confident with the changes of this one. WDYT?

cc @leogr

@leogr
Copy link
Member

leogr commented Apr 7, 2022

This PR is already ok for me. I will just have a second look. I agree with Jason to improve the implementation in subsequent PRs and not directly here. It will be easier to review the code change. 👍

@jasondellaluce jasondellaluce changed the title wip: refactor(userspace/engine): remove Lua from Falco and re-implement the rule loader refactor(userspace/engine): remove Lua from Falco and re-implement the rule loader Apr 8, 2022
@jasondellaluce
Copy link
Contributor Author

/milestone 0.32.0

Copy link
Contributor

@mstemm mstemm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great change! We agreed that a more concrete split between yaml parsing and other parts of rule loading will be in a follow-on PR. Let's make sure it's a relatively soon PR and not the ambiguous "someday" kind of follow-on PR. :)

@poiana poiana added the lgtm label Apr 9, 2022
@poiana
Copy link
Contributor

poiana commented Apr 9, 2022

LGTM label has been added.

Git tree hash: 7694234af3185193ddf3e662c040374c99bb027a

Copy link
Member

@leogr leogr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't wait to see this happen. Tears of joy 😭

😃

/approve

@poiana
Copy link
Contributor

poiana commented Apr 11, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jasondellaluce, leogr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jasondellaluce,leogr]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana merged commit 2934ef2 into master Apr 11, 2022
@poiana poiana deleted the refactor/rule-loader branch April 11, 2022 10:22
mstemm added a commit that referenced this pull request Jun 27, 2022
The only use of it was to include in --support output, which is
redundant as the support output already includes the full contents of
each rules file.

Additionally, it wasn't even being updated after the switch from lua
rules loading to c++ rules
loading (#1966 or
surrounding PRs).

This will simplify follow-on changes to add a real "result" to rules
loading methods, as there will be fewer API variants to support.

Signed-off-by: Mark Stemm <[email protected]>
mstemm added a commit that referenced this pull request Jun 27, 2022
The only use of it was to include in --support output, which is
redundant as the support output already includes the full contents of
each rules file.

Additionally, it wasn't even being updated after the switch from lua
rules loading to c++ rules
loading (#1966 or
surrounding PRs).

This will simplify follow-on changes to add a real "result" to rules
loading methods, as there will be fewer API variants to support.

Signed-off-by: Mark Stemm <[email protected]>
mstemm added a commit that referenced this pull request Jun 27, 2022
The only use of it was to include in --support output, which is
redundant as the support output already includes the full contents of
each rules file.

Additionally, it wasn't even being updated after the switch from lua
rules loading to c++ rules
loading (#1966 or
surrounding PRs).

This will simplify follow-on changes to add a real "result" to rules
loading methods, as there will be fewer API variants to support.

Signed-off-by: Mark Stemm <[email protected]>
mstemm added a commit that referenced this pull request Jun 27, 2022
…ules APIs

The only use of it was to include in --support output, which is
redundant as the support output already includes the full contents of
each rules file.

Additionally, it wasn't even being updated after the switch from lua
rules loading to c++ rules
loading (#1966 or
surrounding PRs).

This will simplify follow-on changes to add a real "result" to rules
loading methods, as there will be fewer API variants to support.
mstemm added a commit that referenced this pull request Jun 27, 2022
…ules APIs

The only use of it was to include in --support output, which is
redundant as the support output already includes the full contents of
each rules file.

Additionally, it wasn't even being updated after the switch from lua
rules loading to c++ rules
loading (#1966 or
surrounding PRs).

This will simplify follow-on changes to add a real "result" to rules
loading methods, as there will be fewer API variants to support.
mstemm added a commit that referenced this pull request Jun 28, 2022
…ules APIs

The only use of it was to include in --support output, which is
redundant as the support output already includes the full contents of
each rules file.

Additionally, it wasn't even being updated after the switch from lua
rules loading to c++ rules
loading (#1966 or
surrounding PRs).

This will simplify follow-on changes to add a real "result" to rules
loading methods, as there will be fewer API variants to support.
mstemm added a commit that referenced this pull request Jul 25, 2022
The only use of it was to include in --support output, which is
redundant as the support output already includes the full contents of
each rules file.

Additionally, it wasn't even being updated after the switch from lua
rules loading to c++ rules
loading (#1966 or
surrounding PRs).

This will simplify follow-on changes to add a real "result" to rules
loading methods, as there will be fewer API variants to support.

Signed-off-by: Mark Stemm <[email protected]>
poiana pushed a commit that referenced this pull request Jul 25, 2022
The only use of it was to include in --support output, which is
redundant as the support output already includes the full contents of
each rules file.

Additionally, it wasn't even being updated after the switch from lua
rules loading to c++ rules
loading (#1966 or
surrounding PRs).

This will simplify follow-on changes to add a real "result" to rules
loading methods, as there will be fewer API variants to support.

Signed-off-by: Mark Stemm <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants