Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Structured Audit Logging #891

Merged
merged 1 commit into from
Jul 30, 2020
Merged

Add Structured Audit Logging #891

merged 1 commit into from
Jul 30, 2020

Conversation

mrzzy
Copy link
Collaborator

@mrzzy mrzzy commented Jul 18, 2020

Why we need this PR
Feast behavior and usage be opaque due to a lack of consistent logging of:

  • request/response messages handled by Feast Core/Serving.
  • The user identity that was adapted.

Logging in Feast is also inconsistent making it difficult to parsed by third party logging systems.

What this PR does:
Adds Audit Logging to Feast Core and Feast Serving:

  • Added AuditLogger that exposes structured logging methods logMessage(), logAction() etc.
    • Make AuditLogger disable/configurable from Core/Serving's application.yml
    • Log entries produced by AuditLogger are structured JSON and machine parsable.
  • Added AuditLogEntry and subclasses that define the structure of each log entries and provides JSON conversion.
  • Added GrpcMessageInterceptor to intercept incoming/request or outgoing/response and log them to the Audit Log for both Core and Serving Services.
    • full request/response available in JSON audit log.
    • support for displaying authenticated identity (ie authenticated user).

Refactor/Tech debt cleanups:

  • Make JobService perform actions on jobs via JobTask instead of directly with JobManager to be consistent with JobCoordinatorService

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

Add Structured Audit Logging
- Audit Log Entries are produced in structured JSON format.
- Audit Logger can be configured from application.yml:
     - feast.logging.audit.enabled - enables and disables audit logging
     - feast.logging.audit.messageEnabled - enables and disables request/response message audit logging.

try {
super.onHalfClose();
} catch (AuthenticationException e) {
// TODO: figure out of catching exception breaks spring security
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@woop Does the current E2E auth tests verify that there are no false positives for authentication? (ie should fail to authenticate due to wrong credentials. )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#892 These tests will fail your authentication if you try to sign in with a JWT that cannot be verified.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer relevant as this was deemed out of scope for this PR.

@mrzzy
Copy link
Collaborator Author

mrzzy commented Jul 19, 2020

/retest

import org.apache.logging.log4j.Marker;
import org.apache.logging.log4j.MarkerManager;

@Log4j2
Copy link
Member

@woop woop Jul 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Log4j2 be hardcoded here? #277

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used SLF4J instead.

@Getter
public class AuditLogger {
/** Defines the output formats that the Audit Logger can produce logs in */
public static enum LogFormat {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the format be a part of the logger configuration xml instead of a part of the code base?

Copy link
Collaborator Author

@mrzzy mrzzy Jul 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to expose the ability to configure the logger with the most common use cases via application.yml without interacting with the logging configuration XML as it inaccessible from our Helm and Docker-Compose setup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the response above, I think we can use the application.yml for configuring logging without writing our own logformat configuration.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

private Set<LogFormat> formats;

// Configures the directions of messages of which matchingo messages are logged
// are logged to the audit log. Requires message event kind to be enabled in kinds./
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cant make out this comment. What does the direction represent?

Copy link
Collaborator Author

@mrzzy mrzzy Jul 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direction represents the flow of messages relative to the service:

  • incoming - matches requests messages
  • outgoing - matches response messages.

Users can use this to configure for which request "direction" message audit logging is enabled. (ie leave in only incoming to enable request logging)

Will change this to request and response as that makes it easier to understand.

* ExceptionInterceptor intercepts exceptions throw when handling GRPC messages. Intercepts uncaught
* exceptions and logs to audit logger and returns the appropriate Grpc statuses.
*/
public class GrpcExceptionInterceptor implements ServerInterceptor {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be in a separate PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, will do.


// Attempt Extract current authenticated identity.
Authentication authentication = SecurityContextHolder.getContext().getAuthentication();
String identity = (authentication == null) ? "" : authentication.getPrincipal().toString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work for you? I have been pulling the identity from claims.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

call =
new SimpleForwardingServerCall<ReqT, RespT>(call) {
@Override
public void sendMessage(RespT message) {
Copy link
Member

@woop woop Jul 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to apply to GetOnlineFeatures as well, because in that case we will probably experience a serious performance hit. Audit logging is normally distinct from request/response logging, since with the former you expect durability and consistency but with the latter you want availability and performance. Logging the complete request/response is still useful as part of the audit log, but we need to have a way to deal with the latency sensitive case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AuditLogger can be configured via application.yml via AuditLogProperties and thus can be disabled for the Online Serving instance from application.yml for cases where latency is more important that visibility.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, but we need request/response logging in online serving as well. We just want to make sure that this logging happens in async fashion and doesn't affect latency of the method


/** LogEvent describes a specific event that a log is capturing */
@AutoOneOf(LogEvent.Kind.class)
public abstract class LogEvent {
Copy link
Member

@woop woop Jul 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial reaction when seeing this code is that it seems quite overwhelming. A lot of decisions have been made here in terms of the structure of our audit logging and I am not sure if we are being too ambitious here. I am quite worried that these abstractions will end up changing quite soon once people start trying to use the audit logger. I was hoping we could start with a simpler key/value API.

Did you follow some kind of convention when structuring the audit logger and its associated types, or was everything custom made?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you follow some kind of convention when structuring the audit logger and its associated types, or was everything custom made?

No, Yes.

I was hoping we could start with a simpler key/value API.

Reasons that we should go with a Key Value API.

  • Infinitely more flexible and extensible as no constraints are applied.

Reasons that we should not go with a Key Value API:

  • Java is statically typed. Going with a generic Key Value would mean Map<String, Object> which implies lots of ugly casting.
  • With no enforced structure in place it will be difficult to parse the logs in upstream systems as there is no expected structure.
  • Enforces some rigidity to logging so that we don't break upstream systems unknowing built upon audit logging.

The structured log will be depended on by upstream logging systems and should be treated as a user facing API that should be clearly defined, the same way Core and Serving Service API are clearly defined.

I am quite worried that these abstractions will end up changing quite soon once people start trying to use the audit logger

The important thing is that we are fully aware of changes instead of being blind sighted by a map.put()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion is not to throw away all static typing and just accept arbitrary string/object keys and values. Parts of what you have implemented makes sense, but other parts do not.

From a functional perspective what we want is for specific parts of the Feast code base to create a message that can be serialized and printed/sent to the audit log. so having static methods like ofMessage(MessageLogEvent.Kind kind, String method, Message message, String identity) makes sense.

However, I question the value of this LogEvent as a container. It seems like a catchall class that provides little in terms of a contract. When would you have both a MessageLogEvent and ActionLogEvent in the same object? It doesnt seem like that should be possible and so its not clear what the scope of this LogEvent class should be. Will it just grow infinitely as a catchall class that can contain any subclass?

If we agree that maintaining state is a bad thing, then do we need to maintain LogEvents as an object? What is the SLF4J contract that we need to meet. Does it just expect strings?

What prevents us from having an AuditLogger class with static methods like ofMessage that converts a tight contract of objects into a Map or string that can be serialized?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored away LogEvent into MessageAuditLogEntry, ActionAuditLogEntry, TransitionAuditLogEntry subclasses.

@Test
public void shouldReturnStringRepresentationOfAuditLog() {
for (AuditLogEntry auditLog : getTestAuditLogs()) {
System.out.println(auditLog.toString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this testing? Wouldnt toString() always return something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess I was just trying to exercise the code path. Would remove.

responseObserver.onError(
Status.INTERNAL.withDescription(e.getMessage()).withCause(e).asRuntimeException());
}
GetFeastCoreVersionResponse response =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code looks much better with the exception handling in the interceptor, but I don't think its in scope for this PR is it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, moving out of this PR.

public class CreateJobTask extends JobTask {

public CreateJobTask(Job job, JobManager jobManager, AuditLogger auditLog) {
super(job, jobManager, auditLog);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we add this constructor?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally this used the @Builder constructor from lombok and had fields duplicated across all 4 Job Task implementations. I move the fields up to JobTask abstract class however the @builder annotation has problems with inherited fields, hence the added constructor.

import lombok.Getter;
import lombok.Setter;
import lombok.extern.log4j.Log4j2;
import org.apache.logging.log4j.Level;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It bothers me a bit that we are pulling in logging dependencies directly into our code. Shouldn't this be abstracted away?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced with SL4J.

* name and assumed authenticated identity (if authentication is enabled).
*/
public class GrpcMessageInterceptor implements ServerInterceptor {
private AuditLogger auditLog;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for us to keep things simpler by using SLF4J for logging instead of having our own AuditLogger? Example: https://github.com/yidongnan/grpc-spring-boot-starter/blob/master/examples/local-grpc-server/src/main/java/net/devh/boot/grpc/examples/local/server/LogGrpcInterceptor.java#L34

I can see some value in having a way to standardize the message format through static methods, especially if we need to have actions/resources/identities, but having to pass around an audit logger when SLF4J is already available seems like an unnecessary overhead to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for us to keep things simpler by using SLF4J for logging instead of having our own AuditLogger? Example: https://github.com/yidongnan/grpc-spring-boot-starter/blob/master/examples/local-grpc-server/src/main/java/net/devh/boot/grpc/examples/local/server/LogGrpcInterceptor.java#L34

Why we have our own Audit Logger:

  • Allow us to configure the audit logger from application.yml which is valuable as:
    • Allows us to configure the logger via application.yml (ie disable request logging for Online Serving without hard coding something.)
    • We can inject Feast specific elements into the Audit Log (ie Feast Component, Feast Version, or in the future git hash.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allows us to configure the logger via application.yml (ie disable request logging for Online Serving without hard coding something.)

What have we implemented that can't be implemented using the normal logging configuration https://howtodoinjava.com/spring-boot2/logging/configure-logging-application-yml/

Also, it seems like your answer focuses on configuration but the AuditLogger is more than just a configuration holder right? We can have custom configuration in the FeastProperties for enabling/disabling request/response logging if needed, but that is a separate question to whether we need to maintain the AuditLogger and pass it around.

We can inject Feast specific elements into the Audit Log

Since we have an application context just floating around inside of Feast, we can easily grab these settings at any time without needing to pass around an AuditLogger right? I am just trying to make sure we arent reinventing something that already exists.

Copy link
Collaborator Author

@mrzzy mrzzy Jul 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated AuditLogger's logging methods to static to remove the need of passing around the AuditLogger instance.

serving/pom.xml Outdated
<artifactId>grpc-spring-boot-starter</artifactId>
<groupId>net.devh</groupId>
<artifactId>grpc-server-spring-boot-starter</artifactId>
<version>${grpc.spring.boot.starter.version}</version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to set this? Shouldnt this be set in the parent-pom?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was set as the interceptors do not work properly under the lognet grpc starter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed version part to be set in the parent POM.

@woop
Copy link
Member

woop commented Jul 19, 2020

I think it is unlikely that we will get this PR merged in any time soon in its current state. It's trying to do too much.

  1. New AuditLogger/LogEvents with custom messages
  2. Global exception handler
  3. Request/response logging
  4. Job transition logging
  5. Replacement of exception logging

The MVP we need for audit logging is not much different from what we have in our existing AuditLogger. We need a way to log incoming requests to track user actions. Tracking internal state changes of jobs are not necessary yet, and even in that case I think it can be tracked at the entity level.

I would recommend that you try and separate these into different PRs. The first and most important PR implements a logging interceptor that logs authenticated user requests and the incoming payload using a standardized audit log format (hopefully with SLF4J). This might serve as inspiration https://cloud.google.com/logging/docs/reference/audit/auditlog/rest/Shared.Types/AuditLog

@mrzzy
Copy link
Collaborator Author

mrzzy commented Jul 20, 2020

  1. Job transition logging.

This already exists with the current AuditLogger

Next steps:

  • Move exception logging/Global exception handler to a separate PR.
  • Make logging code dependent on SLF4J instead of Log4j2 Standardize logging libraries for Java components #277
  • Make AuditLogger's logging methods static to remove the need of passing it around in code.
  • Refactor away LogEvent into MessageAuditLogEntry, ActionAuditLogEntry, TransitionAuditLogEntry subclasses.
  • Refactor to make sure that AuditLogger's is an facade to audit logging functionality.

@@ -46,7 +47,7 @@
@Getter
@Setter
@Configuration
@ConfigurationProperties(prefix = "feast", ignoreInvalidFields = true)
@ConfigurationProperties(prefix = "feast", ignoreInvalidFields = false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this changed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted.

@mrzzy
Copy link
Collaborator Author

mrzzy commented Jul 27, 2020

/test test-end-to-end-batch

1 similar comment
@mrzzy
Copy link
Collaborator Author

mrzzy commented Jul 27, 2020

/test test-end-to-end-batch

@mrzzy
Copy link
Collaborator Author

mrzzy commented Jul 27, 2020

/test test-end-to-end-batch-dataflow

private boolean enabled;

// Whether to enable/disable message level (ie request/response) audit logging.
private boolean messageEnabled;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call this something more descriptive? enableMessageLogging?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. updated.

@mrzzy mrzzy closed this Jul 28, 2020
@mrzzy mrzzy reopened this Jul 28, 2020
@mrzzy
Copy link
Collaborator Author

mrzzy commented Jul 28, 2020

/test test-end-to-end

@mrzzy
Copy link
Collaborator Author

mrzzy commented Jul 28, 2020

/test test-end-to-end-redis-cluster

@mrzzy
Copy link
Collaborator Author

mrzzy commented Jul 28, 2020

/test test-end-to-end-batch-dataflow

1 similar comment
@mrzzy
Copy link
Collaborator Author

mrzzy commented Jul 28, 2020

/test test-end-to-end-batch-dataflow

// Spring runs this constructor when creating the AuditLogger bean,
// which allows us to populate the AuditLogger class with dependencies.
// This allows us to use the dependencies in the AuditLogger's static methods
AuditLogger.properties = loggingProperties.getAudit();
instance = FeastInstance.of(buildProperties.getArtifact(), buildProperties.getVersion());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not sure why we need to persist with FeastInstance.

@mrzzy
Copy link
Collaborator Author

mrzzy commented Jul 30, 2020

/test test-end-to-end-batch-dataflow

@woop
Copy link
Member

woop commented Jul 30, 2020

/lgtm

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mrzzy, woop

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mrzzy
Copy link
Collaborator Author

mrzzy commented Jul 30, 2020

/test test-end-to-end-batch-dataflow

1 similar comment
@mrzzy
Copy link
Collaborator Author

mrzzy commented Jul 30, 2020

/test test-end-to-end-batch-dataflow

@feast-ci-bot feast-ci-bot removed the lgtm label Jul 30, 2020
@feast-ci-bot
Copy link
Collaborator

New changes are detected. LGTM label has been removed.

@woop woop merged commit 8acab49 into feast-dev:master Jul 30, 2020
@mrzzy mrzzy mentioned this pull request Aug 2, 2020
pyalex pushed a commit that referenced this pull request Aug 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants