
Remote configuration support #488

Merged: 31 commits merged into main from bob/remote-config on Aug 16, 2024
Conversation

bwoebi (Contributor) commented Jun 14, 2024

This also includes Dynamic Configuration, but it's a small part: just 100 lines of code describing the structures it contains.

As for more specific changes to the current code:

  • The sidecar is now per-user instead of per-system (on Windows it already uses the ConsoleSession).
    • On Windows this was necessary because memory can only easily be shared within a ConsoleSession.
    • On Linux it's now necessary because the notification mechanism for remote config changes is distributed via signals, which require either the same euid or root-like capabilities.
      Note: AsBytes in ddcommon currently requires &'a AsBytes<'a>, while only &AsBytes<'a> (with an independent lifetime) is actually needed; see the sketch below.
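
A minimal sketch of that lifetime point; the trait shape here is an assumption for illustration, not ddcommon's actual definition:

// Assumed shape of the trait, for illustration only.
trait AsBytes<'a> {
    fn as_bytes(&self) -> &'a [u8];
}

// Over-constrained: `&'a T` with `T: AsBytes<'a>` ties the borrow of `v`
// to the data lifetime 'a, so the reference must live for all of 'a.
fn use_tied<'a, T: AsBytes<'a>>(v: &'a T) -> &'a [u8] {
    v.as_bytes()
}

// Looser: the borrow of `v` may be shorter than 'a; only the returned
// bytes need to live for 'a.
fn use_independent<'a, T: AsBytes<'a>>(v: &T) -> &'a [u8] {
    v.as_bytes()
}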

General notes:

  • Code in remote-config/ is meant to be reusable and also accessible via FFI as needed. For most users the bare ConfigFetcher should be enough; some will need the SharedFetcher, and the fewest should actually need the MultiTargetFetcher. An FFI interface hasn't been defined yet, as it wasn't needed:
    • ConfigFetcher in its simplest usage is essentially just one function with two callbacks (store, update). StoredFile should be a wrapper for a type native to the FFI user (probably a pointer).
    • SharedFetcher is similar, but presents a run loop. To be used in applications that are multi-threaded in any way, it takes three callbacks: store, update, and on_fetched. (See the sketch after this list.)
  • The sidecar will notify participating processes whenever remote config changes.
  • The sidecar uses shared memory to transmit the data received by the remote-config fetcher.
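
A rough sketch of the two-callback shape described above; the names and signatures are illustrative assumptions, not the crate's actual API:

// Hypothetical contract for ConfigFetcher's storage callbacks: `store`
// persists a newly fetched config file and returns a caller-native
// handle; `update` refreshes an already-stored file. SharedFetcher's
// run loop would add a third hook, on_fetched, invoked after each fetch.
trait FileStorage {
    // For FFI users this would wrap a type native to them (a pointer, probably).
    type StoredFile;

    fn store(&self, path: &str, contents: Vec<u8>) -> Self::StoredFile;
    fn update(&self, file: &Self::StoredFile, contents: Vec<u8>);
}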

@bwoebi bwoebi requested review from a team as code owners June 14, 2024 14:53
@bwoebi bwoebi force-pushed the bob/remote-config branch 5 times, most recently from 1618c4c to f92d3f1 Compare June 14, 2024 17:53
codecov-commenter commented Jun 14, 2024

Codecov Report

Attention: Patch coverage is 73.92501% with 758 lines in your changes missing coverage. Please review.

Project coverage is 71.83%. Comparing base (dd36a81) to head (22cbb0b).
Report is 101 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #488      +/-   ##
==========================================
+ Coverage   71.17%   71.83%   +0.66%     
==========================================
  Files         220      237      +17     
  Lines       30000    32871    +2871     
==========================================
+ Hits        21351    23612    +2261     
- Misses       8649     9259     +610     
Components Coverage Δ
crashtracker 21.25% <ø> (+0.05%) ⬆️
datadog-alloc 98.73% <ø> (ø)
data-pipeline 50.00% <ø> (ø)
data-pipeline-ffi 0.00% <ø> (ø)
ddcommon 82.11% <0.00%> (-0.67%) ⬇️
ddcommon-ffi 68.11% <0.00%> (-1.61%) ⬇️
ddtelemetry 59.02% <ø> (ø)
ipc 84.29% <ø> (+0.10%) ⬆️
profiling 84.26% <ø> (ø)
profiling-ffi 77.42% <ø> (ø)
serverless 0.00% <ø> (ø)
sidecar 40.47% <54.82%> (+6.43%) ⬆️
sidecar-ffi 0.00% <0.00%> (ø)
spawn-worker 54.87% <ø> (ø)
trace-mini-agent 70.88% <ø> (ø)
trace-normalization 98.25% <ø> (ø)
trace-obfuscation 95.73% <ø> (ø)
trace-protobuf 77.67% <79.24%> (+0.51%) ⬆️
trace-utils 92.97% <ø> (-0.43%) ⬇️

@bwoebi bwoebi force-pushed the bob/remote-config branch 2 times, most recently from e16f00b to cb13b36 Compare June 14, 2024 18:16
Review threads (outdated, resolved): remote-config/src/parse.rs; remote-config/src/dynamic_configuration/data.rs (two threads)
}

#[cfg(feature = "test")]
pub mod tests {

Contributor:

I like the idea of having the helpers isolated in another module for the sake of clarity when included in another crate, but it seems a bit odd to have a tests module with no actual tests. If you don't plan to add tests to this module, it would probably be clearer to name it 'helpers' or something like that.

bwoebi (Contributor Author):

I'd rather add a submodule to tests then, like mod tests { mod helpers { ... } }, for clarity that it's test-only code. (Example below.)
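
For example (the helper body is a made-up placeholder):

// Helpers shared with other crates' tests, clearly marked as test-only
// by living under the feature-gated tests module; no actual tests here.
#[cfg(feature = "test")]
pub mod tests {
    pub mod helpers {
        // Hypothetical helper constructing dummy config file contents.
        pub fn dummy_config_contents() -> Vec<u8> {
            b"{}".to_vec()
        }
    }
}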

brettlangdon (Member):

Can you add some examples to the codebase/comments so example/expected usage is shown in cargo doc for the remote-config library?

@bwoebi bwoebi force-pushed the bob/remote-config branch 4 times, most recently from b89fdee to 95c7998 Compare June 25, 2024 16:11

bwoebi (Contributor Author) commented Jun 25, 2024

@brettlangdon Implemented some helper functionality to auto-parse, diff, and locally store remote config files.
None of these were needed for the sidecar itself (the sidecar doesn't parse most things, but just writes them to shared memory, so no explicit backing storage is needed; diffing happens based on shared memory contents in the target processes), but they are probably useful for most implementers.

Please have a look at examples/remote_config_fetch.rs.

mellon85 commented Jul 3, 2024

Please split this up into separate PRs; nobody can review this much code at once.

Also, RC should be among the code owners of the RC implementation, as for other tracers.

bwoebi (Contributor Author) commented Jul 3, 2024

Please consider the files in remote-config individually:
fetch/fetcher.rs as the primary RC client.
fetch/shared.rs for sharing file storage.
fetch/multitarget.rs for multiple targets.
fetch/single.rs for users who just need a single client, packaged in a slightly nicer API. Uses file_change_tracker.rs and file_storage.rs.

Then parse.rs to process remote config paths / provide an entry point to the individual products.

I don't intend to split this RC beyond the point where I can do minimal end-to-end testing.

EDIT: Updated CODEOWNERS accordingly.

Signed-off-by: Bob Weinand <[email protected]>
@brettlangdon brettlangdon requested a review from a team July 11, 2024 17:50

mellon85 left a comment:

Please split the review so that it can be reviewed.

}

#[derive(Default)]
pub struct OpaqueState {

mellon85:

This state is opaque and not exposed outside of RC backends; remove it.

bwoebi (Contributor Author):

The implementing document says about the opaque_backend_state:

MUST contain the opaque_backend_state field extracted from targets.
This is used to make the backend stateless by saving some opaque lightweight data directly in the agent and tracers.
For tracer clients this is mainly used for accurate tracking of where configurations are sent.

I.e., I'll have to extract it and submit it unmodified with each request?

Reviewer:

I think what Dario is trying to capture is that you need to extract the byte sequence representing the opaque_backend_state, but you are not to parse it or attempt to interpret it. Tracers never have to utilize the information contained within it, and the RC backend is free to store whatever it wants inside, with no contract established with tracers. As a result, storing it as simply a Vec<u8> is all you should do.

Reviewer:

Ahhh, ok, I think I see now. client_state here refers to the field targets.signed.custom.opaque_backend_state extracted from the signed targets; the other data in here is what you are using to track the current state to build the next fetch request. I think you should rename client_state to opaque_backend_state, and the struct itself should not be OpaqueState but instead ClientState (see the sketch below). On initial glance it looks like you're trying to parse the opaque state and store it in this struct, but that seems not to be the case.
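
A sketch of the suggested rename; the struct name and the opaque_backend_state field come from this thread, while the version field is an illustrative assumption:

// Per-client state used to build the next fetch request
// (previously named OpaqueState).
#[derive(Default)]
pub struct ClientState {
    // Raw bytes of targets.signed.custom.opaque_backend_state, never
    // parsed or interpreted client-side; echoed back verbatim with each
    // request so the backend can remain stateless.
    pub opaque_backend_state: Vec<u8>,
    // Illustrative: data the client itself tracks between fetches.
    pub targets_version: u64,
}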

@mellon85 mellon85 self-requested a review July 16, 2024 10:07
Also avoid computing the RemoteConfigPath string with every HashMap operation; instead use some Rust magic so that the map considers owned RemoteConfigPaths and unowned RemoteConfigPathRefs equivalent (see the sketch below).

Signed-off-by: Bob Weinand <[email protected]>
Signed-off-by: Bob Weinand <[email protected]>
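
The "Rust magic" in that commit message is presumably the standard trait-object Borrow pattern; here is a self-contained sketch with stand-in types (the real RemoteConfigPath fields differ):

use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Stand-in for the owned RemoteConfigPath.
struct OwnedPath {
    product: String,
    config_id: String,
}

// Stand-in for the borrowed RemoteConfigPathRef.
struct PathRef<'a> {
    product: &'a str,
    config_id: &'a str,
}

// Common view of both forms; Hash/Eq are defined on this view so owned
// and borrowed keys hash and compare identically.
trait PathView {
    fn parts(&self) -> (&str, &str);
}

impl PathView for OwnedPath {
    fn parts(&self) -> (&str, &str) {
        (&self.product, &self.config_id)
    }
}

impl<'a> PathView for PathRef<'a> {
    fn parts(&self) -> (&str, &str) {
        (self.product, self.config_id)
    }
}

impl<'a> PartialEq for dyn PathView + 'a {
    fn eq(&self, other: &Self) -> bool {
        self.parts() == other.parts()
    }
}
impl<'a> Eq for dyn PathView + 'a {}
impl<'a> Hash for dyn PathView + 'a {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.parts().hash(state)
    }
}

// The owned key delegates Hash/Eq through the view for consistency.
impl PartialEq for OwnedPath {
    fn eq(&self, other: &Self) -> bool {
        self.parts() == other.parts()
    }
}
impl Eq for OwnedPath {}
impl Hash for OwnedPath {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.parts().hash(state)
    }
}

// The key step: a map keyed by OwnedPath can be probed with any
// &dyn PathView, so lookups by PathRef allocate no String.
impl<'a> Borrow<dyn PathView + 'a> for OwnedPath {
    fn borrow(&self) -> &(dyn PathView + 'a) {
        self
    }
}

fn main() {
    let mut map: HashMap<OwnedPath, u32> = HashMap::new();
    map.insert(
        OwnedPath { product: "APM_TRACING".into(), config_id: "abc".into() },
        1,
    );

    let probe = PathRef { product: "APM_TRACING", config_id: "abc" };
    assert_eq!(map.get(&probe as &dyn PathView), Some(&1));
}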

ameske left a comment:

The core loop looks good to me, aside from a few minor comments (and one question that may actually have uncovered a gap in our system tests, which RC should address).

I do want to go over the multi-target setup more, though. Is there a design doc on how you are handling this setup at a high level? I think that would really aid my review, and also any future maintenance that must be done on this client. If there isn't such a doc, we can meet and talk about it, but I think we want to get something like this on file in case there is any need to collaborate with RC on debugging, as this is a new concept being introduced.

Review thread on remote-config/src/fetch/fetcher.rs (outdated, resolved).

let computed_hash = hasher(decoded.as_slice());
if hash != computed_hash {
warn!("Computed hash of file {computed_hash} did not match remote config targets file hash {hash} for path {path}: file: {}", String::from_utf8_lossy(decoded.as_slice()));
continue;

Reviewer:

I thought we had a system test for this, which you hooked up today and passed, but it looks like if a computed hash doesn't match, the loop just continues, whereas an invalid file should fail the whole payload with an error. Just for my own comfort, where is that handled?

bwoebi (Contributor Author):

I can make the whole thing an error too, sure.

Reviewer:

I'm looking over the system tests and I actually don't think we have a test for this 👀.

In theory this should never happen unless somebody is manipulating data in the middle. If it were happening due to an RC bug, we'd have a catastrophic error in our system (it's likely to be caught well before it gets to the tracer).

Since we don't have a system test, we can't really hold tracers to any implementation at this time. I think I should add one, and we should get convergence from tracers. My recommendation is to follow the RFC and the behavior we will eventually enforce via tests: if one hash is bad, fail the entire update so that this gets reported quickly up the chain (we have logs for client failures and monitoring for major issues like this).

bwoebi (Contributor Author):

Promoted to error.
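
Presumably the promoted version looks roughly like this (the error mechanism, anyhow, is an assumption; the original snippet appears above):

use anyhow::bail;

// Sketch: a hash mismatch now fails the entire update instead of
// skipping the file, so the failure is reported up the chain.
fn verify_file(
    hasher: impl Fn(&[u8]) -> String,
    hash: &str,
    path: &str,
    decoded: &[u8],
) -> anyhow::Result<()> {
    let computed_hash = hasher(decoded);
    if hash != computed_hash {
        bail!("Computed hash of file {computed_hash} did not match remote config targets file hash {hash} for path {path}");
    }
    Ok(())
}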

Signed-off-by: Bob Weinand <[email protected]>
fetcher: ConfigFetcher::new(sink, Arc::new(ConfigFetcherState::new(invariants))),
target: Arc::new(target),
runtime_id,
config_id: uuid::Uuid::new_v4().to_string(),

Reviewer:

What does this config_id represent?

A given client doesn't know its config IDs ahead of time, and they can constantly change as configurations are created or deleted by users.

Reviewer:

It looks like this is used as the client_id in the fetch logic, so I think you want to rename this for clarity.

bwoebi (Contributor Author):

Renamed to client_id.

pub client_id: String,
cancellation: CancellationToken,
/// Interval used if the remote server does not specify a refetch interval, in nanoseconds.
pub default_interval: AtomicU64,

Reviewer:

As mentioned in the review of the primary RFC loop, that info is for agents and not tracer clients, so this can remain constant. (I believe most tracers poll at 1s.)

bwoebi (Contributor Author):

Removed.

Comment on lines 488 to 497
if endpoint.api_key.is_some() {
if parts.scheme.is_none() {
parts.scheme = Some(Scheme::HTTPS);
parts.authority = Some(
format!("{}.{}", subdomain, parts.authority.unwrap())
.parse()
.unwrap(),
);
}
parts.path_and_query = Some(PathAndQuery::from_static("/api/v0.1/configurations"));

Reviewer:

I don't think there's ever a use case where a tracer client should be talking to /api/v0.1/configurations as that's the RC backend. The RC backend is not designed to talk directly to tracers and only to what we call the "core remote config service" running in the agent.

bwoebi (Contributor Author):

It's essentially unused; it just makes the code do something valid at this place. The actual run loop has if self.state.endpoint.api_key.is_some() { return Err() }.

I don't intend to use it without thorough discussion with the RC team.

Reviewer:

We have no plans to allow tracers to talk directly to the remote config backend, and the payload is fundamentally different so you wouldn't even be able to handle the payload with the current code. This needs to be removed.

bwoebi (Contributor Author) commented Aug 9, 2024:

Whatever, it's not doing anything, but removed.

@ameske ameske self-requested a review August 12, 2024 13:42
@mellon85 mellon85 dismissed their stale review August 12, 2024 13:55

Kyle reviewed it

ameske left a comment:

Context for review: Remote Config platform does not own and maintain tracer clients, but we do own the contract between the agent and tracer clients as defined by the Tracer Client RFC (https://docs.google.com/document/d/1u_G7TOr8wJX0dOM_zUDKuRJgxoJU_hVTd5SeaMucQUs/edit?usp=sharing). My review focused on whether the code conformed with the RFC.

I have reviewed in detail the single-fetcher use case for the RFC that RC owns (and requested and received confirmation that it passes our system tests for the RFC; thanks for doing that! 🥳). I also reviewed the multi-fetcher use case at a high level, primarily how it layers everything together on top of the aforementioned single fetcher.

@bwoebi bwoebi merged commit 31b6854 into main Aug 16, 2024
34 checks passed
@bwoebi bwoebi deleted the bob/remote-config branch August 16, 2024 14:40