
Remote configuration support #488

Merged: 31 commits merged into main from bob/remote-config on Aug 16, 2024
Conversation

bwoebi (Contributor) commented Jun 14, 2024

This also includes Dynamic Configuration, but it's a small part: just 100 lines of code describing the structures it contains.

As for more specific changes to the current code:

  • The sidecar is now per-user instead of per-system (on Windows it already uses the ConsoleSession).
    • On Windows this was necessary because memory can only easily be shared within a ConsoleSession.
    • On Linux it's now necessary because the notification mechanism for remote config changes is distributed via signals, which require either the same euid or root-like capabilities.
      Note: AsBytes in ddcommon currently requires &'a AsBytes<'a>, while only &AsBytes<'a> (with an independent lifetime) is actually needed; see the sketch below.
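
A minimal sketch of that lifetime point; the trait shape here is an assumption for illustration, not ddcommon's actual definition:

// Assumed shape of the trait, for illustration only.
trait AsBytes<'a> {
    fn as_bytes(&self) -> &'a [u8];
}

// Over-constrained: `&'a T` with `T: AsBytes<'a>` ties the borrow of `v`
// to the data lifetime 'a, so the reference must live for all of 'a.
fn use_tied<'a, T: AsBytes<'a>>(v: &'a T) -> &'a [u8] {
    v.as_bytes()
}

// Looser: the borrow of `v` may be shorter than 'a; only the returned
// bytes need to live for 'a.
fn use_independent<'a, T: AsBytes<'a>>(v: &T) -> &'a [u8] {
    v.as_bytes()
}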

General notes:

  • Code in remote-config/ is meant to be reusable and also accessible via FFI as needed. For most users the bare ConfigFetcher should be enough; some will need the SharedFetcher, and the fewest should actually need the MultiTargetFetcher. An FFI interface hasn't been defined yet, as it wasn't needed:
    • ConfigFetcher in its simplest usage is essentially just one function with two callbacks (store, update). StoredFile should be a wrapper for a type native to the FFI user (probably a pointer).
    • SharedFetcher is similar, but presents a run loop. To be used in applications that are multi-threaded in any way, it takes three callbacks: store, update, and on_fetched. (See the sketch after this list.)
  • The sidecar will notify participating processes whenever remote config changes.
  • The sidecar uses shared memory to transmit the data received by the remote-config fetcher.
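
A rough sketch of the two-callback shape described above; the names and signatures are illustrative assumptions, not the crate's actual API:

// Hypothetical contract for ConfigFetcher's storage callbacks: `store`
// persists a newly fetched config file and returns a caller-native
// handle; `update` refreshes an already-stored file. SharedFetcher's
// run loop would add a third hook, on_fetched, invoked after each fetch.
trait FileStorage {
    // For FFI users this would wrap a type native to them (a pointer, probably).
    type StoredFile;

    fn store(&self, path: &str, contents: Vec<u8>) -> Self::StoredFile;
    fn update(&self, file: &Self::StoredFile, contents: Vec<u8>);
}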

@bwoebi bwoebi requested review from a team as code owners June 14, 2024 14:53
@bwoebi bwoebi force-pushed the bob/remote-config branch 5 times, most recently from 1618c4c to f92d3f1 Compare June 14, 2024 17:53
codecov-commenter commented Jun 14, 2024

Codecov Report

Attention: Patch coverage is 73.92501% with 758 lines in your changes missing coverage. Please review.

Project coverage is 71.83%. Comparing base (dd36a81) to head (22cbb0b).
Report is 101 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #488      +/-   ##
==========================================
+ Coverage   71.17%   71.83%   +0.66%     
==========================================
  Files         220      237      +17     
  Lines       30000    32871    +2871     
==========================================
+ Hits        21351    23612    +2261     
- Misses       8649     9259     +610     
Components Coverage Δ
crashtracker 21.25% <ø> (+0.05%) ⬆️
datadog-alloc 98.73% <ø> (ø)
data-pipeline 50.00% <ø> (ø)
data-pipeline-ffi 0.00% <ø> (ø)
ddcommon 82.11% <0.00%> (-0.67%) ⬇️
ddcommon-ffi 68.11% <0.00%> (-1.61%) ⬇️
ddtelemetry 59.02% <ø> (ø)
ipc 84.29% <ø> (+0.10%) ⬆️
profiling 84.26% <ø> (ø)
profiling-ffi 77.42% <ø> (ø)
serverless 0.00% <ø> (ø)
sidecar 40.47% <54.82%> (+6.43%) ⬆️
sidecar-ffi 0.00% <0.00%> (ø)
spawn-worker 54.87% <ø> (ø)
trace-mini-agent 70.88% <ø> (ø)
trace-normalization 98.25% <ø> (ø)
trace-obfuscation 95.73% <ø> (ø)
trace-protobuf 77.67% <79.24%> (+0.51%) ⬆️
trace-utils 92.97% <ø> (-0.43%) ⬇️

@bwoebi bwoebi force-pushed the bob/remote-config branch 2 times, most recently from e16f00b to cb13b36 Compare June 14, 2024 18:16
Review threads (outdated, resolved): remote-config/src/parse.rs; remote-config/src/dynamic_configuration/data.rs (two threads)
}

#[cfg(feature = "test")]
pub mod tests {

Contributor:

I like the idea of having the helpers isolated in another module for the sake of clarity when included in another crate, but it seems a bit odd to have a tests module with no actual tests. If you don't plan to add tests to this module, it would probably be clearer to name it 'helpers' or something like that.

bwoebi (Contributor Author):

I'd rather add a submodule to tests then, like mod tests { mod helpers { ... } }, for clarity that it's test-only code. (Example below.)
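
For example (the helper body is a made-up placeholder):

// Helpers shared with other crates' tests, clearly marked as test-only
// by living under the feature-gated tests module; no actual tests here.
#[cfg(feature = "test")]
pub mod tests {
    pub mod helpers {
        // Hypothetical helper constructing dummy config file contents.
        pub fn dummy_config_contents() -> Vec<u8> {
            b"{}".to_vec()
        }
    }
}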

brettlangdon (Member):

Can you add some examples to the codebase/comments so example/expected usage is shown in cargo doc for the remote-config library?

@bwoebi bwoebi force-pushed the bob/remote-config branch 4 times, most recently from b89fdee to 95c7998 Compare June 25, 2024 16:11

bwoebi (Contributor Author) commented Jun 25, 2024

@brettlangdon Implemented some helper functionality to auto-parse, diff, and locally store remote config files.
None of these were needed for the sidecar itself (the sidecar doesn't parse most things, but just writes them to shared memory, so no explicit backing storage is needed; diffing happens based on shared memory contents in the target processes), but they are probably useful for most implementers.

Please have a look at examples/remote_config_fetch.rs.

mellon85 commented Jul 3, 2024

Please split this up into separate PRs; nobody can review this much code at once.

Also, RC should be among the code owners of the RC implementation, as for other tracers.

bwoebi (Contributor Author) commented Jul 3, 2024

Please consider the files in remote-config individually:
fetch/fetcher.rs as the primary RC client.
fetch/shared.rs for sharing file storage.
fetch/multitarget.rs for multiple targets.
fetch/single.rs for users who just need a single client, packaged in a slightly nicer API. Uses file_change_tracker.rs and file_storage.rs.

Then parse.rs to process remote config paths / provide an entry point to the individual products.

I don't intend to split this RC beyond the point where I can do minimal end-to-end testing.

EDIT: Updated CODEOWNERS accordingly.

Signed-off-by: Bob Weinand <[email protected]>
@brettlangdon brettlangdon requested a review from a team July 11, 2024 17:50

mellon85 left a comment:

Please split the review so that it can be reviewed.

}

#[derive(Default)]
pub struct OpaqueState {

mellon85:

This state is opaque and not exposed outside of RC backends; remove it.

bwoebi (Contributor Author):

The implementing document says about the opaque_backend_state:

MUST contain the opaque_backend_state field extracted from targets.
This is used to make the backend stateless by saving some opaque lightweight data directly in the agent and tracers.
For tracer clients this is mainly used for accurate tracking of where configurations are sent.

I.e., I'll have to extract it and submit it unmodified with each request?

Reviewer:

I think what Dario is trying to capture is that you need to extract the byte sequence representing the opaque_backend_state, but you are not to parse it or attempt to interpret it. Tracers never have to utilize the information contained within it, and the RC backend is free to store whatever it wants inside, with no contract established with tracers. As a result, storing it as simply a Vec<u8> is all you should do.

Reviewer:

Ahhh, ok, I think I see now. client_state here refers to the field targets.signed.custom.opaque_backend_state extracted from the signed targets; the other data in here is what you are using to track the current state to build the next fetch request. I think you should rename client_state to opaque_backend_state, and the struct itself should not be OpaqueState but instead ClientState (see the sketch below). On initial glance it looks like you're trying to parse the opaque state and store it in this struct, but that seems not to be the case.
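
A sketch of the suggested rename; the struct name and the opaque_backend_state field come from this thread, while the version field is an illustrative assumption:

// Per-client state used to build the next fetch request
// (previously named OpaqueState).
#[derive(Default)]
pub struct ClientState {
    // Raw bytes of targets.signed.custom.opaque_backend_state, never
    // parsed or interpreted client-side; echoed back verbatim with each
    // request so the backend can remain stateless.
    pub opaque_backend_state: Vec<u8>,
    // Illustrative: data the client itself tracks between fetches.
    pub targets_version: u64,
}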

@mellon85 mellon85 self-requested a review July 16, 2024 10:07
Also avoid computing the RemoteConfigPath string with every HashMap operation; instead use some Rust magic so that the map considers owned RemoteConfigPaths and unowned RemoteConfigPathRefs equivalent (see the sketch below).

Signed-off-by: Bob Weinand <[email protected]>
Signed-off-by: Bob Weinand <[email protected]>
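
The "Rust magic" in that commit message is presumably the standard trait-object Borrow pattern; here is a self-contained sketch with stand-in types (the real RemoteConfigPath fields differ):

use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Stand-in for the owned RemoteConfigPath.
struct OwnedPath {
    product: String,
    config_id: String,
}

// Stand-in for the borrowed RemoteConfigPathRef.
struct PathRef<'a> {
    product: &'a str,
    config_id: &'a str,
}

// Common view of both forms; Hash/Eq are defined on this view so owned
// and borrowed keys hash and compare identically.
trait PathView {
    fn parts(&self) -> (&str, &str);
}

impl PathView for OwnedPath {
    fn parts(&self) -> (&str, &str) {
        (&self.product, &self.config_id)
    }
}

impl<'a> PathView for PathRef<'a> {
    fn parts(&self) -> (&str, &str) {
        (self.product, self.config_id)
    }
}

impl<'a> PartialEq for dyn PathView + 'a {
    fn eq(&self, other: &Self) -> bool {
        self.parts() == other.parts()
    }
}
impl<'a> Eq for dyn PathView + 'a {}
impl<'a> Hash for dyn PathView + 'a {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.parts().hash(state)
    }
}

// The owned key delegates Hash/Eq through the view for consistency.
impl PartialEq for OwnedPath {
    fn eq(&self, other: &Self) -> bool {
        self.parts() == other.parts()
    }
}
impl Eq for OwnedPath {}
impl Hash for OwnedPath {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.parts().hash(state)
    }
}

// The key step: a map keyed by OwnedPath can be probed with any
// &dyn PathView, so lookups by PathRef allocate no String.
impl<'a> Borrow<dyn PathView + 'a> for OwnedPath {
    fn borrow(&self) -> &(dyn PathView + 'a) {
        self
    }
}

fn main() {
    let mut map: HashMap<OwnedPath, u32> = HashMap::new();
    map.insert(
        OwnedPath { product: "APM_TRACING".into(), config_id: "abc".into() },
        1,
    );

    let probe = PathRef { product: "APM_TRACING", config_id: "abc" };
    assert_eq!(map.get(&probe as &dyn PathView), Some(&1));
}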

ameske left a comment:

The core loop looks good to me, aside from a few minor comments (and one question that may actually have uncovered a gap in our system tests, which RC should address).

I do want to go over the multi-target setup more, though. Is there a design doc on how you are handling this setup at a high level? I think that would really aid my review, and also any future maintenance that must be done on this client. If there isn't such a doc, we can meet and talk about it, but I think we want to get something like this on file in case there is any need to collaborate with RC on debugging, as this is a new concept being introduced.

Review thread on remote-config/src/fetch/fetcher.rs (outdated, resolved).

let computed_hash = hasher(decoded.as_slice());
if hash != computed_hash {
warn!("Computed hash of file {computed_hash} did not match remote config targets file hash {hash} for path {path}: file: {}", String::from_utf8_lossy(decoded.as_slice()));
continue;

Reviewer:

I thought we had a system test for this, which you hooked up today and passed, but it looks like if a computed hash doesn't match, the loop just continues, whereas an invalid file should fail the whole payload with an error. Just for my own comfort, where is that handled?

bwoebi (Contributor Author):

I can make the whole thing an error too, sure.

Reviewer:

I'm looking over the system tests and I actually don't think we have a test for this 👀.

In theory this should never happen unless somebody is manipulating data in the middle. If it were happening due to an RC bug, we'd have a catastrophic error in our system (it's likely to be caught well before it gets to the tracer).

Since we don't have a system test, we can't really hold tracers to any implementation at this time. I think I should add one, and we should get convergence from tracers. My recommendation is to follow the RFC and the behavior we will eventually enforce via tests: if one hash is bad, fail the entire update so that this gets reported quickly up the chain (we have logs for client failures and monitoring for major issues like this).

bwoebi (Contributor Author):

Promoted to error.
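
Presumably the promoted version looks roughly like this (the error mechanism, anyhow, is an assumption; the original snippet appears above):

use anyhow::bail;

// Sketch: a hash mismatch now fails the entire update instead of
// skipping the file, so the failure is reported up the chain.
fn verify_file(
    hasher: impl Fn(&[u8]) -> String,
    hash: &str,
    path: &str,
    decoded: &[u8],
) -> anyhow::Result<()> {
    let computed_hash = hasher(decoded);
    if hash != computed_hash {
        bail!("Computed hash of file {computed_hash} did not match remote config targets file hash {hash} for path {path}");
    }
    Ok(())
}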

Signed-off-by: Bob Weinand <[email protected]>
fetcher: ConfigFetcher::new(sink, Arc::new(ConfigFetcherState::new(invariants))),
target: Arc::new(target),
runtime_id,
config_id: uuid::Uuid::new_v4().to_string(),

Reviewer:

What does this config_id represent?

A given client doesn't know its config IDs ahead of time, and they can constantly change as configurations are created or deleted by users.

Reviewer:

It looks like this is used as the client_id in the fetch logic, so I think you want to rename this for clarity.

bwoebi (Contributor Author):

Renamed to client_id.

pub client_id: String,
cancellation: CancellationToken,
/// Interval used if the remote server does not specify a refetch interval, in nanoseconds.
pub default_interval: AtomicU64,

Reviewer:

As mentioned in the review of the primary RFC loop, that info is for agents and not tracer clients, so this can remain constant. (I believe most tracers poll at 1s.)

bwoebi (Contributor Author):

Removed.

Comment on lines 488 to 497
if endpoint.api_key.is_some() {
if parts.scheme.is_none() {
parts.scheme = Some(Scheme::HTTPS);
parts.authority = Some(
format!("{}.{}", subdomain, parts.authority.unwrap())
.parse()
.unwrap(),
);
}
parts.path_and_query = Some(PathAndQuery::from_static("/api/v0.1/configurations"));

Reviewer:

I don't think there's ever a use case where a tracer client should be talking to /api/v0.1/configurations as that's the RC backend. The RC backend is not designed to talk directly to tracers and only to what we call the "core remote config service" running in the agent.

bwoebi (Contributor Author):

It's essentially unused; it just makes the code do something valid at this place. The actual run loop has if self.state.endpoint.api_key.is_some() { return Err() }.

I don't intend to use it without thorough discussion with the RC team.

Reviewer:

We have no plans to allow tracers to talk directly to the remote config backend, and the payload is fundamentally different so you wouldn't even be able to handle the payload with the current code. This needs to be removed.

bwoebi (Contributor Author) commented Aug 9, 2024:

Whatever, it's not doing anything, but removed.

@ameske ameske self-requested a review August 12, 2024 13:42
@mellon85 mellon85 dismissed their stale review August 12, 2024 13:55

Kyle reviewed it

ameske left a comment:

Context for review: Remote Config platform does not own and maintain tracer clients, but we do own the contract between the agent and tracer clients as defined by the Tracer Client RFC (https://docs.google.com/document/d/1u_G7TOr8wJX0dOM_zUDKuRJgxoJU_hVTd5SeaMucQUs/edit?usp=sharing). My review focused on whether the code conformed with the RFC.

I have reviewed in detail the single-fetcher use case for the RFC that RC owns (and requested and received confirmation that it passes our system tests for the RFC; thanks for doing that! 🥳). I also reviewed the multi-fetcher use case at a high level, primarily how it layers everything together on top of the aforementioned single fetcher.

@bwoebi bwoebi merged commit 31b6854 into main Aug 16, 2024
34 checks passed
@bwoebi bwoebi deleted the bob/remote-config branch August 16, 2024 14:40