Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add config changes for UI metrics #8694

Merged
merged 6 commits into from
Oct 1, 2020
Merged

Add config changes for UI metrics #8694

merged 6 commits into from
Oct 1, 2020

Conversation

banks
Copy link
Member

@banks banks commented Sep 16, 2020

This PR adds new configuration for the UI to support up-coming work on integrating metrics. None of the new configurations do anything yet this is just the wiring for them. Building the features will come in subsequent PRs.

Following the checklist in /contributing:

Adding a Simple Config Field for Client Agents

  • Add the field to the Config struct (or an appropriate sub-struct) in
    agent/config/config.go.
  • Add the field to the actual RuntimeConfig struct in
    agent/config/runtime.go.
  • Add an appropriate parser/setter in agent/config/builder.go to
    translate.
  • Add the new field with a random value to both the JSON and HCL blobs in
    TestFullConfig in agent/config/runtime_test.go, it should fail now, then
    add the same random value to the expected struct in that test so it passes
    again.
  • Add the new field and it's default value to TestSanitize in the same
    file. (Running the test first gives you a nice diff which can save working
    out where etc.)
  • If your new config field needed some validation as it's only valid in
    some cases or with some values (often true).
    • Add validation to Validate in agent/config/builder.go.
    • Add a test case to the table test TestConfigFlagsAndEdgeCases in
      agent/config/runtime_test.go.
  • If your new config field needs a non-zero-value default.
    • Add that to DefaultSource in agent/config/defaults.go.
    • Add a test case to the table test TestConfigFlagsAndEdgeCases in
      agent/config/runtime_test.go.
  • If your config should take effect on a reload/HUP.
    • Add necessary code to to trigger a safe (locked or atomic) update to
      any state the feature needs changing. This needs to be added to one or
      more of the following places:
      • ReloadConfig in agent/agent.go if it needs to affect the local
        client state or another client agent component.
      • ReloadConfig in agent/consul/client.go if it needs to affect
        state for client agent's RPC client.
    • Add a test to agent/agent_test.go similar to others with prefix
      TestAgent_reloadConfig*.
  • Add documentation to website/source/docs/agent/options.html.md.

TODO

As can be seen above I've not yet added the documentation for these new configurations. I intend to do that in a later PR once more of the feature is working and we can be more accurate about the actual effect in case any of it doesn't work out as planned!

@banks banks requested a review from a team September 16, 2020 11:06
@banks banks added this to the 1.9 milestone Sep 16, 2020
agent/http.go Outdated Show resolved Hide resolved
agent/agent.go Outdated Show resolved Hide resolved
Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions about how these new fields are used, and some suggestions for improving the tests. Our test suite is already quite slow, and if we want to make room for more tests I think we need to be careful about what we are running in each test.

agent/agent.go Outdated Show resolved Hide resolved
agent/agent.go Outdated Show resolved Hide resolved
agent/agent.go Outdated Show resolved Hide resolved
agent/config/builder.go Show resolved Hide resolved
Comment on lines 3503 to 3504
a := NewTestAgent(t, hcl)
defer a.Shutdown()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very expensive test for what is essentially testing that an atomic works. Do we really need a running agent to be able to test this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 This is really a test of the plumbing and glue code around reloading. It follows the pattern of the other reload tests. IIRC for a long time we didn't have tests for much of this stuff and often had bugs where code had been added to a reload method but some part of the plumbing missed.

Sadly since the agent isn't more modular it's pretty hard to test any of the plumbing around one without starting one.

I'm with you on wanting to make tests faster and reduce the number of times we spin up a full agent but that seems like something we need to tackle by actually breaking up the agent rather than just not testing new stuff we do add to it?

Do we really need a running agent to be able to test this?

Maybe not but if we hacked together an agent struct bypassing all the normal startup code and called an internal method to reload a pointer... then it would be only a test of whether the atomic.Value works as expected! If we had a way to start an agent using the normal code paths but skipping some expensive elements like not starting serf or raft maybe that might work?

Overall though, as discussed above I'd rather that this config reload plumbing was all encapsulated in a separate package and not on the agent itself but that's out of scope here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄 - this comment made me realise that this test was actually failing...

Because I'd forgotten to actually add the Store in the reloadConfigInternal method!

Perhaps there is a cheaper way if we change the reload logic a little like you propose in #8623 since then reloaders are not relying on any other state in the agent to be a certain way and we can have a cheaper test that just ensure s all the expected reloaders have indeed been registered after a "normal" startup?

Plumbing tests always seem wasteful since the code is trivial unless it's not there!

Comment on lines +1412 to +1418
// NOTE: Never read from this field directly once the agent has started up
// since the UI config is reloadable. The on in the agent's config field may
// be out of date. Use the agent.getUIConfig() method to get the latest config
// in a thread-safe way.
//
// hcl: ui_config { ... }
UIConfig UIConfig
Copy link
Contributor

@dnephin dnephin Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runtime reference on the agent is updated on a reload, so wouldn't it be safe to read it from this field as well?

Edit: I thought it was supposed to be updated, but maybe I'm wrong about that?

Copy link
Contributor

@dnephin dnephin Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I am wrong about the main config reference. The current ReloadConfig is a bit confusing because it generates a new consul.Config based on a.config, which is not updated... It's not clear to me if delegate.ReloadConfig actually does anything. It seems like all it does is pass the same original config to the Client and Server. Maybe I am still missing something.

edit: Ah, it's because loadLimits updates the a.config. That is super not obvious. I'll look at making this a little more explicit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cleaned this up a bit in #8696

Comment on lines 1222 to 1228
a := NewTestAgent(t, `
ui_config {
enabled = true
metrics_provider_options_json = "{\"foo\": 1}"
}
`)
defer a.Shutdown()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need a full running agent to test the HTTPServer. I think this test could be a lot less expensive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it could for sure.

Our current testing methodology in this package with a full agent is a know area we'd like to improve but that seems out of scope for a feature PR.

The HTTPServer is very limited but most of the methods in it call into the agent state and similar so decoupling the server from the agent state is not trivial. Even in cases like this where we could fake it, I'm not very confident that provides a very robust test. If we had a more holistic refactor - for example make the server only have access to the (reloadable) config from the agent and specific interfaces where it needs access to RPCs and Serf, then we could make tests cheaper without making them brittle but that seems out of scope and fix all the tests not just this one new one!

@dnephin dnephin added theme/config Relating to Consul Agent configuration, including reloading theme/ui Anything related to the UI labels Sep 16, 2020
// ensures we get an object.
var dummyMap map[string]interface{}
err := json.Unmarshal([]byte(rt.UIConfig.MetricsProviderOptionsJSON),
&dummyMap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@banks banks Sep 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd tend to agree with you. When I started here it was tribal knowledge that there is a preference in our projects to limit lines to 80 chars max and I was asked to stick to this in several PRs.

I suggested at the time that if this is important we codify it as it's not a general Go community thing but that document still doesn't exist. I don't know how widespread the practice is but since I setup those line length guides in my editor I still use them.

I'd be happy if the consensus has moved on from that - I know it's not rigorously adhered to anyway. Not sure if anyone still has strong opinions either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBF this one was super border line anyway so I just fixed it - obviously more readable on one line. 😄

Comment on lines 1149 to 1152
// Attempt to parse the JSON to ensure it's valid, parsing into a map
// ensures we get an object.
var dummyMap map[string]interface{}
err := json.Unmarshal([]byte(rt.UIConfig.MetricsProviderOptionsJSON),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We weren't happy with inlined JSON for connect escape hatches. Is this a similar idea? should we make this opaque config instead and serialize it to the UI?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this during RFC and I'm OK with this as it is for now - it's a highly experimental interface. Somewhat unlike the escape hatches where the JSON is protobuf we decode and merge, this is literally passed through to the front end which passes it directly back to the custom code the operator is using in JavaScript as a parameter. The decode here is just to try to ensure valid formatting so it doesn't just silently break the whole UI if there's a typo.

The UX of writing explicit JSON for a value that is passed to your javascript code doesn't seem as bad to me as writing raw JSON encoded proto3 Envoy config. The complexities of conversions between HCL and JSON seem to potentially make the UX more confusing?

That said, I'm open to iterating on this if it proves its value but I'm not sure the extra complexity of opaque config makes sense for now either from UX or from a "getting an MVP out and seeing if it's useful" perspective?

Comment on lines 1170 to 1180
return fmt.Errorf("ui_config.dashboard_url_templates key names can only "+
"contain lowercase alphanumeric or _ characters. received: %q", k)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is repeated a couple times. Instead of sharing just the regex, possibly expose the functionality with a validateBasicName(...) so that others can re-use it without the same duplication?

@banks
Copy link
Member Author

banks commented Sep 17, 2020

Thanks @rboyer and @dnephin for the reviews!

I think I fixed up a few things including one of the new tests actually failing!

I've replied inline to a few comments - I think I agree with all of them at some level but several I'm not sure I see a good solution for in this PR without blowing out the scope. Happy to chat more about that though! I've generally followed an existing convention for most parts of this PR, plenty of room for improvement overall but I think to do so substantially would move into major refactor territory.

@banks
Copy link
Member Author

banks commented Sep 23, 2020

@dnephin I've refactored this a bunch following your lead on ReloadConfig. I think it's much better although I hit two plumbing issues that are not ideal but I couldn't afford to do a lot more refactoring to fix just now:

  1. I had to memoize HTTPHandlers.handlers since some components setup there are now stateful (i.e. have reloadable state) which means that having every test run against a different instance wasn't really OK.

  2. On the same lines, it took me a long time to track down why agent package reload tests were failing even after fixing 1 above though the uiserver reload works fine. It turns out that as part of agent: shutdown if the http server goroutine exits #8234 you moved TestAgent.srv to be a different instance rather than one of the actual instances of the agent's Server/handlers. That makes total sense but is broken now that the HTTPHandlers is somewhat stateful. Just plumbing that separate instance of HTTPHandlers into reloading in TestAgent would be possible but then it would be impossible to verify in a test that the actual agent does in fact reload http server state correctly as part of the reloading code paths would be only in test agent code. I opted for now for an unpleasant option - re-introducing an explicit reference to (one of) the HTTPHandlers. I think this is least bad for now and can go away when we do actually get around to moving apiServers to be an injected dep of Agent since the test agent will be able to inject it's own instance then. Does that seem good to you? I'm keen not to scope creep this refactor any more as it's on the critical path for some features that should be complete in O(week).

{
// TODO: is this really what we want? It's what we've always done but
// seems a bit odd to not do an actual 301 but instead serve the
// index.html from every path... It also breaks the UI probably.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this is intentional because the UI is a single-page app that parses the path to decide what to render, like when navigating to http://localhost:8500/ui/dc1/services we don't actually want to redirect, but rather serve the UI code and let JavaScript control rendering the dc1 services "page".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point. I'll remove the TODO.

Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one question about the error handling

agent/uiserver/uiserver.go Outdated Show resolved Hide resolved
@@ -55,8 +55,6 @@ func (s *HTTPHandlers) rewordUnknownEnterpriseFieldError(err error) error {
return err
}

func (s *HTTPHandlers) addEnterpriseUIENVVars(_ map[string]interface{}) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder to clean up this line's counterpart.

agent/http_test.go Outdated Show resolved Hide resolved
@@ -15,7 +15,7 @@
// pkg/web_ui/assets/codemirror/mode/ruby/ruby-ea43ca3a3bdd63a52811e8464d66134b.js
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may need to also edit the CI configs in oss or ent, as this file is specially handled in merges.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.circleci/config.yml:            - ./agent/bindata_assetfs.go
.circleci/config.yml:          name: commit agent/bindata_assetfs.go if there are changes
.circleci/config.yml:            if ! git diff --exit-code agent/bindata_assetfs.go; then
.circleci/config.yml:              git add agent/bindata_assetfs.go
.circleci/config.yml:              git commit -m "auto-updated agent/bindata_assetfs.go from commit ${short_sha}"
GNUmakefile:ASSETFS_PATH?=agent/bindata_assetfs.go
build-support/functions/20-build.sh:   local container_id=$(docker create -it -e GIT_COMMIT=${GIT_COMMIT} -e GIT_DIRTY=${GIT_DIRTY} ${image_name} make static-assets ASSETFS_PATH=bindata_assetfs.go)
build-support/functions/20-build.sh:         status "Copying back artifacts" && docker cp ${container_id}:/consul/bindata_assetfs.go ${sdir}/agent/uiserver/bindata_assetfs.go
build-support/functions/30-release.sh:         git add "${sdir}/agent/bindata_assetfs.go"
build-support/functions/40-publish.sh:               # bindata_assetfs.go will make these meaningless
build-support/functions/40-publish.sh:               git_diff "$(pwd)" ":!agent/bindata_assetfs.go"|| ret 1
codecov.yml:  - "agent/bindata_assetfs.go"

@banks
Copy link
Member Author

banks commented Sep 30, 2020

@dnephin @rboyer Thanks again for the reviews.

I've fixed up those last few issues noticed and rebased on master to resolve a few conflicts.

I've just spotted a legit test failure and lint issue though so I'll fix those up ready to merge.

@banks
Copy link
Member Author

banks commented Oct 1, 2020

OK... I think this is good to go. The test fail was legitimately a regression although quite an edge case (when using a custom UI dir that didn't have an index.html (which would be somewhat broken anyway because we redirect index requests to index.html).

Still this is handled slightly more nicely now - we still serve the custom UI and disable attempts to redirect to a non-existent /index.html. I also added a warning log to make the UX a little nicer.

"time"
)

// bufferedFile implements os.File and allows us to modify a file from disk by
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be more correct to say "implements similar methods as *os.File" since os.File is not an interface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh.. I meant. Implements http.File which is what http.FileServer.Open returns. Thanks!

// have to match the encoded double quotes around the JSON string value that
// is there as a placeholder so the end result is an actual JSON bool not a
// string containing "false" etc.
re := regexp.MustCompile(`%22__RUNTIME_BOOL_[A-Za-z0-9-_]+__%22`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a package constant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's not on a hot path (typically called once per agent lifetime) and very unlikely to be useful in any other place do you think it's preferable to define it out of context on the package? I don't mind but my feeling was it's easier to read this code with it inline so you can see what it's doing all at once.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I don't have a strong opinion here.

One thing that might slightly sway me towards the package constant is that MustCompile may panic (if for some reason the regex got busted). As long as we have tests that exercise this specific code path then it's fine. If we didn't then moving it to a package constant ensures that any stray changes will totally trigger a panic during development.

Copy link
Member

@rboyer rboyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

You still have a few dangling references to the old agent/bindata_assetfs.go name to fixup. The rest of my comments were not blocking.

@banks
Copy link
Member Author

banks commented Oct 1, 2020

You still have a few dangling references to the old agent/bindata_assetfs.go name to fixup.

Oh thanks. I totally missunderstood that in your previous comments - yes this would have broken lots of CI stuff! I've updated, will push up...

@banks banks merged commit d0c1601 into master Oct 1, 2020
@banks banks deleted the ui-config-metrics branch October 1, 2020 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/config Relating to Consul Agent configuration, including reloading theme/ui Anything related to the UI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants