Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Querier/Sidecar/StoreGW: Implement Info service and add --endpoint flag in Querier #4282

Merged
merged 16 commits into from
Nov 9, 2021

Conversation

hitanshu-mehta
Copy link
Contributor

@hitanshu-mehta hitanshu-mehta commented May 27, 2021

This PR implements a Unified Endpoint Proposal. Open for reviews :)

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

  • Implemented Info server.
  • Added Info server in Sidecar and StoreGW.
  • Added --endpoint flag in querier.
  • Modified storeset tests to verify discovery using endpoint flag. Completely removed storeset and added endpointset query: add endpointset flow #4421

Verification

  • Manually tested using grpccurl.
  • Modified storeset tests to verify discovery using endpoint flag.

TODO

  • E2E tests.
  • Add Info server in remaining components.

pkg/info/info.go Outdated Show resolved Hide resolved
pkg/query/storeset_test.go Outdated Show resolved Hide resolved
Copy link
Member

@squat squat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work so far; some ideas from our weekly 1-1

cmd/thanos/query.go Outdated Show resolved Hide resolved
pkg/info/info.go Outdated Show resolved Hide resolved
pkg/info/info.go Outdated Show resolved Hide resolved
pkg/query/storeset.go Outdated Show resolved Hide resolved
type InfoSpec interface {
// Addr returns InofAPI Address for the info spec. It is used as its ID.
Addr() string
// Metadata returns current labels, min/max ranges for StoreAPI and ExemplarAPI
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's start with the smaller abstraction and add this later if we see we need to generalize it to all Info types only if they all need it really. Otherwise, we can keep it in the StoreSpec

@bwplotka
Copy link
Member

bwplotka commented Jun 2, 2021

Is it rdy for review? (:

@hitanshu-mehta
Copy link
Contributor Author

I have to add e2e tests. Apart from that, PR is ready for review 🚀

MaxTime: maxt,
}
},
func() *infopb.ExemplarsInfo { return nil },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use nil here? What's the difference between nil and a function returning nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry missed it. Yes, we can pass nil as we are doing with other parameters. We are allowing to pass nil for convenience. #4282 (comment)

test/e2e/info_api_test.go Outdated Show resolved Hide resolved
return &infopb.ExemplarsInfo{
MinTime: time.Unix(math.MinInt64/1000+62135596801, 0).UTC().Unix(),
MaxTime: time.Unix(math.MaxInt64/1000-62135596801, 999999999).UTC().Unix(),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use global variable for the two values to avoid calculating them each time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hitanshu-mehta We are still calling .Unix() every time. Can't we just initialize v1.MinTime and v1.MaxTime as int64? Are they used anywhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, they are not being used anywhere else. Agree, I should've used int64. I'll correct it. Thank you!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is we can have kind of constants somewhere in global level. Can we do that? It's still to be imp[roved?

@hitanshu-mehta
Copy link
Contributor Author

I had to rename messages InfoRequest and InfoResponse because messages with these names are already defined in proto file of storeAPI. So, a warning for duplicate proto type registered was given. This warning message was also being printed in component-specific files when we run make docs.

@hitanshu-mehta hitanshu-mehta marked this pull request as ready for review June 13, 2021 16:55
@hitanshu-mehta hitanshu-mehta changed the title [WIP] Implement Info service and add --endpoint flag in Querier Querier/Sidecar/StoreGW: Implement Info service and add --endpoint flag in Querier Jun 13, 2021
Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some comments/suggestions but overall LGTM 💪 nice work!

@@ -312,6 +329,21 @@ func (s *storeRef) Update(labelSets []labels.Labels, minTime int64, maxTime int6
s.exemplar = exemplar
}

func (s *storeRef) UpdateWithStore(labelSets []labels.Labels, minTime int64, maxTime int64, storeType component.StoreAPI, store storepb.StoreClient, rule rulespb.RulesClient, target targetspb.TargetsClient, metadata metadatapb.MetadataClient, exemplar exemplarspb.ExemplarsClient) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe this could be folded into Update and then we could simply add:

if store != nil {
  s.StoreClient = store
}

This would save a bit of code duplication.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it store at this point? sounds really like just endpointRef.

Also the amount of arguments sounds like code smell. I wonder if we hide those behind better abstractions

pkg/query/storeset_test.go Outdated Show resolved Hide resolved
test/e2e/info_api_test.go Outdated Show resolved Hide resolved
test/e2e/info_api_test.go Outdated Show resolved Hide resolved
test/e2e/info_api_test.go Outdated Show resolved Hide resolved
@GiedriusS
Copy link
Member

I had to rename messages InfoRequest and InfoResponse because messages with these names are already defined in proto file of storeAPI. So, a warning for duplicate proto type registered was given. This warning message was also being printed in component-specific files when we run make docs.

Maybe let's call them InfoAPIRequest and InfoAPIResponse or something to differentiate between those two? We can discuss this in #thanos-dev, maybe someone has even better suggestions

Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering how would this API work with routing only receivers? IMO routing only receivers shouldn't expose any store API. Can you add a test case for this? cc @yashrsharma44 @hitanshu-mehta

Edit: Sorry, this pr doesn't deal with receiver part.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work everyone!

I wonder only - what's the deprecation plan and how to not add complexity to storeset if it has to be removed at some point and instead focus on endpointset that supports old StoreAPI and new separate info

cmd/thanos/query.go Show resolved Hide resolved
@@ -379,6 +382,25 @@ func runStore(
cancel()
})
}

infoSrv := info.NewInfoServer(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the deprecation plan for ths old Info methods? Can we unify implementation somehow at least?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really sorry for delay.
Here is the deprecation plan I had in mind.

  1. Remove old flags (i.e. --store, --rule, --metadata, --exemplar, --target) and all the boilerplate code we have for all commands. ( eg. duplication check, dnsProviders ...)
  2. Remove storeset.go file ( I am working on to move new endpoint flow in new file as discussed here ) and all dependencies of this file from the code base.
  3. Remove Info rpc from proto file of Store API and all its corresponding implementations. ( to be specific there 5 different implementations we have of StoreClient - BucketStore, MultiTSDBStore, PrometheusStore, ProxyStore and TSDBStore)

All these steps will be done in single PR. I think this plan should work .
Please let me know if im missing something :)

Copy link
Contributor Author

@hitanshu-mehta hitanshu-mehta Jul 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we unify implementation somehow at least?

Why do we need to unify old info methods?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. When we want to change flag we usually deprecate flag first (add in help that it will be removed after next 2 releases). Then add new flags that works simultaneously. Fail if both new and old way is used. Map old flags to new one really before starting the program.
  2. Agree
  3. Generally makes sense, but we need to make sure that e.g old Querier having a client against old StoreAPI will work with new InfoAPI. Probably we will need to leave old Info Method (add deprecated comment) in old APIs. In the same time the new Querier should work with old APIs.
    • Potential flow: 1. Try (reflect)? InfoAPI end endpoint. If gRPC server does not support it we fallback to store, if store does it work then ruler ..... only for first call / error.
    • e2e test
    • IMPORTANT: This does not mean we need to implement separate info method for every server. We can implement one Info method and use the same one for both InfoAPI and e.g StoreAPI.

@@ -23,12 +23,12 @@ option (gogoproto.goproto_sizecache_all) = false;
/// Info represents the API that is responsible for gathering metadata about the all APIs supported by the component.
service Info {
/// Info returns the metadata (Eg. LabelSets, Min/Max time) about all the APIs the component supports.
rpc Info(InfoRequest) returns (InfoResponse);
rpc Info(InfoReq) returns (InfoResp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First time I see this practice. While it helps with shorter name it violates coding style guide + consistency. Do we think it's worth it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I don't rename it then a warning message will also be printed in our documentations as I mentioned here. I was not able to find a way to remove them. @GiedriusS has suggested better names InfoAPIRequest and InfoAPIResponse.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because our pacakge is thanos. Do you mind changing this to thanos.info?

@@ -312,6 +329,21 @@ func (s *storeRef) Update(labelSets []labels.Labels, minTime int64, maxTime int6
s.exemplar = exemplar
}

func (s *storeRef) UpdateWithStore(labelSets []labels.Labels, minTime int64, maxTime int64, storeType component.StoreAPI, store storepb.StoreClient, rule rulespb.RulesClient, target targetspb.TargetsClient, metadata metadatapb.MetadataClient, exemplar exemplarspb.ExemplarsClient) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it store at this point? sounds really like just endpointRef.

Also the amount of arguments sounds like code smell. I wonder if we hide those behind better abstractions

@@ -597,6 +630,105 @@ func (s *StoreSet) getActiveStores(ctx context.Context, stores map[string]*store
level.Warn(s.logger).Log("msg", "ignored rule store", "address", ruleAddr)
}
}

// Gather healthy stores map concurrently using info addresses. Build new store if does not exist already.
for _, infoSpec := range s.infoSpec() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this is baked in storeset but I would see it really part of endpoints

I would really just copy and add a new endpoint flow - it will be easier to main new thing than to try to understand later what to remove from mixed thing.

Then if someone wants to use endpoints store flag should be empty. WDYT?

Copy link
Contributor Author

@hitanshu-mehta hitanshu-mehta Jun 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree! I will refactor this code and maybe create a new flow in a different endpointset.go file. WDYT ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still minor nits, but looks really close to be done!

cmd/thanos/query.go Show resolved Hide resolved
cmd/thanos/query.go Show resolved Hide resolved
return &infopb.ExemplarsInfo{
MinTime: time.Unix(math.MinInt64/1000+62135596801, 0).UTC().Unix(),
MaxTime: time.Unix(math.MaxInt64/1000-62135596801, 999999999).UTC().Unix(),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is we can have kind of constants somewhere in global level. Can we do that? It's still to be imp[roved?

@@ -379,6 +382,25 @@ func runStore(
cancel()
})
}

infoSrv := info.NewInfoServer(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

@@ -23,12 +23,12 @@ option (gogoproto.goproto_sizecache_all) = false;
/// Info represents the API that is responsible for gathering metadata about the all APIs supported by the component.
service Info {
/// Info returns the metadata (Eg. LabelSets, Min/Max time) about all the APIs the component supports.
rpc Info(InfoRequest) returns (InfoResponse);
rpc Info(InfoReq) returns (InfoResp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because our pacakge is thanos. Do you mind changing this to thanos.info?

@yeya24
Copy link
Contributor

yeya24 commented Jul 1, 2021

Quick question about this and our usecase. We have a central thanos query and multiple edge thanos sidecars. If I use --endpoint flag to specify sidecars, then is there a way to disable metadata or it is enabled automatically?
I'd like to have a way to disable metadata queries for performance purposes rather than enabling them automatically. Fanouting these queries to edges are too slow.

@hitanshu-mehta
Copy link
Contributor Author

hitanshu-mehta commented Jul 4, 2021

@yeya24 Currently there isn't any way to disable specific api queries like metadata, targets etc.

@hitanshu-mehta hitanshu-mehta mentioned this pull request Jul 6, 2021
2 tasks
@hitanshu-mehta hitanshu-mehta marked this pull request as draft July 6, 2021 21:22
GiedriusS
GiedriusS previously approved these changes Nov 3, 2021
Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last three suggestions before merging 👍 Also, please, please, please avoid force-pushing and overriding previous commits because when this happens the reviewers need to look through the whole code again to see whether it is still the same. That's why there was this big delay before I got to this again 👍

pkg/store/prometheus.go Outdated Show resolved Hide resolved
cmd/thanos/query.go Outdated Show resolved Hide resolved
test/e2e/info_api_test.go Show resolved Hide resolved
@hitanshu-mehta
Copy link
Contributor Author

@GiedriusS Really sorry for the trouble. I will avoid force pushing from now on :)

Signed-off-by: Hitanshu Mehta <[email protected]>
@GiedriusS
Copy link
Member

@GiedriusS Really sorry for the trouble. I will avoid force pushing from now on :)

CI failure seems related (:

Signed-off-by: Hitanshu Mehta <[email protected]>
Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💪 next steps are to actually use the new InfoServer & add tests for it? Plus, we should probably also add --endpoint-strict? Finally, remove the old flags? (: And that feature to ignore certain service(-s) would be nice too.

Note to others that --endpoint with this PR is semantically equivalent to all of the other flags since we prefill the expected APIs (we don't check the InfoServer, yet) so it is completely safe to merge this!

@GiedriusS GiedriusS merged commit ac2cc26 into thanos-io:main Nov 9, 2021
GiedriusS added a commit to GiedriusS/thanos that referenced this pull request Nov 9, 2021
With thanos-io#4282 merged, it is time to
unhide these very useful pages from the navigation bar. 🎉

Signed-off-by: Giedrius Statkevičius <[email protected]>
@GiedriusS GiedriusS mentioned this pull request Nov 9, 2021
2 tasks
yeya24 pushed a commit that referenced this pull request Nov 10, 2021
With #4282 merged, it is time to
unhide these very useful pages from the navigation bar. 🎉

Signed-off-by: Giedrius Statkevičius <[email protected]>
GiedriusS added a commit to GiedriusS/thanos that referenced this pull request Nov 16, 2021
With thanos-io#4282 merged, it is time to
unhide these very useful pages from the navigation bar. 🎉

Signed-off-by: Giedrius Statkevičius <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants