-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Object Storage: support Baidu Object Storage (BOS) as provider #1329
Object Storage: support Baidu Object Storage (BOS) as provider #1329
Conversation
@tianyuansun Thank you for your PR! Awesome! Will review later today. |
Prometheus' /series end-point returns an empty array no matter what: if there were results or not. We need to follow the same principle in Thanos as well because our users depend on it. The same fix has been applied like here: https://github.com/prometheus/prometheus/blob/fef150f1b5e48652ec6779e2129ae9a0cf13db8a/web/api/v1/api.go#L506 Tests have been updated to account for this difference and plus one test has been added just for this case. Closes thanos-io#1325.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM, Thanks!
Some suggestions.
PS: Please add this into CHANGELOG. It is a major changes.
PS2: Add cloud.baidu.com
to the exclude hosts list in the check-docs
block in Makefile, it would not generate the documents for cloud.baidu.com
, because circleci
do not access the site within China.
pkg/objstore/bos/bos.go
Outdated
case objectsCh <- objectInfo{ | ||
err: err, | ||
}: | ||
case <-ctx.Done(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we don't need select here, we can check ctx.Err()
to determine the iterator has been canceled.
for {
if ctx.Err() != nil {
return
}
}
Further information is here.
pkg/objstore/bos/bos.go
Outdated
case objectsCh <- objectInfo{ | ||
key: object.Key, | ||
}: | ||
case <-ctx.Done(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
pkg/objstore/bos/bos.go
Outdated
case objectsCh <- objectInfo{ | ||
key: obj, | ||
}: | ||
case <-ctx.Done(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
pkg/objstore/bos/bos.go
Outdated
// Exists checks if the given object exists in the bucket. | ||
// TODO(bplotka): Consider removing Exists in favor of helper that do Get & IsObjNotFoundErr (less code to maintain). | ||
func (b *Bucket) Exists(ctx context.Context, name string) (bool, error) { | ||
obj, err := b.getRange(ctx, b.name, name, 0, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, why did you use getRange
to check the object? Could we use the GetObjectMeta
API?
pkg/objstore/bos/bos.go
Outdated
|
||
// Upload the contents of the reader as an object into the bucket. | ||
func (b *Bucket) Upload(ctx context.Context, name string, r io.Reader) error { | ||
if _, err := b.client.PutObject(b.name, name, r, nil, nil); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw the official documents that the sample upload interface could not upload the file large than 5GB, sometimes the tsdb files (index) would exceed 5GB. Could you double check this interface would not be impact by the limitation?
pkg/objstore/bos/bos.go
Outdated
"gopkg.in/yaml.v2" | ||
) | ||
|
||
// DirDelim is the delimiter used to model a directory structure in an object store bucket. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this line, the parameter is not exported.
pkg/objstore/bos/bos.go
Outdated
return nil | ||
} | ||
|
||
// Iter calls f for each entry in the given directory (not recursive.). The argument to f is the full |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Iter calls f for each entry in the given directory (not recursive.). The argument to f is the full | |
// Iter calls f for each entry in the given directory (not recursive). The argument to f is the full |
pkg/objstore/bos/bos.go
Outdated
"github.com/baidu/baiducloud-sdk-go/bce" | ||
"github.com/baidu/baiducloud-sdk-go/bos" | ||
"github.com/go-kit/kit/log" | ||
"github.com/improbable-eng/thanos/pkg/objstore" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this line here:
"time"
"github.com/improbable-eng/thanos/pkg/objstore"
"github.com/baidu/baiducloud-sdk-go/bce"
Brings everything from the 0.6.0 tag into master + appropriate VERSION value. Also, remove some unneeded whitespace from CHANGELOG.md.
…nos-io#1338) * querier: Allows single store in case of duplicates. Drop others. Fixes thanos-io#1337 Signed-off-by: Bartek Plotka <[email protected]> * Updated CHANGELOG. Signed-off-by: Bartek Plotka <[email protected]>
* thanos-io * update * More seds. Signed-off-by: Bartek Plotka <[email protected]> * Go mod tidy. Signed-off-by: Bartek Plotka <[email protected]>
…hanos-io#1342) * Moved CI to thanos-io org and docker image to quay.io/thanos/thanos Signed-off-by: Bartek Plotka <[email protected]> * Get format back. Signed-off-by: Bartek Plotka <[email protected]>
Signed-off-by: Bartek Plotka <[email protected]>
* runutil: add Exhaust* fns, initial users * runutil: explicitly ignore return values * runutil: fix according to comments * pkg/*: convert more users to Exhaust*() * runutil: fix errcheck * CHANGELOG: add item * runutil: inform the user if exhaustion fails * CHANGELOG: fix up the merge * runutil: add missing `if err != nil`
…io#1340) * pool: add pool raciness tests, fix p.usedTotal keeping I have noticed with these tests that the `p.usedTotal` value keeps jumping like this: 6 3 6 6 18446744073709551604 3 18446744073709551607 18446744073709551604 It could happen that the value of it would underflow as the tests show: ``` unexpected error: goroutine barbazbaz: pool exhausted ``` * pool: pool_test: simplify * pool: add locking around p.usedTotal, fix test * pool: add safe-guard against slices bigger than buckets * pool: pool_test: make bool chan buffered * pool: add missing '.' * pool: revert last change We still want to modify `p.usedTotal` if the slice's capacity is outside of any bucket.
Fixes thanos-io#1354 Signed-off-by: beorn7 <[email protected]>
Removing this after gossip removal, also, is there anything that needs to be added?
Signed-off-by: Gabriela S. Soria <[email protected]>
* Allow for setting custom part size Some S3 compatible setups may have a different limit for maximum file size in multipart uploads, so allow for overriding it. Signed-off-by: Lukasz Jernas <[email protected]> * Add config test Signed-off-by: Lukasz Jernas <[email protected]> * Drop omitempty on PartSize field Signed-off-by: Lukasz Jernas <[email protected]> * Add documentation for the new variable Signed-off-by: Lukasz Jernas <[email protected]> * Add review suggestions Extract calculation to a constant and fixup comments. * Rename part size variable to be more descriptive Signed-off-by: Lukasz Jernas <[email protected]>
Signed-off-by: Lukasz Jernas <[email protected]>
Signed-off-by: Martin Chodur <[email protected]>
Signed-off-by: YaoZengzeng <[email protected]>
* Expose hashring metrics * Fix metric names * Update changelog * Register new gauges
Ensure we log unexpected errors in the Thanos receive component. This commit also ensures we properly count errors in the thanos_receive_forward_requests_total metrics.
Signed-off-by: Bartek Plotka <[email protected]>
The command `make build` fails when `$GOPATH` contains multiple paths separated by `:`: Example: ``` $ echo $GOPATH /go:/opt/rh/go-toolset/root/usr/share/gocode $ make build Makefile:99: *** target pattern contains no `%'. Stop.
Signed-off-by: ethan <[email protected]>
these questions are about merged PRs Signed-off-by: Brett Jones <[email protected]>
As agreed thanos-io#1688 (comment) Signed-off-by: Bartek Plotka <[email protected]>
* Added official Governance page for Thanos. All maintainers, please review and approve *only* if you agree. Signed-off-by: Bartek Plotka <[email protected]> * Addressed comments. Signed-off-by: Bartek Plotka <[email protected]> * Addressed commments (only FAQ update). Signed-off-by: Bartek Plotka <[email protected]>
* Inroduce graceful shutdown for gRPC Signed-off-by: Kemal Akkoyun <[email protected]> * Add missed cancel branch Signed-off-by: Kemal Akkoyun <[email protected]> * Remove stutter from server structs Signed-off-by: Kemal Akkoyun <[email protected]> * Close servers immediately if grace period is not specified Signed-off-by: Kemal Akkoyun <[email protected]> * Update CHANGELOG Signed-off-by: Kemal Akkoyun <[email protected]> * Rename TLS methods, clarify log messages Signed-off-by: Kemal Akkoyun <[email protected]> * Document public functions Signed-off-by: Kemal Akkoyun <[email protected]> * Fix review issues Signed-off-by: Kemal Akkoyun <[email protected]> * Update bucket docs Signed-off-by: Kemal Akkoyun <[email protected]> * trigger checks Signed-off-by: Kemal Akkoyun <[email protected]> * trigger checks Signed-off-by: Kemal Akkoyun <[email protected]>
While standardizing some flags, commit 9d4d0bf accidentally added a `--http-address` flag to the bucket web component that is never used. However, rather than remove this new flag, this commit removes the `--listen` flag that the component defines so that the component is in line with the flags defined by other components. Signed-off-by: Lucas Servén Marín <[email protected]>
This commit fixes a regression introduced in thanos-io#1702, where the signature of the bucketui.Register method was changed. The bug was caused because one of the registered paths relied on the HTTP parameter variable support from github.com/julienschmidt/httprouter, which the standard library does not support. The fix is instead to implement the change recommended in thanos-io#1702 (comment). Signed-off-by: Lucas Servén Marín <[email protected]>
The receive component currently has duplicate spans for requests handled by its HTTP server. This is due to the fact that the server is instrumented once using the global singleton tracer and then a second time with the tracer configured via the CLI. This commit eliminates the duplicate instrumentation via the global singleton in favor of the explicit specified tracer, which is consistent with the practice of other Thanos components. Signed-off-by: Lucas Servén Marín <[email protected]>
- append to s.lset caused reallocs and overallocs - this is now fixed by properly setting capacity on s.lset - pprof shows "-316.22MB, 5.00% of 6322.82MB total" for sample data set and query I used Signed-off-by: Philip Panyukov <[email protected]>
Signed-off-by: Bartek Plotka <[email protected]>
While debugging thanos-io#1721, I found that when thanos receive bails, there is a race in a select statement, where the non-returning branch may be chosen. This branch will deadlock if selected twice because the channel reader has already exited. The way to prevent this is by checking if we need to exit on every loop. Signed-off-by: Lucas Servén Marín <[email protected]>
Every time thanos receive is started, it has to replay the WAL three times, namely: 1. open the TSDB; 2. close the TSDB; open the ReadOnly TSDB and Flush; and 3. open the TSDB These WAL replays can take a very long time if the WAL has lots of data. With the fix from thanos-io#1654, the third time will be instantaneous because the WAL will be empty. That still leaves two potentially long WAL replays. We can cut this down to just one long replay if we do the following operations instead: 1. with a closed TSDB, open the ReadOnly TSDB and Flush; and 2. open the TSDB Now, the second step will be a fast replay because the WAL is empty, leaving just one potentially expensive WAL replay. This commit eliminates explicit opening of the writable TSDB during startup, and instead opens it after flushing the read-only TSDB. Signed-off-by: Lucas Servén Marín <[email protected]>
Signed-off-by: Bartek Plotka <[email protected]>
) * store the first raw value of a chunk during downsampling As discussed in thanos-io#1568, storing only the last raw value of a chunk will lose a counter reset when: a) the reset occurs at a chunk boundary, and b) the last raw value of the earlier chunk is less than the first aggregated value of the later chunk. This commit stores the first raw value of a chunk during the initial raw aggregation, and retains it during subsequent aggregations. This is similar to the existing handling for the last raw value of a chunk. With this change, when counterSeriesIterator iterates over a chunk boundary, it will see the last raw value of the earlier chunk, then the first raw value of the later chunk, and then the first aggregated value of the later chunk. The first raw value will always be less than or equal to the first aggregated value, so the only difference in counterSeriesIterator's output will be the possible detection of a reset and an extra sample after the chunk boundary. Fixes: thanos-io#1568 Signed-off-by: Alfred Landrum <[email protected]> * changelog for thanos-io#1709 Signed-off-by: Alfred Landrum <[email protected]> * adjust existing downsampling tests Signed-off-by: Alfred Landrum <[email protected]> * add counter aggregation comments to CounterSeriesIterator Signed-off-by: Alfred Landrum <[email protected]>
Signed-off-by: Sebastian Herzberg <[email protected]>
* updates ruler docs Signed-off-by: Ian Billett <[email protected]> * Update docs/components/rule.md Co-Authored-By: Bartlomiej Plotka <[email protected]> Signed-off-by: Ian Billett <[email protected]> * removes newline Signed-off-by: Ian Billett <[email protected]>
* compact: add metric thanos_compactor_iterations_total Add a metric called thanos_compactor_iterations_total that is a counter and will get increased by 1 every time an iteration gets executed successfully. This is needed in case --wait is specified and then our Compactor could die. We need to alert on such a case. One thing would be to alert on a restart of the container however that is not the most flexible thing - it might still be OK as long as it successfully finishes its job in time. However, it is impossible to know that exact part ATM. Add this metric so that users could add alerts like: ``` rate(thanos_compactor_iterations_total[1d]) == 0 FOR 3d ``` Signed-off-by: Giedrius Statkevičius <[email protected]> * CHANGELOG: add entry Signed-off-by: Giedrius Statkevičius <[email protected]> * compact: simplify wait check Signed-off-by: Giedrius Statkevičius <[email protected]> * cmd: thanos: compact: remove wait check Let's register the metric no matter what since if it is run as a batch job then this metric does not matter either way. Signed-off-by: Giedrius Statkevičius <[email protected]> * CHANGELOG: add period Add a period at the end of an item in the CHANGELOG to keep it uniform. Signed-off-by: Giedrius Statkevičius <[email protected]>
* Do not generate XXX fields in protobufs We don't need them. By gettig rid of them we achieve: - smaller memory footprint - smaller payloads on the wire - same in-memory layout with core Prometheus structs opening path for all sorts of optimisations such as unsafe casts, leading to better memory use and faster :) - see related issue: prometheus/prometheus#6029 The tests show SG memory allocations dropped 5% for query execution: - Showing nodes accounting for -359.20MB, 6.19% of 5798.93MB total Signed-off-by: Philip Panyukov <[email protected]> * comment to explain why do don't generate XXX fields in protobufs Signed-off-by: Philip Panyukov <[email protected]>
) Signed-off-by: Martin Chodur <[email protected]>
* Use exponential buckets for compactor histogram metrics Signed-off-by: Kemal Akkoyun <[email protected]> * Update buckets Signed-off-by: Kemal Akkoyun <[email protected]> * Adjust histogram buckets Signed-off-by: Kemal Akkoyun <[email protected]> * Adjust store gate bucket Signed-off-by: Kemal Akkoyun <[email protected]> * Adjust http duration buckets Signed-off-by: Kemal Akkoyun <[email protected]>
Signed-off-by: Bartek Plotka <[email protected]>
Changes
Baidu Cloud is a public cloud provider in China. Baidu Cloud.
Its object storage product is BOS (Baidu Object Storage).
We are testing Thanos with BOS as storage backend.
This pr implements a BOS client and it seems to be ok.
I would love to become the maintainer for the BOS provider and keep development for Thanos.
Verification
here is acceptance_e2e_test result
go test pkg/objstore/objtesting/acceptance_e2e_test.go pkg/objstore/objtesting/foreach.go
ok command-line-arguments 3.275s