-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perform indexing in background #4819
Conversation
e1319ae
to
674b79e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 38 files reviewed, 1 unresolved discussion (waiting on @golangcibot)
posting/index.go, line 457 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
line is 101 characters (from
lll
)
Done.
674b79e
to
fd7fc59
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few comments. Overall, looking good.
Reviewed 11 of 21 files at r1, 22 of 22 files at r2.
Reviewable status: 11 of 40 files reviewed, 8 unresolved discussions (waiting on @golangcibot, @mangalaman93, @manishrjain, @martinmr, and @pawanrawal)
posting/index.go, line 718 at r2 (raw file):
// GetInterimSchema returns the scheme that can be served while indexes are getting built. func (rb *IndexRebuild) GetInterimSchema() *pb.SchemaUpdate {
QuerySchema
schema/schema.go, line 49 at r2 (raw file):
) // GetWriteContext returns a context that sets the schema context for writting.
s/writting/writing
schema/schema.go, line 50 at r2 (raw file):
// GetWriteContext returns a context that sets the schema context for writting. func GetWriteContext(ctx context.Context) context.Context {
Make it private.
schema/schema.go, line 56 at r2 (raw file):
var ( // WriteCtx is used to get the schema used for writing. WriteCtx = GetWriteContext(context.Background())
Generally concerned about these pre-generated contexts.
schema/schema.go, line 235 at r2 (raw file):
s.RLock() defer s.RUnlock() isWrite, _ := ctx.Value(isWrite).(bool)
move it to top? like Get.
worker/mutation.go, line 120 at r2 (raw file):
func runSchemaMutation(ctx context.Context, update *pb.SchemaUpdate, startTs uint64) error { // wait until schema modification for this predicate is complete. for {
Add a comment about the fact that this is a bit of a race condition: We typically won't propose an index update if something is going on. In this case, looks like the receiver of the update had probably finished the previous index update, but some follower (or perhaps leader) had not finished it.
Still better than before, where we were blocking on the entire index update to be done.
worker/mutation.go, line 147 at r2 (raw file):
CurrentSchema: update, } intermUpdate := rebuild.GetInterimSchema()
GetQuerySchema
Might call it QuerySchema or MutationSchema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 11 of 40 files reviewed, 8 unresolved discussions (waiting on @golangcibot, @manishrjain, @martinmr, and @pawanrawal)
posting/index.go, line 718 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
QuerySchema
Done.
schema/schema.go, line 49 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
s/writting/writing
Done.
schema/schema.go, line 50 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Make it private.
Using this function now.
schema/schema.go, line 56 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Generally concerned about these pre-generated contexts.
Done.
schema/schema.go, line 235 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
move it to top? like Get.
Done.
worker/mutation.go, line 120 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Add a comment about the fact that this is a bit of a race condition: We typically won't propose an index update if something is going on. In this case, looks like the receiver of the update had probably finished the previous index update, but some follower (or perhaps leader) had not finished it.
Still better than before, where we were blocking on the entire index update to be done.
Done.
worker/mutation.go, line 147 at r2 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
GetQuerySchema
Might call it QuerySchema or MutationSchema.
Done.
schema/schema.go
Outdated
@@ -139,6 +159,20 @@ func (s *state) Set(pred string, schema *pb.SchemaUpdate) { | |||
s.elog.Printf(logUpdate(schema, pred)) | |||
} | |||
|
|||
// SetMutSchema sets the in memory schema for the predicate that has indexing going on in background. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line is 101 characters (from lll
)
6c736ce
to
672bc43
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 41 files reviewed, 13 unresolved discussions (waiting on @golangcibot, @mangalaman93, @manishrjain, @martinmr, and @pawanrawal)
dgraph/cmd/alpha/http_test.go, line 296 at r4 (raw file):
return nil, nil, err } case retries > 0:
Is this intentional? It could return anytime randomly b/w 1-20 retries.
dgraph/cmd/alpha/reindex_test.go, line 192 at r4 (raw file):
func checkSchema(t *testing.T, query, key string) { for i := 0; i < 10; i++ {
i < 10
might not be necessary.
dgraph/cmd/alpha/run_test.go, line 171 at r5 (raw file):
break } else { return nil
waitForAlter
will return even if one of the predicates of schema s
has same index as in queried schema. Is this the case? Is this intentional?
edgraph/server.go, line 244 at r5 (raw file):
} // If a background task for this predicate is already running,
this
predicate or any predicate ?
Code never lies, comments sometimes do
😛
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 41 files reviewed, 15 unresolved discussions (waiting on @golangcibot, @mangalaman93, @manishrjain, @martinmr, and @pawanrawal)
schema/schema.go, line 46 at r5 (raw file):
const ( isWrite contextKey = iota
Can you explain what does it represent in a comment.
systest/bgindex/common_test.go, line 58 at r5 (raw file):
} return dg, nil
Even in case of no-retriable errors we are returning nil
error.
99b1d94
to
e87ee7e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 41 files reviewed, 14 unresolved discussions (waiting on @animesh2049, @golangcibot, @mangalaman93, @manishrjain, @martinmr, and @pawanrawal)
dgraph/cmd/alpha/run_test.go, line 171 at r5 (raw file):
Previously, animesh2049 (Animesh Chandra Pathak) wrote…
waitForAlter
will return even if one of the predicates of schemas
has same index as in queried schema. Is this the case? Is this intentional?
Done.
edgraph/server.go, line 244 at r5 (raw file):
Previously, animesh2049 (Animesh Chandra Pathak) wrote…
this
predicate or any predicate ?Code never lies, comments sometimes do
😛
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 41 files reviewed, 13 unresolved discussions (waiting on @animesh2049, @golangcibot, @manishrjain, @martinmr, and @pawanrawal)
schema/schema.go, line 162 at r3 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
line is 101 characters (from
lll
)
Done.
schema/schema.go, line 46 at r5 (raw file):
Previously, animesh2049 (Animesh Chandra Pathak) wrote…
Can you explain what does it represent in a comment.
Done.
systest/bgindex/common_test.go, line 58 at r5 (raw file):
Previously, animesh2049 (Animesh Chandra Pathak) wrote…
Even in case of no-retriable errors we are returning
nil
error.
Done.
f4b5262
to
5541187
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 5 of 44 files reviewed, 14 unresolved discussions (waiting on @animesh2049, @danielmai, @golangcibot, @mangalaman93, @manishrjain, @martinmr, @MichaelJCompton, and @pawanrawal)
schema/schema.go, line 288 at r6 (raw file):
} x.AssertTruef(su != nil, "schema state not found for %s", pred) var tokenizers []tok.Tokenizer
pre-allocate the slice maybe?
schema/schema.go, line 329 at r6 (raw file):
} if schema, ok := s.predicate[pred]; ok { return schema.Directive == pb.SchemaUpdate_REVERSE
Instead of this can we simply write
schema != nil && schema.Directive == pb.SchemaUpdate_REVERSE
?
We can use this type of expression at lot of places.
worker/mutation.go, line 120 at r6 (raw file):
func runSchemaMutation(ctx context.Context, updates []*pb.SchemaUpdate, startTs uint64) error { // wait until schema modification for any predicate is complete.
wait untill schema modification for all predicates is complete ?
worker/mutation.go, line 208 at r6 (raw file):
// If everything goes fine, we do all the background work in separate goroutines. complete := false bgTasks := make([]func(), 0, len(updates))
Instead of having slice of functions, can we simply return rebuild
from fgWork
and use it to call bgWork
in seperate goroutine. Something like
for _, su := range updates {
rb, err := fgWork(su)
if err != nil {
return err
}
defer func() {
if complete {
return
}
undoSchemaUpdate(su.Predicate)
}()
go bgWork(su, rb)
}
worker/mutation.go, line 215 at r6 (raw file):
} defer func() { if complete {
Nice hack 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 38 unresolved discussions (waiting on @danielmai, @golangcibot, @manishrjain, @martinmr, @MichaelJCompton, and @pawanrawal)
systest/1million/1million_test.go, line 9275 at r11 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
::
remove the double colon
Done.
systest/1million/1million_test.go, line 9278 at r11 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
::
remove the double colon
Done.
systest/bgindex/common_test.go, line 35 at r11 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
why can't you use one of the functions already in testutil?
I am connecting to all the alphas.
systest/bgindex/count_test.go, line 64 at r11 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
use
t.Logf
t.Logf will print once test is complete and we won't observe the progress of the test
systest/bgindex/count_test.go, line 88 at r11 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
use
t.logf
same here
systest/bgindex/count_test.go, line 106 at r11 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
switch rand.Intn(1000) % 3 {
maybe add a short comment in each case to summarize what each change is doing.
Done.
systest/bgindex/count_test.go, line 172 at r11 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
use
t.logf
same
systest/bgindex/count_test.go, line 249 at r11 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
use
t.logf
same
systest/bgindex/count_test.go, line 258 at r11 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
use
t.logf
same
systest/bgindex/reverse_test.go, line 87 at r11 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
add a short comment to summarize what each case is doing
Done.
worker/mutation.go, line 121 at r11 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
// wait until
Wait until
Done.
worker/mutation.go, line 121 at r11 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
for all predicate
for all predicates
Done.
worker/mutation.go, line 123 at r11 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
// wait until schema modification for all predicate is complete. We cannot have two // background tasks running for the same predicate. This is a race condition, we typically // won't propose an index update if an index update is already going on. In this case, looks
Rephrase this to something like:
There cannot be two background tasks running for the same predicate as this is a race condition. We typically won't propose an index update if another is already going on.
Done.
worker/mutation.go, line 125 at r11 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
// won't propose an index update if an index update is already going on. In this case, looks // like the receiver of the update had probably finished the previous index update, but some // follower (or perhaps leader) had not finished it.
If that's not the case, then the receiver of the update had probably finish the previous index update ....
Done.
worker/mutation.go, line 128 at r11 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
// In other words, before reaching here, the proposer P would have checked that no indexing // is in progress (could also be because proposer was done earlier than others). If P was still // indexing when the req was received, it would have rejected the Alter request. Only if P is // not indexing, it would accept and propose the request.
In other words, the proposer checks whether there is another indexing in progress. If that's the case, the alter request is rejected. Otherwise, the request is accepted.
Done.
worker/mutation.go, line 131 at r11 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
// It is possible that a receiver R of the proposal is still indexing. In that case, R would // block here and wait for indexing to be finished.
It's possible that a receive of a previous proposal is still indexing so we block here until all receivers are done processing it.
Well, we will only wait until this receiver is done.
worker/mutation.go, line 160 at r11 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
// we should only start building indexes once this function has returned. // This is in order to ensure that we do not call DropPrefix for one predicate // and write indexes for another predicate simultaneously. because that could // cause writes to badger to fail leading to undesired indexing failures.
We should only ....
Done.
worker/mutation.go, line 197 at r11 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
// Sets only in memory, we will update it on disk only after // schema mutations are successful and written to disk.
Sets the schema only in memory. The schema is written to disk only after the schema ...
Done.
4a18588
to
7298fab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 12 files at r9.
Reviewable status: 33 of 43 files reviewed, 37 unresolved discussions (waiting on @danielmai, @golangcibot, @mangalaman93, @manishrjain, @martinmr, @MichaelJCompton, and @pawanrawal)
worker/mutation.go, line 155 at r11 (raw file):
var wg sync.WaitGroup wg.Add(1) defer wg.Done()
Might add a comment here.
worker/mutation.go, line 156 at r11 (raw file):
wg.Add(1) defer wg.Done() buildIndxes := func(update *pb.SchemaUpdate, rebuild posting.IndexRebuild) {
buildIndexes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 33 of 43 files reviewed, 37 unresolved discussions (waiting on @danielmai, @golangcibot, @manishrjain, @martinmr, @MichaelJCompton, and @pawanrawal)
worker/mutation.go, line 155 at r11 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Might add a comment here.
Done.
worker/mutation.go, line 156 at r11 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
buildIndexes
Done.
This change is needed after background indexing was introduced in #4819. Otherwise, the schema check can be flaky due to timing if the schema query runs before indexing is finished.
This fixes flaky tests with the following changes: * Use the same "dgraph" Docker Compose project as the other tests. * Wait for background indexing to finish (introduced in #4819). test-bulk-schema.sh was creating its Dgraph cluster using its own Docker Compose project, which doesn't work nicely with the other custom cluster tests which all utilize the same "dgraph" project name. Using the same name allows Docker Compose to cleanly recreate or delete containers. This change makes test-bulk-schema.sh use the same "dgraph" project. Because Dgraph re-indexes in the background, the test must wait for the update to finish to properly compare the schema before and after. Otherwise, this error in the test can happen: test-bulk-schema.sh: verifying schema is same before export and after bulk import [00:23:41][./dgraph/cmd/bulk/systest/test-bulk-schema.sh] [Test Error Output] test-bulk-schema.sh: schema incorrect test-bulk-schema.sh: *** unexpected error *** [00:23:41][./dgraph/cmd/bulk/systest/test-bulk-schema.sh] [Test Output] 7a8 > "index": true, 8a10,12 > "tokenizer": [ > "exact" > ],
This fixes flaky tests with the following changes: * Use the same "dgraph" Docker Compose project as the other tests. * Wait for background indexing to finish (introduced in #4819). test-bulk-schema.sh was creating its Dgraph cluster using its own Docker Compose project, which doesn't work nicely with the other custom cluster tests which all utilize the same "dgraph" project name. Using the same name allows Docker Compose to cleanly recreate or delete containers. This change makes test-bulk-schema.sh use the same "dgraph" project. Because Dgraph re-indexes in the background, the test must wait for the update to finish to properly compare the schema before and after. Otherwise, this error in the test can happen: test-bulk-schema.sh: verifying schema is same before export and after bulk import [00:23:41][./dgraph/cmd/bulk/systest/test-bulk-schema.sh] [Test Error Output] test-bulk-schema.sh: schema incorrect test-bulk-schema.sh: *** unexpected error *** [00:23:41][./dgraph/cmd/bulk/systest/test-bulk-schema.sh] [Test Output] 7a8 > "index": true, 8a10,12 > "tokenizer": [ > "exact" > ], (cherry picked from commit 7d92d2e)
Todo
This change is
Docs Preview: