Feature/badger db v3 #96
Conversation
This upgrades the versions and removes options which were removed upstream.
Looks like we are ready to merge with develop. I decided to skip GC for … By the way, the gateway DB has a lot of interfaces which seem to be copy-pasted from …
Fine by me.
Right. Might use badger also as a persistent cache for overlay caching. In that process I might add a generic KV util, if I think the abstraction is worth the effort.
catfs/db/database_badger.go
var beforeGC uint64 = 0
var total uint64 = 0
var err error
log.Infof("Performin GC for badger DB in %s", opts.Dir)
typo: "Performing".
fixed, but there is a new patch which makes it obsolete.
log.Debugf("badger DB in %s has nothing for GC", opts.Dir)
}
continue
}
What's the purpose of still running RunValueLogGC(0.5) after Close? Is it essential to call?
According to the Note in dgraph-io/badger#767 (comment), the compactors, which collect statistics for GC, are run only on DB Close(), at least on small DBs. So Close is needed for our DB size.
But your comment made me do more digging. It turns out some things about the badger GC were wrongly reverse-engineered by me: RunValueLogGC() actually compacts on the fly, so there is no need to Close() to shrink the DB. The problem is that RunValueLogGC does not collect stats by itself, but uses stats prepared by the compactors. That leads to the necessity of one Close() to make sure the compactors are forced. Apparently, even Close does not force it; about a one-minute delay between a DB modification and Close is needed for the compactors to kick in.
So the new 3cc667c patch gets rid of the cycled Close/Open (it does it only once per runGC call); I also got rid of the timer tracking the oldest DB flush and of the modification flag. If only there were a reliable way to check whether the DB was modified, we would not do an unnecessary Close(). Though I noticed that the filesystem GC is constantly messing with the metadata DB even on an idle brig.
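The flow described above — run online GC passes first, and fall back to a one-time Close (forcing the compactors) only when no pass succeeded — could be sketched as below. This is a sketch, not the PR's actual code: `gcDB`, `fakeDB`, and `errNoRewrite` are illustrative stand-ins for `*badger.DB` and `badger.ErrNoRewrite`.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoRewrite stands in for badger.ErrNoRewrite, which RunValueLogGC
// returns when it found nothing worth rewriting.
var errNoRewrite = errors.New("value log GC: no rewrite")

// gcDB abstracts the two badger calls this sketch needs.
type gcDB interface {
	RunValueLogGC(discardRatio float64) error
	Close() error
}

// runGC performs online value-log GC passes until the DB reports there
// is nothing left to rewrite. If not a single pass succeeded, the GC
// stats were likely stale, so the DB is closed once to force the
// compactors (the caller is expected to reopen it afterwards).
func runGC(db gcDB) (passes int, closed bool, err error) {
	for db.RunValueLogGC(0.5) == nil {
		passes++
	}
	if passes == 0 {
		return 0, true, db.Close()
	}
	return passes, false, nil
}

// fakeDB simulates a DB whose value log can be rewritten `left` times.
type fakeDB struct {
	left   int
	closed bool
}

func (f *fakeDB) RunValueLogGC(ratio float64) error {
	if f.left == 0 {
		return errNoRewrite
	}
	f.left--
	return nil
}

func (f *fakeDB) Close() error {
	f.closed = true
	return nil
}

func main() {
	passes, closed, _ := runGC(&fakeDB{left: 2})
	fmt.Printf("passes=%d closed=%v\n", passes, closed) // passes=2 closed=false
}
```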
Thanks for the explanation. The code looks better to me now.
One simple-to-gather statistic we could display is how long the GC run took. I'm a little worried that it blocks all of brig's operations for a considerable time, even if there weren't many writes in that time period.
catfs/db/database_badger.go
}
// Badger people use os.FileInfo to get total size of lsm and vlog files
// This is not actual but allocated of hdd size
lsm, vlog := bdb.db.Size() // values could be stale even after Close/Open cycle, badger updates it once per minute
Do we really need all this complicated logic if it's only for outputting info messages?
catfs/db/database_badger.go
@@ -323,6 +400,10 @@ func (db *BadgerDatabase) Flush() error {
db.txn = nil
db.haveWrites = false
if time.Now().After(db.gcNeededAt) && !db.needsGC {
	db.gcNeededAt = time.Now().Add(120 * time.Second) // badger updates stats once per minute, we give ample time for stats to settle
Just a style thing: please put comments above the line they refer to, to avoid extra-long lines.
done
Reading the badger DB issues list yielded the following: RunValueLogGC() does clean up online, but small databases (150MB is not big enough) see little benefit, and the only way to update the stats for GC is to close the DB; see the Note in dgraph-io/badger#767 (comment). Based on this information the logic is redone to call Close only if RunValueLogGC did not succeed.
There are two optional subjects that we should discuss before merging. Otherwise looks good to me 👍
// to update stats after DB was modified. But eventually GC stats will be ready.
err := bdb.db.Close()
if err != nil {
	log.Fatalf("Could not close DB in %s", opts.Dir)
I generally see calling log.Fatalf as bad practice, especially outside quick & dirty programs and especially in library code (I have a story to tell here). I'm not fully sure what the proper error handling here is, but we could enter a retry loop when we fail to close or open the database.
I tried to mitigate this with 91d4826. An error in Close is not fatal; we can try again later. If we cannot open, we do ten attempts with timeouts and then fail. We cannot run without the DB. Also, I fixed Close, which did not check for an active transaction. GC could be done on a running system, but Close definitely not.
I would love to hear this story :) I always thought that if something goes wrong the program should fail with an error message, so the user would send an angry bug report and we would fix the problem. If we are running an industrial process, then the story is different.
Error in Close is not fatal
Is it? What if it was partially closed and we're left with a zombie db conn?
I would love to hear this story :) If we are running an industrial process, then the story is different.
The story is: we once used a GPIO library to check some pins on our device. Sadly, I didn't look into the library's implementation, so that fault is on me, but their error handling was "log & exit", which is a bad strategy for a library. I first noticed this when the program didn't run on my work machine (the obvious cause was that the GPIO was not available there), but at some point we made the grave mistake of updating the Go version to 1.14 and all hell broke loose. All services reading this pin suddenly crashed randomly after the update. And why was that? Because since 1.14, select() might return EINTR (rightly so) when its execution was interrupted by the scheduler, which might happen at any point. The GPIO library was not prepared for this and just jumped off the cliff by calling os.Exit(1). The fix was easy, but what we can learn from this:
- Never ever os.Exit() in library code.
- Check the quality of the libraries you're using.
- Error handling is hard.
I always thought that if something goes wrong the program should fail with an error message, so the user would send an angry bug report and we would fix the problem.
That's fine for interactive programs. Long-running programs (and the brig daemon counts as one) should do this only as a last resort, since the error gets buried in some logs (in the best case). But not being able to open the main database is one such case, so 91d4826 is fine with me.
I see now why it was a painful experience. Indeed, fail is super special and should be used as a last line of defense.
…f we failed a few times
Run GC only after some DB update, with a delay to give time for the GC-relevant stats to be updated by badger. This prevents a useless DB Close/Open in cases where we know for sure that no changes were made to the DB after the last garbage collection.
As an extra change, the retry loop is done with `for retries := 0; retries < 10; retries++ {`
You are right:
Addresses #92
badger v3.2011.1 made a few old options obsolete, so they are removed. I have adjusted the value log file size to 10MB (from the default 2GB), and the memory table size to 10MB (from the default 128MB). If we do our job right we do not need that much anyway.
Another reason for these changes: badger allocates the above sizes on the hard drive. If the user is not on a modern FS which allows sparse files (holes filled with zeros), he or she would see 2GB of space disappear before any operation with brig starts. On a modern FS one would see the actual allocation, but a 2GB file size still looks scary to me.
So far, I see no negative effects from the badger version upgrade.
To see compression improvements, the DB needs to be garbage collected, which can be done only at db.Close().
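For reference, the size adjustments described above map onto badger v3's option builders. This is a configuration sketch under the assumption of the v3 module, not the PR's exact code; the path is a placeholder:

```go
// Tune badger v3 so it does not pre-allocate huge files:
// 10MB value log segments and a 10MB memtable instead of the
// much larger defaults.
opts := badger.DefaultOptions("/path/to/metadata.db").
	WithValueLogFileSize(10 << 20). // bytes
	WithMemTableSize(10 << 20)

db, err := badger.Open(opts)
```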