This repository has been archived by the owner on Dec 6, 2018. It is now read-only.

plan: level-sublevel@6 REWRITE #46

Open
dominictarr opened this issue Dec 9, 2013 · 32 comments

Comments

@dominictarr
Owner

Okay, I've been wanting to rewrite sublevel for a while,
and I think almost all the pieces have now come together.

The key change is that instead of creating sublevel objects,
you will be able to perform an operation in any sublevel by
passing a sublevel: path option in the options argument
(which every operation has). path will be an array of sublevel names.
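To make the proposal concrete, here is a minimal sketch (not the v6 implementation) in which every operation takes an optional sublevel path in its options. It assumes a plain Map as the store and a hypothetical toFlatKey helper that wraps each path segment in '\xff' separators; the real prefix encoding is meant to be pluggable, and calls are synchronous here for brevity. No sublevel object is created: the path travels with each call.

```javascript
// Hypothetical flat-key encoding: each path segment is wrapped in '\xff'
// separators, then the user key is appended.
function toFlatKey(path, key) {
  return path.map(function (name) { return '\xff' + name + '\xff' }).join('') + key
}

// A toy database whose operations accept an optional `sublevel` path option,
// so any sublevel can be addressed without a persistent sublevel object.
function createDb() {
  const store = new Map()
  return {
    put: function (key, value, opts) {
      const path = (opts && opts.sublevel) || []
      store.set(toFlatKey(path, key), value)
    },
    get: function (key, opts) {
      const path = (opts && opts.sublevel) || []
      return store.get(toFlatKey(path, key))
    },
    store: store
  }
}

const db = createDb()
db.put('pig', 'oink', { sublevel: ['stuff', 'animal'] }) // nested sublevel
db.put('pig', 'top-level pig')                           // no path: root
```

A remote client could send the same { sublevel: [...] } option over the wire, since nothing about the sublevel has to exist on the server beforehand.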

The current way of creating a sublevel will still work, but it will just be sugar on top of the new way. This change means you will not need to create a persistent in-memory object to have a sublevel, so you'll be able to create sublevels dynamically, or from the client.

Since I plan to change the way prefixes work, you will not be able to open a sublevel that uses level-sublevel@5 keys with level-sublevel@6. A migration script will probably be possible, though.

I think the prefix handling should be a plugin (it's just about comparing strings/objects) that converts a standard API (an array of objects/strings/buffers) to a binary/string/object representation, and orders them with the correct properties.
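A pluggable prefix codec could be as small as three functions. The sketch below is an assumption about what such a plugin might look like (the names stringCodec, encode, decode and range are hypothetical): it maps a path array and key to a flat string and back, and reports the range that holds exactly one sublevel's keys, so that a sibling such as 'foobar' and nested child sublevels stay outside the range for 'foo'.

```javascript
// Hypothetical prefix codec plugin. Each path segment is wrapped in '\xff'
// separators. Assumes sublevel names never contain '\xff' and user keys
// never start with it.
const stringCodec = {
  // (['stuff', 'animal'], 'pig') -> '\xffstuff\xff\xffanimal\xffpig'
  encode: function (path, key) {
    return path.map(function (name) { return '\xff' + name + '\xff' }).join('') + key
  },
  // Inverse of encode: peel off '\xff'-delimited segments; the rest is the key.
  decode: function (flat) {
    const path = []
    let i = 0
    while (flat[i] === '\xff') {
      const end = flat.indexOf('\xff', i + 1)
      if (end === -1) throw new Error('malformed key: unterminated segment')
      path.push(flat.slice(i + 1, end))
      i = end + 1
    }
    return { path: path, key: flat.slice(i) }
  },
  // Keys stored directly in `path` sort within [gte, lt). A sibling such as
  // 'foobar' sorts below gte for 'foo', and child sublevel keys begin with
  // prefix + '\xff', so they sort at or above lt.
  range: function (path) {
    const prefix = stringCodec.encode(path, '')
    return { gte: prefix, lt: prefix + '\xff' }
  }
}
```

Swapping in a binary or bytewise codec would only require another object with the same three functions, which is what keeping this part separate from the rest of the sublevel code would buy.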

all these issues will be addressed:

any comments, people who use sublevel?
@stagas @juliangruber @hij1nx @rvagg @mcollina @mikeal @timoxley @thlorenz @TehShrike

@mcollina

mcollina commented Dec 9, 2013

Let me throw one more thing onto the battlefield: support hooks, but stop depending on them directly. I'm not sure if this is possible or if it is just a side effect of #38.

@juliangruber
Contributor

I wonder whether this can only be implemented by monkey patching, or whether this would be a good time to think about:

  • extending levelup
  • writing something more high level on top of levelup

@juliangruber
Contributor

imo monkeypatching is the worst thing we currently do with level-* modules

@mcollina

mcollina commented Dec 9, 2013

@juliangruber I agree. It reminds me of old Rails plugins.

@timoxley

timoxley commented Dec 9, 2013

This all sounds great.

Could sublevels simply be a leveldown interface over a levelup instance?

In this scenario, creating a sublevel would just mean passing the db parameter to level/levelup; it wouldn't require any monkey-patching of levelup or hooks, and sublevels would always keep the same API as levelup.

e.g.

var db = levelup('/whatever')
var sublevelDb = levelup('users', {
  db: sublevelify(db)
})

Not sure if plausible though. Just a thought.
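One way to picture this idea, as a rough sketch rather than a working leveldown: a hypothetical sublevelify(parent, name) returns an object with the same small get/put/del/keys interface as the store it wraps, so sublevels nest without patching anything. The synchronous Map-backed memStore stands in for levelup and is an assumption of the sketch.

```javascript
// Base store with a minimal get/put/del/keys interface (synchronous for brevity).
function memStore() {
  const m = new Map()
  return {
    get: function (k) { return m.get(k) },
    put: function (k, v) { m.set(k, v) },
    del: function (k) { m.delete(k) },
    keys: function () { return Array.from(m.keys()) }
  }
}

// Hypothetical adapter: exposes the same interface as `parent`, but stores
// every key under a prefix derived from `name`. Because the result has the
// same shape as its input, adapters nest without modifying the parent.
function sublevelify(parent, name) {
  const prefix = '\xff' + name + '\xff'
  return {
    get: function (k) { return parent.get(prefix + k) },
    put: function (k, v) { parent.put(prefix + k, v) },
    del: function (k) { parent.del(prefix + k) },
    keys: function () {
      return parent.keys()
        .filter(function (k) { return k.startsWith(prefix) })
        .map(function (k) { return k.slice(prefix.length) })
    }
  }
}

const root = memStore()
const users = sublevelify(root, 'users')
const admins = sublevelify(users, 'admins')
admins.put('alice', { role: 'admin' })
users.put('bob', { role: 'user' })
```

The naive keys() does a full scan and filter; a real adapter would use a bounded iterator. Note also that users.keys() still lists the nested admins entry in its raw prefixed form, which a real implementation would need to hide or skip.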

@dominictarr
Owner Author

@timoxley, I've thought about that... but it just seems too complicated when I think about it...

@juliangruber as of recently, this no longer mutates levelup, only wraps it.

@mcollina what don't you like about hooks?

@juliangruber @mcollina monkeypatching is different. Modifying an instance is very different from modifying a prototype (monkey patching). Modifying a prototype, or a shared object, is dangerous because you can't know what other users depend on. Modifying one instance is fine, though, because it's your instance and you can do what you like with it.

It's necessary to extend the sublevel instances for multilevel to be able to see them on the other side.

You could make a new instance whose prototype pointed to the original sublevel,
but you'd have to keep a reference, and if you wanted that to work with multilevel you'd have to do
db.sublevels[name] = child_of_sublevel, which amounts to the same thing.
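The distinction between extending an instance and monkey patching a prototype can be shown in plain JavaScript. In this sketch Db is a hypothetical stand-in for a database handle, not levelup: the method added to one instance is invisible to others, a child made with Object.create inherits from its parent but must still be registered in a sublevels map to be discoverable, and only a prototype patch reaches every instance.

```javascript
// Hypothetical stand-in for a database handle (not levelup itself).
function Db(name) { this.name = name }
Db.prototype.describe = function () { return 'db:' + this.name }

const a = new Db('a')
const b = new Db('b')

// Extending one instance: only `a` changes; `b` keeps the prototype method.
a.sublevels = {}
a.describe = function () { return 'extended:' + this.name }

// Alternative: a child whose prototype is `a`. It inherits a's methods, but
// the parent still has to hold a reference for tools (e.g. multilevel) to
// find it, which is itself a change to the instance.
const child = Object.create(a)
child.name = 'a/child'
a.sublevels.child = child

// Monkey patching the shared prototype reaches every instance that does not
// shadow the method with its own property.
function patchPrototype() {
  Db.prototype.describe = function () { return 'patched:' + this.name }
}
```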

Can anyone give me a concrete example of how this can cause a problem? or is this just a vague code style thing?

@mcollina

mcollina commented Dec 9, 2013

@dominictarr hooks may have some problems when creating a lot of sublevels (e.g. if we create them based on user input). If you place a hook on a sublevel, you want it to stick even if the sublevel is GCed or (worse) you lose its reference. That's one of the reasons we need a close() method.
One way out of this is to allow hooks to be local to the sublevel, i.e. hooks attached to a sublevel will not be triggered by changes in the parent db.

@dominictarr we patch a global instance, our database. I think that's global enough, as we likely don't have that many of these. However, I think it's mostly harmless in this case, as it's done via a prototype chain.

@mikeal

mikeal commented Dec 9, 2013

are you going to bytewise encode the array and then write it as hex?

is it possible to introspect the leveldown instance to find if it is using binary encoding or not? then you could be much more agnostic of the leveldown that gets passed to you.

@dominictarr
Copy link
Owner Author

@mikeal you'll be able to pass in a thing that handles the key/prefix serialization, it will be global, and if you change it you'd need to rewrite all the keys, but it won't be mixed in with the rest of the sublevel code, so we'll be able to experiment with different approaches.

I think binary keys work fine in node at least; the browser will need strings, and other backends may need strings too.

@mcollina that depends strongly on what sorts of things you are doing with dynamic sublevels and hooks.
If you have patterns in the sublevels ($user/$post/$comment, etc) then it will be possible to have a hook that triggers dynamically across many sublevels, so there will be no memory problem...
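A sketch of the pattern-hook idea, under the assumption that a hook is registered once per path pattern (with '*' matching exactly one segment) rather than once per sublevel object. addHook, matches and write are hypothetical names; the hook runs after the write, roughly like a post hook, and everything is synchronous. Dynamically created sublevels such as users/alice/posts need no hook objects of their own.

```javascript
// Hypothetical hook registry: one entry per path *pattern*, not per sublevel.
const hooks = []

function addHook(pattern, fn) {
  hooks.push({ pattern: pattern, fn: fn })
}

// '*' matches exactly one path segment; other segments must match literally.
function matches(pattern, path) {
  if (pattern.length !== path.length) return false
  return pattern.every(function (seg, i) { return seg === '*' || seg === path[i] })
}

// A write into any sublevel path runs every hook whose pattern matches it.
function write(store, path, key, value) {
  const op = { path: path, key: key, value: value }
  store.push(op)
  hooks.forEach(function (h) {
    if (matches(h.pattern, path)) h.fn(op)
  })
}

const seen = []
addHook(['users', '*', 'posts'], function (op) { seen.push(op.path[1] + ':' + op.key) })

const store = []
write(store, ['users', 'alice', 'posts'], 'p1', 'hello')
write(store, ['users', 'bob', 'posts'], 'p7', 'hi')
write(store, ['users', 'bob', 'profile'], 'name', 'Bob') // no hook matches
```

Memory use here grows with the number of patterns, not with the number of sublevels written to.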

If a remote client creates a sublevel, that won't create a sublevel on the server, it will just start making calls that pass a sublevel: path option.

Regarding global-ness: if you are putting all the data in one database, then that is global anyway, so if your data collides you are in worse trouble. This is still a lot more modular than any other database.

@mcollina

mcollina commented Dec 9, 2013

@dominictarr so hooks will not be triggered across multilevel, right? If the server writes something to 'my' db, and I have a hook on 'my' web page, it will never be called.

Regarding global-ness I agree.

@mikeal

mikeal commented Dec 9, 2013

I think in node at least, binary keys work fine, the browser will need strings, and other things strings maybe.

I worry about the performance characteristics of this across implementations, in addition to whether binary key support is available at all. Ideally sublevel just works out of the box with as many leveldown configurations as possible, but has overrides available to the user as you suggest. One of the most annoying usability issues with sublevel right now is that passing it certain leveldowns will fail in really unpredictable ways because it assumes a particular encoding. That isn't fixed by allowing me to override the logic: it'll still fail in the same unpredictable way, and the only difference is that I can fix it in my own code without resorting to changing my key encoding. While that's a step in the right direction, I would caution against the move people seem to be making towards configuration/overrides rather than "just working", as it harms usability and adoption.

@gilesbradshaw

Forgive my newbie comment, I'm just getting into this, but should I be able to query sublevels with level-queryengine? If a sublevel implemented leveldown, presumably this would be possible?

@dominictarr
Copy link
Owner Author

@gilesbradshaw yes, that is already possible.

@mikeal that's impossible. Because if that was real, I know @mikeal would have posted an issue about it, because I know @mikeal, and @mikeal knows how to do open source. I don't know who you are, I can only presume you might be the pre-coffee version of @mikeal or something like that. If this is real, please go find the real @mikeal and have him post an issue about the encodings of sublevels.

@mcollina yes. Hooks need to be atomic, so they can only work within a process. However, you can use hooks within a process to create a stream (like level-live-stream) that can be exposed over multilevel.
You could also use a scheme like https://github.com/mikeal/SLEEP over a stream, and use this to build indexes asynchronously. https://github.com/dominictarr/level-trigger could also be used in a similar way.

@mikeal

mikeal commented Dec 10, 2013

@dominictarr in current sublevel "bad things happen" when you pass a leveldown that is set with binary encoding. I hit this once and figured it was by design; that's what led to me writing byteslice.

@dominictarr @mcollina could hooks be implemented as a stream? @maxogden did the work recently to move sleep-ref to a stream interface and it's quite nice.

@dominictarr
Owner Author

The abstraction could not be cleanly supported over a distributed system:
to begin with, prehooks, which trigger before the record is inserted, would need to become async,
which means locks, etc.

Post hooks would be simpler, but if you are adding post hooks dynamically in the client,
the order of the listeners will not be reliable.

basically, the reason for not having hooks over multilevel is that the things you can build with hooks
directly on the level instance would not work (correctly) if you tried to apply those to the multilevel client,
so, I think it's better to just leave that part out. The right way is to add the plugins on the server,
and then expose the functions you have added using a multilevel manifest.

@mikeal I thought I had fixed that, if you create a sublevel with encoding db.sublevel('mikeal', {encoding: 'json'}) it uses that encoding in that sublevel. can you post an issue on that with example code, so that we can at least discuss it separately to this issue?

@rvagg
Contributor

rvagg commented Dec 10, 2013

fwiw this sounds rad to me

we probably need a beefed up test suite too

@mikeal

mikeal commented Dec 10, 2013

@dominictarr it could definitely be fixed already; I tried this a while back, pre-NodeConf.eu for sure.

@dominictarr
Owner Author

@mikeal oh, right. Yeah, it's probably fixed then. It used to inherit the parent sublevel's encoding, but now each sublevel gets its own encoding.

@tarruda

tarruda commented Dec 18, 2013

@dominictarr, I've recently written a module which may be useful to sublevel:
https://github.com/tarruda/buffer-prefix-range

What that module does is simple: given a bytestring 'X', it returns two other
bytestrings which can be used as the gte/lt parameters of a leveldb query. This
query will filter out keys that don't have 'X' as a prefix.

The module's implementation is simple (about 50 LOC), but here's a REPL session
to illustrate what's being done:

> bufferPrefixRange=require('buffer-prefix-range')
[Function: bufferPrefixRange]
> bufferPrefixRange(new Buffer([1, 2, 3]));
{ start: <Buffer 01 02 03>, end: <Buffer 01 02 04> }

In other words, it increases the last byte of the string by 1 to obtain the
exclusive upper bound. It's intuitive to see that all range queries that have 'gte'
equal to 'start' and 'lt' equal to 'end' will only return keys that start with
[1, 2, 3]:

// example of keys returned by the range query (assume arrays are buffers)
[1, 2, 3]
[1, 2, 3, 4] // greater than [1, 2, 3] and less than [1, 2, 4]
// example of keys that won't be returned by the range query
[1, 2] // less than [1, 2, 3]
[1, 2, 4] // equal to the exclusive upper bound
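The bound computation can be sketched in a few lines of Node. This is not the module's code; prefixRange and inPrefixRange are illustrative names. It also covers a case the REPL example does not exercise: a trailing 0xff byte cannot be incremented, so it is dropped before bumping the previous byte, and a prefix made only of 0xff bytes has no finite upper bound.

```javascript
// Exclusive upper bound for "all keys starting with prefix": drop trailing
// 0xff bytes (they cannot be incremented), then add 1 to the last remaining
// byte. If every byte is 0xff there is no finite bound, so end is null.
function prefixRange(prefix) {
  let end = prefix.length
  while (end > 0 && prefix[end - 1] === 0xff) end--
  if (end === 0) return { start: prefix, end: null }
  const upper = Buffer.from(prefix.subarray(0, end)) // copy; input untouched
  upper[end - 1] += 1
  return { start: prefix, end: upper }
}

// start <= key < end (end === null means unbounded above).
function inPrefixRange(range, key) {
  if (Buffer.compare(key, range.start) < 0) return false
  return range.end === null || Buffer.compare(key, range.end) < 0
}
```

For [1, 2, 3] this yields end [1, 2, 4], matching the REPL output; for [1, 0xff] it yields end [2].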

The purpose is enable 'prefix' queries in the spirit of 'SQL LIKE' queries (see
the module README for examples). I may be wrong, but it seems to me that the
problem level-sublevel is solving is a special case of that kind of query.
As an example, consider we want to create a sublevel named 'foo':

var foo = db.sublevel('foo');

foo.put('key1', 'value1');

Now sublevel's put method would convert 'key1' to 'foo\0key1'.
Query parameters would be handled by prepending the prefix:

foo.createReadStream({start: 'k1', end: 'k2'}...)
// would be translated to 
db.createReadStream({start: 'foo\0k1', end: 'foo\0k2'})

And the prefix would be removed when streaming keys from the db.
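A minimal sketch of the translation described above, assuming a toy in-memory store (a sorted array standing in for leveldb) and using gte/lt instead of start/end. The names rawPut, readRange and sublevel are hypothetical. When no upper bound is given, the scan is clamped at name + '\x01', i.e. the '\0' terminator bumped by one, which also keeps a 'foobar' sublevel out of 'foo'.

```javascript
// Hypothetical in-memory "db": a sorted list of [key, value] pairs, with a
// readRange() that honours gte/lt bounds the way a leveldb iterator would.
const entries = []
function rawPut(key, value) {
  entries.push([key, value])
  entries.sort(function (a, b) { return a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0 })
}
function readRange(opts) {
  return entries.filter(function (e) { return e[0] >= opts.gte && e[0] < opts.lt })
}

// A sublevel view: prepend name + '\0' on the way in, strip it on the way out.
// Assumes sublevel names contain no '\0'.
function sublevel(name) {
  const prefix = name + '\0'
  return {
    put: function (key, value) { rawPut(prefix + key, value) },
    read: function (opts) {
      opts = opts || {}
      // Translate user bounds into the prefixed key space; with no upper
      // bound, name + '\x01' sorts above every key starting with prefix.
      const gte = prefix + (opts.gte !== undefined ? opts.gte : '')
      const lt = opts.lt !== undefined ? prefix + opts.lt : name + '\x01'
      return readRange({ gte: gte, lt: lt }).map(function (e) {
        return { key: e[0].slice(prefix.length), value: e[1] }
      })
    }
  }
}

const foo = sublevel('foo')
const foobar = sublevel('foobar')
foo.put('key1', 'value1')
foo.put('k1a', 'x')
foobar.put('key1', 'other')
```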

The '\0' terminator avoids ambiguity when one sublevel's name is a prefix of
another's (e.g. 'foo' and a 'foobar' sublevel), assuming sublevel only deals
with strings as prefixes.

@dominictarr
Owner Author

@tarruda hmm, I'm confused, isn't end inclusive? does it use lt or lte?
I plan to make this part pluggable, or at least testable independently, so I may well be able to use your module.

@tarruda

tarruda commented Dec 20, 2013

It's exclusive, so when passed to leveldb it has to be stripped manually
in case there's a key that matches the 'end' parameter.

Think of leveldb keys as real numbers: what's the predicate that would match all
numbers "starting with", for example, 0.356? Here are some numbers that would
match this predicate:

  • 0.356
  • 0.3561
  • 0.356999999999

and here are numbers that wouldn't:

  • 0.357
  • 0.3559999999999999999

It's easy to see that the predicate is: 'all numbers greater than or equal to
0.356 and less than 0.357'. It's not possible to calculate the biggest number
matching the predicate, only the first number that doesn't match. It's the same
with bytestrings.

So assume you want to select all keys that start with 'foo' in leveldb. 'foo' is
represented in ascii by [0x66, 0x6f, 0x6f] so the query would be:

db.createReadStream({
  start: new Buffer([0x66, 0x6f, 0x6f]),
  end: new Buffer([0x66, 0x6f, 0x70])    // <- last byte increased by 1
});

If there was a key equal to [0x66, 0x6f, 0x70] (the first key not matching the
condition), then it would have to be excluded manually by sublevel's read stream.

@max-mapper
max-mapper commented Dec 20, 2013

in dat lately i've found it important (size/perf-wise) to drop the namespaces from sublevel keys, e.g. this is how sublevel does db.sublevel('foo').put('bar') (simplified example):

ÿfooÿbar

but what I do is

ÿbar

e.g. the keys aren't self-describing, but I can keep them smaller and faster this way, and store the sublevel names (AKA column names) elsewhere and load them when the db opens

@tarruda

tarruda commented Dec 21, 2013

That was just an example; nothing stops you from having integers as
prefixes and associating the integers with sublevel names in another
special prefix (e.g. a schema namespace).
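A sketch of the integer-prefix idea, with every name hypothetical: sublevel names are assigned small integer ids, the name-to-id mapping is persisted under a reserved schema prefix, and data keys carry only a fixed-width id. Keys are no longer self-describing, so the schema has to be loaded (here, rebuilt by scanning the store) when the db opens. A Map stands in for the database.

```javascript
// Hypothetical compact-prefix scheme. SCHEMA is a reserved prefix whose
// records map sublevel names to integer ids; data keys carry only the id.
const SCHEMA = '\xff\x00'

// Fixed-width id (4 hex digits), so no separator is needed after it.
function idPrefix(id) {
  return '\xff' + id.toString(16).padStart(4, '0')
}

function openSchema(store) {
  // Rebuild the name -> id table from the persisted schema records.
  const ids = new Map()
  for (const [k, v] of store) {
    if (k.startsWith(SCHEMA)) ids.set(k.slice(SCHEMA.length), v)
  }
  // Assign the next id to unseen names (assumes ids are never deleted).
  function idFor(name) {
    if (!ids.has(name)) {
      const id = ids.size + 1
      ids.set(name, id)
      store.set(SCHEMA + name, id) // persist the mapping alongside the data
    }
    return ids.get(name)
  }
  return {
    idFor: idFor,
    key: function (name, key) { return idPrefix(idFor(name)) + key }
  }
}

const store = new Map()
const schema = openSchema(store)
store.set(schema.key('comments', 'c1'), 'first comment')
```

For a sublevel named 'comments', each key then carries a 5-character prefix instead of the 10 characters of '\xffcomments\xff'.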


@dominictarr
Owner Author

@maxogden how much does it affect performance? Do you have a benchmark?

@tarruda aha, yes. sublevel internally does this too, but creates the end key differently.
instead of doing {start: [1,2,3], end: [1,2,4]} it makes a longer end key:
{start: [1,2,3], end: [1,2,3, 255]}

Which has the same effect, without needing the extra filter.
That said, the new lte, gte, gt, lt options mean the filter is already in leveldown.
I haven't been doing much level stuff recently, but last I checked, the other leveldowns, etc.,
still need to support the new-style ranges.
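To compare the two end keys directly, the sketch below checks sample keys against both bounds with Buffer.compare. It suggests the bounds agree except for keys whose first byte after the prefix is 0xff and that continue past it; those fall outside the appended-0xff range. UTF-8 never produces a 0xff byte, so for UTF-8 string keys the two approaches should behave the same.

```javascript
// Two ways to bound a scan over every key that starts with `prefix`.
const prefix = Buffer.from([1, 2, 3])

// (a) Append 0xff and use an inclusive end: {start: [1,2,3], end: [1,2,3,255]}
const endAppended = Buffer.concat([prefix, Buffer.from([0xff])])
function inAppendedRange(key) {
  return Buffer.compare(key, prefix) >= 0 && Buffer.compare(key, endAppended) <= 0
}

// (b) Bump the last byte and use an exclusive end: {gte: [1,2,3], lt: [1,2,4]}
const endBumped = Buffer.from([1, 2, 4])
function inBumpedRange(key) {
  return Buffer.compare(key, prefix) >= 0 && Buffer.compare(key, endBumped) < 0
}
```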

@mcollina mcollina mentioned this issue Jan 4, 2014
@dominictarr
Owner Author

Okay, I started on a rewrite of this on the flight from Singapore to NZ,
it's not finished (currently just tests against a mock, not the real levelup)

I pushed what I have got to here: https://github.com/dominictarr/level-sublevel/tree/v6

@nrw

nrw commented Feb 20, 2014

i'm just jumping in to say i'm also pumped about the rewrite.

@dominictarr
Owner Author

Feel free to jump in and help if you like! (That would probably get me to work on this again ;) The next thing is to make the tests run on a real leveldb (or against a levelup you pass in; I want the tests to run against every reasonable leveldown backend).

@snowyu

snowyu commented Oct 27, 2014

I have added a feature that lets sublevel address dynamic sublevels like file paths, but it is a breaking change: everything changes if paths and minimatch are used instead of the prefix array.

The KeyPath branch tries to keep compatibility, but it has to convert back and forth between the array and the path string. Some attribute names have changed to make them more obvious.

Now you can use it like this:

var LevelUp = require('levelup')
var Sublevel = require('level-sublevel')
var db = Sublevel(LevelUp('/tmp/sublevel-example'))
var value = 'some value' // placeholder value used in the examples below

//old sublevel usage:
var stuff = db.sublevel('stuff')
var animal = stuff.sublevel('animal')
var plant = stuff.sublevel('plant')

animal.put("pig", value, function () {})
plant.put("cucumber", value, function (err) {})

//new usage:
db.put("/stuff/animal/pig", value, function (err) {})
db.get("/stuff/animal/pig", function(err, value){})
animal.put("../plant/cucumber", value, function (err) {})
animal.batch([
  {key: 'key', value: 'Value', type: 'put'},
  {key: 'key', value: 'Value', type: 'put', path: plant},
  {key: '../plant/key', value: 'Value', type: 'put'},
], function (err) { /* handle err */ })

//list all keys in "/stuff/animal" path
db.createReadStream({path: "/stuff/animal"})

//list all keys in "/stuff/plant" path
animal.createReadStream({start: "../plant"})

//the path will always be absolute key path.
animal.setPath("/stuff/plant") //or animal.setPath(plant) 
//now the "animal" is on "plant" in fact.
animal.get("cucumber", function(err, value){})

//hooks a key, and the key can be relative or absolute key path and minimatch supports.
db.pre("p*", function (ch, add) {
  //NOTE: add(false) means do not put this key into storage.
  add({
    key: ''+Date.now(), 
    value: ch.key, 
    type: 'put',
    // NOTE: the path can be a sublevel object or a key path string.
    path: animal
  })
})
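One way the relative key paths above could be resolved, sketched with Node's path.posix.resolve: the current sublevel path and the key path are combined, '..' climbs one sublevel, and the result is split into a sublevel array plus a bare key. resolveKeyPath is a hypothetical helper, not the KeyPath branch's implementation, and it assumes the current path is absolute and that keys never contain '/'.

```javascript
const path = require('path')

// Hypothetical helper: resolve a key path, absolute or relative to the current
// sublevel, into { sublevel: [...segments], key }. Uses POSIX path rules, so
// '..' climbs one sublevel. Assumes currentPath is absolute (otherwise
// path.posix.resolve would fall back to the process's working directory).
function resolveKeyPath(currentPath, keyPath) {
  const abs = path.posix.resolve(currentPath, keyPath)
  const parts = abs.split('/').filter(Boolean)
  return { sublevel: parts.slice(0, -1), key: parts[parts.length - 1] }
}
```

Called from the animal sublevel (current path '/stuff/animal'), '../plant/cucumber' resolves to sublevel ['stuff', 'plant'] and key 'cucumber'.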

@dominictarr
Owner Author

@snowyu this looks interesting, you should publish this as a module!

@snowyu

snowyu commented Nov 8, 2014

Are you suggesting I release this as a new npm package? I'm not sure; this is just a small feature addition.
I'm as busy as a bee building a knowledge database based on it. Using bytewise as the indexer is a good idea.

btw, codec/index.js#upperBound should be "\uffff" instead, to support Unicode characters.
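The upperBound point can be checked with ordinary JavaScript string comparison (UTF-16 code units). This sketch assumes the bound is appended to a prefix and used as an inclusive upper end; it does not reproduce codec/index.js, and keysInRange is a hypothetical name. With '\xff', keys containing characters above U+00FF fall outside the range, while '\uffff' keeps them in.

```javascript
// Keys inside [prefix, prefix + upperBound], compared as JavaScript strings
// (UTF-16 code units).
function keysInRange(keys, prefix, upperBound) {
  const hi = prefix + upperBound
  return keys.filter(function (k) { return k >= prefix && k <= hi })
}

const keys = [
  'foo!bar',
  'foo!\u00e9',      // e with acute accent, U+00E9: below U+00FF
  'foo!\u0101',      // a with macron, U+0101: above U+00FF
  'foo!\u4e2d\u6587' // two CJK characters, far above U+00FF
]
```

In leveldb itself, string keys are compared as UTF-8 bytes, where '\uffff' encodes as EF BF BF; characters outside the Basic Multilingual Plane encode with a leading byte from F0 to F4 and would still sort above that bound.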

@dominictarr
Owner Author

@snowyu this was released as level-sublevel@6. It was a small change, but it still breaks the old way by removing a few features, and internally it is a complete rewrite. I recommend using require('level-sublevel/bytewise'); with that you can use any js object as a key.

@snowyu
Copy link

snowyu commented Nov 9, 2014

That gives me a bit of a headache. What I want is for only some sublevels to use bytewise, not all of them, stored like this:
"/path/OneIndexer/bytewiseEncodedKey".

btw I have already published my branch as level-subkey.
