Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

boolean should be a valid key #76

Open
yathit opened this issue May 2, 2016 · 19 comments
Open

boolean should be a valid key #76

yathit opened this issue May 2, 2016 · 19 comments
Labels
feature request queries TPAC2024 Topic for discussion at TPAC 2024
Milestone

Comments

@yathit
Copy link

yathit commented May 2, 2016

Being only two possible values, boolean data type is not a good key by itself. But query do involve boolean value. For examples, we want to count numbers of records in which a field value is false or true. In this case, full table scan is required. A workaround like shadow field defeat the original argument (that boolean is not a good key).

Boolean value is also useful in compound index. It is not possible now.

Key comparison can be defined that all number values are greater then all boolean values. Should modification could work without breaking existing database.

Indexing boolean value is such a common gotcha in indexeddb usage.

Since boolean is a primitive data type, all other nontrivial data types are valid key, why not boolean? For consistency sake, boolean should be a valid key.

@inexorabletash
Copy link
Member

Thanks for linking to the StackOverflow issue!

This seems reasonable, although we'd need to define sorting precedence. If boolean sorts before number then -Infinity ceases to be the first possible key. Also, I know Gecko's implementation of key serialization is carefully crafted to allow "memcmp" sorting, so I'm not sure how easy it is for them to add a new type.

@sicking
Copy link
Contributor

sicking commented May 5, 2016

As long as bools are "grouped together" it should be fine. I.e. as long as we don't do something crazy like sort true before number and false after.

@dfahlander
Copy link

👍

@nolanlawson
Copy link
Member

CouchDB's collation order is null -> booleans -> numbers -> strings -> arrays -> objects, FWIW.

@jakearchibald
Copy link
Contributor

Hah, I just got tripped up by this when trying to create an index on ['active', 'startDate'], where active is a boolean.

@brettz9
Copy link
Contributor

brettz9 commented Apr 24, 2017

Incidentally, what about undefined or null for that matter too (as for index keypaths set to test for the presence of null or (explicit or even sparse) undefined)?

And slightly off topic but interconnected, what exactly is the reason for the IdentifierName restrictions (such as no whitespace) comprising key paths (and the lack of a mode to escape full stops to allow for targeting any path)? Is there some crucial low-level reason for this, or was the decision made just to avoid foot-guns with inadvertently untrimmed paths?

It seems to me a pity that each point in a cloneable (or at least JSON) object can't, as it appears to me, be targeted for querying (and updating for that matter).

@alastairpatrick
Copy link

I would like to see Boolean become a valid key type in IndexedDB. I have some concerns about where Boolean might appear relative to other key types in the key ordering. I am writing this against the IndexedDB spec dated 8/1/15 but am aware of the IndexedDB 2 draft.

The current IndexedDB key ordering is Number < Date < String < Array. The Booleans could be inserted in one of five places in the existing order. My notes on each option follow.

1. Boolean < Number < Date < String < Array

Currently, the least value that can be used as a key is -Infinity. If Booleans came first instead of Numbers, the least key value would become Boolean false.

More importantly, today a key with array value [1, 2, 3] is immediately followed by [1, 2, 3, -Infinity], meaning there is no key after [1, 2, 3] that comes before [1, 2, 3, -Infinity]. If Booleans were to be the first key type, [1, 2, 3] would instead be immediately followed by [1, 2, 3, false], potentially a breaking change.

Code based on the current specification that uses IDBKeyRange.bound([1, 2, 3], [1, 2, 3, -Infinity], false, true) to find records equal to [1, 2, 3] would come to spuriously yield records equal to and prefixed with [1, 2, 3, false] and [1, 2, 3, true].

The reason such code might not use the simpler IDBKeyRange.only([1, 2, 3]) for this purpose is because of composite key paths. Consider a composite key path ["a", "b"]. Attempting to find all records with "a" equal to [1, 2, 3] with IDBKeyRange.only([1, 2, 3]) would yield no results because, with the value of "b" appended, all records with "a" equal to [1, 2, 3] come after IDBKeyRange.only([1, 2, 3]) in the key ordering. To account for this, the solution is to use a closed upper bound set to the value immediately following [1, 2, 3], which can be calculated using the rules set forth in the specification.

2. Number < Boolean < Date < String < Array:

If Boolean were inserted between Number and Date, the key value immediately following Infinity would change from being Date(-8640000000000000), the earliest representable date, to being Boolean false. Additionally, the key immediately preceding Date(-8640000000000000) would change from being Infinity to being Boolean true.

Attempts by existing code to use ranges like IDBKeyRange.bound(Infinity, new Date(-8640000000000000), false, true) to find records with "a" equal to Infinity in composite keys paths ["a", "b"] would come to yield spurious results.

3. Number < Date < Boolean < String < Array:

If Boolean were inserted between Date and String, the key value immediately following Date(8640000000000000) would change from being the empty String("") to being Boolean false.

Attempts by existing code to use ranges like IDBKeyRange.bound(new Date(8640000000000000), "", false, true) to find records with "a" equal to Date(8640000000000000) in composite keys paths ["a", "b"] would come to yield spurious results.

4. Number < Date < String < Boolean < Array:

If the Booleans were inserted between String and Array, I think things look good. While there are still ranges that will yield different lists of records, there are ways to work with this change to the ordering, at least with respect to its interaction with composite key paths.

Importantly, inserting Booleans between Strings and Arrays does not result in any case where the key that immediately follows some other key is different from the way it was before. The reason is because there would be no string followed immediately by false in the key ordering; rather, the value immediately following any string in the ordering is always that string juxtaposed with the null character '\0'.

One caveat is Internet Explorer / Edge. Neither of these (henceforth IE) implement array keys. A cross-browser workaround is to use String keys with a special array encoding instead of Array keys in IE. If Booleans were inserted between Strings and Arrays, Booleans would follow Arrays (aka Strings) in IE while Booleans would precede Arrays in other browsers.

5. Number < Date < String < Array < Boolean:

As noted above, the value immediately following any array in the ordering is that array appended with the value -Infinity, e.g. [1, 2, 3] is immediately followed by [1, 2, 3, -Infinity]. Similar to Booleans following Strings, this means introducing Boolean after Array would not change the key that immediately followed any other key, and seems workable.

I am wary that where before the ordering terminated with the "infinite" Array keys, the ordering would come to terminate with the finite Boolean keys. This means the greatest key value would come to be a tangible value, specifically Boolean true. This invites code that actually uses Boolean true as a bound that is assumed to be greater than all others, which could impede adding further new key types after the Booleans in the future.

Summary

If Booleans are to become valid keys, as a general rule, I think they should follow one of the “infinite” key types, either String or Array, so that in no case does the key that immediately follows any other key change. This is so that existing key ranges work as anticipated in association with composite key paths. To ensure that the greatest key value does not have a finite representation, I think I prefer to put the Booleans between the Strings and the Arrays. However, if a future IE is expected to support Boolean keys without also supporting Array keys, there might be an argument for having Boolean follow Array, so that existing hacks to support Arrays in IE continue to work consistently in a world with Boolean keys. Of course, I would much prefer IE to support Array keys.

I suggest addressing other potentially indexable key types, such as null, at the same time, to make sure everything plays nice. Changes like this are perilous!

Going forward, this kind of issue, where the ordering of one type of key relative to others is relied upon outside of the IndexedDB implementation, could be avoided if the APIs for querying records took separate ranges for each element of a composite key path rather than a single key range for the whole composite key path.

Here is existence proof of code that would be impacted by the introduction of Boolean key values as described in the examples above. Its vulnerability to the specification of key ordering is clear in the nextUp function, which is at the top of the linked file at the time of writing. I am sure other applications and libraries would be affected too.

@nolanlawson
Copy link
Member

It looks like we can close this issue? Binary keys are now in IDB 2.0, and there is also a definition of the sorting algorithm here:

As a result of the above rules, negative infinity is the lowest possible value for a key. Number keys are less than date keys. Date keys are less than string keys. String keys are less than binary keys. Binary keys are less than array keys. There is no highest possible key value. This is because an array of any candidate highest key followed by another key is even higher.

@inexorabletash
Copy link
Member

This issue is about supporting the boolean types true and false. The binary type is shorthand for ArrayBuffer keys.

@nolanlawson
Copy link
Member

My bad. 😅 Yes, boolean keys would be useful.

@inexorabletash
Copy link
Member

Yeah, would be good to add. No rush to implement on our side.

Good first spec PR for someone once we are done with 2.0 ?

@inexorabletash
Copy link
Member

Anyone still a fan of this idea?

@pwnall
Copy link

pwnall commented Jun 17, 2019

Met with a customer today who wanted to use boolean keys. They ended up using 0/1, and it didn't seem like a big deal.

I'd be most excited about using boolean keys as an opportunity to fix the lack of explicit "smallest possible key" / "largest possible key" markers. Could we make false be the smallest key, make true the largest key, and promise we'll never break this invariant?

@brettz9
Copy link
Contributor

brettz9 commented Jun 17, 2019

I'd be interested in a reply on my earlier comment.

It would seem to me to be of much greater convenience to users if they wouldn't need to refactor, at least when using simple types like booleans (or undefined or null), if moving to IndexedDB or using kv-store. I'd imagine booleans are pretty common in local storage. (A kv-store example uses them here: https://wicg.github.io/kv-storage/#example-backingstore ; it may come as an unwelcome surprise that one can't take advantage of the "indexed" nature behind data so stored even after obtaining the backing store.)

This and IdentifierName restrictions seem to discourage taking advantage of indexes on common data store structures, e.g., those originally as JSON, and data that might not be possible or desirable to change at the source or incur the cost of transforming to (and possibly from) numeric values.

Imho, it would facilitate more indexing if consumers wouldn't need to reshape their data so as to gain the ability to add indexes.

@dfahlander
Copy link

dfahlander commented Jun 18, 2019

It would be natural to allow index for basic types: number, string, array and boolean. It is a little of a surprise for many users that booleans are not indexable. There are use cases also for indexable nulls and undefined as well, particularly when using the index for sorting. But backward compability would break if starting to index undefined or null. And there are many use cases where undefined should not be indexed because its also an optimization to having the possibility to not index every object in the table.

If ever null or undefined where to be indexable, I think this should be an opt-in flag when creating the index. For primary keys (Object stores) these types would not make sense.

Boolean index would not be as ground-breaking as null/undefined imho. IDB2.0 added support for binary keys without an opt-in flag and that did (in theory) break backward compatibility as well.

@inexorabletash inexorabletash modified the milestones: V3, vFuture Sep 20, 2019
@inexorabletash
Copy link
Member

TPAC 2019 Web Apps Indexed DB triage notes:

While interesting, it is not blocking use cases, nor is it a significant performance win, so not something we think anyone will tackle soon. Leaving it open for the future, e.g. when more advanced key paths are tackled.

@shadow-light
Copy link

It could be a significant performance win if you had for example:

{image:Blob, seen:boolean}

If you want to count how many images have been seen you have to clone them all and count manually, which is a pretty huge performance loss. I have this situation already, but fortunately the data is just strings rather than Blobs.

@dumbmatter
Copy link

I have a slow page in my app due to a full table scan to check a boolean field (looking for true values, where ~1% of values are true).

My options are:

  1. Do nothing, let it be slow.
  2. Switch from true/false to 1/0 - is that actually a problem? Kind of. It's a very slow upgrade, since I need to read every single object in the store into memory, and there are a lot of them, and they are large. Bad UX.
  3. Complain here :)

I feel like any argument against supporting booleans here is not valid unless it would also be made against supporting booleans in JavaScript in general. Because they're desired here for the same reasons they are desired anywhere.

But I understand, we can't go back in time and add it to the initial spec. Instead, we need people to spend a non-trivial amount of time to add this tiny little feature that most people will never notice. And I know you all have more important stuff to do. But if I've earned myself any goodwill in the community by writing an IndexedDB library used in thousands of projects, including some made by the major browser vendors... please do something about this issue :)

@SteveBeckerMSFT SteveBeckerMSFT added the TPAC2024 Topic for discussion at TPAC 2024 label Sep 9, 2024
@SteveBeckerMSFT
Copy link

TPAC 2024: We should fix. However, since the fix may require a schema change, it might be most expedient for browsers to wait for additional types to implement like Temporal so we can update multiple types at once.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request queries TPAC2024 Topic for discussion at TPAC 2024
Projects
None yet
Development

No branches or pull requests