-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: incremental-hasher #261
base: master
Are you sure you want to change the base?
Changes from 4 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -1,4 +1,5 @@ | ||||||||||||||||||||||||||
// # Multihash | ||||||||||||||||||||||||||
import type { MulticodecCode } from '../block/interface.js' | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
/** | ||||||||||||||||||||||||||
* Represents a multihash digest which carries information about the | ||||||||||||||||||||||||||
|
@@ -9,7 +10,7 @@ | |||||||||||||||||||||||||
// a bunch of places that parse it to extract (code, digest, size). By creating | ||||||||||||||||||||||||||
// this first class representation we avoid reparsing and things generally fit | ||||||||||||||||||||||||||
// really nicely. | ||||||||||||||||||||||||||
export interface MultihashDigest<Code extends number = number> { | ||||||||||||||||||||||||||
export interface MultihashDigest<Code extends MulticodecCode = MulticodecCode, Size extends number = number> { | ||||||||||||||||||||||||||
/** | ||||||||||||||||||||||||||
* Code of the multihash | ||||||||||||||||||||||||||
*/ | ||||||||||||||||||||||||||
|
@@ -23,7 +24,7 @@ export interface MultihashDigest<Code extends number = number> { | |||||||||||||||||||||||||
/** | ||||||||||||||||||||||||||
* byte length of the `this.digest` | ||||||||||||||||||||||||||
*/ | ||||||||||||||||||||||||||
size: number | ||||||||||||||||||||||||||
size: Size | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
/** | ||||||||||||||||||||||||||
* Binary representation of this multihash digest. | ||||||||||||||||||||||||||
|
@@ -35,7 +36,7 @@ export interface MultihashDigest<Code extends number = number> { | |||||||||||||||||||||||||
* Hasher represents a hashing algorithm implementation that produces as | ||||||||||||||||||||||||||
* `MultihashDigest`. | ||||||||||||||||||||||||||
*/ | ||||||||||||||||||||||||||
export interface MultihashHasher<Code extends number = number> { | ||||||||||||||||||||||||||
export interface MultihashHasher<Code extends MulticodecCode = MulticodecCode> { | ||||||||||||||||||||||||||
/** | ||||||||||||||||||||||||||
* Takes binary `input` and returns it (multi) hash digest. Return value is | ||||||||||||||||||||||||||
* either promise of a digest or a digest. This way general use can `await` | ||||||||||||||||||||||||||
|
@@ -67,6 +68,67 @@ export interface MultihashHasher<Code extends number = number> { | |||||||||||||||||||||||||
* `SyncMultihashHasher` is useful in certain APIs where async hashing would be | ||||||||||||||||||||||||||
* impractical e.g. implementation of Hash Array Mapped Trie (HAMT). | ||||||||||||||||||||||||||
*/ | ||||||||||||||||||||||||||
export interface SyncMultihashHasher<Code extends number = number> extends MultihashHasher<Code> { | ||||||||||||||||||||||||||
export interface SyncMultihashHasher<Code extends MulticodecCode = MulticodecCode> extends MultihashHasher<Code> { | ||||||||||||||||||||||||||
digest: (input: Uint8Array) => MultihashDigest<Code> | ||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
/** | ||||||||||||||||||||||||||
* Incremental variant of the `MultihashHasher` that can be used to compute | ||||||||||||||||||||||||||
* digest of the payloads that would be impractical or impossible to load all | ||||||||||||||||||||||||||
* into a memory. | ||||||||||||||||||||||||||
*/ | ||||||||||||||||||||||||||
export interface IncrementalMultihashHasher< | ||||||||||||||||||||||||||
Code extends MulticodecCode, | ||||||||||||||||||||||||||
Size extends number, | ||||||||||||||||||||||||||
Digest = MultihashDigest<Code, Size> | ||||||||||||||||||||||||||
> { | ||||||||||||||||||||||||||
/** | ||||||||||||||||||||||||||
* Size of the digest this hasher produces. | ||||||||||||||||||||||||||
*/ | ||||||||||||||||||||||||||
size: Size | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
/** | ||||||||||||||||||||||||||
* Code of the multihash | ||||||||||||||||||||||||||
*/ | ||||||||||||||||||||||||||
code: Code | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
/** | ||||||||||||||||||||||||||
* Name of the multihash | ||||||||||||||||||||||||||
*/ | ||||||||||||||||||||||||||
name: string | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
/** | ||||||||||||||||||||||||||
* Number of bytes that were consumed. | ||||||||||||||||||||||||||
*/ | ||||||||||||||||||||||||||
count(): bigint | ||||||||||||||||||||||||||
Comment on lines
+100
to
+103
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Let's just drop this method, we can revisit if we find it really necessary. |
||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
/** | ||||||||||||||||||||||||||
* Returns multihash digest of the bytes written so far. Should not have | ||||||||||||||||||||||||||
* side-effects, meaning you should be able to write some more bytes and | ||||||||||||||||||||||||||
* call `digest` again to get the digest for all the bytes written from | ||||||||||||||||||||||||||
* creation (or from reset) | ||||||||||||||||||||||||||
*/ | ||||||||||||||||||||||||||
digest(): Digest | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
/** | ||||||||||||||||||||||||||
* Computes the digest of the given input and writes it into the provided | ||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Do you mean "of the bytes written so far"? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||||||||||
* `output` at given `offset`. The `offset` is optional and if omitted is | ||||||||||||||||||||||||||
* implicitly `0`. The optional `asMultihash` whith is implicitly `true`, | ||||||||||||||||||||||||||
* can be use to control whether multihash prefix is written, if `false` | ||||||||||||||||||||||||||
* only the raw digest writtend omitting the prefix. | ||||||||||||||||||||||||||
*/ | ||||||||||||||||||||||||||
digestInto(output: Uint8Array, offset?: number, asMultihash?: boolean): this | ||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Return
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure if there's a better convention for method name when receiving a parameter to mutate.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I went into this in the linked issue. In practice I have encountered many instances where I do need to leave out the prefix, I could go and use another library in those instances, but seems like we could just expose this. That sayid I agree that this method is kind of meh and we could do better. @vasco-santos suggested having a whole another method, personally I'm wondering if perhaps we should have methods just to get digest without a prefix as this is low level API anyway ?
Suggested change
Or alternatively something like this
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'm happy to call this something, else I was trying to align with
Works for me although I had to google to figure out what Alternatively we could just have There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I personally find mutations that return values misleading, that said I'm amendable to the idea There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
BYOB was aiming for familiarity with https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamBYOBReader etc. but I guess that was a miss 😆 |
||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
/** | ||||||||||||||||||||||||||
* Writes bytes to be digested. | ||||||||||||||||||||||||||
*/ | ||||||||||||||||||||||||||
write(bytes: Uint8Array): this | ||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Typically in streaming hashers this is called There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm fine with calling it update although I do find that name confusing personally as I think of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, but streaming hashers aren't appending to a buffer they are updating their internal state with the new data you pass. |
||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
/** | ||||||||||||||||||||||||||
* Resets this hasher to its initial state. Can be used to recycle this | ||||||||||||||||||||||||||
* instance. It resets `count` and and discards all the bytes that were | ||||||||||||||||||||||||||
* written prior. | ||||||||||||||||||||||||||
*/ | ||||||||||||||||||||||||||
reset(): this | ||||||||||||||||||||||||||
} | ||||||||||||||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean if someone is hashing >9PiB of data in JS then 👏👏👏.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah ... is this overkill?