Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast conversion from decimal to bytes #143

Open
dapplion opened this issue Aug 10, 2021 · 8 comments
Open

Fast conversion from decimal to bytes #143

dapplion opened this issue Aug 10, 2021 · 8 comments
Assignees

Comments

@dapplion
Copy link
Contributor

dapplion commented Aug 10, 2021

Lodestar spends significant CPU time converting uint types from decimal to bytes and back. For example, converting the balance tree to flat and back consist of doing 200_000 ops of those.

SSZ is the one hosting the code that does the conversions. This is one of the implementations that participate in this conversions.

struct_serializeToBytes(value: number, output: Uint8Array, offset: number): number {
if (this.byteLength > 6 && value === Infinity) {
for (let i = offset; i < offset + this.byteLength; i++) {
output[i] = 0xff;
}
} else {
let v = value;
const MAX_BYTE = 0xff;
for (let i = 0; i < this.byteLength; i++) {
output[offset + i] = v & MAX_BYTE;
v = Math.floor(v / 256);
}
}
return offset + this.byteLength;
}

struct_deserializeFromBytes(data: Uint8Array, offset: number): number {
this.bytes_validate(data, offset);
let isInfinity = true;
let output = 0;
for (let i = 0; i < this.byteLength; i++) {
output += data[offset + i] * 2 ** (8 * i);
if (data[offset + i] !== 0xff) {
isInfinity = false;
}
}
if (this.byteLength > 6 && isInfinity) {
return Infinity;
}
return Number(output);
}

We should investigate faster alternatives

@g11tech
Copy link
Contributor

g11tech commented Aug 10, 2021

will investigate 👍

@twoeths
Copy link
Contributor

twoeths commented Aug 11, 2021

with ChainSafe/persistent-merkle-tree#52, we can get deserialized number if it contains less than 32 bits (right now we have to convert h0..h7 to Uint8Array and from Uint8Array back to number)

  it.only("get deserialized number from h0", () => {
    const BeaconState = new ContainerType({
      fields: {
        slot: number64Type,
      }
    });
    for (let slot = 2 ** 31 - 100; slot <= 2 ** 31; slot++) {
      const tb = BeaconState.createTreeBackedFromStruct({slot});
      const node = tb.tree.getNode(BigInt(1));
      expect(Math.abs(node.h0)).to.be.equal(slot, "failed at slot " + slot);
    }
  });

it should work with slot, committee index etc. but not with balance

@dapplion
Copy link
Contributor Author

dapplion commented Aug 12, 2021

I've tested using TypedArrays to convert numbers and it's actually slower than the current implementation in as-sha256

  utils
    ✓ hashObjectToByteArray 50023 times                                   407.5760 ops/s    2.453530 ms/op   x1.406       4073 runs   10.0 s
    ✓ byteArrayToHashObject 50023 times                                   401.1899 ops/s    2.492585 ms/op   x1.174       4009 runs   10.0 
/**
 * Pass 8 numbers in an object and set that to inputArray.
 * This function contains multiple same procedures but we intentionally
 * do it step by step to improve performance a bit.
 **/
export function hashObjectToByteArray(obj, byteArr, offset) {
  const uint32 = new Uint32Array(byteArr.buffer);

  uint32[0] = obj.h0;
  uint32[1] = obj.h1;
  uint32[2] = obj.h2;
  uint32[3] = obj.h3;
  uint32[4] = obj.h4;
  uint32[5] = obj.h5;
  uint32[6] = obj.h6;
  uint32[7] = obj.h7;
}

/**
 * Parse outputArray into an object of 8 numbers.
 * This is the order that makes Uint32Array the same to Uint8Array
 * This function contains multiple same procedures but we intentionally
 * do it step by step to improve performance a bit.
 **/
export function byteArrayToHashObject(byteArr) {
  const uint32 = new Uint32Array(byteArr.buffer);

  return {
    h0: uint32[0],
    h1: uint32[1],
    h2: uint32[2],
    h3: uint32[3],
    h4: uint32[4],
    h5: uint32[5],
    h6: uint32[6],
    h7: uint32[7],
  };
}

What makes it so slow is the constructor Uint32Array. It takes 90% of the CPU time of that function. Then setting or reading numbers is very fast.

@g11tech
Copy link
Contributor

g11tech commented Aug 20, 2021

@dapplion just to confim, the endianness in ssz serialization is little?

@dapplion
Copy link
Contributor Author

@dapplion just to confim, the endianness in ssz serialization is little?

Yes it's little endian

@g11tech
Copy link
Contributor

g11tech commented Aug 22, 2021

  1. as first step, it seems that dataview based to_bytes is at least 2 times faster than the current to_bytes loop for >32 bit value for unit64 and 3.5 times better than <32 bit value for unit64
    however when i plugged it in the speedup was just 2.5%-3.5% (another 2.5%-3.5% can potentially come in the reverse leg from bytes to number so may be 5-7% overall with to bytes optimization).

image

  1. Will continue work on the other side leg (bytes to number) and corroborate the above hypo of potential 7% speedup.
  2. while figuring out the optimization and code flow, established other points for potential optimization regarding the use of dataview but will do further code drilldown before establishing the viability of this scenario.
  3. as discussed before, another potential optimization in making merkelization multicore optimized. will tackle that in separate PR.

@dapplion
Copy link
Contributor Author

@g11tech Thanks! Keep us updated

@g11tech
Copy link
Contributor

g11tech commented Aug 29, 2021

@dapplion got 88% speedup on normal uint tree backed (will need to check on > 2^32 number increment test but should be 30% plus over there), marginal 4% better performance on the new uint64 tree backed. will continue checking this further a bit more.

before opt
image

after opt:
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants