Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Homogenise types to int64 #36

Closed
wants to merge 2 commits into from
Closed

Homogenise types to int64 #36

wants to merge 2 commits into from

Conversation

JustinDrake
Copy link
Collaborator

See discussion here #34 (comment)

@djrtwo
Copy link
Contributor

djrtwo commented Oct 3, 2018

aw man, I liked our compact int types!
I'm going to marinate on this until the morning before merging 😆

@JustinDrake
Copy link
Collaborator Author

Do marinate :)

Don't you think it's nice to have a definition/optimisation separation of concerns? Designers need simple, clear, homogeneous definitions. Implementers who need the speed can go ahead and optimise.

@hwwhww hwwhww added the general:RFC Request for Comments label Oct 5, 2018
@mkalinin
Copy link
Collaborator

mkalinin commented Oct 5, 2018

Implementers who need the speed can go ahead and optimise.

Won't it be convenient to make those suffixes reflect real sizes that are used by SSZ encoding?
It would make the spec clearer. On the other side it would allow implementers to decide on the types they use in their structures. Going further, it might make sense to replace int type with uint as int is not supported by SSZ.

Maybe it doesn't make much sense in usage of int8 for SpecialObject.type, it would even be good to have int32 there. But it should be considered that changing committee type from [int24] to [int64] significantly increases size of its byte representation.

What do you think about using int32/64/128 in all cases where it doesn't significantly affect the size of the structure? And continue to keep more precise types like int24 for size-sensible arguments.

@JustinDrake
Copy link
Collaborator Author

JustinDrake commented Oct 5, 2018

it might make sense to replace int type with uint

I agree that uint makes semantic sense pretty much everywhere (a possible exception is balance where penalties could make the balance negative).

it should be considered that changing committee type from [int24] to [int64] significantly increases size of its byte representation

Right. I'm wondering if storing shard_and_committee_for_slots in the crystallised state is overkill and should be avoided. Indeed, shard_and_committee_for_slots is redundant information once you have the required inputs to get_new_shuffling such as randao_mix. cc @djrtwo @vbuterin

What do you think about using int32/64/128 in all cases where it doesn't significantly affect the size of the structure?

Seems reasonable. Note that int128 is current only used for balance and this pull request makes it a int64.

@JustinDrake
Copy link
Collaborator Author

JustinDrake commented Oct 8, 2018

I've reverted the committee indices back to int24, addressing @mkalinin's remark. This should hopefully make the pull request less controversial.

@djrtwo
Copy link
Contributor

djrtwo commented Oct 9, 2018

@JustinDrake regarding storing the shuffling in state. Might we want to be able to serve the ShardAndCommittees to clients without having to force them to run the shuffling alg?

@mkalinin
Copy link
Collaborator

mkalinin commented Oct 9, 2018

@JustinDrake wouldn't be int64 an overkill for shard_id, type and status fields?

@arnetheduck
Copy link
Contributor

From an implementation point of view, I believe there's little room to optimize, should the spec spell out a concrete size for an integer that is "too large" without explicitly specifying the range of valid values.

Implementations are limited in their optimizations by how other clients are guaranteed to behave, and a more precise integer type limits place practical bounds of what other clients can send.

In the situation of loosely specified sizes, implementers are left with two unsavory choices:

  • "Interpret" the spec and put their own, arbitrary constraints on accepted values - essentially what happens is that the "politically strongest" client gets to decide what is acceptable. A similar situation exists with gas in the EVM - everything is 256-bit, but practically, "sensible" gas limits are smaller and implementations simply ignore the problem (stTransactionTest/OverflowGasRequire\.json` // gasLimit > 256 bits tests#227).

  • Forgo optimizations

From a simplicity point of view, as soon as there is any integer of a different size, it is equivalent to there being many of them - the machinery to support multiple integer sizes must be put in place, regardless. For example, keeping everything as int64 helps mitigate alignment issues. If int24 is introduced, we will still need to deal with alignment - likewise, the existence of hash32 means that implementer need to tread with care - as such, I see no noteworthy simplicity gain for implementations from having some fields larger than they need to be.

Another point is that the encoding of fields and their constraints / specification are two orthogonal concerns - for example, if the shard_id is guaranteed to be within the range of an int16, the spec can spell that out, so as to create a safe space for implementers to optimize. What the serialization looks like can at that point be discussed separately - including concerns over uint vs int etc.

If keeping integers simple for designers in the spec is an explicit goal, an alternative is to simply call them integers, spelling out their valid ranges or minimum constraints on what values they must support, instead of bit sizes. This would have a couple of benefits:

  • Implementations are free to use homogeneous or optimized types as they like (ie int in python vs sized types in typed languages)
  • More freedom in choosing / revisiting serialization format as we progress with practical implementation - for example, we could investigate the use varint encodings without requiring the rest of the spec to be updated - which has several benefits from a network and storage efficiency point of view - a key area that client implementers are in a good spot to experiment with as the spec firms up.

@paulhauner
Copy link
Contributor

To echo @mkalinin, I understand the spec still needs to be concerned about integer sizing when it comes to serialization. If this is the case, it seems confusing to call some variable a unit64 then later declare it's a unit16 if you hash it or send it over the wire.

@arnetheduck's comments around either supplying exact uint sizes or replacing them with mini-specs also makes sense.

@djrtwo
Copy link
Contributor

djrtwo commented Oct 30, 2018

This seems at least contentious enough to not merge. @JustinDrake Can you bring it up on our next call for discussion if it is something you would still like to integrate?

@djrtwo djrtwo closed this Oct 30, 2018
@hwwhww hwwhww deleted the homogenised-types branch January 31, 2019 15:54
hwwhww pushed a commit that referenced this pull request Aug 17, 2020
Also update binary output due to metadata change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
general:RFC Request for Comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants