Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialization of bitstype Array is not cross-platform #11135

Open
carnaval opened this issue May 5, 2015 · 22 comments
Open

Serialization of bitstype Array is not cross-platform #11135

carnaval opened this issue May 5, 2015 · 22 comments
Labels
arrays [a, r, r, a, y, s] bug Indicates an unexpected problem or unintended behavior system:32-bit Affects only 32-bit systems

Comments

@carnaval
Copy link
Contributor

carnaval commented May 5, 2015

Serializing a bitstype array is done by copying the memory directly, however the field offsets are set to conform to the local C ABI.
What we should probably do :

  • include a small header specifying the ABI things were written with OR field offsets with bitstype information if we don't want to hardcode knowledge of every ABI everywhere
  • have a fallback "slow" read which goes element by element if the offsets are not the same
  • once it is done we can serialize bitstype with memcpy (the same we do for array of those) instead of each field separatly
@ScottPJones
Copy link
Contributor

I'd agree on your ideas to fix this... I think cross-platform serialization could be important
(I'm not sure what people do in Julia now for this... use some fancy thing like HDF5?)

@nalimilan
Copy link
Member

@ScottPJones Yes, see https://github.com/timholy/HDF5.jl But please use the mailing list for general questions, to keep GitHub issues focused.

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 5, 2015

include a small header

probably a good idea. the current implementation of the serializer pretends to be stateless, so that would need to change first. (searching the github issues list should come up with a few related issues)

@simonster
Copy link
Member

You could also use a staged function to generate efficient code when offsets are not the same. This is basically the approach we take in JLD, where we currently pack everything on disk. Of course the tradeoff is codegen time vs. execution time.

@JeffBezanson
Copy link
Sponsor Member

Has this actually caused a problem, or is it just theoretical at the moment?

I think it is way too complicated to include a header and force all consumers to handle all possible formats. We should have one standard format instead, e.g. make x86_64 the official format, or require alignment padding to be removed (always something to be said for sending as few bytes as possible).

@tkelman
Copy link
Contributor

tkelman commented May 5, 2015

There have been real JLD issues raised on this. JuliaIO/HDF5.jl#209 and a few others.

If we pick a standard format then we need to at least teach every platform to write to it correctly. Maybe somewhat easier than teaching every platform to read every other format correctly.

@simonster
Copy link
Member

@tkelman The problem there is not the ABI (which is not a problem for JLD, because we store everything packed), but that, if a type has an Int field, the type itself differs between 32-bit and 64-bit systems. Handling those kinds of conversion definitely requires type metadata and is important for a data interchange format, but I'm not sure it's within the scope of serialize.

@JeffBezanson
Copy link
Sponsor Member

I think it would be nice to handle Int. Realistically, people will use serialize for more than we intend. In the current serializer it would be easy to support; we can just add a different tag to say whether each Int was a 32-bit Int or a 64-bit Int. Of course that is difficult to adapt to a more compact format for structs.

@carnaval
Copy link
Contributor Author

carnaval commented May 5, 2015

Since we use the base serializer for multiprocess communication, it is also a performance concern since it should be a lot faster (or at least easier to optimize ?) to emit a bits array in native format. In theory we should be able to bring that down to zero-copy by using platform specific stuff ?

It may be easier to have it configurable with some flags (compact/crossplatform/fast/...). But well, this is all vapor until someone steps up and implement something :-)

@ScottPJones
Copy link
Contributor

@JeffBezanson I'd dealt with this a long time ago, and simply packed integers (and scaled decimals and floats), by removing 0s (or 0xffs for negative #s), the output was always little-endian, each value had an indication of it's length... very good for storage, also very compact for transmission (in the days when the "network" could be just a serial line), no need for header... however, not so fast for multiprocess communication (but then, you wouldn't have a cross-platform issue anyway).

@JeffBezanson
Copy link
Sponsor Member

Yes I think we should use a format like that. So far the serializer has only targeted homogeneous clusters.

Actually the serializer is already very close to being able to support mismatched Ints. We just need its setfield call to accept Int32 and Int64 interchangeably (perhaps erroring out on truncation).

@StefanKarpinski StefanKarpinski added this to the 0.6.0 milestone Sep 13, 2016
@JeffBezanson
Copy link
Sponsor Member

See also #1295

@JeffBezanson JeffBezanson modified the milestones: 1.0, 0.6.0 Jan 4, 2017
@Keno Keno modified the milestones: 1.x, 1.0 Jul 20, 2017
@Keno
Copy link
Member

Keno commented Jul 20, 2017

Moving this off the 1.0 milestone. The 1.0 stability guarantee need not cover the serialization format. What we should do however is add a version number header, so we can at least detect compatibility.

@Keno
Copy link
Member

Keno commented Jul 20, 2017

Cross ref #7909

@aplavin
Copy link
Contributor

aplavin commented Mar 2, 2024

Wow, didn't expect this issue to be that long-standing!
Recently noticed an example, reported in #53544.

Can this reasonably be fixed? Presumably, it not only affects explicit serialization, but also running Distributed with workers on another machine.

@oscardssmith
Copy link
Member

wow this is awful. It's definitely fixable, it just likely would be a pain. I think this is one where everyone forgot about it in the rush to 1.0 and never picked it back up.

@oscardssmith oscardssmith added bug Indicates an unexpected problem or unintended behavior arrays [a, r, r, a, y, s] labels Mar 2, 2024
@oscardssmith
Copy link
Member

for a possible smaller MWE, does isbitstype(typeof((a = Int128(2), b = Int128(1),))) return false on M1?

@aplavin
Copy link
Contributor

aplavin commented Mar 2, 2024

true on M2, don't know about M1.

@oscardssmith
Copy link
Member

that's very odd then. Int128 is basically the same as InlineString15 (they're both 128 bit primitive types).
Does

obj = [(a = Int128(1), b = Int128(2),)]
# computer 1:
serialize("file", obj)
# computer 2:
deserialize("file", obj)

reproduce the issue? I am having some trouble figuring out what the difference in representation is. They should have the same size, and same padding (unless the alignment is different?). @vtjnash do you know what would cause this issue to appear for this example?

@vtjnash
Copy link
Sponsor Member

vtjnash commented Mar 2, 2024

I am not sure. I think the new reinterpret code might help though, since it knows how to remove padding bytes. Some type sizes might still be different, but it may reduce the differences.

@aplavin
Copy link
Contributor

aplavin commented Mar 2, 2024

@oscardssmith

  • [(a = Int128(1), b = Int128(2),)] serializes-deserializes just fine
  • [(a = Int128(1), b = Int8(2),)] fails
  • [(a = InlineString15("a"), b = InlineString("b"),)] fails

So, fails when sizes are different I guess.

@oscardssmith
Copy link
Member

ok, that makes a tiny bit of sense. So the difference is where in the union the smaller value is being put.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] bug Indicates an unexpected problem or unintended behavior system:32-bit Affects only 32-bit systems
Projects
None yet
Development

No branches or pull requests