Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unions of Complex Objects #174

Closed
cisaacson opened this issue May 7, 2018 · 2 comments
Closed

Unions of Complex Objects #174

cisaacson opened this issue May 7, 2018 · 2 comments
Labels

Comments

@cisaacson
Copy link

cisaacson commented May 7, 2018

We are using avsc to serialize/deserialize to Avro objects using NodeJS and C++, your library works very well. I am working only on the NodeJS side so far. We have a need to support unions of record types. I read another issue that suggested using either Wrapped types, or a custom LogicalType. I have a couple of questions related to the best approach for this:

  • The Wrapped union works, we did something like the example below. The wrappedAvro contains our specialized LogicalType implementations. I noticed that when we use wrapped unions, the fromBuffer method also returns wrapped unions. First question:

If we used wrapped unions, is that just internal to avsc related code in Javascript? Would a C++ Avro implementation using codegen be able to read the normal union serialization?

  • The LogicalType looks like a better way to go in the long run. We understand the concept of including a type field to serve the purpose of the Wrapped union property name, however it still looks like it requires a wrapped union (the type just helps to do that). The issue here is that Avro C++ doesn't support LogicalTypes officially yet (there is a very new patch someone submitted, it's not merged yet). We are willing to try this patch out, but it would be good to have your input. Would you recommend we switch to LogicalTypes for unions? Or should we just stick with the wrapped unions (since the example I saw used wrapped unions along with the LogicalType implementation)? You can see that we are using the built-in logical types (decimal, etc.), not sure if that works in C++ or not, so we may need to try the new patch in any case as our application requires these types to be supported.

Thanks in advance for any feedback, great job on the module.

`
const avro = require('avsc');

interface A {
amount: number,
dec_amount: Decimal
}

interface B {
user_name: string,
email: string
}

interface C {
a_or_b: {A: A}|{B: B}
}

const schemaA: any = {
name: 'A',
type: 'record',
fields: [
{name: 'amount', type: 'int'},
{name: 'dec_amount', type: {type: 'bytes', logicalType: 'decimal', precision: 9, scale: 2}}
]
}

const schemaB: any = {
name: 'B',
type: 'record',
fields: [
{name: 'user_name', type: 'string'},
{name: 'email', type: 'string'}
]
}

const avroTypeA: any = avroWrapper.forSchema(schemaA);
const avroTypeB: any = avroWrapper.forSchema(schemaB);

const schemaC: any = {
name: 'C',
type: 'record',
fields: [
{
name: 'a_or_b',
type: [ schemaA, schemaB]
}
]
}

const avroTypeC: any = avroWrapper.forSchema(schemaC);

const dataA: A = {
amount: 5,
dec_amount: new Decimal(12.0)
}

const dataB: B = {
user_name: 'joe',
email: '[email protected]'
}

const dataC_A: C = {
a_or_b: {A: dataA}
}

const dataC_B: C = {
a_or_b: {B: dataB}
}

const bufC_B: Buffer = avroTypeC.toBuffer(dataC_B);
const bufC_A: Buffer = avroTypeC.toBuffer(dataC_A);

const dataC_BResult: C = avroTypeC.fromBuffer(bufC_B);
`

@mtth
Copy link
Owner

mtth commented May 8, 2018

Hey @cisaacson,

If we used wrapped unions, is that just internal to avsc related code in Javascript? Would a C++ Avro implementation using codegen be able to read the normal union serialization?

Yes to both questions. Unwrapped and wrapped unions are avsc concepts, introduced to work with JavaScript's type system. In particular they have identical encodings, so the C++ implementation is able to decode both (in fact, it won't even know which JavaScript representation was used).

Would you recommend we switch to LogicalTypes for unions? Or should we just stick with the wrapped unions (since the example I saw used wrapped unions along with the LogicalType implementation)?

I think both options are valid but have slightly different trade-offs; for example:

  • Logical types are convenient but can also introduce complexity (the user has to know what data is valid for each logical type) and, as you mention, might not be supported by all implementations (C++ code might end up looking very different from the JavaScript equivalent). Note however that the Avro spec mandates that all implementations ignore unsupported logical types (falling back to the underlying type rather than throw an exception), so in theory (I haven't used the C++ implementation much) you don't have to wait for C++ support to use them in JavaScript.
  • Logical types have a performance cost (more significant when used inside unions); so decoding and encoding will be faster using vanilla unwrapped unions.

It's hard for me to say much without knowing more about your use-case; but I would recommend keeping things as simple as possible: starting with wrapped unions and introducing logical types if/when you have an explicit need for them (e.g. when you are certain the convenience outweighs the complexity).

@mtth mtth added the question label May 8, 2018
@cisaacson
Copy link
Author

cisaacson commented May 8, 2018

Matthiue, thanks for the quick response.

Unwrapped and wrapped unions are avsc concepts, introduced to work with JavaScript's type system.
Brilliant, based on your explanation I don't see any need to use LogicalTypes for our use case.

To make things easier for finding the type to wrap I added a field like this to each record that occurs in ambiguous unions:

boolean _type_A = true;

Then it's in the record itself, and can easily be found searching for >= '_type' in the Object.keys array. This approach may be helpful for others as well.

The avsc library is really great work, it's well designed and very impressive!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants