Files created using avro.Type.forValue can't be opened on other libraries #108

Closed
ronaldocpontes opened this issue May 17, 2017 · 4 comments

@ronaldocpontes

On Google Data Cloud:

Errors:
file-00000000: The Apache Avro library failed to parse the header with the follwing error: Missing Json field "name": {"fields":[{"name":"city","type":"string"},{"name":"zipCodes","type":{"items":"string","type":"array"}},{"name":"visits","type":"int"}],"type":"record"} (error code: invalid)

avro-tools says:

Exception in thread "main" org.apache.avro.SchemaParseException: No name in schema: {"type":"record","fields":[{"name":"city","type":"string"},{"name":"zipCodes","type":{"type":"array","items":"string"}},{"name":"visits","type":"int"}]}

On Python:

avro.schema.SchemaParseException: Named Schemas must have a non-empty name.

Steps to reproduce:

const avro = require('avsc');

const file = './example.avro';

const type = avro.Type.forValue({
  city: 'Cambridge',
  zipCodes: ['02138', '02139'],
  visits: 2
});

function readFromFile() {
  let i = 1;
  avro.createFileDecoder(file)
    .on('metadata', function (type) {
        console.log('metadata', type);
        console.log();
     })
    .on('data', function (record) {
        console.log('record', i++, ':', record);
    });
}

const opts =  {};
const encoder = avro.createFileEncoder(file, type, opts);
encoder.write({city: 'Seattle', zipCodes: ['98101'], visits: 3});
encoder.end(
    {city: 'New York', zipCodes: ['10001'], visits: 47},
    readFromFile
);

Open the file using Python:

pip install avro
python readAvroFile.py ./example.avro

# readAvroFile.py
# Requires the Python avro library
# - pip install avro
from avro.datafile import DataFileReader
from avro.io import DatumReader
from optparse import OptionParser

parser = OptionParser()
(options, args) = parser.parse_args()

avroFile = args[0]

# print() calls so the script also runs on Python 3.
print()
print('\n\nReading', avroFile)
print('--------------------------------------')
print()


reader = DataFileReader(open(avroFile, "rb"), DatumReader())
for record in reader:
    print(record)
reader.close()

@ronaldocpontes
Author

ronaldocpontes commented May 18, 2017

Using a schema with a name fails with a different message:

python readAvroFile.py ./example.avro

...
avro.schema.SchemaParseException: Type property "{u'symbols': [u'CAT', u'DOG'], u'type': u'enum'}"
not a valid Avro schema: Named Schemas must have a non-empty name.

Code used to write the file:

const avro = require('avsc');

const file = './example.avro';

const type = avro.Type.forSchema({
  type: 'record',
  namespace: 'catalog',
  name: 'pets',
  doc: "Pets catalog",
  fields: [
    {name: 'kind', type: {type: 'enum', symbols: ['CAT', 'DOG']}},
    {name: 'name', type: 'string'}
  ]
});

function readFromFile() {
  let i = 1;
  avro.createFileDecoder(file)
    .on('metadata', function (type) {
        console.log('metadata', type);
        console.log();
     })
    .on('data', function (record) {
        console.log('record', i++, ':', record);
    });
}

const opts =  {};

const encoder = avro.createFileEncoder(file, type, opts);
encoder.write({kind: 'CAT', name: 'Catalberto'});
encoder.write({kind: 'DOG', name: 'Dogoberto'});
encoder.end(
    {kind: 'DOG', name: 'Woof'},
    readFromFile
);

@mtth
Owner

mtth commented May 18, 2017

You'll need to add a name to your named types (i.e. enum, fixed, and record). avsc supports anonymous types as an extension but most other implementations don't. (Note that this is the same extension that allows Type.forValue to work, since there isn't a robust way of figuring out what a correct name would be.)
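To see which types in a schema are still anonymous, a small check along these lines may help. It is only a sketch: `findAnonymousTypes` is a hypothetical helper written for this thread, not part of avsc, and it only walks record fields, array items, map values, and union branches of a plain schema object. The two sample schemas are the ones from the reports above.

```javascript
// Hypothetical helper (not an avsc API): returns the locations of named types
// (record, enum, fixed) that have no `name` attribute.
function findAnonymousTypes(schema, path = '$') {
  const found = [];
  if (Array.isArray(schema)) {
    // A union: check each branch.
    schema.forEach((branch, i) => {
      found.push(...findAnonymousTypes(branch, `${path}[${i}]`));
    });
    return found;
  }
  if (schema === null || typeof schema !== 'object') {
    return found; // A primitive name or a reference to a previously defined type.
  }
  if (['record', 'enum', 'fixed'].includes(schema.type) && !schema.name) {
    found.push(`${path} (${schema.type})`);
  }
  if (typeof schema.type === 'object') {
    // A wrapped schema such as {type: {type: 'array', ...}}.
    found.push(...findAnonymousTypes(schema.type, path));
  }
  for (const field of schema.fields || []) {
    found.push(...findAnonymousTypes(field.type, `${path}.${field.name}`));
  }
  if (schema.items) {
    found.push(...findAnonymousTypes(schema.items, `${path}.items`));
  }
  if (schema.values) {
    found.push(...findAnonymousTypes(schema.values, `${path}.values`));
  }
  return found;
}

// Schema inferred by Type.forValue in the first report (no record name).
const inferredSchema = {
  type: 'record',
  fields: [
    {name: 'city', type: 'string'},
    {name: 'zipCodes', type: {type: 'array', items: 'string'}},
    {name: 'visits', type: 'int'}
  ]
};

// Schema from the second report (record is named, enum is not).
const petsSchema = {
  type: 'record',
  namespace: 'catalog',
  name: 'pets',
  fields: [
    {name: 'kind', type: {type: 'enum', symbols: ['CAT', 'DOG']}},
    {name: 'name', type: 'string'}
  ]
};

const inferredMissing = findAnonymousTypes(inferredSchema);
const petsMissing = findAnonymousTypes(petsSchema);
console.log(inferredMissing);
console.log(petsMissing);
```

In the second schema the record itself is named, which would explain why the Python error points at the enum instead.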

There are several ways to generate names; I'll suggest two:

  • The first, which gives you the most control, is to specify the schema yourself
    and instantiate types using Type.forSchema instead (you can also set its
    noAnonymousTypes option to ensure that your schema isn't missing any names):

    const type = avro.Type.forSchema({
      type: 'record',
      name: 'Trip', // Explicit record name.
      fields: [
        {name: 'city', type: 'string'},
        {name: 'zipCodes', type: {type: 'array', items: 'string'}},
        {name: 'visits', type: 'int'}
      ]
    }, {noAnonymousTypes: true}); // Ensure all types have names.

    Note that you don't have to type the entire schema from scratch: you can use the output of Type.forValue(/* ... */).schema() as a starting point (see this section for a bit more context).

  • Another option is to auto-generate placeholder names, which you can do using
    a type-hook (which is a valid option both for Type.forSchema and
    Type.forValue):

    /** Returns a function to be used as type hook option to auto-generate names. */
    function createNamingHook() {
      let index = 0;
      return function (schema) {
        switch (schema.type) {
          case 'enum':
          case 'fixed':
          case 'record':
            schema.name = `Auto${index++}`;
            break;
          default:
        }
      };
    }
    
    // Sample usage from a value:
    const type1 = avro.Type.forValue({
      city: 'Cambridge',
      zipCodes: ['02138', '02139'],
      visits: 2
    }, {typeHook: createNamingHook()}); // Note the hook.
    
    // Sample usage from a schema:
    const type2 = avro.Type.forSchema({
      type: 'record', // No record name.
      fields: [
        {name: 'kind', type: {type: 'enum', symbols: ['CAT', 'DOG']}}, // No enum name.
        {name: 'name', type: 'string'}
      ]
    }, {typeHook: createNamingHook()}); // Idem.

    Both types above will have all their names populated (with AutoXX values) and be compatible with all Avro-compliant libraries.
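To preview what such a hook would produce without writing a file, the rough sketch below applies it to a plain schema object. Two assumptions to flag: `applyHook` is a hypothetical hand-written traversal, not avsc's parser, so the order in which avsc visits types (and therefore the exact AutoN numbering) may differ; and the hook here is a variant that skips schemas which already have a name, which the original hook does not do.

```javascript
/** Variant of the naming hook above; assumption: existing names are left untouched. */
function createNamingHook() {
  let index = 0;
  return function (schema) {
    switch (schema.type) {
      case 'enum':
      case 'fixed':
      case 'record':
        if (!schema.name) {
          schema.name = `Auto${index++}`;
        }
        break;
      default:
    }
  };
}

// Hypothetical stand-in for the library's traversal: calls `hook` on every
// nested schema object (parents before children) and returns the schema.
function applyHook(schema, hook) {
  if (Array.isArray(schema)) {
    schema.forEach((branch) => applyHook(branch, hook)); // Union branches.
    return schema;
  }
  if (schema === null || typeof schema !== 'object') {
    return schema; // Primitive name or type reference: nothing to rename.
  }
  hook(schema);
  if (typeof schema.type === 'object') {
    applyHook(schema.type, hook);
  }
  (schema.fields || []).forEach((field) => applyHook(field.type, hook));
  if (schema.items) {
    applyHook(schema.items, hook);
  }
  if (schema.values) {
    applyHook(schema.values, hook);
  }
  return schema;
}

const anonymousSchema = {
  type: 'record', // No record name.
  fields: [
    {name: 'kind', type: {type: 'enum', symbols: ['CAT', 'DOG']}}, // No enum name.
    {name: 'name', type: 'string'}
  ]
};

// Work on a deep copy so the original schema object is not mutated.
const namedSchema = applyHook(
  JSON.parse(JSON.stringify(anonymousSchema)),
  createNamingHook()
);
console.log(JSON.stringify(namedSchema, null, 2));
```

The printed schema can then be checked by eye, or pasted into another Avro implementation, before relying on the generated names.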

@ronaldocpontes
Author

Thanks @mtth, very useful information.

Ended up using Type.forSchema and it worked fine.

@mtth
Owner

mtth commented May 23, 2017

Great, thanks for the update!
