
Rollup v2 #225 (Open)

hath995 wants to merge 5 commits into master
Conversation

@hath995 hath995 commented Mar 12, 2019

Following up from #224, this is the more minimal refactor. It did not improve the rolled-up size as much as I hoped (95k vs 68k, but either way is better than browserify's 206k); however, it was sufficient that I could actually bundle it with my project. Now I'm unblocked to use avsc without making a separate browserify bundle.

One of the big issues was the crypto library: rollup-plugin-node-builtins apparently has trouble with the crypto module. I tried out md5.js, but it had a different packaging issue caused by a circular dependency. jshashes is entirely self-contained and was bundle-able.

Maybe someone else will find this useful.

@mtth mtth (Owner) left a comment

Hi @hath995, thanks again for the PR and the great description! IIUC the main issue preventing you from using rollup is the crypto package. There are a few other things but I believe their impact is minor (e.g. ~10% increase in bundle size). If so, I think there is a better path forward than merging this PR:

  1. Switching from avsc to @avro/types. This should be a straightforward port from avsc if you are only using the types part of it. Two main differences are that it doesn't include a fingerprint method on types (which lets us remove the crypto dependency) and the from/toString methods are now replaced by the more natural from/toJSON.
  2. Requiring this updated module and bundling it from your code. In case it's helpful, I uploaded a working example gist.

Let me know what you think.
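Step 2 above might look roughly like the following hypothetical rollup config in the client project. This is a sketch only; the entry path and plugin choices are assumptions typical of a 2019 rollup setup, not taken from mtth's gist.

```javascript
// Hypothetical rollup.config.js in the client project (not from the gist);
// plugin names and options are assumptions, not verified against it.
import resolve from 'rollup-plugin-node-resolve';
import commonjs from 'rollup-plugin-commonjs';
import { terser } from 'rollup-plugin-terser';

export default {
  input: 'src/index.js',          // client entry that requires @avro/types
  output: { file: 'dist/bundle.js', format: 'iife', name: 'app' },
  plugins: [
    resolve({ browser: true }),   // resolve npm dependencies for the browser
    commonjs(),                   // convert CommonJS modules to ES modules
    terser()                      // optional: clients who don't want it can drop it
  ]
};
```

Keeping this in client code (rather than the core package) is what allows each client to tweak these options, per the points below.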

@@ -99,17 +97,17 @@ function Type(schema, opts) {
   name = qualify(name, namespace);
   if (isPrimitive(name)) {
     // Avro doesn't allow redefining primitive names.
-    throw new Error(f('cannot rename primitive type: %j', name));
+    throw new Error(`cannot rename primitive type: ${JSON.stringify(name)}`);
@mtth (Owner):

I assume you changed this to remove the dependency on util. However, I don't think the downsides (an extra dependency on inherits, and loss of compatibility with node <1) are worth the benefit: the bundle size drops only by 10%, from 78kB to 70kB (23kB to 20kB after gzip). It also seems relatively likely that clients depend on util for other things, in which case this change could actually make their bundle size go up (they end up having to include both util and inherits).

 function getHash(str) {
-  return hash.read();
+  var hash = new md5();
+  return Buffer.from(hash.hex(str), "hex");
@mtth (Owner):

Related to a comment above, this requires node 4. (utils has a simple workaround for this one: utils.bufferFrom.)
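The utils.bufferFrom workaround is presumably a shim along these lines (a hedged sketch, not the actual avsc code): Buffer.from(string, encoding) only exists on node >= 4, so older versions need the legacy constructor.

```javascript
// Hedged sketch of a Buffer.from fallback for old node; the real
// utils.bufferFrom in avsc may differ. The Uint8Array.from comparison
// guards against old node versions where Buffer.from was inherited.
function bufferFrom(data, encoding) {
  if (typeof Buffer.from === 'function' && Buffer.from !== Uint8Array.from) {
    return Buffer.from(data, encoding);
  }
  return new Buffer(data, encoding); // legacy path for node < 4
}

console.log(bufferFrom('64', 'hex')[0]); // 100 (0x64)
```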

@@ -0,0 +1,29 @@
const rollup = require('rollup');
@mtth (Owner):

It seems better to move this logic outside of the core package, either in a dedicated one (with a peer dependency on @avro/types) or in client code:

  • It reduces bloat in the main package; most clients don't use rollup.
  • It allows clients to configure rollup options (e.g. they might not want to use terser).
  • IIUC there are also benefits to rolling up more packages together, so it's counter-productive to bundle just this one separately.

@hath995 (Author) Mar 17, 2019:

I never meant to actually commit this file (or the updated dependencies in package.json); it was there so that you could run the config I was using. I planned to remove it for any final PR.

@@ -446,7 +444,7 @@ Type.isType = function (/* any, [prefix] ... */) {
 };

 Type.__reset = function (size) {
-  debug('resetting type buffer to %d', size);
+  //debug('resetting type buffer to %d', size);
@mtth (Owner):

Revert this change?

@hath995 (Author):

I removed these because debug was another node library that I didn't want to include in my bundle if I could avoid it; it caused further bundling problems, so I removed the calls.


hath995 commented Mar 17, 2019

I think I misunderstood why you recommended @avro/types initially. It is closer to what I wanted, but it had the same rollup issues, and the TypeScript types are not included in the @avro/types distribution. My entire project (client and server) is in TypeScript, so having the types is quite helpful. I wasn't familiar with Avro before trying out this library, and I'm still experimenting with whether binary transport is a useful addition or not.

The Avro JSON schema format was a bit confusing, but thankfully the included TS types eventually guided me to a correct description.

I'm trying to marshal a recursive representation of a virtual DOM with a type similar to this:

export interface VNode {
    sel: string | undefined;
    data: VNodeData | undefined;
    children: Array<VNode | string> | undefined;
    text: string | undefined;
    key: Key | undefined;
}
export interface VNodeData {
    props?: Props;
    attrs?: Attrs;
    class?: Classes;
    style?: VNodeStyle;
    [key: string]: any;
}
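For reference, a hedged sketch of how this recursive VNode shape might map onto an Avro schema. The field types here are assumptions: VNodeData is simplified to a string map (Avro records cannot carry arbitrary extra keys), Key is assumed to be a string, and the undefined-able fields become nullable unions.

```javascript
// Illustrative (not from the PR) recursive Avro schema for the VNode shape.
// The named record 'VNode' is referenced by name inside its own children
// field, which is how Avro expresses recursive types.
const vnodeSchema = {
  name: 'VNode',
  type: 'record',
  fields: [
    { name: 'sel', type: ['null', 'string'] },
    { name: 'data', type: ['null', { type: 'map', values: 'string' }] },
    { name: 'children', type: ['null', { type: 'array', items: ['VNode', 'string'] }] },
    { name: 'text', type: ['null', 'string'] },
    { name: 'key', type: ['null', 'string'] }
  ]
};
```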

https://jsperf.com/avsc-decode-coupon/12
http://aaronelligsen.com/pageDom.js (a copy of the data I'm testing)

It seems like on a large, complicated data structure like this, avsc takes a bit longer. It also only reduced the file size by ~26%, whereas compression (with snappy) had a much bigger impact (nearly 80%). However, snappy requires a buffer, so either way I need to convert the JSON to a buffer or encode it with Avro.


hath995 commented Mar 17, 2019

A somewhat unrelated question about union types in Avro/avsc: I'm using a TypeScript feature called discriminated unions heavily, and Avro seems to support this. However, it seems I need to box these objects before avsc will encode them? Is it possible, with some more effort, to encode the example data value directly rather than data2, since the objects are already distinguished by the kind property? Also, let's say I receive several events of these types. Since they are already in binary form, is there a mechanism in avsc to combine them into another binary record like updatesAvro without decoding and then re-encoding them? To save a stream of events, for example.

interface IChangeMeta {
  timeOffset: number;
  updateId?: number;
}
type IPosition = {x: number, y: number};
type IMouseMoveEvent = { kind: "mouseMove", pos: IPosition } & IChangeMeta;
type IPushStateEvent = { kind: "pushstate", url: string } & IChangeMeta;

var reg = { registry: {}};
var MouseMoveAvro = avro.Type.forSchema({
  name: "mouseMove",
  type: "record",
  fields: [
    {name: "kind", type: {type: "enum", name: "mouseMoveEnum", symbols: ["mouseMove"]}},
    {name: "pos", type: {name: "IPosition", type: "record", fields: [
      {name: "x", type: "float"},
      {name: "y", type: "float"}
    ]}},
    {name: "updateId", type:"int"},
    { name: "timeOffset", type: "int" }
  ]
}, reg);

var PushStateAvro = avro.Type.forSchema({
  name: "pushstate",
  type: "record",
  fields: [
    {name: "kind", type: {type: "enum", name: "pushstateEnum", symbols: ["pushstate"]}},
    {name:"url", type:"string"},
    {name: "updateId", type:"int"},
    { name: "timeOffset", type: "int" }
  ]
},reg);

var updatesAvro = avro.Type.forSchema({
  name: "updates",
  type: "array",
  items: ["pushstate","mouseMove"]
}, reg);

var data = [{
  kind: 'pushstate',
  url: 'https://www.yelp.com/search?find_desc=delivery&find_loc=East%20Solano%20Ave%2C%20Berkeley%2C%20CA&ytp_st=delivery_default&attrs=PlatformDelivery%2CRestaurantsPriceRange2.3%2CRestaurantsPriceRange2.2&l=p%3ACA%3ABerkeley%3A%3ADowntown_Berkeley&ed_attrs=RestaurantsPriceRange2.4',
  updateId: 137,
  timeOffset: 12350
},
{
  kind: 'mouseMove',
  updateId: 16,
  pos: { x: 262, y: 152 },
  timeOffset: 3490
  }];

var data2 = [{ pushstate: data[0] }, { mouseMove: data[1] }]
updatesAvro.toBuffer(data2)

http://www.typescriptlang.org/play/index.html

mtth commented Mar 21, 2019

However, it seems I need to box these objects before avsc will encode them? Is it possible, with some more effort, to encode the example data value directly rather than data2, since the objects are already distinguished by the kind property?

Right, by default unions with multiple object types are wrapped: avsc doesn't have a way of efficiently telling apart branches for generic unions. However, it is possible to use a logical type to handle the wrapping/unwrapping for you automatically, leveraging what you know about your data (e.g. via a "kind" property). This comment has a basic implementation which should be a good start; you might also be interested in this thread for more context.
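The mapping such a logical type would perform can be sketched in plain JS (names here are illustrative; this is not the avsc LogicalType API, just the wrapping/unwrapping it would automate around the "kind" discriminator):

```javascript
// Illustrative sketch of the boxing/unboxing a logical type could do
// automatically, keyed on the "kind" discriminator. Not actual avsc API.
function wrapByKind(event) {
  // {kind: 'mouseMove', ...} -> {mouseMove: {kind: 'mouseMove', ...}}
  return { [event.kind]: event };
}

function unwrapUnion(wrapped) {
  // Wrapped union values are single-key objects; recover the branch value.
  var kind = Object.keys(wrapped)[0];
  return wrapped[kind];
}

var move = { kind: 'mouseMove', pos: { x: 262, y: 152 }, timeOffset: 3490 };
var boxed = wrapByKind(move); // the shape updatesAvro.toBuffer currently expects
```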

Also, let's say I receive several events of these types. Since they are already in binary form, is there a mechanism in avsc to combine them into another binary record like updatesAvro without decoding and then re-encoding them? To save a stream of events, for example.

There is no built-in utility, but it is straightforward given how unions are encoded (Avro simply prefixes the branch's index). So, assuming you know the type of the serialized data, you can get the index of the branch it corresponds to, use a LongType to encode it, and write it to the stream just before the branch's bytes. You can use the same idea for arrays. It's not ideal, but if efficiency is a concern it might be worth it.
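The idea can be sketched as follows. Avro's long encoding is a zigzag varint, so a small branch index takes a single byte; a real implementation could reuse avsc's LongType rather than this hand-rolled encoder (which assumes indices small enough for safe bitwise arithmetic):

```javascript
// Minimal zigzag-varint encoder for Avro longs, enough to prefix a union
// branch index. Assumes small magnitudes; avsc's LongType handles the
// general case.
function encodeLong(n) {
  var z = n >= 0 ? n * 2 : n * -2 - 1; // zigzag: sign bit moves to bit 0
  var bytes = [];
  do {
    var b = z & 0x7f;                  // low 7 bits
    z = Math.floor(z / 128);
    if (z > 0) b |= 0x80;              // continuation bit
    bytes.push(b);
  } while (z > 0);
  return Buffer.from(bytes);
}

// Splice already-serialized branch bytes into a union stream by prefixing
// the branch's index, avoiding a decode/re-encode round trip.
function tagBranch(index, branchBuf) {
  return Buffer.concat([encodeLong(index), branchBuf]);
}
```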


mtth commented Mar 21, 2019

It seems like on a large, complicated data structure like this, avsc takes a bit longer. It also only reduced the file size by ~26%, whereas compression (with snappy) had a much bigger impact (nearly 80%). However, snappy requires a buffer, so either way I need to convert the JSON to a buffer or encode it with Avro.

JSON de/serialization tends to perform very well on data which is very string-heavy. I'm not sure why avsc is particularly slow on this example, however. It also looks browser-specific: running the same benchmark on node is ~5x faster. Thanks for bringing this up; I'll look into it when I have a chance.
