Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-34936: [JavaScript] Added Proxy for Table, RecordBatch, and Vector #34939

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Commits on Apr 6, 2023

  1. Added Proxy for Table, RecordBatch, and Vector

    Certain codebases that previously uses row-oriented way to access data
    may wish to migrate to Arrow to save serialization and deserialization
    cost, and to be able to gain access to fast column-oriented operations.
    As it stands, Arrow is sort of a drop-in replacement to row-oriented
    data such as a JavaScript Array of objects. This is great to
    incrementally migrate legacy codebases to Arrow, as it is frequently
    infeasible to rewrite the application to use the column-oriented data
    access patterns. For most data, JavaScript-object-compatible and
    row-oriented access is already provided via the `StructRowProxy`.
    However, if the structs themselves include a `Vector`, existing code
    will break as it assumes the `Vector` object to behave like a JavaScript
    array, which it does not due to the lack of index access. An example of
    such a data structure is as follows:
    
    ```
    [
      {x: 1, y: [1, 2]},
      {x: 2, y: [2, 3]},
    ]
    ```
    
    In this case, with the Arrow JS library as it is, the API consumer is
    unable to get individual element of the `y` array via `table[i].y[j]`.
    Instead, the API consumer must use the API `table.get(i).y.get(j)`. In
    the situation where we are migrating a legacy code base to Arrow, this
    requires a large refactor of the entire codebase, which is infeasible in
    a short time. This negates the advantage of using Arrow as a drop-in
    replacement and prevents incremental migration of code to Arrow.
    
    To address this problem, this patch adds a Proxy at the root of the
    prototype chain for the `Table`, `RecordBatch`, and `Vector` objects and
    allow index access for these objects for backward compatibility
    purposes. Basically, objects like `Vector` now supports `vector[i]` in
    addition to `vector.get(i)`.
    
    However, code should not be using `vector[i]` as it is ~1.5 orders of
    magnitude slower than `vector.get(i)` as ES6 Proxy objects are quite
    slow. This should only be used to provide compatibility for legacy
    codebases. For code that desires high performance, `.get` remains a much
    better solution. This is also why the Proxy object is added to the root
    of the prototype chain, as opposed to the usual pattern where a Proxy
    object is returned from a constructor.
    
    Documentation has been added to compare the performance of the various
    access.
    shuhaowu committed Apr 6, 2023
    Configuration menu
    Copy the full SHA
    bc6927a View commit details
    Browse the repository at this point in the history