issue using esm or bundled versions of parquet-wasm using esbuild #488

Closed
cornhundred opened this issue Apr 6, 2024 · 2 comments

@cornhundred

cornhundred commented Apr 6, 2024

Hi, I am seeing errors when I try to import parquet-wasm using the bundler esbuild.

Similar to this issue, #486, I am seeing this error

94924d21-558e-468e-97d6-52e78f9ca56d:1552 Error loading data: TypeError: Cannot read properties of undefined (reading '__wbindgen_add_to_stack_pointer')
    at TP (94924d21-558e-468e-97d6-52e78f9ca56d:1552:48578)
    at c (94924d21-558e-468e-97d6-52e78f9ca56d:1552:50503)
    at b (94924d21-558e-468e-97d6-52e78f9ca56d:1552:52078)
    at async Object.DV [as render] (94924d21-558e-468e-97d6-52e78f9ca56d:1559:1493)
    at async widget.js:363:17

when I import parquet-wasm like this

import * as pq from "parquet-wasm/esm/arrow2";

without awaiting the default function. However, when I run await pq.default(), I see this error

widget.js:237 TypeError: Failed to construct 'URL': Invalid URL
    at UP (542aeba4-c71d-47a1-8e67-2dcff8e6ca23:1553:14046)
    at Object.oz [as render] (542aeba4-c71d-47a1-8e67-2dcff8e6ca23:1560:1458)
    at async widget.js:363:17

If I try to switch to using the bundler build like this

import * as pq from "parquet-wasm/bundler/arrow2"

(which required using the wasmLoader for esbuild and setting the target to esnext to enable top-level await) I get this error

widget.js:237 TypeError: Failed to construct 'URL': Invalid URL
    at LC (8a312991-81c6-426e-b14c-067bcbe5f62b:1553:10554)
    at 8a312991-81c6-426e-b14c-067bcbe5f62b:1553:10908

and clicking LC (8a312991-81c6-426e-b14c-067bcbe5f62b:1553:10554) shows

async function LC(j, A) {
    if (typeof j == "string") {
        j.startsWith("./") && (j = new URL(j,import.meta.url).href);
        let t = await fetch(j);
        if (typeof WebAssembly.instantiateStreaming == "function")
            try {
                return await WebAssembly.instantiateStreaming(t, A)
            } catch (e) {
                if (t.headers.get("Content-Type") != "application/wasm")
                    console.warn(e);
                else
                    throw e
            }
        j = await t.arrayBuffer()
    }
    return await WebAssembly.instantiate(j, A)
}

For some background, I'm using parquet-wasm in an anywidget that is being bundled with esbuild, on the suggestion from this discussion. Also, the await pq.default() function works properly if I use a CDN to obtain parquet-wasm like this

import * as pq from "https://unpkg.com/[email protected]/esm/arrow2.js";
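For readers trying to reproduce the bundler setup described in this comment (a wasm plugin plus an esnext target), a minimal sketch of the esbuild options might look like the following. The helper name `makeBuildOptions`, the entry point, and the output path are hypothetical placeholders, and the plugin is passed in as a parameter rather than imported, since the thread does not show the actual build script.

```javascript
// Hypothetical helper that assembles esbuild build options.
// `wasmPlugin` is whatever plugin handles `.wasm` imports (the thread
// mentions "wasmLoader"); the paths below are placeholders, not the
// author's real project layout.
function makeBuildOptions(wasmPlugin) {
  return {
    entryPoints: ['js/widget.js'], // placeholder entry point
    outfile: 'dist/widget.js',     // placeholder output file
    bundle: true,
    format: 'esm',                 // esbuild only allows top-level await in ESM output
    target: 'esnext',              // newest syntax target, as described in the thread
    plugins: [wasmPlugin],
  };
}
```

With esbuild installed, this object would be passed to `esbuild.build(...)`, for example `esbuild.build(makeBuildOptions(wasmLoader()))`, assuming `wasmLoader` is the export of the `esbuild-plugin-wasm` package.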

@kylebarron
Owner

when I run await pq.default(), I see this error

widget.js:237 TypeError: Failed to construct 'URL': Invalid URL
    at UP (542aeba4-c71d-47a1-8e67-2dcff8e6ca23:1553:14046)
    at Object.oz [as render] (542aeba4-c71d-47a1-8e67-2dcff8e6ca23:1560:1458)
    at async widget.js:363:17

If you look at the generated bindings, you can see

    if (typeof input === 'undefined') {
        input = new URL('parquet_wasm_bg.wasm', import.meta.url);
    }

in the __wbg_init function exported at the very end of the file. Presumably, your import.meta.url is not set correctly, so that the new URL constructor fails.
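To illustrate how that constructor call can fail, here is a small sketch that resolves a relative `.wasm` path against different base values. The helper `tryResolve` is hypothetical. The `blob:` base is an assumption: the UUID-style script names in the stack traces are consistent with the widget code being loaded from a blob URL, but the thread does not confirm this.

```javascript
// Hypothetical helper: returns the resolved URL string, or null when
// `base` cannot be used to resolve a relative path.
function tryResolve(relativePath, base) {
  try {
    return new URL(relativePath, base).href;
  } catch (err) {
    // In browsers this is the "Failed to construct 'URL': Invalid URL" case.
    return null;
  }
}

// A hierarchical http(s) base resolves relative paths normally.
const fromHttp = tryResolve(
  'parquet_wasm_bg.wasm',
  'http://localhost:8888/files/js/widget.js'
);

// A blob: base has an opaque path, so a relative path cannot be resolved.
// (Assumed scenario; the UUID is copied from the stack trace for illustration.)
const fromBlob = tryResolve(
  'parquet_wasm_bg.wasm',
  'blob:http://localhost:8888/94924d21-558e-468e-97d6-52e78f9ca56d'
);

// A missing base also fails for a relative path.
const fromUndefined = tryResolve('parquet_wasm_bg.wasm', undefined);

console.log({ fromHttp, fromBlob, fromUndefined });
```

Logging `import.meta.url` in the bundled output is a quick way to check which of these cases applies.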

and clicking LC (8a312991-81c6-426e-b14c-067bcbe5f62b:1553:10554) shows

async function LC(j, A) {
    if (typeof j == "string") {
        j.startsWith("./") && (j = new URL(j,import.meta.url).href);
        let t = await fetch(j);
        if (typeof WebAssembly.instantiateStreaming == "function")
            try {
                return await WebAssembly.instantiateStreaming(t, A)
            } catch (e) {
                if (t.headers.get("Content-Type") != "application/wasm")
                    console.warn(e);
                else
                    throw e
            }
        j = await t.arrayBuffer()
    }
    return await WebAssembly.instantiate(j, A)
}

I can't find this function in the generated bindings in the latest bundler build. You should try one of the latest betas.

Also, the await pq.default() function works properly if I use a CDN to obtain parquet-wasm like this

import * as pq from "https://unpkg.com/[email protected]/esm/arrow2.js";

So presumably import.meta.url isn't defined in Jupyter or something like that.

@cornhundred
Author

Thanks @kylebarron, I was able to get it to work in the following way and would appreciate any advice:

I am using the 0.4.0-beta.5 version of parquet-wasm because I haven't migrated to the new API yet, so my dependencies in my package.json look like this:

"dependencies": {
	"deck.gl": "^9.0.5",
	"parquet-wasm": "0.4.0-beta.5",
	"apache-arrow": "15.0.2",
	"math.gl": "2.3.3",
	"@loaders.gl/core": "4.1.1"
},

Since parquet-wasm was working correctly with the file obtained from unpkg, I figured I would download the file (https://unpkg.com/[email protected]/esm/arrow2.js), save it locally to /vendor/parquet-wasm/parquet-wasm_unpkg.js (along with the project licenses), and import it like this:

import * as pq from "./vendor/parquet-wasm/parquet-wasm_unpkg.js";
...

I was still getting the URL error, so I added a console log to the parquet-wasm_unpkg.js file to log import.meta.url, which ends up being the localhost address that is hosting Jupyter. On my MacBook I was able to change the URL to 'files/js/vendor/parquet-wasm/arrow2_bg.wasm', and it was able to load the file and run without error (see below):

async function init(input) {
    // console.log('here in the parquet-wasm source code');
    
    // Use a fixed path for development. You may need to adjust this path based on your project's structure and where it's served from.
    // For example, if your server serves the `vendor` directory at the root, and `arrow2_bg.wasm` is within `vendor/parquet-wasm/`,
    // the path should reflect that.
    const fixedPath = 'files/js/vendor/parquet-wasm/arrow2_bg.wasm'; // Adjust this path as necessary.

    // js/vendor/parquet-wasm

    if (typeof input === 'undefined') {
        // Assume we're in a browser environment and construct the URL relative to the server's root.
        input = new URL(fixedPath, window.location.origin);
    }
    // console.log('WASM module will be loaded from:', input);

    const imports = getImports();

    if (typeof input === 'string' || (typeof Request === 'function' && input instanceof Request) || (typeof URL === 'function' && input instanceof URL)) {
        input = fetch(input);
    }

    initMemory(imports);

    const { instance, module } = await load(await input, imports);

    return finalizeInit(instance, module);
}
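Since the init function shown above only falls back to its default location when `input` is undefined, the same fixed path could probably be supplied from the calling code instead of editing the vendored file. The following is a minimal sketch under that assumption. The helper name `initFromBase` is hypothetical, and the path and origin are the ones from this comment.

```javascript
// Hypothetical helper: resolve the .wasm location against an explicit base
// and hand it to the module's init function (pq.default in this thread).
// Passing a URL means the `typeof input === 'undefined'` branch is skipped.
async function initFromBase(init, wasmPath, baseUrl) {
  const wasmUrl = new URL(wasmPath, baseUrl);
  await init(wasmUrl);
  return wasmUrl.href; // returned only to make the resolved location easy to log
}

// Usage in the widget (browser only), assuming the same layout as above:
// await initFromBase(
//   pq.default,
//   'files/js/vendor/parquet-wasm/arrow2_bg.wasm',
//   window.location.origin
// );
```

This keeps the vendored parquet-wasm file unmodified, but it still depends on the server hosting the `.wasm` file at that path.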

However, this fixed-path approach did not work on Google Colab and Terra.bio, probably because we can't rely on Jupyter hosting files there. So I figured I would try to hardwire the WASM file into the JavaScript by converting it to a Base64 string. I saved this string to a file called wasmModuleBase64.js that looks like this:

export const wasmBase64 = `AGFzbQEAAAAB5 ...

and imported it into the init function on my local copy of parquet-wasm_unpkg.js

import { wasmBase64 } from './wasmModuleBase64.js'; 

async function init(input) {
    // No need to adjust the path, as we'll be loading the WASM from a Base64 string
    const imports = getImports();

    // Decode the Base64 string to get the binary representation
    const binaryString = window.atob(wasmBase64);
    const bytes = new Uint8Array(binaryString.length);
    for (let i = 0; i < binaryString.length; i++) {
        bytes[i] = binaryString.charCodeAt(i);
    }

    initMemory(imports);

    // Use the binary bytes to instantiate the WebAssembly module
    const { instance, module } = await WebAssembly.instantiate(bytes, imports);

    return finalizeInit(instance, module);
}

This approach seems to be working locally and on Google Colab and Terra.bio. Do you think this is a reasonable approach? If so, would it make sense to include the WASM code as a base64 string in the esm version?
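The decode-and-instantiate step can be tried in isolation, without parquet-wasm, using a tiny stand-in module. This is only a sketch: the helper names `decodeBase64` and `instantiateFromBase64` are hypothetical, and the Base64 string below is the 8-byte header of an empty WebAssembly module, not the parquet-wasm binary.

```javascript
// Hypothetical helper: decode a Base64 string into raw bytes.
// atob() yields a "binary string" in which each character code is one byte.
function decodeBase64(base64) {
  const binaryString = atob(base64);
  return Uint8Array.from(binaryString, (ch) => ch.charCodeAt(0));
}

// Hypothetical helper: instantiate a WebAssembly module from Base64 text.
// `imports` would be the object built by the generated bindings.
async function instantiateFromBase64(base64, imports = {}) {
  const bytes = decodeBase64(base64);
  // With a byte buffer, instantiate() resolves to { module, instance }.
  const { instance, module } = await WebAssembly.instantiate(bytes, imports);
  return { instance, module };
}

// Stand-in payload: the "\0asm" magic number followed by version 1,
// which is a valid module with no imports or exports.
const EMPTY_MODULE_BASE64 = 'AGFzbQEAAAA=';

instantiateFromBase64(EMPTY_MODULE_BASE64).then(({ instance }) => {
  console.log('exports:', Object.keys(instance.exports));
});
```

One trade-off to keep in mind is that Base64 text is roughly a third larger than the binary it encodes, so embedding the module increases the size of the JavaScript bundle.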
