-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update ETL.js mapping example to use Streams #10
Comments
After spending more time than I care to admit researching this... I've come up with several options for having a streaming API in the ETL mapping example.
Node Streams suffer from several issues that are well documented by the js-ipfs team here: ipfs/js-ipfs#362 In that discussion they chose to go with pull-streams, which addresses several of the problems they were having with Node Streams including error propagation. (RXJS was not chosen because of poor support for network backpressure). The pull-stream API looks very similar to the ES6 generator API in terms of functionality. This lead me to explore using generators in our mappings API:
The main difference between the pull-stream approach and the generator approach is that pull-streams are asynchronous while generators are synchronous. This is OK for the Ethereum web3 client because it makes synchronous HTTP calls by default, however we shouldn't expect this to be the case for all storage backends we integrate with. For example, reading from a large file on IPFS would require us supporting asynchronous IO. In general I favor using idiomatic Javascript in our API, however, it doesn't seem like the functionality is quite there yet to avoid using something like Node Streams or pull-streams. There is a proposal to "modernize" the pull-streams API using the new async generators spec, which would be preferable, however the support for that async generators in Node is still experimental I'm also not crazy about having users implement pull-stream We may not even end up going with the ETL-based implementation of the mappings API, so for now I'm going to put this investigation on pause, and submit a PR for the generator approach (since it works fine for the Ethereum-only use case). I'll leave this issue open to track future investigation. |
According to this, async iteration is available in Node 10.0 as well as nightly builds: https://node.green/#ES2018-features-Asynchronous-Iterators |
We shouldn't load all of a dataset into memory at once... we should take advantage of streams/ flow control to make sure we're only loading a manageable amount of data into memory for processing at any given time.
The text was updated successfully, but these errors were encountered: