Skip to content

Latest commit

 

History

History
246 lines (173 loc) · 9.49 KB

commands.md

File metadata and controls

246 lines (173 loc) · 9.49 KB

Commands

All the commands accept -h parameter for help, e.g.:

> glideetl export_blocks_and_transactions -h

Usage: glideetl export_blocks_and_transactions [OPTIONS]

  Export blocks and transactions.

Options:
  -s, --start-block INTEGER   Start block
  -e, --end-block INTEGER     End block  [required]
  -b, --batch-size INTEGER    The number of blocks to export at a time.
  -p, --provider-uri TEXT     The URI of the web3 provider e.g.
                              file://$HOME/Library/Ethereum/geth.ipc or
                              https://mainnet.infura.io
  -w, --max-workers INTEGER   The maximum number of workers.
  --blocks-output TEXT        The output file for blocks. If not provided
                              blocks will not be exported. Use "-" for stdout
  --transactions-output TEXT  The output file for transactions. If not
                              provided transactions will not be exported. Use
                              "-" for stdout
  -h, --help                  Show this message and exit.

For the --output parameters the supported types are csv and json. The format type is inferred from the output file name.

export_blocks_and_transactions

> glideetl export_blocks_and_transactions --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc \
--blocks-output blocks.csv --transactions-output transactions.csv

Omit --blocks-output or --transactions-output options if you want to export only transactions/blocks.

You can tune --batch-size, --max-workers for performance.

Blocks and transactions schema.

export_token_transfers

The API used in this command is not supported by Infura, so you will need a local node. If you want to use Infura for exporting ERC20 transfers refer to extract_token_transfers

> glideetl export_token_transfers --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --batch-size 100 --output token_transfers.csv

Include --tokens <token1> --tokens <token2> to filter only certain tokens, e.g.

> glideetl export_token_transfers --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --output token_transfers.csv \
--tokens 0x1F573D6Fb3F13d689FF844B4cE37794d79a7FF1C --tokens 0x80fB784B7eD66730e8b1DBd9820aFD29931aab03

You can tune --batch-size, --max-workers for performance.

Token transfers schema.

export_receipts_and_logs

First extract transaction hashes from transactions.csv (Exported with export_blocks_and_transactions):

> glideetl extract_csv_column --input transactions.csv --column hash --output transaction_hashes.txt

Then export receipts and logs:

> glideetl export_receipts_and_logs --transaction-hashes transaction_hashes.txt \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --receipts-output receipts.csv --logs-output logs.csv

Omit --receipts-output or --logs-output options if you want to export only logs/receipts.

You can tune --batch-size, --max-workers for performance.

Upvote this feature request openethereum/parity-ethereum#9075, it will make receipts and logs export much faster.

Receipts and logs schema.

extract_token_transfers

First export receipt logs with export_receipts_and_logs.

Then extract transfers from the logs.csv file:

> glideetl extract_token_transfers --logs logs.csv --output token_transfers.csv

You can tune --batch-size, --max-workers for performance.

Token transfers schema.

export_contracts

First extract contract addresses from receipts.csv (Exported with export_receipts_and_logs):

> glideetl extract_csv_column --input receipts.csv --column contract_address --output contract_addresses.txt

Then export contracts:

> glideetl export_contracts --contract-addresses contract_addresses.txt \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --output contracts.csv

You can tune --batch-size, --max-workers for performance.

Contracts schema.

export_tokens

First extract token addresses from contracts.json (Exported with export_contracts):

> glideetl filter_items -i contracts.json -p "item['is_erc20'] or item['is_erc721']" | \
glideetl extract_field -f address -o token_addresses.txt

Then export ERC20 / ERC721 tokens:

> glideetl export_tokens --token-addresses token_addresses.txt \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --output tokens.csv

You can tune --max-workers for performance.

Tokens schema.

export_traces

Also called internal transactions. The API used in this command is not supported by Infura, so you will need a local Parity archive node (parity --tracing on). Make sure your node has at least 8GB of memory, or else you will face timeout errors. See this issue

> glideetl export_traces --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/parity.ipc --batch-size 100 --output traces.csv

You can tune --batch-size, --max-workers for performance.

Traces schema.

export_geth_traces

Read Differences between geth and parity traces.csv

The API used in this command is not supported by Infura, so you will need a local Geth archive node (geth --gcmode archive --syncmode full --txlookuplimit 0). When using rpc, add --rpc --rpcapi debug options.

> glideetl export_geth_traces --start-block 0 --end-block 500000 \
--provider-uri file://$HOME/Library/Ethereum/geth.ipc --batch-size 100 --output geth_traces.json

You can tune --batch-size, --max-workers for performance.

extract_geth_traces

> glideetl extract_geth_traces --input geth_traces.json --output traces.csv

You can tune --batch-size, --max-workers for performance.

get_block_range_for_date

> glideetl get_block_range_for_date --provider-uri=https://mainnet.infura.io/v3/7aef3f0cd1f64408b163814b22cc643c --date 2018-01-01
4832686,4838611

get_keccak_hash

> glideetl get_keccak_hash -i "transfer(address,uint256)"
0xa9059cbb2ab09eb219583f4a59a5d0623ade346d962bcd4e46b11da047c9049b

stream

> pip3 install ethereum-etl[streaming]
> glideetl stream --provider-uri https://mainnet.infura.io/v3/7aef3f0cd1f64408b163814b22cc643c --start-block 500000
  • This command outputs blocks, transactions, logs, token_transfers to the console by default.

  • Entity types can be specified with the -e option, e.g. -e block,transaction,log,token_transfer,trace,contract,token.

  • Use --output option to specify the Google Pub/Sub topic, Postgres database or GCS bucket where to publish blockchain data,

    • For Google PubSub: --output=projects/<your-project>/topics/crypto_ethereum. Data will be pushed to projects/<your-project>/topics/crypto_ethereum.blocks, projects/<your-project>/topics/crypto_ethereum.transactions etc. topics.
    • For Postgres: --output=postgresql+pg8000://<user>:<password>@<host>:<port>/<database_name>, e.g. --output=postgresql+pg8000://postgres:[email protected]:5432/ethereum.
    • For GCS: --output=gs://<bucket_name>. Make sure to install and initialize gcloud cli.
    • For Kafka: --output=kafka/<host>:<port>, e.g. --output=kafka/127.0.0.1:9092
    • Those output types can be combined with a comma e.g. --output=gs://<bucket_name>,projects/<your-project>/topics/crypto_ethereum

    The schema and indexes can be found in this repo ethereum-etl-postgres.

  • The command saves its state to last_synced_block.txt file where the last synced block number is saved periodically.

  • Specify either --start-block or --last-synced-block-file option. --last-synced-block-file should point to the file where the block number, from which to start streaming the blockchain data, is saved.

  • Use the --lag option to specify how many blocks to lag behind the head of the blockchain. It's the simplest way to handle chain reorganizations - they are less likely the further a block from the head.

  • You can tune --period-seconds, --batch-size, --block-batch-size, --max-workers for performance.

  • Refer to blockchain-etl-streaming for instructions on deploying it to Kubernetes.

Stream blockchain data continually to Google Pub/Sub:

> export GOOGLE_APPLICATION_CREDENTIALS=/path_to_credentials_file.json
> glideetl stream --start-block 500000 --output projects/<your-project>/topics/crypto_ethereum

Stream blockchain data to a Postgres database:

glideetl stream --start-block 500000 --output postgresql+pg8000://<user>:<password>@<host>:5432/<database>

The schema and indexes can be found in this repo ethereum-etl-postgres.