Full internal message extraction #978
Comments
love it.
as much data as is available - I know this could get out of hand fast, and people might not need all of it, so we might want to think about having some params to fine-tune the process for their needs! For our part, especially given the FVM background, we would love to see "everything" being parsed
I would be careful with an index on every column. Each index created will need to be updated on insert and having many indices will impact insertion speed. I suspect this will also end up being our heaviest table. Index updates will get slower as the table grows. Can we drop any of these indices? (I don't know if you intend to have capital letters in the column names, but |
Absolutely, what do you think we should change? I have updated the issue slightly wrt indexes
Not my intention, and FWIW column names are enforced via the ORM anyways |
The data engineering team has no concerns regarding the indices since we'll be following the I'd like to make a suggestion regarding the creation of indices. If at all possible, apply the proposed schema on a local DB first without any indices save for the composite primary key and another index such as |
Agreeing with Steph, I still think we should separate read schema concerns from write schema concerns. We can't have a good schema for both, which is why we're generally at odds wrt indices and normalized schemas. I have an idea to manage the read schema requirements separately from the ingestion schema requirements, which Lily should be managing in its migrations. Without a good place for separating read/write schemas, we can accept the cost of insertion with as few indices as we can get away with.
I would drop the
|
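One way to try the "as few indices as we can get away with" idea on a local database is sketched below. It is only an illustration: SQLite (via Python's standard library) stands in for Postgres, the table is cut down to a handful of columns, the hash indexes become ordinary B-tree indexes because SQLite has no hash index type, and the helper names `make_rows` and `timed_insert` are hypothetical. Timings from it say nothing definitive about a production Postgres instance.

```python
import sqlite3
import time

# Abridged columns; the composite primary key matches the one discussed in this issue.
COLUMNS = """
    height     INTEGER NOT NULL,
    state_root TEXT    NOT NULL,
    cid        TEXT    NOT NULL,
    source     TEXT    NOT NULL,
    method     TEXT    NOT NULL,
    PRIMARY KEY (height, state_root, cid, source)
"""

# Secondary indexes under test (plain B-tree here; SQLite has no hash indexes).
SECONDARY_INDEXES = [
    "CREATE INDEX vm_messages_height_idx ON vm_messages (height DESC)",
    "CREATE INDEX vm_messages_state_root_idx ON vm_messages (state_root)",
    "CREATE INDEX vm_messages_cid_idx ON vm_messages (cid)",
    "CREATE INDEX vm_messages_source_idx ON vm_messages (source)",
]


def make_rows(n):
    """Generate n synthetic rows with a unique cid each (made-up data)."""
    return [
        (i // 10, f"root{i // 10}", f"cid{i}", f"src{i // 3}", str(i % 5))
        for i in range(n)
    ]


def timed_insert(rows, with_indexes):
    """Insert rows into a fresh in-memory table; return (seconds, row count, index names)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE vm_messages ({COLUMNS})")
    if with_indexes:
        for stmt in SECONDARY_INDEXES:
            conn.execute(stmt)
    start = time.perf_counter()
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO vm_messages VALUES (?, ?, ?, ?, ?)", rows)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM vm_messages").fetchone()[0]
    index_names = [r[1] for r in conn.execute("PRAGMA index_list(vm_messages)")]
    conn.close()
    return elapsed, count, index_names


if __name__ == "__main__":
    rows = make_rows(100_000)
    for flag in (False, True):
        seconds, count, names = timed_insert(rows, with_indexes=flag)
        print(f"secondary indexes={flag}: {count} rows in {seconds:.3f}s, {len(names)} indexes")
```

Running the script prints one line per configuration, so the insert cost with only the primary key can be compared against the cost with every secondary index in place.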
I just ran some small, probably impractical in real life queries against a schema that @frrist provided me with. I used an abridged schema of what he originally sent me to test the table with fewer indices.
The vm_messages schema
create table vm_messages
(
    height bigint not null,
    state_root text not null,
    cid text not null,
    source text not null,
    "from" text not null,
    "to" text not null,
    value numeric not null,
    method text not null,
    actor_code text not null,
    exit_code bigint not null,
    gas_used bigint not null,
    params jsonb,
    returns jsonb,
    primary key (height, state_root, cid, source)
);
alter table vm_messages
    owner to postgres;
create index vm_messages_height_idx
    on vm_messages (height desc);
create index vm_messages_state_root_idx
    on vm_messages using hash (state_root);
create index vm_messages_cid_idx
    on vm_messages using hash (cid);
create index vm_messages_source_idx
    on vm_messages using hash (source);
self join on vm_messages
explain analyze
select a.height, a.method
from vm_messages a
join vm_messages b
on a.height = b.height and a.state_root = b.state_root and a.from = b.to
where a.params->'DealIDs' is not null;
query plan:
vm_messages join receipts
explain analyze verbose
select v.height, v.cid, v.method
from vm_messages v
join receipts r
on v.cid = r.message
where v.params->'DealID' is not null;
self join on method
explain analyze verbose
select v.height, x.height, v.cid, v.method
from vm_messages v
join vm_messages x
on v.method = x.method;
I ran multiple analyses on these two queries and there was not much variance in the execution times of each. I think this demonstrates that |
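For anyone who wants to repeat this kind of run-to-run comparison locally, a rough sketch follows. It assumes SQLite as a stand-in for Postgres, a three-column toy table with synthetic rows, and hypothetical helpers `build_demo_db` and `time_query`. Postgres's `explain analyze` reports its own planning and execution times, so this only illustrates the idea of running a query several times and looking at the spread.

```python
import sqlite3
import statistics
import time

# The "self join on method" query from this thread, minus the Postgres-specific explain.
SELF_JOIN_ON_METHOD = """
    SELECT v.height, x.height, v.cid, v.method
    FROM vm_messages v
    JOIN vm_messages x ON v.method = x.method
"""


def build_demo_db(n_rows=50):
    """Create an in-memory table with n_rows synthetic rows spread over 5 methods."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vm_messages (height INTEGER, cid TEXT, method TEXT)")
    rows = [(i // 5, f"cid{i}", str(i % 5)) for i in range(n_rows)]
    with conn:
        conn.executemany("INSERT INTO vm_messages VALUES (?, ?, ?)", rows)
    return conn


def time_query(conn, sql, runs=5):
    """Run sql `runs` times; return (row count of the last run, list of wall-clock seconds)."""
    timings = []
    row_count = 0
    for _ in range(runs):
        start = time.perf_counter()
        row_count = len(conn.execute(sql).fetchall())
        timings.append(time.perf_counter() - start)
    return row_count, timings


if __name__ == "__main__":
    conn = build_demo_db(2000)
    row_count, timings = time_query(conn, SELF_JOIN_ON_METHOD, runs=5)
    print(f"rows returned: {row_count}")
    print(f"mean: {statistics.mean(timings):.4f}s, stdev: {statistics.stdev(timings):.4f}s")
```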
so the consensus is that the read schema needs to be different from the write schema for this to be useful? does that apply to Lily in general?
@f8-ptrk generally, we want a schema optimized for the ingestion of data, as bottlenecks in writing to the database will prevent Lily from keeping up with the chain. We aim to strike a balance between ingestion speed and query speed by including some indices in the schema, and recognize they may not be optimal for all use-cases. Users of Lily are free to introduce their own indices outside the provided schema or, as @kasteph alludes to, write directly to CSV files and implement a separate ingestion process. I have incorporated the feedback from this issue into #1027, which is ready for review @placer14 @kasteph (@f8-ptrk feel free to review as well). We can continue the discussion around index changes there.
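As a rough illustration of the "CSV plus a separate ingestion process" route, the sketch below loads CSV text into a local table and then adds whichever indexes the reader cares about. Everything in it is an assumption made for the example: the three-column CSV layout, the SQLite target, and the helper name `load_csv_with_own_indexes` are hypothetical and are not Lily's actual export format or API.

```python
import csv
import io
import sqlite3

# Columns assumed to exist in the hypothetical CSV export.
ALLOWED_COLUMNS = ("height", "cid", "method")


def load_csv_with_own_indexes(csv_text, index_columns):
    """Load CSV rows into an in-memory table, then build one index per requested column."""
    for col in index_columns:
        if col not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {col}")

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vm_messages (height INTEGER, cid TEXT, method TEXT)")

    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(int(r["height"]), r["cid"], r["method"]) for r in reader]

    with conn:
        # Bulk load first, index afterwards, so the load itself maintains no indexes.
        conn.executemany("INSERT INTO vm_messages VALUES (?, ?, ?)", rows)
        for col in index_columns:
            conn.execute(f"CREATE INDEX vm_messages_{col}_idx ON vm_messages ({col})")
    return conn


if __name__ == "__main__":
    sample = "height,cid,method\n1,a,0\n2,b,5\n3,c,0\n"
    conn = load_csv_with_own_indexes(sample, ["method"])
    print(conn.execute("SELECT COUNT(*) FROM vm_messages WHERE method = '0'").fetchone()[0])
```

Creating the indexes after the bulk load keeps the read-side tuning out of the ingestion path, which is the separation being discussed in this thread.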
* feat: implement vm message extraction - closes #978
Background
Internal messages (also known as execution traces, or VM messages) are messages sent by actors while executing (on-chain) messages. As the name suggests, internal messages are executed internally by the Filecoin VM and therefore do not appear on-chain. One example of an internal message send exists within the multisig actor: messages may be staged in a multisig actor and sent once they are approved by the required number of signers. The result of the multisig send does not appear on chain. A second example of an internal message send happens within the miner actor's DisputeWindowPoSt method: the message that rewards the reporter is sent internally and thus doesn't appear on chain.
Collecting the information from internal message sends permits a granular inspection of activity within the Filecoin network. In addition to the above examples, it allows for inspection of:
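To make the idea concrete, here is a minimal sketch of how a nested execution trace could be flattened into one row per internal send. The `Trace` type, its field names, the sample CIDs and addresses, and the choice to record the on-chain message's CID as `source` on every row are all assumptions for illustration; this is not Lily's implementation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Trace:
    """One message execution plus the sends it triggered (hypothetical shape)."""
    cid: str
    sender: str
    receiver: str
    method: str
    subcalls: List["Trace"] = field(default_factory=list)


def internal_sends(root: Trace) -> Iterator[dict]:
    """Yield one row per internal send under `root`, depth-first in call order.

    The root itself is skipped because it is the on-chain message; every row
    carries the root's cid as `source` (an assumption about what `source` means).
    """
    stack = list(reversed(root.subcalls))
    while stack:
        trace = stack.pop()
        yield {
            "source": root.cid,
            "cid": trace.cid,
            "from": trace.sender,
            "to": trace.receiver,
            "method": trace.method,
        }
        # Push children reversed so the first subcall is processed next.
        stack.extend(reversed(trace.subcalls))


if __name__ == "__main__":
    # Made-up multisig approval: the on-chain Approve triggers an internal Send.
    approve = Trace(
        cid="bafy-onchain", sender="f1signer", receiver="f2multisig", method="Approve",
        subcalls=[Trace(cid="bafy-internal", sender="f2multisig", receiver="f1payee", method="Send")],
    )
    for row in internal_sends(approve):
        print(row)
```

In this made-up multisig case only the inner send is emitted, since the Approve message is already visible on-chain.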
Currently, Lily exports two models, internal_messages and internal_parsed_messages. These tables are poorly named since they actually contain "implicit messages": messages which are implicitly applied for each block - AwardBlockRewards and EpochTick. In a sense, the messages tracked in these tables are internal as they do not appear on-chain, but I am unsure if the messages described above in the first couple of paragraphs belong in these tables.
Model Design
Create a new table with the following schema for tracking internal messages.
Table Name:
vm_messages
Table Definition:
Table Indexes
Acceptance Criteria