Synthetic PoRep (FIP-0059) #649
Replies: 9 comments 9 replies
-
Thanks! This could be great. I would guess that SP setups vary quite a lot with scale, target clients, CC-only, pooling, etc., and also outsourced sealing-as-a-service. I'm aware that in many situations the raw hardware cost of sealing might not be a large part of total costs, but the prospect of a significant increase in throughput for otherwise ~fixed costs is very attractive for growing SPs. On that note, the "target market" for a change like this is clearly SPs that are growing, rather than those that have reached their target capacity due to other limits (e.g. racks available, retrieval bandwidth). Feedback from such SPs is probably the most valuable.
-
This is super exciting for us. We could roughly halve our current worker storage and still maintain the same sector throughput.
-
I have always (for the last 3 years) over-provisioned NVMe for sealing workers, which has proven to pay off. Besides reducing the need for as much temporary storage (I allocate 8 TiB raw for a small lotus-worker that can seal on average one sector per hour), the effect this FIP will have is to lengthen the lifespan of my NVMe drives, as they will not take as many writes per day. I support this FIP.
-
Thanks for starting this discussion, @lucaniz. Some pointers from me:
-
This would be fantastic and VERY timely, as my current bottleneck is my sealing NVMe storage capacity. Please implement this ASAP! @benjaminh83: In our case, when talking about sector production throughput, this would indeed increase our throughput considerably without any additional purchases.
-
I might be wrong, but I remember reading somewhere that the additional NVMe requirement was designed to counter ASIC miners, so that more SPs could build their own systems with retail components?
-
Reducing cache requirements can have substantial benefits, especially considering the upcoming PC1 improvements. The benefits depend on the storage provider setup. I will make some assumptions and try to provide a rationale for why going forward with this proposal would be beneficial.

Assumptions

PC1 paradigm change

PC1 is widely known to be the biggest bottleneck for sealing, as the workload requires a storage medium that can support a high number of read IOPS while also scaling linearly with the number of sectors being sealed. Naturally, system memory does not scale well: it requires a new CPU for each allocatable set of DIMMs, which often implies a new system. This, in turn, requires a new motherboard, PSU, system disk, case, rack space, and so on. Additionally, with higher-density DIMMs, cost tends to increase quickly. Therefore, most storage providers today strike a fine balance in the number of sectors they run in parallel on a single machine. Due to this balance, reducing the cache size was not a significant concern, as PC1 slowed everything down and did not require a large cache. A good way to illustrate this is with the example of a river filling a reservoir that is blocked by a dam. The river represents the inflow of sectors from PC1, which is slow due to the scarcity of system memory; this means a smaller reservoir suffices. The dam, in turn, represents the GPUs required to perform C2. The problem arises when we replace a scarce medium like system memory with a more abundant one that can scale horizontally. If we were to maintain the same cache requirements, the size of the cache would block everything.

Moving away from DRAM

As NAND continues to evolve, the gap in IOPS between DRAM and NAND is slowly narrowing. In some cases, this allows us to take advantage of NVMe by using enough drives in parallel to achieve the necessary IOPS. Recent improvements to PC1 will soon allow us to use NVMe for PC1, meaning we will be able to perform PC1 on many more sectors in parallel. However, if the cache requirements are not reduced, they will certainly bottleneck the sealing process. To give a better perspective, PC1 batches will go from 15 to 128 sectors. This implies that the cache requirement will increase from 6 TB (15 x 400 GB) to 51.2 TB (128 x 400 GB), which needs to be stored for 150 epochs (approximately 75 minutes) plus the time required to execute C2. With this proposed improvement, we would instead require only 3.2 TB, which is about half of today's 6 TB requirement. To minimize the hardware changes storage providers need to make to benefit from the PC1 improvements, we need to minimize the number of machines performing PC1. If the cache is the bottleneck, multiple PC1 machines will be required; but if the cache is reduced, the results of multiple PC1 batches can be stacked. To quickly summarize: if we don't reduce the cache requirement, we are almost certainly going to hit a bottleneck on the cache.
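The cache arithmetic above can be sketched as a quick back-of-the-envelope calculation (figures taken from this comment; the batch sizes and per-sector cache sizes are illustrative, not official numbers):

```python
# Back-of-the-envelope NVMe cache sizing for batched PC1, using the figures
# from this comment: ~400 GB of SDR layers per sector today vs. ~25 GB with
# Synthetic PoRep. All numbers are illustrative assumptions from this thread.

def cache_needed_tb(batch_size: int, per_sector_gb: int) -> float:
    """Cache a PC1 batch must hold between PreCommit and ProveCommit, in TB."""
    return batch_size * per_sector_gb / 1000

# Today's paradigm: 15-sector batches, full 11-layer cache per sector.
assert cache_needed_tb(15, 400) == 6.0     # 6 TB, as stated above

# Upcoming 128-sector PC1 batches without Synthetic PoRep:
assert cache_needed_tb(128, 400) == 51.2   # 51.2 TB

# The same 128-sector batches with Synthetic PoRep (~25 GB kept per sector):
assert cache_needed_tb(128, 25) == 3.2     # 3.2 TB, about half of today's 6 TB
```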
-
Please refer to my earlier feedback on the current sealing paradigm, where our PC1 throughput is limited by memory, not by the amount of NVMe, as we have plenty in our servers to sustain the pipeline. This is not the case for @SBudo, so it will be an individual matter for each SP. Now, talking about SaaS and the PC1 improvements promising up to 128 PC1 jobs in parallel: I basically have to echo @vvkio, and add some relevant details on the developments we see in this area. We could very well be looking at a hybrid solution for SaaS, where the SP prepares and processes the actual data content (deal data), while the SaaS provider only does the heavy lifting of generating the 11 layers of data (the ~400 GB per sector). Basically, the SaaS provider will be sitting on these 128-sector batches consuming 50+ TB of very expensive NVMe, and will not be able to continuously produce new batches of 128 sectors unless this storage is pruned as soon as possible. Having Synthetic PoRep would be fundamental to this solution! It is simply not feasible for a SaaS provider to fit PC1/PC2 servers with, say, 16-20x 15 TB or even 30 TB drives. It would produce a BIG release of resources into a SaaS model like this, as @vvkio mentioned here. From a SaaS perspective, it would be even "cleaner" to run this as NI-PoRep, but that introduces quite a lot of overhead in the SNARKs (more compute, more power usage). Therefore I see Synthetic PoRep as a DO NOW: it would help out SPs with under-provisioned NVMe in their pipeline, and also greatly relax PC1/PC2 server requirements for SaaS and the improved sealing paradigm! Let's get this ball rolling ASAP!
-
While I believe this optimization will benefit a lot of storage providers, I also think some would prefer not to change their current sealing routine, especially those who have made massive modifications to the current pipeline, because such a change may break their modified pipeline and they would have to do a lot more work to accommodate it. With this in mind, I would suggest that we keep both options. It would be very easy to make the two types of PoRep produce the same result. That is:
This way, everyone will be happy. Hope this makes sense.
-
Background
The PoRep protocol today requires SPs to store ~12x the sector size in data created during the replication step (PC1 + PC2) until the sector is proven in the on-chain ProveCommit step.
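For a standard 32 GiB sector, the ~12x figure works out roughly as follows (a sketch; the breakdown into 11 sector-sized SDR layers plus the sealed replica is an assumption based on the descriptions in this thread):

```python
# Rough sizing of the ~12x temporary-data figure for a 32 GiB sector.
# Assumption: 11 SDR layers (each sector-sized) plus the sealed replica.
SECTOR_GIB = 32
SDR_LAYERS = 11

temp_data_gib = SECTOR_GIB * (SDR_LAYERS + 1)  # layers + replica
assert temp_data_gib == 384  # ~12x sector size, i.e. roughly ~400 GiB
```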
Proposal
We propose Synthetic PoRep, a FIP that drastically lowers SSD/NVMe usage after PC2 is complete by reducing the set of challenges chosen at the ProveCommit step. Note that this happens without compromising security.
More details can be found in FIP0059.
In practice, Synthetic PoRep reduces the size of the temporary data stored between PreCommit and ProveCommit (150 epochs) from ~400 GiB to ≤25 GiB.
Improvement compared with the status quo
We believe this will allow for an additional 25% sealing throughput (assuming the SP keeps the same computing setup as today) with respect to the current rate. Overall, this can translate into cost savings for sealing CC sectors.
Indeed, today SPs need ~400 GiB of SSD for sealing a sector. After PC1 and PC2, this storage capacity is mostly filled with the 11 layers of SDR, which need to stay there for 150 epochs before being proven at ProveCommit.
With Synthetic PoRep, only a small buffer of less than 25 GiB needs to be kept around from PreCommit until ProveCommit. This means that with less than 5% more SSD storage available, SPs can start sealing a new sector right after completing PC1 and PC2 of the old sector, without needing to wait for ProveCommit to finish.
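As a quick sanity check on the per-sector figures above (illustrative numbers from this post, not exact implementation values):

```python
# Per-sector temporary storage before and after Synthetic PoRep,
# using the ~400 GiB and <=25 GiB figures stated in this post.
TODAY_GIB = 400       # SDR layers kept until ProveCommit today
SYNTHETIC_GIB = 25    # upper bound on the buffer kept with Synthetic PoRep

reduction = 1 - SYNTHETIC_GIB / TODAY_GIB
assert round(reduction * 100, 2) == 93.75  # ~94% less temporary data per sector
```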
Note that, assuming PC1 takes almost 3h, the 150 epochs between PreCommit and ProveCommit amount to only ~75 minutes, so this wait can be fully overlapped with the next sector's PC1.
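The timing note above can be made concrete (Filecoin epochs are 30 seconds; the ~3h PC1 duration is the assumption stated in this post):

```python
# PreCommit -> ProveCommit wait vs. PC1 duration.
EPOCH_SECONDS = 30
WAIT_EPOCHS = 150
PC1_HOURS = 3  # approximate PC1 duration assumed in this post

wait_minutes = WAIT_EPOCHS * EPOCH_SECONDS / 60
assert wait_minutes == 75.0            # 150 epochs ~= 75 minutes
assert wait_minutes < PC1_HOURS * 60   # the wait fits entirely inside one PC1 run
```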
Discussion
In order for Synthetic PoRep to move forward, we need: