Synthetic PoRep (FIP-0059) #649
Replies: 9 comments 9 replies
-
Thanks! This could be great. I would guess that SP setups vary quite a lot with scale, target clients, CC-only, pooling, etc., and also outsourced sealing-as-a-service. I'm aware that in many situations the raw hardware cost of sealing might not be a large part of total costs, but the prospect of a significant increase in throughput for otherwise ~fixed costs is very attractive for growing SPs. On that note, the "target market" for a change like this is clearly SPs that are growing, rather than those that have reached their target capacity due to other limits (e.g. racks available, retrieval bandwidth). Feedback from such SPs is probably the most valuable.
-
This is super exciting for us. We could roughly halve our current worker storage and still maintain the same sector throughput.
-
I have always (for the last 3 years) over-provisioned NVMe for sealing workers, which has proven to pay off. Besides reducing the need for as much temporary storage (I allocate 8 TiB raw for a small lotus-worker that can seal on average one sector per hour), the effect this FIP will have is to lengthen the lifespan of my NVMe drives, as they will not take as many writes per day. I support this FIP.
-
Thanks for starting this discussion, @lucaniz. Some pointers from me:
-
This would be fantastic and VERY timely, as my current bottleneck is my sealing NVMe storage capacity. Please implement this ASAP! @benjaminh83: In our case, when talking about sector production throughput, this would indeed increase our throughput considerably without any additional purchases.
-
I might be wrong, but I remember reading somewhere that the additional NVMe requirement was designed to counter ASIC miners, so that more SPs could build their own systems with retail components?
-
Reducing cache requirements can have substantial benefits, especially considering the upcoming PC1 improvements. The benefits depend on the storage provider setup. I will make some assumptions and try to provide a rationale for why going forward with this proposal would be beneficial.

Assumptions

PC1 paradigm change

PC1 is widely known to be the biggest bottleneck for sealing, as the workload requires a storage medium that can support a high number of read IOPS while also scaling linearly with the number of sectors being sealed. Naturally, system memory does not scale well: it requires a new CPU for each allocatable set of DIMMs, which often implies a new system. This, in turn, requires a new motherboard, PSU, system disk, case, rack space, and so on. Additionally, with higher-density DIMMs, cost tends to increase quickly. Therefore, most storage providers today strike a fine balance in the number of sectors they run in parallel on a single machine. Due to this balance, reducing the cache size was not a significant concern, as PC1 slowed everything down and did not require a large cache. A good way to illustrate this is with the example of a river filling a reservoir that is blocked by a dam. The river represents the inflow of sectors from PC1, which is slow due to the scarcity of system memory; this means a smaller reservoir suffices. The dam, in turn, represents the GPUs required to perform C2. The problem arises when we replace a scarce medium like system memory with a more abundant one that can scale horizontally. If we were to maintain the same cache requirements, the size of the cache would block everything.

Moving away from DRAM

As NAND continues to evolve, the gap in IOPS between DRAM and NAND is slowly narrowing. In some cases, this allows us to take advantage of NVMe by using enough drives in parallel to achieve the necessary IOPS. Recent improvements to PC1 will soon allow us to use NVMe for PC1, meaning we will be able to perform PC1 on many more sectors in parallel. However, if the cache requirements are not reduced, they will certainly bottleneck the sealing process. To give a better perspective, PC1 batches will go from 15 to 128 sectors. This implies that the cache requirement will increase from 6 TB (15 x 400 GB) to 51.2 TB (128 x 400 GB), which needs to be stored for 150 epochs (approximately 75 minutes) plus the time required to execute C2. With this proposed improvement, we would instead require only 3.2 TB, which is about half of today's 6 TB requirement. To minimize the hardware changes storage providers need to make to benefit from the PC1 improvements, we need to minimize the number of machines performing PC1. If the cache is the bottleneck, multiple PC1 machines will be required; but if the cache is reduced, the results of multiple PC1 batches can be stacked. To quickly summarize: if we don't reduce the cache requirement, we are almost certainly going to hit a bottleneck on the cache.
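The cache arithmetic above can be sketched as a quick back-of-the-envelope calculation (figures taken from this comment; the batch sizes and per-sector cache sizes are illustrative, not official numbers):

```python
# Back-of-the-envelope NVMe cache sizing for batched PC1, using the figures
# from this comment: ~400 GB of SDR layers per sector today vs. ~25 GB with
# Synthetic PoRep. All numbers are illustrative assumptions from this thread.

def cache_needed_tb(batch_size: int, per_sector_gb: int) -> float:
    """Cache a PC1 batch must hold between PreCommit and ProveCommit, in TB."""
    return batch_size * per_sector_gb / 1000

# Today's paradigm: 15-sector batches, full 11-layer cache per sector.
assert cache_needed_tb(15, 400) == 6.0     # 6 TB, as stated above

# Upcoming 128-sector PC1 batches without Synthetic PoRep:
assert cache_needed_tb(128, 400) == 51.2   # 51.2 TB

# The same 128-sector batches with Synthetic PoRep (~25 GB kept per sector):
assert cache_needed_tb(128, 25) == 3.2     # 3.2 TB, about half of today's 6 TB
```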
-
Please refer to my earlier feedback on the current sealing paradigm, where our PC1 throughput is limited by memory, not by the amount of NVMe, as we have plenty in our servers to sustain the pipeline. This is not the case for @SBudo, so it will be an individual matter for each SP. Now, talking about SaaS and the PC1 improvements promising up to 128 PC1 jobs in parallel: I basically have to echo @vvkio, and add some relevant details on the developments we see in this area. We could very well be looking at a hybrid solution for SaaS, where the SP prepares and processes the actual data content (deal data), while the SaaS provider only does the heavy lifting of generating the 11 layers of data (the ~400 GB per sector). Basically, the SaaS provider will be sitting on these 128-sector batches consuming 50+ TB of very expensive NVMe, and will not be able to continuously produce new batches of 128 sectors unless this storage is pruned as soon as possible. Having Synthetic PoRep would be fundamental to this solution! It is simply not feasible for a SaaS provider to fit PC1/PC2 servers with, say, 16-20x 15 TB or even 30 TB drives. It would produce a BIG release of resources into a SaaS model like this, as @vvkio mentioned here. From a SaaS perspective, it would be even "cleaner" to run this as NI-PoRep, but that introduces quite a lot of overhead in the SNARKs (more compute, more power usage). Therefore I see Synthetic PoRep as a DO NOW: it would help out SPs with under-provisioned NVMe in their pipeline, and also greatly relax PC1/PC2 server requirements for SaaS and the improved sealing paradigm! Let's get this ball rolling ASAP!
-
While I believe this optimization will benefit a lot of storage providers, I also think some would prefer not to change their current sealing routine, especially those who have made massive modifications to the current pipeline, because such a change may break their modified pipeline and they would have to do a lot more work to accommodate it. With this in mind, I would suggest that we keep both options. It would be very easy to make the two types of PoRep produce the same result. That is:
This way, everyone will be happy. Hope this makes sense.
-
Background
The PoRep protocol today requires SPs to store ~12x the sector size in data created during the replication step (PC1 + PC2) until the sector is proven in the on-chain ProveCommit step.
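For a standard 32 GiB sector, the ~12x figure works out roughly as follows (a sketch; the breakdown into 11 sector-sized SDR layers plus the sealed replica is an assumption based on the descriptions in this thread):

```python
# Rough sizing of the ~12x temporary-data figure for a 32 GiB sector.
# Assumption: 11 SDR layers (each sector-sized) plus the sealed replica.
SECTOR_GIB = 32
SDR_LAYERS = 11

temp_data_gib = SECTOR_GIB * (SDR_LAYERS + 1)  # layers + replica
assert temp_data_gib == 384  # ~12x sector size, i.e. roughly ~400 GiB
```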
Proposal
We propose Synthetic PoRep, a FIP that drastically lowers SSD/NVMe usage after PC2 is complete by reducing the set of challenges chosen at the ProveCommit step. Note that this happens without compromising security.
More details can be found in FIP0059.
In practice, Synthetic PoRep reduces the size of the temporary data stored between PreCommit and ProveCommit (150 epochs) from ~400 GiB to ≤25 GiB.
Improvement compared with the status quo
We believe this will allow for an additional 25% sealing throughput (assuming the SP keeps the same computing setup as today) with respect to the current rate. Overall, this can translate into cost savings for sealing CC sectors.
Indeed, today SPs need ~400 GiB of SSD for sealing a sector. After PC1 and PC2, this storage capacity is mostly filled with the 11 layers of SDR, which need to stay there for 150 epochs before being proven at ProveCommit.
With Synthetic PoRep, only a small buffer of less than 25 GiB needs to be kept around from PreCommit until ProveCommit. This means that with less than 5% more SSD storage available, SPs can start sealing a new sector right after completing PC1 and PC2 of the old sector, without needing to wait for ProveCommit to finish.
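As a quick sanity check on the per-sector figures above (illustrative numbers from this post, not exact implementation values):

```python
# Per-sector temporary storage before and after Synthetic PoRep,
# using the ~400 GiB and <=25 GiB figures stated in this post.
TODAY_GIB = 400       # SDR layers kept until ProveCommit today
SYNTHETIC_GIB = 25    # upper bound on the buffer kept with Synthetic PoRep

reduction = 1 - SYNTHETIC_GIB / TODAY_GIB
assert round(reduction * 100, 2) == 93.75  # ~94% less temporary data per sector
```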
Note that, assuming PC1 takes almost 3h, the 150 epochs between PreCommit and ProveCommit amount to only ~75 minutes, so this wait can be fully overlapped with the next sector's PC1.
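The timing note above can be made concrete (Filecoin epochs are 30 seconds; the ~3h PC1 duration is the assumption stated in this post):

```python
# PreCommit -> ProveCommit wait vs. PC1 duration.
EPOCH_SECONDS = 30
WAIT_EPOCHS = 150
PC1_HOURS = 3  # approximate PC1 duration assumed in this post

wait_minutes = WAIT_EPOCHS * EPOCH_SECONDS / 60
assert wait_minutes == 75.0            # 150 epochs ~= 75 minutes
assert wait_minutes < PC1_HOURS * 60   # the wait fits entirely inside one PC1 run
```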
Discussion
In order for Synthetic PoRep to move forward, we need: