Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize stack related WCP operations #1242

Merged

Conversation

OlivierBBB
Copy link
Collaborator

No description provided.

@OlivierBBB OlivierBBB added the perf Increase performances label Sep 18, 2024
@OlivierBBB
Copy link
Collaborator Author

OlivierBBB commented Sep 18, 2024

Note. I was initially tempted to optimize things by changing everything to LT. This wouldn't be an optimization since we would convert the check GT( heightNew, 1024 ) (i.e. heightNew > 1024) to LT( 1024, heightNew ) (i.e. 1024 < heightNew). But there can never be collisions with the LT( height, delta ) comparisons since 0 <= delta <= 17.

A consenquence of this is that there can't be any argument collisions between the two use cases. We therefore don't need to distinguish between LT and GT operations in this set. We will be able to get rid of the CHECK_USES_LT and CHECK_USES_GT constants altogether if we so desire.

@OlivierBBB
Copy link
Collaborator Author

OlivierBBB commented Sep 18, 2024

Also an option would be to move things from Conflation to TxStack. I'm not sure whether there's any point to it. Indeed the data is pretty "transaction-related."

Signed-off-by: Francois Bojarski <[email protected]>
@OlivierBBB
Copy link
Collaborator Author

OlivierBBB commented Sep 18, 2024

@ahamlat : @letypequividelespoubelles suggests splitting the set in 2. It makes sense.

The stack underflow set will always be at most of size 1026 and only needs to record heightNew (also the possible values of heightNew are 0, 1, ..., 1024, 1025.

On the other hand if people may want to practice DUPX/SWAPX scales at all possible heights one could lead to up to (1 + 17) * 1025 = 18450 different elements in the set needed for detecting stack underflows.

Already today both sets of values for SUX/SOX are disjoint but we insert values in the same StackedSet. And @letypequividelespoubelles 's argument is that this way we would have faster search times for finding membership. The only argument against it I can see is having to deal with 2 maps rather than 1, so more complexity. But that's about it.

Should we do it ?

Copy link
Collaborator

@letypequividelespoubelles letypequividelespoubelles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do it ?

I'm in favor of it, yes ! :)

@OlivierBBB
Copy link
Collaborator Author

It is done @letypequividelespoubelles

@OlivierBBB OlivierBBB enabled auto-merge (squash) September 18, 2024 18:12

@Override
protected int computeLineCount() {
return 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of 0 it should return an exception, as this line count shouldn't be called

Copy link
Contributor

@ahamlat ahamlat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just small comments but nothing blocking

* @param heightNew stack height post opcode execution
*/
public StackHeightCheck(int heightNew) {
checkArgument(0 <= heightNew && heightNew <= MAX_STACK_SIZE + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my understanding, why the height can go from 0 to 1025 ? I would expect 1024 instead of 1026 elements

Copy link
Collaborator Author

@OlivierBBB OlivierBBB Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The height is not allowed to go to anything > 1024 but certain instructions will bring it to 1025. E.g. your stack is full (height = 1024) and you do a PUSHX or TIMESTAMP or so ... some instruction that doesn't remove any items, $\mathtt{nbRemoved} = 0$, but adds onto it, $\mathtt{nbAdded} = 1$. Or in EYP terminology $\delta &lt; \alpha$. In this case the zkevm will compute heightNew = height - delta + alpha = height + 1 = 1024 + 1.

Copy link
Collaborator Author

@OlivierBBB OlivierBBB Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It leads to a stackOverflowException and so it's not a legal value from the EVM pov. From the zkevm pov the WCP module will test whether heightNew > 1024 regardless, in this case the answer would be true and so the zkevm knows to throw a SOX (STACK_OVERFLOW_EXCEPTION).

hub.transients().conflation().stackHeightChecksForStackUnderflows().add(checkForUnderflow);
if (isNewCheckForStackUnderflow) {
final boolean underflowDetected = hub.wcp().callLT(height, delta);
Preconditions.checkArgument(underflowDetected == (status == Status.UNDERFLOW));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to use checkState in this case instead of checkArgument. checkArgument will return IllegalArgumentException and it is used usually to check that the argument of a method is well defined. In this case, we are more interested in the state of calculated field. checkState will return IllegalStateException if you use Guava implementation.

hub.transients().conflation().stackHeightChecksForStackOverflows().add(checkForOverflow);
if (isNewCheckForStackOverflow) {
final boolean overflowDetected = hub.wcp().callGT(heightNew, MAX_STACK_SIZE);
Preconditions.checkArgument(overflowDetected == (status == Status.OVERFLOW));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as above

// stack underflow checks happen for every opcode
final StackHeightCheck checkForUnderflow = new StackHeightCheck(height, delta);
final boolean isNewCheckForStackUnderflow =
hub.transients().conflation().stackHeightChecksForStackUnderflows().add(checkForUnderflow);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can expose a get method on the hub level to access stackHeightChecksForStackUnderflows data structure. The method in Hub.java would return transients().conflation().stackHeightChecksForStackUnderflows(), if we know that transients() and conflation() will not return null. We can add checks to avoid NullPointerExceptions

hub.wcp().callGT(this.height - delta + alpha, MAX_STACK_SIZE);
final StackHeightCheck checkForOverflow = new StackHeightCheck(heightNew);
final boolean isNewCheckForStackOverflow =
hub.transients().conflation().stackHeightChecksForStackOverflows().add(checkForOverflow);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as above

*/
@Test
void legacyTxWithoutChainIDAndSmallSignature() {
replay(LINEA_SEPOLIA, "254251.sepolia.json.gz");
replay(LINEA_SEPOLIA, "254251.sepolia.json.gz", false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the check on the result is not needed anymore ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the replay test wasn't working when I didn't specify false. We have several tests where the state check fails at the end, @DavePearce will know more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But i think it a working now, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, I didn't test it. I'll do it now 🙂

import net.consensys.linea.zktracer.container.ModuleOperation;

@EqualsAndHashCode(onlyExplicitlyIncluded = true, callSuper = false)
public class StackHeightCheck extends ModuleOperation {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest using an Integer as the key to the StackedSet since the only field you're using here is an int. If you have lots of these kinds of checks you will create a new key every time which causes lots of memory churn.

Integer has a cache within the JVM itself from -128 to 127 but it's also allowed to run further with a flag. Taken from java.lang.Integer class:

    private static final class IntegerCache {
        static final int low = -128;
        static final int high;

        @Stable
        static final Integer[] cache;
        static Integer[] archivedCache;

        static {
            // high value may be configured by property
            int h = 127;
            String integerCacheHighPropValue =
                VM.getSavedProperty("java.lang.Integer.IntegerCache.high");
            if (integerCacheHighPropValue != null) {
                try {
                    h = Math.max(parseInt(integerCacheHighPropValue), 127);
                    // Maximum array size is Integer.MAX_VALUE
                    h = Math.min(h, Integer.MAX_VALUE - (-low) -1);
                } catch( NumberFormatException nfe) {
                    // If the property cannot be parsed into an int, ignore it.
                }
            }
            high = h;

            // Load IntegerCache.archivedCache from archive, if possible
            CDS.initializeFromArchive(IntegerCache.class);
            int size = (high - low) + 1;

            // Use the archived cache if it exists and is large enough
            if (archivedCache == null || size > archivedCache.length) {
                Integer[] c = new Integer[size];
                int j = low;
                for(int i = 0; i < c.length; i++) {
                    c[i] = new Integer(j++);
                }
                archivedCache = c;
            }
            cache = archivedCache;
            // range [-128, 127] must be interned (JLS7 5.1.7)
            assert IntegerCache.high >= 127;
        }

        private IntegerCache() {}
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The StackHeightCheck StackedSet is a small one, the maximum length is 256. I don't think it is worth using the Integer inner cache here.
The cache was specifically created to reduce the auto-boxing overhead of Integer to int, and works only with Integer.valueOf().

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be wrong but I thought WcpOperations were being created 2x per opcode execution, hence you would do lots of key lookups during tracing. Exactly! My plan was to compute the comparison int and store it in an Integer which would be then used as a cached key. It is a permanent cache so that's a downside.
Anyway this is based on what I thought the behaviour was, maybe it's worth to have another round of performance analysis before doing that

Copy link
Collaborator Author

@OlivierBBB OlivierBBB Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I'm using a StackHeightCheck class which is just a wrapper around an int is that they will populate two StackedSet's which in turn require as keys things implement the ModuleOperation interface. I wasn't sure whether I can decalre Integer to follow that interface (but I also didn't try.)

Copy link
Collaborator Author

@OlivierBBB OlivierBBB Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In reality we could extract the StackedSet logic and make it generic. Something that has

  • a set of committed objects
  • a set of as of current objects
  • methods to enter, pop, ...

and then extend it to a more specialized StackedSetOfModuleOperations with more module specific operations like lineCount etc ... @letypequividelespoubelles @ahamlat @lu-pinto what do you think ?

This way we would have a fully general stacked set class which could accept e.g. Integer's.

Copy link
Collaborator Author

@OlivierBBB OlivierBBB Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The StackHeightCheck StackedSet is a small one, the maximum length is 256

That isn't quite true. For overflows the maximum size of this set is 1026. For underflows the max size is (1 + 17) * 1025 = 18450. I take it the 256 you reference is the initial allocation. Which tbh we may as well raise to 1024.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n reality we could extract the StackedSet logic and make it generic. Something that has

a set of committed objects
a set of as of current objects
methods to enter, pop, ...

and then extend it to a more specialized StackedSetOfModuleOperations with more module specific operations like lineCount etc ... @letypequividelespoubelles @ahamlat @lu-pinto what do you think ?

Yes, it's the right solution. More we don't need to implement the line counting for the StackedSetOp

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@ahamlat ahamlat Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it the 256 you reference is the initial allocation. Which tbh we may as well raise to 1024.

May be we should review the naming then, because in the constructor, the two arguments are : expectedConflationNumberOperations and expectedTransactionNumberOperations. I would suggest to replace expected with initial.

Copy link
Collaborator

@lu-pinto lu-pinto Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry somehow missed this yday.

which in turn require as keys things implement the ModuleOperation interface

Alright that makes sense. I completely missed this and just assumed StackedSet was generic. Yes I agree that having a class that is generic and another that extends from it that does more specialized checks on add is the right way to go.

@OlivierBBB OlivierBBB merged commit 29d44b3 into arith-dev Sep 20, 2024
5 checks passed
@OlivierBBB OlivierBBB deleted the 1238-optimize-stack-health-related-wcp-operations branch September 20, 2024 02:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf Increase performances
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize stack health related WCP operations
5 participants