
Transaction scheduler with between-block execution #1477

Closed
michaelfig opened this issue Aug 15, 2020 · 4 comments
Comments

michaelfig commented Aug 15, 2020

What is the Problem Being Solved?

We want to make it possible to offload evaluation to a deterministic scheduling algorithm that runs between blocks, instead of paying the price of evaluation during block voting. We also want to avoid rolling back the vat's heap state when a voting round fails.

Description of the Design

Initially, there will be two scheduler queues (the "submit queue", and the "SDK Todo" queue) in the kernel DB. The submit queue will be fully processed before voting on the next block begins (no multi-block carryovers or escalator algorithm).

Validators will commit only to the submission of transactions; the scheduler will delay the results of those transactions by at least one block.

The parser-like expression for the transaction state machine is:

((SIMULATE_TX* BEGIN_BLOCK BLOCK_TX* END_BLOCK)+ COMMIT_BLOCK)*;
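For intuition, the grammar above can be checked by a small hand-rolled DFA. This is an illustrative sketch (the state names and `isValidSequence` function are hypothetical, not part of the design):

```javascript
// Illustrative DFA for the grammar:
// ((SIMULATE_TX* BEGIN_BLOCK BLOCK_TX* END_BLOCK)+ COMMIT_BLOCK)*
function isValidSequence(events) {
  // START: accepting; before any block run (or just after COMMIT_BLOCK).
  // PRE_BEGIN: saw SIMULATE_TX, so a BEGIN_BLOCK is now required.
  // IN_BLOCK: between BEGIN_BLOCK and END_BLOCK.
  // AFTER_END: a block ended; next is another block round or COMMIT_BLOCK.
  let state = 'START';
  for (const ev of events) {
    switch (state) {
      case 'START':
      case 'PRE_BEGIN':
        if (ev === 'SIMULATE_TX') state = 'PRE_BEGIN';
        else if (ev === 'BEGIN_BLOCK') state = 'IN_BLOCK';
        else return false;
        break;
      case 'IN_BLOCK':
        if (ev === 'BLOCK_TX') state = 'IN_BLOCK';
        else if (ev === 'END_BLOCK') state = 'AFTER_END';
        else return false;
        break;
      case 'AFTER_END':
        if (ev === 'SIMULATE_TX') state = 'PRE_BEGIN';
        else if (ev === 'BEGIN_BLOCK') state = 'IN_BLOCK';
        else if (ev === 'COMMIT_BLOCK') state = 'START';
        else return false;
        break;
    }
  }
  // Only the start state (empty input, or fully committed runs) accepts.
  return state === 'START';
}
```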

Here is pseudocode for the design:

// Start with a pending promise.  It will resolve when we are ready for BEGIN_BLOCK.
let readyForNextBlockPK = makePromiseKit();

// Track the state of the in-memory submit queue.
let simulateSubmissions = true;
// Start with an empty submit queue.
const submitQueue = [];

// Once, on initialisation:
if (kernelDB.schedulerIsRunning) {
  // We were interrupted during running the scheduler.
  // Restart it now.  BEGIN_BLOCK will wait for it.
  runScheduler().catch(dieHorribly);
} else {
  // Initialisation after the scheduler has run completely.
  readyForNextBlockPK.resolve();
}

function enqueueSubmission(txEvent) {
  if (simulateSubmissions) {
    // It's a simulated transaction (i.e. between Cosmos blocks); don't enqueue.
    return;
  }
  // Add to the in-memory queue for processing during COMMIT_BLOCK.
  submitQueue.push(txEvent);
}

async function execute(work) {
  // TODO: Run swingset with the work item, updating kernel and vat state.
  // Note that each SDK operation must be queued in kernel state because we don't have
  // an SDK block context (we are running between blocks).
  kernelDB.queuedSDKOperations.push(xxx);

  // The SDK operations will be processed when we have an SDK context in BEGIN_BLOCK
  // (which may happen multiple times, resetting the context between attempts, until the
  // speculative chain context is committed in COMMIT_BLOCK).
}

async function runScheduler() {
  // A simple algorithm, just process the entire queue.
  // TODO: make sophisticated.
  while (kernelDB.scheduler.length > 0) {
    const work = kernelDB.scheduler.shift();
    await execute(work);
  }

  // Done running a single block's work.
  kernelDB.schedulerIsRunning = false;
  kernelDB.commit();

  // Allow the next BEGIN_BLOCK to continue the state machine.
  readyForNextBlockPK.resolve();
}

// This is the "block manager" state machine.
async function processTransaction(txEvent) {
  switch (txEvent.type) {
    case 'BEGIN_BLOCK': {
      await readyForNextBlockPK.promise;
      assert(simulateSubmissions);
      simulateSubmissions = false;
      assert(!kernelDB.schedulerIsRunning);
      // Throw away any speculative submit queues.
      submitQueue.length = 0;
      // Begin the block with a fresh run of the synchronous SDK operations we scheduled in the last block.
      // This includes updates to `x/swingset/store`, such as mailbox changes.
      for (const sdkOperation of kernelDB.queuedSDKOperations) {
        enqueueSubmission(executeSDKOperation(sdkOperation));
      }
      // Next comes the actual BEGIN_BLOCK event.
      enqueueSubmission(txEvent);
      break;
    }
    case 'END_BLOCK': {
      assert(!simulateSubmissions);
      enqueueSubmission(txEvent);
      // Any transactions before BEGIN_BLOCK must be simulated, not queued.
      simulateSubmissions = true;
      break;
    }
    case 'COMMIT_BLOCK': {
      assert(simulateSubmissions);
      // We replace the ready flag with a pending promise.
      readyForNextBlockPK = makePromiseKit();
      // Do whatever is necessary to add the submit queue to the kernel DB's scheduler state.
      addToScheduler(submitQueue, kernelDB.scheduler);
      kernelDB.schedulerIsRunning = true;
      kernelDB.commit();
      // Don't await the execution of the scheduler; it will run between blocks.
      runScheduler().catch(dieHorribly);
      break;
    }
    // SIMULATE_TX or BLOCK_TX
    default: {
      // An arbitrary user-level Cosmos transaction.
      enqueueSubmission(txEvent);
      break;
    }
  }
}
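For intuition, here is a self-contained toy model of the submit-queue lifecycle described above (all names are illustrative, not the real cosmic-swingset API): transactions land on the in-memory submit queue during a block, move to the persistent scheduler queue at COMMIT_BLOCK, and are executed only between blocks.

```javascript
// Toy model: work submitted during block N runs only after block N's
// COMMIT_BLOCK, i.e. between blocks. Names here are hypothetical.
function makeToyBlockManager(executeWork) {
  let submitQueue = [];
  const schedulerQueue = []; // stands in for kernelDB.scheduler

  return {
    beginBlock() {
      submitQueue = []; // throw away any speculative submissions
    },
    blockTx(tx) {
      submitQueue.push(tx); // nothing executes during the block itself
    },
    commitBlock() {
      schedulerQueue.push(...submitQueue);
      submitQueue = [];
      // Between-block execution: drain the scheduler queue.
      while (schedulerQueue.length > 0) {
        executeWork(schedulerQueue.shift());
      }
    },
  };
}
```

This makes the Test Plan's expected one-round-trip delay visible: nothing a transaction does is observable until after the block that carried it has committed.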

Security Considerations

The simplest runScheduler algorithm (process the entire submit queue on every block) is not resistant to denial-of-service.
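One possible mitigation, in the spirit of the per-block limit suggested in the comments, is a work budget with carryover: execute at most N items per between-block run and leave the remainder queued. A hedged sketch (the function name and budget parameter are hypothetical, not from this issue):

```javascript
// Hypothetical DoS mitigation: cap the work executed per between-block run
// and carry the remainder forward in the persistent scheduler queue.
function runSchedulerWithBudget(schedulerQueue, execute, budget) {
  let itemsRun = 0;
  while (schedulerQueue.length > 0 && itemsRun < budget) {
    execute(schedulerQueue.shift());
    itemsRun += 1;
  }
  // Anything left stays queued for the next between-block run.
  return schedulerQueue.length; // number of items carried forward
}
```

Choosing the budget (a fixed item count, a gas-style cost metric, or a wall-clock bound) is the open question raised in the discussion below.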

Test Plan

Ensure all existing chain functionality works as before. There is expected to be an additional single-block delay for every message round trip (as they can only be answered in the next block).

@michaelfig michaelfig added enhancement New feature or request cosmic-swingset package: cosmic-swingset labels Aug 15, 2020
@michaelfig michaelfig self-assigned this Aug 15, 2020
@michaelfig michaelfig changed the title Block transaction scheduler Transaction scheduler with between-block execution Aug 15, 2020
@dtribble commented:

Why not limit the number that gets executed from the run queue, and then carry it forward?

@michaelfig (author) commented:

Why not limit the number that gets executed from the run queue, and then carry it forward?

That's an excellent idea. Do you have any suggestion as to how many?


warner commented Aug 19, 2020

I think I could use a picture/reminder of the lifecycle of a Cosmos-SDK block; I've got about half an understanding of this approach.

If we only run the kernel between Cosmos blocks, does that mean swingset devices cannot query the Cosmos state vector? E.g. they can't fetch balances from Bank keepers? I remember thinking we had a need for synchronous read access to state outside the swingset kernel DB.

@michaelfig (author) commented:

I think I could use a picture/reminder of the lifecycle of a Cosmos-SDK block; I've got about half an understanding of this approach.

Here is the DFA, which could be unpacked into a state diagram:

((SIMULATE_TX* BEGIN_BLOCK BLOCK_TX* END_BLOCK)+ COMMIT_BLOCK)*;

If we only run the kernel between Cosmos blocks, does that mean swingset devices cannot query the Cosmos state vector? E.g. they can't fetch balances from Bank keepers? I remember thinking we had a need for synchronous read access to state outside the swingset kernel DB.

Yes, that is a thorny issue. Let's talk about it tomorrow. Personally, I am okay with either of the "only run between blocks" or the "throw failed votes away" scenarios. If we're lucky, there might be some way to get advantages of both.
