Large TVP parameter consumes memory and processing power #879

Closed
Jamezo97 opened this issue Mar 19, 2019 · 6 comments · Fixed by #1066
Labels: discussion, enhancement, Follow up, released

Comments

@Jamezo97

The Issue

We're trying to execute a stored procedure with approximately 100,000 rows of data, each consisting of around 20 bytes of data. This totals around 2MB.

However, while assembling the RpcRequestPayload, the Node.js process maxes out a CPU core and starts allocating insane amounts of memory, chewing through hundreds of megabytes per second. It took over 10 minutes before the request was sent to the database.

I believe I've tracked the issue down to the 'WritableTrackingBuffer'. For all operations except a 'bulk-load', the WritableTrackingBuffer reallocates the internal buffer for every write after the initial buffer (500 bytes) is filled.

If we write 4 bytes per operation, that's approximately 100,000 × (20 / 4) = 500,000 allocations for the entire TVP, of 504, 508, 512, 516, 520 bytes and so on, all the way up to 2,000,000. That eventually totals around 465 gigabytes of Node.js buffers allocated, not to mention roughly the same amount of memory copied from each buffer to its successor. All for a ~2 megabyte payload!
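
For a rough sense of scale, here is a back-of-the-envelope comparison of the two growth strategies (illustrative numbers only; the real buffer starts at 500 bytes and the actual write sizes vary):

// Total bytes allocated for a ~2 MB payload written 4 bytes at a time.
const payload = 2000000; // ~2 MB final payload
const writeSize = 4;     // bytes appended per write

// Linear growth: every write past the initial buffer triggers an alloc + copy.
let linearTotal = 0;
for (let size = 500; size < payload; size += writeSize) {
  linearTotal += size; // bytes allocated (and copied from) at this step
}

// Doubling growth: only ~log2(payload / 512) reallocations are needed.
let doublingTotal = 0;
for (let size = 512; size < payload * 2; size *= 2) {
  doublingTotal += size;
}

console.log((linearTotal / 2 ** 30).toFixed(1) + ' GiB allocated with linear growth');
console.log((doublingTotal / 2 ** 20).toFixed(1) + ' MiB allocated with doubling growth');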

This kind of issue seems to have been reported before but I didn't find any mention of increased processing power and memory usage: #475
There's a PR related to that issue, but it doesn't seem to solve the problem above; all it does is periodically relinquish control back to the event loop: #845

Solutions

The easiest solution is to force the WritableTrackingBuffer to enable 'doubleSizeGrowth' for all instances.

  constructor(initialSize: number, encoding: ?Encoding) {
    this.initialSize = initialSize;
    this.encoding = encoding || 'ucs2';
    // Changed: force doubling growth for every instance, not just bulk loads.
    this.doubleSizeGrowth = true;
    this.buffer = Buffer.alloc(this.initialSize, 0);
    this.compositeBuffer = ZERO_LENGTH_BUFFER;
    this.position = 0;
  }

This is a common solution I've seen in other applications where the final size of a Buffer or array isn't known up front. It causes the buffer to double in size each time it fills, so for a 2 MB payload it would allocate 512, 1024, 2048, 4096, 8192 bytes and so on. This caps the total allocation at roughly size * 4 for a payload of size bytes.
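
Conceptually, the growth step then looks something like this (just a sketch of the doubling strategy, not tedious' actual makeRoomFor):

  makeRoomFor(requiredLength: number) {
    if (this.buffer.length - this.position < requiredLength) {
      // Grow to at least double the current size, so the number of
      // reallocations is logarithmic in the payload size instead of linear.
      const newSize = Math.max(this.buffer.length * 2, this.position + requiredLength);
      const newBuffer = Buffer.alloc(newSize, 0);
      this.buffer.copy(newBuffer, 0, 0, this.position);
      this.buffer = newBuffer;
    }
  }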

The second, harder solution is to assemble the RpcRequestPayload in two passes. The first pass just keeps track of the size of the buffer to be created, without actually allocating anything; this could be done with a class similar to WritableTrackingBuffer whose writeX functions simply increment a counter. On the second pass, allocate the Buffer once and fill it.
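
A rough sketch of what the sizing pass could look like (SizeCountingBuffer is a hypothetical name, and serializePayload stands in for whatever code currently writes the payload into a WritableTrackingBuffer):

// Hypothetical sizing pass: same write API shape as WritableTrackingBuffer,
// but it only counts bytes instead of allocating or copying anything.
class SizeCountingBuffer {
  constructor() {
    this.position = 0;
  }
  writeUInt8(value) { this.position += 1; }
  writeUInt32LE(value) { this.position += 4; }
  writeBuffer(value) { this.position += value.length; }
  // ...one stub per writeX method the payload serialization uses
}

// Pass 1: measure. Pass 2: allocate once with the exact size, then fill.
// const sizer = new SizeCountingBuffer();
// serializePayload(sizer);
// const buffer = new WritableTrackingBuffer(sizer.position);
// serializePayload(buffer);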

Alternatively, a third solution is to allocate new Buffers internally within the WritableTrackingBuffer in chunks of, say, 512 bytes. Each time the current chunk is full (or about to be), truncate any unused bytes, push it onto an array, allocate a new Buffer, and reset the write position.

  // Assumes the constructor also initialises `this.buffers = []`.
  get data() {
    const current = this.buffer.slice(0, this.position);
    if (this.buffers.length === 0) {
      return current;
    }
    const newLen = this.buffers.reduce((count, buff) => {
      return count + buff.length;
    }, current.length);
    const compositeBuffer = Buffer.alloc(newLen);
    let offset = 0;
    // Older chunks first, then the partially filled current buffer.
    for (const buff of [...this.buffers, current]) {
      buff.copy(compositeBuffer, offset, 0, buff.length);
      offset += buff.length;
    }
    return compositeBuffer;
  }

  makeRoomFor(requiredLength: number) {
    if (this.buffer.length - this.position < requiredLength) {
      this.buffers.push(this.buffer.slice(0, this.position)); // trim any unused bytes
      // Allocate at least 512 bytes, or more if a single write needs it.
      this.buffer = Buffer.alloc(Math.max(512, requiredLength), 0);
      this.position = 0;
    }
  }

I'm more than happy to make a PR for any of these, or for something else if there are better solutions. I haven't tested any of the above either, so treat it as pseudocode.

Any thoughts or suggestions are welcome. Thanks!

@Jamezo97 (Author)

Using mssql 4.3.0 the following code should replicate the issue:

const mssql = require("mssql");
async function test() {
    // Send 100 people to each device.
    // Each association is 1 record
    const People = 100;
    const Devices = 1000;

    let tblParam = new mssql.Table();
    tblParam.columns.add("fkidPerson", mssql.Int, {nullable: true});
    tblParam.columns.add("fkidDevice", mssql.Int, {nullable: true});
    tblParam.columns.add("foo", mssql.Int, {nullable: true});
    tblParam.columns.add("bar", mssql.Int, {nullable: true});

    for(let i = 0; i < People; i++) {
        for(let j = 0; j < Devices; j++) {
            tblParam.rows.add(i+10000, j + 1000, i%2 == 0, j%7 == 0);
        }
    }

    let conn = new mssql.ConnectionPool({
        server: "localhost",
        user: "service.worker",
        password: "Password",
        database: "Database"
    });

    await conn.connect();
    let req = conn.request();
    req.input("paramMyTableParam", mssql.TVP, tblParam);
    await req.execute("procDoSomethingWithMyTableParam");
    await conn.close(); 
}
test();

@MichaelSun90 (Contributor)

Hi @Jamezo97, I have tried the script you provided. For this issue, should I expect the memory usage to grow exponentially? While running your script I was watching the memory usage in Task Manager, and the memory performance looks normal. Is this the correct way to reproduce the issue, or do you monitor the memory usage in a different way?

@Jamezo97 (Author)

Hi @MichaelSun90 , thank you for your interest in this issue.

Sorry for not making it clear, but I believe that what you're seeing is the correct behaviour. The total memory usage of the process will remain limited to within a reasonable range as the GC cleans up the old allocated buffers. As each new buffer is allocated, the previous buffers go out of scope and are subsequently cleaned up.

The number I gave above of 465 Gigabytes was just an estimation of the total sum of all the memory the operation would allocate if run to completion.

What you should see is the memory usage of the process fluctuating rapidly, with the swings getting larger as time goes on and the buffers being allocated grow.
You should also see abnormally high CPU usage, since copying the data from each buffer to the next consumes CPU cycles.

Example

I've run the test using node v10.13.0 and v8.15.0 (the versions on my Windows and WSL instances respectively) and they both show the same symptoms.

I also tried it with v5.0.3 of mssql (which requires tedious v4.2.0) and the issue remains.

I wrote a quick 'hotfix' which solves the issue, so maybe try with and without the fix to see if there's a difference?

"use strict"
let TrackingBuffer = require("tedious/lib/tracking-buffer/writable-tracking-buffer");

let OldMakeRoomFor = TrackingBuffer.prototype.makeRoomFor;

// Override the existing prototype function
TrackingBuffer.prototype.makeRoomFor = function() {
  /**
   * Forcibly enable doubleSizeGrowth for all buffers created.
   * This allows us to create a TVP table with 2MB of data by only
   * allocating ~8 MB of memory for it.
   */
  this.doubleSizeGrowth = true;
  OldMakeRoomFor.apply(this, arguments);
}
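
To apply it, the snippet just needs to be required once at startup, before any requests are made; since it patches the shared WritableTrackingBuffer.prototype, every instance picks up the change.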

Thanks!

@IanChokS added the discussion, enhancement, and Follow up labels on Dec 13, 2019
@IanChokS (Member)

Hi @Jamezo97, do you get the same performance lag when you use Bulk load operations instead? As stated in the MSDN docs, "table-valued parameters perform well for inserting less than 1000 rows."

@IanChokS (Member)

IanChokS commented Jan 17, 2020

Should be solved in #1037 when it's done 😄

Description: #1038

@arthurschreiber (Collaborator)

🎉 This issue has been resolved in version 8.1.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
