attempting to GC indexes: clearing index 2: command is too large #61206

Closed
dankinder opened this issue Feb 26, 2021 · 14 comments

Labels: C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.), O-community (Originated from the community), X-blathers-triaged (blathers was able to find an owner)

@dankinder

Describe the problem

On truncating a table with about 180GB of data, the GC got this error:
attempting to GC indexes: clearing index 2: command is too large: 120227141 bytes (max: 67108864)

This was data we had imported (via IMPORT INTO ... CSV) within the past day.
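
For context, a hypothetical sketch of the general form of that import (the table, column names, and bucket path below are made up, not our actual statement):

IMPORT INTO t (a, b, c)
    CSV DATA ('s3://bucket/path/data.csv?AWS_ACCESS_KEY_ID=...&AWS_SECRET_ACCESS_KEY=...');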

Note: this is using v21.1.0-alpha3 in order to get this fix; otherwise our S3 reads time out.

I have a debug zip exported if you want me to upload it in the support portal.

Environment:

  • CockroachDB version v21.1.0-alpha3
  • Server OS: CentOS 6
@dankinder added the C-bug label on Feb 26, 2021
@blathers-crl

blathers-crl bot commented Feb 26, 2021

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try to find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl bot added the O-community and X-blathers-triaged labels on Feb 26, 2021
@ajwerner
Contributor

Is this using interleaving somehow?

@ajwerner
Contributor

There is a cluster setting for the max command size; raising it may unstick you, but it carries modest risk, so definitely set it back down afterwards. Fortunately this isn't attempting to do anything totally nuts. The other option is to decrease the max range size for the database to, say, 32 MiB (or for the default zone, if that database is gone now). That's a safer choice, but it will incur more load on the cluster as splitting and merging happens.
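
For concreteness, a hedged sketch of both options: the setting and zone-config names below assume a recent CockroachDB version, the 128 MiB and 32 MiB values are only examples, and mydb stands in for the actual database name.

-- Option 1 (riskier): temporarily raise the maximum raft command size,
-- then put it back once the GC job has finished.
SET CLUSTER SETTING kv.raft.command.max_size = '128MiB';
RESET CLUSTER SETTING kv.raft.command.max_size;  -- after the job completes

-- Option 2 (safer, but more split/merge load): shrink the range size
-- for the affected database (or the default zone).
ALTER DATABASE mydb CONFIGURE ZONE USING range_min_bytes = 16777216, range_max_bytes = 33554432;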

@dankinder
Author

Interestingly it seems like that data still got dumped regardless... so I guess there's nothing about our cluster that needs remediation per se. But I assume you would want to prevent this from happening in the future. Nothing about the dataset was really unusual.

And no, no interleaving. This was a really simple dataset with one table, a few columns with one INT primary key.

@ajwerner
Contributor

"Still got dumped" as in gc happened?

Also, where are you seeing that error? Is it in the logs, or was it returned from the truncate itself?

@dankinder
Author

What I mean is, the number of live bytes dropped dramatically, so seemingly most of the data got cleared if not all of it.

This error is not on the TRUNCATE job, it's on the GC job that followed it, i.e. it's GC for TRUNCATE TABLE <the table> that failed.

If that failure does leave data leftover, will it eventually get cleared in the normal compaction process?

@ajwerner
Contributor

ajwerner commented Mar 1, 2021

It's bad that that job failed. We've had recent discussions about whether we should ever let that job fail.

#55740
#59542
#59788 (comment)

I can help you restart that job. In the meantime, can you grab a copy of its record before it gets deleted by the system? That would be:

SELECT id, status, created, encode(payload, 'hex'), encode(progress, 'hex') FROM system.jobs WHERE id = <relevant job id>;
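
To find the relevant job id to plug into that query, something along these lines should work; this is a hedged sketch assuming the GC job is reported with job_type 'SCHEMA CHANGE GC':

SELECT job_id, job_type, status, error
  FROM [SHOW JOBS]
 WHERE job_type = 'SCHEMA CHANGE GC' AND status = 'failed';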

@ajwerner
Contributor

ajwerner commented Mar 1, 2021

What I don't understand is why the GC job would be sending a large raft command. The clear range operation it uses should be small.

@ajwerner
Contributor

ajwerner commented Mar 1, 2021

Are your keys somehow absolutely gigantic?

@ajwerner
Contributor

ajwerner commented Mar 1, 2021

I think I've got a lead on this one. We'll need to do some manual things to recover the job. Thanks for the bug report!

@ajwerner
Contributor

ajwerner commented Mar 1, 2021

Actually we're still pretty confused. Do you have any more intel on the structure of these tables to share?

@dankinder
Author

Yeah so I can at least say it's an extremely simple table, basically a few INT columns that together are the primary key. No other columns, no indexes.
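
For illustration only (the table and column names here are hypothetical), the shape described is roughly:

CREATE TABLE t (
    a INT NOT NULL,
    b INT NOT NULL,
    c INT NOT NULL,
    PRIMARY KEY (a, b, c)
);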

I just sent a debug zip through the support portal and tagged this issue.

@dankinder
Author

If y'all couldn't find anything and don't intend to investigate further (because it's an alpha), it's okay if you want to close this; we're good as far as our cluster goes.

@ajwerner closed this as completed on Mar 4, 2021
@erikgrinaker
Contributor

I think we've found the cause of this and submitted a fix in #74674.
