Bug Report: VtGate fatal error: concurrent map writes #13908

Closed
mailankitkm opened this issue Sep 1, 2023 · 4 comments · Fixed by #14012

mailankitkm commented Sep 1, 2023

Overview of the Issue

We experienced random vtgate service restarts caused by a "fatal error: concurrent map writes". The logs are included under Log Fragments below.

Reproduction Steps

We suspect the query below is causing the issue:

select job_config.job_id, job_config.cfs_config_id, job_config.storage_config_id, job_config.storage_type, job_config.direction_type, job_config.target_path, job_config.source_path, job_config.sync_state, job_config.is_deleted, job_config.creation_time, job_config.last_updated, job_config.sync_frequency, job_config.is_delete_after_copy, job_config.is_propagate_deletes, job_config.is_synchronize_all_versions, job_config.bulk_folder_mapping_enabled, job_config.root_folder_may_not_exist, job_config.preconfigurable_files, job_config.cloudsync_version, job_config.custom_metadata_namespace, job_config.external_id, job_config.external_name, job_config.job_type, job_config.source_type, job_config.is_source_cleanup_only_job, job_config.tenant_id, job_config.user_id, job_config.cfs_folders_options, job_config.storage_folders_options, job_config.purge_source_after_period_days, job_config.aggregate_id, cfs_config.cfs_config_id, cfs_config.domain, cfs_config.admin_user, cfs_config.admin_user_token, cfs_config.egnyte_oauth_token, cfs_config.type, cfs_config.tag, cfs_config.creation_time, cfs_config.last_updated from job_config join cfs_config on job_config.cfs_config_id = cfs_config.cfs_config_id where job_config.job_id = ?

Vschema:

{
  "sharded": true,
  "vindexes": {
    "always-same-shard": {
      "type": "null"
    },
    "any-type-nullable-hash": {
      "type": "unicode_loose_md5"
    }
  },
  "tables": {
    "DATABASECHANGELOG": {
      "columnVindexes": [
        {
          "column": "ID",
          "name": "always-same-shard"
        }
      ],
      "columns": [
        {
          "name": "ID"
        },
        {
          "name": "AUTHOR"
        },
        {
          "name": "FILENAME"
        },
        {
          "name": "DATEEXECUTED"
        },
        {
          "name": "ORDEREXECUTED"
        },
        {
          "name": "EXECTYPE"
        },
        {
          "name": "MD5SUM"
        },
        {
          "name": "DESCRIPTION"
        },
        {
          "name": "COMMENTS"
        },
        {
          "name": "TAG"
        },
        {
          "name": "LIQUIBASE"
        },
        {
          "name": "CONTEXTS"
        },
        {
          "name": "LABELS"
        },
        {
          "name": "DEPLOYMENT_ID"
        }
      ],
      "columnListAuthoritative": true
    },
    "DATABASECHANGELOGLOCK": {
      "columnVindexes": [
        {
          "column": "ID",
          "name": "always-same-shard"
        }
      ]
    },
    "cfs_config": {
      "columnVindexes": [
        {
          "column": "cfs_config_id",
          "name": "always-same-shard"
        }
      ]
    },
    "cfs_files": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "cfs_folders": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "cfs_source_config": {
      "columnVindexes": [
        {
          "column": "domain",
          "name": "always-same-shard"
        }
      ]
    },
    "cfs_source_files": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "cfs_source_folders": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "clfs_config": {
      "columnVindexes": [
        {
          "column": "id",
          "name": "always-same-shard"
        }
      ]
    },
    "clfs_files": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "clfs_folders": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "consolidated_operations": {
      "columnVindexes": [
        {
          "column": "mapping_id",
          "name": "any-type-nullable-hash"
        }
      ],
      "autoIncrement": {
        "column": "id",
        "sequence": "cloudsync_db.consolidated_operations_seq"
      }
    },
    "cs_config": {
      "columnVindexes": [
        {
          "column": "cs_config_id",
          "name": "always-same-shard"
        }
      ]
    },
    "cs_config_mapping": {
      "columnVindexes": [
        {
          "column": "cs_config_id",
          "name": "always-same-shard"
        }
      ]
    },
    "cs_files": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "cs_folders": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "custom_metadata": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "dry_run_changelog": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "dry_run_summary": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "folder_mapping_exclusions": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ],
      "autoIncrement": {
        "column": "id",
        "sequence": "cloudsync_db.folder_mapping_exclusions_seq"
      }
    },
    "folder_mappings": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "job_archive": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "always-same-shard"
        }
      ]
    },
    "job_config": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "always-same-shard"
        }
      ]
    },
    "jobs_subscriptions": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "raw_operations": {
      "columnVindexes": [
        {
          "column": "mapping_id",
          "name": "any-type-nullable-hash"
        }
      ],
      "autoIncrement": {
        "column": "id",
        "sequence": "cloudsync_db.raw_operations_seq"
      }
    },
    "remote_storage_folder_exclusions": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "rfs_config": {
      "columnVindexes": [
        {
          "column": "mapping_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "rfs_files": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "rfs_folders": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "skipped_operations": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ],
      "autoIncrement": {
        "column": "id",
        "sequence": "cloudsync_db.skipped_operations_seq"
      }
    },
    "sync_cycle": {
      "columnVindexes": [
        {
          "column": "mapping_id",
          "name": "any-type-nullable-hash"
        }
      ],
      "autoIncrement": {
        "column": "id",
        "sequence": "cloudsync_db.sync_cycle_seq"
      }
    },
    "sync_stages": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "sync_stages_ordering": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "transformed_folder_mappings": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "verification_errors": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    },
    "verification_summary": {
      "columnVindexes": [
        {
          "column": "job_id",
          "name": "any-type-nullable-hash"
        }
      ]
    }
  }
}

Binary Version

Version: 14.0.0-SNAPSHOT (Git revision d5ab57124082570368d7c162e58738249d93f1a6 branch 'main') built on Tue May  3 20:08:16 UTC 2022 by planetscale@codespaces-f3d137 using go1.18.1 linux/amd64

Operating System and Environment details

cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"


uname -sr
Linux 3.10.0-1160.71.1.el7.x86_64

uname -m
x86_64

Log Fragments

Aug 27 21:21:32 vtgate01 vtgate: fatal error: concurrent map writes
Aug 27 21:21:32 vtgate01 vtgate: goroutine 22336267 [running]:
Aug 27 21:21:32 vtgate01 vtgate: runtime.throw({0x1e01bc0?, 0x1c43080?})
Aug 27 21:21:32 vtgate01 vtgate: runtime/panic.go:992 +0x71 fp=0xc00048f3d8 sp=0xc00048f3a8 pc=0x436311
Aug 27 21:21:32 vtgate01 vtgate: runtime.mapassign_faststr(0x1b2ae20, 0xc011981d70, {0xc026f1e5d0, 0x18})
Aug 27 21:21:32 vtgate01 vtgate: runtime/map_faststr.go:295 +0x38b fp=0xc00048f440 sp=0xc00048f3d8 pc=0x41346b
Aug 27 21:21:32 vtgate01 vtgate: vitess.io/vitess/go/vt/vtgate/engine.(*Join).TryStreamExecute.func1(0xc00e2a6b60)
Aug 27 21:21:32 vtgate01 vtgate: vitess.io/vitess/go/vt/vtgate/engine/join.go:137 +0x156 fp=0xc00048f670 sp=0xc00048f440 pc=0xe15516
Aug 27 21:21:32 vtgate01 vtgate: vitess.io/vitess/go/vt/vtgate/engine.(*Route).TryStreamExecute.func1(0x3641ef8?)
Aug 27 21:21:32 vtgate01 vtgate: vitess.io/vitess/go/vt/vtgate/engine/route.go:295 +0x143 fp=0xc00048f708 sp=0xc00048f670 pc=0xe288c3
Aug 27 21:21:32 vtgate01 vtgate: vitess.io/vitess/go/vt/vttablet/queryservice.(*wrappedService).StreamExecute.func1.1(0xc02dc86500?)
Aug 27 21:21:32 vtgate01 vtgate: vitess.io/vitess/go/vt/vttablet/queryservice/wrapped.go:200 +0x27 fp=0xc00048f720 sp=0xc00048f708 pc=0xd78227
Aug 27 21:21:32 vtgate01 vtgate: vitess.io/vitess/go/vt/vttablet/grpctabletconn.(*gRPCQueryClient).StreamExecute(0x4736af?, {0x2377c88?, 0xc017f601b0?}, 0x451a09?, {0xc0019b2c00, 0x3e8}, 0xc001667900?, 0x1d7d640?, 0xc03901b6c0?, 0xc018dd2c80, ...)
Aug 27 21:21:32 vtgate01 vtgate: vitess.io/vitess/go/vt/vttablet/grpctabletconn/conn.go:173 +0x1f2 fp=0xc00048f800 sp=0xc00048f720 pc=0x1103252
Aug 27 21:21:32 vtgate01 vtgate: vitess.io/vitess/go/vt/vttablet/queryservice.(*wrappedService).StreamExecute.func1({0x2377c88, 0xc017f601b0}, 0xc011e7c670?, {0x2389648,

mailankitkm added the Needs Triage and Type: Bug labels on Sep 1, 2023

systay (Collaborator) commented Sep 5, 2023

Hi @mailankitkm!

We only support Vitess versions 15, 16 and 17 at the moment. Would you be able to try this on a more recent version?

GuptaManan100 added the Component: Query Serving label and removed the Needs Triage label on Sep 6, 2023

mailankitkm (Author) commented Sep 18, 2023

Hi Team, as suggested, we upgraded Vitess to version 15.0.2-a914f40 but are still getting the same issue.

Sep 17 13:47:04 vtgate04 vtgate: fatal error: concurrent map writes
Sep 17 13:47:04 vtgate04 vtgate: goroutine 8726178 [running]:
Sep 17 13:47:04 vtgate04 vtgate: runtime.throw({0x1e7f49d?, 0x1cb97a0?})
Sep 17 13:47:04 vtgate04 vtgate: runtime/panic.go:992 +0x71 fp=0xc001d77368 sp=0xc001d77338 pc=0x436291
Sep 17 13:47:04 vtgate04 vtgate: runtime.mapassign_faststr(0x1b9b040, 0xc007a4e120, {0xc0151b1188, 0x18})
Sep 17 13:47:04 vtgate04 vtgate: runtime/map_faststr.go:295 +0x38b fp=0xc001d773d0 sp=0xc001d77368 pc=0x41340b
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vtgate/engine.(*Join).TryStreamExecute.func1(0xc001d3e460)
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vtgate/engine/join.go:138 +0x172 fp=0xc001d77620 sp=0xc001d773d0 pc=0xe6c912
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vtgate/engine.(*Route).streamExecuteShards.func1(0x372aef0?)
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vtgate/engine/route.go:337 +0x143 fp=0xc001d776b8 sp=0xc001d77620 pc=0xe81d63
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vttablet/queryservice.(*wrappedService).StreamExecute.func1.1(0xc01312b600?)
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vttablet/queryservice/wrapped.go:198 +0x27 fp=0xc001d776d0 sp=0xc001d776b8 pc=0xdc2227
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vttablet/grpctabletconn.(*gRPCQueryClient).StreamExecute(0x47376f?, {0x24029b8?, 0xc011cd8510?}, 0x4519a9?, {0xc002d24400, 0x3e8}, 0xc001768900?, 0x1df8800?, 0xc0014be5b0?, 0xc0051744b0, ...)
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vttablet/grpctabletconn/conn.go:190 +0x1f2 fp=0xc001d777b0 sp=0xc001d776d0 pc=0x116f172
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vttablet/queryservice.(*wrappedService).StreamExecute.func1({0x24029b8, 0xc011cd8510}, 0xc006ab4850?, {0x2414828, 0xc0002a3140})
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vttablet/queryservice/wrapped.go:196 +0x162 fp=0xc001d77870 sp=0xc001d777b0 pc=0xdc2142
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vtgate.(*TabletGateway).withRetry(0xc0001de070, {0x24029b8, 0xc011cd8510}, 0xc00902fa40, {0xc001768ac0?, 0x40d7c7?}, {0x40?, 0x1d4c940?}, 0x0, 0xc00a75d680)
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vtgate/tabletgateway.go:344 +0x464 fp=0xc001d77a70 sp=0xc001d77870 pc=0xf65424
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vtgate.(*TabletGateway).withRetry-fm({0x24029b8?, 0xc011cd8510?}, 0xc001768b50?, {0x0?, 0x0?}, {0x1e6ed44?, 0xc002d2c0f0?}, 0xd0?, 0x24029b8?)
Sep 17 13:47:04 vtgate04 vtgate: <autogenerated>:1 +0x70 fp=0xc001d77ad0 sp=0xc001d77a70 pc=0xf812d0
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vttablet/queryservice.(*wrappedService).StreamExecute(0xc00000c2d0, {0x24029b8, 0xc011cd8510}, 0xf5e4f2?, {0xc002d24400, 0x3e8}, 0xc011cd8540, 0x0, 0x0, 0xc0051744b0, ...)
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vttablet/queryservice/wrapped.go:194 +0x16c fp=0xc001d77b38 sp=0xc001d77ad0 pc=0xdc1f6c
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vtgate.(*TabletGateway).StreamExecute(0xc0116ff620?, {0x24029b8?, 0xc011cd8510?}, 0xc4?, {0xc002d24400?, 0x9733?}, 0xc00d63fc38?, 0x417473?, 0x2000?, 0xc0051744b0, ...)
Sep 17 13:47:04 vtgate04 vtgate: <autogenerated>:1 +0x64 fp=0xc001d77ba0 sp=0xc001d77b38 pc=0xf7f384
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vtgate.(*ScatterConn).StreamExecuteMulti.func1(0xc0144f3470, 0x2, 0xc007b758e0)
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vtgate/scatter_conn.go:411 +0x2ea fp=0xc001d77e90 sp=0xc001d77ba0 pc=0xf5ed6a
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vtgate.(*ScatterConn).multiGoTransaction.func1(0xc0144f3470, 0x2402910?)
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vtgate/scatter_conn.go:643 +0x1b2 fp=0xc001d77f80 sp=0xc001d77e90 pc=0xf616d2
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vtgate.(*ScatterConn).multiGoTransaction.func2(0xc0005bef60?, 0x0?)
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vtgate/scatter_conn.go:671 +0x5b fp=0xc001d77fc0 sp=0xc001d77f80 pc=0xf6145b
Sep 17 13:47:04 vtgate04 vtgate: vitess.io/vitess/go/vt/vtgate.(*ScatterConn).multiGoTransaction.func3()
Sep 17 13:47:05 vtgate04 vtgate: vitess.io/vitess/go/vt/vtgate/scatter_conn.go:672 +0x2e fp=0xc001d77fe0 sp=0xc001d77fc0 pc=0xf613ce
Sep 17 13:47:05 vtgate04 vtgate: runtime.goexit()
Sep 17 13:47:05 vtgate04 vtgate: runtime/asm_amd64.s:1571 +0x1 fp=0xc001d77fe8 sp=0xc001d77fe0 pc=0x469301
Sep 17 13:47:05 vtgate04 vtgate: created by vitess.io/vitess/go/vt/vtgate.(*ScatterConn).multiGoTransaction
Sep 17 13:47:05 vtgate04 vtgate: vitess.io/vitess/go/vt/vtgate/scatter_conn.go:669 +0x215
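
For readers unfamiliar with this Go runtime error: the trace above suggests that goroutines started by multiGoTransaction each invoke the Join streaming callback, which then assigns to a string-keyed map (mapassign_faststr). The following is a minimal sketch of that general failure class and one common guard. It is not Vitess code and not necessarily the fix that was merged; `streamShards`, `collectJoinVars`, and the shard names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// streamShards is a hypothetical stand-in for a scatter query: it starts one
// goroutine per shard and invokes callback for every row that shard streams.
// Callbacks for different shards may therefore run at the same time.
func streamShards(shards []string, rowsPerShard int, callback func(shard string, row int)) {
	var wg sync.WaitGroup
	for _, shard := range shards {
		wg.Add(1)
		go func(shard string) {
			defer wg.Done()
			for row := 0; row < rowsPerShard; row++ {
				callback(shard, row)
			}
		}(shard)
	}
	wg.Wait()
}

// collectJoinVars records one entry per streamed row in a shared map.
// Without the mutex, the concurrent assignments could abort the process with
// "fatal error: concurrent map writes".
func collectJoinVars(shards []string, rowsPerShard int) map[string]int {
	var mu sync.Mutex
	joinVars := make(map[string]int)
	streamShards(shards, rowsPerShard, func(shard string, row int) {
		key := fmt.Sprintf("%s/%d", shard, row)
		mu.Lock()
		defer mu.Unlock()
		joinVars[key] = row
	})
	return joinVars
}

func main() {
	vars := collectJoinVars([]string{"-80", "80-"}, 1000)
	fmt.Println(len(vars)) // one entry per (shard, row) pair
}
```

Removing the mu.Lock/mu.Unlock pair and running the program with `go run -race` should make the race detector flag the map assignment. Another option, under the same assumptions, is to give each callback invocation its own map instead of sharing one.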

harshit-gangal (Member) commented

Can you add more details like VSchema and Query?

mailankitkm (Author) commented

It is there in the initial comment: #13908 (comment)
