Bug Report: VDiff does not handle multiple cells properly when picking tablets #14100

Closed
mattlord opened this issue Sep 27, 2023 · 0 comments · Fixed by #14099

Overview of the Issue

When selecting the source and target tablets from which to read the table data being compared, VDiff should pass the list of cells specified for the source and target to the TabletPicker (VDiff uses all cells if none are specified).

The problem is that the vdiff record stores the cell list as a comma-delimited string, and VDiff was passing that string on to the TabletPicker as a single value. So it was, for example, passing zone1,zone2,zone3 as if it were a single cell name.

Reproduction Steps

git checkout main
make build
pushd examples/local

source ../common/env.sh

CELL=zone1 ../common/scripts/etcd-up.sh

CELL=zone1 ../common/scripts/vtctld-up.sh

for i in 100 101 102; do
        CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
        SHARD=-80 CELL=zone1 KEYSPACE=commerce TABLET_UID=$i ../common/scripts/vttablet-up.sh
done

for i in 200 201 202; do
        CELL=zone1 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
        SHARD=80- CELL=zone1 KEYSPACE=commerce TABLET_UID=$i ../common/scripts/vttablet-up.sh
done

vtctldclient InitShardPrimary --force commerce/-80 zone1-100
vtctldclient InitShardPrimary --force commerce/80- zone1-200

vtctldclient ApplyVSchema --vschema='{"sharded":true,"vindexes":{"hash":{"type":"hash"}},"tables":{"customer":{"column_vindexes":[{"column":"customer_id","name":"hash"}]}}}' commerce
vtctldclient ApplySchema --sql='create table customer(customer_id bigint not null auto_increment, email varbinary(128), primary key(customer_id))' commerce

vtctldclient AddCellInfo --root /vitess/zone0 --server-address "${ETCD_SERVER}" zone0
vtctldclient AddCellInfo --root /vitess/zone2 --server-address "${ETCD_SERVER}" zone2
vtctldclient AddCellInfo --root /vitess/zone3 --server-address "${ETCD_SERVER}" zone3

for i in 300 301 302; do
        CELL=zone2 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
        SHARD=-80 CELL=zone2 KEYSPACE=customer TABLET_UID=$i ../common/scripts/vttablet-up.sh
done

for i in 400 401 402; do
        CELL=zone2 TABLET_UID=$i ../common/scripts/mysqlctl-up.sh
        SHARD=80- CELL=zone2 KEYSPACE=customer TABLET_UID=$i ../common/scripts/vttablet-up.sh
done

vtctldclient InitShardPrimary --force customer/-80 zone2-300
vtctldclient InitShardPrimary --force customer/80- zone2-400

CELL=zone1 ../common/scripts/vtgate-up.sh

echo "insert into customer(customer_id, email) values (1, '[email protected]'), (2, '[email protected]'), (3, '[email protected]'), (4, '[email protected]'), (5, '[email protected]');" | mysql

vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer create --source-keyspace commerce --tables customer --cells zone1 --cells zone2 --cells zone3

sleep 10

vtctldclient vdiff --target-keyspace customer --workflow commerce2customer --format=json create

command mysql -u root --socket=${VTDATAROOT}/vt_0000000300/mysql.sock --binary-as-hex=false -e "select * from _vt.vdiff\G"

vtctldclient vdiff --target-keyspace customer --workflow commerce2customer --format=json show last

sleep 10

vtctldclient vdiff --target-keyspace customer --workflow commerce2customer --format=json show last

sleep 10

grep -R "No healthy serving tablet found for streaming" ${VTDATAROOT}/tmp/*

grep -R "Unable to resolve cell" ${VTDATAROOT}/tmp/*

./401_teardown.sh
popd

The end result will look like this:

❯ vtctldclient vdiff --target-keyspace customer --workflow commerce2customer --format=json show last
{
  "Workflow": "commerce2customer",
  "Keyspace": "customer",
  "State": "started",
  "UUID": "5c7af381-8edf-4acc-8ce1-bc2b3f68eb9e",
  "RowsCompared": 0,
  "HasMismatch": false,
  "Shards": "-80,80-",
  "StartedAt": "2023-09-27 12:52:57",
  "Progress": {
    "Percentage": 0
  }
}

❯ grep -R "No healthy serving tablet found for streaming" ${VTDATAROOT}/tmp/*
/opt/vtdataroot/tmp/vttablet.pslord.matt.log.INFO.20230927-112624.90736:I0927 11:26:54.296284   90736 tablet_picker.go:330] No healthy serving tablet found for streaming, shard commerce.-80, cells [zone0,zone1,zone2,zone3 zone2], tabletTypes [RDONLY REPLICA PRIMARY], sleeping for 30.000 seconds.

❯ grep -R "Unable to resolve cell" ${VTDATAROOT}/tmp/*
/opt/vtdataroot/tmp/vttablet.pslord.matt.log.INFO.20230927-133113.58346:I0927 13:31:49.276897   58346 tablet_picker.go:380] Unable to resolve cell zone0,zone1,zone2,zone3, ignoring
/opt/vtdataroot/tmp/vttablet.pslord.matt.log.INFO.20230927-133113.58346:I0927 13:31:49.294060   58346 tablet_picker.go:380] Unable to resolve cell zone0,zone1,zone2,zone3, ignoring

As you can see, the first cell that the tablet picker is using is zone0,zone1,zone2,zone3 and the second is the target tablet's local cell, zone2 (because we now default to local cell preference). This is... obviously wrong 🙂
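For anyone reading the log line above: Go prints a string slice with spaces between elements, so a two-element list whose first element contains commas looks exactly like the cells value in the log. The sketch below assumes the log message formats the slice with the default %v verb (which matches the tabletTypes output); formatCells is a hypothetical name used only for illustration.

```go
package main

import "fmt"

// formatCells is a hypothetical helper that renders a cell list the way
// Go's default %v verb does: elements separated by single spaces.
func formatCells(cells []string) string {
	return fmt.Sprintf("%v", cells)
}

func main() {
	// Two elements: the unsplit string plus the local cell.
	fmt.Println(formatCells([]string{"zone0,zone1,zone2,zone3", "zone2"}))
	// prints "[zone0,zone1,zone2,zone3 zone2]"

	// Five elements: what the same cells look like once they are split.
	// This is illustrative only, not captured output from a fixed build.
	fmt.Println(formatCells([]string{"zone0", "zone1", "zone2", "zone3", "zone2"}))
	// prints "[zone0 zone1 zone2 zone3 zone2]"
}
```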

Binary Version

vtgate version Version: 18.0.0-SNAPSHOT (Git revision 1c2d8a16b7f4d7ae914997492633089b022263ef branch 'vdiff_tablet_picker') built on Wed Sep 27 12:53:49 EDT 2023 by [email protected] using go1.21.1 darwin/arm64

Operating System and Environment details

N/A

Log Fragments

No response
