Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue in verification related queries #2278

Closed
4 of 6 tasks
xuwang-wish opened this issue Mar 2, 2022 · 2 comments
Closed
4 of 6 tasks

Performance issue in verification related queries #2278

xuwang-wish opened this issue Mar 2, 2022 · 2 comments
Assignees
Labels
bug Something is not working.

Comments

@xuwang-wish
Copy link

xuwang-wish commented Mar 2, 2022

Preflight checklist

Describe the bug

Description

We are planning to do a load test on kratos and the first step for that is to generate 4.1million test accounts. However once those accounts are generated, we found that kratos endpoint sometimes can not serve verification requests and shows time out.

Checking the query on database side, we identified two slow queries:

SELECT identity_recovery_addresses.created_at, identity_recovery_addresses.id, identity_recovery_addresses.identity_id, identity_recovery_addresses.nid, identity_recovery_addresses.updated_at, identity_recovery_addresses.value, identity_recovery_addresses.via FROM identity_recovery_addresses AS identity_recovery_addresses WHERE nid = 'b3990cd5-a387-4f7c-860e-ac3c5730bb6f' AND via = 'email' AND LOWER(value) = '[email protected]' LIMIT 1;

SELECT identity_verifiable_addresses.created_at, identity_verifiable_addresses.id, identity_verifiable_addresses.identity_id, identity_verifiable_addresses.nid, identity_verifiable_addresses.status, identity_verifiable_addresses.updated_at, identity_verifiable_addresses.value, identity_verifiable_addresses.verified, identity_verifiable_addresses.verified_at, identity_verifiable_addresses.via FROM identity_verifiable_addresses AS identity_verifiable_addresses WHERE nid = 'b3990cd5-a387-4f7c-860e-ac3c5730bb6f' AND via = 'email' AND LOWER(value) = '[email protected]' LIMIT 1;

For these two queries, the optimizer is not able to fully use the underlying index, and as a result they both took ~40s to finish. You can check more information from explain.

Why the index is not fully used

It's because there's a lower call in the where condition

What is the fix

My proposal is to remove the lower call. This shouldn't affect the business logic since the column for value has a case insensitive collation: utf8mb4_0900_ai_ci. Even without lower call, the internal comparison on the value will be case insensitive.

Examples:

mysql> show create table t1 \G;
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `name` varchar(400) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
  UNIQUE KEY `my_index` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.05 sec)

mysql> insert into t1 values ("abc");
Query OK, 1 row affected (0.05 sec)

mysql> select * from t1 where name = "abc";
+------+
| name |
+------+
| abc  |
+------+
1 row in set (0.06 sec)

The following query also works because the collation ends in ci

mysql> select * from t1 where name = "abC";
+------+
| name |
+------+
| abc  |
+------+
1 row in set (0.05 sec)

mysql> select * from t1 where name = "abCD";
Empty set (0.04 sec)

Now modify the column to use a case sensitive collation

mysql> alter table t1 modify name varchar(400) CHARACTER SET utf8mb4 collate utf8mb4_es_0900_as_cs;
Query OK, 1 row affected (0.13 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> show create table t1 \G;
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `name` varchar(400) CHARACTER SET utf8mb4 COLLATE utf8mb4_es_0900_as_cs DEFAULT NULL,
  UNIQUE KEY `my_index` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.04 sec)

mysql> select * from t1 where name = "ABC";
Empty set (0.04 sec)

mysql> select * from t1 where name = "abc";
+------+
| name |
+------+
| abc  |
+------+
1 row in set (0.03 sec)

Reproducing the bug

As explained above.

Relevant log output

No response

Relevant configuration

No response

Version

0.8.2

On which operating system are you observing this issue?

Linux

In which environment are you deploying?

Kubernetes

Additional Context

No response

@xuwang-wish xuwang-wish added the bug Something is not working. label Mar 2, 2022
@aeneasr
Copy link
Member

aeneasr commented Mar 3, 2022

Thank you for the report! I checked in our code base and we currently do not normalize the email address in the code itself. My proposal would be to add this transformation in the Go code here

address := NewVerifiableEmailAddress(fmt.Sprintf("%s", value), r.i.ID)

address := NewRecoveryEmailAddress(fmt.Sprintf("%s", value), r.i.ID)

add a migraiton along the lines of

UPDATE identity_recovery_addresses SET value = LOWER(value)
UPDATE identity_verification_addresses SET value = LOWER(value)

and then remove the LOWER() SQL calls.

This would then be a solution all databases!

By the way, would it possible for you to publish the benchmark you have done? It would be awesome to have some performance testing in Ory Kratos available. The community would for sure love it!!

@xuwang-wish
Copy link
Author

This would then be a solution all databases!

That sounds a reasonable approach. Not all databases have the same default behavior.

For the benchmark, we are still in preparing phase actually, so no data yet. Once this is completed, we will see if we can publish it somewhere.

Thx.

@aeneasr aeneasr self-assigned this Mar 4, 2022
@aeneasr aeneasr closed this as completed Mar 4, 2022
aeneasr added a commit that referenced this issue Mar 4, 2022
aeneasr added a commit that referenced this issue Mar 6, 2022
aeneasr added a commit that referenced this issue Mar 6, 2022
aeneasr added a commit that referenced this issue Mar 7, 2022
peturgeorgievv pushed a commit to senteca/kratos-fork that referenced this issue Jun 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is not working.
Projects
None yet
Development

No branches or pull requests

2 participants