Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support sql digest #32

Merged
merged 19 commits into from
Feb 13, 2019
Merged

support sql digest #32

merged 19 commits into from
Feb 13, 2019

Conversation

lysu
Copy link
Collaborator

@lysu lysu commented Nov 8, 2018

What problem does this PR solve?

Now, we are hard to identity a group of similar SQLs, but in real product most SQLs execute by program is some similar SQLs that only different in parameter part(yes they maybe not use prepare too).

This PR want to generate a digest for sql group, so later we can do some useful sth base on it.

  • online add some hint without modify user's code.
  • add some sql into blacklist(e.g. some sql are too slow, but client continue send to tidb, we need protect other vital sql's execution, so add them to blacklist)
  • help user do prepare in slient..this is complex and current code is far away from that.. - -

What is changed and how it works?

  • use lex to replace intLit, stringLit and so on...
  • sha256 result

Check List

Tests

  • Unit test

Code changes

  • Add exported method

This change is Reviewable

@lysu lysu added the DNM label Nov 8, 2018
@kennytm

This comment has been minimized.

@morgo
Copy link
Contributor

morgo commented Nov 8, 2018

could we not use MD5 if not for backward compatibility? 😂

Yes, I think it is important for compatibility. Although I will clarify that I think it should be a non-requirement to have matching digests to MySQL. It sounds just too difficult.

@morgo
Copy link
Contributor

morgo commented Nov 20, 2018

Two functions to be aware of:

Although they are from 8.0, you might find adding them useful for writing tests.

@kennytm
Copy link
Contributor

kennytm commented Nov 20, 2018

Interesting. On MySQL 8.0 the hash function is changed from MD5 to SHA-256, but the output definitely isn't the SHA-256 of the normalized string.

mysql> select statement_digest_text('Select 1');
+-----------------------------------+
| statement_digest_text('Select 1') |
+-----------------------------------+
| SELECT ?                          |
+-----------------------------------+
1 row in set (0.00 sec)

mysql> select statement_digest('Select 1');
+------------------------------------------------------------------+
| statement_digest('Select 1')                                     |
+------------------------------------------------------------------+
| d1b44b0c19af710b5a679907e284acd2ddc285201794bc69a2389d77baedddae |
+------------------------------------------------------------------+
1 row in set (0.00 sec)
>>> hashlib.sha256(b'SELECT ?').hexdigest()
'66cbb3a40d4bbd150b75825ad291a6545399f3098fc1079e4d8b5bb061a6a481'

Apparently MySQL hashes the parsed token stream directly (we could do the same here 😉), meaning we're never going to match the digest as discussed before.

@lysu
Copy link
Collaborator Author

lysu commented Nov 20, 2018

@gregwebs @kennytm thanks, I will take a look~

@morgo
Copy link
Contributor

morgo commented Nov 20, 2018

@kennytm wow, looks like it did change to sha256. The second sentence in these manual pages is different:
https://dev.mysql.com/doc/refman/8.0/en/performance-schema-statement-digests.html
https://dev.mysql.com/doc/refman/5.7/en/performance-schema-statement-digests.html

@lysu lysu changed the title [WIP] support sql digest support sql digest Jan 14, 2019
normalize.go Outdated Show resolved Hide resolved
normalize_test.go Outdated Show resolved Hide resolved
normalize_test.go Outdated Show resolved Hide resolved
@alivxxx alivxxx removed the DNM label Jan 14, 2019
Copy link
Contributor

@alivxxx alivxxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

normalize.go Outdated Show resolved Hide resolved
normalize.go Outdated Show resolved Hide resolved
normalize.go Outdated Show resolved Hide resolved
normalize.go Outdated
// Digest generates a digest(or sql-id) for a SQL.
// the purpose of digest is to identity a group of similar SQLs, then we can do other logic base on it.
func Digest(sql string) string {
d := digesterPool.Get().(*digester)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about:

d := digesterPool.Get().(*digester)
defer func() {
	d.buffer.Reset()
	digesterPool.Put(d)
}() 

and remove line 48~49?

Copy link
Collaborator Author

@lysu lysu Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should better don't use defer as we can in hot-code path and I have do some refactor PTAL :D
https://medium.com/i0exception/runtime-overhead-of-using-defer-in-go-7140d5c40e32

normalize.go Outdated Show resolved Hide resolved
@lysu
Copy link
Collaborator Author

lysu commented Jan 16, 2019

@zz-jason @XuHuaiyu PTAL PTAL thx~

@lysu
Copy link
Collaborator Author

lysu commented Jan 18, 2019

@zz-jason @XuHuaiyu PTAL~

digester.go Show resolved Hide resolved
digester_test.go Outdated Show resolved Hide resolved
@lysu lysu removed the status/LGT1 LGT1 label Jan 21, 2019
@lysu
Copy link
Collaborator Author

lysu commented Jan 21, 2019

@lamxTyler @zz-jason @XuHuaiyu PTAL~

Copy link
Contributor

@XuHuaiyu XuHuaiyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need more comments for this commit, which will make this PR easier to be understood.

digester.go Outdated
}

func (d *sqlDigester) isPrefixByUnary(currTok int) (isUnary bool) {
if !isNumLit(currTok) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that isNumLit(-1) will return false. So isPrefixByUnary(-1) return false?

SignedLiteral:
	Literal
	{
		$$ = ast.NewValueExpr($1)
	}
|	'+' NumLiteral
	{
		$$ = &ast.UnaryOperationExpr{Op: opcode.Plus, V: ast.NewValueExpr($2)}
	}
|	'-' NumLiteral
	{
		$$ = &ast.UnaryOperationExpr{Op: opcode.Minus, V: ast.NewValueExpr($2)}
	}

NumLiteral:
	intLit
|	floatLit
|	decLit

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@XuHuaiyu No, this PR only use lexer without parser, so alway see token stream like [1] or [-, 1] or [+, 1]

digester.go Show resolved Hide resolved
digester.go Outdated
}

const (
genericSymbol = -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add comments for these 2 constants.

digester.go Show resolved Hide resolved
digester.go Outdated Show resolved Hide resolved
digester.go Outdated Show resolved Hide resolved
digester.go Show resolved Hide resolved
lexer.go Outdated Show resolved Hide resolved
digester.go Outdated Show resolved Hide resolved
digester.go Outdated Show resolved Hide resolved
digester.go Outdated Show resolved Hide resolved
digester.go Outdated Show resolved Hide resolved
digester.go Show resolved Hide resolved
digester.go Outdated Show resolved Hide resolved
digester.go Outdated Show resolved Hide resolved
digester.go Show resolved Hide resolved
@lysu
Copy link
Collaborator Author

lysu commented Feb 12, 2019

@zz-jason @XuHuaiyu PTAL

@lysu
Copy link
Collaborator Author

lysu commented Feb 13, 2019

@zz-jason @XuHuaiyu @eurekaka PTAL

Copy link
Contributor

@eurekaka eurekaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eurekaka eurekaka added the status/LGT1 LGT1 label Feb 13, 2019
Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lysu lysu merged commit fa42b82 into pingcap:master Feb 13, 2019
tiancaiamao pushed a commit to tiancaiamao/parser that referenced this pull request Apr 27, 2021
lyonzhi pushed a commit to lyonzhi/parser that referenced this pull request Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants