
Log queries in JSON format #871

Closed

renecannao opened this issue Jan 7, 2017 · 20 comments

Comments

@renecannao
Contributor

renecannao commented Jan 7, 2017

Query logging is already possible, but queries are logged in a binary format.
Logging queries in JSON format will allow saving them into Elasticsearch, Kibana, and the like.

See http://woodygsd.blogspot.com.es/2014/07/how-do-you-log-problem-like-mysql.html for ideas.

@renecannao
Contributor Author

Cross reference to #561

@bitonp

bitonp commented Jan 7, 2017

Something along these lines, which I used in the mysql_proxy lua version I have:

=============== Format ===============
"@timestamp":{"type":"UTCdate"},      # Timestamp the query was received from the client
"servertime":{"type":"date"},         # Timestamp on the server (could be different)
"Server":{"type":"string"},           # Database backend server: IP or a server name (IP is safer)
"client_connections":{"type":"long"}, # Number of client connections on the backend DB server
"User":{"type":"string"},             # Database user
"proxyName":{"type":"string"},        # Proxy IP or server name
"server_version":{"type":"string"},   # DB version
"Client":{"type":"string"},           # Client IP address (client that issued the query)
"Thread":{"type":"string"},           # DB server thread (vital... a client had 150 queries per donation... that is horrible :-) )
"QueryLen":{"type":"long"},           # Actual length of the query string (see explanation below)
"Query":{"type":"string"},            # Query (max 1k + last 30-50 bytes with a '...' between them)
"QueryType":{"type":"long"},          # MySQL query type (49 => connection, 3 => query, etc.)
"timeSent":{"type":"double"},         # Timestamp the proxy sent the query to the server
"queryTime":{"type":"double"},        # Execution time as returned by the database
"responseTime":{"type":"double"},     # Response time reported by the database to get the first line of the response
"timeReceived":{"type":"double"},     # Timestamp the response was received by the proxy
"lockoutTime":{"type":"double"},      # Time the query was locked out (see explanation below... it's a 'cheat')
"current":{"type":"long"}             # Number of users the proxy has connected (can be different to the processlist)
=================== End Format ==================

There may well be some other interesting values we can pick up from the DB, without causing any extra work
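Purely for illustration, a single event following the schema above might look like the sample below (every value is invented):

```json
{
  "@timestamp": "2017-01-07T10:15:30.123Z",
  "servertime": "2017-01-07T10:15:30.125Z",
  "Server": "10.0.0.21:3306",
  "client_connections": 42,
  "User": "app_user",
  "proxyName": "10.0.0.5",
  "server_version": "5.7.16",
  "Client": "10.0.1.9:50412",
  "Thread": "183744",
  "QueryLen": 46,
  "Query": "SELECT id, name FROM customers WHERE id = 1001",
  "QueryType": 3,
  "timeSent": 1483784130.123456,
  "queryTime": 0.000812,
  "responseTime": 0.000953,
  "timeReceived": 1483784130.125021,
  "lockoutTime": 0.0,
  "current": 17
}
```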

@bdbell

bdbell commented Jan 13, 2017

I would be very interested in this feature; it would allow me to implement ProxySQL for at least one of my customers, as they require full audit logging.

@mugenbb6

@renecannao Any update on this? A bunch of us are very eager to see this come to fruition.

@srikiraju

+1

@1399689727

Eager to see this feature.

@rkennedy-zendesk
Contributor

rkennedy-zendesk commented Nov 26, 2018

I have a preliminary prototype of this working on a fork using the Protocol Buffers v3 JSON mapping. We explored using Protocol Buffers (as opposed to just using a JSON library) thinking we might simultaneously contribute a more programming-language-friendly binary serialization than what's currently available, before realizing that proto2 support had been added and subsequently removed quite some time ago (corresponding Google Group thread).

I'm curious to hear how the maintainers feel about these changes and whether (either as-is or with modifications) it could be an acceptable solution to this feature request.

@renecannao
Contributor Author

@rkennedy-zendesk: I replied to your email thread (a bit late).

I like your prototype. This is far from a complete review, but I have some comments at this moment.

Alternatively to the two previous points:

  • each event should be prefixed by a header (even a 1-byte header) that specifies the version of the format

This solution probably allows more flexibility; for example, it may be possible to have mixed logging based on the value of mysql_query_rules.log (certain events logged in one format, while other events are logged in a different format).
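A minimal sketch of that per-event header idea, assuming invented tag values (this is not ProxySQL's actual wire format):

```cpp
#include <cstdio>

// Hypothetical format tags; one byte written before each serialized event.
enum : unsigned char {
    EVENT_FORMAT_BINARY_V1 = 0x01,
    EVENT_FORMAT_JSON_V1   = 0x02
};

// Writes a 1-byte format/version header, then the event body in the
// format the header announces. Readers can dispatch on the first byte,
// which also allows mixing formats within a single file.
void write_event(FILE *log, unsigned char format_tag,
                 const void *payload, size_t len) {
    fputc(format_tag, log);        // 1-byte header identifies the format
    fwrite(payload, 1, len, log);  // event body follows in the tagged format
}
```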

@renecannao
Contributor Author

Ryan, following your question on the mailing list, I think that swapping the Protocol Buffers-based implementation for one built specifically for JSON output is the right way to go.

My comment above is still relevant though, and I would appreciate your feedback on file header vs. event header (or yet another solution).
Thanks

@rkennedy-zendesk
Contributor

> Ryan, following your question on the mailing list, I think that swapping the Protocol Buffers-based implementation for one built specifically for JSON output is the right way to go.

I'm working on this right now. Do you have any preferred JSON libraries for C/C++? I've been looking at a few and some of them have some serious baggage in terms of their dependencies. I'm going to give json-c a try first to get something working.

> My comment above is still relevant though, and I would appreciate your feedback on file header vs. event header (or yet another solution).

I'm curious how this would work in practice for things wanting to consume the log file. Would, for instance, fluentd be able to make sense of the file before sending it on its way? As currently coded, the log file is closed, rotated, and reopened whenever the filename or the file format changes. So if I keep things as they are, a file format change would trigger a new file, and we can be relatively certain that each file will contain exactly one format (either binary or JSON).

We could consider optionally adding a file suffix based on the file format.

I'm curious to hear some feedback from the folks asking for this feature about their use cases, to know whether some sort of file/row header would be beneficial or burdensome. Our primary use case is going to be to have fluentd pack these logs off to AWS Kinesis so we can grind them through Firehose and query them on the other side via Athena.

@rkennedy-zendesk
Contributor

I've rewritten the JSON support using json-c (fun fact: the Ubuntu 16 package is ~7MB with json-c and ~11MB with protobuf). Changes can be seen here.

I've branched MySQL_Event::write_query() based on the new enum log_event_format event_format parameter. This keeps MySQL_Logger::log_request() mostly pristine. It does, however, mean that the json_object is created in the critical section. I could move it outside of the critical section, but that could mean adding JSON specific checks/code to MySQL_Logger::log_request(), which seemed like a bad tradeoff from a readability/maintenance point of view. I don't imagine what's happening there is any/much more costly than what's happening in MySQL_Event::write_binary_query(), which is also executed in the critical path, but it would be nice to have some performance regression testing to verify.
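For reference, a minimal sketch of what format-branched serialization with json-c can look like (the enum, function signature, and field names here are simplified stand-ins for illustration, not the actual ProxySQL code):

```cpp
#include <json-c/json.h>
#include <cstdio>

// Hypothetical format selector, mirroring the idea of an
// enum log_event_format parameter passed into write_query().
enum log_event_format { FORMAT_BINARY = 1, FORMAT_JSON = 2 };

void write_query(FILE *log, log_event_format fmt,
                 const char *user, const char *query, double duration_us) {
    if (fmt == FORMAT_JSON) {
        // Build the event as a json-c object tree, serialize, and free it.
        json_object *ev = json_object_new_object();
        json_object_object_add(ev, "username", json_object_new_string(user));
        json_object_object_add(ev, "query", json_object_new_string(query));
        json_object_object_add(ev, "duration_us", json_object_new_double(duration_us));
        fprintf(log, "%s\n", json_object_to_json_string(ev));
        json_object_put(ev);  // releases the whole object tree
    } else {
        /* legacy binary serialization path */
    }
}
```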

Regarding multiple file formats, my personal preference would be to indicate the format in the filename, if anywhere. I'm mostly concerned about logging pipelines (many of which I suspect will be off-the-shelf solutions) needing to handle this extra bit of information. As of right now the log file will be rolled over if the format changes, so each file should only contain records of a single format. Additionally, the setting parameter only allows one format at a time, so there's no possibility of producing binary and JSON logs concurrently with this implementation. You know the back history and current use cases for the query logs better than I do, however, so let me know if there are considerations I'm not taking into account.

@zx1986

zx1986 commented Jan 13, 2019

@rkennedy-zendesk Could I get your version to install and run in production?
Any recommendations or tips I should know about?

https://github.com/rkennedy-zendesk/proxysql/tree/v2.0.0-json-support

The v2.0.0-json-support branch, right?

@rkennedy-zendesk
Contributor

rkennedy-zendesk commented Jan 14, 2019 via email

@luca0x333

I checked the CHANGELOG but it's not updated; has the JSON format been added to 2.0.0-rc2?

Did anyone find a way to run eventslog_reader_sample in follow mode, like tail -f?

@asdptkt

asdptkt commented Apr 5, 2019

When will this feature be released? Any update?

@umnya

umnya commented Jun 13, 2019

I really want this feature (query logging in JSON); when will it be applied in a new version?

renecannao added a commit that referenced this issue Jul 14, 2019
Enhancements:
- added metrics rows_affected and rows_sent
- added global variable mysql-eventslog_default_log: if 1, logging is enabled for every query unless explicitly disabled in mysql_query_rules.log. Default is 0
- added global variable mysql-eventslog_format: default is 1 (legacy format). A value of 2 enables logging in JSON format. Issue #871
  Changing the value at runtime causes the current file to be closed and a new one created
- fixed logging for prepared statements: until 2.0.5, only some percentage of prepared statements was correctly logged

Extended tables stats_mysql_query_digest and stats_mysql_query_digest_reset to also include sum_rows_affected and sum_rows_sent

Extended `eventslog_reader_sample.cpp` to support the new enhancements
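Based on the variables named in that commit, enabling the JSON format from the ProxySQL admin interface would look roughly like this (a sketch; mysql-eventslog_filename is assumed here to be the existing variable that sets the log file base name):

```sql
-- Hypothetical example, assuming an admin session on a build with this commit.
SET mysql-eventslog_format = 2;            -- 2 = JSON, 1 = legacy binary format
SET mysql-eventslog_default_log = 1;       -- log every query unless a rule disables it
SET mysql-eventslog_filename = 'queries';  -- base name of the query log files
LOAD MYSQL VARIABLES TO RUNTIME;           -- apply without a restart
SAVE MYSQL VARIABLES TO DISK;              -- persist across restarts
```

Per the commit notes, changing mysql-eventslog_format at runtime closes the current log file and opens a new one, so each file contains a single format.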
@nilsmeyer
Copy link

This is extremely useful for audit logging as well, since as far as I can tell neither the MariaDB nor the Percona audit logging plugin nor the general log logs affected rows / sent rows. This can be quite a helpful metric to warn/alert on.

@renecannao
Contributor Author

Thanks for the feedback! :)
It is available in 2.0.6

@asdptkt

asdptkt commented Jul 31, 2019

In case you are working with Go, you can use our lib:

https://github.com/tiket-oss/go-pxld

@renecannao
Contributor Author

Closing
