Skip to content

Commit

Permalink
feat: add StatementStatistics for slow query logging implementation (
Browse files Browse the repository at this point in the history
…#4719)

* feat: log slow query

* feat: log slow query for sql

* refactor: add slow query logging options

* ci: fix errors

* feat: add StatementStatistics

* chore: revert modification of servers crate

* docs: update config docs

* fix: clippy errors
  • Loading branch information
zyy17 authored Sep 30, 2024
1 parent 77af4fd commit e39a9e6
Show file tree
Hide file tree
Showing 21 changed files with 282 additions and 2 deletions.
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

20 changes: 20 additions & 0 deletions config/config.md
Original file line number Diff line number Diff line change
Expand Up @@ -163,6 +163,10 @@
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions < 0 are treated as 0 |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `export_metrics` | -- | -- | The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
| `export_metrics.enable` | Bool | `false` | whether enable export metrics. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
Expand Down Expand Up @@ -249,6 +253,10 @@
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions < 0 are treated as 0 |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `export_metrics` | -- | -- | The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
| `export_metrics.enable` | Bool | `false` | whether enable export metrics. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
Expand Down Expand Up @@ -314,6 +322,10 @@
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions < 0 are treated as 0 |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `export_metrics` | -- | -- | The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
| `export_metrics.enable` | Bool | `false` | whether enable export metrics. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
Expand Down Expand Up @@ -466,6 +478,10 @@
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions < 0 are treated as 0 |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `export_metrics` | -- | -- | The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
| `export_metrics.enable` | Bool | `false` | whether enable export metrics. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
Expand Down Expand Up @@ -512,5 +528,9 @@
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions < 0 are treated as 0 |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `tracing` | -- | -- | The tracing options. Only effect when compiled with `tokio-console` feature. |
| `tracing.tokio_console_addr` | String | Unset | The tokio console address. |
13 changes: 13 additions & 0 deletions config/datanode.example.toml
Original file line number Diff line number Diff line change
Expand Up @@ -586,6 +586,19 @@ log_format = "text"
[logging.tracing_sample_ratio]
default_ratio = 1.0

## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false

## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"

## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0

## The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.
## This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape.
[export_metrics]
Expand Down
13 changes: 13 additions & 0 deletions config/flownode.example.toml
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,19 @@ log_format = "text"
[logging.tracing_sample_ratio]
default_ratio = 1.0

## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false

## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"

## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0

## The tracing options. Only effect when compiled with `tokio-console` feature.
[tracing]
## The tokio console address.
Expand Down
13 changes: 13 additions & 0 deletions config/frontend.example.toml
Original file line number Diff line number Diff line change
Expand Up @@ -191,6 +191,19 @@ log_format = "text"
[logging.tracing_sample_ratio]
default_ratio = 1.0

## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false

## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"

## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0

## The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.
## This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape.
[export_metrics]
Expand Down
13 changes: 13 additions & 0 deletions config/metasrv.example.toml
Original file line number Diff line number Diff line change
Expand Up @@ -178,6 +178,19 @@ log_format = "text"
[logging.tracing_sample_ratio]
default_ratio = 1.0

## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false

## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"

## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0

## The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.
## This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape.
[export_metrics]
Expand Down
13 changes: 13 additions & 0 deletions config/standalone.example.toml
Original file line number Diff line number Diff line change
Expand Up @@ -630,6 +630,19 @@ log_format = "text"
[logging.tracing_sample_ratio]
default_ratio = 1.0

## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false

## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"

## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0

## The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.
## This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape.
[export_metrics]
Expand Down
2 changes: 2 additions & 0 deletions src/cmd/src/frontend.rs
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@ use frontend::instance::builder::FrontendBuilder;
use frontend::instance::{FrontendInstance, Instance as FeInstance};
use frontend::server::Services;
use meta_client::{MetaClientOptions, MetaClientType};
use query::stats::StatementStatistics;
use servers::tls::{TlsMode, TlsOption};
use servers::Mode;
use snafu::{OptionExt, ResultExt};
Expand Down Expand Up @@ -352,6 +353,7 @@ impl StartCommand {
catalog_manager,
Arc::new(client),
meta_client,
StatementStatistics::new(opts.logging.slow_query.clone()),
)
.with_plugin(plugins.clone())
.with_local_cache_invalidator(layered_cache_registry)
Expand Down
2 changes: 2 additions & 0 deletions src/cmd/src/standalone.rs
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ use frontend::service_config::{
};
use meta_srv::metasrv::{FLOW_ID_SEQ, TABLE_ID_SEQ};
use mito2::config::MitoConfig;
use query::stats::StatementStatistics;
use serde::{Deserialize, Serialize};
use servers::export_metrics::ExportMetricsOption;
use servers::grpc::GrpcOptions;
Expand Down Expand Up @@ -557,6 +558,7 @@ impl StartCommand {
catalog_manager.clone(),
node_manager.clone(),
ddl_task_executor.clone(),
StatementStatistics::new(opts.logging.slow_query.clone()),
)
.with_plugin(plugins.clone())
.try_build()
Expand Down
1 change: 1 addition & 0 deletions src/common/telemetry/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ backtrace = "0.3"
common-error.workspace = true
console-subscriber = { version = "0.1", optional = true }
greptime-proto.workspace = true
humantime-serde.workspace = true
lazy_static.workspace = true
once_cell.workspace = true
opentelemetry = { version = "0.21.0", default-features = false, features = [
Expand Down
62 changes: 60 additions & 2 deletions src/common/telemetry/src/logging.rs
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
//! logging stuffs, inspired by databend
use std::env;
use std::sync::{Arc, Mutex, Once};
use std::time::Duration;

use once_cell::sync::{Lazy, OnceCell};
use opentelemetry::{global, KeyValue};
Expand All @@ -26,7 +27,7 @@ use serde::{Deserialize, Serialize};
use tracing_appender::non_blocking::WorkerGuard;
use tracing_appender::rolling::{RollingFileAppender, Rotation};
use tracing_log::LogTracer;
use tracing_subscriber::filter::Targets;
use tracing_subscriber::filter::{FilterFn, Targets};
use tracing_subscriber::fmt::Layer;
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::prelude::*;
Expand Down Expand Up @@ -64,6 +65,24 @@ pub struct LoggingOptions {

/// The tracing sample ratio.
pub tracing_sample_ratio: Option<TracingSampleOptions>,

/// The logging options of slow query.
pub slow_query: SlowQueryOptions,
}

/// The options of slow query.
#[derive(Clone, Debug, Serialize, Deserialize, Default)]
#[serde(default)]
pub struct SlowQueryOptions {
/// Whether to enable slow query log.
pub enable: bool,

/// The threshold of slow queries.
#[serde(with = "humantime_serde")]
pub threshold: Option<Duration>,

/// The sample ratio of slow queries.
pub sample_ratio: Option<f64>,
}

#[derive(Clone, Debug, Copy, PartialEq, Eq, Serialize, Deserialize)]
Expand Down Expand Up @@ -96,6 +115,7 @@ impl Default for LoggingOptions {
otlp_endpoint: None,
tracing_sample_ratio: None,
append_stdout: true,
slow_query: SlowQueryOptions::default(),
}
}
}
Expand Down Expand Up @@ -235,6 +255,42 @@ pub fn init_global_logging(
None
};

let slow_query_logging_layer = if !opts.dir.is_empty() && opts.slow_query.enable {
let rolling_appender =
RollingFileAppender::new(Rotation::HOURLY, &opts.dir, "greptimedb-slow-queries");
let (writer, guard) = tracing_appender::non_blocking(rolling_appender);
guards.push(guard);

// Only logs if the field contains "slow".
let slow_query_filter = FilterFn::new(|metadata| {
metadata
.fields()
.iter()
.any(|field| field.name().contains("slow"))
});

if opts.log_format == LogFormat::Json {
Some(
Layer::new()
.json()
.with_writer(writer)
.with_ansi(false)
.with_filter(slow_query_filter)
.boxed(),
)
} else {
Some(
Layer::new()
.with_writer(writer)
.with_ansi(false)
.with_filter(slow_query_filter)
.boxed(),
)
}
} else {
None
};

// resolve log level settings from:
// - options from command line or config files
// - environment variable: RUST_LOG
Expand Down Expand Up @@ -279,6 +335,7 @@ pub fn init_global_logging(
.with(stdout_logging_layer)
.with(file_logging_layer)
.with(err_file_logging_layer)
.with(slow_query_logging_layer)
};

// consume the `tracing_opts` to avoid "unused" warnings.
Expand All @@ -289,7 +346,8 @@ pub fn init_global_logging(
.with(dyn_filter)
.with(stdout_logging_layer)
.with(file_logging_layer)
.with(err_file_logging_layer);
.with(err_file_logging_layer)
.with(slow_query_logging_layer);

if opts.enable_otlp_tracing {
global::set_text_map_propagator(TraceContextPropagator::new());
Expand Down
11 changes: 11 additions & 0 deletions src/common/telemetry/src/macros.rs
Original file line number Diff line number Diff line change
Expand Up @@ -152,6 +152,17 @@ macro_rules! trace {
};
}

#[macro_export]
macro_rules! slow {
(target: $target:expr, $($arg:tt)+) => {
$crate::log!(target: $target, slow = true, $crate::tracing::Level::INFO, $($arg)+)
};

($($arg:tt)+) => {
$crate::log!($crate::tracing::Level::INFO, slow = true, $($arg)+)
};
}

#[cfg(test)]
mod tests {
use common_error::mock::MockError;
Expand Down
2 changes: 2 additions & 0 deletions src/flow/src/server.rs
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ use operator::delete::Deleter;
use operator::insert::Inserter;
use operator::statement::StatementExecutor;
use partition::manager::PartitionRuleManager;
use query::stats::StatementStatistics;
use query::{QueryEngine, QueryEngineFactory};
use servers::error::{AlreadyStartedSnafu, StartGrpcSnafu, TcpBindSnafu, TcpIncomingSnafu};
use servers::server::Server;
Expand Down Expand Up @@ -475,6 +476,7 @@ impl FrontendInvoker {
layered_cache_registry.clone(),
inserter.clone(),
table_route_cache,
StatementStatistics::default(),
));

let invoker = FrontendInvoker::new(inserter, deleter, statement_executor);
Expand Down
5 changes: 5 additions & 0 deletions src/frontend/src/instance/builder.rs
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ use operator::statement::{StatementExecutor, StatementExecutorRef};
use operator::table::TableMutationOperator;
use partition::manager::PartitionRuleManager;
use pipeline::pipeline_operator::PipelineOperator;
use query::stats::StatementStatistics;
use query::QueryEngineFactory;
use servers::server::ServerHandlers;
use snafu::OptionExt;
Expand All @@ -55,6 +56,7 @@ pub struct FrontendBuilder {
plugins: Option<Plugins>,
procedure_executor: ProcedureExecutorRef,
heartbeat_task: Option<HeartbeatTask>,
stats: StatementStatistics,
}

impl FrontendBuilder {
Expand All @@ -65,6 +67,7 @@ impl FrontendBuilder {
catalog_manager: CatalogManagerRef,
node_manager: NodeManagerRef,
procedure_executor: ProcedureExecutorRef,
stats: StatementStatistics,
) -> Self {
Self {
options,
Expand All @@ -76,6 +79,7 @@ impl FrontendBuilder {
plugins: None,
procedure_executor,
heartbeat_task: None,
stats,
}
}

Expand Down Expand Up @@ -181,6 +185,7 @@ impl FrontendBuilder {
local_cache_invalidator,
inserter.clone(),
table_route_cache,
self.stats,
));

let pipeline_operator = Arc::new(PipelineOperator::new(
Expand Down
Loading

0 comments on commit e39a9e6

Please sign in to comment.