SLS Grafana Plugin

简体中文 | English | 旧版README | Old Version README

1. Overview

Log Service (SLS) is a cloud-native observability and analytics platform that provides large-scale, low-cost, real-time services for Log, Metric, and Trace data. It offers one-stop data collection, processing, analysis, alerting, visualization, and delivery, improving digital capabilities across R&D, operations and maintenance, business operations, and security scenarios. See the official documentation.
This repository contains the Alibaba Cloud Log Service (SLS) Grafana data source plug-in. Before using this plug-in, you must already use Log Service and have at least one Logstore collecting data.

2. Installation

The plugin requires Grafana 8.0 or later; for Grafana versions earlier than 8.0, use plugin version 1.0.
Download the plugin from the Releases page and extract it into the Grafana plugin directory. In the Grafana configuration file, under the [plugins] section, set allow_loading_unsigned_plugins = aliyun-log-service-datasource, and then restart Grafana (a sample configuration snippet follows the list below). Platform-specific locations:

  • Mac
    • Plug-in directory: /usr/local/var/lib/grafana/plugins
    • Configuration file: /usr/local/etc/grafana/grafana.ini
    • Restart command: brew services restart grafana
  • YUM/RPM
    • Plug-in directory: /var/lib/grafana/plugins
    • Configuration file: /etc/grafana/grafana.ini
    • Restart command: systemctl restart grafana-server
  • .tar.gz
    • Plug-in directory: {PATH_TO}/grafana-{x}.{x}.{x}/data/plugins
    • Configuration file: {PATH_TO}/grafana-{x}.{x}.{x}/conf/defaults.ini
    • Restart command: ./bin/grafana-server web
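
Whichever platform you are on, the [plugins] section should end up looking like this:

[plugins]
allow_loading_unsigned_plugins = aliyun-log-service-datasource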

3. Add a data source

3.1 Logstore configuration

On the data source management page, add a log-service-datasource (SLS LogService) data source. In the settings panel, set Endpoint to the endpoint of your Log Service project (you can find the endpoint on the project overview page; for more information, see service entry). For example, if your project is in the Qingdao region, enter cn-qingdao.log.aliyuncs.com. Set Project and Logstore as needed, and fill in AccessId and AccessKeySecret. We recommend using the AccessKey of a RAM sub-account. To keep your data secure, the AccessKey is cleared from the form after saving and is never echoed back.

3.2 Time series library configuration (SLS plug-in; supports SQL and PromQL operator queries)

A time series library can also be configured through the SLS plug-in. With this access method, you can query the time series library either with SQL or with PromQL operators (for more information, see the description of time series query syntax). The configuration is the same as in Section 3.1; enter the name of the MetricStore in the Logstore field.

3.3 Time series library configuration (native mode, using the Prometheus plug-in)

The SLS time series library supports native Prometheus queries, so you can access it directly with the native Prometheus plug-in; configure the data source as described in the official documentation. The URL of a Log Service MetricStore has the format https://{project}.{sls-endpoint}/prometheus/{project}/{metricstore}, where {sls-endpoint} is the endpoint of the region where the project resides (for more information, see service entry), and {project} and {metricstore} are replaced with the actual names of your Log Service project and MetricStore. For example: https://sls-prometheus-test.cn-hangzhou.log.aliyuncs.com/prometheus/sls-prometheus-test/prometheus
As a best practice, you can add both a Prometheus data source and an SLS data source for the same time series library and query with whichever syntax you prefer. We recommend using the SLS data source in variables (this keeps dashboards convertible to SLS dashboards).

4. Use

4.1 Dashboard variables

4.1.1 Writing Grafana variables in queries

In practice, the only recommended form is ${var_name}.
In theory, Grafana supports three forms: $varname, ${var_name}, and [[varname]]. However, without braces the boundary of the variable name can be misidentified, and the double-bracket syntax is deprecated and will be removed in the future. Reference source.
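
For example, a dashboard panel query can reference a variable named request_method (an illustrative name) like this:

* and request_method: ${request_method} | select count(*) as pv from log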

4.1.2 General usage and all-selected configuration

For most SLS data sources, you can use a SQL statement to query the values of a variable. Procedure (an example query follows the list):

  • Go to Grafana dashboard settings > Variables.
  • Select the Query type, set the data source to the corresponding Logstore, and write the query.
  • Check the results under Preview of values.

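For example, a variable query that lists the distinct request methods in an access-log Logstore could look like the following (the field request_method is an assumption about your log schema):

* | select distinct request_method from log limit 10
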
Generally, when the variable is used as a Logstore filter, we recommend enabling the Include All option and setting the Custom all value to *.
With this setup, a dashboard query written as * and request_method: ${var} | select * from log works both when a single value is selected and when All is selected.

4.1.3 Variable of time series data in SLS plug-in

As mentioned in the data source configuration section, an SLS time series library can be accessed either through the native Prometheus plug-in or through the SLS plug-in. When you use the SLS plug-in, you usually need PromQL operators; for usage and syntax, see the overview of time series data query and analysis.
The following example shows how to obtain the names of all metrics in the time series database.
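A variable query along these lines can return the metric names; this is a sketch that assumes the MetricStore's SQL view exposes the metric name as the __name__ column:

* | select distinct __name__ from metrics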

4.1.4 Using an Interval variable to control point density across time spans

Configure a Grafana interval variable as follows:

  • Name: the variable name, for example myinterval. This is the name you reference in queries; in SLS statements write it as $${myinterval} (note the extra $; see the note below).
  • Type: select Interval.
  • Label: for example, time interval.
  • Values: 1m,10m,30m,1h,6h,12h,1d,7d,14d,30d.
  • Auto Option: turn on the Auto Option switch and keep the other parameters at their defaults.

Note: unlike an ordinary Grafana variable, an Interval variable used in an SLS statement must be prefixed with an extra $ (that is, $${myinterval}) so that the interval is converted correctly.

chartType: TimeSeries
xcol: time
ycol: pv, uv
query: * | select __time__ - __time__ % $${myinterval} as time, COUNT(*)/ 100 as pv, approx_distinct(remote_addr)/ 60 as uv GROUP BY time order by time limit 2000

With the variable set to 1m, points are aggregated per minute; with 10m, per ten minutes. When the dashboard covers a very long time range, this makes it easy to control the density of plotted points.

4.2 Standard query forms for each chart type

4.2.1 Stat / Gauge

xcol: stat
ycol: <Numeric column>, <Numeric column>
Note: if a column specified as numeric is actually non-numeric, its values are treated as 0.
Example 1:

chartType: Stat
xcol: stat
ycol: PV, deltaPercent
query: * | select diff[1] as "PV", round((diff[1] - diff[2])/diff[2] * 100, 2) as deltaPercent from (select compare("PV", 86400) as diff from (select count(*) as "PV" from log))

Example 2:

chartType: Gauge
xcol: stat
ycol: c
query: * | select count(distinct labels['hostname']) as c from (select promql_query('${metricName}{cluster =~ "${cluster}"}') from metrics ) limit 100000

Other scenarios:
Line-chart (graph) queries can also be rendered as single-value charts, but this usage is not recommended.

4.2.2 Pie

xcol: pie
ycol: <Aggregate columns>, <Numeric column>
Example 1:

chartType: Pie
xcol: pie
ycol: request_method, c
query: request_method: "$method" |  select count(*) as c, request_method group by request_method

Example 2:

chartType: Pie
xcol: pie
ycol: http_user_agent, pv
query: * | select count(1) as pv, case when http_user_agent like '%Chrome%' then 'Chrome' when http_user_agent like '%Firefox%' then 'Firefox' when http_user_agent like '%Safari%' then 'Safari' else 'unKnown' end as http_user_agent  group by case when http_user_agent like '%Chrome%' then 'Chrome' when http_user_agent like '%Firefox%' then 'Firefox' when http_user_agent like '%Safari%' then 'Safari' else 'unKnown' end  order by pv desc limit 10

Other scenarios:
The stat form (xcol: stat) also works for pie charts, as the following example shows.

chartType: Pie
xcol: stat
ycol: hostNameNum, ipNum
query: * | select count(distinct labels['hostname']) as hostNameNum, count(distinct labels['ip']) + 20 as ipNum from (select promql_query('${metricName}{cluster =~ ".*"}') from metrics ) limit 100000


4.2.3 TimeSeries

xcol: <Time column>
ycol (Logstore query): <Numeric column> [, <Numeric column>, ...]
ycol (time series library or log aggregation query): <labels / Aggregate columns>#:#<Numeric column>
Example 1:

chartType: Time series
xcol: time
ycol: pv, uv
query: * | select __time__ - __time__ % $${myinterval} as time, COUNT(*)/ 100 as pv, approx_distinct(remote_addr)/ 60 as uv GROUP BY time order by time limit 2000

Example 2:

chartType: Time series
xcol: time
ycol: labels#:#value
query: * | select time, * from (select promql_query_range('${metricName}') from metrics) limit 1000

Example 3:
You can also use SQL to customize time series labels.

chartType: Time series
xcol: time
ycol: customLabelsExtract#:#value
query: * | select concat(labels['ip'], ' -> ', labels['cluster']) as customLabelsExtract, value from (select promql_query_range('${metricName}') from metrics) limit 1000


4.2.4 Bar

xcol: bar
ycol: <Aggregate columns>, <Numeric column> [, <Numeric column>, ...]
Example 1:

chartType: Bar
xcol: bar
ycol: host, pv, pv2, uv
query: * | select host, COUNT(*)+10 as pv, COUNT(*)+20 as pv2, approx_distinct(remote_addr) as uv GROUP BY host ORDER BY uv desc LIMIT 5


4.2.5 Table

xcol: <empty>
ycol: <empty> or <Display Column> [, <Display Column>, ...]
Example 1:

chartType: Table
xcol: 
ycol: 
query: * | select __time__ - __time__ % 60 as time, COUNT(*)/ 100 as pv, approx_distinct(remote_addr)/ 60 as uv GROUP BY time order by time limit 2000


4.2.6 Log

xcol: <empty>
ycol: <empty>
Example 1:

chartType: Logs
xcol: 
ycol: 
query: host: www.vt.mock.com

4.2.7 Traces


chartType: Traces
xcol: trace
ycol: 
query: traceID: "f88271003ab7d29ffee1eb8b68c58237"

In this example, a trace Logstore is queried, which requires the SLS Trace service. Log Service natively supports ingesting OpenTelemetry trace data and can also ingest trace data from other tracing systems. For more information, see: https://help.aliyun.com/document_detail/208894.html
Grafana 10.0 and later support span filtering for trace data. If you are using an earlier Grafana version, you can apply the same span filters directly in the query. For example:

traceID: "f88271003ab7d29ffee1eb8b68c58237" and resource.deployment.environment : "dev" and service : "web_request" and duration > 10

4.2.8 Map

xcol: map
ycol: <Country column>, <Geographic location column>, <Numeric column>
Example 1:

chartType: GeoMap
xcol: map
ycol: country, geo, pv
query: * | select count(1) as pv ,geohash(ip_to_geo(arbitrary(remote_addr))) as geo,ip_to_country(remote_addr) as country  from log group by country having geo <>'' limit 1000


5. One-click jump to the SLS console

Note: this feature is available in SLS Grafana Plugin version 2.30 and later. From the Explore and dashboard pages, you can jump to the SLS console at any time to use its more powerful analysis functions and more flexible log exploration.
The jump carries the current query and time range with it, so you do not have to re-enter them.
This jump works without any configuration, but your browser must already be logged on to the SLS console; otherwise the login page is displayed.

5.1 STS redirection (no logon required)

Procedure:

  1. Access the RAM console at https://ram.console.aliyun.com/roles/ and create a role whose policy is exactly and only AliyunLogReadOnlyAccess. A maximum session duration of 3600 seconds is recommended. Copy the roleArn from the role's basic information section.
  2. On the RAM console authorization page https://ram.console.aliyun.com/permissions , grant the AliyunRAMReadOnlyAccess and AliyunSTSAssumeRoleAccess permissions to the user whose AccessKey is configured in the Grafana DataSource. (Alternatively, change the AccessKey/AccessKeySecret configured in the Grafana DataSource to a user that already has these permissions.)
  3. On the DataSource page, configure the roleArn.
  4. Return to the Explore page and try the goto SLS button; the jump now goes through STS and no logon is required.

Note: if the configured roleArn or its permission scope is incorrect, the logon-free jump is disabled and the ordinary jump logic is used instead.
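For reference, a roleArn has the following general form (the account ID and role name here are placeholders):

acs:ram::1234567890123456:role/grafana-sls-readonly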

5.2 Precautions and risk warnings⚠️

If STS redirection is configured, the following conditions must be met to keep permissions safe:

  • The user corresponding to the AccessKey configured in the DataSource has the AliyunRAMReadOnlyAccess and AliyunSTSAssumeRoleAccess permissions.
  • The role referenced by the roleArn configured in the DataSource has a policy that is exactly and only AliyunLogReadOnlyAccess.

For the underlying mechanism, see: console embedding and sharing.
If you configure logon-free redirection, be sure to check whether the data source is used in publicly shared Grafana dashboards. Public access may increase traffic costs and may expose log content.

6. Frequently Asked Questions (continuously updated)

6.1 Why is a chart displayed in the SLS console but not in Grafana?

  • Check whether xcol and ycol are configured properly; see Section 4 for details.
  • Leave xcol and ycol empty and check whether the data is correct when shown as a table.
  • Check whether the numeric columns contain non-numeric or special characters.
  • Check whether the Query Inspector shows any returned data.
  • Contact us to investigate the issue.

6.2 Why does the time drift or display incorrectly?

Check whether the SQL statement uses the date_format function. If it does, make sure the format pattern contains the full date and time: %Y-%m-%d %H:%i:%s
For example, the following statement is incorrect:

* | SELECT date_format(date_trunc('minute', __time__), '%H:%i') AS time, COUNT(1) AS count, status GROUP BY time, status ORDER BY time

Change to:

* | SELECT date_format(date_trunc('minute', __time__), '%Y-%m-%d %H:%i:%s') AS time, COUNT(1) AS count, status GROUP BY time, status ORDER BY time

6.3 How can I contact Log Service?

DingTalk Group