[ML] Reverse engineer Grok patterns from categorization results #30125

Merged
24 changes: 14 additions & 10 deletions x-pack/docs/en/rest-api/ml/get-category.asciidoc
@@ -62,11 +62,11 @@ roles provide these privileges. For more information, see
==== Examples

The following example gets information about one category for the
-`it_ops_new_logs` job:
+`esxi_log` job:

[source,js]
--------------------------------------------------
-GET _xpack/ml/anomaly_detectors/it_ops_new_logs/results/categories
+GET _xpack/ml/anomaly_detectors/esxi_log/results/categories
{
"page":{
"size": 1
@@ -83,14 +83,18 @@ In this example, the API returns the following information:
"count": 11,
"categories": [
{
-"job_id": "it_ops_new_logs",
-"category_id": 1,
-"terms": "Actual Transaction Already Voided Reversed hostname dbserver.acme.com physicalhost esxserver1.acme.com vmhost app1.acme.com",
-"regex": ".*?Actual.+?Transaction.+?Already.+?Voided.+?Reversed.+?hostname.+?dbserver.acme.com.+?physicalhost.+?esxserver1.acme.com.+?vmhost.+?app1.acme.com.*",
-"max_matching_length": 137,
-"examples": [
-"Actual Transaction Already Voided / Reversed;hostname=dbserver.acme.com;physicalhost=esxserver1.acme.com;vmhost=app1.acme.com"
-]
+"job_id" : "esxi_log",
+"category_id" : 1,
+"terms" : "Vpxa verbose vpxavpxaInvtVm opID VpxaInvtVmChangeListener Guest DiskInfo Changed",
+"regex" : ".*?Vpxa.+?verbose.+?vpxavpxaInvtVm.+?opID.+?VpxaInvtVmChangeListener.+?Guest.+?DiskInfo.+?Changed.*",
+"max_matching_length": 154,
+"examples" : [
+"Oct 19 17:04:44 esxi1.acme.com Vpxa: [3CB3FB90 verbose 'vpxavpxaInvtVm' opID=WFU-33d82c31] [VpxaInvtVmChangeListener] Guest DiskInfo Changed",
+"Oct 19 17:04:45 esxi2.acme.com Vpxa: [3CA66B90 verbose 'vpxavpxaInvtVm' opID=WFU-33927856] [VpxaInvtVmChangeListener] Guest DiskInfo Changed",
+"Oct 19 17:04:51 esxi1.acme.com Vpxa: [FFDBAB90 verbose 'vpxavpxaInvtVm' opID=WFU-25e0d447] [VpxaInvtVmChangeListener] Guest DiskInfo Changed",
+"Oct 19 17:04:58 esxi2.acme.com Vpxa: [FFDDBB90 verbose 'vpxavpxaInvtVm' opID=WFU-bbff0134] [VpxaInvtVmChangeListener] Guest DiskInfo Changed"
+],
+"grok_pattern" : ".*?%{SYSLOGTIMESTAMP:timestamp}.+?Vpxa.+?%{BASE16NUM:field}.+?verbose.+?vpxavpxaInvtVm.+?opID.+?VpxaInvtVmChangeListener.+?Guest.+?DiskInfo.+?Changed.*"
}
]
}
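
As a quick sanity check on the example response above, the following sketch tests the documented `regex` value against one of the documented `examples`. It assumes `java.util.regex` semantics are close enough to those of the engine that produced the regex, which the source does not confirm. The class name `CategoryRegexCheck` and the method `matchesCategory` are hypothetical and are not part of the PR.

```java
import java.util.regex.Pattern;

public class CategoryRegexCheck {
    // Copied verbatim from the "regex" field of the example response.
    static final String REGEX =
        ".*?Vpxa.+?verbose.+?vpxavpxaInvtVm.+?opID.+?VpxaInvtVmChangeListener.+?Guest.+?DiskInfo.+?Changed.*";

    // Returns true when the whole message matches the category regex.
    static boolean matchesCategory(String message) {
        return Pattern.compile(REGEX).matcher(message).matches();
    }

    public static void main(String[] args) {
        // First entry of the "examples" array in the example response.
        String example = "Oct 19 17:04:44 esxi1.acme.com Vpxa: [3CB3FB90 verbose 'vpxavpxaInvtVm' "
            + "opID=WFU-33d82c31] [VpxaInvtVmChangeListener] Guest DiskInfo Changed";
        System.out.println(matchesCategory(example));
        // A made-up line that lacks the category's tokens.
        System.out.println(matchesCategory("Oct 19 17:04:44 esxi1.acme.com sshd: session opened"));
    }
}
```

The regex keeps only the tokens shared by every example, joined by lazy `.+?` gaps, so the variable parts (timestamp, host name, hex identifier, `opID` value) are skipped rather than captured. The `grok_pattern` field goes one step further by naming some of those variable parts.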
7 changes: 7 additions & 0 deletions x-pack/docs/en/rest-api/ml/resultsresource.asciidoc
@@ -405,6 +405,13 @@ A category resource has the following properties:
`examples`::
(array) A list of examples of actual values that matched the category.

+`grok_pattern`::
+(string) A Grok pattern that could be used in Logstash or an Ingest Pipeline
+to extract fields from messages that match the category. This field is
+experimental and may be changed or removed in a future version. The Grok
Contributor Author

@lcawl do you know if there are any precedents for a single field that is experimental within a feature that is generally fully supported? Is the way I've documented it here right?

Contributor

@droberts195 I'd suggest putting "experimental[]" after the data type, per https://github.com/elastic/docs#experimental-and-beta

+patterns that are found are not optimal, but are often a good starting point
+for manual tweaking.
+
`job_id`::
(string) The unique identifier for the job that these results belong to.
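
To illustrate what the `grok_pattern` property described above is meant for, here is a minimal sketch that expands the two `%{NAME:field}` macros from the documented example into named regex groups and extracts the fields from one documented example message. It does not use the real Grok library. The definitions in `PATTERN_BANK` are simplified stand-ins chosen for this example, not the actual `SYSLOGTIMESTAMP` and `BASE16NUM` definitions, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrokPatternSketch {
    // ASSUMPTION: simplified stand-ins, not the real Grok pattern bank definitions.
    static final Map<String, String> PATTERN_BANK = new HashMap<>();
    static {
        PATTERN_BANK.put("SYSLOGTIMESTAMP", "[A-Z][a-z]{2} +\\d{1,2} \\d{2}:\\d{2}:\\d{2}");
        PATTERN_BANK.put("BASE16NUM", "(?<![0-9A-Fa-f])[0-9A-Fa-f]+");
    }

    // Matches macros of the form %{NAME:field}.
    private static final Pattern MACRO = Pattern.compile("%\\{(\\w+):(\\w+)\\}");

    // Expands each macro into a named group, then returns the captured fields.
    // Returns an empty map when the message does not match.
    static Map<String, String> extract(String grokPattern, String message) {
        Matcher macro = MACRO.matcher(grokPattern);
        StringBuffer regex = new StringBuffer();
        List<String> fieldNames = new ArrayList<>();
        while (macro.find()) {
            String definition = PATTERN_BANK.get(macro.group(1));
            if (definition == null) {
                throw new IllegalArgumentException("No stand-in defined for " + macro.group(1));
            }
            fieldNames.add(macro.group(2));
            String group = "(?<" + macro.group(2) + ">" + definition + ")";
            macro.appendReplacement(regex, Matcher.quoteReplacement(group));
        }
        macro.appendTail(regex);

        Matcher m = Pattern.compile(regex.toString()).matcher(message);
        Map<String, String> fields = new LinkedHashMap<>();
        if (m.matches()) {
            for (String name : fieldNames) {
                fields.put(name, m.group(name));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Both strings are copied from the documented example response.
        String grokPattern = ".*?%{SYSLOGTIMESTAMP:timestamp}.+?Vpxa.+?%{BASE16NUM:field}.+?verbose"
            + ".+?vpxavpxaInvtVm.+?opID.+?VpxaInvtVmChangeListener.+?Guest.+?DiskInfo.+?Changed.*";
        String message = "Oct 19 17:04:44 esxi1.acme.com Vpxa: [3CB3FB90 verbose 'vpxavpxaInvtVm' "
            + "opID=WFU-33d82c31] [VpxaInvtVmChangeListener] Guest DiskInfo Changed";
        Map<String, String> fields = extract(grokPattern, message);
        System.out.println(fields.get("timestamp"));
        System.out.println(fields.get("field"));
    }
}
```

The generic name `field` for the hex value is the kind of detail the property description has in mind when it calls the generated patterns a starting point for manual tweaking: a person reading the log would likely rename it to something meaningful.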

@@ -5,6 +5,7 @@
*/
package org.elasticsearch.xpack.core.ml.job.results;

+import org.elasticsearch.Version;
import org.elasticsearch.common.ParseField;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;
@@ -34,6 +35,7 @@ public class CategoryDefinition implements ToXContentObject, Writeable {
public static final ParseField REGEX = new ParseField("regex");
public static final ParseField MAX_MATCHING_LENGTH = new ParseField("max_matching_length");
public static final ParseField EXAMPLES = new ParseField("examples");
+public static final ParseField GROK_PATTERN = new ParseField("grok_pattern");

// Used for QueryPage
public static final ParseField RESULTS_FIELD = new ParseField("categories");
@@ -51,6 +53,7 @@ private static ConstructingObjectParser<CategoryDefinition, Void> createParser(b
parser.declareString(CategoryDefinition::setRegex, REGEX);
parser.declareLong(CategoryDefinition::setMaxMatchingLength, MAX_MATCHING_LENGTH);
parser.declareStringArray(CategoryDefinition::setExamples, EXAMPLES);
+parser.declareString(CategoryDefinition::setGrokPattern, GROK_PATTERN);

return parser;
}
@@ -61,6 +64,7 @@ private static ConstructingObjectParser<CategoryDefinition, Void> createParser(b
private String regex = "";
private long maxMatchingLength = 0L;
private final Set<String> examples;
+private String grokPattern;

public CategoryDefinition(String jobId) {
this.jobId = jobId;
@@ -74,6 +78,9 @@ public CategoryDefinition(StreamInput in) throws IOException {
regex = in.readString();
maxMatchingLength = in.readLong();
examples = new TreeSet<>(in.readList(StreamInput::readString));
+if (in.getVersion().onOrAfter(Version.V_7_0_0_alpha1)) {
+grokPattern = in.readOptionalString();
+}
}

@Override
@@ -84,6 +91,9 @@ public void writeTo(StreamOutput out) throws IOException {
out.writeString(regex);
out.writeLong(maxMatchingLength);
out.writeStringList(new ArrayList<>(examples));
+if (out.getVersion().onOrAfter(Version.V_7_0_0_alpha1)) {
+out.writeOptionalString(grokPattern);
+}
}

public String getJobId() {
@@ -139,6 +149,14 @@ public void addExample(String example) {
examples.add(example);
}

+public String getGrokPattern() {
+return grokPattern;
+}
+
+public void setGrokPattern(String grokPattern) {
+this.grokPattern = grokPattern;
+}
+
@Override
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
builder.startObject();
@@ -148,6 +166,9 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
builder.field(REGEX.getPreferredName(), regex);
builder.field(MAX_MATCHING_LENGTH.getPreferredName(), maxMatchingLength);
builder.field(EXAMPLES.getPreferredName(), examples);
+if (grokPattern != null) {
+builder.field(GROK_PATTERN.getPreferredName(), grokPattern);
+}
builder.endObject();
return builder;
}
@@ -166,11 +187,12 @@ public boolean equals(Object other) {
&& Objects.equals(this.terms, that.terms)
&& Objects.equals(this.regex, that.regex)
&& Objects.equals(this.maxMatchingLength, that.maxMatchingLength)
-&& Objects.equals(this.examples, that.examples);
+&& Objects.equals(this.examples, that.examples)
+&& Objects.equals(this.grokPattern, that.grokPattern);
}

@Override
public int hashCode() {
-return Objects.hash(jobId, categoryId, terms, regex, maxMatchingLength, examples);
+return Objects.hash(jobId, categoryId, terms, regex, maxMatchingLength, examples, grokPattern);
}
}
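
The version checks in the `StreamInput` constructor and in `writeTo` above follow a common pattern for adding a field to a wire format: the new optional value is only written to, and read from, peers new enough to know about it. The sketch below imitates that pattern with plain `java.io` streams. It does not use the Elasticsearch `StreamInput`/`StreamOutput` classes; the integer versions, the constant `FIELD_ADDED_VERSION`, and the presence-flag encoding of the optional string are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class VersionGatedFieldSketch {
    // ASSUMPTION: hypothetical wire version in which the optional field first appears.
    static final int FIELD_ADDED_VERSION = 7;

    // Serializes the regex always, and the optional grok pattern only for new enough peers.
    static byte[] write(String regex, String grokPattern, int peerVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(regex);
            if (peerVersion >= FIELD_ADDED_VERSION) {
                // Optional string: a presence flag followed by the value if present.
                out.writeBoolean(grokPattern != null);
                if (grokPattern != null) {
                    out.writeUTF(grokPattern);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns {regex, grokPattern}; grokPattern is null for old peers or when absent.
    static String[] read(byte[] data, int peerVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String regex = in.readUTF();
            String grokPattern = null;
            if (peerVersion >= FIELD_ADDED_VERSION && in.readBoolean()) {
                grokPattern = in.readUTF();
            }
            return new String[] { regex, grokPattern };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] fromNewPeer = read(write(".*?Vpxa.*", ".*?%{BASE16NUM:field}.*", 7), 7);
        String[] fromOldPeer = read(write(".*?Vpxa.*", ".*?%{BASE16NUM:field}.*", 6), 6);
        System.out.println(fromNewPeer[1]);
        System.out.println(fromOldPeer[1]);
    }
}
```

Because the field is nullable after a round trip through an older peer, it makes sense that `toXContent` in the diff only emits `grok_pattern` when it is non-null, and that `equals` and `hashCode` use `Objects` helpers that tolerate null.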
1 change: 1 addition & 0 deletions x-pack/plugin/ml/build.gradle
@@ -46,6 +47,7 @@ dependencies {
testCompile project(path: xpackModule('security'), configuration: 'testArtifacts')

// ml deps
+compile project(':libs:grok')
compile 'net.sf.supercsv:super-csv:2.4.0'
nativeBundle "org.elasticsearch.ml:ml-cpp:${project.version}@zip"
testCompile 'org.ini4j:ini4j:0.5.2'
@@ -41,7 +41,7 @@ protected void doExecute(GetCategoriesAction.Request request, ActionListener<Get

Integer from = request.getPageParams() != null ? request.getPageParams().getFrom() : null;
Integer size = request.getPageParams() != null ? request.getPageParams().getSize() : null;
-jobProvider.categoryDefinitions(request.getJobId(), request.getCategoryId(), from, size,
+jobProvider.categoryDefinitions(request.getJobId(), request.getCategoryId(), true, from, size,
Member

Should this be exposed as a request parameter?

Contributor Author

I'm not sure it's worth it. People can easily ignore the Grok patterns if they're not interested.

r -> listener.onResponse(new GetCategoriesAction.Response(r)), listener::onFailure, client);
}
}