From 913d03b8b0289ab87edf0467e457017d8a771490 Mon Sep 17 00:00:00 2001 From: Ilya Nikokoshev Date: Mon, 14 Oct 2024 21:21:39 +0200 Subject: [PATCH 1/6] Reduce cases where LLM returns invalid field names --- .../server/graphs/categorization/prompts.ts | 3 +++ .../integration_assistant/server/graphs/related/prompts.ts | 3 +++ 2 files changed, 6 insertions(+) diff --git a/x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts b/x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts index 2f90e426dc552..34395fd6764e8 100644 --- a/x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts +++ b/x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts @@ -53,6 +53,7 @@ You ALWAYS follow these guidelines when writing your response: - You can add as many processor objects as you need to cover all the unique combinations that was detected. - If conditions should always use a ? character when accessing nested fields, in case the field might not always be available, see example processors above. +- You can access nested dictionaries with the ctx?.field?.another_field syntax, but it's not possible to access elements of an array. Never use brackets in an if statement. - When an if condition is not needed the argument it should not be included in that specific object of your response. - When using a range based if condition like > 0, you first need to check that the field is not null, for example: ctx.somefield?.production != null && ctx.somefield?.production > 0 - If no good match is found for any of the pipeline results, then respond with an empty array [] as valid JSON enclosed with 3 backticks (\`). @@ -110,6 +111,7 @@ You ALWAYS follow these guidelines when writing your response: - You can use as many processor objects as you need to add all relevant ECS categories and types combinations. - If conditions should always use a ? 
character when accessing nested fields, in case the field might not always be available, see example processors above. +- You can access nested dictionaries with the ctx?.field?.another_field syntax, but it's not possible to access elements of an array. Never use brackets in an if statement. - When an if condition is not needed the argument should not be used for the processor object. - If updates are needed you respond with the initially provided array of processors. - If an update removes the last remaining processor object you respond with an empty array [] as valid JSON enclosed with 3 backticks (\`). @@ -159,6 +161,7 @@ You ALWAYS follow these guidelines when writing your response: - If the error complains about having event.type or event.category not in the allowed values , fix the corresponding append processors to use the allowed values mentioned in the error. - If the error is about event.type not compatible with any event.category, please refer to the 'compatible_types' in the context to fix the corresponding append processors to use valid combination of event.type and event.category - If resolving the validation removes the last remaining processor object, respond with an empty array [] as valid JSON enclosed with 3 backticks (\`). +- Reminder: you can access nested dictionaries with the ctx?.field?.another_field syntax, but it's not possible to access elements of an array. Never use brackets in an if statement. - Do not respond with anything except the complete updated array of processors as a valid JSON object enclosed with 3 backticks (\`), see example response below. 
diff --git a/x-pack/plugins/integration_assistant/server/graphs/related/prompts.ts b/x-pack/plugins/integration_assistant/server/graphs/related/prompts.ts index 9fa50d5900806..eab6388afb791 100644 --- a/x-pack/plugins/integration_assistant/server/graphs/related/prompts.ts +++ b/x-pack/plugins/integration_assistant/server/graphs/related/prompts.ts @@ -35,6 +35,7 @@ You ALWAYS follow these guidelines when writing your response: - The \`message\` field may not be part of related fields. - You can use as many processor objects as needed to map all relevant pipeline result fields to any of the ECS related fields. +- You can access nested dictionaries with the field.another_field syntax, but it's not possible to access elements of an array. Never use brackets in the field name. - If no relevant fields or values are found that could be mapped confidently to any of the related fields, then respond with an empty array [] as valid JSON enclosed with 3 backticks (\`). - Do not respond with anything except the array of processors as a valid JSON objects enclosed with 3 backticks (\`), see example response below. @@ -82,6 +83,7 @@ You ALWAYS follow these guidelines when writing your response: - The \`message\` field may not be part of related fields. - Never use "split" in template values, only use the field name inside the triple brackets. If the error mentions "Improperly closed variable in query-template" then check each "value" field for any special characters and remove them. +- You can access nested dictionaries with the field.another_field syntax, but it's not possible to access elements of an array. Never use brackets in the field name. - If solving an error means removing the last processor in the list, then return an empty array [] as valid JSON enclosed with 3 backticks (\`). - Do not respond with anything except the complete updated array of processors as a valid JSON object enclosed with 3 backticks (\`), see example response below. 
@@ -131,6 +133,7 @@ You ALWAYS follow these guidelines when writing your response: - You can use as many processor objects as needed to map all relevant pipeline result fields to any of the ECS related fields. - If no updates are needed you respond with the initially provided current processors, if no processors are present you respond with an empty array [] as valid JSON enclosied with 3 backticks (\`). +- You can access nested dictionaries with the field.another_field syntax, but it's not possible to access elements of an array. Never use brackets in the field name. - Do not respond with anything except updated array of processors as a valid JSON object enclosed with 3 backticks (\`), see example response below. From 03399153dd987d629307de1c7f33506df34db747 Mon Sep 17 00:00:00 2001 From: Ilya Nikokoshev Date: Mon, 14 Oct 2024 21:30:48 +0200 Subject: [PATCH 2/6] Improve related prompts --- .../server/graphs/related/prompts.ts | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/x-pack/plugins/integration_assistant/server/graphs/related/prompts.ts b/x-pack/plugins/integration_assistant/server/graphs/related/prompts.ts index eab6388afb791..9b76ddb96ba8d 100644 --- a/x-pack/plugins/integration_assistant/server/graphs/related/prompts.ts +++ b/x-pack/plugins/integration_assistant/server/graphs/related/prompts.ts @@ -27,7 +27,7 @@ Here are some context for you to reference for your task, read it carefully as y For each pipeline result you find matching values that would fit any of the related fields perform the follow steps: 1. Identify which related field the value would fit in. -2. Create a new processor object with the field value set to the correct related.field, and the value_field set to the full path of the field that contains the value which we want to append. +2. 
Create a new processor object with the field value set to the correct related.field, and the value_field set to the full path of the field that contains the value which we want to append, if that path can be encoded as a string of dict key accesses. 3. Always check if the related.ip, related.hash, related.user and related.host fields are common in the ecs context above. 4. The value_field argument in your response consist of only one value. @@ -35,7 +35,7 @@ You ALWAYS follow these guidelines when writing your response: - The \`message\` field may not be part of related fields. - You can use as many processor objects as needed to map all relevant pipeline result fields to any of the ECS related fields. -- You can access nested dictionaries with the field.another_field syntax, but it's not possible to access elements of an array. Never use brackets in the field name. +- You can access nested dictionaries with the field.another_field syntax, but it's not possible to access elements of an array; skip them instead. - If no relevant fields or values are found that could be mapped confidently to any of the related fields, then respond with an empty array [] as valid JSON enclosed with 3 backticks (\`). - Do not respond with anything except the array of processors as a valid JSON objects enclosed with 3 backticks (\`), see example response below. @@ -83,7 +83,7 @@ You ALWAYS follow these guidelines when writing your response: - The \`message\` field may not be part of related fields. - Never use "split" in template values, only use the field name inside the triple brackets. If the error mentions "Improperly closed variable in query-template" then check each "value" field for any special characters and remove them. -- You can access nested dictionaries with the field.another_field syntax, but it's not possible to access elements of an array. Never use brackets in the field name. 
+- You can access nested dictionaries with the field.another_field syntax, but it's not possible to access elements of an array. Never use brackets in the field name, never try to access array elements. - If solving an error means removing the last processor in the list, then return an empty array [] as valid JSON enclosed with 3 backticks (\`). - Do not respond with anything except the complete updated array of processors as a valid JSON object enclosed with 3 backticks (\`), see example response below. @@ -125,7 +125,7 @@ Please review the pipeline results and the array of current processors above, an For each pipeline result you find matching values that would fit any of the related fields perform the follow steps: 1. Identify which related field the value would fit in. -2. Create a new processor object with the field value set to the correct related.field, and the value_field set to the full path of the field that contains the value which we want to append. +2. Create a new processor object with the field value set to the correct related.field, and the value_field set to the full path of the field that contains the value which we want to append. You can access fields inside nested dictionaries with the field.another_field syntax, but it's not possible to access elements of an array, so skip a field if its path contains an array. 3. If previous errors above is not empty, do not add any processors that would cause any of the same errors again, if you are unsure, then remove the processor from the list. 4. If no updates are needed, then respond with the initially provided current processors. @@ -133,7 +133,6 @@ You ALWAYS follow these guidelines when writing your response: - You can use as many processor objects as needed to map all relevant pipeline result fields to any of the ECS related fields.
- If no updates are needed you respond with the initially provided current processors, if no processors are present you respond with an empty array [] as valid JSON enclosied with 3 backticks (\`). -- You can access nested dictionaries with the field.another_field syntax, but it's not possible to access elements of an array. Never use brackets in the field name. - Do not respond with anything except updated array of processors as a valid JSON object enclosed with 3 backticks (\`), see example response below. From 96e48469fd66d7f9d3fe956f981a32ac460994f8 Mon Sep 17 00:00:00 2001 From: Ilya Nikokoshev Date: Mon, 14 Oct 2024 22:12:34 +0200 Subject: [PATCH 3/6] Hide the array values from the LLM in related chain --- .../server/graphs/related/related.ts | 3 +- .../server/graphs/related/review.ts | 3 +- .../server/graphs/related/util.test.ts | 135 ++++++++++++++++++ .../server/graphs/related/util.ts | 37 +++++ 4 files changed, 176 insertions(+), 2 deletions(-) create mode 100644 x-pack/plugins/integration_assistant/server/graphs/related/util.test.ts create mode 100644 x-pack/plugins/integration_assistant/server/graphs/related/util.ts diff --git a/x-pack/plugins/integration_assistant/server/graphs/related/related.ts b/x-pack/plugins/integration_assistant/server/graphs/related/related.ts index 902427a1c484f..4298fb1ab24fa 100644 --- a/x-pack/plugins/integration_assistant/server/graphs/related/related.ts +++ b/x-pack/plugins/integration_assistant/server/graphs/related/related.ts @@ -11,6 +11,7 @@ import type { RelatedState, SimplifiedProcessor, SimplifiedProcessors } from '.. 
import { combineProcessors } from '../../util/processors'; import { RELATED_MAIN_PROMPT } from './prompts'; import type { RelatedNodeParams } from './types'; +import { deepCopySkipArrays } from './util'; export async function handleRelated({ state, @@ -21,7 +22,7 @@ export async function handleRelated({ const relatedMainGraph = relatedMainPrompt.pipe(model).pipe(outputParser); const currentProcessors = (await relatedMainGraph.invoke({ - pipeline_results: JSON.stringify(state.pipelineResults, null, 2), + pipeline_results: JSON.stringify(state.pipelineResults.map(deepCopySkipArrays), null, 2), ex_answer: state.exAnswer, ecs: state.ecs, })) as SimplifiedProcessor[]; diff --git a/x-pack/plugins/integration_assistant/server/graphs/related/review.ts b/x-pack/plugins/integration_assistant/server/graphs/related/review.ts index 300f33144b52a..37c0008304958 100644 --- a/x-pack/plugins/integration_assistant/server/graphs/related/review.ts +++ b/x-pack/plugins/integration_assistant/server/graphs/related/review.ts @@ -11,6 +11,7 @@ import type { RelatedState, SimplifiedProcessors, SimplifiedProcessor } from '.. 
import type { RelatedNodeParams } from './types'; import { combineProcessors } from '../../util/processors'; import { RELATED_REVIEW_PROMPT } from './prompts'; +import { deepCopySkipArrays } from './util'; export async function handleReview({ state, @@ -24,7 +25,7 @@ export async function handleReview({ current_processors: JSON.stringify(state.currentProcessors, null, 2), ex_answer: state.exAnswer, previous_error: state.previousError, - pipeline_results: JSON.stringify(state.pipelineResults, null, 2), + pipeline_results: JSON.stringify(state.pipelineResults.map(deepCopySkipArrays), null, 2), })) as SimplifiedProcessor[]; const processors = { diff --git a/x-pack/plugins/integration_assistant/server/graphs/related/util.test.ts b/x-pack/plugins/integration_assistant/server/graphs/related/util.test.ts new file mode 100644 index 0000000000000..c81369f98e56d --- /dev/null +++ b/x-pack/plugins/integration_assistant/server/graphs/related/util.test.ts @@ -0,0 +1,135 @@ +/* + * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one + * or more contributor license agreements. Licensed under the Elastic License + * 2.0; you may not use this file except in compliance with the Elastic License + * 2.0. 
+ */ + +import { deepCopySkipArrays } from './util'; + +describe('deepCopySkipArrays', () => { + it('should skip arrays and deeply copy objects', () => { + const input = { + field: ['a', 'b'], + another: { field: 'c' }, + }; + + const expectedOutput = { + another: { field: 'c' }, + }; + + expect(deepCopySkipArrays(input)).toEqual(expectedOutput); + }); + + it('should return primitive types as is', () => { + expect(deepCopySkipArrays(42)).toBe(42); + expect(deepCopySkipArrays('string')).toBe('string'); + expect(deepCopySkipArrays(true)).toBe(true); + }); + + it('should handle nested objects and skip nested arrays', () => { + const input = { + level1: { + level2: { + array: [1, 2, 3], + value: 'test', + }, + }, + }; + + const expectedOutput = { + level1: { + level2: { + value: 'test', + }, + }, + }; + + expect(deepCopySkipArrays(input)).toEqual(expectedOutput); + }); + + it('should return undefined for arrays', () => { + expect(deepCopySkipArrays([1, 2, 3])).toBeUndefined(); + }); + + it('should handle null and undefined values', () => { + expect(deepCopySkipArrays(null)).toBeNull(); + expect(deepCopySkipArrays(undefined)).toBeUndefined(); + }); + + it('should handle empty objects', () => { + expect(deepCopySkipArrays({})).toEqual({}); + }); + + it('should handle objects with mixed types', () => { + const input = { + number: 1, + string: 'test', + boolean: true, + object: { key: 'value' }, + array: [1, 2, 3], + }; + + const expectedOutput = { + number: 1, + string: 'test', + boolean: true, + object: { key: 'value' }, + }; + + expect(deepCopySkipArrays(input)).toEqual(expectedOutput); + }); + + // Test case + it('should skip arrays and deeply copy objects with nested arrays', () => { + const input = { + field: ['a', 'b'], + another: { field: 'c' }, + }; + + const expectedOutput = { + another: { field: 'c' }, + }; + + expect(deepCopySkipArrays(input)).toEqual(expectedOutput); + }); + + it('should handle objects with nested empty arrays', () => { + const input = { + 
field: [], + another: { field: 'c' }, + }; + + const expectedOutput = { + another: { field: 'c' }, + }; + + expect(deepCopySkipArrays(input)).toEqual(expectedOutput); + }); + + it('should handle objects with nested arrays containing objects', () => { + const input = { + field: [{ key: 'value' }], + another: { field: 'c' }, + }; + + const expectedOutput = { + another: { field: 'c' }, + }; + + expect(deepCopySkipArrays(input)).toEqual(expectedOutput); + }); + + it('should handle objects with nested arrays containing mixed types', () => { + const input = { + field: [1, 'string', true, { key: 'value' }], + another: { field: 'c' }, + }; + + const expectedOutput = { + another: { field: 'c' }, + }; + + expect(deepCopySkipArrays(input)).toEqual(expectedOutput); + }); +}); diff --git a/x-pack/plugins/integration_assistant/server/graphs/related/util.ts b/x-pack/plugins/integration_assistant/server/graphs/related/util.ts new file mode 100644 index 0000000000000..b939e939fed32 --- /dev/null +++ b/x-pack/plugins/integration_assistant/server/graphs/related/util.ts @@ -0,0 +1,37 @@ +/* + * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one + * or more contributor license agreements. Licensed under the Elastic License + * 2.0; you may not use this file except in compliance with the Elastic License + * 2.0. + */ + +/** + * Deeply copies a JSON object, skipping all arrays. + * + * @param value - The JSON value to be deeply copied, which can be an array, object, or other types. + * @returns A new object that is a deep copy of the input value, but with arrays skipped. + * + * This function recursively traverses the provided value. If the value is an array, it skips it. + * If the value is a regular object, it continues traversing its properties and copying them. 
+ */ +export function deepCopySkipArrays(value: unknown): unknown { + if (Array.isArray(value)) { + // Skip arrays + return undefined; + } + + if (typeof value === 'object' && value !== null) { + // Regular dictionary, continue traversing. + const result: Record<string, unknown> = {}; + for (const [k, v] of Object.entries(value)) { + const copiedValue = deepCopySkipArrays(v); + if (copiedValue !== undefined) { + result[k] = copiedValue; + } + } + return result; + } + + // For primitive types, return the value as is. + return value; +} From e95896dea270220c6d41a7b562727641aaccb00a Mon Sep 17 00:00:00 2001 From: Ilya Nikokoshev Date: Tue, 15 Oct 2024 11:34:38 +0300 Subject: [PATCH 4/6] Update x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts Co-authored-by: Bharat Pasupula <123897612+bhapas@users.noreply.github.com> --- .../server/graphs/categorization/prompts.ts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts b/x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts index 34395fd6764e8..451916d9c2d5e 100644 --- a/x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts +++ b/x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts @@ -111,7 +111,7 @@ You ALWAYS follow these guidelines when writing your response: - You can use as many processor objects as you need to add all relevant ECS categories and types combinations. - If conditions should always use a ? character when accessing nested fields, in case the field might not always be available, see example processors above. -- You can access nested dictionaries with the ctx?.field?.another_field syntax, but it's not possible to access elements of an array. Never use brackets in an if statement. +- You can access nested dictionaries with the ctx.field?.another_field syntax, but it's not possible to access elements of an array.
Never use brackets in an if statement. - When an if condition is not needed the argument should not be used for the processor object. - If updates are needed you respond with the initially provided array of processors. - If an update removes the last remaining processor object you respond with an empty array [] as valid JSON enclosed with 3 backticks (\`). From bbcf82b09aa0c6f819391cce0af1b1727bdac5b9 Mon Sep 17 00:00:00 2001 From: Ilya Nikokoshev Date: Tue, 15 Oct 2024 11:34:46 +0300 Subject: [PATCH 5/6] Update x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts Co-authored-by: Bharat Pasupula <123897612+bhapas@users.noreply.github.com> --- .../server/graphs/categorization/prompts.ts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts b/x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts index 451916d9c2d5e..b713cd8a0b9b3 100644 --- a/x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts +++ b/x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts @@ -161,7 +161,7 @@ You ALWAYS follow these guidelines when writing your response: - If the error complains about having event.type or event.category not in the allowed values , fix the corresponding append processors to use the allowed values mentioned in the error. - If the error is about event.type not compatible with any event.category, please refer to the 'compatible_types' in the context to fix the corresponding append processors to use valid combination of event.type and event.category - If resolving the validation removes the last remaining processor object, respond with an empty array [] as valid JSON enclosed with 3 backticks (\`). -- Reminder: you can access nested dictionaries with the ctx?.field?.another_field syntax, but it's not possible to access elements of an array. Never use brackets in an if statement. 
+- Reminder: you can access nested dictionaries with the ctx.field?.another_field syntax, but it's not possible to access elements of an array. Never use brackets in an if statement. - Do not respond with anything except the complete updated array of processors as a valid JSON object enclosed with 3 backticks (\`), see example response below. From 4e4cdc92f1c5e59db9d3cd935106722119e386cd Mon Sep 17 00:00:00 2001 From: Ilya Nikokoshev Date: Tue, 15 Oct 2024 15:00:26 +0300 Subject: [PATCH 6/6] Update x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts Co-authored-by: Bharat Pasupula <123897612+bhapas@users.noreply.github.com> --- .../server/graphs/categorization/prompts.ts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts b/x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts index b713cd8a0b9b3..baf9d5d5b3ada 100644 --- a/x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts +++ b/x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts @@ -53,7 +53,7 @@ You ALWAYS follow these guidelines when writing your response: - You can add as many processor objects as you need to cover all the unique combinations that was detected. - If conditions should always use a ? character when accessing nested fields, in case the field might not always be available, see example processors above. -- You can access nested dictionaries with the ctx?.field?.another_field syntax, but it's not possible to access elements of an array. Never use brackets in an if statement. +- You can access nested dictionaries with the ctx.field?.another_field syntax, but it's not possible to access elements of an array. Never use brackets in an if statement. - When an if condition is not needed the argument it should not be included in that specific object of your response. 
- When using a range based if condition like > 0, you first need to check that the field is not null, for example: ctx.somefield?.production != null && ctx.somefield?.production > 0 - If no good match is found for any of the pipeline results, then respond with an empty array [] as valid JSON enclosed with 3 backticks (\`).
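A minimal TypeScript sketch of what the related chain now sends to the LLM, assuming pipeline results are plain JSON objects. The deepCopySkipArrays body mirrors util.ts from PATCH 3/6, with the Record generic arguments written out. readDottedPath and the sample pipelineResults are hypothetical, added only to illustrate the prompt rule that a value_field must be a chain of dictionary key accesses with no array steps.

```typescript
// Mirrors deepCopySkipArrays from util.ts in PATCH 3/6.
function deepCopySkipArrays(value: unknown): unknown {
  if (Array.isArray(value)) {
    // Arrays (and everything inside them) are dropped entirely.
    return undefined;
  }
  if (typeof value === 'object' && value !== null) {
    // Plain object: copy each key whose value survives the recursion.
    const result: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) {
      const copiedValue = deepCopySkipArrays(v);
      if (copiedValue !== undefined) {
        result[k] = copiedValue;
      }
    }
    return result;
  }
  // Primitives (and null) pass through unchanged.
  return value;
}

// Hypothetical helper, not part of the patch: follows a dotted path such as
// "source.ip" through plain objects only. Returns undefined if a step is
// missing, is not an object, or is an array, or if the final value is an array.
function readDottedPath(doc: unknown, path: string): unknown {
  let current: unknown = doc;
  for (const key of path.split('.')) {
    if (typeof current !== 'object' || current === null || Array.isArray(current)) {
      return undefined;
    }
    current = (current as Record<string, unknown>)[key];
  }
  return Array.isArray(current) ? undefined : current;
}

// Hypothetical sample pipeline result, loosely shaped like ingest output.
const pipelineResults: unknown[] = [
  {
    source: { ip: '10.0.0.1' },
    user: { name: 'alice' },
    host: { names: ['web-1', 'web-2'] },
  },
];

// What the related prompts receive after this change: array-valued fields
// are removed, so the LLM never sees a path like "host.names" to map.
const sanitized = pipelineResults.map(deepCopySkipArrays);
console.log(JSON.stringify(sanitized, null, 2));
```

Running it prints the sanitized result with source.ip and user.name intact and host reduced to an empty object: deepCopySkipArrays drops the array itself but keeps the object that contained it.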