[Auto Import] Fix cases where LLM generates incorrect array field access #196207
Pinging @elastic/security-scalability (Team:Security-Scalability)
How are custom fields containing an array of IPs, hosts, or users getting into the related fields? We will never be able to put them into the related fields, right?
For example:
{
some_user: [ 'john' , 'smith' ],
some_host: [ 'machine1' , 'machine2']
}
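To make the question concrete, the sketch below shows the target shape for this example: values from array-valued custom fields collected, without duplicates, into ECS-style `related.user` and `related.hosts` lists. It is a hypothetical TypeScript emulation, not the ingest processor or Painless code that Automatic Import would generate; `collectRelated` and the field-to-category mapping are illustrative assumptions.

```typescript
// Hypothetical categories, named after the ECS related.user / related.hosts /
// related.ip fields.
type RelatedCategory = 'user' | 'hosts' | 'ip';

// Collects string values from the given source fields into related.* lists,
// accepting either a single string or an array, and skipping duplicates.
function collectRelated(
  event: Record<string, unknown>,
  sources: Record<string, RelatedCategory>,
): Record<RelatedCategory, string[]> {
  const related: Record<RelatedCategory, string[]> = { user: [], hosts: [], ip: [] };
  for (const [field, category] of Object.entries(sources)) {
    const value = event[field];
    const values: unknown[] = Array.isArray(value) ? value : [value];
    for (const v of values) {
      if (typeof v === 'string' && !related[category].includes(v)) {
        related[category].push(v);
      }
    }
  }
  return related;
}

// The example event from the comment above.
const event = {
  some_user: ['john', 'smith'],
  some_host: ['machine1', 'machine2'],
};
console.log(JSON.stringify(collectRelated(event, { some_user: 'user', some_host: 'hosts' })));
// {"user":["john","smith"],"hosts":["machine1","machine2"],"ip":[]}
```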
x-pack/plugins/integration_assistant/server/graphs/categorization/prompts.ts
I assume they get there through some custom Painless code, which we can also add in the future. I actually don't know; maybe it works in this example, and we'll need to check. But if there is a dictionary inside …
…ion/prompts.ts Co-authored-by: Bharat Pasupula <[email protected]>
💚 Build Succeeded
Can merge this now. We will need to work on the fields inside arrays for the related graph in a later PR.
Starting backport for target branches: 8.x https://github.com/elastic/kibana/actions/runs/11348150567
…ess (elastic#196207)
## Release Note
Fixes cases where the LLM was likely to generate invalid processors containing array access in Automatic Import.
## Context
Previously, the LLM occasionally attempted to add related fields or apply categorization conditions using a field whose path goes through an array.
The problem is that such an access is invalid and leads to an immediate error (key part highlighted):
Even explicit instructions to avoid brackets or array access did not seem to be enough, as the LLM would try a different syntax instead, owing to the aggressiveness of our review instructions.
The suggested solution is to remove all arrays from the information shown to the LLM in the related chain. This guarantees that no illegal access will ever be attempted.
### Summary
- Introduces a utility function to remove all arrays from a JSON object.
- Applies this function for all LLM calls in the related chain.
- Modifies the prompts of the related and categorization chains to skip arrays as well.
---------
Co-authored-by: Bharat Pasupula <[email protected]> (cherry picked from commit 8abe259)
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions? Please refer to the Backport tool documentation.
…ld access (#196207) (#196329)
# Backport
This will backport the following commits from `main` to `8.x`:
- [[Auto Import] Fix cases where LLM generates incorrect array field access (#196207)](#196207)
Backport version: 9.4.3
### Questions?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)
Source commit: 8abe25970aa1b483676dde17b7972359c8c55475 on `main` by Ilya Nikokoshev, committed 2024-10-15T14:24:41Z. Source PR labels: bug, release_note:fix, v9.0.0, backport:prev-minor, Team:Security-Scalability, Feature:AutomaticImport.
Co-authored-by: Ilya Nikokoshev <[email protected]>
Release Note
Fixes cases where the LLM was likely to generate invalid processors containing array access in Automatic Import.
Context
Previously, the LLM occasionally attempted to add related fields or apply categorization conditions using a field whose path goes through an array. Here's an example output of the review step in the related chain:
(check the original event)
(check the event as seen by the LLM)
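To pin down what "a field whose path goes through an array" means, here is a hypothetical TypeScript check (not part of this PR) that walks a dotted field path through a sample event and reports whether it has to read a named key out of an array. A path that ends at an array, such as `hosts`, is not flagged; only paths like `hosts.ip` that continue past one are. The function name and the sample event are illustrative assumptions.

```typescript
// Hypothetical helper (not part of this PR): does resolving a dotted field
// path against a sample event require reading a named key from an array?
function pathCrossesArray(event: unknown, path: string): boolean {
  let current: unknown = event;
  for (const segment of path.split('.')) {
    if (Array.isArray(current)) {
      // The path continues past an array: this is the access to avoid.
      return true;
    }
    if (current === null || typeof current !== 'object') {
      // The path does not exist in this sample, so there is nothing to flag.
      return false;
    }
    current = (current as Record<string, unknown>)[segment];
  }
  // Reached the last segment without stepping through an array
  // (the final value itself may still be an array).
  return false;
}

const sampleEvent = {
  user: { name: 'john' },
  hosts: [{ ip: '10.0.0.1' }],
};
console.log(pathCrossesArray(sampleEvent, 'user.name')); // false
console.log(pathCrossesArray(sampleEvent, 'hosts.ip')); // true
console.log(pathCrossesArray(sampleEvent, 'hosts')); // false: the array is the leaf
```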
The problem is that such an access is invalid and leads to an immediate error (key part highlighted):
Even explicit instructions to avoid brackets or array access did not seem to be enough, as the LLM would try a different syntax instead, owing to the aggressiveness of our review instructions:
The suggested solution is to remove all arrays from the information shown to the LLM in the related chain. This guarantees that no illegal access will ever be attempted.
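A minimal TypeScript sketch of this idea, assuming sample events are plain JSON: recursively drop every array, so that no dotted path in what the LLM sees can pass through one. The names `removeArrays` and `JsonValue` are hypothetical and need not match the utility added in this PR; keeping objects that end up empty, rather than pruning them, is also an assumption.

```typescript
// Hypothetical JSON type; the real utility in the PR may be typed differently.
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// Returns a deep copy of `value` with every array removed. Keys whose value is
// an array are dropped, so no dotted path in the result passes through an array.
// A top-level array has no array-free equivalent and yields `undefined`.
function removeArrays(value: JsonValue): JsonValue | undefined {
  if (Array.isArray(value)) {
    return undefined;
  }
  if (value !== null && typeof value === 'object') {
    const result: { [key: string]: JsonValue } = {};
    for (const [key, child] of Object.entries(value)) {
      const cleaned = removeArrays(child);
      if (cleaned !== undefined) {
        result[key] = cleaned;
      }
    }
    return result;
  }
  // Strings, numbers, booleans and null are kept unchanged.
  return value;
}

const sample = {
  user: { name: 'john', roles: ['admin', 'dev'] },
  hosts: [{ ip: '10.0.0.1' }],
  message: 'login',
};
console.log(JSON.stringify(removeArrays(sample)));
// {"user":{"name":"john"},"message":"login"}
```

The stripped sample would then be serialized into the prompt in place of the original. The trade-off, raised in the review, is that array-valued fields such as lists of users or hosts become invisible to the LLM.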
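Summary
- Introduces a utility function to remove all arrays from a JSON object.
- Applies this function for all LLM calls in the related chain.
- Modifies the prompts of the related and categorization chains to skip arrays as well.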
Testing
main
with these changes: ai_falcon_202410142221-1.0.0.zip
main where 100 samples are given to the chains with these changes: ai_falcon_202410141910-1.0.0.zip
(check the event as seen by the LLM now)