[RFC] Introducing toString() method in HTTPConnector for handling custom prompts with lists/arrays #2880

Closed
mingshl opened this issue Sep 3, 2024 · 1 comment
Labels
2.17 enhancement New feature or request


mingshl commented Sep 3, 2024

Problem Statement

The current implementation of the ML Inference Search Response Processor in OpenSearch has an issue when handling custom prompts that include placeholders for lists or arrays. When a list or array is passed as a parameter, the string representation of the list or array is not properly escaped, leading to incorrect or invalid prompts being sent to the machine learning model.

For example, consider the following scenario:


POST /_plugins/_ml/models/2SwoD5EB6KAJXDLxezto/_predict
{
  "parameters": {
    "prompt": "\n\nHuman: You are a professional data analysist. You will always answer question based on the given context first. If the answer is not directly shown in the context, you will analyze the data and find the answer. If you don't know the answer, just say I don't know. Context: ${parameters.context}. \n\n Human: please summarize the documents \n\n Assistant:",
    "context": ["Dr. Eric Goldberg is a fantastic doctor who has correctly diagnosed every issue that my wife and I have had. Unlike many of my past doctors, Dr. Goldberg is very accessible and we have been able to schedule appointments with him and his staff very quickly. We are happy to have him in the neighborhood and look forward to being his patients for many years to come."]
  }
}

In this example, the context parameter is a list containing a single string. When the prompt is constructed using the ${parameters.context} placeholder, the list is not properly escaped, leading to an invalid prompt being sent to the model.
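The mismatch can be reproduced outside OpenSearch with a small sketch (Python here purely for illustration; the actual substitution happens in the Java connector code): naive string conversion of the list leaves brackets and raw quotes in place, while JSON-encoding the list escapes its elements properly.

```python
import json

template = "Context: ${parameters.context}. Please summarize."
context = ['He said "great doctor" and we agree.']

# Naive substitution: the list's default string form keeps single
# quotes and unescaped double quotes, which corrupts the JSON
# request body that carries the prompt to the model.
naive = template.replace("${parameters.context}", str(context))

# JSON-encoding the list escapes the inner quotes, yielding a valid
# JSON fragment that can be embedded in the prompt safely.
escaped = template.replace("${parameters.context}", json.dumps(context))

print(naive)    # Context: ['He said "great doctor" and we agree.']. Please summarize.
print(escaped)  # Context: ["He said \"great doctor\" and we agree."]. Please summarize.
```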

Solution Proposal

To address this issue, we propose adding a toString() method to the HTTP connector in the ML Commons project. This method will be responsible for properly escaping and converting lists or arrays to their string representation when used as placeholders in custom prompts.

The proposed solution will involve the following changes:

1. Modify the HttpConnector class in the ML Commons project to introduce a new toString() method.
2. The toString() method should handle the conversion of lists or arrays to their string representation, ensuring that the elements are properly escaped and formatted as a valid JSON string.
3. Update the MLInferenceSearchResponseProcessor class in the OpenSearch project to use the toString() method when substituting placeholders for lists or arrays in custom prompts.
4. Update the documentation and examples to reflect the usage of the toString() method for handling custom prompts with lists or arrays.
With this change, users can provide custom prompts containing placeholders for lists or arrays without running into escaping or formatting issues: the toString() method in the HTTP connector converts lists and arrays to a properly escaped JSON string representation, so the prompt reaching the machine learning model is always well-formed.
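A minimal sketch of the proposed substitution behavior (Python for illustration only; the function name substitute_params is hypothetical, not an actual ML Commons API): when a placeholder carries the .toString() suffix and the bound value is a list, it is JSON-encoded instead of naively stringified.

```python
import json
import re

def substitute_params(template: str, parameters: dict) -> str:
    """Hypothetical sketch: ${parameters.key.toString()} JSON-encodes
    list/array values so the resulting prompt is properly escaped."""
    def repl(match: re.Match) -> str:
        key, to_string = match.group(1), match.group(2)
        value = parameters[key]
        if to_string and isinstance(value, (list, tuple)):
            # JSON-encode the whole list: quotes inside elements are
            # escaped, so the prompt stays a valid JSON string.
            return json.dumps(list(value))
        return str(value)
    return re.sub(
        r"\$\{parameters\.(\w+)(\.toString\(\))?\}", repl, template
    )

prompt = "Context: ${parameters.context.toString()}. Summarize."
params = {"context": ['A "quoted" review.']}
print(substitute_params(prompt, params))
# -> Context: ["A \"quoted\" review."]. Summarize.
```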

Example usage:

PUT /_search/pipeline/my_pipeline_request_review_llm
{
  "response_processors": [
    {
      "ml_inference": {
        "tag": "ml_inference",
        "description": "This processor is going to run llm",
        "model_id": "cf46K5EBoVpekzRp8x_3",
        "function_name": "REMOTE",
        "input_map": [
          {
            "context": "review"
          }
        ],
        "output_map": [
          {
            "llm_response": "response"
          }
        ],
        "model_config": {
          "prompt": "\n\nHuman: You are a professional data analysist. You will always answer question based on the given context first. If the answer is not directly shown in the context, you will analyze the data and find the answer. If you don't know the answer, just say I don't know. Context: ${parameters.context.toString()}. \n\n Human: please summarize the documents \n\n Assistant:"
        },
        "ignore_missing": false,
        "ignore_failure": false
      }
    }
  ]
}

In this example, the ${parameters.context.toString()} placeholder will be replaced with the properly escaped and formatted string representation of the context parameter, ensuring that the prompt is correctly constructed and sent to the machine learning model.

Do you have any additional context?
[META Issue] #2839
[RFC for ML Inference Processors] #2173


mingshl commented Sep 5, 2024

This change is implemented in the HTTP connector, so it will help all models during prediction tasks that involve a list/array.

mingshl closed this as completed Sep 5, 2024