[RFC] Introducing toString() method in HTTPConnector for handling custom prompts with lists/arrays #2880

Closed
mingshl opened this issue Sep 3, 2024 · 1 comment
Labels
2.17 enhancement New feature or request


mingshl commented Sep 3, 2024

Problem Statement

The current implementation of the ML Inference Search Response Processor in OpenSearch has an issue when handling custom prompts that include placeholders for lists or arrays. When a list or array is passed as a parameter, the string representation of the list or array is not properly escaped, leading to incorrect or invalid prompts being sent to the machine learning model.

For example, consider the following scenario:


POST /_plugins/_ml/models/2SwoD5EB6KAJXDLxezto/_predict
{
  "parameters": {
    "prompt": "\n\nHuman: You are a professional data analysist. You will always answer question based on the given context first. If the answer is not directly shown in the context, you will analyze the data and find the answer. If you don't know the answer, just say I don't know. Context: ${parameters.context}. \n\n Human: please summarize the documents \n\n Assistant:",
    "context": ["Dr. Eric Goldberg is a fantastic doctor who has correctly diagnosed every issue that my wife and I have had. Unlike many of my past doctors, Dr. Goldberg is very accessible and we have been able to schedule appointments with him and his staff very quickly. We are happy to have him in the neighborhood and look forward to being his patients for many years to come."]
  }
}

In this example, the context parameter is a list containing a single string. When the prompt is constructed using the ${parameters.context} placeholder, the list is not properly escaped, leading to an invalid prompt being sent to the model.
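The mismatch can be reproduced outside OpenSearch with a small sketch (Python here purely for illustration; the actual substitution happens in the Java connector code): naive string conversion of the list leaves brackets and raw quotes in place, while JSON-encoding the list escapes its elements properly.

```python
import json

template = "Context: ${parameters.context}. Please summarize."
context = ['He said "great doctor" and we agree.']

# Naive substitution: the list's default string form keeps single
# quotes and unescaped double quotes, which corrupts the JSON
# request body that carries the prompt to the model.
naive = template.replace("${parameters.context}", str(context))

# JSON-encoding the list escapes the inner quotes, yielding a valid
# JSON fragment that can be embedded in the prompt safely.
escaped = template.replace("${parameters.context}", json.dumps(context))

print(naive)    # Context: ['He said "great doctor" and we agree.']. Please summarize.
print(escaped)  # Context: ["He said \"great doctor\" and we agree."]. Please summarize.
```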

Solution Proposal

To address this issue, we propose adding a toString() method to the HTTP connector in the ML Commons project. This method will be responsible for properly escaping and converting lists or arrays to their string representation when used as placeholders in custom prompts.

The proposed solution will involve the following changes:

1. Modify the HttpConnector class in the ML Commons project to introduce a new toString() method.
2. The toString() method should handle the conversion of lists or arrays to their string representation, ensuring that the elements are properly escaped and formatted as a valid JSON string.
3. Update the MLInferenceSearchResponseProcessor class in the OpenSearch project to use the toString() method when substituting placeholders for lists or arrays in custom prompts.
4. Update the documentation and examples to reflect the usage of the toString() method for handling custom prompts with lists or arrays.
With this change, users can provide custom prompts containing placeholders for lists or arrays without running into escaping or formatting issues: the toString() method in the HTTP connector converts lists and arrays to a properly escaped JSON string representation, so the prompt reaching the machine learning model is always well-formed.
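A minimal sketch of the proposed substitution behavior (Python for illustration only; the function name substitute_params is hypothetical, not an actual ML Commons API): when a placeholder carries the .toString() suffix and the bound value is a list, it is JSON-encoded instead of naively stringified.

```python
import json
import re

def substitute_params(template: str, parameters: dict) -> str:
    """Hypothetical sketch: ${parameters.key.toString()} JSON-encodes
    list/array values so the resulting prompt is properly escaped."""
    def repl(match: re.Match) -> str:
        key, to_string = match.group(1), match.group(2)
        value = parameters[key]
        if to_string and isinstance(value, (list, tuple)):
            # JSON-encode the whole list: quotes inside elements are
            # escaped, so the prompt stays a valid JSON string.
            return json.dumps(list(value))
        return str(value)
    return re.sub(
        r"\$\{parameters\.(\w+)(\.toString\(\))?\}", repl, template
    )

prompt = "Context: ${parameters.context.toString()}. Summarize."
params = {"context": ['A "quoted" review.']}
print(substitute_params(prompt, params))
# -> Context: ["A \"quoted\" review."]. Summarize.
```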

Example usage:

PUT /_search/pipeline/my_pipeline_request_review_llm
{
  "response_processors": [
    {
      "ml_inference": {
        "tag": "ml_inference",
        "description": "This processor is going to run llm",
        "model_id": "cf46K5EBoVpekzRp8x_3",
        "function_name": "REMOTE",
        "input_map": [
          {
            "context": "review"
          }
        ],
        "output_map": [
          {
            "llm_response": "response"
          }
        ],
        "model_config": {
          "prompt": "\n\nHuman: You are a professional data analysist. You will always answer question based on the given context first. If the answer is not directly shown in the context, you will analyze the data and find the answer. If you don't know the answer, just say I don't know. Context: ${parameters.context.toString()}. \n\n Human: please summarize the documents \n\n Assistant:"
        },
        "ignore_missing": false,
        "ignore_failure": false
      }
    }
  ]
}

In this example, the ${parameters.context.toString()} placeholder will be replaced with the properly escaped and formatted string representation of the context parameter, ensuring that the prompt is correctly constructed and sent to the machine learning model.

Do you have any additional context?
[META Issue] #2839
[RFC for ML Inference Processors] #2173


mingshl commented Sep 5, 2024

This change is implemented in the HTTP connector, so it will help all models during prediction tasks that involve a list/array.

mingshl closed this as completed Sep 5, 2024