[BUG] Nested field missing sub embedding field will cause the IndexOutOfBoundsException #909

wdongyu · 2024-09-18T04:28:16Z

What is the bug?

Currently, if a nested field nestedField has two sub-fields, textField and textFieldNotForEmbedding, and a document looks like this:

{
  "nestedField": [
    {
      "textFieldNotForEmbedding": "This is a text field"
    },
    {
      "textField": "This is another text field"
    }
  ]
}

and the following pipeline configuration is applied:

{
  "description": ...,
  "processors": [
    {
      "text_embedding": {
        "model_id": ...,
        "field_map": {
          "nestedField": {
              "textField": "vectorField"
          }
        }
      }
    }
  ]
}

Result:

java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1

Expected result:

{
  "nestedField": [
    {
      "textFieldNotForEmbedding": "This is a text field"
    },
    {
      "textField": "This is another text field",
      "vectorField": [...]
    }
  ]
}
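The failing index can be reproduced without OpenSearch in a small standalone sketch. The sketch rests on two assumptions taken from the report:

- Only nested elements that contain textField contribute inference inputs.
- The write-back loop advances one shared result index for every nested element.

The class and method names (NestedEmbeddingBugSketch, collectInputs, writeBackUnchecked) are hypothetical. Each "embedding" is a fake List<Float> standing in for a model result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplification of the processor's flow, not the neural-search code.
public class NestedEmbeddingBugSketch {

    // Collect only values that are present. Elements missing sourceKey are skipped,
    // so the inference input (and the result list) can be shorter than the nested list.
    static List<String> collectInputs(List<Map<String, Object>> nested, String sourceKey) {
        List<String> inputs = new ArrayList<>();
        for (Map<String, Object> element : nested) {
            Object value = element.get(sourceKey);
            if (value != null) {
                inputs.add((String) value);
            }
        }
        return inputs;
    }

    // Unchecked write-back: every element consumes the next result, whether or not
    // it contributed an input, so the index can run past the end of results.
    static void writeBackUnchecked(List<Map<String, Object>> nested, String targetKey, List<List<Float>> results) {
        int index = 0;
        for (Map<String, Object> element : nested) {
            element.put(targetKey, results.get(index++));
        }
    }

    public static void main(String[] args) {
        List<Map<String, Object>> nested = new ArrayList<>();
        Map<String, Object> first = new HashMap<>();
        first.put("textFieldNotForEmbedding", "This is a text field");
        Map<String, Object> second = new HashMap<>();
        second.put("textField", "This is another text field");
        nested.add(first);
        nested.add(second);

        List<String> inputs = collectInputs(nested, "textField");
        // Stand-in for model inference: one fake vector per input text.
        List<List<Float>> results = new ArrayList<>();
        for (int i = 0; i < inputs.size(); i++) {
            results.add(List.of(0.1f, 0.2f));
        }

        try {
            writeBackUnchecked(nested, "vectorField", results);
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage()); // prints "Index 1 out of bounds for length 1"
        }
    }
}
```

With two nested elements and only one textField, results has length 1. The second iteration therefore calls results.get(1), which on Java 9 and later fails with the same message as in the report. Before that happens, the first element (which has no textField) has already received the only vector, so the unchecked loop also misassigns embeddings.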

How can one reproduce the bug?

Change the mock document in the test code:

// textField --> textFieldNotForEmbedding
nestedObj1.put("textFieldNotForEmbedding", "This is a text field");

In the unit test, get nlpResult from textEmbeddingProcessor.buildNLPResult, then set each entry on the original document, just as the embedding processor does at runtime:

Map<String, Object> nlpResult = textEmbeddingProcessor.buildNLPResult(
            knnMap,
            modelTensorList,
            ingestDocument.getSourceAndMetadata()
        );
nlpResult.forEach(ingestDocument::setFieldValue);

And run the unit test with:

./gradlew ':test' --tests "org.opensearch.neuralsearch.processor.TextEmbeddingProcessorTests.testBuildVectorOutput_withNestedList_successful"

What is the expected behavior?

See the expected result above.


wdongyu · 2024-09-18T04:39:11Z

When setting fields in the IngestDocument, the processor should check that textField is not empty before assigning the next embedding:

// build nlp output for list of nested objects
for (Map<String, Object> nestedElement : (List<Map<String, Object>>) sourceAndMetadataMap.get(processorKey)) {
    // Add a non-null check for inputNestedMapEntry: only fill in the embedding when the value is not null
    // if (inputNestedMapEntry.getValue().get(index) != null)
    nestedElement.put(inputNestedMapEntry.getKey(), results.get(indexWrapper.index++));
}
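Below is a minimal sketch of the proposed check. It assumes the same flow as the loop above: an element consumes an embedding only if it actually contains the source field, which keeps the shared index aligned with the inference results. NestedEmbeddingFixSketch, writeBackChecked, sourceKey, and targetKey are hypothetical names. The sketch illustrates the idea rather than the change merged in #913.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the proposed non-null check, not the neural-search code.
public class NestedEmbeddingFixSketch {

    // Write embeddings back into the nested elements, advancing the result index
    // only for elements that contributed an input (i.e. have a non-null sourceKey).
    static void writeBackChecked(
        List<Map<String, Object>> nested,
        String sourceKey,
        String targetKey,
        List<List<Float>> results
    ) {
        int index = 0;
        for (Map<String, Object> element : nested) {
            if (element.get(sourceKey) != null) {
                element.put(targetKey, results.get(index++));
            }
        }
    }

    public static void main(String[] args) {
        List<Map<String, Object>> nested = new ArrayList<>();
        Map<String, Object> first = new HashMap<>();
        first.put("textFieldNotForEmbedding", "This is a text field");
        Map<String, Object> second = new HashMap<>();
        second.put("textField", "This is another text field");
        nested.add(first);
        nested.add(second);

        // One fake vector, because only the second element has textField.
        List<List<Float>> results = List.of(List.of(0.1f, 0.2f));
        writeBackChecked(nested, "textField", "vectorField", results);

        System.out.println(first.containsKey("vectorField")); // prints false
        System.out.println(second.get("vectorField"));        // prints [0.1, 0.2]
    }
}
```

Checking the source key, rather than catching the exception, keeps each vector attached to the element whose text produced it. That matches the expected result in the report.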

I'm willing to open a PR if this is confirmed.

vibrantvarun · 2024-09-19T18:55:12Z

@martin-gaievski I think it is a bug and needs to be fixed. Can you confirm?

martin-gaievski · 2024-09-19T18:56:22Z

yup, looks like it, we need a fix for it. @wdongyu is this on latest main and 2.x, or specific to a certain 2.x version?

wdongyu · 2024-09-20T03:32:55Z

@martin-gaievski I tested on the latest main; it should exist in all versions.

wdongyu added the bug (Something isn't working) and untriaged labels Sep 18, 2024

wdongyu mentioned this issue Sep 20, 2024

Fix nested field missing sub embedding field #913

Merged


martin-gaievski added v2.18.0 and removed untriaged labels Sep 28, 2024

yuye-aws assigned wdongyu Oct 4, 2024

martin-gaievski closed this as completed in #913 Oct 11, 2024
