[FEATURE] Provide context on Inner hit of multi vector to aid highlighting/debug use cases #1447

Closed
heemin32 opened this issue Feb 2, 2024 · 9 comments

Comments

@heemin32
Collaborator

heemin32 commented Feb 2, 2024

As a follow-up to #1065, I would like to see the inner hits of a nested field so that I know which items inside the nested field matched.

Create KNN index

PUT /my-knn-index-1
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "nested_field": {
        "type": "nested",
        "properties": {
          "my_vector": {
            "type": "knn_vector",
            "dimension": 3,
            "method": {
              "name": "hnsw",
              "space_type": "l2",
              "engine": "lucene",
              "parameters": {
                "ef_construction": 100,
                "m": 16
              }
            }
          }
        }
      }
    }
  }
}

Insert Data

PUT /_bulk?refresh=true
{ "index": { "_index": "my-knn-index-1", "_id": "1" } }
{"parking": false, "nested_field":[{"my_vector":[1,1,1]},{"my_vector":[2,2,2]},{"my_vector":[3,3,3]}]}
{ "index": { "_index": "my-knn-index-1", "_id": "2" } }
{"parking": true, "nested_field":[{"my_vector":[10,10,10]},{"my_vector":[20,20,20]},{"my_vector":[30,30,30]}]}
{ "index": { "_index": "my-knn-index-1", "_id": "3" } }
{"parking": true, "nested_field":[{"my_vector":[100,100,100]},{"my_vector":[200,200,200]},{"my_vector":[300,300,300]}]}
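To reproduce this from a script, you can generate the same _bulk payload programmatically. This is an illustrative sketch, not part of the original report, and `build_bulk_body` is a hypothetical helper name. It follows the _bulk format shown above: one action line, then one source line per document, with a trailing newline. The body would typically be sent with `Content-Type: application/x-ndjson`.

```python
import json

def build_bulk_body(index, docs):
    """Build an NDJSON _bulk body: an action line plus a source line per document (hypothetical helper)."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

# Same three documents as the bulk request above.
docs = {
    "1": {"parking": False, "nested_field": [{"my_vector": [v, v, v]} for v in (1, 2, 3)]},
    "2": {"parking": True, "nested_field": [{"my_vector": [v, v, v]} for v in (10, 20, 30)]},
    "3": {"parking": True, "nested_field": [{"my_vector": [v, v, v]} for v in (100, 200, 300)]},
}
body = build_bulk_body("my-knn-index-1", docs)
```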

Query

GET /my-knn-index-1/_search
{
	"query": {
		"nested": {
			"path": "nested_field",
			"query": {
				"knn": {
					"nested_field.my_vector": {
						"vector": [
							1,
							1,
							1
						],
						"k": 2
					}
				}
			},
			"inner_hits": {}
		}
	}
}
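To see why the inner hits come out the way they do, you can score every nested vector by hand. This sketch assumes the l2 scoring documented for the k-NN plugin, score = 1 / (1 + d), where d is the squared Euclidean distance. That formula reproduces the `_score` values in the response below (1.0 and 0.25 for document 1, 3.400898E-5 for document 3). Ranking all nine nested vectors globally puts offsets 0 and 1 of document 1 on top, which matches the inner hits returned. The per-document best offset is the information this issue asks for. `l2_score` is a hypothetical helper, not a plugin API.

```python
def l2_score(a, b):
    """Assumed k-NN l2 score: 1 / (1 + squared Euclidean distance)."""
    d = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + d)

query = [1, 1, 1]
nested = {
    "1": [[1, 1, 1], [2, 2, 2], [3, 3, 3]],
    "2": [[10, 10, 10], [20, 20, 20], [30, 30, 30]],
    "3": [[100, 100, 100], [200, 200, 200], [300, 300, 300]],
}

# Global ranking of every nested vector as (score, doc_id, offset); k = 2 keeps the top two.
all_vectors = [(l2_score(query, v), doc_id, off)
               for doc_id, vs in nested.items() for off, v in enumerate(vs)]
top2 = sorted(all_vectors, reverse=True)[:2]

# Best-matching nested offset per document, i.e. "which item matched".
best_per_doc = {}
for doc_id, vectors in nested.items():
    scores = [l2_score(query, v) for v in vectors]
    best_offset = max(range(len(scores)), key=scores.__getitem__)
    best_per_doc[doc_id] = (best_offset, scores[best_offset])
```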

Response

It returns the two nearest vectors across all nested values rather than the best match in each document: document 1 gets two inner hits, while document 3 gets none.

{
	"took": 9,
	"timed_out": false,
	"_shards": {
		"total": 1,
		"successful": 1,
		"skipped": 0,
		"failed": 0
	},
	"hits": {
		"total": {
			"value": 2,
			"relation": "eq"
		},
		"max_score": 1.0,
		"hits": [
			{
				"_index": "my-knn-index-1",
				"_id": "1",
				"_score": 1.0,
				"_source": {
					"parking": false,
					"nested_field": [
						{
							"my_vector": [
								1,
								1,
								1
							]
						},
						{
							"my_vector": [
								2,
								2,
								2
							]
						},
						{
							"my_vector": [
								3,
								3,
								3
							]
						}
					]
				},
				"inner_hits": {
					"nested_field": {
						"hits": {
							"total": {
								"value": 2,
								"relation": "eq"
							},
							"max_score": 1.0,
							"hits": [
								{
									"_index": "my-knn-index-1",
									"_id": "1",
									"_nested": {
										"field": "nested_field",
										"offset": 0
									},
									"_score": 1.0,
									"_source": {
										"my_vector": [
											1,
											1,
											1
										]
									}
								},
								{
									"_index": "my-knn-index-1",
									"_id": "1",
									"_nested": {
										"field": "nested_field",
										"offset": 1
									},
									"_score": 0.25,
									"_source": {
										"my_vector": [
											2,
											2,
											2
										]
									}
								}
							]
						}
					}
				}
			},
			{
				"_index": "my-knn-index-1",
				"_id": "3",
				"_score": 3.400898E-5,
				"_source": {
					"parking": true,
					"nested_field": [
						{
							"my_vector": [
								100,
								100,
								100
							]
						},
						{
							"my_vector": [
								200,
								200,
								200
							]
						},
						{
							"my_vector": [
								300,
								300,
								300
							]
						}
					]
				},
				"inner_hits": {
					"nested_field": {
						"hits": {
							"total": {
								"value": 0,
								"relation": "eq"
							},
							"max_score": null,
							"hits": []
						}
					}
				}
			}
		]
	}
}
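For the debugging use case, a small helper can pull out which nested offsets matched for each top-level hit. This is a sketch over a trimmed copy of the response above with `_source` omitted. `matched_offsets` is a hypothetical name, and the `path` argument assumes the inner hits are keyed by the nested path, as they are here. The empty list for document 3 is the missing context this issue describes.

```python
def matched_offsets(response, path="nested_field"):
    """Map each top-level hit's _id to the (offset, _score) pairs of its nested inner hits."""
    result = {}
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits", [])
        result[hit["_id"]] = [(h["_nested"]["offset"], h["_score"]) for h in inner]
    return result

# Trimmed copy of the response above.
response = {"hits": {"hits": [
    {"_id": "1", "_score": 1.0, "inner_hits": {"nested_field": {"hits": {"hits": [
        {"_id": "1", "_nested": {"field": "nested_field", "offset": 0}, "_score": 1.0},
        {"_id": "1", "_nested": {"field": "nested_field", "offset": 1}, "_score": 0.25},
    ]}}}},
    {"_id": "3", "_score": 3.400898e-5, "inner_hits": {"nested_field": {"hits": {"hits": []}}}},
]}}

print(matched_offsets(response))
# {'1': [(0, 1.0), (1, 0.25)], '3': []}
```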
@navneet1v
Collaborator

navneet1v commented Feb 6, 2024

@heemin32 were you able to figure out why the query with a filter is not returning inner hits?

Also, is this behavior consistent with the Faiss engine?

@heemin32
Collaborator Author

heemin32 commented Feb 6, 2024

The behavior is the same for both Faiss and Lucene. I wasn't able to find out why.

@navneet1v
Collaborator

@heemin32 it's pretty interesting that inner hits work for standard search but not for filtered vector search, even though we run almost the same code in both places.

@abdulacs

@navneet1v We faced the same issue in our use case, and because of it we were unable to use inner hits on a query with a filter.

@vamshin
Member

vamshin commented May 14, 2024

@heemin32 it looks like we are addressing two different issues in this thread: 1) the ability to know which inner hit contributed the score for the document, and 2) filters seem to be broken with inner hits. Should we create separate issues for these?

@heemin32
Collaborator Author

@vamshin That is correct. However, in terms of feature completeness, wouldn't it be better to resolve both together?

@vamshin
Member

vamshin commented May 15, 2024

@heemin32 Agreed, both need to be resolved. It's just that the filter problem might be hard to discover for folks seeing issues with filters and inner hits. Not a major concern.

@heemin32
Collaborator Author

@vamshin After a deep dive, it seems they are two completely separate issues in terms of implementation. Let me create a new issue for inner hits with filters and address them one by one.

heemin32 changed the title from "[FEATURE] Inner hit with nested field" to "[FEATURE] Inner hit of multi vector" on May 20, 2024
vamshin changed the title from "[FEATURE] Inner hit of multi vector" to "[FEATURE] Provide context on Inner hit of multi vector to aid highlighting/debug use cases" on May 28, 2024
@heemin32
Collaborator Author

Closing in favor of opensearch-project/OpenSearch#13903

Projects
Status: 2.15.0 (Release window opens on June 10th, 2024 and closes on June 25th, 2024)
Status: Done

5 participants