Skip to content

Commit

Permalink
Merge branch '8.x' into bp-113749
Browse files Browse the repository at this point in the history
  • Loading branch information
elasticmachine authored Oct 3, 2024
2 parents 211ad0f + d4746b5 commit c032c21
Show file tree
Hide file tree
Showing 4 changed files with 384 additions and 249 deletions.
22 changes: 21 additions & 1 deletion docs/reference/mapping.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -75,9 +75,29 @@ reindexing. You can use runtime fields in conjunction with indexed fields to
balance resource usage and performance. Your index will be smaller, but with
slower search performance.

[discrete]
[[mapping-manage-update]]
== Managing and updating mappings

Explicit mappings should be defined at index creation for fields you know in advance.
You can still add _new fields_ to mappings at any time, as your data evolves.

Use the <<indices-put-mapping,Update mapping API>> to update an existing mapping.

In most cases, you can't change mappings for fields that are already mapped.
These changes require <<docs-reindex,reindexing>>.

However, you can _update_ mappings under certain conditions:

* You can add new fields to an existing mapping at any time, explicitly or dynamically.
* You can add new <<multi-fields,multi-fields>> for existing fields.
** Documents indexed before the mapping update will not have values for the new multi-fields until they are updated or reindexed. Documents indexed after the mapping change will automatically have values for the new multi-fields.
* Some <<mapping-params,mapping parameters>> can be updated for existing fields of certain <<mapping-types,data types>>.

[discrete]
[[mapping-limit-settings]]
== Settings to prevent mapping explosion
== Prevent mapping explosions

Defining too many fields in an index can lead to a mapping explosion, which can
cause out of memory errors and difficult situations to recover from.

Expand Down
200 changes: 0 additions & 200 deletions docs/reference/query-dsl/semantic-query.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -40,209 +40,9 @@ The `semantic_text` field to perform the query on.
(Required, string)
The query text to be searched for on the field.

`inner_hits`::
(Optional, object)
Retrieves the specific passages that match the query.
See <<semantic-query-passage-ranking, passage ranking with the `semantic` query>> for more information.
+
.Properties of `inner_hits`
[%collapsible%open]
====
`from`::
(Optional, integer)
The offset from the first matching passage to fetch.
Used to paginate through the passages.
Defaults to `0`.
`size`::
(Optional, integer)
The maximum number of matching passages to return.
Defaults to `3`.
====

Refer to <<semantic-search-semantic-text,this tutorial>> to learn more about semantic search using `semantic_text` and `semantic` query.

[discrete]
[[semantic-query-passage-ranking]]
==== Passage ranking with the `semantic` query
The `inner_hits` parameter can be used for _passage ranking_, which allows you to determine which passages in the document best match the query.
For example, if you have a document that covers varying topics:

[source,console]
------------------------------------------------------------
POST my-index/_doc/lake_tahoe
{
"inference_field": [
"Lake Tahoe is the largest alpine lake in North America",
"When hiking in the area, please be on alert for bears"
]
}
------------------------------------------------------------
// TEST[skip: Requires inference endpoints]

You can use passage ranking to find the passage that best matches your query:

[source,console]
------------------------------------------------------------
GET my-index/_search
{
"query": {
"semantic": {
"field": "inference_field",
"query": "mountain lake",
"inner_hits": { }
}
}
}
------------------------------------------------------------
// TEST[skip: Requires inference endpoints]

[source,console-result]
------------------------------------------------------------
{
"took": 67,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 10.844536,
"hits": [
{
"_index": "my-index",
"_id": "lake_tahoe",
"_score": 10.844536,
"_source": {
...
},
"inner_hits": { <1>
"inference_field": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 10.844536,
"hits": [
{
"_index": "my-index",
"_id": "lake_tahoe",
"_nested": {
"field": "inference_field.inference.chunks",
"offset": 0
},
"_score": 10.844536,
"_source": {
"text": "Lake Tahoe is the largest alpine lake in North America"
}
},
{
"_index": "my-index",
"_id": "lake_tahoe",
"_nested": {
"field": "inference_field.inference.chunks",
"offset": 1
},
"_score": 3.2726858,
"_source": {
"text": "When hiking in the area, please be on alert for bears"
}
}
]
}
}
}
}
]
}
}
------------------------------------------------------------
<1> Ranked passages will be returned using the <<inner-hits,`inner_hits` response format>>, with `<inner_hits_name>` set to the `semantic_text` field name.

By default, the top three matching passages will be returned.
You can use the `size` parameter to control the number of passages returned and the `from` parameter to page through the matching passages:

[source,console]
------------------------------------------------------------
GET my-index/_search
{
"query": {
"semantic": {
"field": "inference_field",
"query": "mountain lake",
"inner_hits": {
"from": 1,
"size": 1
}
}
}
}
------------------------------------------------------------
// TEST[skip: Requires inference endpoints]

[source,console-result]
------------------------------------------------------------
{
"took": 42,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 10.844536,
"hits": [
{
"_index": "my-index",
"_id": "lake_tahoe",
"_score": 10.844536,
"_source": {
...
},
"inner_hits": {
"inference_field": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 10.844536,
"hits": [
{
"_index": "my-index",
"_id": "lake_tahoe",
"_nested": {
"field": "inference_field.inference.chunks",
"offset": 1
},
"_score": 3.2726858,
"_source": {
"text": "When hiking in the area, please be on alert for bears"
}
}
]
}
}
}
}
]
}
}
------------------------------------------------------------

[discrete]
[[hybrid-search-semantic]]
==== Hybrid search with the `semantic` query
Expand Down
Loading

0 comments on commit c032c21

Please sign in to comment.