how to access stored field and scroll_id #299

Open
ale-ful opened this issue Jun 16, 2023 · 5 comments

@ale-ful

ale-ful commented Jun 16, 2023

Hello, I am having difficulty retrieving a stored field and the scroll_id.

Stored field:
The field is called "text", and in Kibana I can see that it is present in the index "document-000002". When I pass "text" as the value of the stored_fields parameter, the field is not returned; the resulting list contains only "_index", "_type", "_id", "_score" and "_source" (first two lines of the code below). When I tried the third line, which uses the source parameter, the "_source" element was an empty list.

An exemplary record from ES, accessed via Kibana:

{
  "_index": "document-000002",
  "_type": "_doc",
  "_id": "AS_63689606",
  "_version": 1,
  "_score": 1,
  "_source": {
    "visitid": "65_63209606",
    "processingdate": "2022-08-24 17:24-0400",
    "gender": "male",
    "facility": "40998",
    "user": "JOHNDOE",
    "customer": "656"
  },
  "fields": {
    "processingdate": [
      "2022-08-24T21:24:00.000Z"
    ],
    "servicedate": [
      "2022-08-22T22:05:00.000Z"
    ],
    "text": [
      "an exemplary text I want to pull"
    ]
  }
}

Tried code:

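library(elastic)
# `c` is the connection object
# Request one stored field, then two stored fields
docs <- Search(c, "document-000002", size = 8, stored_fields = "text")$hits$hits
docs <- Search(c, "document-000002", size = 8, stored_fields = c("text", "servicedate"))$hits$hits
# Request "text" through the source parameter instead
docs <- Search(c, "document-000002", size = 8, source = "text")$hits$hits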

scroll_id:
I would like to use the scroll parameter to pull more than the default limit of 10K documents from the same index. It should be possible, because the index holds far more documents than that:

all_docs <- Search(conn = c, index = "document-000002")
all_docs$hits$total$value
all_docs$`_scroll_id`

The total number of hits is more than 8 million. However, the scroll ID is always NULL.

I would appreciate any help.

ES version in use: 7.3.1
Elastic package version in use: 1.2.0

@cphaarmeyer
Contributor

cphaarmeyer commented Jun 19, 2023

Have you tried setting the time_scroll parameter of Search()?
See also https://docs.ropensci.org/elastic/articles/search.html#scrolling-search---instead-of-paging

@ale-ful
Author

ale-ful commented Jun 19, 2023

@cphaarmeyer thank you! Specifying the time_scroll parameter for Search() was sufficient to access _scroll_id.

Therefore, the working code looks like this:

all_docs <- Search(conn = c, index = "document-000002", time_scroll = "1m")
all_docs$`_scroll_id`
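
Once a scroll ID is available, the remaining pages still have to be fetched one at a time. The following is a minimal sketch of that loop in base R, not a tested recipe for this cluster. The names `collect_scrolled_hits`, `make_mock_next_page` and `first_page` are hypothetical, and the mock stands in for a live server so the example runs on its own. With the elastic package, `next_page` would presumably wrap `scroll()`, for example `function(id) scroll(c, id)`; that part is an assumption and is not exercised here.

```r
# Collect hits page by page until a page comes back empty.
# `first_page` is a parsed search response (a list with `_scroll_id` and hits$hits).
# `next_page` is a function that takes a scroll ID and returns the next response.
collect_scrolled_hits <- function(first_page, next_page) {
  if (length(first_page$hits$hits) == 0) return(list())
  pages <- list(first_page$hits$hits)
  page <- first_page
  repeat {
    # Always use the scroll ID from the most recent response.
    page <- next_page(page$`_scroll_id`)
    if (length(page$hits$hits) == 0) break
    pages[[length(pages) + 1]] <- page$hits$hits
  }
  # Flatten the list of pages into one list of hits.
  do.call(c, pages)
}

# Hypothetical stand-in for a cluster: two more pages of two hits, then an empty page.
make_mock_next_page <- function() {
  n <- 0
  function(scroll_id) {
    n <<- n + 1
    hits <- if (n <= 2) {
      list(list(`_id` = paste0("p", n, "a")), list(`_id` = paste0("p", n, "b")))
    } else {
      list()
    }
    list(`_scroll_id` = scroll_id, hits = list(hits = hits))
  }
}

# Mocked first response, shaped like the result of Search(..., time_scroll = "1m").
first_page <- list(
  `_scroll_id` = "mock-scroll-id",
  hits = list(hits = list(list(`_id` = "p0a"), list(`_id` = "p0b")))
)

all_hits <- collect_scrolled_hits(first_page, make_mock_next_page())
ids <- vapply(all_hits, function(h) h$`_id`, character(1))
```

With more than 8 million documents, accumulating every hit in memory may not be practical; writing each page out as it arrives is one alternative.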

Do you have any thoughts on how to pull the stored field?

@cphaarmeyer
Contributor

Do you mean something like this?

docs <- Search(conn = c, index = "document-000002", size = 6, body = list(`_source` = "text"))
lapply(docs$hits$hits, function(x) x[["_source"]][[1]])

@ale-ful
Author

ale-ful commented Jun 20, 2023

Indeed, the code you propose is more or less what the Search() documentation suggests, and I also tried it (although not as part of the body). However, it doesn't work, because "text" is a stored field, not part of "_source" (see the structure of the record I pasted in my question). According to the documentation, it should be returned when the stored_fields parameter is specified, but that is not the case.
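
To make the distinction concrete, here is a small base R sketch of where such a value would sit in a parsed hit: under `fields`, next to `_source`, not inside it. The hits below are mocked from the record in the question, and `get_stored_field` is a hypothetical helper. Whether the server returns `fields` at all depends on the request. One untested option, shown only as a comment, is to put `stored_fields` in the request body, which Elasticsearch accepts as a body key.

```r
# Untested assumption: ask for the stored field through the request body, e.g.
#   Search(conn = c, index = "document-000002",
#          body = list(stored_fields = list("text")))

# Mocked hits: the first mirrors the record from the question, the second has no `fields`.
hits <- list(
  list(
    `_id` = "AS_63689606",
    `_source` = list(visitid = "65_63209606", gender = "male"),
    fields = list(
      servicedate = list("2022-08-22T22:05:00.000Z"),
      text = list("an exemplary text I want to pull")
    )
  ),
  list(
    `_id` = "hypothetical-second-hit",
    `_source` = list(gender = "female")
  )
)

# Return the first value of a field stored under `fields`, or NA if it is absent.
get_stored_field <- function(hit, field_name) {
  value <- hit$fields[[field_name]]
  if (is.null(value)) NA_character_ else as.character(value[[1]])
}

texts <- vapply(hits, get_stored_field, character(1), field_name = "text")
```

Values under `fields` arrive as arrays even when there is only one, which is why the helper takes the first element.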

@cphaarmeyer
Contributor

Oh sorry. Then I don't know. I have never seen such a setup.
