Replicate content between ES indexes #85

Open
agstephens opened this issue Nov 17, 2021 · 2 comments

@agstephens
Collaborator

agstephens commented Nov 17, 2021

To get the mapping, use the <index_name>/_mapping endpoint (e.g. https://es14.ceda.ac.uk:9200/c3s-roocs-fix-prop/_mapping). It is worth paring this down, as the response also includes all the default settings; you only need the non-standard mappings.
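
One detail worth knowing when saving the mapping: the _mapping response nests the mapping under the index name, whereas index creation expects a body with a top-level "mappings" key. The following is a minimal sketch of that unwrapping step, assuming Elasticsearch 7 or later. The function name `extract_create_body`, the sample response, and the `dataset_id` field are hypothetical and only there for illustration. Deciding which entries count as defaults is left to you.

```python
import json


def extract_create_body(mapping_response, index_name):
    """Unwrap a GET <index_name>/_mapping response into a create-index body.

    The endpoint returns {"<index_name>": {"mappings": {...}}}; creating an
    index expects just {"mappings": {...}}.
    """
    return {"mappings": mapping_response[index_name]["mappings"]}


# Hypothetical, heavily shortened response used only for illustration
sample_response = {
    "c3s-roocs-fix-prop": {
        "mappings": {"properties": {"dataset_id": {"type": "keyword"}}}
    }
}

body = extract_create_body(sample_response, "c3s-roocs-fix-prop")
print(json.dumps(body, indent=2))
```

Writing `body` to mapping_file.json with `json.dump` would give a file that can be passed directly as the `body` when creating the index.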

Loading is as simple as

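import json

from elasticsearch import Elasticsearch

# Load the pared-down mapping saved from the <index_name>/_mapping endpoint
with open('mapping_file.json') as reader:
    mapping = json.load(reader)

index_name = 'index_name'

# With no arguments, elasticsearch-py 7.x connects to localhost:9200
es = Elasticsearch()

# Create the index with the mapping only if it does not already exist
if not es.indices.exists(index=index_name):
    es.indices.create(index=index_name, body=mapping)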

You can do a cross-cluster re-index to copy the data across:

https://elasticsearch-py.readthedocs.io/en/v7.11.0/helpers.html#reindex

Note: CEDA public end-point is: https://elasticsearch.ceda.ac.uk/c3s-roocs-fix-prop/_mapping

Example Search with no body specified:
https://elasticsearch.ceda.ac.uk/c3s-roocs-fix-prop/_search
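
A search with no body behaves as a match-all query and returns only the first page of hits (10 by default), so it is handy for a quick look at the index rather than for copying it. As a rough sketch, the response could be summarised like this. The function name `summarize_search` and the sample response are hypothetical.

```python
def summarize_search(response):
    """Return (total_hits, document_ids) from an Elasticsearch _search response."""
    hits = response["hits"]
    total = hits["total"]
    # Elasticsearch 7+ reports the total as {"value": n, "relation": ...};
    # older versions return a plain integer
    if isinstance(total, dict):
        total = total["value"]
    return total, [hit["_id"] for hit in hits["hits"]]


# Hypothetical response trimmed to the fields used above
sample = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [{"_id": "a1", "_source": {}}, {"_id": "b2", "_source": {}}],
    }
}
print(summarize_search(sample))
```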

@agstephens changed the title from "Ask Richard what the best approach is for dumping and reloading an ES index" to "Replicate content between ES indexes" on Nov 17, 2021
@cehbrecht
Collaborator

@ellesmith88 @rsmith013 I have set up my local Elasticsearch instance for testing and I want to replicate the index. I'm using the reindex operation:

curl -X POST -H 'Content-Type: application/json' -i 'http://136.172.60.76:9200/_reindex' --data '{
  "source": {
    "remote": {
      "host": "https://elasticsearch.ceda.ac.uk:443"
    },
    "index": "c3s-roocs-fix-prop"
  },
  "dest": {
    "index": "test-index-1"
  }
}'

But this fails with missing authentication:

"reason": "method [GET], host [https://elasticsearch.ceda.ac.uk:443], URI [/], status line [HTTP/1.1 401 Unauthorized]\n{\"error\":{\"root_cause\":[{\"type\":\"forbidden_response\",\"reason\":\"forbidden\",\"due_to\":\"OPERATION_NOT_ALLOWED\",\"header\":{\"WWW-Authenticate\":\"Basic\"}}],\"type\":\"forbidden_response\",\"reason\":\"forbidden\",\"due_to\":\"OPERATION_NOT_ALLOWED\",\"header\":{\"WWW-Authenticate\":\"Basic\"}},\"status\":401}"

Do I need to authenticate? Is there another way to replicate the index remotely?

@rsmith013

You won't be able to do a remote re-index this way. Cluster-to-cluster re-indexing like this is allow-listed in the configuration to protect resources. If you want a copy locally, you'll probably need to run a scroll query and then index the results into your local instance.
https://elasticsearch-py.readthedocs.io/en/v7.11.0/helpers.html#scan
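
The scroll-then-index approach could look roughly like the sketch below, assuming elasticsearch-py 7.x and that the public endpoint permits scroll queries (not confirmed in this thread). `helpers.scan` and `helpers.bulk` are the library's own helpers; the function names `to_bulk_actions` and `copy_index` are hypothetical.

```python
def to_bulk_actions(hits, dest_index):
    """Turn hits yielded by helpers.scan into actions for helpers.bulk."""
    for hit in hits:
        yield {
            "_index": dest_index,       # write into the local index
            "_id": hit["_id"],          # keep the original document id
            "_source": hit["_source"],  # copy the document body unchanged
        }


def copy_index(source_url, source_index, dest_url, dest_index):
    """Stream every document from the source index into the destination."""
    # Imported here so the helper above can be used without the client installed
    from elasticsearch import Elasticsearch, helpers

    source = Elasticsearch([source_url])
    dest = Elasticsearch([dest_url])
    # scan wraps the scroll API and yields hits one at a time
    hits = helpers.scan(
        source, index=source_index, query={"query": {"match_all": {}}}
    )
    # bulk returns (number of successes, list of errors)
    return helpers.bulk(dest, to_bulk_actions(hits, dest_index))


# Example call (not run here; both URLs are taken from the thread above):
# copy_index("https://elasticsearch.ceda.ac.uk:443", "c3s-roocs-fix-prop",
#            "http://136.172.60.76:9200", "test-index-1")
```

Because the documents are pulled and pushed by the client, nothing needs to be allow-listed on either cluster; create the destination index with the right mapping first so the fields are not dynamically mapped.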
