Allow to use Docker volume for Solr files #437

Open
1 task
nichtich opened this issue Mar 20, 2024 · 8 comments
Comments

@nichtich
Collaborator

nichtich commented Mar 20, 2024

The Solr directory of databases (I think /opt/solr/server/solr) should be mountable when starting the Solr container, so that the Solr index can be kept outside of the Docker container.

@pkiraly
Owner

pkiraly commented Mar 20, 2024

I am investigating it. The difficulty I see here is that this directory might need some configuration files populated, e.g. the configsets directory.

@nichtich
Collaborator Author

Attention: files created with RUN in the Dockerfile will be ignored when their directory is later exposed as a volume, so the files need to be created when the container is started, not as part of the image.

@Phu2
Contributor

Phu2 commented Apr 23, 2024

Currently, all Solr data is lost when the container is stopped and removed, e.g. by executing docker compose down.

@Phu2
Contributor

Phu2 commented Apr 23, 2024

How about using a separate container for Solr, based on the official image, rather than manually installing it in the same container? This way it is easier to persist data on the host. The docker-compose.yml would look like:

# Used to start the base image with `docker compose up -d`

version: '2'

services:
  solr:
    image: solr:8.11.3
    ports:
      - "8983:8983"
    volumes:
      # Create the directory up front with the right permissions, e.g.: mkdir ./solr && chown -R 8983:8983 ./solr
      - ./solr:/var/solr
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "curl -s http://localhost:8983",
        ]
      interval: 10s
      timeout: 10s
      retries: 120
    restart: on-failure
    networks:
      - qa-catalogue-backend

  app:
    depends_on:
      solr:
        condition: service_healthy
    container_name: metadata-qa-marc
    # image: ${IMAGE:-pkiraly/metadata-qa-marc:0.7.0}
    # image: ghcr.io/pkiraly/qa-catalogue:main
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - ./${INPUT:-input}:/opt/qa-catalogue/input
      - ./${OUTPUT:-output}:/opt/qa-catalogue/output
      - ./catalogues:/opt/qa-catalogue/catalogues
      - ./${WEBCONFIG:-web-config}:/var/www/html/qa-catalogue/config
    ports:
      - "${WEBPORT:-8000}:80"       # qa-catalogue-web
      # - "${SOLRPORT:-8983}:8983"  # Solr
    networks:
      - qa-catalogue-backend

networks:
  qa-catalogue-backend:
    name: qa-catalogue-backend
    external: true

However, the indexing scripts have to be adapted in order to talk to the Solr container (not localhost).

@nichtich
Collaborator Author

Using the official Solr Docker image would also decrease image sizes and thus speed up the build process.

@pkiraly
Owner

pkiraly commented Sep 8, 2024

@nichtich @Phu2 Many thanks for the ideas and code! I've started to implement it. I will ping you again when it is testable, at least in a branch. I think there will be three Docker containers:

  • solr
  • the command line interface (backend)
  • the web UI (frontend), based on a dedicated PHP image (php:8.1-apache)

We also need a volume that is shared between the last two containers, and we should add environment variables for the URLs of the components.
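
A minimal sketch of how the scripts could pick up the Solr URL from the environment instead of assuming localhost. The variable names SOLR_HOST and SOLR_PORT and the function name are hypothetical, not something QA catalogue defines; the default host assumes the Compose service is called solr, as in the docker-compose.yml proposed above.

```shell
#!/bin/sh
# Hypothetical variable names: SOLR_HOST and SOLR_PORT are assumptions.
# The defaults assume a Compose service named "solr" listening on port 8983.

# Print the base URL the indexing scripts would use instead of localhost.
solr_base_url() {
  printf 'http://%s:%s/solr\n' "${SOLR_HOST:-solr}" "${SOLR_PORT:-8983}"
}
```

An indexing script could then build its requests from "$(solr_base_url)", and the values could be overridden per container in the environment section of the Compose file.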

@pkiraly pkiraly self-assigned this Sep 11, 2024
@pkiraly
Owner

pkiraly commented Sep 12, 2024

@nichtich @Phu2 I ran into a problem and would like your opinion on it.
There are two methods to create a new Solr core (index):

  • use the command line tool, e.g. bin/solr create_core -c my_core
  • use the CoreAdmin API URL (admin/cores?action=CREATE&name=my_core&instanceDir=path/to/dir&config=solrconfig.xml&dataDir=data); see the documentation
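
For illustration, the URL-based method could be sketched roughly like this. The function names and the base URL passed in are hypothetical; only the action, name and configSet parameters are used, and the config set falls back to _default when none is given.

```shell
#!/bin/sh
# Build a CoreAdmin CREATE URL that names the config set explicitly.
# Arguments: base URL, core name, config set name (optional, defaults to _default).
core_create_url() {
  printf '%s/admin/cores?action=CREATE&name=%s&configSet=%s\n' "$1" "$2" "${3:-_default}"
}

# Send the request. -s silences progress output, -f makes curl exit non-zero
# on an HTTP error status. Takes the same arguments as core_create_url.
create_core() {
  curl -sf "$(core_create_url "$@")"
}
```

A call would look like create_core http://solr:8983/solr my_core, where the host name is an assumption about how the Solr container is reachable.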

QA catalogue so far uses the latter method, but it does not specify the instanceDir, config and dataDir parameters, so they take their default values. In a standard Solr installation, [solr base dir]/server/solr is the directory where indices are stored. It contains the individual index directories and a preconfigured configsets directory, which holds configuration file templates. When Solr creates a new core, the configSet parameter can be used to specify the template, which is a subdirectory of configsets (the default is called _default). The [solr base dir]/server/solr directory is the Solr home directory. In the Solr Dockerfile, however, SOLR_HOME is set to /var/solr/data, an empty directory.

If we try to create a new core with the API, it throws an error message and the core is not created:

SolrCore 'qa-catalogue_1' is not available due to init failure: Could not load configuration from directory /var/solr/data/configsets/_default

I can see several possible solutions:

  1. Instead of using the API, the tools should use the command line method. But this is a bit complicated, because inside one Docker container we would have to call a command of another Docker container, so the source container should be able to run the docker command, and it should know the name of the target container.
  2. As a step in the image creation process, we should copy /opt/solr/server/solr/configsets to /var/solr/data/. The documentation says that there is a /docker-entrypoint-initdb.d directory that can be mounted from outside, and we can add something in there.
  3. We can provide configsets via the tool, and ask users to add it to the mounted directory.
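
A minimal sketch of option 2, assuming the official image runs *.sh files found in /docker-entrypoint-initdb.d before Solr starts, as its documentation describes. The function name is hypothetical and the paths are the ones quoted in this thread. Because the copy happens at container start rather than at image build, it should also work when /var/solr is mounted from the host.

```shell
#!/bin/sh
# Copy the bundled config sets into the Solr home unless they are already there.
# Arguments: source configsets directory, Solr home directory.
# Does nothing if the source is missing or the target already has configsets,
# so an existing (possibly customised) copy is never overwritten.
seed_configsets() {
  if [ -d "$1" ] && [ ! -e "$2/configsets" ]; then
    cp -R "$1" "$2/"
  fi
}

# Paths as quoted in this thread; outside the Solr image this call is a no-op.
seed_configsets /opt/solr/server/solr/configsets /var/solr/data
```

The file could be mounted into the Solr container under /docker-entrypoint-initdb.d via an extra entry in the volumes list of the solr service.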

The questions:

  • Have you ever run into this problem?
  • Which solution do you see as optimal? My choice would be 2), but maybe you know a better solution.

@pkiraly
Owner

pkiraly commented Sep 12, 2024
