Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Healthcheck #133

Open
mrfuxi opened this issue Jul 25, 2017 · 20 comments
Open

Healthcheck #133

mrfuxi opened this issue Jul 25, 2017 · 20 comments

Comments

@mrfuxi
Copy link

mrfuxi commented Jul 25, 2017

Would it be possible to implement Docker's HEALTHCHECK command?

@twright-msft
Copy link
Collaborator

Yeah, now that we have sqlcmd in the image this should be much easier. I'll put it on our backlog internally. If you need something sooner, I'd suggest creating your own image FROM ours and specifying a HEALTCHECK that uses sqlcmd with -q to execute a simple query periodically like 'SELECT 1'.

@sangohan
Copy link

sangohan commented Sep 7, 2017

I too am having an issue where the docker image crashes, but continues to output lines of logging, which prevents my internal hung-build-killer tool to detect that something went wrong.

2017-09-06 09:50:11.21 spid90 Timeout occurred while waiting for latch: class 'LOG_MANAGER', id 00000005ED740BB0, type 4, Task 0x00000005EB49B088 : 0, waittime 71400 seconds, flags 0x1a, owning task 0x00000005EB492CA8. Continuing to wait.
2017-09-06 09:50:25.26 spid89 Timeout occurred while waiting for latch: class 'LOG_MANAGER', id 00000005ED740BB0, type 4, Task 0x00000005EEDE7088 : 0, waittime 71400 seconds, flags 0x1a, owning task 0x00000005EB492CA8. Continuing to wait.
2017-09-06 09:51:25.25 spid84 Timeout occurred while waiting for latch: class 'LOG_MANAGER', id 00000005ED740BB0, type 4, Task 0x00000005E800DC28 : 0, waittime 71400 seconds, flags 0x1a, owning task 0x00000005EB492CA8. Continuing to wait.

having a healthcheck mechanism in the case of this would be great.

@manicmonkey
Copy link

manicmonkey commented Nov 15, 2017

I get the same crash / error after using netcat to ascertain when the process has started listening on port 1433

@axelfontaine
Copy link

This is super useful in order to avoid having to develop custom poll and wait script to check whether the DB is up in order to be able to start other activities.

@krkabol
Copy link

krkabol commented Apr 19, 2018

Dockerfile:

FROM microsoft/mssql-server-linux
HEALTHCHECK --interval=10s --timeout=3s --start-period=10s --retries=10 \
    CMD sqlcmd -S localhost -U sa -P ${SA_PASSWORD} -Q "SELECT 1" || exit 1

@djechelon
Copy link

It should be

CMD /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P ${SA_PASSWORD} -Q "SELECT 1" || exit 1

Because sqlcmd is not on PATH

@raczeja
Copy link

raczeja commented Mar 18, 2020

Is there a simple way to use HEALTHCHECK?
I use this piece of code, but maybe you can advise something simpler?
https://github.com/ObjectivityLtd/DBTestCompare/blob/master/restoreBackup.ps1#L7
do { $JSON=docker inspect --format='{{json .State.Health}}' sqlserver-container | Out-String | ConvertFrom-Json Write-Host HEALTHCHECK: $JSON.Status.ToString() sqlserver-container Start-Sleep -s 10 } Until ($JSON.Status.ToString() -eq 'healthy')

@tinohager
Copy link

version: "3.4"
services:
    db:
        image: "mcr.microsoft.com/mssql/server:2019-latest"
        environment:
            SA_PASSWORD: "Secure-Password.1234"
            ACCEPT_EULA: "Y"
        healthcheck:
            test: /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P "$$SA_PASSWORD" -Q "SELECT 1" || exit 1
            interval: 10s
            timeout: 3s
            retries: 10
            start_period: 10s

@mrlioncub
Copy link

mrlioncub commented Mar 11, 2021

Everything that is written above must be changed:
required with the option: -b
and I recommend using the option: -o /dev/null
in docker-compose.yaml

    healthcheck:
      test: /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P "$${SA_PASSWORD}" -Q "SELECT 1" -b -o /dev/null
      interval: 10s
      timeout: 3s
      retries: 10
      start_period: 10s

in Dockerfile:

HEALTHCHECK --interval=10s --timeout=3s --start-period=10s --retries=10 \
  CMD /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P $${SA_PASSWORD} -Q "SELECT 1" -b -o /dev/null

@raschmitt
Copy link

I'm curious here, why do we need to have exit 1 if sqlcmd -Q already has a built-in exit?

 -Q "cmdline query" (and exit)  

https://docs.microsoft.com/en-us/sql/tools/sqlcmd-utility?view=sql-server-ver15#syntax

@mrlioncub
Copy link

Yes, exit 1 is not needed. Updated

@neur0manc
Copy link

These solution don't take into account that during database migration after an update, the database can be locked for several minutes. To the developers: Is it possible to take that into account? Does sql-server have an idiomatic way to determine that it is currently upgrading databases? Are you maybe working on a HEALTHCHECK command that is working with that scenario?

The best idea so far to work around this is to add --start-period=10m to the healthcheck.

@leomelzer
Copy link

Thanks @mrlioncub 🎉 I was searching for a solution for using the image in a GitHub Action and the healthcheck wasn't working...thankfully Google found your comment.

Passing -b -o /dev/null solved the issue for me. Sharing for future reference:

name: CI

on: push

jobs:
    test:
        name: Test
        runs-on: ubuntu-latest
        env:
            DB_USER: SA
            DB_PASSWORD: yourStrongP@ssword
            DB_NAME: your_db

        services:
            mssql:
                image: mcr.microsoft.com/mssql/server:2017-latest-ubuntu
                env:
                    ACCEPT_EULA: Y
                    SA_PASSWORD: ${{ env.DB_PASSWORD }}
                    MSSQL_PID: Express
                    DB_USER: ${{ env.DB_USER }}
                    DB_NAME: ${{ env.DB_NAME }}
                ports:
                    - 1433/tcp
                options: >-
                    --health-cmd "/opt/mssql-tools/bin/sqlcmd -U $DB_USER -P $SA_PASSWORD -Q 'select 1' -b -o /dev/null"
                    --health-interval 60s
                    --health-timeout 30s
                    --health-start-period 20s
                    --health-retries 3

        steps:
            - name: Create Database
              run: |
                  docker exec $(docker ps -alq) /opt/mssql-tools/bin/sqlcmd -U "$DB_USER" -P "$DB_PASSWORD" -Q "CREATE DATABASE $DB_NAME;"

            - uses: actions/checkout@v2

            - name: Setup Node Package Cache
              uses: actions/cache@v2
              with:
                  path: ~/.npm
                  key: ${{ runner.OS }}-node-${{ hashFiles('**/package-lock.json') }}

            - name: Install
              run: |
                  npm ci --prefer-offline --no-audit

            - name: Test
              env:
                  DB_HOST: localhost
                  DB_PORT: ${{ job.services.mssql.ports[1433] }}
                  DB_USER: ${{ env.DB_USER }}
                  DB_PASSWORD: ${{ env.DB_PASSWORD }}
                  DB_NAME: ${{ env.DB_NAME }}
              run: |
                  npm test

@YoChen
Copy link

YoChen commented May 13, 2022

yes ,I have the same problem,use these can't resolve. how to config sqlserver healthycheck?

@ErcinDedeoglu
Copy link

ErcinDedeoglu commented Mar 26, 2023

BEST PRACTICE


services:
  instance:
    container_name: sql-server-2022
    image: mcr.microsoft.com/mssql/server:2022-latest
    restart: unless-stopped
    environment:
      ACCEPT_EULA: Y
      MSSQL_SA_PASSWORD: SA-PASSWORD
      MSSQL_PID: Enterprise
    ports:
      - 2222:1433
    healthcheck:
      test: /opt/mssql-tools/bin/sqlcmd -S localhost -U sa -P "SA-PASSWORD" -Q "SELECT 1" -b -o /dev/null
      interval: 10s
      timeout: 3s
      retries: 10
      start_period: 10s
    volumes:
      - mssql:/var/opt/mssql
    networks:
      - internal

@antman2
Copy link

antman2 commented May 23, 2023

I disagree with the "best practice" of using -o /dev/null in the parameters. If you do that then the healthcheck output as seen in docker container inspect <container_id> is an empty string - which can hide clues about which part of the network/SQL stack is responsible for the failure.

If you do not include -o /dev/null in the parameters then you can see the actual error message as demonstrated in this snippet:

{
    "Health": {
        "Status": "unhealthy",
        "FailingStreak": 1,
        "Log": [
            {
                "Start": "2023-05-23T05:41:54.699717693Z",
                "End": "2023-05-23T05:41:54.827199774Z",
                "ExitCode": 1,
                "Output": "Msg 102, Level 15, State 1, Server 0999316e8300, Line 1\nIncorrect syntax near 'Error'.\n"
            }
        ]
    }
}

@she-trifork
Copy link

she-trifork commented Jan 30, 2024

Hi,
I'm curious why it's best practice to execute a command with a clear text password that can be read from anywhere on the system e.g. with ps -a. Is there really no other option to ask the mssql process if it's running and at good health? Something like pgrep -f /opt/mssql/bin/sqlservr && exit 1 || exit 0 would at least be more secure, but presumably not as reliable.

@fabiang
Copy link

fabiang commented Jan 30, 2024

Hi, I'm curious why it's best practice to execute a command with a clear text password that can be read from anywhere on the system e.g. with ps -a. Is there really no other option to ask the mssql process if it's running and at good health? Something like pgrep -f /opt/mssql/bin/sqlservr && exit 1 || exit 0 would at least be more secure, but presumably not as reliable.

You have two options here:

a) use SQLCMDPASSWORD environment variable to pass the password
b) use the new go-sqlcmd tool, which allows you to define a config file, from where it can read the password.

@clearwaterstream
Copy link

clearwaterstream commented Feb 28, 2024

since /opt/mssql-tools/bin/sqlcmd is NOT available on linux/arm64 (Mac M1, M2, etc) for mcr.microsoft.com/azure-sql-edge:latest

Can someone please provide a command that will work with whatever flavour of linux mcr.microsoft.com/azure-sql-edge:latest is using? This way no additional tools need to be installed, if possible.

I realize it's a different image, but the ask is fairly generic: how does one test for port being open, and accepting connections w/o using specific tooling such as sqlcmd

@k3n
Copy link

k3n commented Mar 13, 2024

@clearwaterstream Taking inspiration from StackOverflow [1], I was able to use this since timeout is available in the image:

healthcheck:
  test: timeout 1 bash -c 'cat < /dev/null > /dev/tcp/127.0.0.1/1433'
  1. https://stackoverflow.com/a/19866239/84473

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests