Inference graph becomes unavailable with no error in logs #1584

eavidan · 2020-03-23T15:29:10Z

We have created the following inference graph:
router -> transform-input => 10 X models -> combiner -> transform-output

We started running some benchmarks and noticed that, fairly early on, the deployment becomes unavailable with no apparent reason in the logs.

We changed the log level to DEBUG, and what we see is that the seldon-container-engine simply stops working; no error message appears.

Our deployment definition is below.
All components contain minimal logic: the router always directs to one of the preprocessors, the preprocessors do nothing, the combiner simply takes the first model response, and the postprocessor also does nothing.
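For reference, the component logic just described can be sketched as plain Python classes. This is a hypothetical reconstruction, since the actual `{router}`, `{preprocess}`, `{combiner}` and `{postprocess}` images are not shown; it assumes the method names used by the Seldon Core Python wrapper (`route`, `transform_input`, `aggregate`, `transform_output`), while the class names and the fixed `child_index` are illustrative assumptions.

```python
from typing import Any, List, Sequence

# Hypothetical pass-through components matching the description above.
# Method names follow the Seldon Core Python wrapper conventions; the class
# names and the fixed child index are assumptions for illustration only.


class Router:
    def __init__(self, child_index: int = 1):
        # Assumption: the router always returns the same child index
        # (index 1 = mypreprocess2, the branch that leads to the models).
        self.child_index = child_index

    def route(self, features: Any, feature_names: Sequence[str]) -> int:
        return self.child_index


class PreProcess:
    def transform_input(self, X: Any, feature_names: Sequence[str]) -> Any:
        return X  # does nothing: input is passed through unchanged


class Combiner:
    def aggregate(self, features_list: List[Any],
                  feature_names_list: List[Sequence[str]]) -> Any:
        return features_list[0]  # keep only the first model's response


class PostProcess:
    def transform_output(self, X: Any, feature_names: Sequence[str]) -> Any:
        return X  # does nothing: output is passed through unchanged
```

Because every component is a pass-through, a hang in such a graph is unlikely to come from the component code itself.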

Any direction will be highly appreciated.

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
  name: bm1
spec:
  annotations:
    project_name: bm1
    deployment_version: v1
    seldon.io/grpc-read-timeout: "100000"
  name: bm1
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: {sklearn_elasticnet_wine}
          name: model
          imagePullPolicy: Always
          env:
          - name: SELDON_LOG_LEVEL
            value: "DEBUG"
          resources:
            requests:
              memory: "2Gi"
        - image: {sklearn_elasticnet_wine}
          name: model2
          imagePullPolicy: Always
          env:
          - name: SELDON_LOG_LEVEL
            value: "DEBUG"
        - image: {preprocess}
          name: mypreprocess1
          imagePullPolicy: Always
          env:
          - name: SELDON_LOG_LEVEL
            value: "DEBUG"
        - image:  {preprocess}
          name: mypreprocess2
          imagePullPolicy: Always
          env:
          - name: SELDON_LOG_LEVEL
            value: "DEBUG"
        - image:  {postprocess}
          name: mypostprocess1
          imagePullPolicy: Always
          env:
          - name: SELDON_LOG_LEVEL
            value: "DEBUG"
        - image: {router}
          name: router1
          imagePullPolicy: Always
          env:
          - name: SELDON_LOG_LEVEL
            value: "DEBUG"
        - image: {combiner}
          name: combiner1
          imagePullPolicy: Always
          env:
          - name: SELDON_LOG_LEVEL
            value: "DEBUG"
        nodeSelector:
          cpu: avx

        terminationGracePeriodSeconds: 20
        imagePullSecrets:
        - name: ci-model-aa-mlflow-cred
    graph:
      name: mypostprocess1
      endpoint:
        type: REST
      type: OUTPUT_TRANSFORMER
      children:
      - name: router1
        endpoint:
          type: REST
        type: ROUTER
        children:
          - name: mypreprocess1
            endpoint:
              type: REST
            type: TRANSFORMER
            children: []
          - name: mypreprocess2
            endpoint:
              type: REST
            type: TRANSFORMER
            children:
              - name: combiner1
                endpoint:
                  type: REST
                type: COMBINER
                children:
                  - name: model
                    endpoint:
                      type: REST
                    type: MODEL
                    children: []
                  - name: model2
                    endpoint:
                      type: REST
                    type: MODEL
                    children: []
    name: default    
    replicas: 1
    annotations:
      predictor_version: v1
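To make the request flow of the graph above easier to follow, here is a simplified, hypothetical model of one request's traversal. It only records which component method would be called and in what order; it is not the seldon-container-engine implementation, and the `trace` function and `routes` mapping (standing in for the router's decision) are assumptions introduced for illustration.

```python
from typing import Dict, List

# Hypothetical, simplified model of one request through the graph above.
# It records which component method would be called, in order; it is NOT
# the seldon-container-engine implementation.
GRAPH = {
    "name": "mypostprocess1", "type": "OUTPUT_TRANSFORMER", "children": [
        {"name": "router1", "type": "ROUTER", "children": [
            {"name": "mypreprocess1", "type": "TRANSFORMER", "children": []},
            {"name": "mypreprocess2", "type": "TRANSFORMER", "children": [
                {"name": "combiner1", "type": "COMBINER", "children": [
                    {"name": "model", "type": "MODEL", "children": []},
                    {"name": "model2", "type": "MODEL", "children": []},
                ]},
            ]},
        ]},
    ],
}


def trace(node: dict, routes: Dict[str, int], calls: List[str]) -> List[str]:
    """Append the calls triggered by `node` to `calls` and return it.

    `routes` maps a router name to the child index its route() would return
    (an assumption standing in for the real router).
    """
    kind, name, children = node["type"], node["name"], node["children"]
    if kind == "ROUTER":
        calls.append(f"{name}.route")
        trace(children[routes[name]], routes, calls)  # only the chosen child runs
    elif kind == "TRANSFORMER":
        calls.append(f"{name}.transform_input")  # input is transformed first
        for child in children:
            trace(child, routes, calls)
    elif kind == "OUTPUT_TRANSFORMER":
        for child in children:  # children run first ...
            trace(child, routes, calls)
        calls.append(f"{name}.transform_output")  # ... then their output is transformed
    elif kind == "COMBINER":
        for child in children:  # every child is called ...
            trace(child, routes, calls)
        calls.append(f"{name}.aggregate")  # ... then the responses are combined
    elif kind == "MODEL":
        calls.append(f"{name}.predict")
    else:
        raise ValueError(f"unknown node type: {kind}")
    return calls


if __name__ == "__main__":
    for choice in (0, 1):
        print(choice, trace(GRAPH, {"router1": choice}, []))
```

Under this model, routing to child 1 produces the order router, transform-input, models, combiner, transform-output described at the top of the issue, while routing to child 0 never reaches the models at all.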


ukclivecox · 2020-03-23T15:34:21Z

Could it be related to: #1490

nickdgriffin · 2020-03-23T17:23:32Z

Smells very similar.

ukclivecox · 2020-04-02T10:31:08Z

Please reopen if this is still an issue.

eavidan added the bug and triage (Needs to be triaged and prioritised accordingly) labels on Mar 23, 2020

ukclivecox closed this as completed Apr 2, 2020
