Inference graph becomes unavailable with no error in logs #1584

Closed
eavidan opened this issue Mar 23, 2020 · 3 comments
Labels
bug triage Needs to be triaged and prioritised accordingly

Comments

@eavidan

eavidan commented Mar 23, 2020

We have created the following inference graph:
router -> transform-input => 10 X models -> combiner -> transform-output

We started running some benchmarks and noticed that fairly early on the deployment becomes unavailable, with no apparent reason in the logs.

We have changed the log level to DEBUG, and what we see is that the seldon-container-engine simply stops working. No error message appears.

Our deployment definition is included below.
All components contain minimal logic: the router always directs to one of the preprocess steps, the preprocess steps do nothing, the combiner simply takes the first model response, and the postprocess step also does nothing.
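
The minimal components described above could look roughly like the following. This is only a sketch, assuming the method names used by the Seldon Core Python wrapper (`route`, `transform_input`, `aggregate`, `transform_output`); the class names and the `branch` parameter are hypothetical and are not taken from the actual images in this deployment.

```python
class Router:
    """Hypothetical router that always picks the same child branch."""

    def __init__(self, branch=0):
        # Assumption: the fixed branch index is configurable; 0 is the first child.
        self.branch = branch

    def route(self, features, feature_names):
        # Return the index of the child that should receive the request.
        return self.branch


class Preprocess:
    """Hypothetical input transformer that passes data through unchanged."""

    def transform_input(self, X, feature_names):
        return X


class Combiner:
    """Hypothetical combiner that keeps only the first model response."""

    def aggregate(self, features_list, feature_names_list):
        return features_list[0]


class Postprocess:
    """Hypothetical output transformer that passes data through unchanged."""

    def transform_output(self, X, feature_names):
        return X
```

With components this small, any work done per request is almost entirely in the orchestration between containers rather than in the components themselves.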

Any direction will be highly appreciated.

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  labels:
    app: seldon
  name: bm1
spec:
  annotations:
    project_name: bm1
    deployment_version: v1
    seldon.io/grpc-read-timeout: "100000"
  name: bm1
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: {sklearn_elasticnet_wine}
          name: model
          imagePullPolicy: Always
          env:
          - name: SELDON_LOG_LEVEL
            value: "DEBUG"
          resources:
            requests:
              memory: "2Gi"
        - image: {sklearn_elasticnet_wine}
          name: model2
          imagePullPolicy: Always
          env:
          - name: SELDON_LOG_LEVEL
            value: "DEBUG"
        - image: {preprocess}
          name: mypreprocess1
          imagePullPolicy: Always
          env:
          - name: SELDON_LOG_LEVEL
            value: "DEBUG"
        - image: {preprocess}
          name: mypreprocess2
          imagePullPolicy: Always
          env:
          - name: SELDON_LOG_LEVEL
            value: "DEBUG"
        - image: {postprocess}
          name: mypostprocess1
          imagePullPolicy: Always
          env:
          - name: SELDON_LOG_LEVEL
            value: "DEBUG"
        - image: {router}
          name: router1
          imagePullPolicy: Always
          env:
          - name: SELDON_LOG_LEVEL
            value: "DEBUG"
        - image: {combiner}
          name: combiner1
          imagePullPolicy: Always
          env:
          - name: SELDON_LOG_LEVEL
            value: "DEBUG"
        nodeSelector:
          cpu: avx

        terminationGracePeriodSeconds: 20
        imagePullSecrets:
        - name: ci-model-aa-mlflow-cred
    graph:
      name: mypostprocess1
      endpoint:
        type: REST
      type: OUTPUT_TRANSFORMER
      children:
      - name: router1
        endpoint:
          type: REST
        type: ROUTER
        children:
          - name: mypreprocess1
            endpoint:
              type: REST
            type: TRANSFORMER
            children: []
          - name: mypreprocess2
            endpoint:
              type: REST
            type: TRANSFORMER
            children:
              - name: combiner1
                endpoint:
                  type: REST
                type: COMBINER
                children:
                  - name: model
                    endpoint:
                      type: REST
                    type: MODEL
                    children: []
                  - name: model2
                    endpoint:
                      type: REST
                    type: MODEL
                    children: []
    name: default
    replicas: 1
    annotations:
      predictor_version: v1
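
To see how the `graph` section of the definition above nests, a short script can walk it and check that every node name matches a container name. This is a minimal sketch: the `GRAPH` dictionary and `CONTAINERS` set are transcribed by hand from the definition (endpoints omitted), and the helper names `walk` and `missing_containers` are hypothetical.

```python
# Graph transcribed from the deployment definition; endpoint fields are omitted.
GRAPH = {
    "name": "mypostprocess1", "type": "OUTPUT_TRANSFORMER", "children": [
        {"name": "router1", "type": "ROUTER", "children": [
            {"name": "mypreprocess1", "type": "TRANSFORMER", "children": []},
            {"name": "mypreprocess2", "type": "TRANSFORMER", "children": [
                {"name": "combiner1", "type": "COMBINER", "children": [
                    {"name": "model", "type": "MODEL", "children": []},
                    {"name": "model2", "type": "MODEL", "children": []},
                ]},
            ]},
        ]},
    ],
}

# Container names listed under componentSpecs in the same definition.
CONTAINERS = {
    "model", "model2", "mypreprocess1", "mypreprocess2",
    "mypostprocess1", "router1", "combiner1",
}


def walk(node, depth=0):
    """Yield (depth, name, type) for the node and its descendants, depth-first."""
    yield depth, node["name"], node["type"]
    for child in node.get("children", []):
        yield from walk(child, depth + 1)


def missing_containers(graph, containers):
    """Return graph node names that have no container with the same name."""
    return sorted(name for _, name, _ in walk(graph) if name not in containers)


if __name__ == "__main__":
    for depth, name, kind in walk(GRAPH):
        print("  " * depth + f"{name} ({kind})")
    print("nodes without a container:", missing_containers(GRAPH, CONTAINERS))
```

Walking the graph this way shows two MODEL nodes under `combiner1`, whereas the description mentions ten, and shows that `mypreprocess1` has no children while `mypreprocess2` leads to the combiner.
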
@eavidan added the bug and triage labels on Mar 23, 2020
@ukclivecox
Contributor

Could it be related to #1490?

@nickdgriffin

Smells very similar.

@ukclivecox
Contributor

Please reopen if this is still an issue.
