huge data set #972

bytearchive · 2019-10-19T23:13:01Z

I have a dataset that is 1-10 TB in size. I want to train and serve a classification model on it.

Can Seldon support this size of data?
Can Seldon support data lineage and snapshotting of data if a k8s pod crashes?
Is there any Seldon-approved data management framework?
Is any company using Seldon in production?

Is there any Seldon blog post explaining non-hello-world projects with large datasets? A whitepaper? Can it be done using scikit-learn or PySpark on Seldon?

ukclivecox · 2019-10-25T15:21:10Z

Sorry for the late reply.

I have a dataset that is 1-10 TB in size. I want to train and serve a classification model on it.
1. Can Seldon support this size of data?

We are focused on real-time APIs. If you mean a 1-10 TB prediction dataset: we do allow batch requests, but you would need to decide whether a purely non-API solution is a better fit for this case, since you would have to handle splitting the requests into batches, retrying failures, and so on yourself. Tools such as Spark and Flink are designed for this and handle errors gracefully.
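To give a feel for what "splitting the requests into batches" involves on the client side, here is a minimal Python sketch. It is not part of Seldon: `iter_batches`, `build_payload`, and `predict_in_batches` are hypothetical helper names, the `send` callable stands in for whatever HTTP client you use, and the payload shape assumes your deployment accepts Seldon's JSON `ndarray` format.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List, Sequence, Tuple

Row = Sequence[float]


def iter_batches(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most batch_size rows, reading lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def build_payload(batch: List[Row]) -> dict:
    # Assumption: the model endpoint accepts a JSON body of this shape.
    return {"data": {"ndarray": [list(row) for row in batch]}}


def predict_in_batches(
    rows: Iterable[Row],
    batch_size: int,
    send: Callable[[dict], Any],
    max_retries: int = 2,
) -> Tuple[List[Tuple[int, Any]], List[int]]:
    """Send each batch through `send`, retrying on any exception.

    Returns (results, failed): results holds (batch_index, response) pairs,
    failed holds the indices of batches that never succeeded.
    """
    results: List[Tuple[int, Any]] = []
    failed: List[int] = []
    for index, batch in enumerate(iter_batches(rows, batch_size)):
        payload = build_payload(batch)
        for attempt in range(max_retries + 1):
            try:
                results.append((index, send(payload)))
                break
            except Exception:
                # Give up on this batch after the last allowed attempt.
                if attempt == max_retries:
                    failed.append(index)
    return results, failed
```

In practice `send` could be a small wrapper around an HTTP POST to your deployment's prediction URL (which is deployment-specific and not shown here). Even this sketch has to track failed batches by hand, which is the kind of bookkeeping Spark or Flink would otherwise do for you at the 1-10 TB scale.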

2. Can Seldon support data lineage and snapshotting of data if a k8s pod crashes?

Not directly. It is on our roadmap to integrate with tools such as Pachyderm, which focus on this.

3. Is there any Seldon-approved data management framework?

I would suggest looking at the Kubeflow ecosystem, of which we are a part.

4. Is any company using Seldon in production?

We know of a range of companies running Seldon in production. We will make certain ones public in future, with their consent, or they can reply here.

Is there any Seldon blog post explaining non-hello-world projects with large datasets? A whitepaper? Can it be done using scikit-learn or PySpark on Seldon?

You can combine Seldon and Spark; see last week's Spark AI Summit session: https://databricks.com/session_eu19/migrating-apache-spark-ml-jobs-to-spark-tensorflow-on-kubeflow

However, to restate: offline batch is less of a focus for us at present.

ukclivecox closed this as completed Oct 25, 2019
