I have a dataset of 1-10 TB in size. I want to train and serve a classification model on it.
1. Can Seldon support this size of data?
2. Can Seldon support data lineage and snapshotting of data if a k8s pod crashes?
3. Is there any Seldon-approved data management framework?
4. Is there any company using Seldon in production?
Also, is there any Seldon blog post explaining non-hello-world projects with large datasets? A whitepaper? Can it be done using scikit-learn or PySpark on Seldon?
I have a dataset of 1-10 TB in size. I want to train and serve a classification model on it.
1. Can Seldon support this size of data?
We are focused on real-time APIs. If you mean a 1-10 TB prediction dataset: we do allow batch requests, but you would need to decide whether a non-API-based solution is a better fit for this case, since you would have to handle splitting the requests into batches yourself. Systems such as Spark and Flink are designed for this and handle errors gracefully.
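To illustrate the client-side batching described above, here is a minimal sketch of splitting a row-oriented dataset into fixed-size batches and wrapping each batch in the Seldon ndarray JSON payload. The endpoint URL and batch size are hypothetical, and retry/error handling (which Spark/Flink would give you for free) is left out.

```python
import json
from typing import Iterator, List

def chunks(rows: List[List[float]], batch_size: int) -> Iterator[List[List[float]]]:
    """Split a row-oriented dataset into fixed-size batches (last one may be smaller)."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

def to_seldon_payload(batch: List[List[float]]) -> dict:
    """Wrap a batch of feature rows in Seldon's ndarray request payload."""
    return {"data": {"ndarray": batch}}

# Sending each batch (requires the `requests` package and a live deployment;
# ENDPOINT is a placeholder for your Seldon deployment's prediction URL):
# import requests
# ENDPOINT = "http://<ingress>/seldon/<namespace>/<deployment>/api/v1.0/predictions"
# for batch in chunks(rows, 1000):
#     resp = requests.post(ENDPOINT, json=to_seldon_payload(batch))
#     resp.raise_for_status()
```

Note that the client, not Seldon, is responsible for tracking which batches succeeded, which is exactly the bookkeeping a dataflow engine would otherwise handle.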
2. Can Seldon support data lineage and snapshotting of data if a k8s pod crashes?
Not directly. It is on our roadmap to integrate with tools such as Pachyderm, which focus on this.
3. Is there any Seldon-approved data management framework?
I would suggest looking at the Kubeflow ecosystem, of which we are a part.
4. Is there any company using Seldon in production?
We know of a range of companies running Seldon in production. We will name some publicly in future with their consent, or they can reply here.