VTS (short for Vector Transport Service) is an open-source tool for moving vectors and unstructured data. Developed by Zilliz, it is built on Apache SeaTunnel.
- Meeting Growing Data Migration Needs: VTS evolved from our Milvus Migration Service, which has helped over 100 organizations migrate data between Milvus clusters. User demand has since grown to include migrations to Milvus from other vector databases, traditional search engines like Elasticsearch and Solr, relational databases, data warehouses, document databases, and even S3 and data lakes.
- Supporting Real-time Data Streaming and Offline Import: As vector database capabilities expand, users require both real-time data streaming and offline batch import options.
- Simplifying Unstructured Data Transformation: Unlike traditional ETL, transforming unstructured data requires AI and model capabilities. Used together with Zilliz Cloud Pipelines, VTS enables vector embedding, tagging, and complex transformations, reducing data cleaning costs and operational complexity.
- Ensuring End-to-End Data Quality: Data integration and synchronization processes are prone to data loss and inconsistencies. VTS addresses these critical data quality concerns with robust monitoring and alerting mechanisms.
Built on top of Apache SeaTunnel, VTS offers:
- Rich, extensible connectors
- Unified stream and batch processing for real-time synchronization and offline batch imports
- Distributed snapshot support for data consistency
- High performance, low latency, and scalability
- Real-time monitoring and visual management
Additionally, VTS introduces vector-specific capabilities such as multiple data source support, schema matching, and basic data validation.
The roadmap includes incremental sync, combined one-time migration and change data capture, and more advanced data transformation capabilities.
To see VTS in action, read our blog:
To get started with VTS, follow the QuickStart Guide.
This guide shows how to use VTS to transport vector data into Milvus. The following source connectors are currently supported:
- Milvus
- Postgres vector
- Elasticsearch
- Pinecone
- Qdrant
- Tencent VectorDB
1. Build the VTS project
./mvnw install -Dmaven.test.skip
2. Set up the configuration file
Go to ./seatunnel-example/seatunnel-examples/src/main/resources/examples and update the conf file that matches your source:
- milvus_to_milvus.conf
- pg_to_milvus.conf
- es_to_milvus.conf
Here is an example of milvus_to_milvus.conf:
env {
  parallelism = 1
  job.mode = "BATCH"
}
source {
  Milvus {
    url="https://in01-***.aws-us-west-2.vectordb.zillizcloud.com:19530"
    token="***"
    database="default"
    collection="medium_articles"
    batch_size=100
  }
}
sink {
  Milvus {
    url="https://in01-***.aws-us-west-2.vectordb.zillizcloud.com:19542"
    token="***"
    database="default"
    batch_size=10
  }
}
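If you maintain several Milvus-to-Milvus jobs, it can help to generate the conf file from parameters rather than editing it by hand. The following is a minimal sketch only: `MilvusToMilvusConf`, `render`, and `quote` are hypothetical names that are not part of VTS, and the emitted keys simply mirror the example above. It uses nothing beyond the Java standard library.

```java
import java.util.List;

/** Hypothetical helper (not part of VTS): renders the text of a milvus_to_milvus.conf. */
public final class MilvusToMilvusConf {

    private MilvusToMilvusConf() {}

    /** Wraps a value in double quotes, escaping backslashes and embedded quotes. */
    public static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds the env/source/sink blocks using the same keys as the example above. */
    public static String render(String sourceUrl, String sourceToken,
                                String sinkUrl, String sinkToken,
                                String database, String collection,
                                int sourceBatchSize, int sinkBatchSize) {
        return String.join("\n", List.of(
            "env {",
            "  parallelism = 1",
            "  job.mode = \"BATCH\"",
            "}",
            "source {",
            "  Milvus {",
            "    url = " + quote(sourceUrl),
            "    token = " + quote(sourceToken),
            "    database = " + quote(database),
            "    collection = " + quote(collection),
            "    batch_size = " + sourceBatchSize,
            "  }",
            "}",
            "sink {",
            "  Milvus {",
            "    url = " + quote(sinkUrl),
            "    token = " + quote(sinkToken),
            "    database = " + quote(database),
            "    batch_size = " + sinkBatchSize,
            "  }",
            "}")) + "\n";
    }

    public static void main(String[] args) {
        // Placeholder values only; replace them with your own cluster endpoints and tokens.
        System.out.print(render(
            "https://source.example.invalid:19530", "***",
            "https://sink.example.invalid:19530", "***",
            "default", "medium_articles", 100, 10));
    }
}
```

Redirect the output to a file in the examples directory to use it as the job configuration. Avoid committing real tokens; the placeholders here are deliberately non-functional.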
3. Run the example
The example file is located at ./seatunnel-example/seatunnel-examples/src/main/java/com/zilliz/seatunnel/examples/engine/SeatunnelEngineExample.java
Update the configuration file path in SeatunnelEngineExample.java, then run the example:
String configurePath = args.length > 0 ? args[0] : "/examples/****.conf";
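The line above falls back to the hard-coded path only when no program argument is given, so the path can also be supplied as the first argument instead of editing the source. The standalone sketch below isolates that fallback logic; `ConfigPathResolver` and `resolve` are hypothetical names for illustration, and the default path shown assumes you are running the milvus_to_milvus.conf file from step 2.

```java
/** Hypothetical illustration of the argument-or-default pattern used in the example class. */
public final class ConfigPathResolver {

    private ConfigPathResolver() {}

    /** Returns the first program argument if present, otherwise the given default path. */
    public static String resolve(String[] args, String defaultPath) {
        return args.length > 0 ? args[0] : defaultPath;
    }

    public static void main(String[] args) {
        // Assumed default; substitute whichever conf file you edited in step 2.
        String configurePath = resolve(args, "/examples/milvus_to_milvus.conf");
        System.out.println("Using config: " + configurePath);
    }
}
```

Run with no arguments, this selects the default; run with a path such as /examples/pg_to_milvus.conf as the first argument, it selects that file instead.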
4. Check the data in Milvus
Go to the Milvus console and check the data in the target collection.
Beyond the quick start, VTS offers more advanced features, such as:
- Transformers, including TablePathMapper, FieldMapper, and Embedding
- Cluster mode, ready for production use, with a RESTful API to manage jobs
- Docker deployment
For detailed information, please refer to Tutorial.md
VTS supports a variety of connectors for moving data between different systems.
Find detailed documentation for each connector:
If you require any assistance or have questions regarding VTS, please feel free to reach out to our support team: Email: [email protected]
SeaTunnel is a next-generation, high-performance, distributed data integration tool, capable of synchronizing vast amounts of data daily. It is trusted by numerous companies for its efficiency and stability, and is released under the Apache License 2.0.
SeaTunnel is a top-level project of the Apache Software Foundation (ASF). For more information, visit the Apache SeaTunnel website.