feat: Geospatial Data Type and GIS Function Support for milvus #37417

Open
wants to merge 2 commits into base: master

tasty-gumi
Contributor

issue:#27576
pr:#35990

Main Goals

  1. Create and describe collections with geospatial fields, enabling both client and server to recognize and process geo fields.
  2. Insert geospatial data as payload values in the insert binlog, and print the values for verification.
  3. Load segments containing geospatial data into memory.
  4. Ensure query outputs can display geospatial data.
  5. Support filtering on GIS functions for geospatial columns.
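
To make goal 5 concrete, here is a minimal sketch of what a spatial-relationship filter over a geospatial column means semantically. It is not the PR's implementation (which lives in the C++ core and relies on GDAL) and it is not Milvus expression syntax; `point_in_polygon`, `filter_within`, and the sample rows are hypothetical names chosen for illustration, and only point-in-polygon is covered.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Ray-casting test: count ring edges crossed by a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_within(rows: List[Tuple[int, Point]], ring: List[Point]) -> List[int]:
    """Return primary keys of rows whose point lies inside the literal polygon."""
    return [pk for pk, pt in rows if point_in_polygon(pt, ring)]


# Hypothetical column data: (primary key, point) pairs.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
rows = [(1, (5.0, 5.0)), (2, (15.0, 5.0)), (3, (1.0, 9.0))]
matching = filter_within(rows, square)
```

The filter compares every value in one column against a single literal geometry, which matches the initial scope described for the query operators.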

Solution

  1. Add Type: Modify the Milvus core by adding a Geospatial type in both the C++ and Go code layers, defining the Geospatial data structure and the corresponding interfaces.
  2. Dependency Libraries: Introduce necessary geospatial data processing libraries. In the C++ source code, use Conan package management to include the GDAL library. In the Go source code, add the go-geom library to the go.mod file.
  3. Protocol Interface: Revise the Milvus protocol to provide mechanisms for Geospatial message serialization and deserialization.
  4. Data Pipeline: The client and proxy exchange geospatial data in WKT format. The proxy converts all data to WKB for downstream processing, which covers column data interfaces, segment encapsulation, segment loading, payload writing, and cache block management.
  5. Query Operators: Implement simple display and support for filter queries. Initially, focus on filtering based on spatial relationships for a single column of geospatial literal values, providing parsing and execution for query expressions.
  6. Index Construction: Consider building an H3 index, utilizing the C interface provided by the H3 system.
  7. Client Modification: Enable the client to handle user input for geospatial data and facilitate end-to-end testing. See the corresponding changes in pymilvus.
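
The WKT-to-WKB step in the data pipeline can be sketched as follows. This is a hand-rolled illustration for 2D `POINT` values only, using the Python standard library; the PR itself uses go-geom on the Go side and GDAL in the C++ core, and the function names below (`point_wkt_to_wkb`, `point_wkb_to_wkt`) are assumptions made for this example.

```python
import re
import struct

# Matches e.g. "POINT (1 2)" or "point(1.5 -2)"; 2D points only.
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*$", re.IGNORECASE
)


def point_wkt_to_wkb(wkt: str) -> bytes:
    """Encode a 2D POINT given as WKT into little-endian WKB (21 bytes)."""
    m = _POINT_RE.match(wkt)
    if m is None:
        raise ValueError(f"not a 2D POINT: {wkt!r}")
    x, y = float(m.group(1)), float(m.group(2))
    # Layout: byte order flag (1 = little endian), uint32 geometry type
    # (1 = Point), then x and y as float64.
    return struct.pack("<BIdd", 1, 1, x, y)


def point_wkb_to_wkt(wkb: bytes) -> str:
    """Decode a WKB POINT back to WKT, e.g. for showing query results."""
    prefix = "<" if wkb[0] == 1 else ">"
    geom_type, x, y = struct.unpack(prefix + "Idd", wkb[1:21])
    if geom_type != 1:
        raise ValueError(f"unsupported geometry type: {geom_type}")
    return f"POINT ({x:g} {y:g})"


wkb = point_wkt_to_wkb("POINT (1 2)")
round_trip = point_wkb_to_wkt(wkb)
```

Keeping WKT at the client boundary and WKB everywhere downstream means users see a readable format while storage and the core work with a compact binary one.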

delete incomplete H3 Index development and useless generated files.
fix conanfiles in milvus conan repo so that local can fetch the packages to build libraries

@sre-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tasty-gumi
To complete the pull request process, please assign czs007 after the PR has been reviewed.
You can assign the PR to them by writing /assign @czs007 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot added size/XXL Denotes a PR that changes 1000+ lines. area/dependency Pull requests that update a dependency file area/test sig/testing test/integration integration test labels Nov 4, 2024
@mergify mergify bot added dco-passed DCO check passed. kind/feature Issues related to feature request from users labels Nov 4, 2024

mergify bot commented Nov 4, 2024

@tasty-gumi cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

mergify bot commented Nov 4, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

mergify bot commented Nov 4, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.

@czs007
Collaborator

czs007 commented Nov 4, 2024

rerun go-sdk

@czs007
Collaborator

czs007 commented Nov 4, 2024

rerun cpp-unit-test

@czs007
Collaborator

czs007 commented Nov 4, 2024

/run-cpu-e2e

return geometry;
}

inline GeneratedData

The parameter order of the DataGen function has changed: 'repeat_count' is now the 6th parameter. This breaks test_group_by, which passes a non-zero repeat_count but ends up with a repeat count of zero after this change. Please update the corresponding unit test to avoid Signal 8 (SIGFPE).
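
For readers unfamiliar with this failure mode, the sketch below shows how reordering positional parameters can silently turn a non-zero argument into zero. The signatures are hypothetical and much simpler than the real C++ `DataGen`; `data_gen_old` and `data_gen_new` are illustrative names only. Python raises `ZeroDivisionError` here, whereas in C++ integer division by zero is undefined behavior and on x86 Linux typically surfaces as SIGFPE (signal 8).

```python
def data_gen_old(n, repeat_count=1, ts_offset=0):
    # Each value is repeated repeat_count times, then shifted by ts_offset.
    return [i // repeat_count + ts_offset for i in range(n)]


def data_gen_new(n, ts_offset=0, repeat_count=1):
    # Same body, but ts_offset and repeat_count have swapped positions.
    return [i // repeat_count + ts_offset for i in range(n)]


# The call site was written against the old order: repeat_count=2, ts_offset=0.
old_values = data_gen_old(6, 2, 0)

# The same positional call now binds ts_offset=2 and repeat_count=0.
try:
    data_gen_new(6, 2, 0)
    failure = None
except ZeroDivisionError as exc:
    failure = type(exc).__name__

# Naming the arguments at the call site makes it robust to reordering.
fixed_values = data_gen_new(6, repeat_count=2, ts_offset=0)
```

C++ has no keyword arguments, so the equivalent remedy there is to update every positional call site whenever the parameter order changes.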

mergify bot commented Nov 5, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.

mergify bot commented Nov 5, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

mergify bot commented Nov 5, 2024

@tasty-gumi cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

@tasty-gumi
Contributor Author

rerun go-sdk

add geospatial interface in src common

change type define and add segcore support

add storage & chunkdata support

feature: go package storage & proxy & typeutil support geospatial type in internal and typeutil in pkg

Signed-off-by: tasty-gumi <[email protected]>

add geospatial interface in src common

change type define and add segcore support

change: use wkb only in core

Signed-off-by: tasty-gumi <[email protected]>

fix:the geospatial only use std::string as FieldDataImpl template parameters && add geospatial data generation && pass chunk, growing, sealed test

fix: merge conflicts after rebase, test nullable not pass due to upstream

feat:basic GIS Function expr and visitor impl and GIS proto support && add:storage test of geo data

Signed-off-by: tasty-gumi <[email protected]>

feat:add proxy validate (pass httpserver test) && plan parser of geospatialfunction

fix:sealedseg && go tidy

fix:go mod

feat:can produce wkt result for pymilvus client

feat: add parser and query operator for geos field && print geos binlog as wkt

fix:fielddataimpl interface
Signed-off-by: tasty-gumi <[email protected]>

fix: some format of code && segmentfault debug for rebase

Signed-off-by: tasty-gumi <[email protected]>

add: import util test for parquet and mix compaction test

Signed-off-by: tasty-gumi <[email protected]>

fix: delete useless file and fix error for rebase

Signed-off-by: tasty-gumi <[email protected]>

fix: git rebase for custom function feat

Signed-off-by: tasty-gumi <[email protected]>

fix:rename geospatial field && update proto && rewrite Geometry class with smart pointer

Signed-off-by: tasty-gumi <[email protected]>

add:last commit miss add files

Signed-off-by: tasty-gumi <[email protected]>

fix: geospatial name replace in test files && fix geometry and parser

fix:remove some file change for dev

Signed-off-by: tasty-gumi <[email protected]>

fix:remove size in if && add destroy in ~Geometry()

Signed-off-by: tasty-gumi <[email protected]>

add:conan file gdal rep

Signed-off-by: tasty-gumi <[email protected]>

remove:gdal fPIC

Signed-off-by: tasty-gumi <[email protected]>

fix: for rebase

Signed-off-by: tasty-gumi <[email protected]>

remove:log_warn

Signed-off-by: tasty-gumi <[email protected]>

remove:gdal shared

Signed-off-by: tasty-gumi <[email protected]>

remove:tbbproxy

Signed-off-by: tasty-gumi <[email protected]>

fix:add gdal option && update go mod

Signed-off-by: tasty-gumi <[email protected]>

dev:change some scripts

Signed-off-by: tasty-gumi <[email protected]>

remove: dev scripts

Signed-off-by: tasty-gumi <[email protected]>

add:conan files dependency of gdal

Signed-off-by: tasty-gumi <[email protected]>

fix:fmt cpp code

Signed-off-by: tasty-gumi <[email protected]>

add:delete geos-config in cmake_bulid/bin which may cause permission deny

Signed-off-by: tasty-gumi <[email protected]>

fix: add go client geometry interface && fix group by test

Signed-off-by: tasty-gumi <[email protected]>

fix: mod tidy for tests go client

Signed-off-by: tasty-gumi <[email protected]>

fix:memory leak in test and go fmt

Signed-off-by: tasty-gumi <[email protected]>

fix: datagen function remove pkoffset

Signed-off-by: tasty-gumi <[email protected]>

mergify bot commented Nov 6, 2024

@tasty-gumi E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

@tasty-gumi
Contributor Author

/run-cpu-e2e

mergify bot commented Nov 6, 2024

@tasty-gumi go-sdk check failed, comment rerun go-sdk can trigger the job again.

codecov bot commented Nov 6, 2024

Codecov Report

Attention: Patch coverage is 47.17742% with 131 lines in your changes missing coverage. Please review.

Project coverage is 67.17%. Comparing base (b3de4b0) to head (c050d46).
Report is 6 commits behind head on master.

Files with missing lines                                 Patch %   Lines
...core/src/exec/expression/GISFunctionFilterExpr.cpp      0.00%   33 Missing ⚠️
internal/core/src/common/Geometry.h                       38.29%   29 Missing ⚠️
internal/core/src/query/PlanProto.cpp                      6.25%   15 Missing ⚠️
internal/core/src/common/FieldData.cpp                     0.00%   14 Missing ⚠️
internal/core/src/exec/expression/Expr.cpp                 0.00%   7 Missing ⚠️
internal/core/src/expr/ITypeExpr.h                         0.00%   7 Missing ⚠️
internal/core/src/common/Array.h                           0.00%   6 Missing ⚠️
internal/core/src/common/Types.h                          16.66%   5 Missing ⚠️
...l/core/src/exec/expression/GISFunctionFilterExpr.h      0.00%   5 Missing ⚠️
internal/core/src/common/FieldDataInterface.h             87.87%   4 Missing ⚠️
... and 2 more
Additional details and impacted files

@@             Coverage Diff             @@
##           master   #37417       +/-   ##
===========================================
- Coverage   83.25%   67.17%   -16.09%     
===========================================
  Files        1015      293      -722     
  Lines      157480    25626   -131854     
===========================================
- Hits       131116    17214   -113902     
+ Misses      21172     8412    -12760     
+ Partials     5192        0     -5192     

Components   Coverage Δ
Client       ∅ <ø> (∅)
Core         67.17% <47.17%> (∅)
Go           ∅ <ø> (∅)

Files with missing lines                             Coverage Δ
internal/core/src/common/FieldData.h                 100.00% <100.00%> (ø)
internal/core/src/query/PlanProto.h                  18.51% <ø> (ø)
internal/core/src/segcore/ConcurrentVector.cpp       97.26% <100.00%> (ø)
internal/core/src/segcore/InsertRecord.h             83.66% <100.00%> (ø)
internal/core/src/segcore/SegmentGrowingImpl.cpp     77.60% <100.00%> (ø)
internal/core/src/segcore/SegmentSealedImpl.cpp      86.18% <100.00%> (ø)
internal/core/src/storage/Event.cpp                  83.39% <100.00%> (ø)
internal/core/src/storage/Util.cpp                   76.93% <100.00%> (ø)
internal/core/src/mmap/Utils.h                       81.60% <84.61%> (ø)
internal/core/src/common/FieldDataInterface.h        59.10% <87.87%> (ø)
... and 10 more

... and 1288 files with indirect coverage changes

mergify bot commented Nov 6, 2024

@tasty-gumi cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

@tasty-gumi
Contributor Author

rerun cpp-unit-test

Labels
area/dependency Pull requests that update a dependency file area/test dco-passed DCO check passed. kind/feature Issues related to feature request from users sig/testing size/XXL Denotes a PR that changes 1000+ lines. test/integration integration test
4 participants