From 4efe9dc4418a7a2cdbc30de1bee6553566d1faea Mon Sep 17 00:00:00 2001
From: zhangfengcdt
Date: Tue, 3 Sep 2024 11:13:37 -0700
Subject: [PATCH] [DOC] Update docs to explain the case of filtering after KNN

---
 docs/api/sql/NearestNeighbourSearching.md | 24 +++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/docs/api/sql/NearestNeighbourSearching.md b/docs/api/sql/NearestNeighbourSearching.md
index 224c63e442..bc65777cbd 100644
--- a/docs/api/sql/NearestNeighbourSearching.md
+++ b/docs/api/sql/NearestNeighbourSearching.md
@@ -19,6 +19,30 @@ In case there are ties in the distance, the result will include all the tied geo
 spark.sedona.join.knn.includeTieBreakers=true
 ```
 
+Filter Pushdown Considerations:
+
+When using ST_KNN with filters applied to the resulting DataFrame, some of these filters may be pushed down to the object side of the kNN join. This means the filters will be applied to the object side reader before the kNN join is executed. If you want the filters to be applied after the kNN join, ensure that you first materialize the kNN join results and then apply the filters.
+
+For example, you can use the following approach:
+
+Scala Example:
+
+```
+val knnResult = knnJoinDF.cache()
+val filteredResult = knnResult.filter(condition)
+```
+
+SQL Example:
+
+```
+CREATE OR REPLACE TEMP VIEW knnResult AS
+SELECT * FROM (
+  -- Your KNN join SQL here
+) AS knnView;
+CACHE TABLE knnResult;
+SELECT * FROM knnResult WHERE condition;
+```
+
 SQL Example
 
 Suppose we have two tables `QUERIES` and `OBJECTS` with the following data: