[DOC] Update docs to explain the case of filtering after KNN #1575

Merged 1 commit on Sep 3, 2024
24 changes: 24 additions & 0 deletions docs/api/sql/NearestNeighbourSearching.md
@@ -19,6 +19,30 @@ In case there are ties in the distance, the result will include all the tied geo
spark.sedona.join.knn.includeTieBreakers=true
```

Filter Pushdown Considerations:

When you filter the DataFrame produced by an ST_KNN join, some of those filters may be pushed down to the object side of the kNN join. A pushed-down filter is applied by the object-side reader before the kNN join runs, so the nearest neighbours are selected from the already-filtered objects, which can change the result. To apply the filters after the kNN join, first materialize the kNN join result and then apply the filters.

For example, you can use the following approach:

Scala Example:

```
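// Mark the kNN join result for caching so that later filters are planned against it
val knnResult = knnJoinDF.cache()
// This filter is applied to the cached kNN result, not pushed into the join
val filteredResult = knnResult.filter(condition)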
```

SQL Example:

```
CREATE OR REPLACE TEMP VIEW knnResult AS
SELECT * FROM (
-- Your KNN join SQL here
) AS knnView;
CACHE TABLE knnResult;
SELECT * FROM knnResult WHERE condition;
```
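To see why the order matters, here is a minimal plain-Scala sketch that uses no Spark or Sedona at all. The `Obj` class, the `knn` helper, and the sample data are hypothetical and only mimic the semantics described above. It compares filtering the objects before a k = 2 nearest-neighbour search with filtering the search result afterwards.

```scala
// Plain-Scala illustration only: no Spark or Sedona involved.
// All names and data are hypothetical.
object KnnFilterOrder {
  final case class Obj(id: Int, x: Double, y: Double, category: String)

  // Squared Euclidean distance from the query point (qx, qy) to an object.
  def dist2(qx: Double, qy: Double, o: Obj): Double = {
    val dx = o.x - qx
    val dy = o.y - qy
    dx * dx + dy * dy
  }

  // The k objects nearest to the query point.
  def knn(qx: Double, qy: Double, objects: Seq[Obj], k: Int): Seq[Obj] =
    objects.sortBy(o => dist2(qx, qy, o)).take(k)

  val objects: Seq[Obj] = Seq(
    Obj(1, 1.0, 0.0, "cafe"),
    Obj(2, 2.0, 0.0, "bank"),
    Obj(3, 3.0, 0.0, "cafe"),
    Obj(4, 4.0, 0.0, "cafe")
  )

  val isCafe: Obj => Boolean = _.category == "cafe"

  // Filter pushed down: kNN runs over the already-filtered objects.
  val pushedDown: Seq[Int] =
    knn(0.0, 0.0, objects.filter(isCafe), 2).map(_.id)

  // Filter after kNN: the 2 nearest objects are found first, then filtered.
  val filteredAfter: Seq[Int] =
    knn(0.0, 0.0, objects, 2).filter(isCafe).map(_.id)
}
```

With this sample data, `pushedDown` contains ids 1 and 3, while `filteredAfter` contains only id 1, because the second-nearest object (id 2) is not a cafe. Materializing the kNN join result before filtering corresponds to the second behaviour.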

SQL Example

Suppose we have two tables `QUERIES` and `OBJECTS` with the following data: