This repository has been archived by the owner on Jun 14, 2024. It is now read-only.

input_file_name() results change after Hyperspace is enabled #480

Open
clee704 opened this issue Jul 21, 2021 · 1 comment
Labels
untriaged (the default tag for a newly created issue)

Comments


clee704 commented Jul 21, 2021

Describe the issue

With Hyperspace enabled, a query that calls input_file_name() returns different values than the same query returns with Hyperspace disabled.

To Reproduce

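import org.apache.spark.sql.functions.input_file_name
import com.microsoft.hyperspace._
import com.microsoft.hyperspace.index._

// Write a small Parquet dataset and read it back.
spark.range(1000).toDF("A").write.parquet("X")
val df = spark.read.parquet("X")

// Create a covering index on column A, then enable Hyperspace.
val hs = Hyperspace()
hs.createIndex(df, IndexConfig("myind", Seq("A"), Nil))
spark.enableHyperspace

// B should hold the source file paths, but its values change once the index is applied.
df.filter("A = 1").withColumn("B", input_file_name()).show(false)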

Expected behavior

Column B contains the paths of the source data files under X, the same values it contains when Hyperspace is disabled.
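
One way to make the difference visible is to run the same query with Hyperspace disabled and then enabled, and compare the file names. This is only a sketch for spark-shell: it assumes `spark`, `df`, and the index "myind" from the reproduction steps above, and `fileNames` is a hypothetical helper, not part of Hyperspace.

```scala
import org.apache.spark.sql.functions.input_file_name
import com.microsoft.hyperspace._

// Hypothetical helper: builds a fresh query each time so that it is planned
// with whatever optimizer rules are active at the moment of the call.
def fileNames(): Set[String] =
  df.filter("A = 1")
    .withColumn("B", input_file_name())
    .select("B")
    .collect()
    .map(_.getString(0))
    .toSet

spark.disableHyperspace
val withoutIndex = fileNames()

spark.enableHyperspace
val withIndex = fileNames()

// The two sets would be equal if the index rewrite preserved input_file_name().
println(s"without Hyperspace: $withoutIndex")
println(s"with Hyperspace:    $withIndex")
println(s"same results: ${withoutIndex == withIndex}")
```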

clee704 added the untriaged label on Jul 21, 2021

clee704 commented Jul 21, 2021

Possible fix:

  1. If index lineage is disabled: Don't apply CoveringIndex if input_file_name() is used in the query.
  2. If index lineage is enabled: Replace input_file_name() with source file paths using the file IDs.
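
For the first option, the rule would need a way to tell whether a query uses input_file_name(). A minimal sketch of such a check, using Spark's Catalyst `InputFileName` expression, could look like the following. `usesInputFileName` is a hypothetical helper, not an existing Hyperspace function, and a real fix would more likely run the check on the logical plan inside the index rule rather than on a DataFrame.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.catalyst.expressions.InputFileName
import org.apache.spark.sql.functions.input_file_name

// Hypothetical helper: true if any operator in the analyzed plan holds an
// input_file_name() expression, including ones nested inside other expressions.
def usesInputFileName(query: DataFrame): Boolean =
  query.queryExecution.analyzed
    .collect { case node => node.expressions }
    .flatten
    .exists(_.find(_.isInstanceOf[InputFileName]).isDefined)

// Usage, assuming `df` from the reproduction steps:
val withFn = usesInputFileName(df.filter("A = 1").withColumn("B", input_file_name()))
val withoutFn = usesInputFileName(df.filter("A = 1"))
println(s"query with input_file_name(): $withFn, plain filter: $withoutFn")
```

A rule that skips the covering index whenever this check is true gives up the index for those queries but keeps their results unchanged.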

sezruby changed the title from "Results change after Hyperspace is enabled" to "input_file_name() results change after Hyperspace is enabled" on Jul 21, 2021