[SPARK-34897][SQL] Support reconcile schemas based on index after nested column pruning #31993

@@ -35,6 +35,7 @@ import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
 import org.apache.spark.sql.catalyst.util.quoteIdentifier
 import org.apache.spark.sql.errors.QueryExecutionErrors
 import org.apache.spark.sql.execution.datasources.SchemaMergeUtils
+import org.apache.spark.sql.internal.SQLConf.NESTED_SCHEMA_PRUNING_ENABLED
 import org.apache.spark.sql.types._
 import org.apache.spark.util.{ThreadUtils, Utils}

@@ -157,7 +158,8 @@ object OrcUtils extends Logging {
     // In these cases we map the physical schema to the data schema by index.
     assert(orcFieldNames.length <= dataSchema.length, "The given data schema " +
       s"${dataSchema.catalogString} has less fields than the actual ORC physical schema, " +
-      "no idea which columns were dropped, fail to read.")
+      "no idea which columns were dropped, fail to read. Try to disable " +
+      s"${NESTED_SCHEMA_PRUNING_ENABLED.key} to workaround this issue.")
     // for ORC file written by Hive, no field names
     // in the physical schema, there is a need to send the
     // entire dataSchema instead of required schema.
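
For context, the improved assertion message above points users at nested schema pruning as a workaround when the ORC physical schema has fewer top-level fields than the data schema (typically Hive-written ORC files with positional `_col0`-style names). The following is a minimal sketch of what that workaround looks like on the user side, assuming a local SparkSession and a placeholder path; the config key is the one behind `NESTED_SCHEMA_PRUNING_ENABLED`, i.e. `spark.sql.optimizer.nestedSchemaPruning.enabled`.

```scala
import org.apache.spark.sql.SparkSession

// Assumed session setup; "local[*]" and the app name are illustrative only.
val spark = SparkSession.builder()
  .appName("orc-nested-pruning-workaround")
  .master("local[*]")
  .getOrCreate()

// Disable nested schema pruning so the ORC reader falls back to sending the
// entire data schema and reconciling columns by index, as the error message suggests.
spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", "false")

// "/tmp/hive_written_orc" is a placeholder path to an ORC table written by Hive,
// whose physical schema carries positional field names (_col0, _col1, ...).
val df = spark.read.orc("/tmp/hive_written_orc")
df.printSchema()
```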