[BUG] Fix partitioning SQL scans on empty tables #2885
Merged
When scanning an empty SQL table with `num_partitions > 1`, we get either `TypeError: unsupported operand type(s) for -: 'NoneType' and 'NoneType'` or `Failed to get partition bounds: {self._partition_col} is not a numeric or temporal type, and cannot be used for partitioning`.

Both errors come from trying to partition the scan on a column that has no data. Even though the table has no rows, we still try to honor the user's request for more than one partition, and then fail because there are no min and max values from which to compute partition range sizes. Furthermore, some SQL databases do not return type information when a table has no rows, so an empty integer column may be read as an empty string column, which cannot be used for partitioning.
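To make the first failure concrete, here is a minimal sketch of range-based partitioning. It assumes, for illustration only, that bounds are derived by splitting `[min, max]` into equal-width ranges; `compute_partition_bounds` is a hypothetical helper, not Daft's actual code. On an empty table a SQL `MIN`/`MAX` query returns `NULL`, which reaches Python as `None`, and the subtraction then raises exactly the `TypeError` quoted above.

```python
from typing import List, Optional


def compute_partition_bounds(
    min_val: Optional[int], max_val: Optional[int], num_partitions: int
) -> List[int]:
    """Hypothetical helper: split [min_val, max_val] into equal-width ranges.

    Returns the num_partitions + 1 boundary points between partitions.
    """
    # On an empty table, MIN/MAX come back as None, so this line raises:
    # TypeError: unsupported operand type(s) for -: 'NoneType' and 'NoneType'
    range_size = (max_val - min_val) / num_partitions
    return [int(min_val + i * range_size) for i in range(num_partitions)] + [max_val]


print(compute_partition_bounds(0, 100, 4))  # [0, 25, 50, 75, 100]

try:
    compute_partition_bounds(None, None, 4)  # what an empty table yields
except TypeError as e:
    print(e)  # unsupported operand type(s) for -: 'NoneType' and 'NoneType'
```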
We fix this by capping the number of scan tasks at the number of rows in the table being scanned. If there are no rows, we skip partitioning entirely and generate 0 scan tasks.
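The fix can be sketched as a planner that caps the task count before touching the min/max values. This is a hedged illustration, not the merged implementation: `plan_scan_tasks` and its `(lower, upper)` return shape are hypothetical, and equal-width ranges are assumed as in the sketch of the failure. The key point is that an empty table returns early with zero tasks, so the `None` bounds are never used in arithmetic and the partition column's type never needs to be inspected.

```python
from typing import List, Optional, Tuple


def plan_scan_tasks(
    num_rows: int,
    num_partitions: int,
    min_val: Optional[float] = None,
    max_val: Optional[float] = None,
) -> List[Tuple[float, float]]:
    """Hypothetical planner: one (lower, upper) bound pair per scan task."""
    # Cap the number of scan tasks at the number of rows in the table.
    num_tasks = min(num_partitions, num_rows)
    if num_tasks == 0:
        # Empty table: skip partitioning and generate no scan tasks.
        return []
    if min_val is None or max_val is None:
        raise ValueError("non-empty table must have min/max partition values")
    # Only reached when rows exist, so min/max are real values.
    step = (max_val - min_val) / num_tasks
    bounds = [min_val + i * step for i in range(num_tasks)] + [max_val]
    return list(zip(bounds[:-1], bounds[1:]))


print(plan_scan_tasks(0, 4))             # [] for an empty table
print(len(plan_scan_tasks(2, 8, 0, 10))) # 2, capped by the row count
```

Capping by row count also avoids creating partitions that are guaranteed to be empty when a user asks for more partitions than there are rows.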