NullPointerException when using VectorizedArrowReader to read a null column #10275

slessard · 2024-05-06T16:12:16Z

Apache Iceberg version

1.5.1 (latest release)

Query engine

Other

Please describe the bug 🐞

I am writing a compatibility layer for Teradata so that it can access Iceberg tables stored in AWS S3. I am seeing what at first glance appears to be a bug in Iceberg, but I'd like to get the opinion of the experts here. To be clear, I am using Apache Iceberg 1.5.1 and Apache Arrow 15.0.0.

The problem is that a NullPointerException is thrown from GenericArrowVectorAccessorFactory.java line 224. The NPE is thrown because vector is null on that line:

    throw new UnsupportedOperationException("Unsupported vector: " + vector.getClass());

How do I get to this point? Here's the minimal test case:

Prerequisite:

    create table otf920ath (
        a INT NOT NULL,
        b string(10),
        c decimal(12, 3)
    )
    LOCATION 's3://*******************'
    TBLPROPERTIES ('table_type' = 'ICEBERG');

    INSERT INTO otf920ath values (1, 'san diego', 1024.025);

    ALTER TABLE otf920ath
      ADD COLUMNS (a1 int);

repro:

    select * from otf920ath;

The above SQL select statement works in AWS Athena, but fails in my code. My code uses an instance of org.apache.iceberg.arrow.vectorized.ArrowReader$VectorizedCombinedScanIterator.

The cause, as I see it, is that the one row in the table contains only three columns' worth of data, but the current table schema defines four columns. Because of this difference in schemas, Iceberg creates the following four readers, one for each column respectively:
VectorizedArrowReader corresponding to column a
VectorizedArrowReader corresponding to column b
VectorizedArrowReader corresponding to column c
VectorizedArrowReader$NullVectorReader corresponding to column a1

Naturally, the VectorizedArrowReader$NullVectorReader instance contains a null value for the vector. This instance is assigned at VectorizedReaderBuilder.java line 100.
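The reader assignment described above can be modeled in plain Java. This is a simplified sketch, not Iceberg's actual VectorizedReaderBuilder logic: the class ReaderPlanSketch, the ReaderKind enum and the planReaders method are all hypothetical, and columns are matched by name here, whereas Iceberg matches columns by field ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model (not Iceberg's API) of assigning one reader per column
// of the current table schema when reading an older data file.
final class ReaderPlanSketch {

    enum ReaderKind { DATA_READER, NULL_VECTOR_READER }

    record ColumnReader(String column, ReaderKind kind) {}

    // For each table column: use a data reader if the file contains the column,
    // otherwise a reader that yields only nulls (in the reported bug, that
    // reader carries a null Arrow vector).
    static List<ColumnReader> planReaders(List<String> tableColumns, Set<String> fileColumns) {
        List<ColumnReader> readers = new ArrayList<>();
        for (String column : tableColumns) {
            ReaderKind kind = fileColumns.contains(column)
                    ? ReaderKind.DATA_READER
                    : ReaderKind.NULL_VECTOR_READER;
            readers.add(new ColumnReader(column, kind));
        }
        return readers;
    }

    public static void main(String[] args) {
        // Table schema after ALTER TABLE ... ADD COLUMNS (a1 int)
        List<String> tableColumns = List.of("a", "b", "c", "a1");
        // The data file was written before a1 existed
        Set<String> fileColumns = Set.of("a", "b", "c");
        planReaders(tableColumns, fileColumns).forEach(System.out::println);
    }
}
```

With the repro's schema, only a1 gets the null-producing reader, which matches the four readers listed above.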

Continuing down the code path, Iceberg calls GenericArrowVectorAccessorFactory.getPlainVectorAccessor. This method checks whether vector is an instance of various *Vector types. Because vector is null, it is not an instance of any type, so the method ends up in its final fallback case and tries to throw an exception:

    throw new UnsupportedOperationException("Unsupported vector: " + vector.getClass());

The problem is that vector is null, and thus calling vector.getClass() throws a NullPointerException.
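The fallthrough can be reproduced without Arrow. In this sketch, Integer and String merely stand in for Arrow vector types, and describeVector is a hypothetical method that only mirrors the shape of the instanceof chain in getPlainVectorAccessor. In Java, null instanceof T is always false, so a null argument reaches the final throw, where vector.getClass() is evaluated on null.

```java
// Hypothetical stand-in for an instanceof dispatch shaped like getPlainVectorAccessor.
final class InstanceofFallthroughSketch {

    static String describeVector(Object vector) {
        if (vector instanceof Integer) {
            return "integer accessor";
        } else if (vector instanceof String) {
            return "string accessor";
        }
        // A null vector fails every instanceof test above and lands here.
        // vector.getClass() on null throws NullPointerException before the
        // UnsupportedOperationException can even be constructed.
        throw new UnsupportedOperationException("Unsupported vector: " + vector.getClass());
    }

    public static void main(String[] args) {
        System.out.println(describeVector(42));
        try {
            describeVector(null);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as in the reported stack trace");
        }
    }
}
```

A non-null but unsupported value (for example a Double) still produces the intended UnsupportedOperationException; only null escapes as an NPE.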

The stack trace is:

java.lang.NullPointerException
	at org.apache.iceberg.arrow.vectorized.GenericArrowVectorAccessorFactory.getPlainVectorAccessor(GenericArrowVectorAccessorFactory.java:224)
	at org.apache.iceberg.arrow.vectorized.GenericArrowVectorAccessorFactory.getVectorAccessor(GenericArrowVectorAccessorFactory.java:110)
	at org.apache.iceberg.arrow.vectorized.ArrowVectorAccessors.getVectorAccessor(ArrowVectorAccessors.java:54)
	at org.apache.iceberg.arrow.vectorized.ColumnVector.getVectorAccessor(ColumnVector.java:136)
	at org.apache.iceberg.arrow.vectorized.ColumnVector.<init>(ColumnVector.java:56)
	at org.apache.iceberg.arrow.vectorized.ArrowBatchReader.read(ArrowBatchReader.java:54)
	at org.apache.iceberg.arrow.vectorized.ArrowBatchReader.read(ArrowBatchReader.java:29)
	at org.apache.iceberg.parquet.VectorizedParquetReader$FileIterator.next(VectorizedParquetReader.java:149)
	at org.apache.iceberg.arrow.vectorized.ArrowReader$VectorizedCombinedScanIterator.next(ArrowReader.java:314)
	at org.apache.iceberg.arrow.vectorized.ArrowReader$VectorizedCombinedScanIterator.next(ArrowReader.java:190)

So my questions:

Is it possible that this is a bug in Iceberg?
If so, is the fix simply to handle the null value for vector when building the message for the UnsupportedOperationException?
If not, is there some other code path or method arguments I should be using?
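For the second question, a minimal sketch of a null-safe message is shown below. The class NullSafeMessageSketch and its methods are hypothetical, not Iceberg code, and it uses getClass().getName(), so the text differs slightly from the original message, which concatenates the Class object directly.

```java
// Hypothetical helper: build the "unsupported vector" message without
// dereferencing a null vector.
final class NullSafeMessageSketch {

    static String unsupportedVectorMessage(Object vector) {
        String type = (vector == null) ? "null" : vector.getClass().getName();
        return "Unsupported vector: " + type;
    }

    static UnsupportedOperationException unsupportedVector(Object vector) {
        return new UnsupportedOperationException(unsupportedVectorMessage(vector));
    }

    public static void main(String[] args) {
        System.out.println(unsupportedVectorMessage(null));
        System.out.println(unsupportedVectorMessage("text"));
    }
}
```

Note that this only turns the NullPointerException into the intended UnsupportedOperationException; the a1 column is still unreadable. Returning all-null values for it would require the accessor layer to handle a missing vector, which appears to be the aim of the later pull request #10953 ("Arrow: add support for null vectors").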

P.S. I asked this question in the Slack channel but didn't get any traction: https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1714676216273989


github-actions · 2024-11-06T00:14:52Z

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

slessard added the bug Something isn't working label May 6, 2024

slessard pushed a commit to slessard/iceberg that referenced this issue May 8, 2024

apache#10275 - fix NullPointerException

ac6440a

Fix NullPointerException when trying to add the vector's class name to the message for an UnsupportedOperationException

slessard mentioned this issue May 8, 2024

#10275 - fix NullPointerException #10284

Closed

slessard pushed a commit to slessard/iceberg that referenced this issue May 29, 2024

Issue apache#10275 - Fix NullPointerException

e3d747f

slessard pushed a commit to slessard/iceberg that referenced this issue May 29, 2024

Issue apache#10275 - Fix NullPointerException

9c628f6

slessard pushed a commit to slessard/iceberg that referenced this issue Jun 11, 2024

Add new unit test

12bc3de

This test more closely follows the reproduction steps described in issue apache#10275

slessard linked a pull request Oct 2, 2024 that will close this issue

Arrow: add support for null vectors #10953

Open

github-actions bot added the stale label Nov 6, 2024
