
WIP: Spark action to compute the partition stats #73

Closed
wants to merge 1 commit into from


ajantha-bhat

No description provided.

Broadcast<Table> tableBroadcast = sparkContext.broadcast(serializableTable);
int numShufflePartitions = spark.sessionState().conf().numShufflePartitions();

return manifestBeanDS(table, null, numShufflePartitions)

Hi @aokolnychyi,

I am working on this draft PR to compute partition stats via a Spark action.

I have a problem with Spark's bean encoding.
In BaseSparkAction, I added partitionEntryDS, similar to contentFileDS.
ReadManifestForPartitionStats.call() correctly creates a new PartitionEntryBean object with its fields filled in.
But when I collect partitionEntryDS, I get empty PartitionEntryBean objects.
So I am guessing that codegen is using the default constructor (which produces empty objects) instead of the actual constructor.
When I remove the default constructor from the bean class, I get the error below.

File 'generated.java', Line 24, Column 8: No applicable constructor/method found for zero actual parameters; candidates are: "org.apache.iceberg.spark.actions.PartitionEntryBean(org.apache.iceberg.FileContent, int, org.apache.iceberg.StructLike, long, long, long, long)"

Do you have any suggestions for me? I am not familiar with Spark in depth, and I have not found docs or blog posts explaining this.


I tried reducing PartitionEntryBean to just two String members, but I still get null after collecting partitionEntryDS. It does create a new object with the proper contents, but when I collect, it is null 🤔


Never mind, I figured it out. The getter and setter names have to follow the standard naming conventions for Spark's codegen to pick them up. I used Iceberg conventions, so it was unable to detect them.
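To illustrate the naming issue, here is a minimal sketch that uses only the JDK's `java.beans.Introspector`, on the assumption that Spark's bean encoder discovers properties through standard JavaBean introspection. Both bean classes and their fields (`specId`, `recordCount`) are hypothetical stand-ins, not the actual PartitionEntryBean from this PR.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

public class BeanNamingDemo {

  // Hypothetical bean with Iceberg-style accessors (no get/set prefix).
  public static class IcebergStyleBean {
    private int specId;
    private long recordCount;

    public IcebergStyleBean() {}

    public int specId() { return specId; }
    public void specId(int newSpecId) { this.specId = newSpecId; }
    public long recordCount() { return recordCount; }
    public void recordCount(long newCount) { this.recordCount = newCount; }
  }

  // The same hypothetical bean with standard JavaBean getters and setters.
  public static class StandardBean {
    private int specId;
    private long recordCount;

    public StandardBean() {}

    public int getSpecId() { return specId; }
    public void setSpecId(int specId) { this.specId = specId; }
    public long getRecordCount() { return recordCount; }
    public void setRecordCount(long recordCount) { this.recordCount = recordCount; }
  }

  // Returns the names of properties that have both a getter and a setter
  // according to JavaBean introspection. Object.class is the stop class,
  // so the implicit "class" property is excluded.
  public static List<String> readWriteProperties(Class<?> beanClass) {
    try {
      BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
      List<String> names = new ArrayList<>();
      for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
        if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
          names.add(pd.getName());
        }
      }
      return names;
    } catch (IntrospectionException e) {
      throw new IllegalStateException("Introspection failed for " + beanClass, e);
    }
  }

  public static void main(String[] args) {
    // No properties are found for the Iceberg-style accessors.
    System.out.println("Iceberg style: " + readWriteProperties(IcebergStyleBean.class));
    // Both properties are found for the standard get/set names.
    System.out.println("Standard style: " + readWriteProperties(StandardBean.class));
  }
}
```

If the encoder sees no properties, it has nothing to copy into the object it creates with the no-arg constructor, which would be consistent with the empty objects seen after collect. Renaming the accessors to the `getX`/`setX` form is the fix described in the comment above.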
