Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Procedure to compute table stats #10986

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

karuppayya
Copy link
Contributor

No description provided.

@github-actions github-actions bot added the spark label Aug 22, 2024
@karuppayya
Copy link
Contributor Author

@aokolnychyi @szehon-ho Can you help review this please

sql("CREATE TABLE %s (id bigint NOT NULL, data string) USING iceberg", tableName);
List<Object[]> result =
sql("CALL %s.system.compute_table_stats('%s')", catalogName, tableIdent);
Assertions.assertTrue(result.isEmpty());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Assertions.assertTrue(result.isEmpty());
assertThat(result).isEmpty();

please use the AssertJ assertions and we generally try to avoid ussing assertTrue/assertFalse as it doesn't provide enough context about the actual/expected when an assertion fails

sql(
"CALL %s.system.compute_table_stats(table => '%s', columns => array('id'))",
catalogName, tableIdent);
Assertions.assertNotNull(output.get(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Assertions.assertNotNull(output.get(0));
assertThat(output.get(0)).isNotNull();

should this maybe also assert some more details here?

tableName);
sql("INSERT INTO TABLE %s VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')", tableName);
IllegalArgumentException illegalArgumentException =
Assertions.assertThrows(
Copy link
Contributor

@nastra nastra Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use AssertJ's assertThatThrownBy(..).isInstanceOf(..).hasMessage(..) here and in the other places in this test. There are a bunch of examples in the codebase for this that you can use as a reference

IllegalArgumentException.class,
() ->
sql(
"CALL %s.system.compute_table_stats(table => '%s', snapshot_id => %dL)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe also add a test with an invalid/non-existing table name

sql(
"CALL %s.system.compute_table_stats('%s', %dL)",
catalogName, tableIdent, snapshot.snapshotId());
Assertions.assertNotNull(output.get(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@aokolnychyi
Copy link
Contributor

I will take a look at the partition stats PR first by @ajantha-bhat. I want to understand if we want a single analyze procedure or different procedures for table and partition stats.

sql(
"CALL %s.system.compute_table_stats('%s', %dL)",
catalogName, tableIdent, snapshot.snapshotId());
assertThat(output.get(0)).isNotNull();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to check whether the output has some valid results here and in the other test

Copy link
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I left a few comments around testing that would be good to address

@karuppayya
Copy link
Contributor Author

@aokolnychyi Can this merged now?

}

@Override
@SuppressWarnings("checkstyle:CyclomaticComplexity")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious why is this needed?

@SuppressWarnings("checkstyle:CyclomaticComplexity")
public InternalRow[] call(InternalRow args) {
ProcedureInput input = new ProcedureInput(spark(), tableCatalog(), PARAMETERS, args);
Identifier tableIdent = toIdentifier(args.getString(0), PARAMETERS[0].name());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cant we use input.ident()?

ProcedureInput input = new ProcedureInput(spark(), tableCatalog(), PARAMETERS, args);
Identifier tableIdent = toIdentifier(args.getString(0), PARAMETERS[0].name());

Long snapshotId = input.isProvided(SNAPSHOT_ID_PARAM) ? input.asLong(SNAPSHOT_ID_PARAM) : null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should use input API that has default value?

sql(
"CREATE TABLE %s (id bigint NOT NULL, data string) USING iceberg PARTITIONED BY (data)",
tableName);
sql("INSERT INTO TABLE %s VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')", tableName);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: could we remove the insert to make test shorter? (on next test as well)

krajendran4 added 4 commits November 11, 2024 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants