Output validation using matching in SQL #217

Merged on Feb 14, 2023 (10 commits)

@szarnyasg szarnyasg commented Aug 24, 2022

Will fix #205.

We can use the DuckDB appender to populate the tables.

Current validation scripts are in:

A lot of time is spent parsing the results back from CSVs into Java data structures. This could also be improved by loading the files directly with DuckDB's COPY ... FROM 'filename.csv' (DELIMITER ' ', FORMAT csv) clause.
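A minimal sketch of how the COPY-based load could be wired up over plain JDBC. The class name `CsvLoader`, its method names, and the `actual` table name used in `main` are illustrative assumptions, not code from this PR. The table name is concatenated as-is, so it is assumed to be a trusted identifier, and the target table is assumed to exist already.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Sketch: bulk-load a space-delimited result file with COPY instead of parsing it in Java. */
final class CsvLoader {

    /** Builds the COPY statement; single quotes in the path are doubled for the SQL literal. */
    static String copyStatement(String table, String csvPath) {
        String escapedPath = csvPath.replace("'", "''");
        return "COPY " + table + " FROM '" + escapedPath + "' (DELIMITER ' ', FORMAT csv)";
    }

    /** Runs the COPY on an open connection (hypothetical helper; the table must already exist). */
    static void load(Connection conn, String table, String csvPath) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(copyStatement(table, csvPath));
        }
    }

    public static void main(String[] args) {
        // Only prints the generated statement; executing it needs a DuckDB connection.
        System.out.println(copyStatement("actual", "filename.csv"));
    }
}
```

With this approach the per-row parsing happens inside DuckDB, so the Java side only has to issue one statement per file.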

Validation tests (that are used to test the validation rules themselves) are in:

Populating tables using the DuckDB appender and comparing WCC results

A snippet for using appenders (not sure whether it is useful):

import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.duckdb.DuckDBAppender;
import org.duckdb.DuckDBConnection;

try (DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
     Statement stmt = conn.createStatement()) {

    // fill 'expected' table
    stmt.execute("DROP TABLE IF EXISTS expected");
    stmt.execute("CREATE TABLE expected(v BIGINT NOT NULL, x DOUBLE NOT NULL)");
    DuckDBAppender expectedAppender = conn.createAppender("main", "expected");
    for (long vertexId : outputGraph.getVertices()) {
        expectedAppender.beginRow();
        expectedAppender.append(vertexId);
        expectedAppender.append(outputGraph.getVertexValue(vertexId));
        expectedAppender.endRow();
    }
    expectedAppender.close();

    // fill 'actual' table
    stmt.execute("DROP TABLE IF EXISTS actual");
    stmt.execute("CREATE TABLE actual(v BIGINT NOT NULL, x DOUBLE NOT NULL)");
    DuckDBAppender actualAppender = conn.createAppender("main", "actual");
    for (long vertexId : outputGraph.getVertices()) {
        actualAppender.beginRow();
        actualAppender.append(vertexId);
        // Placeholder in the original snippet: 'resultValues' is a hypothetical
        // Map<Long, Double> holding the second set of per-vertex values to compare.
        actualAppender.append(resultValues.get(vertexId));
        actualAppender.endRow();
    }
    actualAppender.close();

    // Report vertices whose equivalence class differs between the two tables.
    // Note: this only detects classes that are split in 'actual', not merged ones.
    ResultSet rs = stmt.executeQuery(
            "SELECT e1.v AS v, e1.x AS expected_x, a1.x AS actual_x\n" +
            "FROM expected e1, actual a1\n" +
            "WHERE e1.v = a1.v -- select a node in the expected-actual tables\n" +
            "  AND EXISTS (\n" +
            "    SELECT 1\n" +
            "    FROM expected e2, actual a2\n" +
            "    WHERE e2.v = a2.v   -- another node in expected-actual tables\n" +
            "      AND e1.x = e2.x   -- where the node is in the same equivalence class in the expected table\n" +
            "      AND a1.x != a2.x  -- but not in the actual table\n" +
            "  )\n" +
            ";");
    while (rs.next()) {
        // Java format strings use %d (not %ld); the x columns are DOUBLE, so read them as doubles.
        System.out.format("%d: %s != %s%n", rs.getLong(1), rs.getDouble(2), rs.getDouble(3));
    }
    rs.close();
}
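The query in the snippet flags vertices that share an expected label with another vertex but carry a different actual label. As written, it does not flag the opposite case, where two vertices with different expected labels end up with the same actual label. The following plain-Java sketch checks both directions by requiring a one-to-one mapping between expected and actual labels. The class `EquivalenceCheck` and the choice of `Long` labels are assumptions made for illustration; this is not the implementation from the PR.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: compare two labelings up to renaming of the labels (e.g. WCC component IDs). */
final class EquivalenceCheck {

    /**
     * Returns true if both labelings induce the same partition of the vertices,
     * i.e. the expected-to-actual label mapping is one-to-one in both directions.
     * A vertex missing from 'actual' counts as a mismatch.
     */
    static boolean samePartition(Map<Long, Long> expected, Map<Long, Long> actual) {
        Map<Long, Long> expectedToActual = new HashMap<>();
        Map<Long, Long> actualToExpected = new HashMap<>();
        for (Map.Entry<Long, Long> entry : expected.entrySet()) {
            Long expectedLabel = entry.getValue();
            Long actualLabel = actual.get(entry.getKey());
            if (actualLabel == null) {
                return false; // vertex not present in the actual results
            }
            // putIfAbsent returns the earlier mapping, or null if this label is new
            Long previousActual = expectedToActual.putIfAbsent(expectedLabel, actualLabel);
            Long previousExpected = actualToExpected.putIfAbsent(actualLabel, expectedLabel);
            if (previousActual != null && !previousActual.equals(actualLabel)) {
                return false; // an expected class was split
            }
            if (previousExpected != null && !previousExpected.equals(expectedLabel)) {
                return false; // two expected classes were merged
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Long, Long> expected = Map.of(1L, 10L, 2L, 10L, 3L, 20L);
        Map<Long, Long> renamed = Map.of(1L, 7L, 2L, 7L, 3L, 9L);
        System.out.println(samePartition(expected, renamed));
    }
}
```

The same two-sided condition could be expressed in SQL by adding a second EXISTS clause with the comparison operators swapped.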

Handling infinity values

Infinity needs special care because several spellings of it should be accepted:

if (low.equals("inf") || low.equals("+inf") || low.equals("infinity") || low.equals("+infinity")) {
    return Double.POSITIVE_INFINITY;
} else if (low.equals("-inf") || low.equals("-infinity")) {
    return Double.NEGATIVE_INFINITY;
}
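For context, the fragment above could sit inside a parsing helper along these lines. This is a sketch: the class name `VertexValueParser`, the method name `parseValue`, and the assumption that `low` is the trimmed, lowercased input are illustrative and not taken from the PR.

```java
import java.util.Locale;

/** Sketch: parse a vertex value, accepting several spellings of infinity. */
final class VertexValueParser {

    static double parseValue(String raw) {
        String trimmed = raw.trim();
        // Lowercase with a fixed locale so the comparison does not depend on the system locale.
        String low = trimmed.toLowerCase(Locale.ROOT);
        if (low.equals("inf") || low.equals("+inf") || low.equals("infinity") || low.equals("+infinity")) {
            return Double.POSITIVE_INFINITY;
        } else if (low.equals("-inf") || low.equals("-infinity")) {
            return Double.NEGATIVE_INFINITY;
        }
        // Everything else goes through the standard parser (throws NumberFormatException on bad input).
        return Double.parseDouble(trimmed);
    }

    public static void main(String[] args) {
        System.out.println(parseValue("+INF"));
        System.out.println(parseValue("-infinity"));
        System.out.println(parseValue("1.5"));
    }
}
```

The fallback parses the original (trimmed) string rather than the lowercased one, because `Double.parseDouble` is case-sensitive for its own special values such as `NaN`.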

Validation of completeness

The validation should check not only whether the results are correct but also whether all vertices are included in the result set.
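A completeness check can be sketched as a set difference over vertex IDs. The class `CompletenessCheck` and its members are hypothetical names, and the SQL constant assumes tables named `expected` and `actual` with a vertex ID column `v`, as in the appender snippet earlier in this description.

```java
import java.util.Set;
import java.util.TreeSet;

/** Sketch: find vertices that appear in one result set but not in the other. */
final class CompletenessCheck {

    /** Anti-join over the assumed 'expected' and 'actual' tables: vertices with no actual row. */
    static final String MISSING_FROM_ACTUAL_SQL =
            "SELECT e.v FROM expected e "
            + "WHERE NOT EXISTS (SELECT 1 FROM actual a WHERE a.v = e.v)";

    /** In-memory equivalent: returns the IDs in 'reference' that are absent from 'candidate'. */
    static Set<Long> missing(Set<Long> reference, Set<Long> candidate) {
        Set<Long> result = new TreeSet<>(reference);
        result.removeAll(candidate);
        return result;
    }

    public static void main(String[] args) {
        Set<Long> expectedVertices = Set.of(1L, 2L, 3L);
        Set<Long> actualVertices = Set.of(1L, 3L, 4L);
        System.out.println("missing from actual: " + missing(expectedVertices, actualVertices));
        System.out.println("unexpected in actual: " + missing(actualVertices, expectedVertices));
    }
}
```

Calling `missing` with the arguments swapped reports vertices that appear in the actual results but were never expected, which is likely worth flagging as well.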

@szarnyasg szarnyasg force-pushed the output-validation-using-matching-in-sql branch 2 times, most recently from df28286 to 1407616 Compare August 27, 2022 17:52
@szarnyasg szarnyasg force-pushed the output-validation-using-matching-in-sql branch from 929a105 to 57c4366 Compare August 27, 2022 20:07
@szarnyasg szarnyasg marked this pull request as ready for review January 12, 2023 13:49
@szarnyasg szarnyasg merged commit c2bca48 into main Feb 14, 2023
@szarnyasg szarnyasg deleted the output-validation-using-matching-in-sql branch February 14, 2023 13:51
Successfully merging this pull request may close these issues.

Validation is slow for large graphs