Make the ambiguous check the same as in runtime #21

jongleb · 2024-09-06T20:04:45Z

Description

This PR adds the missing ambiguity checks for WHERE, ORDER BY, GROUP BY and HAVING in cases that were not previously handled.

One of these cases (covered by the tests) is:

DDL:

CREATE TABLE test21 (id INT, column_a TEXT, column_b BOOL);
CREATE TABLE test22 (id INT, column_d INT);

Queries

First

SELECT test21.id 
FROM test21 JOIN test22 ON test21.id = test22.id 
WHERE id > 2 ORDER BY id

Second

SELECT test21.id 
FROM test21 JOIN test22 ON test21.id = test22.id 
ORDER BY id

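The first query fails while the second one does not: ORDER BY, GROUP BY and HAVING may refer to a column without an explicit source qualifier when that column is listed in SELECT, even if the joined tables contain columns with the same name. WHERE has no such exception, so the unqualified id in the first query is ambiguous.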
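As a rough illustration of the rule described above, the sketch below models how an unqualified column name could be resolved per clause. The `column`, `clause`, `resolution` and `resolve` names are hypothetical and are not taken from sqlgg; this only mirrors the behaviour described in this PR, not its actual implementation.

```ocaml
(* Toy model, not sqlgg's types: a column is a (table, name) pair. *)
type column = { table : string; name : string }

type clause = Where | Order_by | Group_by | Having

type resolution = Resolved of column | Ambiguous | Unknown

(* Resolve an unqualified column name.
   [from_columns]: all columns of the joined tables.
   [select_list]: columns named in the SELECT list. *)
let resolve clause ~from_columns ~select_list name =
  let by_name cols = List.filter (fun c -> c.name = name) cols in
  let of_matches = function
    | [] -> Unknown
    | [ c ] -> Resolved c
    | _ -> Ambiguous
  in
  match clause with
  | Where -> of_matches (by_name from_columns)
  | Order_by | Group_by | Having ->
    (* a unique match in the SELECT list wins over the joined tables *)
    (match by_name select_list with
     | [ c ] -> Resolved c
     | [] -> of_matches (by_name from_columns)
     | _ -> Ambiguous)

let () =
  let from_columns =
    [ { table = "test21"; name = "id" }; { table = "test22"; name = "id" } ]
  in
  let select_list = [ { table = "test21"; name = "id" } ] in
  (* WHERE id > 2: both tables have id, so the name is ambiguous *)
  assert (resolve Where ~from_columns ~select_list "id" = Ambiguous);
  (* ORDER BY id: the SELECT list names test21.id, so it resolves *)
  assert (resolve Order_by ~from_columns ~select_list "id"
          = Resolved { table = "test21"; name = "id" })
```

Under this model, a SELECT list that names id twice, or that only exposes id under other aliases, leaves GROUP BY id ambiguous as well.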

Ambiguous Column Rule:

I have empirically tested all cases across three databases: MySQL, PostgreSQL and SQLite. Here are the main tests, designed as negative cases (i.e., queries that should fail), along with their corresponding Playground links:

Testing on PostgreSQL
Testing on MySQL
Testing on SQLite

Comparison of versions

DDL

CREATE TABLE test21 (id INT, column_a TEXT, column_b BOOL);
CREATE TABLE test22 (id INT, column_d INT);
CREATE TABLE test2 (id INT, str TEXT);
CREATE TABLE test (id INT, str TEXT, name TEXT);

Ambiguous error - 🔴
No errors - 🟢
Warnings - 🟡
Syntax error - ❓

Query	Before this PR	Since this PR	MySQL	PostgreSQL	SQLite	Ambiguity reason/expression	Playground
SELECT test21.id FROM test21 JOIN test22 ON test21.id = test22.id WHERE id > 2;	🟢	🔴	🔴	🔴	🔴	`WHERE` expression	Link
UPDATE test, (SELECT * FROM test2) AS x SET str = x.str WHERE test.id = x.id;	🟢	🔴	🔴	❓	❓	`SET` expression	Link
SELECT * FROM test21 JOIN test22 on test21.id = test22.id GROUP BY id;	🟢	🟡 Emits a warning, but the warning is unrelated to `GROUP BY`	🔴	🔴	🔴	`GROUP` expression	Link
SELECT * FROM test21 JOIN test22 on test21.id = test22.id ORDER BY id;	🟢	🟡 Emits a warning, but the warning is unrelated to `ORDER BY`	🔴	🔴	🔴	`ORDER` expression	Link

rr0gi · 2024-09-11T16:29:04Z

to clarify: is this ambiguity rule something from the SQL standard (which?), database specific (mysql?), or observed empirically?

rr0gi · 2024-09-11T16:31:15Z

lib/syntax.ml

  (* use schema without aliases here *)
  let p1 = get_params_of_columns env columns in
-  let env = { env with schema = Schema.Join.cross env.schema final_schema |> make_unique } in (* enrich schema in scope with aliases *)


the "enrich schema in scope with aliases" comment got lost, but it is useful

rr0gi · 2024-09-11T16:33:10Z

lib/syntax.ml

+  handle ~is_agg:false 
+
+let resolve_ambiguous_columns all_schema final_schema = 
+  List.filter (fun s1 ->


filter only applies to all_schema - make it more clear

this was not addressed
i mean let l = List.filter ... all_schema in l @ final_schema

Ah, I thought you meant I should name it more clearly. Made it clearer (so that it no longer looks as if @ is applied to final_schema and all_schema before the filter is applied to all_schema)

rr0gi · 2024-09-11T16:34:11Z

lib/syntax.ml

-  handle ~is_agg:false  
+  handle ~is_agg:false 
+
+let resolve_ambiguous_columns all_schema final_schema = 


the name resolve_ is confusing - it is not resolving anything, it is just adding alias columns to schema

jongleb · 2024-09-11T18:06:37Z

to clarify: is this ambiguity rule something from the SQL standard (which?), database specific (mysql?), or observed empirically?

Added this part to the description

rr0gi · 2024-09-25T21:18:17Z

i still have a hard time wrapping my head around the change
the links to db-fiddle showcase different queries than the ones in the table
the table is missing a column showing the behaviour of these queries in the databases
in the second row of the table i don't see anything ambiguous

jongleb · 2024-09-30T07:30:58Z

i still have a hard time wrapping my head around the change
the links to db-fiddle showcase different queries than the ones in the table

I also added a second table with the queries from the db-fiddle examples. The first table shows examples taken from the tests.

the table is missing a column showing the behaviour of these queries in the databases

I added it to both tables.

in the second row of the table i don't see anything ambiguous

Yes, and as we can see, it works since this PR

rr0gi · 2024-10-03T20:26:33Z

Yes, and as we can see, it works since this PR

it works before this PR as well, there is no ambiguity there

[..]
  module List = struct
    let select_4 db  callback =
      let invoke_callback stmt =
        callback
          ~id:(T.get_column_Int_nullable stmt 0)
          ~id0:(T.get_column_Int_nullable stmt 1)
      in
      let r_acc = ref [] in
      IO.(>>=) (T.select db ("SELECT test21.id, test22.id\n\
FROM test21 \n\
JOIN test22 ON test21.id = test22.id") T.no_params (fun x -> r_acc := invoke_callback x :: !r_acc))
      (fun () -> IO.return (List.rev !r_acc))

  end (* module List *)
end (* module Sqlgg *)
Warning: this SQL statement will produce rowset with duplicate column names:
SELECT test21.id, test22.id
FROM test21 
JOIN test22 ON test21.id = test22.id

it gives the warning on duplicate column names which is valid, but it generates code alright

rr0gi · 2024-10-03T20:29:33Z

SELECT id, id
FROM test21 
JOIN test22 ON test21.id = test22.id 
GROUP BY id;

this example is orthogonal to this PR, it is duplicate columns in the result set again, does not test ambiguity.

rr0gi · 2024-10-03T20:30:14Z

so i mean PR is useful and i guess i understand which cases are handled, but the description in PR is still very confusing (or just plain wrong), need to fix because this is the only documentation that we have now.

rr0gi · 2024-10-03T20:36:03Z

the queries in the test cases in the PR make sense, idu why the ones you pick for the documentation tables are different %)

rr0gi · 2024-10-03T20:37:56Z

src/test.ml

+  tt "select test21.id from test21 join test22 on test21.id = test22.id order by id" [
+    attr' ~nullability:(Nullable) "id" Int;
+  ] [];
+  wrong "select test21.id from test21 join test22 on test21.id = test22.id where id > 2 order by id";


please put comment from the PR here why this one doesn't fail

But it does fail. This test calls the "wrong" function, which asserts that the statement fails. Added a comment saying that "wrong" asserts failure

rr0gi · 2024-10-03T20:38:55Z

src/test.ml

+    attr' ~nullability:(Nullable) "id" Int;
+    attr' ~nullability:(Nullable) "id" Int;
+  ] [];
+  wrong "select id, id from test21 join test22 on test21.id = test22.id group by id";


Suggested change

-  wrong "select id, id from test21 join test22 on test21.id = test22.id group by id";
+  wrong "select id as id1, id as id2 from test21 join test22 on test21.id = test22.id group by id";

Kept it

wrong "select id, id from test21 join test22 on test21.id = test22.id group by id";

since it calls the "wrong" function, which asserts failure.
I also added your example:

wrong "select id as id1, id as id2 from test21 join test22 on test21.id = test22.id group by id";

rr0gi · 2024-10-03T20:41:45Z

src/test.ml

+  wrong "select * from test21 join test22 on test21.id = test22.id group by id" ;
+  tt "CREATE TABLE test23 (id INT)" [] [];
+  tt "CREATE TABLE test24 (id INT)" [] [];
+  wrong "select * from foo join bar on foo.id = bar.id order by id";


this is same test as above (line 390)?

Yes, deleted

rr0gi · 2024-10-03T20:42:17Z

src/test.ml

+  tt "CREATE TABLE test24 (id INT)" [] [];
+  wrong "select * from foo join bar on foo.id = bar.id order by id";
+  wrong "select * from foo join bar on foo.id";
+  tt "SELECT test21.id AS id, test22.id AS id FROM test21 JOIN test22 ON test21.id = test22.id" [


Suggested change

-  tt "SELECT test21.id AS id, test22.id AS id FROM test21 JOIN test22 ON test21.id = test22.id" [
+  tt "SELECT test21.id AS id1, test22.id AS id2 FROM test21 JOIN test22 ON test21.id = test22.id" [

nobody should write two AS id so lets not confuse ppl

rr0gi · 2024-10-03T20:45:17Z

lib/syntax.ml

+      (List.length a2.Schema.Source.Attr.sources = 0
+      (* Check if columns are from the same table (source)  *)
+      || (List.length a1.sources = List.length a2.sources && List.for_all2 ( = ) a1.sources a2.sources))


Suggested change

-      (List.length a2.Schema.Source.Attr.sources = 0
-      (* Check if columns are from the same table (source)  *)
-      || (List.length a1.sources = List.length a2.sources && List.for_all2 ( = ) a1.sources a2.sources))
+      (* Check if columns are from the same table (source) *)
+      (a2.Schema.Source.Attr.sources = [] || a1.sources = a2.sources)

less confusing indentation and simplify

why is a2.sources=[] check is needed? it means if a2.sources is empty and a1.sources is not empty then result will be true, ie equality is not symmetric? 🤔

why is a2.sources=[] check is needed? it means if a2.sources is empty and a1.sources is not empty then result will be true, ie equality is not symmetric? 🤔

SELECT COUNT(column_a) as column_a FROM test21 WHERE column_a = @column_a

This happens when a column is aliased to the same name as a column in the selected tables.
Maybe this is not a good pattern at all, but

SQL supports it

we have such examples in our code
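To make the asymmetry discussed above concrete, here is a minimal sketch of the simplified predicate from the suggested change. The `attr` record is a hypothetical stand-in that keeps only the two fields used here, and the assumption that an aliased expression such as COUNT(column_a) AS column_a carries an empty `sources` list is inferred from this thread, not verified against the code base.

```ocaml
(* Hypothetical stand-in for Schema.Source.Attr.t: only the fields used here. *)
type attr = { name : string; sources : string list }

(* Simplified form from the review suggestion. Structural equality on lists
   already compares lengths, so List.length + List.for_all2 is not needed. *)
let same_source a1 a2 = a2.sources = [] || a1.sources = a2.sources

let () =
  let table_col = { name = "column_a"; sources = [ "test21" ] } in
  (* assumed shape of COUNT(column_a) AS column_a: no source table *)
  let alias_col = { name = "column_a"; sources = [] } in
  let other_col = { name = "column_a"; sources = [ "test22" ] } in
  (* an alias without sources is accepted on the right-hand side... *)
  assert (same_source table_col alias_col);
  (* ...but not on the left-hand side: the check is deliberately asymmetric *)
  assert (not (same_source alias_col table_col));
  (* columns from different tables never count as the same source *)
  assert (not (same_source table_col other_col))
```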

rr0gi · 2024-10-03T20:51:10Z

lib/syntax.ml

+  handle ~is_agg:false 
+
+let resolve_ambiguous_columns all_schema final_schema = 
+  List.filter (fun s1 ->


this was not addressed
i mean let l = List.filter ... all_schema in l @ final_schema

rr0gi · 2024-10-03T20:52:23Z

lib/syntax.ml

+
+let update_schema_with_aliases all_schema final_schema = 
+  List.filter (fun s1 ->
+    List.for_all (fun s2 -> s2.Schema.Source.Attr.attr.name <> s1.Schema.Source.Attr.attr.name) final_schema


what about unnamed columns btw, "" <> "" is it important here?

That is a function applied to a column expression, but unnamed: like the example above, but without the alias.

SELECT COUNT(column_a) FROM test21 WHERE column_a = @column_a

If there are several unnamed function expressions in the select list,

SELECT COUNT(column_a), AVG(column_a) FROM test21 WHERE column_a = @column_a

the uniqueness check will still always pass here.
Names are assigned only at a later stage, during code generation, and only for the arguments of the function that maps results
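The name-based part of this filter can be sketched as follows, using the `let l = List.filter ... all_schema in l @ final_schema` shape requested earlier in the review. The `attr` record and its `from_select` flag are hypothetical simplifications; the real function works on Schema.Source.Attr values and may apply further conditions that are not modelled here.

```ocaml
(* Hypothetical simplification: a column has a name and a flag telling
   whether it comes from the SELECT list or from a table. *)
type attr = { name : string; from_select : bool }

(* Keep the table columns whose name is not reused by the select list,
   then append the select-list columns. *)
let update_schema_with_aliases all_schema final_schema =
  let not_shadowed =
    List.filter
      (fun s1 -> List.for_all (fun s2 -> s2.name <> s1.name) final_schema)
      all_schema
  in
  not_shadowed @ final_schema

let () =
  let table name = { name; from_select = false } in
  let selected name = { name; from_select = true } in
  let all_schema = [ table "id"; table "column_a" ] in
  (* COUNT(column_a), AVG(column_a): unnamed columns never shadow a table column *)
  let unnamed = [ selected ""; selected "" ] in
  assert (update_schema_with_aliases all_schema unnamed = all_schema @ unnamed);
  (* COUNT(column_a) AS column_a: the alias replaces the table column *)
  let aliased = [ selected "column_a" ] in
  assert (update_schema_with_aliases all_schema aliased
          = [ table "id"; selected "column_a" ])
```

Because table columns always have a non-empty name in this model, comparing against "" never removes them, which matches the answer given above.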

rr0gi · 2024-10-03T20:54:56Z

lib/syntax.ml

@@ -107,10 +107,17 @@ let exists_grouping columns =
  List.exists (function Expr (e,_) -> is_grouping e | All | AllOf _ -> false) columns

 (* all columns from tables, without duplicates *)
-(* FIXME check type of duplicates *)


this fixme disappeared, i think it is important, at some point we want to warn if there are columns with same name but different types

Okay, I'll restore it. I thought my PR settled the ambiguity check for good, which is why I deleted it.

jongleb · 2024-10-04T11:01:07Z

SELECT id, id
FROM test21 
JOIN test22 ON test21.id = test22.id 
GROUP BY id;
this example is orthogonal to this PR, it is duplicate columns in the result set again, does not test ambiguity.

Removed

jongleb · 2024-10-04T12:01:36Z

it gives the warning on duplicate column names which is valid, but it generates code alright

First of all, you are right, sorry. I didn't see this warning when I wrote the tests, because the warning is emitted at a later stage, while the tests only cover the earlier stage (parsing + parameter type inference). I would say that since my PR the check happens at the early stage (at runtime it is an error, not a warning). Besides, the GROUP BY, ORDER BY, WHERE and HAVING clauses weren't checked at all

and make this query valid

UPDATE test, (SELECT * FROM test2) AS x 
SET str = x.str 
WHERE test.id = x.id;

, since

          if not (Sql.Schema.is_unique stmt.schema) then
            Printf.eprintf "Warning: this SQL statement will produce rowset with duplicate column names:\n%s\n" (fst sql);

this warning checks only the schema. I added a "Main distinguishing requests" chapter to the description of the PR. Or does it make sense to delete the tables above and make a new one?

UPD: I updated the table. The warnings aren't related to WHERE, GROUP BY, or even SET, or to any other clause outside the schema (the code that emits the warning checks only the schema)

jongleb requested a review from rr0gi September 6, 2024 20:08

jongleb marked this pull request as ready for review September 6, 2024 20:08

jongleb force-pushed the fix-ambiguous branch from 419c701 to e265508 Compare September 6, 2024 20:19

jongleb requested review from mfp, cyberhuman and Khady September 11, 2024 15:33

jongleb self-assigned this Sep 11, 2024

rr0gi requested changes Sep 11, 2024

View reviewed changes

jongleb force-pushed the fix-ambiguous branch from e265508 to c7eb0d1 Compare September 18, 2024 13:31

make ambiguous columns check properly

91cdc63

jongleb force-pushed the fix-ambiguous branch from c7eb0d1 to 91cdc63 Compare September 18, 2024 13:33

jongleb requested a review from rr0gi September 20, 2024 11:22

rr0gi requested changes Oct 3, 2024

View reviewed changes

fix ambiguous test, make clearly update_schema_with_aliases

c0a8e8a

jongleb requested a review from rr0gi October 4, 2024 13:24
