Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve implementation of associations!! #363

Closed
wants to merge 2 commits into from

Conversation

weiznich
Copy link
Member

@weiznich weiznich commented Jun 28, 2016

diesel:

  • allow to implement the join_dsl for multiple joins between the same
    table. All join related traits are now abstract over the foreign key!!
  • Implement JoinTo for already joined tables to allow joins containing
    multiple tables. Currently it's not possible to join two tables
    multiple times.
  • remove select_column_workaround! and add generic implementations for
    this
  • remove join_through, because it's now possible to build joins
    containing more than two tables

diesel_codegen:

  • add an optional parameter to the has_many and belongs_to annotation
    naming the foreign key column
  • improve name building in codegen (use the Inflector crate for this. We
    don't need to implement this on our own)

Open Points:

  • Formatting
  • Investigate why left_outer_joins is sometimes not able to infer the right sql_type (See the manually specified types in some tests cases. This happens only with users::id)
  • Try to get the second JoinTo implementation for InnerJoinSource and LeftOuterJoinSource running to allow multiple joins between two tables in one query.
  • Add tests
  • Document codgen changes

@killercup
Copy link
Member

You seem to like exclamation marks 😄

I haven't read through all the changes; I just wanted to mention: Regarding formatting, less diff-noise is generally better. (E.g., avoiding visual indentation for function definitions and chains, putting the where at the end of the fn definition, etc.)

diesel:
* allow to implement the join_dsl for multiple joins between the same
  table. All join related traits are now abstract over the foreign key!!
* Implement JoinTo for already joined tables to allow joins containing
  multiple tables. Currently it's not possible to join two tables
  multiple times.
* remove select_column_workaround! and add generic implementations for
  this
* remove join_through, because it's now possible to build joins
  containing more than two tables

diesel_codegen:
* add an optional parameter to the has_many and belongs_to annotation
  naming the foreign key column
* improve name building in codegen (use the Inflector crate for this. We
  don't need to implement this on our own)
@weiznich
Copy link
Member Author

Hopefully formatting is a bit better now. I've tried to remove most of the indentation related changes.

@@ -45,6 +51,26 @@ macro_rules! column {
{
}

impl<Left, Right, FK> $crate::expression::SelectableExpression<
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This impl would allow the following incorrect query to compile:

users.inner_join(posts).select(comments::text)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked that I wasn't missing something, and I'm not. I've explored this change numerous times. We cannot write

impl<Other> SelectableExpression<InnerJoinSource<table, Other>> for column where
    Other: Table,
    table: JoinTo<Other, Inner>,
{}

impl<Other> SelectableExpression<InnerJoinSource<Other, table>> for column where
    Other: Table + JoinTo<table, Inner>,
{}

because they overlap if Other == table. For your impl to be correct, you'd need to add a SelectableExpression constraint, which would be:

impl<Left, Right> SelectableExpression<InnerJoinSource<Left, Right>> for column where
    Left: JoinTo<Right, Inner>,
    Right: Table,
    column: SelectableExpression<Left>,
{}

impl<Left, Right> SelectableExpression<InnerJoinSource<Left, Right>> for column where
    Left: JoinTo<Right, Inner>,
    Right: Table,
    column: SelectableExpression<Right>,
{}

which again, overlaps if Left and Right are the same type, or if column: SelectableExpression<Left> + SelectableExpression<Right>. There's no way to properly resolve this in the language today without a more in depth rethink of this code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely there should be a compile_test for this. I did not expect that to happen.
Maybe we could use JoinTo::JoinAllColumns to solve this. SelectableExpression<InnerJoinSource<Left, Right, FK>> should be implemented for each column there. Is there any "Trait" to check if a Tuple contains a certain type? If not, is it possible to write such thing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that trait doesn't exist and it would not be possible to write it today.

Copy link
Member Author

@weiznich weiznich Jun 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that trait doesn't exist and it would not be possible to write it today.

Using specialization this seems to be possible. Currently there seems to be a problem with the resolution of associated types when using specialization. See the following minimal example. Because of this the explicit specialization from line 27 to line 37 and from line 66 to line 72 is needed. There are already rust-lang issues for this.

@sgrif
Copy link
Member

sgrif commented Jun 29, 2016

I'm not sure I like passing the foreign key around like this. The goal of the API was to abstract away how two tables are joined together. This is missing tests for the added behavior, both behavioral and compile-fail. In particular, it seems like the resulting type is wrong. users.inner_joins(posts).inner_joins(comments) would return (User, (Post, Comment)) not (User, Post, Comment) if I'm reading this correctly?

In general I've been moving away from multiple joins as a solution for associations anyway, as once you've introduced more than one join into the mix, it's often significantly better to run two queries, as the amount of duplicate data sent over the wire is significantly reduced.

@weiznich
Copy link
Member Author

weiznich commented Jun 29, 2016

I'm not sure I like passing the foreign key around like this. The goal of the API was to abstract away how two tables are joined together.

If we abstract away which foreign key is used to join two tables it's not possible to have multiple joins between tables. See #332. In a second step we could maybe try to abstract the key away for cases where there is only one possible join?

This is missing tests for the added behavior, both behavioral and compile-fail.

There is not that much added behaviour in this pull request:

  • Joins over multiple tables -> There is already an test case
  • Ability to join tables multiple times with different foreign keys
  • additional optional parameters on belong_to and has_many annotations
  • compile-fail tests for selecting a column that is not part of the join

In particular, it seems like the resulting type is wrong. users.inner_joins(posts).inner_joins(comments) would return (User, (Post, Comment)) not (User, Post, Comment) if I'm reading this correctly?

Yes the returning type is (User, (Post, Comment)). I wouldn't call this wrong. I could (and maybe should) change this to (User, Post, Comment). But on the other hand having something like this: users.left_outer_join(posts).inner_join(comments) should maybe result into something like (User, Option<(Post, Comment)>) so the current variant matches this type as close as possible.

In general I've been moving away from multiple joins as a solution for associations anyway, as once you've introduced more than one join into the mix, it's often significantly better to run two queries, as the amount of duplicate data sent over the wire is significantly reduced.

I currently try to implement the following query using diesel:

select t.*, o.*, u.*, v.*, w.* from tile_versions as t 
    inner join tiles as ts on t.tile_id = ts.id 
    inner join bounding_boxes as b on ts.bbox = b.id  
    inner join geo_points as o on b.o = o.id
    inner join geo_points as u on b.u = o.id
    inner join geo_points as v on b.v = v.id
    inner join geo_points as w on b.w = w.id
where ts.geometry_id = 1 and t.version_id=1;

If I try to implement everything without any join something like this is the result

select t.*, o.*, u.*, v.*, w.* from tile_versions as t, 
    (select * from geo_points) as o,
    (select * from geo_points) as u,
    (select * from geo_points) as v,
    (select * from geo_points) as w
where t.version_id = 1 and 
    t.tile_id IN (select id from tiles where geometry_id = 1)
    and o.id IN (select o from bounding_boxes where id in (select bbox from tiles where geometry_id =1))
    and u.id IN (select u from bounding_boxes where id in (select bbox from tiles where geometry_id =1))
    and v.id IN (select v from bounding_boxes where id in (select bbox from tiles where geometry_id =1))
    and w.id IN (select w from bounding_boxes where id in (select bbox from tiles where geometry_id =1));

Postgres Explain results for the query 1 in "Nested Loop (cost=3.54..11.12 rows=16 width=213) " and for query 2 in "Nested Loop Semi Join (cost=6.31..87.46 rows=1 width=213). A other solution would be to split of the selection of the bounding_box. The result would be a loop in my rust code iteration over every tile and than loading the corresponding box using 4 queries. Probably there are also cases where using separated queries will result in a poor performance. Because of that I'm thinking it should be possible to write such queries in diesel. Nobody say it must be easy, but it should be possible without relaying on plain sql and losing almost all type checking abilities of diesel.

@weiznich
Copy link
Member Author

weiznich commented Sep 8, 2016

As @sgrif explained this won't work in this way.

@weiznich weiznich closed this Sep 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants