Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix fetch order by bug when using DISTINCT #963

Merged
merged 19 commits into from
Oct 7, 2021

Conversation

jverswijver
Copy link
Contributor

@jverswijver jverswijver commented Sep 17, 2021

Added test based off of the linked issue. It seems enforcing DISTINCT when we are making sql queries with primary keys was not useful, I have removed it for now.

fix #914

@jverswijver jverswijver changed the title Fetch order by bug Fetch order by bug when using DISTINCT Sep 17, 2021
@jverswijver jverswijver marked this pull request as ready for review September 17, 2021 16:13
@jverswijver jverswijver changed the title Fetch order by bug when using DISTINCT Fix fetch order by bug when using DISTINCT Sep 17, 2021
Copy link
Collaborator

@guzman-raphael guzman-raphael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @jverswijver! Primarily just have some questions about other similar usage of DISTINCT in source.

tests/test_fetch.py Outdated Show resolved Hide resolved
distinct = self.heading.names == self.primary_key
return 'SELECT {distinct}{fields} FROM {from_}{where}'.format(
distinct="DISTINCT " if distinct else "",
return 'SELECT {fields} FROM {from_}{where}'.format(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also explore all usage of DISTINCT. Noticed that there is another similar usage in the Aggregation class. Did you mean to remove its use here but maintain its use there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked through it a bit and saw that it also uses DISTINCT, but it might be best to just leave it until a user reports a specific issue that is related to the aggregation usage of distinct. It would be a bad idea to remove it without a specific reason/issue.

@@ -4,6 +4,7 @@
* Bugfix - Replace use of numpy aliases of built-in types with built-in type. (#938) PR #939
* Bugfix - `ExternalTable.delete` should not remove row on error (#953) PR #956
* Bugfix - Fix error handling of remove_object function in `s3.py` (#952) PR #955
* Bugfix - Fix regression issue with DISCINCT clause and GROUP_BY (#914) PR #963
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Bugfix - Fix regression issue with DISCINCT clause and GROUP_BY (#914) PR #963
* Bugfix - Fix regression issue with DISTINCT clause and GROUP_BY (#914) PR #963

CHANGELOG.md Outdated
@@ -5,6 +5,7 @@
* Bugfix - Replace use of numpy aliases of built-in types with built-in type. (#938) PR #939
* Bugfix - `ExternalTable.delete` should not remove row on error (#953) PR #956
* Bugfix - Fix error handling of remove_object function in `s3.py` (#952) PR #955
* Bugfix - Fix regression issue with DISCINCT clause and GROUP_BY (#914) PR #963
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Bugfix - Fix regression issue with DISCINCT clause and GROUP_BY (#914) PR #963
* Bugfix - Fix regression issue with DISTINCT clause and GROUP_BY (#914) PR #963


try:
Parent().fetch('KEY', order_by='name')
assert True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has no effect.

def test_fetch_group_by(self):
# https://github.com/datajoint/datajoint-python/issues/914

try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This try ... except does not appear to be necessary. Just letting the exception raise will have the same effect.

@dimitri-yatsenko
Copy link
Member

DISTINCT becomes important in queries of the following types:

dj.U('attr') & q

where attr is not the primary key of q.

Another case is unions.

q + r

If we don't have test cases that require DISTINCT, let's add them

@jverswijver
Copy link
Contributor Author

A few updates:

  • github actions are failing because an api we use for code coverage is down for maintenance but the code up to commit 13b77d4 passes the test suite on my machine.
  • I have added an additional test that ensures that the results of dj.U operations are unique.
  • I have added the _distinct property to the QueryExpression class so we can be explicit when we want to use DISTINCT in our query formulation.
  • I have updated the U class to use this new _distinct property to ensure that the results produced are unique like they were before when we blanket applied DISTINCT.

Copy link
Collaborator

@guzman-raphael guzman-raphael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jverswijver Nice work in determining the root cause for this one! 💪
Only providing some minor feedback here.

tests/test_fetch.py Outdated Show resolved Hide resolved
docs-parts/intro/Releases_lang1.rst Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
tests/test_fetch.py Outdated Show resolved Hide resolved
datajoint/expression.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@guzman-raphael guzman-raphael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jverswijver Nice job man! 📈

Comment on lines 319 to 325
feched_result = result.fetch(as_dict=True)

# Cleanup table
Stimulus.delete()

# Test to see if the repeated row was removed in the results
assert feched_result == expected_result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fetched results do not come in any particular order, so need to order_by.

Suggested change
feched_result = result.fetch(as_dict=True)
# Cleanup table
Stimulus.delete()
# Test to see if the repeated row was removed in the results
assert feched_result == expected_result
fetched_result = result.fetch(as_dict=True, order_by=('contrast', 'brightness'))
Stimulus.delete_quick()
assert fetched_result == expected_result

@dimitri-yatsenko dimitri-yatsenko merged commit 72c8fea into datajoint:master Oct 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug in fetching with order_by specified on attributes that are not part of the returned attribute list
3 participants