Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ir): make impure ibis.random() and ibis.uuid() functions return unique node instances #8967

Merged
merged 12 commits into from
Apr 15, 2024

Conversation

kszucs
Copy link
Member

@kszucs kszucs commented Apr 15, 2024

Resolves #8921

@kszucs kszucs force-pushed the impure branch 3 times, most recently from ffa38da to 813dc52 Compare April 15, 2024 11:37
@cpcloud
Copy link
Member

cpcloud commented Apr 15, 2024

There's also now and today and probably others.

@cpcloud cpcloud added this to the 9.0 milestone Apr 15, 2024
@cpcloud cpcloud added the bug Incorrect behavior inside of ibis label Apr 15, 2024
@cpcloud
Copy link
Member

cpcloud commented Apr 15, 2024

I think this is a bug fix, since it has to do with output correctness.

@kszucs kszucs changed the title feat(ir): make impure ibis.random() and ibis.uuid() functions return unique node instances fix(ir): make impure ibis.random() and ibis.uuid() functions return unique node instances Apr 15, 2024
ibis/expr/operations/generic.py Show resolved Hide resolved
@kszucs
Copy link
Member Author

kszucs commented Apr 15, 2024

There's also now and today and probably others.

I am planning to address that in a follow-up.

@cpcloud
Copy link
Member

cpcloud commented Apr 15, 2024

Any reason why those can't be done in this PR?

@jcrist
Copy link
Member

jcrist commented Apr 15, 2024

Are now and today impure in the same sense? Implemented using CURRENT_TIMESTAMP these will return the same value for every instance of the expression when the query executes (even though they may return a different value for two subsequent executions). In contrast random() and uuid() will generally return a new value for each call in an expression (some backends like clickhouse handle this a bit differently).

@cpcloud
Copy link
Member

cpcloud commented Apr 15, 2024

Ah, good point. I hadn't considered that select now() = now() whereas select random() <> random().

Do we know whether that's true for all of our backends?

@cpcloud
Copy link
Member

cpcloud commented Apr 15, 2024

Anyway, that's sufficiently different that I think it can be done in a follow up.

@cpcloud cpcloud dismissed their stale review April 15, 2024 14:21

Comment is no longer relevant.

@jcrist
Copy link
Member

jcrist commented Apr 15, 2024

Do we know whether that's true for all of our backends?

I wouldn't be surprised if it's broken for pandas/dask/polars, but semantically I think it should be true for all backends.

@cpcloud
Copy link
Member

cpcloud commented Apr 15, 2024

@kszucs Let me know when you want me to regen snapshots.

@kszucs
Copy link
Member Author

kszucs commented Apr 15, 2024

We still have a problem which is definitely solvable but would expand the scope of this PR heavily:

# semantically this should produce two different columns
t.select(x=ibis.random(), y=ibis.random())
# semantically this should produce identical columns
rand = ibis.random()
t.select(x=rand, y=rand)

Currently we compile both as rand() AS x, rand() AS y which behaves differently on the backends. After this PR the two random values are equal from ibis' perspective, but ideally we should ensure that they produce the same columns on all backends.
The solution could be to inject an additional subselect:

SELECT *, randcol AS x, randcol AS y FROM (SELECT *, rand() as randcol FROM ...)  # where randcol name would be generated

For clickhoue and mysql all the random calls produce the same value in a select, but we could also overcome that with similar strategies.

@cpcloud
Copy link
Member

cpcloud commented Apr 15, 2024

Can that be done in a follow up?

@kszucs
Copy link
Member Author

kszucs commented Apr 15, 2024

Can that be done in a follow up?

Definitely! That is slightly harder and we need to work around backend specifics.

Yes, please regenerate the snapshots for the failing backends!

@cpcloud
Copy link
Member

cpcloud commented Apr 15, 2024

Clouds are passing:

…/ibis on  impure is 📦 v8.0.0 via 🐍 v3.11.8 via ❄️   impure (ibis-3.11.8-env)
❯ pytest -m 'bigquery or snowflake' -n auto --dist loadgroup --snapshot-update
==================================================================================== test session starts ====================================================================================
platform linux -- Python 3.11.8, pytest-8.1.1, pluggy-1.4.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
Using --randomly-seed=168840051
rootdir: /home/cloud/src/ibis
configfile: pyproject.toml
plugins: xdist-3.5.0, pytest_httpserver-1.0.10, cov-5.0.0, benchmark-4.0.0, randomly-3.15.0, hypothesis-6.100.1, repeat-0.9.3, snapshot-0.9.0, mock-3.14.0, clarity-1.0.1
16 workers [3509 items]
.....................................x......................x....................x.x.....x.....x............x.....x..........xx..................x...xxx............................x [  5%]
.x.........x..x.........................xx.xx.x.xxx...x.xxx.xxx..x....x.....x.....x...x........x.xx.x..x...........x.x......x.......x.x...x..x..x.xx..................x.............. [ 10%]
.x..x...xx....x.x...x....x.............x...................x....x.xx..x........x.x.............x......x..........x...............x.............x......x.............x.x..x........... [ 15%]
....xx.....x........x.xx...x......x..........x........x...........x..x........x...............x.....x...........................xx.......s....x.x..x........................x.x.....x [ 20%]
..x..xx...x..xx.x.xx..x.x..xxx...xx.xxxxx.x.x...xxxxxx....xxxx..xxxx.xxx.x...xxxx..x..x.xx..x...x.x..x..x..xx..xx..x..........x.......x...x......................................x... [ 25%]
.x......x......x...x.....x...xxx.xx.x...xx.x..x.x....x..x..x...........x.........xx.x.......x.....xx............xx..x........x...x......x....x......xxx...x....x.xx...............x.. [ 30%]
.x.......x.........x.............................................s.....s..s......................s...x..s..s...s....x...........................x.......x.........s.................. [ 36%]
................x.........x..x..........x................x..x.......s..................x.........x...................sssssssssssssssssss.sss...x...........x.......x................. [ 41%]
.....................x.....xxx..........x......x....x......x.................x..........x.............................................................x.............................. [ 46%]
........x........x.....x........................................x.x...x.x.xx.x.x..x...xx...x.x...xxx......xxx........x..x..x......x..x...x..............x....x.xx.................... [ 51%]
.........x.....x...........x...x...x.............x...x.x..xx...x........xx.x..x....x.x......x..x.x..x.......x...............x.....x.x..........x.x.....xx..............x............. [ 56%]
.............x...x..................x..x......x...............................x.......x....x.....x.....x.....................x.x...x.......x....x...........s...................xxxx. [ 61%]
.xxx.xx.....xxxx.x.xx.xxxxx....x.xxxx.xxxxxxxxxxxxxxxx.xxxxxxxxxxxx.xxxxxxxxxxxxx.x.xxxxxxxxx.xxxxxxxxxxxxxx.xxxxx....xxxx..x.xx..x.x...x.x.x..x..............x...........x..x.x..... [ 67%]
........x.....x....x.........x.x..xx.xxxx.x.x.x...x..xx..x..x.xx.........x..x....x......x.x..x........x.....x.............x.x.xx.....x...................xx.........x.........x...... [ 72%]
..x.......xxx..........x......xx..............x................s.......................s.................sx..x..sx..x.x.s.sxx.sxxx.x.xx....x.x....x..x.xxx...x....xx..x..x........xx.. [ 77%]
....x..x.s....x........s...s..........s..x.......x...x...........x........x.xxx..........xx.....x........x....x.....x.....x..x....x........x......x.x...x...x........x.x..x..xx...... [ 82%]
......x............x..x...xx..........x......x.......s............x................................................................................x................................. [ 87%]
.....................................................................x..x.xx......................................................................................................... [ 92%]
...........................................................................................................................................................x......................... [ 98%]
..................................................x..................                                                                                                                 [100%]
================================================================= 2921 passed, 45 skipped, 543 xfailed in 356.95s (0:05:56) =================================================================

@cpcloud cpcloud merged commit 741063a into ibis-project:main Apr 15, 2024
84 checks passed
@kszucs kszucs deleted the impure branch April 16, 2024 12:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: inlining expressions leads to wrong results for non-pure functions
3 participants