Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

contain_exactly matcher spins forever, never finishing whatever it's supposed to be doing #1006

Open
hakanai opened this issue Aug 2, 2017 · 9 comments

Comments

@hakanai
Copy link

hakanai commented Aug 2, 2017

For some reason, this example never completes:

describe 'something' do
  it 'fails this test' do
    actual_path_names = ["y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y2", "y3", "y4", "y5", "y5", "y5", "y6", "y6", "y6", "y7", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y10", "y10", "y10", "y11", "y11", "y11", "y11", "y12", "y12", "y12", "y13", "y13", "y13", "y14"]
    puts "actual_path_names.size = #{actual_path_names.size}"

    expected_path_names = ["y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y2", "y3", "y3", "y3", "y3", "y3", "y3", "y3", "y3", "y3", "y3", "y3", "y4", "y5", "y5", "y5", "y5", "y5", "y6", "y6", "y6", "y6", "y6", "y6", "y6", "y7", "y7", "y7", "y7", "y7", "y7", "y7", "y7", "y7", "y7", "y7", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "[x1, x2, x10, x26, ]", "y10", "y10", "y10", "y10", "y11", "y11", "y11", "y11", "y11", "y11", "y12", "y12", "y12", "y12", "y12", "y12", "y13", "y13", "y13", "y15", "y15", "y16", "y16", "y14"]
    puts "expected_path_names.size = #{expected_path_names.size}"

    expect(actual_path_names).to contain_exactly(*expected_path_names)
  end
end

RSpec v3.6.0
JRuby v9.1.6.0, MRI v2.4.0

@myronmarston
Copy link
Member

Thanks for the bug report!

Unfortunately, the large amount of flexibility provided by the contain_exactly matcher forces it to fall back to a slow brute force algorithm that compares every actual element against every expected element, to see if there is any way to pair up every actual element against exactly one expected element each. To understand why, please read the lengthy comment the explains the PairingsMaximizer. In your case, you have very large actual and expected arrays, and given that they don't match, it causes a very slow brute force search to find the best set of pairings that minimizes what is unpaired.

You can work around this by not using contain_exactly in this one case; instead you could do:

expect(actual_path_names.sort).to eq(expected_path_names.sort)

Of course, that's not ideal, but it should at least unblock you. Long term, I'd love for us to improve contain_exactly to not have this issue but I'm not sure of a way to do that. I think we might be able to improve it to at least not use the slow brute force approach when all the expected items are matching using only exact equality (as strings do), using some logic like this:

irb(main):001:0> a = "abc"
=> "abc"
irb(main):002:0> a.method(:==) == a.method(:===)
=> true

Here we can see that for a string, === is just an alias of == -- which means we don't need to bother with the slow brute force solution in this case. I don't have the time to work on that optimization now, but maybe someone else can take it up (would you be interested, @trejkaz?).

@hakanai
Copy link
Author

hakanai commented Aug 2, 2017

You say large arrays, but my array was only about 200 elements in size... And if I randomly move some elements around, suddenly it's much, much faster again, even though only a couple of elements changed. It doesn't really make sense to me.

@myronmarston
Copy link
Member

myronmarston commented Aug 2, 2017 via email

@hakanai
Copy link
Author

hakanai commented Aug 2, 2017

Turns out I couldn't get it to happen from the original example. But my original test failure did fail instantly, so I converted that to the same style, and I get:

describe 'something' do
  it 'fails this test' do
    actual_path_names = ["y1", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y3", "y4", "y4", "y4", "y4", "y4", "y4", "y4", "y4", "y4", "y4", "y4", "y5", "y6", "y6", "y6", "y6", "y6", "y7", "y7", "y7", "y7", "y7", "y7", "y7", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y10", "y10", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y12", "y13", "y14", "y14", "y15", "y16", "y16", "y16", "y16", "y16", "y16", "y16", "y17", "y17", "y17", "y17", "y17", "y17", "y18", "y18", "y18", "y18", "y18", "y18", "y19", "y19", "y19", "y19", "y19", "y19", "y20", "y20", "y21", "y21", "y22"]
    puts "actual_path_names.size = #{actual_path_names.size}"

    expected_path_names = ["y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y3", "y4", "y5", "y6", "y6", "y6", "y7", "y7", "y7", "y8", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y16", "y16", "y16", "y17", "y17", "y17", "y17", "y18", "y18", "y18", "y19", "y19", "y19", "y22"]
    puts "expected_path_names.size = #{expected_path_names.size}"

    expect(actual_path_names).to contain_exactly(*expected_path_names)
  end
end

The sizes:

actual_path_names.size = 158
expected_path_names.size = 100

So I guess each list is half the size, but it completes much faster than 1/4 of the time. Actually, I haven't yet seen the other example complete. :)

So actually just looking at the lists, it seems like the main difference is that this one has a unique element at the front before the run of identical elements, whereas in the other example, it starts with a run of identical elements.

@hakanai
Copy link
Author

hakanai commented Aug 2, 2017

Well, I ruled out that theory because I tried the original example with a "y0" at the front, and I still get a hang. So I have no idea what makes it run so much faster in one case and not the other.

@hakanai
Copy link
Author

hakanai commented Aug 2, 2017

So this is interesting...

If:

    expected_path_names = ["y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y3", "y4", "y5", "y6", "y6", "y6", "y7", "y7", "y7", "y8", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y16", "y16", "y16", "y17", "y17", "y17", "y17", "y18", "y18", "y18", "y19", "y19", "y19", "y22"]

I get a hang with:

    actual_path_names = ["y0", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y1", "y2", "y3", "y4", "y5", "y5", "y5", "y6", "y6", "y6", "y7", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y10", "y10", "y10", "y11", "y11", "y11", "y11", "y12", "y12", "y12", "y13", "y13", "y13", "y14"]

But no hang with:

    actual_path_names = ["y1", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y2", "y3", "y4", "y4", "y4", "y4", "y4", "y4", "y4", "y4", "y4", "y4", "y4", "y5", "y6", "y6", "y6", "y6", "y6", "y7", "y7", "y7", "y7", "y7", "y7", "y7", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y8", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y9", "y10", "y10", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y11", "y12", "y13", "y14", "y14", "y15", "y16", "y16", "y16", "y16", "y16", "y16", "y16", "y17", "y17", "y17", "y17", "y17", "y17", "y18", "y18", "y18", "y18", "y18", "y18", "y19", "y19", "y19", "y19", "y19", "y19", "y20", "y20", "y21", "y21", "y22"]

Even though the latter is much longer. :)

@myronmarston
Copy link
Member

Thanks, that's a useful example. I'll dig into it.

@kaiwren
Copy link

kaiwren commented Sep 18, 2018

I'm not sure how much progress I can make here, but just in case it's useful to others solving this, I've move @trejkaz's example into a spec at https://github.com/kaiwren/rspec-expectations/tree/1006-contains-exactly-hangs (this isn't merge-able, just useful to get started)

@mjayfrancis
Copy link

The optimisation suggested in #1006 (comment) should solve this particular case as reported, but if anyone is motivated to improve the contain_exactly matcher in a more general way, the following may be of use:

The problem that PairingsMaximizer solves is finding a maximum cardinality matching for a bipartite graph. Replacing the current brute force algorithm with a solution such as the Hopcroft-Karp algorithm should give far better results in the general case.

bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 29, 2021
Speed up the ContainExactly matcher by pre-emptively matching up corresponding elements in the expected and actual arrays.

This addresses rspec#1006, rspec#1161.

This PR is a collaboration between me and @genehsu based on
a couple of our earlier PRs and discussion that resulted:
1) rspec#1325
2) rspec#1328

Co-authored-by: Gene Hsu (@genehsu)
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 29, 2021
Speed up the ContainExactly matcher by pre-emptively matching up corresponding elements in the expected and actual arrays.

This addresses rspec#1006, rspec#1161.

This PR is a collaboration between me and @genehsu based on
a couple of our earlier PRs and discussion that resulted:
1) rspec#1325
2) rspec#1328

Co-authored-by: Gene Hsu (@genehsu)
bclayman-sq added a commit to bclayman-sq/rspec-expectations that referenced this issue Oct 29, 2021
Speed up the ContainExactly matcher by pre-emptively matching up corresponding elements in the expected and actual arrays.

This addresses rspec#1006, rspec#1161.

This PR is a collaboration between me and @genehsu based on
a couple of our earlier PRs and discussion that resulted:
1) rspec#1325
2) rspec#1328

Co-authored-by: Gene Hsu (@genehsu)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants