Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix sporadic failures testing with JRuby #1012

Merged
merged 4 commits into from
Nov 8, 2023

Conversation

headius
Copy link
Contributor

@headius headius commented Nov 8, 2023

A collection of fixes to make specs more robust or fix issues causing sporadic failures on JRuby.

If this spec proceeds to reset the barrier before there are any
waiters, the reset will not trigger them and the spec will fail.
This at least tries to ensure that the waiter is there and will
be interrupted as expected.
On slower or heavily-loaded systems, such as CI, this join may
need more than one second. Give it ten before giving up and
raising an error.
This avoids getting stuck and waiting forever if something goes
awry.
@headius
Copy link
Contributor Author

headius commented Nov 8, 2023

Existing commits up to this point fix the following:

  1) Concurrent::CyclicBarrier#number_waiting with waiting threads should be equal to the waiting threads count
     Failure/Error: expect(thread_join).not_to be_nil, thread.inspect
       #<Thread:0x40712ee9 /Users/headius/work/concurrent-ruby/spec/support/example_group_extensions.rb:32 aborting>
     # /Users/headius/work/jruby/lib/ruby/gems/shared/gems/rspec-support-3.12.1/lib/rspec/support.rb:108:in `block in Support'
     # /Users/headius/work/jruby/lib/ruby/gems/shared/gems/rspec-support-3.12.1/lib/rspec/support.rb:117:in `notify_failure'
     # /Users/headius/work/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.12.3/lib/rspec/expectations/fail_with.rb:35:in `fail_with'
     # /Users/headius/work/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.12.3/lib/rspec/expectations/handler.rb:40:in `handle_failure'
     # /Users/headius/work/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.12.3/lib/rspec/expectations/handler.rb:84:in `block in handle_matcher'
     # /Users/headius/work/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.12.3/lib/rspec/expectations/handler.rb:27:in `with_matcher'
     # /Users/headius/work/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.12.3/lib/rspec/expectations/handler.rb:76:in `handle_matcher'
     # /Users/headius/work/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.12.3/lib/rspec/expectations/expectation_target.rb:78:in `not_to'
     # /Users/headius/work/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.12.3/lib/rspec/expectations/expectation_target.rb:106:in `not_to'
     # ./spec/spec_helper.rb:57:in `block in <main>'

I have at least one other intermittent failure I need to investigate:

  1) Concurrent ErlangActor on pool behaves like erlang actor asking timing out
     Failure/Error: raise NoActor.new(@Pid) if @Terminated.resolved?

     Concurrent::ErlangActor::NoActor:
       #<Concurrent::ErlangActor::Pid:0x7f58d87f terminated normally with true>
     Shared Example Group: "erlang actor" called from ./spec/concurrent/edge/erlang_actor_spec.rb:975
     # ./lib/concurrent-ruby-edge/concurrent/edge/erlang_actor.rb:706:in `ask'
     # ./lib/concurrent-ruby-edge/concurrent/edge/erlang_actor.rb:82:in `ask'
     # ./spec/concurrent/edge/erlang_actor_spec.rb:934:in `block in <main>'
     # org/jruby/RubyBasicObject.java:2614:in `instance_exec'
     # /home/runner/work/jruby/jruby/lib/ruby/gems/shared/gems/rspec-core-3.12.2/lib/rspec/core/example.rb:263:in `block in run'

@headius
Copy link
Contributor Author

headius commented Nov 8, 2023

I discovered this PR and commits by @eregon omitting "flaky tests" on TruffleRuby some years ago. The ErlangActor failure above would be excluded by this.

#943

I can certainly add JRuby there, but it would be better to figure out why these tests are flaky and fix them.

@headius
Copy link
Contributor Author

headius commented Nov 8, 2023

This error just showed up once on the Ruby 3.2 CI jobs for this PR:

  1) Concurrent::ReentrantReadWriteLock write lock wakes up waiting readers when the write lock is released
     Failure/Error: (1..3).each { |n| expect(threads[n].status).to eql "sleep" }

       expected: "sleep"
            got: "run"

       (compared using eql?)

@headius
Copy link
Contributor Author

headius commented Nov 8, 2023

TR failure in https://github.com/ruby-concurrency/concurrent-ruby/actions/runs/6792582483 does not seem related to my changes.

@bensheldon
Copy link
Contributor

TR failure in https://github.com/ruby-concurrency/concurrent-ruby/actions/runs/6792582483 does not seem related to my changes.

Agreed. Those look very racy and asserting that a thread is sleeping seems troublesome. I'd suggest rewriting those something like this:

  it "cannot be acquired when another thread holds a write lock" do
    latch   = CountDownLatch.new
    avoid_latch = CountDownLatch.new # expect this latch never to be reached
    threads = [
      in_thread { lock.acquire_write_lock; latch.count_down },
      in_thread { latch.wait; lock.acquire_write_lock; avoid_latch.count_down }
    ]
    expect { Timeout.timeout(1) { threads[0].join }}.not_to raise_error
    expect(threads[0]).to hold(lock).for_write
    expect(threads[1]).not_to hold(lock).for_write
    wait_up_to(0.2) { threads[1].status == 'sleep' }
    
    # more reliable expectations
    expect(avoid_latch.wait(0.2)).to be false
    expect(lock.try_write_lock).to be false
  end

But also could probably be in another PR---not that I am anyone to say one way or another.

@headius
Copy link
Contributor Author

headius commented Nov 8, 2023

I'm not sure what would be causing the TR segfaults in https://github.com/ruby-concurrency/concurrent-ruby/actions/runs/6792582483 so I'l lleave that for TR folks to figure out. The other TR failure in https://github.com/ruby-concurrency/concurrent-ruby/actions/runs/6792654287/job/18466286918?pr=1012 might be another flaky test, as @bensheldon suggested above, but I'm not sure of the right fix and I have not seen it on JRuby yet.

If it's ok, I'd like to merge what we have here so JRuby can start to have some consistency in concurrent-ruby testing. If we see more sporadic failures, I'll open another issue.

@headius headius merged commit dadc2ad into ruby-concurrency:master Nov 8, 2023
12 checks passed
@headius headius deleted the fix_sporadic_failures branch November 8, 2023 02:59
Copy link
Collaborator

@eregon eregon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants