Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build timeouts on macos workflows #30510

Open
matschaffer opened this issue Feb 22, 2022 · 1 comment
Open

Build timeouts on macos workflows #30510

matschaffer opened this issue Feb 22, 2022 · 1 comment
Labels
flaky-test Unstable or unreliable test cases. Team:Automation Label for the Observability productivity team

Comments

@matschaffer
Copy link
Contributor

I hit 4 macos failures while working on some beats PRs. All were macos jobs and appeared to be timeout related.

Flaky Test

Stack Trace

=================================== FAILURES ===================================
__________________ Test.test_http_endpoint_missing_auth_value __________________

self = <test_http_endpoint.Test testMethod=test_http_endpoint_missing_auth_value>

        def test_http_endpoint_missing_auth_value(self):
            """
            Test http_endpoint input with missing basic auth values.
            """
            options = """
      basic_auth: true
      username: testuser
      password:
    """
            self.get_config(options)
            filebeat = self.start_beat()
>           self.wait_until(lambda: self.log_contains(
                "username and password required when basicauth is enabled"))

tests/system/test_http_endpoint.py:192: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <test_http_endpoint.Test testMethod=test_http_endpoint_missing_auth_value>
cond = <function Test.test_http_endpoint_missing_auth_value.<locals>.<lambda> at 0x10fb6e550>
max_timeout = 10, poll_interval = 0.1, name = 'cond'

    def wait_until(self, cond, max_timeout=10, poll_interval=0.1, name="cond"):
        """
        Waits until the cond function returns true,
        or until the max_timeout is reached. Calls the cond
        function every poll_interval seconds.
    
        If the max_timeout is reached before cond() returns
        true, an exception is raised.
        """
        start = datetime.now()
        while not cond():
            if datetime.now() - start > timedelta(seconds=max_timeout):
>               raise TimeoutError("Timeout waiting for '{}' to be true. ".format(name) +
                                   "Waited {} seconds.".format(max_timeout))
E               beat.beat.TimeoutError: Timeout waiting for 'cond' to be true. Waited 10 seconds.

../../libbeat/tests/system/beat/beat.py:365: TimeoutError
=================================== FAILURES ===================================
_________________ TestKeystore.test_keystore_with_present_key __________________

self = <test_keystore.TestKeystore testMethod=test_keystore_with_present_key>

    def test_keystore_with_present_key(self):
        """
        Test that we correctly do string replacement with values from the keystore
        """
    
        key = "elasticsearch_host"
        secret = "myeleasticsearchsecrethost"
    
        self.render_config_template(keystore_path=self.keystore_path, elasticsearch={
            'host': "${%s}:9200" % key
        })
    
        exit_code = self.run_beat(extra_args=["keystore", "create"])
        assert exit_code == 0
    
        self.add_secret(key, secret, True)
        proc = self.start_beat()
        self.wait_until(lambda: self.log_contains("myeleasticsearchsecrethost"))
        assert self.log_contains(secret)
>       proc.kill_and_wait()

tests/system/test_keystore.py:36: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../libbeat/tests/system/beat/beat.py:142: in kill_and_wait
    return self.wait()
../libbeat/tests/system/beat/beat.py:124: in wait
    return self.proc.wait()
/usr/local/Cellar/[email protected]/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py:1189: in wait
    return self._wait(timeout=timeout)
/usr/local/Cellar/[email protected]/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py:1917: in _wait
    (pid, sts) = self._try_wait(0)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Popen: returncode: None args: ['/Users/runner/work/beats/beats/filebeat/fil...>
wait_flags = 0

    def _try_wait(self, wait_flags):
        """All callers to this function MUST hold self._waitpid_lock."""
        try:
>           (pid, sts) = os.waitpid(self.pid, wait_flags)
E           Failed: Timeout >90.0s

/usr/local/Cellar/[email protected]/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py:[187](https://github.com/elastic/beats/runs/5268753394?check_suite_focus=true#step:7:187)5: Failed
=================================== FAILURES ===================================
___________________________ Test.test_non_recursive ____________________________

self = <test_file_integrity.Test testMethod=test_non_recursive>

    @unittest.skipIf(os.getenv("BUILD_ID") is not None and platform.system() == 'Darwin',
                     'Flaky test: elastic/beats/issues/246[78](https://github.com/elastic/beats/runs/5268753269?check_suite_focus=true#step:7:78)')
    def test_non_recursive(self):
        """
        file_integrity monitors watched directories (non recursive).
        """
    
        dirs = [self.temp_dir("auditbeat_test"),
                self.temp_dir("auditbeat_test")]
    
        with PathCleanup(dirs):
            self.render_config_template(
                modules=[{
                    "name": "file_integrity",
                    "extras": {
                        "paths": dirs,
                        "scan_at_start": False
                    }
                }],
            )
            proc = self.start_beat()
    
            # wait until the directories to watch are printed in the logs
            # this happens when the file_integrity module starts.
            # Case must be ignored under windows as capitalisation of paths
            # may differ
            self.wait_log_contains(escape_path(dirs[0]), max_timeout=30, ignore_case=True)
    
            file1 = os.path.join(dirs[0], 'file.txt')
            self.create_file(file1, "hello world!")
    
            file2 = os.path.join(dirs[1], 'file2.txt')
            self.create_file(file2, "Foo bar")
    
            # wait until file1 is reported before deleting. Otherwise the hash
            # might not be calculated
>           self.wait_log_contains("\"path\":\"{0}\"".format(escape_path(file1)), ignore_case=True)

tests/system/test_file_integrity.py:101: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../libbeat/tests/system/beat/beat.py:471: in wait_log_contains
    self.wait_until(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <test_file_integrity.Test testMethod=test_non_recursive>
cond = <function TestCase.wait_log_contains.<locals>.<lambda> at 0x10520e[79](https://github.com/elastic/beats/runs/5268753269?check_suite_focus=true#step:7:79)0>
max_timeout = 10, poll_interval = 0.1, name = 'log_contains'

    def wait_until(self, cond, max_timeout=10, poll_interval=0.1, name="cond"):
        """
        TODO: this can probably be a "wait_until_output_count", among other things, since that could actually use `self`, and this can become an internal function
        Waits until the cond function returns true,
        or until the max_timeout is reached. Calls the cond
        function every poll_interval seconds.
    
        If the max_timeout is reached before cond() returns
        true, an exception is raised.
        """
        start = datetime.now()
        while not cond():
            if datetime.now() - start > timedelta(seconds=max_timeout):
>               raise WaitTimeoutError(
                    f"Timeout waiting for condition '{name}'. Waited {max_timeout} seconds.")
E               beat.beat.WaitTimeoutError: Timeout waiting for condition 'log_contains'. Waited 10 seconds.

../libbeat/tests/system/beat/beat.py:429: WaitTimeoutError
=== Failed
Error: go test returned a non-zero value: exit status 1
=== FAIL: heartbeat/scheduler TestSchedTaskLimits/runs_100_with_limit_not_configured (0.00s)
    testing.go:1169: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
    --- FAIL: TestSchedTaskLimits/runs_100_with_limit_not_configured (0.00s)

=== FAIL: heartbeat/scheduler TestSchedTaskLimits (0.00s)
    assertion_compare.go:323: 
        	Error Trace:	scheduler_test.go:239
        	            				scheduler_test.go:268
        	Error:      	"5" is not greater than or equal to "100"
        	Test:       	TestSchedTaskLimits
        	Messages:   	[]
@matschaffer matschaffer added flaky-test Unstable or unreliable test cases. Team:Automation Label for the Observability productivity team labels Feb 22, 2022
@matschaffer
Copy link
Contributor Author

3 passed on the first retry.

The auditbeat test continues to fail. The test looks like it's supposed to be skipped https://github.com/matschaffer/beats/blob/987250d475a88279782c00ec860acdf7f34c3741/auditbeat/tests/system/test_file_integrity.py#L66

But it's running anyway. I'm guessing BUILD_ID is something that jenkins was providing at the time the skip was added. Thinking we should probably add a CI check since github provides that https://docs.github.com/en/actions/learn-github-actions/environment-variables and jenkins probably should if it doesn't already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Unstable or unreliable test cases. Team:Automation Label for the Observability productivity team
Projects
None yet
Development

No branches or pull requests

1 participant