Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIEM][Detection Engine][Lists] Adds the ability to change the timeout limits from 10 seconds for loads for imports #73103

Merged
merged 13 commits into from
Jul 28, 2020

Conversation

FrankHassanabad
Copy link
Contributor

@FrankHassanabad FrankHassanabad commented Jul 23, 2020

Summary

By default the upload time limit for payloads is 10 seconds. This is really too short and we were getting internal QA bug reports that uploads are timing out on large value list importing. This PR adds the plumbing and unit tests to make the timeout configurable for routes.

  • Adds a single timeout option for routes and then normalizes that through Hapi for the socket, payload, and server timeouts.
  • Adds unit tests which test the various options
  • Adds integration tests which test the various options
  • Adds some NOTES about where there are odd behaviors/bugs within Hapi around validations and the timeouts
  • Adds a configurable 5 minute timeout to the large value lists route

Manual testing of the feature

You can manually test this by adding a configurable option to your chrome network throttle like so below where you throttle upload by some configurable amount. I chose to use 300 kbs/s upload
Screen Shot 2020-07-23 at 11 26 01 AM

And then run an import of large value lists using a large enough file that it will exceed 5 minutes:
screen-shot-upload

After 5 minutes you should see this message within your server side messages if you have configured your kibana.dev.yml to allow for these messages:

server  respons [10:52:31.377] [access:lists-all] POST /api/lists/items/_import?type=keyword 408 318292ms - 9.0B

Note that it should show you that it is trying to return a 408 after 318292ms the timeout period. Sometimes you will get the 408 in the browser and sometimes the browser actually will not respect the 408 and continue staying in a pending state forever. This seems to be browser side issue and not a client/user land issue. If you get the browser message it will be this error toaster

timeout-message

Checklist

For maintainers

@elasticmachine
Copy link
Contributor

Pinging @elastic/siem (Team:SIEM)

Copy link
Contributor

@pgayvallet pgayvallet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing plugin to have per route configuration of timeout probably makes sense in some edge cases such as uploads or long processing, so I don't have any functional objection.

What concerns me with the implementation is that adding these three 'raw' hapi option might be a little too 'low level'. Maybe adding some sugar to let consumers configure their timeout with a single option could be possible? (see comment)

src/core/server/http/router/route.ts Outdated Show resolved Hide resolved
src/core/server/http/http_server.test.ts Outdated Show resolved Hide resolved
x-pack/plugins/lists/server/config.ts Outdated Show resolved Hide resolved
src/core/server/http/http_server.test.ts Outdated Show resolved Hide resolved
x-pack/plugins/lists/server/config.ts Outdated Show resolved Hide resolved
src/core/server/http/http_server.test.ts Show resolved Hide resolved
}
);
registerRouter(router);
await expect(server.start()).rejects.toThrowError('Invalid routeConfig options');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The message doesn't contain an invalid option name
  2. you cover this in http config test https://github.com/elastic/kibana/pull/73103/files#diff-a030b4d8ada02fb446eccfaa965fb895, so probably it's fine not to test this logic here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test and error is actually me testing and exposing the errors that Hapi server is giving back to us here. I don't do a throw on the zero and neither does the Kibana Schema for this particular test. This test exercises the route as if the route was trying to send down a socket timeout of zero with no Kibana Schema validation and the error that comes back is from Hapi Server which does not tell you which invalid route config option you have. Hapi just does a generic throw which is, Invalid routeConfig options and is not really under my control unless I begin catching and abstracting away Hapi exceptions (which I didn't see anywhere else at this moment us doing).

This boundary test I found useful as it gave me confidence that if someone tries to create a route without any schema validation and pushes down a zero then at least Hapi will throw an error and that route will not be considered valid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd avoid testing Hapi internals then. It's easy to have false positives on the next Hapi update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will remove it.

const router = createRouter('/');

router.post(
{ path: '/a', validate: false, options: { timeout: 300000 } },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do we test here?

Copy link
Contributor Author

@FrankHassanabad FrankHassanabad Jul 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests and the others around it test that if the timeout is set it will work (sanity test). I didn't think adding an incredibly long lived test to ensure they can exceed the 2 minutes would be ok so I instead opted for a simple sanity check.

But really exceeding the full 2 minutes but not exceeding 5 minutes would be a typical positive unit test that it is possible to exceed the default 2 minutes and operates as we would want it. However, during integration tests I thought I probably shouldn't exceed 2 minutes for the test so instead I just opted for a test that shows when it is set on a regular endpoint call it will operate normally and not cause any issues such as unexpected timeouts.

);
router.delete({ path: '/b', validate: false }, (context, req, res) => res.ok({}));
await server.start();
expect(supertest(innerServer.listener).delete('/a')).rejects.toThrow('socket hang up');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add a test for 408 Request timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 408 Request timeout only happens when it is a PUT/POST with a form/payload:
https://hapi.dev/api/?v=19.2.0#-routeoptionspayloadtimeout

And then you have to simulate a connection that starts but then stalls or does it so slowly that the backend decides to then disconnect you because you're taking too long. I looked around at supertest yesterday for a quite a while hoping it had an easy way to explicitly test the condition but it does not look like there is a way for me to be able to start an upload and then stall it.

For the other endpoints such as GET, DELETE which do not take a payload when the socket does a disconnect Hapi server will do a hangup on them on a timeout and not give back a 408 Request timeout. It doesn't advertise that is its behavior but that is the behavior as the tests are showing.

Copy link
Contributor Author

@FrankHassanabad FrankHassanabad Jul 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can set a different switch which is the timeout.server here:
https://hapi.dev/api/?v=19.2.0#-routeoptionstimeoutserver

And then increase the socket timeout to be slightly higher than that and instead of a socket hangup we would get a:

Service Unavailable (503) error response.

For GET, DELETE endpoints and a 408 for when its payload based with PUT/POST, but it seemed like the socket hangup was more preferred from the example you sent me in the tests. Just let me know if you want me to change it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay if you merge it as is and create an issue for the platform team to add such a test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ticket I created for this as requested:
#73557

src/core/server/server.api.md Outdated Show resolved Hide resolved
Copy link
Contributor

@dhurley14 dhurley14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Build metrics

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@FrankHassanabad FrankHassanabad merged commit b399fb0 into elastic:master Jul 28, 2020
@FrankHassanabad FrankHassanabad deleted the add-payload-timeout branch July 28, 2020 23:47
FrankHassanabad added a commit to FrankHassanabad/kibana that referenced this pull request Jul 28, 2020
…t limits from 10 seconds for loads for imports (elastic#73103)

## Summary

By default the upload time limit for payloads is 10 seconds. This is really too short and we were getting internal QA bug reports that uploads are timing out on large value list importing. This PR adds the plumbing and unit tests to make the timeout configurable for routes.

* Adds a single timeout option for routes and then normalizes that through Hapi for the socket, payload, and server timeouts.
* Adds unit tests which test the various options
* Adds integration tests which test the various options
* Adds some NOTES about where there are odd behaviors/bugs within Hapi around validations and the timeouts
* Adds a configurable 5 minute timeout to the large value lists route

**Manual testing of the feature**

You can manually test this by adding a configurable option to your chrome network throttle like so below where you throttle upload by some configurable amount. I chose to use 300 kbs/s upload
<img width="556" alt="Screen Shot 2020-07-23 at 11 26 01 AM" src="https://user-images.githubusercontent.com/1151048/88318015-5ab3f700-ccd7-11ea-9d9b-7e3649ec65de.png">

And then run an import of large value lists using a large enough file that it will exceed 5 minutes:
![screen-shot-upload](https://user-images.githubusercontent.com/1151048/88318584-28ef6000-ccd8-11ea-90a1-8ca4aafabcb4.png)

After 5 minutes you should see this message within your server side messages if you have configured your kibana.dev.yml to allow for these messages:

```ts
server  respons [10:52:31.377] [access:lists-all] POST /api/lists/items/_import?type=keyword 408 318292ms - 9.0B
``` 

Note that it should show you that it is trying to return a `408` after `318292ms` the timeout period. Sometimes you will get the 408 in the browser and sometimes the browser actually will not respect the 408 and continue staying in a pending state forever. This seems to be browser side issue and not a client/user land issue. If you get the browser message it will be this error toaster

![timeout-message](https://user-images.githubusercontent.com/1151048/88318760-74a20980-ccd8-11ea-9b7b-0d27f8eb6bce.png)

### Checklist

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/master/packages/kbn-i18n/README.md)
- [x] [Documentation](https://github.com/elastic/kibana/blob/master/CONTRIBUTING.md#writing-documentation) was added for features that require explanation or tutorials
- [x] [Unit or functional tests](https://github.com/elastic/kibana/blob/master/CONTRIBUTING.md#cross-browser-compatibility) were updated or added to match the most common scenarios

### For maintainers

- [x] This was checked for breaking API changes and was [labeled appropriately](https://github.com/elastic/kibana/blob/master/CONTRIBUTING.md#release-notes-process)
FrankHassanabad added a commit to FrankHassanabad/kibana that referenced this pull request Jul 28, 2020
…t limits from 10 seconds for loads for imports (elastic#73103)

## Summary

By default the upload time limit for payloads is 10 seconds. This is really too short and we were getting internal QA bug reports that uploads are timing out on large value list importing. This PR adds the plumbing and unit tests to make the timeout configurable for routes.

* Adds a single timeout option for routes and then normalizes that through Hapi for the socket, payload, and server timeouts.
* Adds unit tests which test the various options
* Adds integration tests which test the various options
* Adds some NOTES about where there are odd behaviors/bugs within Hapi around validations and the timeouts
* Adds a configurable 5 minute timeout to the large value lists route

**Manual testing of the feature**

You can manually test this by adding a configurable option to your chrome network throttle like so below where you throttle upload by some configurable amount. I chose to use 300 kbs/s upload
<img width="556" alt="Screen Shot 2020-07-23 at 11 26 01 AM" src="https://user-images.githubusercontent.com/1151048/88318015-5ab3f700-ccd7-11ea-9d9b-7e3649ec65de.png">

And then run an import of large value lists using a large enough file that it will exceed 5 minutes:
![screen-shot-upload](https://user-images.githubusercontent.com/1151048/88318584-28ef6000-ccd8-11ea-90a1-8ca4aafabcb4.png)

After 5 minutes you should see this message within your server side messages if you have configured your kibana.dev.yml to allow for these messages:

```ts
server  respons [10:52:31.377] [access:lists-all] POST /api/lists/items/_import?type=keyword 408 318292ms - 9.0B
``` 

Note that it should show you that it is trying to return a `408` after `318292ms` the timeout period. Sometimes you will get the 408 in the browser and sometimes the browser actually will not respect the 408 and continue staying in a pending state forever. This seems to be browser side issue and not a client/user land issue. If you get the browser message it will be this error toaster

![timeout-message](https://user-images.githubusercontent.com/1151048/88318760-74a20980-ccd8-11ea-9b7b-0d27f8eb6bce.png)

### Checklist

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/master/packages/kbn-i18n/README.md)
- [x] [Documentation](https://github.com/elastic/kibana/blob/master/CONTRIBUTING.md#writing-documentation) was added for features that require explanation or tutorials
- [x] [Unit or functional tests](https://github.com/elastic/kibana/blob/master/CONTRIBUTING.md#cross-browser-compatibility) were updated or added to match the most common scenarios

### For maintainers

- [x] This was checked for breaking API changes and was [labeled appropriately](https://github.com/elastic/kibana/blob/master/CONTRIBUTING.md#release-notes-process)
FrankHassanabad added a commit that referenced this pull request Jul 29, 2020
…t limits from 10 seconds for loads for imports (#73103) (#73581)

## Summary

By default the upload time limit for payloads is 10 seconds. This is really too short and we were getting internal QA bug reports that uploads are timing out on large value list importing. This PR adds the plumbing and unit tests to make the timeout configurable for routes.

* Adds a single timeout option for routes and then normalizes that through Hapi for the socket, payload, and server timeouts.
* Adds unit tests which test the various options
* Adds integration tests which test the various options
* Adds some NOTES about where there are odd behaviors/bugs within Hapi around validations and the timeouts
* Adds a configurable 5 minute timeout to the large value lists route

**Manual testing of the feature**

You can manually test this by adding a configurable option to your chrome network throttle like so below where you throttle upload by some configurable amount. I chose to use 300 kbs/s upload
<img width="556" alt="Screen Shot 2020-07-23 at 11 26 01 AM" src="https://user-images.githubusercontent.com/1151048/88318015-5ab3f700-ccd7-11ea-9d9b-7e3649ec65de.png">

And then run an import of large value lists using a large enough file that it will exceed 5 minutes:
![screen-shot-upload](https://user-images.githubusercontent.com/1151048/88318584-28ef6000-ccd8-11ea-90a1-8ca4aafabcb4.png)

After 5 minutes you should see this message within your server side messages if you have configured your kibana.dev.yml to allow for these messages:

```ts
server  respons [10:52:31.377] [access:lists-all] POST /api/lists/items/_import?type=keyword 408 318292ms - 9.0B
``` 

Note that it should show you that it is trying to return a `408` after `318292ms` the timeout period. Sometimes you will get the 408 in the browser and sometimes the browser actually will not respect the 408 and continue staying in a pending state forever. This seems to be browser side issue and not a client/user land issue. If you get the browser message it will be this error toaster

![timeout-message](https://user-images.githubusercontent.com/1151048/88318760-74a20980-ccd8-11ea-9b7b-0d27f8eb6bce.png)

### Checklist

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/master/packages/kbn-i18n/README.md)
- [x] [Documentation](https://github.com/elastic/kibana/blob/master/CONTRIBUTING.md#writing-documentation) was added for features that require explanation or tutorials
- [x] [Unit or functional tests](https://github.com/elastic/kibana/blob/master/CONTRIBUTING.md#cross-browser-compatibility) were updated or added to match the most common scenarios

### For maintainers

- [x] This was checked for breaking API changes and was [labeled appropriately](https://github.com/elastic/kibana/blob/master/CONTRIBUTING.md#release-notes-process)
FrankHassanabad added a commit that referenced this pull request Jul 29, 2020
…t limits from 10 seconds for loads for imports (#73103) (#73582)

## Summary

By default the upload time limit for payloads is 10 seconds. This is really too short and we were getting internal QA bug reports that uploads are timing out on large value list importing. This PR adds the plumbing and unit tests to make the timeout configurable for routes.

* Adds a single timeout option for routes and then normalizes that through Hapi for the socket, payload, and server timeouts.
* Adds unit tests which test the various options
* Adds integration tests which test the various options
* Adds some NOTES about where there are odd behaviors/bugs within Hapi around validations and the timeouts
* Adds a configurable 5 minute timeout to the large value lists route

**Manual testing of the feature**

You can manually test this by adding a configurable option to your chrome network throttle like so below where you throttle upload by some configurable amount. I chose to use 300 kbs/s upload
<img width="556" alt="Screen Shot 2020-07-23 at 11 26 01 AM" src="https://user-images.githubusercontent.com/1151048/88318015-5ab3f700-ccd7-11ea-9d9b-7e3649ec65de.png">

And then run an import of large value lists using a large enough file that it will exceed 5 minutes:
![screen-shot-upload](https://user-images.githubusercontent.com/1151048/88318584-28ef6000-ccd8-11ea-90a1-8ca4aafabcb4.png)

After 5 minutes you should see this message within your server side messages if you have configured your kibana.dev.yml to allow for these messages:

```ts
server  respons [10:52:31.377] [access:lists-all] POST /api/lists/items/_import?type=keyword 408 318292ms - 9.0B
``` 

Note that it should show you that it is trying to return a `408` after `318292ms` the timeout period. Sometimes you will get the 408 in the browser and sometimes the browser actually will not respect the 408 and continue staying in a pending state forever. This seems to be browser side issue and not a client/user land issue. If you get the browser message it will be this error toaster

![timeout-message](https://user-images.githubusercontent.com/1151048/88318760-74a20980-ccd8-11ea-9b7b-0d27f8eb6bce.png)

### Checklist

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/master/packages/kbn-i18n/README.md)
- [x] [Documentation](https://github.com/elastic/kibana/blob/master/CONTRIBUTING.md#writing-documentation) was added for features that require explanation or tutorials
- [x] [Unit or functional tests](https://github.com/elastic/kibana/blob/master/CONTRIBUTING.md#cross-browser-compatibility) were updated or added to match the most common scenarios

### For maintainers

- [x] This was checked for breaking API changes and was [labeled appropriately](https://github.com/elastic/kibana/blob/master/CONTRIBUTING.md#release-notes-process)
@MindyRS MindyRS added the Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. label Sep 23, 2021
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:fix Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:SIEM v7.9.0 v7.10.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants