proposal: requirement for having reasonable timeouts for (external) services #1778

elarlang · 2023-11-08T07:31:20Z

update: to skip long discussion, jump here: #1778 (comment)

Via #1777 we have proposed new category with 2 requirements:

## V12.7 Application Resources

#	Description	L1	L2	L3	CWE
12.7.1	[ADDED] Verify that the application is designed in a way that a failure to access external resources does not result in the entire application failing, for example using the circuit breaker pattern.		✓	✓
12.7.2	[ADDED] Verify that the application proactively releases system resources, such as database connections, open files, threads, etc, when it finishes using them to prevent resource exhaustion.		✓	✓	404

For 12.7.1 there is issue #1748 and for 12.7.2 there is issue #1620.

At the moment it seems we need one more requirement which addresses timeout for connections:

12.7.1 is for (fail safe) "when you can not connect at all"
12.7.2 is for "when everything worked well, but you did not release the resources"

The point for proposed requirement - "when you are able to connect, but you can not get answer with expected time". Personally I think the likelihood for this scenario is much higher than for the proposed 12.7.1 and 12.7.2.

From denial of service point of view - it matters when it is not done for synchronous flow, such as calling some external connection / resource when building response for HTTP request.

It can be problematic for cron-jobs as well, but there it does not increase the likelihood for DoS that much.

The text was updated successfully, but these errors were encountered:

tghosth · 2023-11-08T12:59:15Z

I think this should be added to 12.7.1:

#	Description	L1	L2	L3	CWE
12.7.1	[ADDED] Verify that the application is designed in a way that a failure to access external resources or a delay in response does not result in the entire application failing, for example using the circuit breaker pattern.		✓	✓

elarlang · 2023-11-08T13:13:32Z

12.7.1 - the focus is - when connection fails, then application should still give at least some answer - so the situation must be handled via error handling

the focus for proposed one is connectionpool lockup - every connection must have (relatively short) timeout, maybe also connection pool+queue and when it's full - it already gives response that service is not available, not putting another request to the queue and cause denial of service

Idea is to address things like:

CWE-410: Insufficient Resource Pool
CWE-1088: Synchronous Access of Remote Resource without Timeout

Here also extra logging requirement that all connection fails and timeouts must be logged.

tghosth · 2023-11-09T09:28:02Z

12.7.1 - the focus is - when connection fails, then application should still give at least some answer - so the situation must be handled via error handling

I don't agree. The focus on making sure the application still operates correctly that could be providing a sensible error message but also carrying on other operations as well.

Please can you suggest a requirement text, I am still not seeing the difference you are proposing.

elarlang · 2023-11-11T16:26:09Z

Maybe there are even 2 requirements.

Connection pool:

Problem to solve: if by default the connection pool to resource is really small, it will be bottle-neck for peak demand, all the connection pool is locked and the application can not provide availability (denial of service attack).
Requirement: Verify that the application has reasonable connection pool for every internal and external connection to handle peak demand.
CWE: CWE-410

Timeout:

Problem to solve: if there is problem on the external resource side or there is connection issue or there is really slow query (impact by attacker activity, mistake in program code, heavy demand, performance issues) then there is no connection pool lockout.
Requirement: Verify that the application has set reasonable timeout for every synchronous call to external service (to avoid connection pool lockout when there is connection or performane problem)
CWE: CWE-1088

tghosth · 2023-11-14T16:16:13Z

Time out / CWE-1088

We currently have:

#	Description	L1	L2	L3	CWE
12.7.1	[ADDED] Verify that the application is designed in a way that a failure to access external resources does not result in the entire application failing, for example using the circuit breaker pattern.		✓	✓

The whole point of circuit breaker pattern is to prevent this sort of lockout/lockup when relying on external resources. I think maybe we should just add CWE 1088 to this requirement. Your text seems like an example of the situation covered by this requirement.

Connection pool / CWE-410

We currently have:

#	Description	L1	L2	L3	CWE
12.7.2	[ADDED] Verify that the application proactively releases system resources, such as database connections, open files, threads, etc, when it finishes using them to prevent resource exhaustion.		✓	✓	404

I think your connection pool is similar case to this. but maybe we can add it more explicitly like this:

Verify that the application proactively releases system resources, such as database connections, open files, threads, etc, when it finishes using them or alternatively uses an appropriately sized resource pool to ensure resource reuse and to prevent resource exhaustion.

What do you think @elarlang ?

elarlang · 2023-11-14T17:05:58Z

If you have circuit braking handled (application does not die from program code logic perspective / the error is handled) - it does not eliminate the need to have timeouts set (and vice versa). Those are separate problems.

With second example for me it also combined two different problems to one requirement.

In a bit bigger picture - we created new separate section for those requirements, modern applications use external services a lot and those things getting more and more important. Shorter, one problem per requirement, requirements are better in my opinion.

jmanico · 2023-11-14T19:19:15Z

I personally this this is not necessary and see it more as a functionality vs security problem. Suggest we drop it.

elarlang · 2023-11-14T19:42:17Z

I personally this this is not necessary and see it more as a functionality vs security problem. Suggest we drop it.

If we talk about timeouts here - I have proved in pen-test cases it is clearly security issue. If there is some time consuming call to external service, you find the really slow response response build, hammer it, take all resources in use because they don't have reasonable timeout and it's service down. Clear availability issue.

elarlang · 2023-11-14T19:49:35Z

App level DoS is serious thing. Especially, when there is high SLA and availability requirement and/or it provides service to another applications. For me this vector is underestimated in general and also maybe under-covered in ASVS.

Everyone talk about DDoS like just sending huge amount of packets towards application. This is just stupid way to do it. Mostly it's doable with pushing some resource demanding and/or time-consuming service and you can take the service busy. From big picture network level you can not even notice it.

elarlang · 2023-11-14T20:16:14Z

I try to conclude the situation here:

12.7.1 - circuit braking - application must still give reasonable response, if there is issue with connection (it suits also to category 7.4)
12.7.2 - release resources after finishing using them
+1 - have timeouts set for avoiding app level dos or collection dying connection to resource pool lockout
+1 - have reasonable connection pool set that you can not occupy all connection with dos or it just won't happen with peak visiting

Those are all separate problems. If you just fix 3 of them, the 4th one is still an issue (for me personally 12.7.2 is the less problematic here, but it's separate topic)

tghosth · 2023-11-16T12:23:51Z

I think I could get on board with the connection pool requirement.

Maybe:

Verify that the application uses appropriately sized resource pools or can dynamically resize them to ensure that a volumetric Denial of Service or peak-time load will not lead to the application becoming unavailable.

On the other hand, I think the circuit breaker pattern comes to specifically address the timeout problem to the extent that after a while it stops trying altogether

We could modify to:

Verify that the application is designed in a way that a failure to access external resources within pre-defined timeouts does not result in the entire application failing, for example using the circuit breaker pattern.

elarlang · 2023-11-16T16:47:46Z

Verify that the application uses appropriately sized resource pools or can dynamically resize them to ensure that a volumetric Denial of Service or peak-time load will not lead to the application becoming unavailable.

For dynamic size there should be also max-limit. If you have "auto-scale" cloud environment and attacker takes resource pool in use, then this constant auto-scale you have later written to the bill as well.

Like I wrote before, then circuit braking must only handle the fact that application can give response, if there is whatever connection issue.

tghosth · 2023-11-23T14:55:55Z

Like I wrote before, then circuit braking must only handle the fact that application can give response, if there is whatever connection issue.

Following discussion with Elar, the current "circuit breaker" requirement is:

Verify that the application is designed in a way that a failure to access external resources within pre-defined timeouts does not result in the entire application failing, for example using the circuit breaker pattern.

This is a slightly reactive mechanism that is basically telling the application to fail gracefully without completely becoming unavailable.

Elar's point is that there should be proactive use of timeouts to ensure that a large/complicated query will not run for longer than the timeout and afterwards the resource will be freed.

A caveat is that even if timeouts are implemented in the application, the database may still be processing the query and therefore tied up executing it.

elarlang · 2023-12-15T21:01:42Z

Just for linking update - 12.7.2 in the issue definition is now 12.7.1, and previous 12.7.1 is now 7.4.4 (see e5c4a4b).

At the moment it stands like this:

#	Description	L1	L2	L3	CWE
7.4.4	[ADDED] Verify that the application is designed in a way that a failure to access external resources does not result in the entire application failing, for example using the circuit breaker pattern.		✓	✓
12.7.1	[ADDED] Verify that the application proactively releases system resources, such as database connections, open files, threads, etc, when it finishes using them to prevent resource exhaustion.		✓	✓	404

tghosth · 2023-12-28T10:43:56Z

So did you still want to add another requirement here?

tghosth · 2024-01-24T09:41:29Z

@elarlang is this on your todo list?

elarlang · 2024-01-24T09:48:33Z

@elarlang is this on your todo list?

yes. I keep labels (at least those that are related to me) up-to-date, so if labels are saying so, it is so.

jmanico · 2024-07-29T11:53:44Z

I like the idea of moving 7.4.4 to 12.7.2

So far we have

#	Description	L1	L2	L3	CWE
12.7.1	[ADDED] Verify that the application proactively releases system resources, such as database connections, open files, threads, etc, when it finishes using them to prevent resource exhaustion.		✓	✓	404

Then moving 7.4.4 we have

#	Description	L1	L2	L3	CWE
12.7.1	[ADDED] Verify that the application proactively releases system resources, such as database connections, open files, threads, etc, when it finishes using them to prevent resource exhaustion.		✓	✓	404
12.7.2	[MOVED from 7.4.4] Verify that the application is designed in a way that a failure to access external resources does not result in the entire application failing, for example using the circuit breaker pattern.		✓	✓

jmanico · 2024-09-04T15:46:27Z

I think 12.7.1 is more of an API issue and does not belong in 12.x. I suggest this is moved to 13.x. Timeouts are rarely a problem with file upload services or file download and I think it confuses the intent of this section.

elarlang · 2024-09-04T19:47:56Z

Comment for last Jim's comments:

note that 7.4.4 was moved away from this 12.* section and is not proposed as an idea to move it back (it is about error situation handling in the program code)
V12 title is "File and Resources" and V12.7 title is "Application Resources". If there is proposal to move 12.7 to somewhere else, it is a separate issue to discuss.

I need to work through my previous ideas to provide proposals.

jmanico · 2024-09-05T09:08:07Z

I like the idea of dropping 12.7.1 and all of section 12.7 from v12. Let's do it! v12 is close to being "done" at some level.

tghosth · 2024-09-08T18:11:55Z

hi @elarlang, can you confirm what the currently proposed action here is?

tghosth · 2024-11-05T10:59:01Z

Waiting for feedback from @elarlang

cronchie · 2024-11-07T12:10:17Z

I'd be open to moving/modifying the current 12.7.1 into the proposed text of #2281. The circuit breaker pattern is a fantastic retry strategy, and coupled with the backoff + jitter proposed in #2281, it can help protect against DoSing external applications.

elarlang · 2024-11-10T10:48:20Z

Awakening the monster here...

I reworked comments from the issue and made a summary with some new point of views, we can say that this comment is a new starting point without the need to go through the previous comments here.

First:

"circuit breaker" fail-safe is covered by 7.4.4
release resources after (successful) usage is covered by 12.7.1

We also have requirement for anti-automation that is closely related and maybe also covering some problems:

V11.2.2 Verify that anti-automation controls are in place to protect against excessive calls to application functions that could lead to data exfiltration, garbage data creation, quota exhaustion, rate limit breaches, denial of service, or overuse of costly resources.

The issue is, that connection-pool and resource-demanding or slow-responses you often run under the general "anti-automation" radar. Most likely we need to improve V11.2.2, avoid duplicates, check overlaps etc.

Problems to solve/cover:

Problem 1: The connection pool is too small
- Example: gets filled when attacker hammering some resource-heavy functionality
- Example: gets filled by peak-demand
- area/solution: general availability and performance testing area; configuration, and auto-scale question - auto-scale must be with limit, otherwise the bill also auto-scales. May require documented decision per connection/service used
Problem 2: connection failure time-out is too long
- Example: service (to connect) is not available and the application waits for too long
  - classical solution - waiting for timeout 1 minute, although it is clear, that if the connection is not successful in a few seconds, it is failed anyway
  - attacker can fill connection pool fast
  - for heavily used functionality connection pool gets filled without an attacker
- area/solution: for just connectivity, it fails fast. May require documented decision per connection/service used
Problem 3: slow-response time-out is too long
- complicated area, as sometimes services are resource and time-demanding to produce a response
- Example: response from (an external) service takes too long but without reaching the timeout
  - an attacker can hit the endpoint and keep the server busy and the connection pool full
  - may not be only a connection pool problem, as resource-demanding responses may occupy other available resources, such as memory
- Example: response from (an external) service takes too long and front-end connection gets timeout - but backend still processing the request, although there is no one waiting for the response anymore
  - classical example - slow database query, front-end gets timeout but database engine still processes the query, but no one waits for the answer
  - easy-dos scenario for an attacker
  - may happen under heavy usage
- area/solution:
  - The fail-strategy must be analyzed per service used (can be different for different services using the same connection)
  - general definition for slow-queries AND if building some response is resource demanding, it must happen asynchronously with parallel-process count limit
  - Definition for slow-query per functionality/service call - to have alerting and monitoring capability. From "a security event" point of view is covered by V1.7+V7 requirements, but the problem or need can be highlighted in a documentation requirement.

elarlang mentioned this issue Nov 8, 2023

Resource exhaustion, improper closing of resources. #1620

Closed

elarlang added the 1) Discussion ongoing Issue is opened and assigned but no clear proposal yet label Nov 8, 2023

elarlang assigned tghosth and elarlang Nov 8, 2023

elarlang added the V12 label Nov 8, 2023

elarlang unassigned tghosth Nov 9, 2023

elarlang added the 2) Awaiting response Awaiting a response from the original poster label Nov 9, 2023

elarlang assigned tghosth Nov 11, 2023

elarlang removed the 2) Awaiting response Awaiting a response from the original poster label Nov 11, 2023

elarlang mentioned this issue Nov 14, 2023

Add a new 12.7 Application Resources application resources section #1777

Merged

elarlang mentioned this issue Nov 24, 2023

clarification: application need to behave the situation when some resource is not available (current 12.7.1) #1748

Closed

elarlang unassigned tghosth Dec 28, 2023

elarlang added the 3) awaiting proposal There is some discussion in issue and reach to some results but it's not concluded with clear propos label Dec 28, 2023

tghosth added the _5.0 - prep This needs to be addressed to prepare 5.0 label Jan 24, 2024

jmanico mentioned this issue Nov 5, 2024

Update 0x18-V10-Coding.md #2222

Merged

elarlang mentioned this issue Nov 7, 2024

V7 Add transient error handling #2281

Open

tghosth assigned cronchie and unassigned elarlang Nov 7, 2024

elarlang self-assigned this Nov 10, 2024

elarlang added 1) Discussion ongoing Issue is opened and assigned but no clear proposal yet next meeting Filter for leaders and removed 2) Awaiting response Awaiting a response from the original poster labels Nov 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: requirement for having reasonable timeouts for (external) services #1778

proposal: requirement for having reasonable timeouts for (external) services #1778

elarlang commented Nov 8, 2023 •

edited

Loading

tghosth commented Nov 8, 2023

elarlang commented Nov 8, 2023 •

edited

Loading

tghosth commented Nov 9, 2023

elarlang commented Nov 11, 2023

tghosth commented Nov 14, 2023

elarlang commented Nov 14, 2023 •

edited

Loading

jmanico commented Nov 14, 2023

elarlang commented Nov 14, 2023

elarlang commented Nov 14, 2023

elarlang commented Nov 14, 2023

tghosth commented Nov 16, 2023

elarlang commented Nov 16, 2023

tghosth commented Nov 23, 2023 •

edited

Loading

elarlang commented Dec 15, 2023 •

edited

Loading

tghosth commented Dec 28, 2023

tghosth commented Jan 24, 2024

elarlang commented Jan 24, 2024

jmanico commented Jul 29, 2024 •

edited

Loading

jmanico commented Sep 4, 2024

elarlang commented Sep 4, 2024

jmanico commented Sep 5, 2024

tghosth commented Sep 8, 2024

tghosth commented Nov 5, 2024

cronchie commented Nov 7, 2024

elarlang commented Nov 10, 2024 •

edited

Loading

proposal: requirement for having reasonable timeouts for (external) services #1778

proposal: requirement for having reasonable timeouts for (external) services #1778

Comments

elarlang commented Nov 8, 2023 • edited Loading

tghosth commented Nov 8, 2023

elarlang commented Nov 8, 2023 • edited Loading

tghosth commented Nov 9, 2023

elarlang commented Nov 11, 2023

tghosth commented Nov 14, 2023

Time out / CWE-1088

Connection pool / CWE-410

elarlang commented Nov 14, 2023 • edited Loading

jmanico commented Nov 14, 2023

elarlang commented Nov 14, 2023

elarlang commented Nov 14, 2023

elarlang commented Nov 14, 2023

tghosth commented Nov 16, 2023

elarlang commented Nov 16, 2023

tghosth commented Nov 23, 2023 • edited Loading

elarlang commented Dec 15, 2023 • edited Loading

tghosth commented Dec 28, 2023

tghosth commented Jan 24, 2024

elarlang commented Jan 24, 2024

jmanico commented Jul 29, 2024 • edited Loading

jmanico commented Sep 4, 2024

elarlang commented Sep 4, 2024

jmanico commented Sep 5, 2024

tghosth commented Sep 8, 2024

tghosth commented Nov 5, 2024

cronchie commented Nov 7, 2024

elarlang commented Nov 10, 2024 • edited Loading

elarlang commented Nov 8, 2023 •

edited

Loading

elarlang commented Nov 8, 2023 •

edited

Loading

elarlang commented Nov 14, 2023 •

edited

Loading

tghosth commented Nov 23, 2023 •

edited

Loading

elarlang commented Dec 15, 2023 •

edited

Loading

jmanico commented Jul 29, 2024 •

edited

Loading

elarlang commented Nov 10, 2024 •

edited

Loading