
Prebid Caching #663

Closed · bretg opened this issue Aug 31, 2018 · 18 comments

@bretg (Contributor) commented Aug 31, 2018

This is a proposed set of additions around server-side caching that affects Prebid Server, Prebid Cache, Prebid.js, and Prebid SDK.

Background

Several Prebid use cases require that ad response creatives be cached server-side and subsequently retrieved upon Prebid’s bidWon event. Client-side caching is effective for the standard use case of client-side bidAdapters competing for web display ads through Prebid.js. Other integration types such as Prebid for Mobile, Prebid Video, and Prebid for AMP either cannot use client-side caching, or pay an undesirable performance penalty to do so.

Prebid's cache offering is the Prebid Cache server, which works either on its own or in conjunction with Prebid Server to implement some of these caching use cases.

Use Cases

Scenarios supported by this set of requirements:

  1. As a web publisher, I want to be able to use Prebid.js to serve video ads using a mix of bidders that support server-side and client-side caching of VAST XML. I want to be able to define the TTL when stored from the client so certain adunits (e.g. longer videos) may have custom TTL values.
  2. As a web publisher, I want to be able to use Prebid.js to serve video ads via Prebid Server, with the ability to define caching behavior.
  3. As an app developer, I want to be able to use Prebid SDK and Prebid Server to implement header bidding and minimize network traffic by utilizing server-side caching. I don't want the creative body to be returned in the result, in order to save my users' network bandwidth and improve my application's performance.
  4. As an operator of a prebid server cluster, I want to be able to host multiple independent datacenters in a region to provide fault tolerance.

New Requirements

These are features not currently supported by the Prebid caching infrastructure.

  1. The system should allow the publisher to define what gets cached in each supported scenario: either the whole bid or just the creative body.
  2. The system should allow the publisher's request to define whether the creative body (adm) should be returned even when cached. The default should be 'yes', because that's the current Prebid behavior.
  3. A full URL to the cached asset should be returned in each bid response.
    These attributes should be made available to renderers in all cache scenarios, including from the prebidServerBidAdapter.
  4. The page should be able to specify an ad cache time-to-live (TTL) for each AdUnit. This is because some adunits may require longer cache periods than others. E.g. one customer wants to have a video unit where the VAST XML is cached for an hour while the default is 5 mins.
    1. Max TTLs should be configurable for each cache host company.
  5. Separate system mediaType default TTLs must be specifiable by the Prebid Server host company for video and for mobile. The hard coded system default TTL should be 300 seconds (5 mins) for both.
  6. The caching system should allow each publisher to be able to define their own TTL values by mediaType that override the system defaults.
  7. Prebid Server should use TTLs in this priority order:
    1. Request-specified TTL (e.g. this particular adunit has a TTL of 90 mins) (subject to configured Max TTL)
    2. Publisher mediaType configured TTL (e.g. all video for this publisher has a TTL of 60 mins) (server config)
    3. Format configured TTL (e.g. video on this cluster generally has a TTL of 30 mins) (server config)
    4. Hardcoded system default TTL (e.g. 5 min overall default) (server config)
  8. Operational reporting: Prebid Cache should log failed cache-writes and failed cache-reads as metrics.
  9. The system should support writing to multiple Prebid Cache servers. This enables operational redundancy so the same cache ID can be read from a cluster that didn't necessarily host the auction request. It would be better to do this with a distributed cache system, but this option could be useful for Prebid Server host companies. (See the sketch after this list.)
    1. The HTTP return code should reflect the result of the primary (local) cluster write.
    2. Failures writing to a secondary cluster should be logged as a metric and to the local log file.
  10. Prebid Server should return an additional key-value pair when an item is cached: hb_cache_hostpath. This value should be configurable for each cluster. It could be used by the Prebid Universal Creative to parameterize the cache settings for better portability.
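
To illustrate requirement 9, here is a rough Java sketch of a dual-write path in which the caller's HTTP status mirrors the primary write and secondary failures are only logged/metered. Class names and endpoints are hypothetical; this is not the actual Prebid Cache code.

// Illustrative sketch only; not actual Prebid Cache code.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

public class DualCacheWriter {

    private final HttpClient http = HttpClient.newHttpClient();
    private final URI primary;    // the local datacenter's Prebid Cache endpoint
    private final URI secondary;  // a remote datacenter's Prebid Cache endpoint

    public DualCacheWriter(URI primary, URI secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    // Writes the payload to both clusters. The returned status code comes from the
    // primary write only (requirement 9.1); secondary failures are merely logged/metered (9.2).
    public int put(String jsonPayload) throws Exception {
        // Fire the secondary write asynchronously so it never delays the auction path.
        CompletableFuture<HttpResponse<String>> secondaryWrite =
                http.sendAsync(postJson(secondary, jsonPayload), HttpResponse.BodyHandlers.ofString());

        secondaryWrite.whenComplete((resp, err) -> {
            if (err != null || resp.statusCode() >= 400) {
                // Placeholder for a real metric/log call (e.g. a graphite counter).
                System.err.println("secondary cache write failed: " + (err != null ? err : resp.statusCode()));
            }
        });

        // The caller's HTTP return code mirrors the primary write.
        HttpResponse<String> primaryResp =
                http.send(postJson(primary, jsonPayload), HttpResponse.BodyHandlers.ofString());
        return primaryResp.statusCode();
    }

    private HttpRequest postJson(URI target, String body) {
        return HttpRequest.newBuilder(target)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}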

Security

Security requirements for caching:

  1. The system should attempt to prevent specific cache IDs from being written by unauthorized sources. The goal is to prevent an attack where malware is inserted into the cache on a valid key that might be retrieved by a user.
  2. The system should be able to detect suspicious cache write behavior, such as one client inserting a large number of entries. (See the sketch after this list.)
  3. All cache writes and retrievals should be done over HTTPS.
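
As a sketch of what (2) could look like, the cache could keep a per-client write counter over a short window and emit a metric (or reject further writes) when it overflows. The threshold, window, and client key below are illustrative assumptions, not part of this proposal.

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CacheWriteRateMonitor {

    private static final int MAX_WRITES_PER_WINDOW = 1000;        // assumed threshold
    private static final Duration WINDOW = Duration.ofMinutes(1); // assumed window

    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    // Returns true if this client's write rate looks suspicious.
    public boolean recordWrite(String clientKey) {
        Window w = windows.compute(clientKey, (key, existing) -> {
            Instant now = Instant.now();
            // Start a fresh window if none exists or the old one has expired.
            if (existing == null || now.isAfter(existing.start.plus(WINDOW))) {
                return new Window(now);
            }
            return existing;
        });
        return w.count.incrementAndGet() > MAX_WRITES_PER_WINDOW;
    }

    private static final class Window {
        final Instant start;
        final AtomicInteger count = new AtomicInteger();
        Window(Instant start) { this.start = start; }
    }
}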

Proposed OpenRTB2 request and response

Request extensions:
{
…
  "imp": [{
      "exp": 3600,    // openRTB location for request TTL
      ...
  }],
  "ext": {
    "prebid": {
      "cache": {
        "vastXml": {
          returnCreative: false,   // new: don't return the VAST, just cache it
        },
        "bids": {
          returnCreative: true, 
        }
      }
    }
  }
… 
}

Response extensions:

{
…
  "seatbid": [{
    "bid": [{
      …
      "ext": {
        "bidder": {
          ...
        },
        "prebid": {
          "targeting": {
             …
             "hb_cache_hostpath": "prebid.adnxs.com/pbc/v1/cache"
             … 
          },
          "cache": {
             "vastXml": {
                 "url": "FULL_CACHE_URL_FOR_THIS_ITEM",
                 "cacheId": "1234567890A"
             },
             "bids": {
                 "url": "FULL_CACHE_URL_FOR_THIS_ITEM",
                 "cacheId": "1234567890B"
             },
           }
         }
       }
    }
  }],
… 
}
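
For context on how a renderer or the SDK might consume the proposed response extension (use case 3), here is a minimal sketch that reads ext.prebid.cache.vastXml.url from a bid and fetches the cached VAST rather than relying on bid.adm. It uses org.json and Java's built-in HttpClient; the class and method names are illustrative, not part of any Prebid library.

// Illustrative sketch only; not part of any Prebid SDK.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.json.JSONObject;

public class CachedVastFetcher {

    private final HttpClient http = HttpClient.newHttpClient();

    // Returns the cached VAST XML for a bid, or null if no cache URL is present.
    public String fetchVast(JSONObject bid) throws Exception {
        JSONObject ext = bid.optJSONObject("ext");
        JSONObject prebid = ext == null ? null : ext.optJSONObject("prebid");
        JSONObject cache = prebid == null ? null : prebid.optJSONObject("cache");
        JSONObject vastXml = cache == null ? null : cache.optJSONObject("vastXml");
        if (vastXml == null || !vastXml.has("url")) {
            return null; // fall back to bid.adm when the creative body was returned inline
        }
        HttpRequest request = HttpRequest.newBuilder(URI.create(vastXml.getString("url")))
                .GET()
                .build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}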

Proposed Prebid.js Configuration

Prebid.js needs to be updated to allow the publisher to specify caching parameters. Suggested config:

pbjs.setConfig({
  "cache": {
    url: "https://prebid-server.pbs-host-company.com/cache",
    ttl: 300
  },
  "s2sConfig": {
    …
    "video": {           // new format selector
      "ext.prebid": {    // merged into the openRTB2 request
        "cache": {
          "vastXml": {
            returnCreative: false
          }
        }
      }
    }
    …
  }
});

Appendix - Changes to current systems

If all of the requirements above are implemented, these are the changes that would be required.

Prebid.js - better support for s2s video header bidding

  • prebidServerBidAdapter: s2sConfig 'video.ext.prebid' support
  • prebidServerBidAdapter: making response.ext.prebid.cache values available.
  • prebidServerBidAdapter: always add ext.prebid.targeting.includewinners: true for openrtb
  • support ttl cache parameter

Prebid Cache

  • Support secondary cache config (cross-datacenter replication)
    • Accept and process new query parameter: secondaryCache
  • Establish graphite metrics for errors
    • org.prebid.cache.handlers.PostCacheHandler.error_existing_id
    • org.prebid.cache.handlers.PostCacheHandler.remote_error_rate
    • org.prebid.cache.handlers.GetCacheHandler.system_error_rate

Prebid Server

  • Accept and process new request parameter: returnCreative
  • Add new cache params to response
  • Generate hb_cache_hostpath targeting variable

Prebid SDK - in a server-side caching scenario

  • Add returnCreative=false to openRtb
  • Add asyncCaching option to SDK, pass async option through openRtb when specified
  • Add ttl option to SDK, pass ttl option through openRtb when specified
  • Make cache.url in response available to app code

Prebid Universal Creative

  • Support hb_cache_hostpath

(Note: async caching feature split out into #687)

@bretg (Contributor Author) commented Sep 7, 2018

Updated response example

@bretg (Contributor Author) commented Sep 12, 2018

Made a number of updates after feedback from AppNexus team:

  • removing cacheHost from resp.bid.ext.prebid.cache
  • supporting req.imp.exp instead of a value in req.ext.prebid.cache
  • moving per-account async config to be server-side
  • removed IP address detection security mechanism
  • clarified that PBS-assigned UUIDs support only async writes, not the datacenter replication scenario
  • added requirement for hb_cache_hostpath targeting variable
  • updated PBC metrics to utilize the metrics already implemented and fit new ones into that structure

Going to discuss the "two-endpoint" architecture with the team tomorrow.

@bretg (Contributor Author) commented Sep 17, 2018

Got feedback from another internal review that the ttl parameter on the PBC query string is unnecessary -- it's already supported within the protocol packet. So the proposal is to update PBJS to take a ttl argument on the cache object in setConfig and add it appropriately to the cache request.

@dbemiller (Contributor)

Could you give more details on what you mean by "within the protocol packet?"

@bretg (Contributor Author) commented Sep 18, 2018

Followup on the "two-endpoint" architecture. We've confirmed that both Redis and Aerospike support a mode where a given key can't be overwritten, and that performance of this mode is good. There's a slight cost (~10%). The proposal is that we make this feature configurable so PBS host companies can make the tradeoff between security and performance. So we don't intend to split out the uuid-specification feature to a separate endpoint -- instead, added requirement 21:

  21. The cache server should also have a configuration which defines whether uuid is accepted as a parameter. The general idea is that a PBS cluster will run in one of two modes: either the caching layer prevents cache entries from being overwritten, or the cache won't accept UUIDs on the request, which disables the 'asynchronous cache' feature. (See the sketch below.)
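
For reference, the overwrite-protection mode maps naturally onto Redis's SET ... NX, which writes a key only if it does not already exist, so a request-supplied UUID cannot clobber an existing entry. A minimal sketch using the Jedis client; the key prefix and wiring are assumptions, not the Prebid Cache implementation.

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class WriteOnceCache {

    private final Jedis jedis;

    public WriteOnceCache(Jedis jedis) {
        this.jedis = jedis;
    }

    // Stores the payload under the given uuid with a TTL, refusing to replace an
    // existing entry. Returns false if the key was already present.
    // The "prebid_cache:" key prefix is an illustrative assumption.
    public boolean putIfAbsent(String uuid, String payload, int ttlSeconds) {
        String result = jedis.set("prebid_cache:" + uuid, payload,
                SetParams.setParams().nx().ex(ttlSeconds));
        return "OK".equals(result); // Redis returns nil (null here) when NX blocks the write
    }
}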

@bretg (Contributor Author) commented Sep 18, 2018

Could you give more details on what you mean by "within the protocol packet?"

Apparently Prebid Cache Go and Prebid Cache Java have diverged more than I realized. The Java version supports an 'expiry' attribute on the POST, as well as a uuid key.

@dbemiller (Contributor) commented Sep 18, 2018

The proposal is that we make this feature configurable so PBS host companies can make the tradeoff between security and performance.

Imagine the experience of a publisher who wants to switch PBS host companies, or one who starts out running PBS themselves and decides to use a host company instead because it's more trouble than it's worth.

Imagine a publisher trying to read documentation to figure out how to use PBS, if the behavior depends on configs that they can't even see, or which a host company might change at any time without their notice.

This seems like a bad idea for everyone involved.

@bretg (Contributor Author) commented Sep 18, 2018

Here's the proposed story:

  • As a PBS host company, you can decide whether to support asynchronous caching or not.
  • As a publisher, you can decide which PBS host company you want to use, considering many factors, including whether you want the asynchronous caching feature.

This does not appear to be an unreasonable or unworkable situation.

Having a two-VIP architecture adds fairly significant complexity in setup and debugging, so it would only be utilized by PBS host companies that want to support asynchronous caching. So really it comes down to what sort of complexity is required to support asynchronous caching:

  1. two-vip solution
  • setup an internal-only VIP that responds to /cache-uuid
  • configure PBS to use /cache-uuid (per account)
  • no need for a caching layer that supports overwrite prevention
  2. configuration
  • configure PBS to use async caching (per account)
  • use a caching layer that supports overwrite prevention
  • configure PBC to support async caching (utilizing overwrite prevention)

Both cases require configuration, but #2 has fewer moving parts to break.

@bretg (Contributor Author) commented Sep 18, 2018

We do need to address the divergence between PBC-Go and PBC-Java. More on that in a separate conversation.

@dbemiller (Contributor)

Might be a good idea to break this proposal into smaller pieces. Many parts of it are good ideas no matter what... but there's a lot to discuss about this async one.

Our consensus over here was basically: "let's run an experiment." Config & publisher-facing options are great if there are legitimately good reasons to make different choices... but they're horrible if one way is just "better".

My intuition here is that async would just be better across the board... but intuition counts for much less than concrete math or experimental data.

If you're open to this, I can open a new issue for it and we can discuss in more detail.

@bretg (Contributor Author) commented Sep 18, 2018

Yes, we can leave the async caching feature aside for now.

Have split out the relevant requirements into a separate issue -- #687

@dbemiller (Contributor) commented Sep 20, 2018

Min and max TTLs should be configurable for each cache host company.

Max TTL config makes sense because host companies have finite hardware capacity... but what's the use-case for min TTLs?

The caching system should allow each publisher to be able to define their own TTL values by mediaType that override the system defaults.

The publisher already has per-AdUnit cache control in (4)... so this introduces a data redundancy in the request.

I see how this would be a convenient option for publishers... but it's worth noting that the Prebid Server API isn't really publisher-facing. Publishers use PBS through Prebid.js, and edit Stored Requests through a GUI.

Prebid Server should use TTLs in this priority order:

Asking for clarification: where do you see the configs the host company sets in this hierarchy?

Our opinion was that the "max TTL" config took precedence over everything, because only the host company knows what their hardware can support.

@hhhjort (Collaborator) commented Sep 20, 2018

Adding some support for reading exp from the imp and bid, and sending a ttl to prebid cache appropriately. Short term this will help optimize cache utilization. #684

@bretg (Contributor Author) commented Oct 1, 2018

what's the use-case for min TTLs?

It doesn't make sense to cache for less than a couple of seconds - it's an edge case, but the idea is to avoid read misses.

where do you see the configs the host company sets in this hierarchy?

Most of them are host company configs:

  • Request-specified TTL (e.g. this particular adunit has a TTL of 90 mins) (from request)
  • Publisher mediaType configured TTL (e.g. all video for this publisher has a TTL of 60 mins) (in server config)
  • Format configured TTL (e.g. video on this cluster generally has a TTL of 30 mins) (in server config)
  • Hardcoded system default TTL (e.g. 5 min overall default) (in config and code?)

The idea behind PBS account-level config is that overrides will be rare and can be supported as config by the PBS host company for important accounts.

Also - updated the cache response to be able to carry cache urls for both vastXml and bids. This accounts for the use case where both are requested.

@hhhjort (Collaborator) commented Oct 1, 2018

Since we have stored requests, I am not sure that publisher-level default TTLs are really needed. Stored requests provide even more granular control, with the downside that the TTL must be set per stored request rather than simply per media type. I am not against it per se, but would rather wait and see if there is demand before adding it preemptively.

There is also the issue of adding too many controls on the TTL. The more rules we have for setting the TTL, the more difficult it becomes to debug why the cache expires when it does. And of course the system needs to run through the entire logic tree to determine the actual TTL on every cache request, which consumes resources and adds latency.

For min TTL, I think it may be better to just let the ads fail to cache and have the issue caught quickly, rather than trying to second guess what the publisher meant. For example, let us say that we have a default TTL of 5 minutes, but the publisher realizes it can sometimes take a bit more than 5 minutes before the cache call is made. They want to bump it up to 10 minutes, but accidentally set it to 10 seconds instead. Now if we had a min TTL of 2 or 5 minutes, that TTL might still be enough to get the majority of the publisher's calls. But it could lead to a lot of confused debugging as they try to determine why the bump in TTL did not improve the cache performance, and perhaps degraded it. If however we let the 10 second TTL stand, they should recognize and catch the issue fairly quickly, and get the TTL they actually want in place much sooner.

@dbemiller (Contributor)

Most of them are host company configs

Yeah... sorry, I wasn't clear. I meant to ask about the Max TTL allowed by PBS host. You listed it as a requirement in (4), but it wasn't clear where it sat in the hierarchy of (7).

It seems to me like that should take the highest precedence, since otherwise it's a hardware liability for the PBS host.

@bretg (Contributor Author) commented Oct 11, 2018

  • Updated (7) to clarify that the incoming TTL is compared to the configured max. The other values are in the server config, so are under the control of the host company.
  • Remove min TTL

Here's the pseudo-code implemented by PBS-Java

if imp.exp is specified, use min(imp.exp, configuredMaxTTL)
else if ext.prebid.cache.*.ttlseconds is specified, use min(ttlseconds, configuredMaxTTL)
else if an account ID is available from request.app.publisher or request.site.publisher
       and a mediaType TTL is configured for that account, use the account value
else if a mediaType TTL is configured for the cluster, use the cluster value
else, finally, just use the default
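
The same resolution order as a small Java sketch; the config interface and names below are hypothetical stand-ins for the auction.cache.* properties listed below, not the actual PBS-Java code.

public final class CacheTtlResolver {

    private final CacheConfig config; // hypothetical holder for the server config values

    public CacheTtlResolver(CacheConfig config) {
        this.config = config;
    }

    public int resolveTtl(Integer impExp, Integer requestTtlSeconds,
                          String accountId, String mediaType) {
        if (impExp != null) {                       // imp.exp from the request
            return Math.min(impExp, config.maxTtlSeconds());
        }
        if (requestTtlSeconds != null) {            // ext.prebid.cache.*.ttlseconds
            return Math.min(requestTtlSeconds, config.maxTtlSeconds());
        }
        Integer accountTtl = accountId == null ? null
                : config.accountMediaTypeTtl(accountId, mediaType);
        if (accountTtl != null) {                   // per-account mediaType config
            return accountTtl;
        }
        Integer clusterTtl = config.mediaTypeTtl(mediaType);
        if (clusterTtl != null) {                   // cluster-wide mediaType config
            return clusterTtl;
        }
        return config.defaultTtlSeconds();          // hard-coded system default, e.g. 300s
    }

    // Hypothetical view over the configuration values.
    public interface CacheConfig {
        int maxTtlSeconds();
        Integer accountMediaTypeTtl(String accountId, String mediaType);
        Integer mediaTypeTtl(String mediaType);
        int defaultTtlSeconds();
    }
}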

Here are the server config values in the PBS-Java PR

  • auction.cache.expected-request-time-ms - approximate time in milliseconds spent interacting with the Cache Service. This time will be subtracted from the global timeout.
  • auction.cache.banner-ttl-seconds - how long (in seconds) a banner creative will be available in the Cache Service.
  • auction.cache.video-ttl-seconds - how long (in seconds) a video creative will be available in the Cache Service.
  • auction.cache.account.<ACCOUNT>.banner-ttl-seconds - how long (in seconds) a banner creative will be available in the Cache Service for a particular publisher account. Overrides auction.cache.banner-ttl-seconds.
  • auction.cache.account.<ACCOUNT>.video-ttl-seconds - how long (in seconds) a video creative will be available in the Cache Service for a particular publisher account. Overrides auction.cache.video-ttl-seconds.

It may be reasonable to place the account-level values in the Accounts DB table at some point, but for now we don't envision these values changing much, don't really want to encourage non-standard timeouts, and reading/caching/updating DB entries is harder than static config.

stale bot commented Aug 8, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Aug 8, 2019
stale bot closed this as completed Aug 15, 2019