Refreshing file-based stored configs #1402

laurb9 · 2020-07-16T18:55:30Z

Refreshable FileFetcher

The current FileFetcher eagerly loads all the configs in memory and never looks back. It does so by recursively scanning the directory, possibly intended to support a hashed path or other file organization. This makes it fast and shareable, but less suited if stored requests need to be updated within the lifetime of the server process, which is our use case.

I am proposing a FileFetcher whose behavior is more consistent with the other Fetchers, as in can retrieve items on-demand, but does not store the data itself. This allows it to work with the Cache like the others, which in turn makes it refreshable in a way that is configurable externally.

Implementation

If we need to maintain compatibility with existing path structures, the existing FileFetcher is not ideal for this. The recursive directory ingestion means that the 1:1 mapping between the requested key and the resource, needed for on-demand loading, does not exist. We could attempt direct mapping when it exists (flat dir), but this will result in inconsistent behavior.

Alternatively, a separate clean FileFetcher can be implemented with the defined behavior.

(edit) now that I wrote all this, I realize two filefetchers can be confusing so a config mode switch is probably better.

SyntaxNode · 2020-07-16T22:07:43Z

Thank you for the proposal.

The most common use case of the FileFetcher is local testing, which is likely why it was built to only load configurations at startup. I certainly have no issue with it being expanded to also include a lazy loading approach. I don't recall how the Cache wrapper is wired up, but so long as it would apply to the changes proposed, I'm on board.

I agree having two top level FileFetchers can be a bit confusing.

It looks like the FileFetcher currently returns an eagerFetcher. A nice approach could be implement a lazyFetcher which always goes direct to the file system and like you suggested, using a config flag to decide which one to instantiate. Perhaps another option can be to create a less forgiving Cache layer which binds to the first read and never again refreshes - moving that concern out of the FileFetcher entirely.. Again though, I'm not sure where the Cache is wired up and how easy that may be to change.

By "recursively scanning the directory" are you referring to it loading all files in the directory as opposed to searching for a specific file on demand? That's an area of code I haven't looked at in a while. I know there's an option somewhere in there to load a specific stored request... but I don't know how that's expressed in code. Do you know where that is?

(edit) now that I wrote all this, I realize two filefetchers can be confusing so a config mode switch is probably better.

+1

Are you planning to implement this proposal?

laurb9 · 2020-07-17T04:25:20Z

On closer look, I was wrong about the mapping. One only expects to find stored_imps, stored_requests and category mappings with a flat structure under them, even though the code would continue to recurse were that not the case (collectStoredData in https://github.com/prebid/prebid-server/blob/master/stored_requests/backends/file_fetcher/fetcher.go#L84). So the 1:1 mapping can in fact be done.

There seem to be two code bases in pbs, legacy and openrtb2, with different sets of backends, metrics and support code, and I often find myself in the wrong place after a code search... but I think the cache for openrtb2 is a layer on top of a set of the other fetchers - file, db, api: https://github.com/prebid/prebid-server/blob/master/stored_requests/fetcher.go#L157
The overall fetcher setup is a bit involved because it attempts to fall back down the list to complete the list of ids it couldn't find in earlier stages.

I think replacing with a lazyFetcher in this case could work, I'll give it a try. An option to preload all would replicate existing behavior but it would be simpler without. I'd like to see if this can be used for account configs as well, in lieu of listing all of them in a yaml file (I opened #1395 about that).

SyntaxNode · 2020-07-17T18:50:28Z

So the 1:1 mapping can in fact be done.

Fantastic.

There seem to be two code bases in pbs, legacy and openrtb2,

Correct. The legacy endpoints and model are outdated and no longer maintained. You don't need to worry about feature parity. We're working on defining an end date to remove that code completely. I think PrebidJS dropped their support for the legacy endpoints in 3.0.

The overall fetcher setup is a bit involved because it attempts to fall back down the list to complete the list of ids it couldn't find in earlier stages.

That's intended behavior. We want to support multiple fetchers to allow hosts to use multiple sources.

I think replacing with a lazyFetcher in this case could work, I'll give it a try.

Awesome! That's really appreciated.

I'd like to see if this can be used for account configs as well, in lieu of listing all of them in a yaml

I commented in the other issue. @bsardo is working on some aspects of account configs. It would great if you could collaborate and work on different parts of the account config system.

laurb9 · 2020-07-21T04:00:53Z

@SyntaxNode This is in principle what I was thinking about regarding the lazy fetcher: #1411
It's a quick draft but it's shorter and tests are passing. And adding account fetcher would take a few more lines.

I've found that FetchCategories is written differently and still caches the parsed json forever. Because this happens in http_fetcher too, a different approach looking upstream of it would be better.

Demonstrates the concept of fetching account configurations via the fetcher framework. db and http api can be added fairly easily on top. The account schema is defined in config so it can be loaded from yaml too if want. ```yaml accounts: filesystem: enabled: true directorypath: "./stored_requests/data/by_id" ``` Introduces the concept of default account, to control global account settings that were originally in the main config. For example, this can replace account_required=true: ``` default_account: disabled: true ``` - included sample test account (disabled) - no unit tests - no cache yet, turns out cache setup is specific to stored requests - no refactoring, although refactoring the code around could help a lot

- Defines the Account object with per-publisher settings - Adds a cfg.DefaultAccount for global settings that are overrideable per publisher - Expands account validation to require a valid account not just any non-empty value - Makes the resolved account object available to auction and analytics - Cache TTL can now be controlled at account level - Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API (database to be added later) Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json` ```yaml accounts: filesystem: enabled: true directorypath: "./stored_requests/data/by_id" ``` fixup

- Defines the Account object with per-publisher settings - Adds a cfg.DefaultAccount for global settings that are overrideable per publisher - Expands account validation to require a valid account not just any non-empty value - Makes the resolved account object available to auction and analytics - Cache TTL can now be controlled at account level - Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API (database to be added later) Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json` ```yaml accounts: filesystem: enabled: true directorypath: "./stored_requests/data/by_id" ```

- Defines the Account object with per-publisher settings - Adds a cfg.DefaultAccount for global settings that are overrideable per publisher - Expands account validation to require a valid account not just any non-empty value - Makes the resolved account object available to auction and analytics - Cache TTL can now be controlled at account level - Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API (database to be added later) Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json` ```yaml accounts: filesystem: enabled: true directorypath: "./stored_requests/data/by_id" ``` Added http api for accounts Same model as for requests aka `GET {endpoint}?account=["<accountID>"]`, to support the same API. AccountFetcher returns json.RawMessage - Use DefaultAccount as account defaults for every account read Changed DefaultAccount -> AccountDefaults. Use AccountDefaults.Disabled to require valid accounts to exist. Revert a02288d to remove account support from http api Without cache support, it is not production ready. Address feedback- check for empty pubID first in getAccount()- getAccount() returns nil account with error, instead of sometimes account and sometimes nil- remove not-yet used account fields from schema- remove warning about blacklisted_accts- fix accountDefaultDisabled parameter naming in tests getAccount() returns collected errors as a slice so they can be returned in the response amp_auction didn't handle error slice so I added some code borrowed from auction. multifetcher collects all FetchAccount errors to pass upstream Address prebid#1426 (comment) Add unit test for AccountFetcher. Fix MultiFetcher.FetchAccount() fallback, add unit tests FetchAccount() should return NotFound error instead of nil, so MultiFetcher tries the next fetcher in line (once we have it).

…s the Account object with per-publisher settings- Adds a cfg.DefaultAccount for global settings that are overrideable per publisher- Expands account validation to require a valid account not just any non-empty value- Makes the resolved account object available to auction and analytics- Cache TTL can now be controlled at account level- Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API (database to be added later)Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json````yaml accounts: filesystem: enabled: true directorypath: "./stored_requests/data/by_id"```Added http api for accountsSame model as for requests aka `GET {endpoint}?account=["<accountID>"]`, tosupport the same API.AccountFetcher returns json.RawMessage- Use DefaultAccount as account defaults for every account readChanged DefaultAccount -> AccountDefaults. Use AccountDefaults.Disabled to require valid accounts to exist.Revert a02288d to remove account support from http apiWithout cache support, it is not production ready.Address feedback- check for empty pubID first in getAccount()- getAccount() returns nil account with error, instead of sometimes account and sometimes nil- remove not-yet used account fields from schema- remove warning about blacklisted_accts- fix accountDefaultDisabled parameter naming in testsgetAccount() returns collected errors as a slice so they can be returned in the responseamp_auction didn't handle error slice so I added some code borrowed from auction.multifetcher collects all FetchAccount errors to pass upstreamAddress prebid#1426 (comment) unit test for AccountFetcher.Fix MultiFetcher.FetchAccount() fallback, add unit testsFetchAccount() should return NotFound error instead of nil,so MultiFetcher tries the next fetcher in line (once we have it).

SyntaxNode · 2020-09-12T05:13:50Z

PBS-Go does not have a caching layer which allows for backwards compatibility. The PR for this issue has been closed. Correct me if I'm wrong, @laurb9. I believe the plan is for a new caching layer to be added, giving the option of a "one shot" initialization. This cache would perform one initial load and then never hit the underlying store again. Once this is completed, the lazy file fetcher can be revisited.

bretg · 2022-12-02T15:59:14Z

I'd like to get a handle on the fetch options and data types supported by both platforms:

Fetch options: local file, HTTP, database
Data types: stored requests, account config, stored responses, long-form categories

bretg · 2023-09-08T14:51:58Z

Closing this due to inactivity. If there's a followup item, please post a new issue with details.

SyntaxNode added Intent to implement An issue describing a plan for a major feature. These are intended for community feedback enhancement labels Jul 17, 2020

laurb9 added a commit to openx/prebid-server that referenced this issue Jul 21, 2020

Lazy FileFetcher (prebid#1402)

6800d4a

laurb9 mentioned this issue Aug 6, 2020

Cache refactor #1431

Merged

laurb9 added a commit to openx/prebid-server that referenced this issue Aug 21, 2020

Lazy FileFetcher (prebid#1402)

c53b451

laurb9 added a commit to openx/prebid-server that referenced this issue Aug 22, 2020

Lazy FileFetcher (prebid#1402)

fb9e08f

laurb9 added a commit to openx/prebid-server that referenced this issue Aug 26, 2020

Lazy FileFetcher (prebid#1402)

8ce0e21

SyntaxNode added Ready For Dev Feature specification is ready to be developed. and removed Intent to implement An issue describing a plan for a major feature. These are intended for community feedback enhancement labels Sep 7, 2022

bretg added the projectboard label Sep 8, 2022

prebid-server-prioritization bot added this to Prebid Server Prioritization Sep 8, 2022

prebid-server-prioritization bot moved this to Triage in Prebid Server Prioritization Sep 8, 2022

bretg removed the projectboard label Sep 8, 2022

bretg moved this from Triage to Needs Requirements in Prebid Server Prioritization Dec 2, 2022

bretg closed this as completed Sep 8, 2023

github-project-automation bot moved this from Needs Requirements to Done in Prebid Server Prioritization Sep 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refreshing file-based stored configs #1402

Refreshing file-based stored configs #1402

laurb9 commented Jul 16, 2020 •

edited

Loading

SyntaxNode commented Jul 16, 2020

laurb9 commented Jul 17, 2020

SyntaxNode commented Jul 17, 2020 •

edited

Loading

laurb9 commented Jul 21, 2020

SyntaxNode commented Sep 12, 2020

bretg commented Dec 2, 2022

bretg commented Sep 8, 2023

Refreshing file-based stored configs #1402

Refreshing file-based stored configs #1402

Comments

laurb9 commented Jul 16, 2020 • edited Loading

Refreshable FileFetcher

Implementation

SyntaxNode commented Jul 16, 2020

laurb9 commented Jul 17, 2020

SyntaxNode commented Jul 17, 2020 • edited Loading

laurb9 commented Jul 21, 2020

SyntaxNode commented Sep 12, 2020

bretg commented Dec 2, 2022

bretg commented Sep 8, 2023

laurb9 commented Jul 16, 2020 •

edited

Loading

SyntaxNode commented Jul 17, 2020 •

edited

Loading