Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refreshing file-based stored configs #1402

Closed
laurb9 opened this issue Jul 16, 2020 · 7 comments
Closed

Refreshing file-based stored configs #1402

laurb9 opened this issue Jul 16, 2020 · 7 comments
Labels
Ready For Dev Feature specification is ready to be developed.

Comments

@laurb9
Copy link
Contributor

laurb9 commented Jul 16, 2020

Refreshable FileFetcher

The current FileFetcher eagerly loads all the configs in memory and never looks back. It does so by recursively scanning the directory, possibly intended to support a hashed path or other file organization. This makes it fast and shareable, but less suited if stored requests need to be updated within the lifetime of the server process, which is our use case.

I am proposing a FileFetcher whose behavior is more consistent with the other Fetchers, as in can retrieve items on-demand, but does not store the data itself. This allows it to work with the Cache like the others, which in turn makes it refreshable in a way that is configurable externally.

Implementation

If we need to maintain compatibility with existing path structures, the existing FileFetcher is not ideal for this. The recursive directory ingestion means that the 1:1 mapping between the requested key and the resource, needed for on-demand loading, does not exist. We could attempt direct mapping when it exists (flat dir), but this will result in inconsistent behavior.

Alternatively, a separate clean FileFetcher can be implemented with the defined behavior.

(edit) now that I wrote all this, I realize two filefetchers can be confusing so a config mode switch is probably better.

@SyntaxNode
Copy link
Contributor

Thank you for the proposal.

The most common use case of the FileFetcher is local testing, which is likely why it was built to only load configurations at startup. I certainly have no issue with it being expanded to also include a lazy loading approach. I don't recall how the Cache wrapper is wired up, but so long as it would apply to the changes proposed, I'm on board.

I agree having two top level FileFetchers can be a bit confusing.

It looks like the FileFetcher currently returns an eagerFetcher. A nice approach could be implement a lazyFetcher which always goes direct to the file system and like you suggested, using a config flag to decide which one to instantiate. Perhaps another option can be to create a less forgiving Cache layer which binds to the first read and never again refreshes - moving that concern out of the FileFetcher entirely.. Again though, I'm not sure where the Cache is wired up and how easy that may be to change.

By "recursively scanning the directory" are you referring to it loading all files in the directory as opposed to searching for a specific file on demand? That's an area of code I haven't looked at in a while. I know there's an option somewhere in there to load a specific stored request... but I don't know how that's expressed in code. Do you know where that is?

(edit) now that I wrote all this, I realize two filefetchers can be confusing so a config mode switch is probably better.

+1

Are you planning to implement this proposal?

@laurb9
Copy link
Contributor Author

laurb9 commented Jul 17, 2020

On closer look, I was wrong about the mapping. One only expects to find stored_imps, stored_requests and category mappings with a flat structure under them, even though the code would continue to recurse were that not the case (collectStoredData in https://github.com/prebid/prebid-server/blob/master/stored_requests/backends/file_fetcher/fetcher.go#L84). So the 1:1 mapping can in fact be done.

There seem to be two code bases in pbs, legacy and openrtb2, with different sets of backends, metrics and support code, and I often find myself in the wrong place after a code search... but I think the cache for openrtb2 is a layer on top of a set of the other fetchers - file, db, api: https://github.com/prebid/prebid-server/blob/master/stored_requests/fetcher.go#L157
The overall fetcher setup is a bit involved because it attempts to fall back down the list to complete the list of ids it couldn't find in earlier stages.

I think replacing with a lazyFetcher in this case could work, I'll give it a try. An option to preload all would replicate existing behavior but it would be simpler without. I'd like to see if this can be used for account configs as well, in lieu of listing all of them in a yaml file (I opened #1395 about that).

@SyntaxNode
Copy link
Contributor

SyntaxNode commented Jul 17, 2020

So the 1:1 mapping can in fact be done.

Fantastic.

There seem to be two code bases in pbs, legacy and openrtb2,

Correct. The legacy endpoints and model are outdated and no longer maintained. You don't need to worry about feature parity. We're working on defining an end date to remove that code completely. I think PrebidJS dropped their support for the legacy endpoints in 3.0.

The overall fetcher setup is a bit involved because it attempts to fall back down the list to complete the list of ids it couldn't find in earlier stages.

That's intended behavior. We want to support multiple fetchers to allow hosts to use multiple sources.

I think replacing with a lazyFetcher in this case could work, I'll give it a try.

Awesome! That's really appreciated.

I'd like to see if this can be used for account configs as well, in lieu of listing all of them in a yaml

I commented in the other issue. @bsardo is working on some aspects of account configs. It would great if you could collaborate and work on different parts of the account config system.

@SyntaxNode SyntaxNode added Intent to implement An issue describing a plan for a major feature. These are intended for community feedback enhancement labels Jul 17, 2020
laurb9 added a commit to openx/prebid-server that referenced this issue Jul 21, 2020
@laurb9
Copy link
Contributor Author

laurb9 commented Jul 21, 2020

@SyntaxNode This is in principle what I was thinking about regarding the lazy fetcher: #1411
It's a quick draft but it's shorter and tests are passing. And adding account fetcher would take a few more lines.

I've found that FetchCategories is written differently and still caches the parsed json forever. Because this happens in http_fetcher too, a different approach looking upstream of it would be better.

laurb9 added a commit to openx/prebid-server that referenced this issue Jul 31, 2020
Demonstrates the concept of fetching account configurations via the fetcher framework. db and http api can be added fairly easily on top. The account schema is defined in config so it can be loaded from yaml too if want.

 ```yaml
 accounts:
   filesystem:
     enabled: true
     directorypath: "./stored_requests/data/by_id"
 ```
 Introduces the concept of default account, to control global account settings that were originally in the main config. For example, this can replace account_required=true:
 ```
 default_account:
   disabled: true
 ```

 - included sample test account (disabled)
 - no unit tests
 - no cache yet, turns out cache setup is specific to stored requests
 - no refactoring, although refactoring the code around could help a lot
@laurb9 laurb9 mentioned this issue Aug 6, 2020
laurb9 added a commit to openx/prebid-server that referenced this issue Aug 13, 2020
Demonstrates the concept of fetching account configurations via the fetcher framework. db and http api can be added fairly easily on top. The account schema is defined in config so it can be loaded from yaml too if want.

 ```yaml
 accounts:
   filesystem:
     enabled: true
     directorypath: "./stored_requests/data/by_id"
 ```
 Introduces the concept of default account, to control global account settings that were originally in the main config. For example, this can replace account_required=true:
 ```
 default_account:
   disabled: true
 ```

 - included sample test account (disabled)
 - no unit tests
 - no cache yet, turns out cache setup is specific to stored requests
 - no refactoring, although refactoring the code around could help a lot
laurb9 added a commit to openx/prebid-server that referenced this issue Aug 14, 2020
- Defines the Account object with per-publisher settings
- Adds a cfg.DefaultAccount for global settings that are overrideable per publisher
- Expands account validation to require a valid account not just any non-empty value
- Makes the resolved account object available to auction and analytics
- Cache TTL can now be controlled at account level
- Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API
  (database to be added later)

Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json`
```yaml
 accounts:
   filesystem:
     enabled: true
     directorypath: "./stored_requests/data/by_id"
```

fixup
laurb9 added a commit to openx/prebid-server that referenced this issue Aug 14, 2020
- Defines the Account object with per-publisher settings
- Adds a cfg.DefaultAccount for global settings that are overrideable per publisher
- Expands account validation to require a valid account not just any non-empty value
- Makes the resolved account object available to auction and analytics
- Cache TTL can now be controlled at account level
- Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API
  (database to be added later)

Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json`
```yaml
 accounts:
   filesystem:
     enabled: true
     directorypath: "./stored_requests/data/by_id"
```
laurb9 added a commit to openx/prebid-server that referenced this issue Aug 19, 2020
- Defines the Account object with per-publisher settings
- Adds a cfg.DefaultAccount for global settings that are overrideable per publisher
- Expands account validation to require a valid account not just any non-empty value
- Makes the resolved account object available to auction and analytics
- Cache TTL can now be controlled at account level
- Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API
  (database to be added later)

Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json`
```yaml
 accounts:
   filesystem:
     enabled: true
     directorypath: "./stored_requests/data/by_id"
```
laurb9 added a commit to openx/prebid-server that referenced this issue Aug 19, 2020
- Defines the Account object with per-publisher settings
- Adds a cfg.DefaultAccount for global settings that are overrideable per publisher
- Expands account validation to require a valid account not just any non-empty value
- Makes the resolved account object available to auction and analytics
- Cache TTL can now be controlled at account level
- Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API
  (database to be added later)

Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json`
```yaml
 accounts:
   filesystem:
     enabled: true
     directorypath: "./stored_requests/data/by_id"
```
laurb9 added a commit to openx/prebid-server that referenced this issue Aug 21, 2020
laurb9 added a commit to openx/prebid-server that referenced this issue Aug 22, 2020
laurb9 added a commit to openx/prebid-server that referenced this issue Aug 22, 2020
- Defines the Account object with per-publisher settings
- Adds a cfg.DefaultAccount for global settings that are overrideable per publisher
- Expands account validation to require a valid account not just any non-empty value
- Makes the resolved account object available to auction and analytics
- Cache TTL can now be controlled at account level
- Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API
  (database to be added later)

Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json`
```yaml
 accounts:
   filesystem:
     enabled: true
     directorypath: "./stored_requests/data/by_id"
```
laurb9 added a commit to openx/prebid-server that referenced this issue Aug 22, 2020
- Defines the Account object with per-publisher settings
- Adds a cfg.DefaultAccount for global settings that are overrideable per publisher
- Expands account validation to require a valid account not just any non-empty value
- Makes the resolved account object available to auction and analytics
- Cache TTL can now be controlled at account level
- Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API
  (database to be added later)

Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json`
```yaml
 accounts:
   filesystem:
     enabled: true
     directorypath: "./stored_requests/data/by_id"
```
laurb9 added a commit to openx/prebid-server that referenced this issue Aug 26, 2020
laurb9 added a commit to openx/prebid-server that referenced this issue Sep 1, 2020
- Defines the Account object with per-publisher settings
- Adds a cfg.DefaultAccount for global settings that are overrideable per publisher
- Expands account validation to require a valid account not just any non-empty value
- Makes the resolved account object available to auction and analytics
- Cache TTL can now be controlled at account level
- Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API
  (database to be added later)

Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json`
```yaml
 accounts:
   filesystem:
     enabled: true
     directorypath: "./stored_requests/data/by_id"
```
laurb9 added a commit to openx/prebid-server that referenced this issue Sep 1, 2020
- Defines the Account object with per-publisher settings
- Adds a cfg.DefaultAccount for global settings that are overrideable per publisher
- Expands account validation to require a valid account not just any non-empty value
- Makes the resolved account object available to auction and analytics
- Cache TTL can now be controlled at account level
- Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API
  (database to be added later)

Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json`
```yaml
 accounts:
   filesystem:
     enabled: true
     directorypath: "./stored_requests/data/by_id"
```

Added http api for accounts

Same model as for requests aka `GET {endpoint}?account=["<accountID>"]`, to
support the same API.

AccountFetcher returns json.RawMessage

- Use DefaultAccount as account defaults for every account read

Changed DefaultAccount -> AccountDefaults. Use AccountDefaults.Disabled to require valid accounts to exist.

Revert a02288d to remove account support from http api

Without cache support, it is not production ready.

Address feedback- check for empty pubID first in getAccount()- getAccount() returns nil account with error, instead of sometimes account and sometimes nil- remove not-yet used account fields from schema- remove warning about blacklisted_accts- fix accountDefaultDisabled parameter naming in tests

getAccount() returns collected errors as a slice so they can be returned in the response

amp_auction didn't handle error slice so I added some code borrowed from auction.

multifetcher collects all FetchAccount errors to pass upstream

Address prebid#1426 (comment)

Add unit test for AccountFetcher.

Fix MultiFetcher.FetchAccount() fallback, add unit tests

FetchAccount() should return NotFound error instead of nil,
so MultiFetcher tries the next fetcher in line (once we have it).
laurb9 added a commit to openx/prebid-server that referenced this issue Sep 1, 2020
…s the Account object with per-publisher settings- Adds a cfg.DefaultAccount for global settings that are overrideable per publisher- Expands account validation to require a valid account not just any non-empty value- Makes the resolved account object available to auction and analytics- Cache TTL can now be controlled at account level- Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API (database to be added later)Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json````yaml accounts: filesystem: enabled: true directorypath: "./stored_requests/data/by_id"```Added http api for accountsSame model as for requests aka `GET {endpoint}?account=["<accountID>"]`, tosupport the same API.AccountFetcher returns json.RawMessage- Use DefaultAccount as account defaults for every account readChanged DefaultAccount -> AccountDefaults. Use AccountDefaults.Disabled to require valid accounts to exist.Revert a02288d to remove account support from http apiWithout cache support, it is not production ready.Address feedback- check for empty pubID first in getAccount()- getAccount() returns nil account with error, instead of sometimes account and sometimes nil- remove not-yet used account fields from schema- remove warning about blacklisted_accts- fix accountDefaultDisabled parameter naming in testsgetAccount() returns collected errors as a slice so they can be returned in the responseamp_auction didn't handle error slice so I added some code borrowed from auction.multifetcher collects all FetchAccount errors to pass upstreamAddress prebid#1426 (comment) unit test for AccountFetcher.Fix MultiFetcher.FetchAccount() fallback, add unit testsFetchAccount() should return NotFound error instead of nil,so MultiFetcher tries the next fetcher in line (once we have it).
@SyntaxNode
Copy link
Contributor

PBS-Go does not have a caching layer which allows for backwards compatibility. The PR for this issue has been closed. Correct me if I'm wrong, @laurb9. I believe the plan is for a new caching layer to be added, giving the option of a "one shot" initialization. This cache would perform one initial load and then never hit the underlying store again. Once this is completed, the lazy file fetcher can be revisited.

@SyntaxNode SyntaxNode added Ready For Dev Feature specification is ready to be developed. and removed Intent to implement An issue describing a plan for a major feature. These are intended for community feedback enhancement labels Sep 7, 2022
@bretg bretg removed the projectboard label Sep 8, 2022
@bretg
Copy link
Contributor

bretg commented Dec 2, 2022

I'd like to get a handle on the fetch options and data types supported by both platforms:

Fetch options: local file, HTTP, database
Data types: stored requests, account config, stored responses, long-form categories

@bretg bretg moved this from Triage to Needs Requirements in Prebid Server Prioritization Dec 2, 2022
@bretg
Copy link
Contributor

bretg commented Sep 8, 2023

Closing this due to inactivity. If there's a followup item, please post a new issue with details.

@bretg bretg closed this as completed Sep 8, 2023
@github-project-automation github-project-automation bot moved this from Needs Requirements to Done in Prebid Server Prioritization Sep 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ready For Dev Feature specification is ready to be developed.
Projects
Status: Done
Development

No branches or pull requests

3 participants