-
Notifications
You must be signed in to change notification settings - Fork 742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refreshing file-based stored configs #1402
Comments
Thank you for the proposal. The most common use case of the FileFetcher is local testing, which is likely why it was built to only load configurations at startup. I certainly have no issue with it being expanded to also include a lazy loading approach. I don't recall how the Cache wrapper is wired up, but so long as it would apply to the changes proposed, I'm on board. I agree having two top level FileFetchers can be a bit confusing. It looks like the FileFetcher currently returns an eagerFetcher. A nice approach could be implement a lazyFetcher which always goes direct to the file system and like you suggested, using a config flag to decide which one to instantiate. Perhaps another option can be to create a less forgiving Cache layer which binds to the first read and never again refreshes - moving that concern out of the FileFetcher entirely.. Again though, I'm not sure where the Cache is wired up and how easy that may be to change. By "recursively scanning the directory" are you referring to it loading all files in the directory as opposed to searching for a specific file on demand? That's an area of code I haven't looked at in a while. I know there's an option somewhere in there to load a specific stored request... but I don't know how that's expressed in code. Do you know where that is?
+1 Are you planning to implement this proposal? |
On closer look, I was wrong about the mapping. One only expects to find stored_imps, stored_requests and category mappings with a flat structure under them, even though the code would continue to recurse were that not the case (collectStoredData in https://github.com/prebid/prebid-server/blob/master/stored_requests/backends/file_fetcher/fetcher.go#L84). So the 1:1 mapping can in fact be done. There seem to be two code bases in pbs, legacy and openrtb2, with different sets of backends, metrics and support code, and I often find myself in the wrong place after a code search... but I think the cache for openrtb2 is a layer on top of a set of the other fetchers - file, db, api: https://github.com/prebid/prebid-server/blob/master/stored_requests/fetcher.go#L157 I think replacing with a lazyFetcher in this case could work, I'll give it a try. An option to preload all would replicate existing behavior but it would be simpler without. I'd like to see if this can be used for account configs as well, in lieu of listing all of them in a yaml file (I opened #1395 about that). |
Fantastic.
Correct. The legacy endpoints and model are outdated and no longer maintained. You don't need to worry about feature parity. We're working on defining an end date to remove that code completely. I think PrebidJS dropped their support for the legacy endpoints in 3.0.
That's intended behavior. We want to support multiple fetchers to allow hosts to use multiple sources.
Awesome! That's really appreciated.
I commented in the other issue. @bsardo is working on some aspects of account configs. It would great if you could collaborate and work on different parts of the account config system. |
@SyntaxNode This is in principle what I was thinking about regarding the lazy fetcher: #1411 I've found that FetchCategories is written differently and still caches the parsed json forever. Because this happens in http_fetcher too, a different approach looking upstream of it would be better. |
Demonstrates the concept of fetching account configurations via the fetcher framework. db and http api can be added fairly easily on top. The account schema is defined in config so it can be loaded from yaml too if want. ```yaml accounts: filesystem: enabled: true directorypath: "./stored_requests/data/by_id" ``` Introduces the concept of default account, to control global account settings that were originally in the main config. For example, this can replace account_required=true: ``` default_account: disabled: true ``` - included sample test account (disabled) - no unit tests - no cache yet, turns out cache setup is specific to stored requests - no refactoring, although refactoring the code around could help a lot
Demonstrates the concept of fetching account configurations via the fetcher framework. db and http api can be added fairly easily on top. The account schema is defined in config so it can be loaded from yaml too if want. ```yaml accounts: filesystem: enabled: true directorypath: "./stored_requests/data/by_id" ``` Introduces the concept of default account, to control global account settings that were originally in the main config. For example, this can replace account_required=true: ``` default_account: disabled: true ``` - included sample test account (disabled) - no unit tests - no cache yet, turns out cache setup is specific to stored requests - no refactoring, although refactoring the code around could help a lot
- Defines the Account object with per-publisher settings - Adds a cfg.DefaultAccount for global settings that are overrideable per publisher - Expands account validation to require a valid account not just any non-empty value - Makes the resolved account object available to auction and analytics - Cache TTL can now be controlled at account level - Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API (database to be added later) Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json` ```yaml accounts: filesystem: enabled: true directorypath: "./stored_requests/data/by_id" ``` fixup
- Defines the Account object with per-publisher settings - Adds a cfg.DefaultAccount for global settings that are overrideable per publisher - Expands account validation to require a valid account not just any non-empty value - Makes the resolved account object available to auction and analytics - Cache TTL can now be controlled at account level - Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API (database to be added later) Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json` ```yaml accounts: filesystem: enabled: true directorypath: "./stored_requests/data/by_id" ```
- Defines the Account object with per-publisher settings - Adds a cfg.DefaultAccount for global settings that are overrideable per publisher - Expands account validation to require a valid account not just any non-empty value - Makes the resolved account object available to auction and analytics - Cache TTL can now be controlled at account level - Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API (database to be added later) Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json` ```yaml accounts: filesystem: enabled: true directorypath: "./stored_requests/data/by_id" ```
- Defines the Account object with per-publisher settings - Adds a cfg.DefaultAccount for global settings that are overrideable per publisher - Expands account validation to require a valid account not just any non-empty value - Makes the resolved account object available to auction and analytics - Cache TTL can now be controlled at account level - Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API (database to be added later) Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json` ```yaml accounts: filesystem: enabled: true directorypath: "./stored_requests/data/by_id" ```
- Defines the Account object with per-publisher settings - Adds a cfg.DefaultAccount for global settings that are overrideable per publisher - Expands account validation to require a valid account not just any non-empty value - Makes the resolved account object available to auction and analytics - Cache TTL can now be controlled at account level - Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API (database to be added later) Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json` ```yaml accounts: filesystem: enabled: true directorypath: "./stored_requests/data/by_id" ```
- Defines the Account object with per-publisher settings - Adds a cfg.DefaultAccount for global settings that are overrideable per publisher - Expands account validation to require a valid account not just any non-empty value - Makes the resolved account object available to auction and analytics - Cache TTL can now be controlled at account level - Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API (database to be added later) Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json` ```yaml accounts: filesystem: enabled: true directorypath: "./stored_requests/data/by_id" ```
- Defines the Account object with per-publisher settings - Adds a cfg.DefaultAccount for global settings that are overrideable per publisher - Expands account validation to require a valid account not just any non-empty value - Makes the resolved account object available to auction and analytics - Cache TTL can now be controlled at account level - Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API (database to be added later) Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json` ```yaml accounts: filesystem: enabled: true directorypath: "./stored_requests/data/by_id" ```
- Defines the Account object with per-publisher settings - Adds a cfg.DefaultAccount for global settings that are overrideable per publisher - Expands account validation to require a valid account not just any non-empty value - Makes the resolved account object available to auction and analytics - Cache TTL can now be controlled at account level - Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API (database to be added later) Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json` ```yaml accounts: filesystem: enabled: true directorypath: "./stored_requests/data/by_id" ``` Added http api for accounts Same model as for requests aka `GET {endpoint}?account=["<accountID>"]`, to support the same API. AccountFetcher returns json.RawMessage - Use DefaultAccount as account defaults for every account read Changed DefaultAccount -> AccountDefaults. Use AccountDefaults.Disabled to require valid accounts to exist. Revert a02288d to remove account support from http api Without cache support, it is not production ready. Address feedback- check for empty pubID first in getAccount()- getAccount() returns nil account with error, instead of sometimes account and sometimes nil- remove not-yet used account fields from schema- remove warning about blacklisted_accts- fix accountDefaultDisabled parameter naming in tests getAccount() returns collected errors as a slice so they can be returned in the response amp_auction didn't handle error slice so I added some code borrowed from auction. multifetcher collects all FetchAccount errors to pass upstream Address prebid#1426 (comment) Add unit test for AccountFetcher. Fix MultiFetcher.FetchAccount() fallback, add unit tests FetchAccount() should return NotFound error instead of nil, so MultiFetcher tries the next fetcher in line (once we have it).
…s the Account object with per-publisher settings- Adds a cfg.DefaultAccount for global settings that are overrideable per publisher- Expands account validation to require a valid account not just any non-empty value- Makes the resolved account object available to auction and analytics- Cache TTL can now be controlled at account level- Uses fetcher framework to retrieve accounts from individual <id>.json files or HTTP API (database to be added later)Example configuration for accounts stored in `stored_requests/data/by_id/accounts/<id>.json````yaml accounts: filesystem: enabled: true directorypath: "./stored_requests/data/by_id"```Added http api for accountsSame model as for requests aka `GET {endpoint}?account=["<accountID>"]`, tosupport the same API.AccountFetcher returns json.RawMessage- Use DefaultAccount as account defaults for every account readChanged DefaultAccount -> AccountDefaults. Use AccountDefaults.Disabled to require valid accounts to exist.Revert a02288d to remove account support from http apiWithout cache support, it is not production ready.Address feedback- check for empty pubID first in getAccount()- getAccount() returns nil account with error, instead of sometimes account and sometimes nil- remove not-yet used account fields from schema- remove warning about blacklisted_accts- fix accountDefaultDisabled parameter naming in testsgetAccount() returns collected errors as a slice so they can be returned in the responseamp_auction didn't handle error slice so I added some code borrowed from auction.multifetcher collects all FetchAccount errors to pass upstreamAddress prebid#1426 (comment) unit test for AccountFetcher.Fix MultiFetcher.FetchAccount() fallback, add unit testsFetchAccount() should return NotFound error instead of nil,so MultiFetcher tries the next fetcher in line (once we have it).
PBS-Go does not have a caching layer which allows for backwards compatibility. The PR for this issue has been closed. Correct me if I'm wrong, @laurb9. I believe the plan is for a new caching layer to be added, giving the option of a "one shot" initialization. This cache would perform one initial load and then never hit the underlying store again. Once this is completed, the lazy file fetcher can be revisited. |
I'd like to get a handle on the fetch options and data types supported by both platforms: Fetch options: local file, HTTP, database |
Closing this due to inactivity. If there's a followup item, please post a new issue with details. |
Refreshable FileFetcher
The current FileFetcher eagerly loads all the configs in memory and never looks back. It does so by recursively scanning the directory, possibly intended to support a hashed path or other file organization. This makes it fast and shareable, but less suited if stored requests need to be updated within the lifetime of the server process, which is our use case.
I am proposing a FileFetcher whose behavior is more consistent with the other Fetchers, as in can retrieve items on-demand, but does not store the data itself. This allows it to work with the Cache like the others, which in turn makes it refreshable in a way that is configurable externally.
Implementation
If we need to maintain compatibility with existing path structures, the existing FileFetcher is not ideal for this. The recursive directory ingestion means that the 1:1 mapping between the requested key and the resource, needed for on-demand loading, does not exist. We could attempt direct mapping when it exists (flat dir), but this will result in inconsistent behavior.
Alternatively, a separate clean FileFetcher can be implemented with the defined behavior.
(edit) now that I wrote all this, I realize two filefetchers can be confusing so a config mode switch is probably better.
The text was updated successfully, but these errors were encountered: