Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ds Active flag blueprint #4654

Merged
merged 5 commits into from
May 11, 2020
Merged

Conversation

ocket8888
Copy link
Contributor

@ocket8888 ocket8888 commented Apr 21, 2020

What does this PR (Pull Request) do?

Adds a blueprint for changing the "active" property of Delivery Services from a boolean to an enumerated string constant like:

/**
 * This defines what other components of ATC will consider a Delivery Service
 * "active".
 *
 * It's not an object exposed through the API in its own right, just a
 * specification of the allowed values.
*/
enum DeliveryServiceActiveState {
	/**
	 * A Delivery Service that is ”active” is one that is functionally
	 * in service, and fully capable of delivering content.
	 *
	 * This means that its configuration is deployed to Cache Servers and it is
	 * available for routing traffic.
	*/
	ACTIVE = 'ACTIVE',
	/**
	 * A Delivery Service that is ”inactive” is not available for
	 * routing and has not had its configuration distributed to its assigned
	 * Cache Servers.
	*/
	INACTIVE = 'INACTIVE',
	/**
	 * A Delivery Service that is ”primed” has had its configuration
	 * distributed to the various servers required to serve its content.
	 * However, the content itself is still inaccessible via routing.
	*/
	PRIMED = 'PRIMED'
}

For more details, refer to the blueprint.

Which Traffic Control components are affected by this PR?

None - blueprint.

What is the best way to verify this PR?

Read the blueprint.

The following criteria are ALL met by this PR

  • Tests are unnecessary
  • Documentation is unnecessary
  • An update to CHANGELOG.md is not necessary
  • This PR includes any and all required license headers
  • This PR does not include a database migration
  • This PR DOES NOT FIX A SERIOUS SECURITY VULNERABILITY

@ocket8888 ocket8888 added the blueprint feature requirements / implementation details label Apr 21, 2020
@rawlinp
Copy link
Contributor

rawlinp commented Apr 21, 2020

This is related to issue #3746

@ocket8888 ocket8888 marked this pull request as ready for review April 29, 2020 21:26
Copy link
Member

@mitchell852 mitchell852 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks very good to me. I guess the real question for the community is "is there a real need for 3 states?" If not, then:

  • active=true means the DS is actively routed thru TR AND the delivery service is accounted for in cache config files

  • active=false means the the DS is NOT actively routed thru TR AND the delivery service is NOT accounted for in cache config files

But assuming, the 3 states are desired, I see no problem with this design. The only suggestion I would have is potentially changing Active to Status (active, inactive, primed) as it makes a bit more sense to me but i suppose that would be messy.

@ocket8888
Copy link
Contributor Author

I don't have a problem with changing the name of the field. But having discussed this with operators I do think three states are desired.

@rob05c
Copy link
Member

rob05c commented Apr 29, 2020

I guess the real question for the community is "is there a real need for 3 states?"

I vote yes.

We need to be able to send config to caches, before it's sent to the router (the existing "Inactive" option). Without it, there's no way to safely stop routing to a DS. So the existing behavior is absolutely necessary.

But currently, there's no way to not send a DS to caches. This means potentially having clients, malicious or accidental, making requests that shouldn't be possible to make, by requesting directly to the cache. It's very common to not yet want to delete the information behind a DS, even when it's no longer active, or not yet active, or temporarily not active. The third "don't send to caches" state makes that possible.

More than that, it's a Footgun. Because "inactive" sounds like it isn't being put anywhere. Unless the user has the TC docs memorized, it's a very easy mistake for new TC operators or tenants to make. Three unambiguous states eliminates that footgun.

@rob05c
Copy link
Member

rob05c commented Apr 29, 2020

I'll also add, as above, a DS shouldn't be immediately removed from (or changed in) routers and caches, without some drain time period, as well as the reverse, making sure a DS is on all Caches before sending to Routers.

In theory, this can be done by waiting long enough between Queue and Snap. But it's safer not to add risk to the already pernicious queue-snap order-of-operations.

So, I think we should make it impossible to change a DS directly from inactive<->active. Both the Portal and the API should require being moved from Active to Primed, and then Primed to Inactive.

Someone who needs to can always do it twice. But that makes it safer, and reduces the chance of operator misunderstanding.

Ideally with good error messages, e.g. "Error: Inactive delivery services must be primed before they're set to active, to make sure they're on the caches before being routed to."

@ocket8888
Copy link
Contributor Author

"I think we should make it impossible to change a DS directly from inactive<->active"

So would that also mean you can never create an ACTIVE DS?

@rob05c
Copy link
Member

rob05c commented Apr 29, 2020

So would that also mean you can never create an ACTIVE DS?

Yes, +1 on requiring created DSes to be Inactive, or possibly Primed. But, I'd be in favor of removing the Active field from the DS creation form altogether.

In theory, it's probably mostly ok to let someone make a DS as Active, because there are no successful clients anyway.

But it's possible someone making a DS has clients already requesting, and it's possible they handle NXDOMAIN better than 404. For example, maybe they're configured to retry NXDOMAINs as errors, but 404's generally mean something's wrong with the URL so they give the client an error. Worse, maybe they cache that 404 locally.

It's a bit of an edge case, I don't think it's a huge deal either way. But IMO making new DSes always start out Inactive adds an extra little bit of wrong-things-are-impossible safety.

@rawlinp
Copy link
Contributor

rawlinp commented Apr 29, 2020

We should probably make it clear why this is necessary when it seems like we can already accomplish these 3 states today:

proposed:      what it means:                 how it's done today:
=================================================================================
active    ==   routed, cached             ==  active,   has ds-server assignments
primed    ==   cached, but not routed     ==  inactive, has ds-server assignments
inactive  ==   not routed, not cached     ==  inactive, does not have ds-server assignments

Granted, we're hoping to remove ds-server assignments in the future, so that seems to make this change necessary.

@ocket8888
Copy link
Contributor Author

That depends on what you mean by "accomplish". You can render Delivery Services un-service-able, but that's not exactly the same thing as having their configuration ready to deploy at the push of a button - one button.

@ocket8888 ocket8888 changed the title Ds flag blueprint Ds Active flag blueprint May 11, 2020
@mitchell852 mitchell852 merged commit ca12161 into apache:master May 11, 2020
@ocket8888 ocket8888 deleted the ds-flag-blueprint branch May 12, 2020 02:12
@rimashah25 rimashah25 added this to the 8.0.0 milestone Jul 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blueprint feature requirements / implementation details
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants