Promote Traceflow API to v1beta1 #5108
5baf453 to cdd8fa5
1349008 to 801a062
@luolanzone I don't see the CRD conversion webhook as part of this PR?
pkg/apis/crd/v1beta1/types.go
Outdated
DroppedOnly bool `json:"droppedOnly,omitempty"`
// Timeout specifies the timeout of the Traceflow in seconds. Defaults
// to 20 seconds if not set.
Timeout uint16 `json:"timeout,omitempty"`
The K8s API design guide actually discourages the use of unsigned integers:
Do not use unsigned integers, due to inconsistent support across languages and libraries. Just validate that the integer is non-negative if that's the case.
As a rule I would say we should never use unsigned integers in the API. There are only a few scenarios in which it may make sense, for example for packet field values, but even then I see that we use int32 for the IP protocol.
Note that the same document recommends using int32 / int64. I don't see int16s in the core K8s API. I feel like we could make this field an int32.
This change should also be mentioned in the commit message / PR description, once it is implemented.
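To illustrate the guideline quoted above, here is a minimal sketch of a signed Timeout field whose non-negativity is checked in code. The `TraceflowSpec` type below is a trimmed-down copy for illustration, and `validateTimeout`, `effectiveTimeout`, and `defaultTimeoutSeconds` are hypothetical names, not taken from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultTimeoutSeconds mirrors the documented default of 20 seconds.
// The constant name is an assumption made for this sketch.
const defaultTimeoutSeconds int32 = 20

// TraceflowSpec is a trimmed-down illustrative copy of the spec,
// with Timeout declared as a signed int32 instead of uint16.
type TraceflowSpec struct {
	Timeout int32 `json:"timeout,omitempty"`
}

// validateTimeout is a hypothetical helper that rejects negative values,
// which a signed type would otherwise allow.
func validateTimeout(timeout int32) error {
	if timeout < 0 {
		return errors.New("timeout must be non-negative")
	}
	return nil
}

// effectiveTimeout applies the default when the field is unset (zero).
func effectiveTimeout(timeout int32) int32 {
	if timeout == 0 {
		return defaultTimeoutSeconds
	}
	return timeout
}

func main() {
	for _, t := range []int32{0, 30, -1} {
		spec := TraceflowSpec{Timeout: t}
		if err := validateTimeout(spec.Timeout); err != nil {
			fmt.Printf("timeout=%d rejected: %v\n", t, err)
			continue
		}
		fmt.Printf("timeout=%d accepted, effective=%ds\n", t, effectiveTimeout(spec.Timeout))
	}
}
```

The same check could instead live in the CRD's OpenAPI schema, so that invalid objects are rejected before they reach the controller.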
Done, message updated.
pkg/apis/crd/v1beta1/types.go
Outdated
// See https://github.com/kubernetes/kubernetes/issues/86811
StartTime *metav1.Time `json:"startTime,omitempty"`
// DataplaneTag is a tag to identify a traceflow session across Nodes.
DataplaneTag uint8 `json:"dataplaneTag,omitempty"`
Even in this case, I think we can easily use a signed integer. This is entirely managed by Antrea.
We use the lower 6 bits of this value as the DSCP, so using int8 can work.
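A quick sketch of why a signed 8-bit type is enough here: a 6-bit value is at most 63, which is below the int8 maximum of 127. The `dscpMask` constant and `dscpFromTag` helper are hypothetical names for illustration; how Antrea actually encodes the tag is not shown in this thread.

```go
package main

import "fmt"

// dscpMask selects the lower 6 bits of the tag (hypothetical constant).
const dscpMask int8 = 0x3f

// dscpFromTag returns the lower 6 bits of a dataplane tag.
func dscpFromTag(tag int8) int8 {
	return tag & dscpMask
}

func main() {
	// The largest 6-bit value still fits in an int8 without overflow.
	var maxTag int8 = 63
	fmt.Println("max 6-bit tag:", dscpFromTag(maxTag))
	// Bits above the lower 6 are masked off.
	fmt.Println("tag 0x7f masked:", dscpFromTag(0x7f))
}
```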
Done
801a062 to 7707f20
Hi @antoninbas, I don't see that the API change in this PR requires a CRD conversion webhook. I think the default
I think you're correct. It would be good to mention it briefly in the commit message / PR description:
Overall the API changes look good to me.
However, we should use the OpenAPI schema to validate that integer fields are in a valid range. For example:
ttl:
  type: integer
  minimum: 0
  maximum: 255
7707f20 to 747f645
The commit message for default
I'm confused by the latest version of the changes:
668e5b5 to 3875ac3
Hi @antoninbas, thanks for pointing that out. I made the change by mistake; I have reverted it and now apply the validation checks in v1beta1 only.
I think there are more integer fields for which we should ideally constrain the range of supported values using the OpenAPI schema.
I have a question for @tnqn: if we restrict a field to a smaller range of values than before, does that break backwards-compatibility, and what should we do in that case?
I'll take 2 concrete examples:
- The IP length field used to be a uint16. In this API change, we are transforming it to an int32, as per K8s API best practices. We can add OpenAPI constraints to restrict the range to [0, 65535]. In theory, the new range of possible values matches the old range of possible values. In practice, would it have been possible for someone to store a larger integer value in etcd by providing a large value in the YAML / JSON?
- The TTL field is an int32. In this API change, we are adding OpenAPI constraints to restrict the range to [0, 255] (which is the valid range for the field). In this case, it is definitely possible for an earlier object to have been stored in etcd with a value > 255. Is this a breaking change?
On a more general note: should we mandate minimum and maximum constraints for all integer fields moving forward?
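As a small client-side illustration of the first example, the sketch below decodes an out-of-range length with Go's standard `encoding/json` into a uint16 field and into an int32 field. It only demonstrates Go decoding behavior; it makes no claim about what the API server would have persisted in etcd. The `oldPacket` and `newPacket` types and the helper functions are hypothetical, trimmed-down stand-ins.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oldPacket and newPacket are hypothetical stand-ins for the
// Length field before and after the type change.
type oldPacket struct {
	Length uint16 `json:"length,omitempty"`
}

type newPacket struct {
	Length int32 `json:"length,omitempty"`
}

// decodeOld decodes JSON into the uint16-based struct.
func decodeOld(data string) (uint16, error) {
	var p oldPacket
	err := json.Unmarshal([]byte(data), &p)
	return p.Length, err
}

// decodeNew decodes JSON into the int32-based struct.
func decodeNew(data string) (int32, error) {
	var p newPacket
	err := json.Unmarshal([]byte(data), &p)
	return p.Length, err
}

// inUint16Range mirrors the proposed [0, 65535] schema constraint.
func inUint16Range(v int32) bool {
	return v >= 0 && v <= 65535
}

func main() {
	const doc = `{"length": 70000}`

	// 70000 overflows uint16, so encoding/json reports an error.
	if _, err := decodeOld(doc); err != nil {
		fmt.Println("uint16 decode failed:", err)
	}

	// The same document decodes into int32, so the range has to be
	// enforced separately, for example by the OpenAPI schema.
	v, err := decodeNew(doc)
	fmt.Println("int32 decode:", v, err, "in range:", inUint16Range(v))
}
```

With the signed type, an explicit range constraint does the job that the unsigned type used to do implicitly at decode time.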
3875ac3 to 3cf06dd
I added validation to restrict the IP length to [0, 65535] as well. I feel it should be fine, considering that a Traceflow CR is short-lived and a failed or succeeded CR won't be processed again. Let me know if you have any concerns @antoninbas @tnqn @gran-vmv
LGTM, but we should get another review as this is an API change
37e1c34 to ca88c01
I have some opinions about the API.
@shi0rik0, thanks for the input. @gran-vmv, your input is welcome too.
LGTM.
Do we need to keep at least 1 e2e case to check the old Traceflow API?
Agree with @luolanzone
What kind of e2e case? One that checks old API availability? I verified this manually before: it looks good and both versions are served. Users can get CRs via either version.
@antoninbas @tnqn please help take a look, thanks.
Yes, we can have 1 e2e case to check old API availability, to avoid an unexpected breaking change in the future.
ca88c01 to 45a9363
I am not sure it is worth doing, considering we simply set storage to false for the old API. I don't see any way this could cause the old API to become unavailable. What kind of unexpected breaking change are you referring to?
Another suggestion: we can group all live-traffic-related specs together to a
This can tell users what fields are related to live-traffic traceflows, and it can avoid problems like what if we specify
@shi0rik0, thanks for the input. I would like to avoid big changes in this PR; if your suggestion is for live sampling, I would suggest including the change in your PR for live sampling. @tnqn @gran-vmv @antoninbas @jianjuns since there is a task for the firstN sampling feature in Traceflow, which is taken by @shi0rik0, I think there might be some big changes to Traceflow. I am wondering whether we should hold this PR and wait for the firstN sampling feature, or promote the API to v1beta1 with small changes first?
Length int32 `json:"length,omitempty"`
IPHeader *IPHeader `json:"ipHeader,omitempty"`
IPv6Header *IPv6Header `json:"ipv6Header,omitempty"`
TransportHeader TransportHeader `json:"transportHeader,omitempty"`
Why is this not a pointer?
// TraceflowSpec describes the spec of the traceflow.
type TraceflowSpec struct {
	Source Source `json:"source,omitempty"`
Should all Source, Destination, Packet be pointers?
Should all Source, Destination, Packet be pointers?
The pros and cons of setting the types to pointers:
- Pros: we can distinguish whether the field is specified by users or not because we allow nil value.
- Cons: we have to check whether this field is nil or not every time we access it. If we forget, a panic may be raised.
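The trade-off listed above can be sketched as follows. The types are trimmed-down, hypothetical copies of the API structs (`packetByValue`, `packetByPointer`, and the helper functions are names invented for this example), decoded with the standard `encoding/json` package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// IPHeader is a trimmed-down illustrative copy of the API type.
type IPHeader struct {
	Protocol int32 `json:"protocol,omitempty"`
}

// packetByValue embeds the header as a value: an omitted header and an
// empty header both decode to the same zero struct.
type packetByValue struct {
	IPHeader IPHeader `json:"ipHeader,omitempty"`
}

// packetByPointer uses a pointer: an omitted header decodes to nil.
type packetByPointer struct {
	IPHeader *IPHeader `json:"ipHeader,omitempty"`
}

func parseValue(data string) (packetByValue, error) {
	var p packetByValue
	err := json.Unmarshal([]byte(data), &p)
	return p, err
}

func parsePointer(data string) (packetByPointer, error) {
	var p packetByPointer
	err := json.Unmarshal([]byte(data), &p)
	return p, err
}

// protocolOrDefault shows the cost of the pointer: every access needs a
// nil check, otherwise dereferencing an unset header would panic.
func protocolOrDefault(p packetByPointer, def int32) int32 {
	if p.IPHeader == nil {
		return def
	}
	return p.IPHeader.Protocol
}

func main() {
	omitted, _ := parsePointer(`{}`)
	empty, _ := parsePointer(`{"ipHeader": {}}`)
	fmt.Println("pointer, omitted is nil:", omitted.IPHeader == nil)
	fmt.Println("pointer, empty is nil:", empty.IPHeader == nil)

	vOmitted, _ := parseValue(`{}`)
	vEmpty, _ := parseValue(`{"ipHeader": {}}`)
	fmt.Println("value, indistinguishable:", vOmitted.IPHeader == vEmpty.IPHeader)
}
```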
If we need to check individual fields in a struct to know whether the struct was specified by users, I think a pointer would be better.
I also feel we should have a consistent style. We should not have some fields be pointers and others not, for no reason.
I would like to defer the related changes to the next release, considering that we may need more effort for verification.
@shi0rik0 please check if you can take care of this when you make the firstN sampling changes, including the suggestions you raised in the comments. Thanks.
I don't have a strong opinion about changing these fields to pointers, but if we do make that change, it should be done in its own PR (not when implementing a new feature like firstN sampling).
Current
I think the
I think we can group
I think we should only make necessary changes in this PR. For semantic improvements and UE enhancements, let's discuss and implement each of them separately, taking compatibility, cost, and importance into consideration.
45a9363 to 9273573
Code conflicts resolved.
9273573 to f995aa3
@jianjuns @tnqn @antoninbas could you take a look again? The code conflicts are resolved, and I feel we can defer some of the comments to the next release, with more discussion and verification. Thanks.
/test-all
overall lgtm, I had one last question which I left as a comment
you need to rebase the PR
// IPv6Header describes spec of an IPv6 header.
type IPv6Header struct {
	// NextHeader is the IPv6 protocol.
	NextHeader *int32 `json:"nextHeader,omitempty" yaml:"nextHeader,omitempty"`
quick question: why do we use a pointer for NextHeader, but not for Protocol?
sorry if this was discussed before
This was not discussed before for this field. I don't know if there is any specific reason to use a pointer for NextHeader but not for Protocol; @gran-vmv may know the history.
There were a few discussions about other fields as well. I feel we could go through the whole spec and make the field types consistent, if a pointer is more appropriate, in a separate PR with more discussion and verification.
1. Promote Traceflow API to v1beta1.
2. Remove srcIP field in IPHeader and IPv6Header since it's never used and duplicated with parent struct.
3. Change IPHeader field to a pointer in Packet struct.
4. Change Flags field to a pointer in the TCPHeader struct.
5. Change Timeout field in TraceflowSpec from uint16 to int32.
6. Change Length field in Packet from uint16 to int32.
7. Add OpenAPISpec schema validation for a few integer type of fields.

Note: Given the limited nature of the changes between the 2 API versions, the default None conversion strategy can be used.

Signed-off-by: Lan Luo <[email protected]>
f995aa3 to bc00ddc
Code conflicts are resolved.
Note: Given the limited nature of the changes between the 2 API versions, the default None conversion strategy can be used.