Using cache from informer for operator deleted #150
Conversation
Added logging on operator deletion status
lgtm
if (openShiftClient != null) {
    isDeleted = isDeleted && cachedRoute(managedKafka) == null;
}
log.info("Admin Server isDeleted = {}", isDeleted);
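The idea in the snippet above is to answer the deletion question from the informer's local cache instead of calling the cluster. A minimal, self-contained sketch of that pattern, where `RouteCache` and the keys are hypothetical stand-ins and not the project's actual fabric8 informer classes:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an informer-backed cache of Route resources,
// keyed by "namespace/name". A cache miss means the resource is gone.
class RouteCache {
    private final Map<String, String> store = new HashMap<>();

    void put(String key, String route) { store.put(key, route); }
    void delete(String key) { store.remove(key); }

    // Reads from the local cache only: no API call to the cluster.
    String cachedRoute(String key) { return store.get(key); }
}

public class DeletionCheck {
    public static void main(String[] args) {
        RouteCache cache = new RouteCache();
        cache.put("ns/admin-server", "route-spec");

        boolean isDeleted = cache.cachedRoute("ns/admin-server") == null;
        System.out.println("Admin Server isDeleted = " + isDeleted); // false

        cache.delete("ns/admin-server");
        isDeleted = cache.cachedRoute("ns/admin-server") == null;
        System.out.println("Admin Server isDeleted = " + isDeleted); // true
    }
}
```

The informer keeps the cache in sync via watch events, so the null check stays cheap while still tracking cluster state.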
Do you plan on going back and adjusting the log levels? Seems like the logs will fill up pretty quickly at an info level.
Yes, it's planned and I would love to use the same logging library across operator and sync. We should make a final decision.
Same library and/or same usage of levels? Didn't @kornys have a PR at one point to switch the operator to JBoss logging? Any chance we could resurrect that along with a tweak of the logging levels?
Just wondering why we should use the JBoss one. Tbh I have never seen people using it. Just because it's the default for Quarkus?
Yes that's the major reason.
I'm not too concerned if we use different logging frameworks - the only issue will be making sure we know which formatting to use.
But we should try to be consistent with the levels.
Just wanted to walk through a little bit more of what is and is not making API calls. Currently, when kafkaInstance.createOrUpdate or delete is called, that will eventually make an API call per resource, correct? I don't see anything in the nested createOrDelete logic that compares the desired state to the known state first.
With this change, when isDeleted is called it won't make API calls, but isDeleted is called with roughly the same frequency (less, as it first checks the isDeleted spec flag) as createOrUpdate, correct?
I had this discussion multiple times with the Java Operator SDK team, and they don't consider it correct to compare old with new; instead you just apply the resource as an idempotent operation. This is why every time I get the current resource and build the new one from it, filling/reverting the fields we need. The drawback is maybe related to the fact that events are raised not only when something changes but also due to the re-sync interval of the informer.
To rephrase a little: either you can monitor the dependent resources and flip them back to the desired state almost immediately, or you can rely on the resync interval of ManagedKafka and just flip them back on the resync.
Even if it is an idempotent operation, it is an extra API call, and in the end someone has to do that check. If there is a simple way to avoid it, why not?
Comparing resources is not simple, because Kubernetes adds a lot of fields that will make the comparison fail even when the resources are effectively the same. I raised this point a long time ago. Otherwise we have to compare, field by field, what is meaningful to us in order to say whether two resources are different or not.
Not sure if we can rely on the resourceVersion field... I have sometimes had problems using it in the past.
To weigh in more: there are two concerns here.
On the second one there is some cross-talk with the other PR; I'll assume that we'll have that conversation over there. On the first there are a couple of considerations. For what Paolo is doing he can't just look at the resourceVersion of dependent resources: the resourceVersion is updated even for status changes. Beyond that, what he really cares about is the comparison to his ideal state, not the before/after of the update event the informer is reporting. As for what should be compared, presumably it should just be the spec, correct? Are you seeing kube or dependent resources add stuff to specs? If yes, that makes this comparison quite difficult. If not, then we have the option of using Lombok (for any API we can effectively update), or comparing based on the JSON/YAML serialization (kube is probably doing something similar once it gets the update request). Ultimately, whether we should be doing that depends on the number of update calls being made; if it's not that many, it's not worth worrying about.
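The spec-only comparison suggested above can be sketched in a few lines. This is a toy model, not the project's actual types: `KafkaSpec` is a hypothetical stand-in, and the hand-written equals/hashCode is what Lombok's @EqualsAndHashCode would generate for us.

```java
import java.util.Objects;

// Comparing only the spec sidesteps the server-populated fields (metadata,
// status, resourceVersion) that make a whole-object comparison fail even
// when nothing meaningful has changed.
class KafkaSpec {
    final int replicas;
    final String version;

    KafkaSpec(int replicas, String version) {
        this.replicas = replicas;
        this.version = version;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof KafkaSpec)) return false;
        KafkaSpec other = (KafkaSpec) o;
        return replicas == other.replicas && Objects.equals(version, other.version);
    }

    @Override
    public int hashCode() { return Objects.hash(replicas, version); }
}

public class SpecCompare {
    // Skip the API call when the desired spec already matches the cached one.
    static boolean needsUpdate(KafkaSpec cached, KafkaSpec desired) {
        return !Objects.equals(cached, desired);
    }

    public static void main(String[] args) {
        KafkaSpec cached = new KafkaSpec(3, "2.7.0");
        System.out.println(needsUpdate(cached, new KafkaSpec(3, "2.7.0"))); // false: no-op, skip the call
        System.out.println(needsUpdate(cached, new KafkaSpec(5, "2.7.0"))); // true: scale change, update
    }
}
```

The catch raised in the thread still applies: for dependent resources owned by other projects (Strimzi, etc.) we don't control the spec classes, which is why serialization-based comparison was floated as the alternative.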
That should never happen, that is the contract.
Is there any reason why a Lombok-based equality check on the Spec is not preferable?
There seems to be some bias against Lombok, at least from Max. You also need to make sure your objects provide the desired equality semantics: use a Set when possible instead of a List if the ordering of something doesn't matter. Beyond that, this is about dependent resources, so you'd be asking Strimzi to add that.
It's not about Strimzi only. Not just
Yes, I realize that it's other things as well, which is why I brought up comparing the serialized document. But the bigger question is the frequency of these updates; if it's not that much, we don't need to worry about this. As far as I know, you have a resync interval of a minute, so at worst every minute you are trying to update every resource. On top of this, you are updating for every event, including status changes; from what we see, Strimzi does its own heartbeat of status updates at a two-minute interval, and other than that there aren't a lot of rapid status updates. So, can we tolerate a set of mostly no-op updates against all of the resources (1000s of ManagedKafkas x resource fanout) every minute? If no, then consider increasing the resync interval, or we'll need to consider adding comparison logic. If yes, then we can drop this for now.
Just to add another problem to the comparison logic, is the
@rareddy what's being done here is a defined pattern; see https://www.openshift.com/blog/kubernetes-operators-best-practices, level-based triggering.
@ppatierno @rareddy it's worth revisiting this another time, as I didn't completely describe the number of updates performed. There are currently 10 resources created for each ManagedKafka; each of them produces an update event on the resync interval, which then causes 10 updates. That's actually 100 updates per ManagedKafka per resync. And since there's been no movement on fabric8io/kubernetes-client#2812, it's likely double that. Given the potential for 1000s of ManagedKafkas, this does need more consideration. At a minimum, I propose filtering out events from the informer onUpdate where the resourceVersion has not changed; that will filter out the bulk of the resync events you don't care about.
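The filtering proposed above relies on the fact that a resync re-delivers cached objects with an unchanged resourceVersion, while a real change bumps it. A minimal sketch of such a guard, with the class and method names being hypothetical rather than the actual informer callback signature:

```java
// Hypothetical onUpdate guard: drop no-op resync events by comparing the
// resourceVersion of the old and new objects the informer hands us.
public class ResyncFilter {
    static boolean shouldReconcile(String oldResourceVersion, String newResourceVersion) {
        // A real change (spec or status) bumps resourceVersion;
        // a pure resync re-delivery does not.
        return !oldResourceVersion.equals(newResourceVersion);
    }

    public static void main(String[] args) {
        System.out.println(shouldReconcile("1000", "1000")); // false: resync noise, skip
        System.out.println(shouldReconcile("1000", "1001")); // true: real update, reconcile
    }
}
```

Note this only suppresses resync noise; status-only changes still bump the resourceVersion and would still pass the filter, which is the point raised earlier about not being able to use resourceVersion for spec comparison.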
Also, maybe one can reduce updates by checking whether the namespace of the resource matches the namespace of the ManagedKafka; if matched, we can cut the calls back to 10 events per MK CR. That is still 100 updates, but capped at that number; otherwise we need to multiply that by the number of MK CRs deployed. Update: it looks like this filtering is already done during the EventSource conversion based on the owner reference, so ignore this.
To clarify, I'm proposing #154
This PR refactors the operand-deleted check to use the cache from the informer, and it adds logging of the deletion state.
It also refactors the methods that get the ConfigMap and Secret related to Kafka.