[Flow Visibility] Add policy recommendation Spark job #3064
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3064 +/- ##
==========================================
+ Coverage 58.76% 59.88% +1.11%
==========================================
Files 292 334 +42
Lines 24724 28639 +3915
==========================================
+ Hits 14530 17151 +2621
- Misses 8622 9620 +998
- Partials 1572 1868 +296
Flags with carried forward coverage won't be shown.
Was it decided that this will live in the main antrea repo?
Hi Antonin, we decided not to open a separate repo for the flow visibility long-term architecture.
Note that one commit can have multiple co-authors, but the commit message needs to be written carefully (https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors). However, I am not sure we need a feature branch in this case; as long as the changes are incremental and independent, they can be merged into main separately.
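For reference, the co-author trailer format that the linked GitHub documentation describes looks like the following (names and emails are placeholders): the trailers go after a blank line at the end of the commit message, one line per co-author.

```
Add policy recommendation Spark job

Short description of the change.

Co-authored-by: NAME <[email protected]>
Co-authored-by: ANOTHER-NAME <[email protected]>
```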
I don't think this is the right place for this (a Python script for policy recommendation definitely doesn't belong under
Thanks Antonin, after another round of discussion with the team, we think our flow visibility long-term architecture has three main parts: the ClickHouse database deployment, the Grafana UI implementation, and the Python Spark jobs. There will be some dependencies between PRs inside each part. So we are thinking about creating three branches, one for each part, which could allow better integration/e2e testing of each part before merging into the main branch.
Given that this doesn't affect main Antrea functions, and I don't expect that we will deprecate the current flow visibility stuff until we make good progress with the new version, either option is fine with me. Note that managing multiple feature branches can be a bit of a pain, so if possible I would still recommend merging into main directly. By "if possible" I mean that each PR that is merged should of course not break any existing functionality or cause any CI job to fail.
Thanks Antonin, I agree with you and have reached agreement with the other team members. We will merge the long-term PRs into the main branch directly in the future.
return "pod_to_external"

def map_flow_to_egress(flow):
    src = "#".join([flow.sourcePodNamespace, flow.sourcePodLabels])
Is `#` a self-defined delimiter? If so, I think we'd better define it as a constant.
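A minimal sketch of the suggested refactor, defining the delimiter once instead of hard-coding `"#"` at each join site (the `Flow` namedtuple and `egress_key` name are illustrative, not the actual job code):

```python
from collections import namedtuple

# Hypothetical constant for the suggested change: the "#" separator used
# to join a Pod's namespace and labels into a single peer key.
PEER_DELIMITER = "#"

# Stand-in for a flow row; the real job reads these fields from a Spark DataFrame.
Flow = namedtuple("Flow", ["sourcePodNamespace", "sourcePodLabels"])

def egress_key(flow):
    # Join namespace and labels with the shared delimiter constant.
    return PEER_DELIMITER.join([flow.sourcePodNamespace, flow.sourcePodLabels])
```

Keeping the delimiter in one place makes it easy to change later and guarantees the join and the corresponding split stay in sync.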
# This library is used to define Antrea Network Policy related CRDs in Python.
# Code structure follows the Kubernetes Python Client library (https://github.com/kubernetes-client/python).
# Could be improved by using the openAPI generator in the future.
Is this file auto-generated or manually created? If there is a change in an Antrea Network Policy related CRD, does this file also need to be changed?
Thanks for your review Yiou.
This file is manually created for now; yes, we need to change this file if there is a related change on the Antrea side affecting the policy recommendation. This file could be auto-generated using the openAPI generator in the future, like the K8s Python lib.
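To illustrate the style being discussed, here is a hedged sketch of what a hand-written CRD class following kubernetes-client/python conventions typically looks like (the class name and fields below are illustrative placeholders, not the actual Antrea CRD schema):

```python
class NetworkPolicySpec(object):
    # The generated Kubernetes clients carry these two class-level maps:
    # openapi_types gives the Python type of each attribute, and
    # attribute_map maps Python attribute names to JSON/YAML keys.
    openapi_types = {"priority": "float", "tier": "str"}
    attribute_map = {"priority": "priority", "tier": "tier"}

    def __init__(self, priority=None, tier=None):
        self.priority = priority
        self.tier = tier

    def to_dict(self):
        # Serialize via the attribute map, skipping unset fields,
        # as the generated client models do.
        return {json_key: getattr(self, attr)
                for attr, json_key in self.attribute_map.items()
                if getattr(self, attr) is not None}
```

Following this shape by hand keeps a later switch to openAPI-generated models largely mechanical.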
def get_IP_version(IP):
    return "v4" if type(ip_address(IP)) is IPv4Address else "v6"

def camel(s):
    s = sub(r"(_|-)+", " ", s).title().replace(" ", "")
    if not s: return s
    return s[0].lower() + s[1:]
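For clarity, a self-contained usage sketch of the two helpers quoted above, assuming `sub` comes from `re` and `ip_address`/`IPv4Address` from `ipaddress`:

```python
from re import sub
from ipaddress import ip_address, IPv4Address

def camel(s):
    # Convert snake_case or kebab-case to camelCase.
    s = sub(r"(_|-)+", " ", s).title().replace(" ", "")
    if not s:
        return s
    return s[0].lower() + s[1:]

def get_IP_version(IP):
    # Classify an address string as IPv4 or IPv6.
    return "v4" if type(ip_address(IP)) is IPv4Address else "v6"

print(camel("source_pod_labels"))  # sourcePodLabels
print(camel("pod-to-external"))    # podToExternal
print(get_IP_version("10.0.0.1"))  # v4
```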
I feel many of these utility functions could be moved into a separate file for easier maintenance.
    .option("dbtable", table_name) \
    .save()

def initial_recommendation_job(spark, db_jdbc_address, table_name, limit=100, option=1, start_time=None, end_time=None, ns_allow_list=NAMESPACE_ALLOW_LIST):
It would be better to describe the main functions (initial_recommendation_job and subsequent_recommendation_job) with some comments. The same goes for fields like option (what are the values 1, 2, 3 for?).
    return (dst, (src, ""))

def combine_network_peers(a, b):
    if a[0] != "" and b[0] != "":
Could you please explain a little bit more why the case `a[0] != "" and b[0] != ""` is possible? I assume that in the `ingress_rdd`, for each key `dst`, there will only be one element `(key, val)` - `(dst, (src0|src1|src2|..., ""))` - after performing the reduceByKey in L413.
Thanks Anlan, I agree this if condition is redundant; I will also remove L66.
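To make the reduceByKey point above concrete without a Spark cluster, here is a pure-Python simulation. `reduce_by_key` is a minimal stand-in for `RDD.reduceByKey`, and the `combine_network_peers` below is a hypothetical combiner that joins non-empty peer strings with `|` (a sketch based on the snippet above, not the actual job code):

```python
from itertools import groupby
from functools import reduce

def combine_network_peers(a, b):
    # Merge two (ingress_peers, egress_peers) string pairs,
    # concatenating the non-empty sides with "|".
    return ("|".join(x for x in (a[0], b[0]) if x),
            "|".join(x for x in (a[1], b[1]) if x))

def reduce_by_key(pairs, func):
    # Minimal stand-in for Spark's RDD.reduceByKey over (key, value) tuples:
    # group by key, then fold each group's values with func.
    pairs = sorted(pairs, key=lambda kv: kv[0])
    return {k: reduce(func, (v for _, v in group))
            for k, group in groupby(pairs, key=lambda kv: kv[0])}
```

After the first reduceByKey over the ingress pairs alone, each `dst` key indeed collapses to a single `(src0|src1|..., "")` value, which is why the redundant branch can go; both sides of the pair are only non-empty once ingress and egress results are combined.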
elif destinationPodLabels != "":
    return "pod_to_pod"
Do we expect to always store podLabels in the database, even if the optional podLabels field is set to false?
Currently yes, because we cannot create policies without Pod labels.
We may need an additional service inside the K8s cluster to fetch Pod labels and store them in the database if there are no podLabels fields in what the flow aggregator exported.
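Putting the branches quoted in this thread together, the flow-type classification presumably looks something like the following sketch. The function name, the `destinationServicePortName` check, and the `"pod_to_svc"` label are assumptions for illustration; only `"pod_to_pod"` and `"pod_to_external"` appear in the quoted code:

```python
def get_flow_type(destinationServicePortName, destinationPodLabels):
    # Classify a flow row by its destination fields (sketch; the real job
    # may use different columns or branch ordering).
    if destinationServicePortName != "":
        return "pod_to_svc"       # assumed label for Service-bound traffic
    elif destinationPodLabels != "":
        return "pod_to_pod"       # destination is a labeled Pod
    else:
        return "pod_to_external"  # no in-cluster destination metadata
```

This also shows why empty podLabels matter: without them, Pod-to-Pod flows would fall through to the external branch.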
Signed-off-by: Yongming Ding <[email protected]>
In this PR, we are adding the initial policy recommendation Spark job for the flow visibility long-term architecture.