From b46f398d3b9a7501c4b05ad4b79ac3265d67489a Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Sat, 22 Jun 2024 22:03:11 +0200 Subject: [PATCH 1/9] Bump urllib3 from 2.2.1 to 2.2.2 in /Tests (#641) Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.2.1 to 2.2.2. - [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) - [Commits](https://github.com/urllib3/urllib3/compare/2.2.1...2.2.2) --- updated-dependencies: - dependency-name: urllib3 dependency-type: indirect ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Kurt Garloff --- Tests/requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Tests/requirements.txt b/Tests/requirements.txt index 990104713..e225645b2 100644 --- a/Tests/requirements.txt +++ b/Tests/requirements.txt @@ -118,7 +118,7 @@ stevedore==5.2.0 # keystoneauth1 typing-extensions==4.12.2 # via dogpile-cache -urllib3==2.2.1 +urllib3==2.2.2 # via # kubernetes-asyncio # requests From f16e4e8b86b116f410d31d9341d1da09adca85e6 Mon Sep 17 00:00:00 2001 From: cah-hbaum <95478065+cah-hbaum@users.noreply.github.com> Date: Tue, 25 Jun 2024 14:25:39 +0200 Subject: [PATCH 2/9] Kubernetes cluster hardening standard (previously "K8s cluster baseline security") (#581) * Update baseline cluster security (#475) Update baseline cluster security Made a small adjustment to read-only port section in order to address some mentions by @bitkeks. Made a small adjustment to related documents in order to address some mentions by @bitkeks. --------- Signed-off-by: Hannes Baum Co-authored-by: Dominik Pataky <33180520+bitkeks@users.noreply.github.com> --- .../scs-0217-v1-baseline-cluster-security.md | 143 ------ Standards/scs-0217-v1-cluster-hardening.md | 475 ++++++++++++++++++ 2 files changed, 475 insertions(+), 143 deletions(-) delete mode 100644 Standards/scs-0217-v1-baseline-cluster-security.md create mode 100644 Standards/scs-0217-v1-cluster-hardening.md diff --git a/Standards/scs-0217-v1-baseline-cluster-security.md b/Standards/scs-0217-v1-baseline-cluster-security.md deleted file mode 100644 index f5dc82688..000000000 --- a/Standards/scs-0217-v1-baseline-cluster-security.md +++ /dev/null @@ -1,143 +0,0 @@ ---- -title: Kubernetes cluster baseline security -type: Standard -status: Draft -track: KaaS ---- - -## Introduction - -Due to the regular changes and updates, there are always new security features to deploy and use in Kubernetes. -Nevertheless, a provider (or even a customer) needs to take action in order to achieve a -hardened, secure cluster due to the myriad of configurations possible. This is especially -the case since Kubernetes ships with insecure features and configurations out of the box, -which will need to be mitigated by an administrator with the proper knowledge. -Hardened, secure Kubernetes clusters are desirable regardless of the possible threat model, -since higher security doesn't necessarily mean higher complexity in this case. - -## Terminology - -| Term | Meaning | -|------|-----------------------------| -| TLS | Transport Layer Security | -| CA | Certificate Authority | -| CSR | Certificate Signing Request | - -## Motivation - -Kubernetes clusters are highly configurable, which also gives rise to different security -problems, if the configuration isn't done properly. 
-These security risks can potentially be exposed in many different parts of a cluster, e.g. -different APIs, authorization and authentication procedures or even Pod privilege mechanisms. -In order to mitigate these problems, different steps and mechanisms could be used to increase -the security of a Kubernetes setup. - -## Design Considerations - -### External CA - -Kubernetes provides an API to provision TLS certificates that can be signed by a CA. -This CA can be controlled by the cluster provider, which enables much more tight control -over the clusters communication and therefore also better controllable security. - -In order to do this, the CA certificate bundle needs to be added to the trusted certificates -of the server. -To provide a certificate, the following steps need to be undertaken: - -1. Create a CSR -2. Send the CSR manifest to the k8s API -3. Approve the CSR -4. Sign CSR with your CA -5. Upload the signed certificate to the server - -This certificate could now be used by a user in a pod in order to provide a trusted certificate. - -It is also possible for the Kubernetes controller manager to provide the signing functionality. -To enable this, `--cluster-signing-cert-file` and `--cluster-signing-key-file` need to be set with -a reference to the CA keypair, which was used in the previous example to sign a CSR. - -### Protected Kubernetes endpoints - -In order to secure a Kubernetes cluster, the protection of endpoints is important. -To do this, different approaches can be taken. - -#### TLS for all internal/API traffic - -It is already expected by Kubernetes that all API communication internally is encrypted with TLS. -Nevertheless, some endpoints of internal components could be/will be exposed without the necessary -encryption, which could lead to weak points in the system. -A list of the default service endpoints can be seen in the following table - -| Protocol | Port Range | Purpose | Notes | -|----------|-------------|-------------------------|-----------------------------------------------------------------------------------------| -| TCP | 6443* | Kubernetes API Server | - | -| TCP | 2379-2380 | etcd server client API | - | -| TCP | 10250 | Kubelet API | - | -| TCP | 10251/10259 | kube-scheduler | 10251 could be insecure before 1.13, after that only the secure port 10259 is available | -| TCP | 10252/10257 | kube-controller-manager | 10252 could be insecure before 1.13, after that only the secure port 10257 is available | -| TCP | 30000-32767 | NodePort Services | Service endpoints, could be HTTP | - -The usage of `readOnlyPort` (enabling a read-only Kubelet API port on 10255) by design neither provides authentication nor authorization. Its usage is strongly discouraged! - -#### Authentication and Authorization - -All API clients should authenticate and authorize in order to be able to access an API or even -specific functions of this API. This is the case for users as well as internal components. - -Most internal clients (like proxies or nodes) are typically authenticated via service accounts or -x509 certificates, which will normally be created automatically during the setup of a cluster. -External users can authenticate via an access pattern of choice, which is typically decided by -the cluster provider. - -Authorization is (normally) done by the Role-Based Access Control (RBAC), which matches a request -by a user with a set of permissions, also called a role. 
Kubernetes deploys some roles out-of-the-box; -additional roles need to be carefully checked, since some permissions for specific resources allow -modification of other resources. - -This whole process is especially important for the Kubelet, which allows anonymous requests in its -default configuration. This is obviously a security risk, since everybody with access to its endpoint -could manipulate resources that are managed with the Kubelet. - -To disable anonymous requests, the Kubelet should be started with `--anonymous-auth=false`. -Authentication can be provided either through x509 client certificates or API bearer tokens. -How to set up both approaches can be found in the [Kubelet Authentication and Authorization](https://kubernetes.io/docs/reference/access-authn-authz/kubelet-authn-authz/). - -Kubelet authorization is set to `AlwaysAllow` as a default mode. This can be quite problematic, -since all authenticated users can do all actions. To mitigate this, it is possible to delegate -authorization to the API server by: - -- enabling the `authorization.k8s.io/v1beta1` API group -- starting the Kubelet with the `--authorization-mode=Webhook` and the `--kubeconfig` flags - -After that, the Kubelet calls the `SubjectAccessReview` API in order to determine the authorization of a request. - -## Decision - -This standard tries to increase security for a Kubernetes cluster in order to provide a -solid baseline setup with regard to security. For this to work, multiple measures need to be undertaken. - -A self-controlled CA SHOULD be used in order to be in control of the TLS certificates, which -enables operators to provide and revoke certificates according to their own requirements. - -All internal endpoints found in the section [TLS for all internal/API traffic] MUST be -encrypted with TLS in order to secure internal traffic. - -The Kubernetes API (kubeAPI) MUST be secured by authenticating and authorizing the users -trying to access its endpoints. How a user is authenticated is up to the provider of the -cluster and/or the wishes of the customer. Authorization MUST be done by providing fine-grained RBAC. -The authentication and authorization steps MUST also be applied to the Kubelet, which in its default configuration -doesn't enable them. A way to do this can be found in the section [Authentication and Authorization]. 
- -## Related Documents - -- [Managing TLS in a cluster](https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster/) -- [Securing a cluster](https://kubernetes.io/docs/tasks/administer-cluster/securing-a-cluster/) -- [Controlling access](https://kubernetes.io/docs/concepts/security/controlling-access/) -- [Kubernetes Security Checklist](https://kubernetes.io/docs/concepts/security/security-checklist/) -- [Kubelet Authentication and Authorization](https://kubernetes.io/docs/reference/access-authn-authz/kubelet-authn-authz/) -- [Authentication](https://kubernetes.io/docs/reference/access-authn-authz/authentication/) -- [OWASP Kubernetes Security Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Kubernetes_Security_Cheat_Sheet.html) - -## Conformance Tests - -Conformance Tests will be written in another issue diff --git a/Standards/scs-0217-v1-cluster-hardening.md b/Standards/scs-0217-v1-cluster-hardening.md new file mode 100644 index 000000000..b1a8539dd --- /dev/null +++ b/Standards/scs-0217-v1-cluster-hardening.md @@ -0,0 +1,475 @@ +--- +title: Kubernetes cluster hardening +type: Standard +status: Draft +track: KaaS +--- + +## Introduction + +Due to the regular changes and updates, there are always new security features to deploy and use in Kubernetes. +Nevertheless, a provider (or even a customer) needs to take action in order to achieve a +baseline-secure cluster due to the myriad of configurations possible. This is especially +the case since Kubernetes ships with insecure features and configurations out of the box, +which will need to be mitigated by an administrator with the proper knowledge. +Secure Kubernetes clusters are desirable regardless of the possible threat model, +since higher security doesn't necessarily mean higher complexity in this case. + +## Terminology + +| Term | Meaning | +|------|--------------------------------| +| TLS | Transport Layer Security | +| CA | Certificate Authority | +| JWT | JSON Web Token | +| ABAC | Attribute-based access control | +| RBAC | Role-based access control | + +## Motivation + +Kubernetes clusters are highly configurable, which also gives rise to different security +problems, if the configuration isn't done properly. +These security risks can potentially be exposed in many different parts of a cluster, e.g. +different APIs, authorization and authentication procedures or even Pod privilege mechanisms. +In order to mitigate these problems, different steps and hardening mechanisms could be used +to increase the security of a Kubernetes setup. +Due to the focus of the SCS KaaS standards on the providers, best practices for security +that are more focused on user environments aren't described here, e.g., the possibility for +network traffic control between pods. This could theoretically be set up by a provider, +but isn't very practical for the user, since he would probably need to request changes +regularly in this case. + +## Hardening Kubernetes + +This section is non-authoritative and only describes concepts and design considerations. + +### Regular updates + +Due to the risk associated with running older versions of software, e.g. known security vulnerabilities, +bugs or missing features as well as the difficulty of tracking or identifying attack vectors, +it is advised to first and foremost keep the version of the Kubernetes components up-to-date. 
It is especially important to keep track of the patch-level [versions of Kubernetes][kubernetes-releases],
since they include bugfixes and security patches, which are also backported to the previous
three minor-level versions, depending on their severity and feasibility. It is also recommended
to refer to the version skew policy for more details about [component versions][kubernetes-version-skew].

### Securing etcd

The etcd database is the storage for Kubernetes, containing information about cluster workloads, states and secrets.
Gaining access to this critical infrastructure part would enable a bad actor to read the aforementioned information;
write access would be equivalent to administrative access on the Kubernetes cluster, and information could be manipulated
while ignoring any restrictions or validations put in place by other Kubernetes components.

Securing etcd can be done through one or a combination of
several mechanisms, including strong security credentials for the etcd server, the isolation of the etcd servers behind a firewall, separate etcd
instances for components besides the API server, ACL restrictions for read-write-access to subsets of the keyspace and
a separate CA for etcd communication, which limits the trusted partners of the etcd database to clients with a certificate from this CA.
These strategies will be explained in more depth in the following subsections.

#### Strong authentication

If an etcd instance isn't secured correctly, a bad actor could try to authenticate against
the database.
It is therefore advised to use strong security credentials (see e.g. [the strong credentials requirements by NIST][strong-credentials]) for
all user accounts on the etcd server as well as the machines running this critical component.
This obviously applies to all accessible components, but it is especially true for etcd, since it contains
the complete cluster state.

#### Multiple etcd instances

etcd is a critical component that needs to be protected from
bad actors as well as outages. Kubernetes recommends a [five-member cluster](https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#multi-node-etcd-cluster) for durability and high availability as well as regular backups of the data.
For more information on high availability, look into the [Kubernetes Node Distribution and Availability Standard](scs-0214-v1-k8s-node-distribution.md).
Multiple etcd instances also make it possible to point components other than the API server (e.g. Calico) at
specific instances that aren't the current etcd leader, since access to the primary etcd instance is considered dangerous: the full keyspace could be viewed without further restrictions (see [here](https://cheatsheetseries.owasp.org/cheatsheets/Kubernetes_Security_Cheat_Sheet.html#limiting-access-to-the-primary-etcd-instance) or [here](https://docs.tigera.io/calico/latest/reference/etcd-rbac/kubernetes-advanced)).
This approach should still be paired with [etcd ACLs](#acl-restrictions) to better restrict access.

#### etcd isolation

The etcd database should at best be isolated from the rest of a Kubernetes cluster.
Access should only be granted to components that need it, which is in most cases mainly (or only)
the API server.
Best practice would be to host etcd on machines separate from the Kubernetes cluster
and to block access with specific firewall rules from machines or networks that don't need it.
In most cases, only the API server machines should need access to etcd on ports 2379-2380.

#### ACL restrictions

etcd has implemented access control lists (ACLs) and authentication since version 2.1 [1][etcd-auth].
etcd provides users and roles; users gain permissions through roles. When authentication is enabled,
each request to etcd requires authentication, and a transaction is only allowed if the user has the correct access rights.
etcd can also be launched with `--client-cert-auth=true`, which enables authentication via
the Common Name (CN) field of a client TLS certificate without a password.
This option enables Kubernetes components to authenticate as an etcd user without providing a password,
since password authentication is neither possible for Kubernetes components nor planned for future releases.
This method is recommended in order to implement ACLs for different Kubernetes components and
not give the Kubernetes API server full root access to the etcd instance; instead, a separate user can be created.

#### TLS communication

etcd should use TLS for peer- and cluster-communication, so that traffic between different peered etcd instances as well
as the communication with the Kubernetes cluster can be secured.
etcd provides options for all these scenarios, including `--peer-key-file=peer.key` and `--peer-cert-file=peer.cert`
for securing peer communication and the flags `--key-file=k8sclient.key` and `--cert-file=k8sclient.cert` for securing
client communication (and therefore cluster communication).
Additionally, HTTPS should be used as the URL schema.
It is also possible to use a separate CA for etcd in order to separate and better control access through client
certificates, since etcd by default trusts all the certificates issued by the root CA [2][nsa-cisa].
More information about authentication via TLS is provided in the chapter [ACL restrictions](#acl-restrictions).

### Securing endpoints

Kubernetes provides a well-defined set of ports in its default configuration. These ports are
used for inter-component communication as well as external access. Since information about the default
Kubernetes setup is widely available, it is easy for a bad actor to identify a cluster's
ports and try to attack them. In order to minimize the attack surface, internal ports (and therefore components)
should not be accessible from external networks, except if there are requirements to enable this behavior.

A good way to restrict access would be a combination of firewalls with port
blocking and the integration of network separation.
How this is done is highly dependent on the specific setup of the provider.
An additional document could be provided in the future to give basic
guidelines for this task.
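
To illustrate the idea nevertheless, the following sketch shows what such port blocking could look like with plain `iptables` rules on an etcd machine. The addresses are placeholders, and a real setup would more likely use the provider's own firewall or security-group tooling:

```shell
# Sketch (assumed addresses): allow etcd client/peer traffic (2379-2380)
# only from the control plane nodes that actually need it, drop the rest.
iptables -A INPUT -p tcp --dport 2379:2380 -s 10.0.1.10 -j ACCEPT
iptables -A INPUT -p tcp --dport 2379:2380 -s 10.0.1.11 -j ACCEPT
iptables -A INPUT -p tcp --dport 2379:2380 -j DROP
```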

A list of the default ports used in Kubernetes as well as the components accessing them can be found below:

#### Control plane nodes

| Ports       | Protocol | Purpose                 | Used by               | Access type        |
|-------------|----------|-------------------------|-----------------------|--------------------|
| 6443        | TCP      | API server              | All                   | External, internal |
| 2379-2380   | TCP      | etcd server             | kube-apiserver, etcd  | Internal           |
| 10250       | TCP      | Kubelet API             | Self, Control plane   | Internal           |
| 10255       | TCP      | Read-only Kubelet API   | External applications | External, internal |
| 10257       | TCP      | kube-controller-manager | Self                  | Internal           |
| 10259       | TCP      | kube-scheduler          | Self                  | Internal           |

Hint: `Self` in the `Used by` column means that a resource accesses its own port for requests.

#### Worker nodes

| Ports       | Protocol | Purpose               | Used by               | Access type        |
|-------------|----------|-----------------------|-----------------------|--------------------|
| 10250       | TCP      | Kubelet API           | Self, Control plane   | Internal           |
| 10255       | TCP      | Read-only Kubelet API | External applications | External, internal |
| 30000-32767 | TCP      | NodePort Services     | All                   | External           |

### API security, authentication and authorization

In order to secure Kubernetes against bad actors, limiting and securing access to API requests
is recommended, since such requests are able to control the entire Kubernetes cluster.
Access control is applied to both human users and Kubernetes service accounts; a request goes through
several stages after it reaches the API.

1. The Kubernetes API server listens on port 6443 on the first non-localhost network interface by default,
protected by TLS [3][controlling-access]. The TLS certificate can either be signed with a private CA or based on a public key
infrastructure with a widely recognized CA behind it.
2. The authentication step checks the request for correct authentication based on different possible
authentication modules like password, plain tokens or JWT. Only one of these methods needs to succeed
in order to allow a request to pass to the next stage.
3. The authorization step authorizes a request if the user is allowed to carry out the specific operation.
The request must contain the username of the requester, the requested action and the affected object.
Kubernetes supports different authorization modules like ABAC, RBAC or Webhooks. Only one of these
modules needs to approve the request in order for it to be authorized.
4. The last step consists of admission control modules, which can modify or reject requests after accessing
the objects' contents.

#### Authentication

Kubernetes provides different internal authentication mechanisms that can be used depending
on the requirements of the cluster provider and user. Multiple authentication systems can
be enabled and the [Kubernetes documentation][kubernetes-auth] recommends at least using two methods,
including Service Account Tokens and another method. Methods directly provided by Kubernetes include
the following (a more complete or up-to-date list may be found in the [Kubernetes authentication docs][kubernetes-auth]):

- *Static Token Files*

  This method reads bearer tokens from requests and checks them against a CSV file provided to Kubernetes containing
  three columns named `token`, `username` and `uid`. These tokens last indefinitely and the list can't be changed
  without a restart of the API server. This makes this option unsuitable for production clusters.

- *Service Account Tokens*

  A service account is an authenticator that uses signed bearer tokens for request verification.
  Service accounts can be given to the API server with a file containing PEM-encoded X509 RSA or
  ECDSA private or public keys that verify the Service Account Tokens.
  Service Accounts are normally created automatically by the API server and associated with the
  pods through the `ServiceAccount` admission controller. Tokens are signed JSON Web Tokens
  that can be used as a Bearer Token or mounted into the pods for API server access.
  Since Service Account Tokens are mainly used to allow workloads to access the API server,
  they're not really intended to authenticate users in production clusters.

- *X509 client certificates*

  Client certificate authentication can be enabled by providing a `Certificate Authority`
  file to the API server via the `--client-ca-file=` option. The file contains one
  or more CAs that a presented client certificate is validated against.
  In this case, the common name of the certificate subject (CN) is used as the username for the request;
  additionally, a group membership can be indicated with the certificate's organization field.
  These certificates are unsuitable for production use, because Kubernetes does not
  support certificate revocation. This means user credentials can't be modified or
  revoked without rotating the root CA and re-issuing all cluster certificates.

As outlined, most internal authentication mechanisms of Kubernetes aren't really
usable in productive environments at the current time. Instead, external authentication
should be used in order to provide production-ready workflows.
The Kubernetes documentation lists a few examples for external authenticators, e.g.

- [OpenIDConnect][openidconnect]
- Bearer Tokens with [Webhook Token Authentication][webhook-token]
- Request Header Authentication with an [Authenticating Proxy][authenticating-proxy]

All of these examples are useful to set up for an organization or can be used with
an already in-place solution. More information can be found in their respective
part of the Kubernetes documentation.
Most of these are good solutions for productive setups, since they enable easy
user management, access revocation and things like short-lived access tokens.
What will be used by your organization depends on the present setup and the use case.

#### Authorization

Authorization is done after the authentication step in order to check the rights
of a user within the system. Kubernetes authorizes API requests with the API server,
which evaluates requests against all policies in place and then allows or denies these requests.
By default, a request would be denied.

Kubernetes provides several authorization modes to authorize a request:

- *Node*

  The [Node authorization mode][node-authorization] grants permissions to Kubelets
  based on the pods scheduled to run on them. It allows a Kubelet to perform specific
  API operations. The goal is to have a minimal set of permissions to ensure
  the Kubelet can operate correctly.
  Each Kubelet identifies with credentials belonging to the `system:nodes` group and
  a username of the form `system:node:<nodeName>` against this authorizer.

- *ABAC (Attribute-based access control)*

  ABAC grants access rights based on policies dependent on attributes like
  user attributes, resource attributes or environment attributes.
  An example would be the `resource` attribute, which could limit access for a user
  to only `Pod` resources.

- *RBAC (Role-based access control)*

  RBAC is a method of regulating access to resources based on the roles of
  individual users. A user must therefore have the ability to perform a specific set
  of tasks with a set of resources based on their role.
  Kubernetes implements `Role`s to accomplish this and binds these with `RoleBinding`s
  to a user in order to specify their permission set.

- *Webhook*

  Webhook authorization uses an HTTP callback to check the authorization of a user
  against a URL provided for this mode. This externalizes the authorization part
  outside of Kubernetes.

Most organizations and deployments work with RBAC, most often due to organizational or
customer-owner-relationship-like structures in place.
Nonetheless, neither ABAC, RBAC nor Webhook authorization can be recommended over the
others, since this all depends on the use case and required structure of a deployment.
Using at least one of these modes is recommended.

It is also recommended to enable the Node authorizer in order to limit Kubelet
permissions to a minimum operational state.

#### Admission Controllers

Admission controllers intercept requests to the Kubernetes API after the
authentication and authorization steps and can validate and/or mutate the request.
This step is limited to requests that `create`, `modify` or `delete` objects as well as custom
verbs; other requests are not blocked.
Kubernetes provides multiple admission controllers, some of which are enabled by default.

One recommended admission controller is the [`NodeRestriction` controller][node-restriction],
which limits the `Node` and `Pod` objects a Kubelet is allowed to modify to its own `Node` or
objects that are bound to it. It also disallows updating or removing taints and prevents changing
or adding labels with a `node-restriction.kubernetes.io/` prefix.
Be aware that Kubelets will only be limited by this admission controller if their user credentials
are in the `system:nodes` group and their username has the `system:node:` prefix. Administrators must therefore
configure their Kubelets correctly if the `NodeRestriction` controller is to be fully functional.

### Kubelet access control

The Kubelet is the node agent that runs on each node. It registers with the API
server and ensures that pods handed over to it are running and healthy according
to the specification provided to it. The HTTPS endpoint of a Kubelet exposes APIs
with varying access to sensitive data and also enables various operations
that manipulate node data and containers.
There is also a read-only HTTP endpoint that was used for monitoring a Kubelet and
its information. This port was also used by applications like `kubeadm` to check
the health status of the Kubelet.
This port is still available, but it is planned to be [removed][ro-port-removal]
in a future version. The port has been disabled by default since [Kubernetes 1.10][ro-port-disabled]
and shortly after also in [`kubeadm`][ro-port-disabled-kubeadm].
Different sources recommend disabling this port [4][ro-port-s1] [5][ro-port-s2] due to possible
security risks, but since this standard recommends restricting the accessibility of internal ports,
this port wouldn't be accessible from external networks anyway.
It is nevertheless recommended to keep this port disabled, since Kubernetes has also acknowledged
its risks and plans to remove it.
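
The following sketch shows a Kubelet started with the hardening settings discussed here and in the following paragraphs. The flags exist as shown, but on most setups (e.g. kubeadm-based ones) the same settings would typically be placed in the `KubeletConfiguration` file instead:

```shell
# Sketch: keep the read-only port disabled, reject anonymous requests,
# authenticate clients via X509 and delegate authorization to the API server.
kubelet \
  --read-only-port=0 \
  --anonymous-auth=false \
  --client-ca-file=/etc/kubernetes/pki/ca.crt \
  --authorization-mode=Webhook \
  --kubeconfig=/etc/kubernetes/kubelet.conf
```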

By default, the API server does not verify the Kubelet's serving certificate, and
requests to the HTTPS endpoint that are not rejected by other authentication
methods are treated as anonymous requests with the combination of the name `system:anonymous`
and the group `system:unauthenticated`.
This can be disabled by starting the Kubelet with the flag `--anonymous-auth=false`,
which makes it return `401 Unauthorized` for unauthenticated requests.
It is also possible to enable internal authentication methods for the Kubelet.
Possibilities include X509 client certificates as well as API bearer tokens to
authenticate against the Kubelet; details for these methods can be found in the [Kubernetes docs][kubelet-auth].

After a request is authenticated, the authorization for it is checked, with the default
being `AlwaysAllow`. Requests should at best be authorized depending on their source,
so a differentiation of access rights makes sense for the Kubelet; not all users should have
the same access rights. How access can be configured and delegated to the Kubernetes
API server can be found in the [Kubernetes docs][kubelet-auth]. The process works like the API request
authorization approach, with verbs and resources being used as identifiers in roles and role bindings.

### Pod security policies

Pod security plays a big part in securing a Kubernetes cluster, since bad actors could use pods to gain
privileged access to the underlying systems. The security risk here is mainly influenced by the capabilities
and privileges given to a container. It is therefore recommended to apply the principle of least privilege,
which limits the security risk to a minimum.

Kubernetes defines the [*Pod security standards*][pod-security-standards]
in the form of three policies that try to cover the range of the security spectrum.
These policies can be found in the following list and define a list of restricted fields that can only be
changed to a set of allowed values. An up-to-date list of these values can be found [here][pod-security-standards].

- *Privileged*

  Unrestricted policy, providing the widest possible level of permissions.
  This policy allows for known privilege escalations.

- *Baseline*

  Minimally restrictive policy which prevents known privilege escalations.
  Allows the default (minimally specified) Pod configuration.

- *Restricted*

  Heavily restricted policy, following current Pod hardening best practices.

Kubernetes also offers the *Pod security* admission controller, which enforces
the *Pod security standards* on a namespace level during pod creation.
The admission controller defines the standard to be used with the three levels
`privileged`, `baseline` and `restricted`. Each namespace can be configured to use
a different control mode, which defines what action the control plane takes
after a violation of the selected *Pod security* standard is detected.

- `enforce`

  Policy violations will cause the pod to be rejected.

- `audit`

  Policy violations will trigger the addition of an audit annotation to the event
  recorded in the audit log, but are otherwise allowed.

- `warn`

  Policy violations will trigger a user-facing warning, but are otherwise allowed.

Be aware that `enforce` is not applied to workload resources, only to the pods created from their template.
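
For illustration, the *Pod security* admission controller is configured through namespace labels; a minimal sketch (namespace name assumed) could look like this:

```shell
# Sketch: enforce the baseline policy in a namespace and additionally warn
# if a pod would violate the restricted policy.
kubectl label namespace my-namespace \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/warn=restricted
```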

### Further measures

While researching this topic, further measures were considered, such as container image verification,
distroless images, usage of `ImagePolicyWebhook`, network policy enforcement,
container sandboxing and prevention of kernel module loading.
Most of these were taken out of the document during writing due to either being the responsibility
of the cluster's user (and therefore not possible to implement for the provider), being more relevant
for high-security clusters or changing the expected cluster environment too much, so that normally
expected operations could potentially not work in such a modified cluster.
These measures will possibly be introduced in a future document about higher-security clusters.

## Standard

This standard provides the baseline security requirements for a cluster in the SCS context.

Kubernetes clusters MUST be updated regularly in order to receive bugfixes and security patches.
For more information refer to the [SCS K8s Version Policy](scs-0210-v2-k8s-version-policy.md),
which outlines the version update policies of the SCS.

Hardening etcd is important due to it being a critical component inside a Kubernetes cluster.
etcd SHOULD be isolated from the Kubernetes cluster by being hosted on separate (virtual) machines.
If this is the case, access to these instances MUST be configured so that only the API server and
other cluster components requiring access can reach etcd.
Communication with etcd MUST be secured with TLS for both peer- and cluster-communication.
It is RECOMMENDED to use a CA for etcd separate from the one used for the Kubernetes cluster in
order to better control and issue certificates for clients allowed to access etcd.
ACLs MUST be enabled for etcd, which allows better control of the access rights to specific key sets
for specific users. Authentication MUST be done via the Common Name (CN) field of the TLS client
certificates (since normal username/password authentication isn't implemented for Kubernetes components).

Kubernetes endpoints MUST be secured in order to provide a small attack surface for bad actors.
It MUST NOT be possible to access Kubernetes ports from outside the internal network hosting the
Kubernetes cluster except for the ports of the API server (default 6443) and the NodePort Services
(default 30000-32767). The read-only Kubelet API port (default 10255), which is mostly used for monitoring,
SHOULD be disabled altogether if it isn't in use, mainly because the port is HTTP-only
and can deliver sensitive information to the outside.
Endpoints MUST be secured via HTTPS.

Securing Kubernetes via authentication and authorization is another important topic here.
Authentication is possible through multiple mechanisms, including Kubernetes-provided systems as well as external
authentication processes.
A cluster MUST implement at least two methods for authentication. One of these MUST be *Service Account Tokens*, in order
to provide full functionality to Pods. A second authentication mechanism can be chosen depending on the requirements
of the provider and/or customer.

Authorization can also be provided through multiple mechanisms.
A cluster MUST activate at least two authorization methods, one of which MUST be *Node authorization* and another one
consisting of either ABAC, RBAC or Webhook authorization, depending on the required use case.
We RECOMMEND RBAC, due to it fitting most use cases and being very well documented, but your setup might require another solution.

In order to harden Kubelet access control, a Kubelet SHOULD only be accessible internally via HTTPS. This is already the
case for the Kubelet API, except for the read-only port, which is only available as HTTP. As mentioned earlier, this port
should be disabled.
Kubelets MUST disable anonymous request authentication, so that unauthenticated requests are rejected instead of being treated as anonymous requests.
OPTIONALLY, X509 client certificate authentication or API bearer token authentication can be enabled.
Request authorization for the Kubelet MUST be delegated to the API server via `Webhook` authorization, as recommended
by the [Kubernetes documentation][kubelet-auth].
Additionally, the `NodeRestriction` admission controller MUST be activated in order to limit interactions between
different Kubelets by disallowing the modification of `Pod` objects that aren't bound to the Kubelet requesting the modification.

Finally, *Pod security standards* in the form of policies MUST be activated for the cluster. The SCS REQUIRES at least
the *Baseline* policy; the *Restricted* policy CAN also be used.
The *Pod security* admission controller MUST also be activated in order to enforce these policies on a namespace level.
We RECOMMEND using the `enforce` mode for this admission controller setup.

## Conformance Tests

Conformance Tests will be written in a separate issue.

## Related Documents

- [OWASP Kubernetes Security Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Kubernetes_Security_Cheat_Sheet.html)
- [Kubernetes security concepts](https://kubernetes.io/docs/concepts/security/)
- [Securing a cluster](https://kubernetes.io/docs/tasks/administer-cluster/securing-a-cluster/)
- [Controlling access](https://kubernetes.io/docs/concepts/security/controlling-access/)
- [Pod security standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/)
- [NSA CISA Kubernetes hardening](https://kubernetes.io/blog/2021/10/05/nsa-cisa-kubernetes-hardening-guidance/)
- [Configure etcd](https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/)
- [Google Kubernetes cluster trust](https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-trust)

[kubernetes-releases]: https://kubernetes.io/releases/
[kubernetes-version-skew]: https://kubernetes.io/releases/version-skew-policy/
[strong-credentials]: https://pages.nist.gov/800-63-3/sp800-63b.html
[kubernetes-auth]: https://kubernetes.io/docs/reference/access-authn-authz/authentication/
[node-authorization]: https://kubernetes.io/docs/reference/access-authn-authz/node/
[node-restriction]: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#noderestriction
[kubelet-auth]: https://kubernetes.io/docs/reference/access-authn-authz/kubelet-authn-authz/#kubelet-authorization
[pod-security-standards]: https://kubernetes.io/docs/concepts/security/pod-security-standards/
[openidconnect]: https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens
[webhook-token]: https://kubernetes.io/docs/reference/access-authn-authz/authentication/#webhook-token-authentication
[authenticating-proxy]: https://kubernetes.io/docs/reference/access-authn-authz/authentication/#authenticating-proxy
[controlling-access]: https://kubernetes.io/docs/concepts/security/controlling-access/

[ro-port-removal]:
https://github.com/kubernetes/kubernetes/issues/12968 +[ro-port-disabled]: https://github.com/kubernetes/kubernetes/pull/59666 +[ro-port-disabled-kubeadm]: https://github.com/kubernetes/kubeadm/issues/732 +[ro-port-s1]: https://www.stigviewer.com/stig/kubernetes/2021-04-14/finding/V-242387 +[ro-port-s2]: https://docs.datadoghq.com/security/default_rules/cis-kubernetes-1.5.1-4.2.4/ +[nsa-cisa]: https://kubernetes.io/blog/2021/10/05/nsa-cisa-kubernetes-hardening-guidance/ +[etcd-auth]: https://etcd.io/docs/v3.3/op-guide/authentication/ From 13fa0bda8a7e6ef1e8d3a15306cf759a594369a9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Matthias=20B=C3=BCchse?= Date: Tue, 25 Jun 2024 14:42:57 +0200 Subject: [PATCH 3/9] Feature: main check script allows selecting individual tests (#634) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Matthias Büchse --- Tests/scs-compliance-check.py | 25 +++++++++++++++++++------ 1 file changed, 19 insertions(+), 6 deletions(-) diff --git a/Tests/scs-compliance-check.py b/Tests/scs-compliance-check.py index 0bdd18fda..aa7ecd667 100755 --- a/Tests/scs-compliance-check.py +++ b/Tests/scs-compliance-check.py @@ -20,6 +20,7 @@ import os import os.path import uuid +import re import sys import shlex import getopt @@ -48,6 +49,7 @@ def usage(file=sys.stdout): -V/--version VERS: Force version VERS of the standard (instead of deriving from date) -s/--subject SUBJECT: Name of the subject (cloud) under test, for the report -S/--sections SECTION_LIST: comma-separated list of sections to test (default: all sections) + -t/--tests REGEX: regular expression to select individual tests -o/--output REPORT_PATH: Generate yaml report of compliance check under given path -C/--critical-only: Only return critical errors in return code -a/--assign KEY=VALUE: assign variable to be used for the run (as required by yaml file) @@ -91,13 +93,14 @@ def __init__(self): self.output = None self.sections = None self.critical_only = False + self.tests = None def apply_argv(self, argv): """Parse options. 
May exit the program.""" try: - opts, args = getopt.gnu_getopt(argv, "hvqd:V:s:o:S:Ca:", ( + opts, args = getopt.gnu_getopt(argv, "hvqd:V:s:o:S:Ca:t:", ( "help", "verbose", "quiet", "date=", "version=", - "subject=", "output=", "sections=", "critical-only", "assign", + "subject=", "output=", "sections=", "critical-only", "assign", "tests", )) except getopt.GetoptError as exc: print(f"Option error: {exc}", file=sys.stderr) @@ -128,6 +131,8 @@ def apply_argv(self, argv): if key in self.assignment: raise ValueError(f"Double assignment for {key!r}") self.assignment[key] = value + elif opt[0] == "-t" or opt[0] == "--tests": + self.tests = re.compile(opt[1]) else: print(f"Error: Unknown argument {opt[0]}", file=sys.stderr) if len(args) < 1: @@ -239,6 +244,7 @@ def main(argv): "assignment": config.assignment, "sections": config.sections, "forced_version": config.version or None, + "forced_tests": None if config.tests is None else config.tests.pattern, "invocations": {}, }, } @@ -274,11 +280,12 @@ def main(argv): for standard in vd.get("standards", ()): check_keywords('standard', standard) optional = condition_optional(standard) - printnq("*******************************************************") - printnq(f"Testing {'optional ' * optional}standard {standard['name']} ...") - printnq(f"Reference: {standard['url']} ...") + if config.tests is None: + printnq("*******************************************************") + printnq(f"Testing {'optional ' * optional}standard {standard['name']} ...") + printnq(f"Reference: {standard['url']} ...") checks = standard.get("checks", ()) - if not checks: + if not checks and config.tests is None: printnq(f"WARNING: No check tool specified for {standard['name']}", file=sys.stderr) for check in checks: check_keywords('check', check) @@ -288,6 +295,12 @@ def main(argv): if id_ in seen_ids: raise RuntimeError(f"duplicate id: {id_}") seen_ids.add(id_) + if config.tests is not None: + if not config.tests.match(id_): + # print(f"skipping check '{id_}': doesn't match tests selector") + continue + printnq("*******************************************************") + print(f"running check {id_}") if 'executable' not in check: # most probably a manual check print(f"skipping check '{id_}': no executable given") From 3f68cc48cd0bdc86fe49ad351af3c6a4040af79d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Matthias=20B=C3=BCchse?= Date: Tue, 25 Jun 2024 14:52:42 +0200 Subject: [PATCH 4/9] Change scs-0104-v1 to Procedural; add Implementation Notes (#631) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Matthias Büchse --- Standards/scs-0104-v1-standard-images.md | 2 +- ...-0104-w1-standard-images-implementation.md | 61 +++++++++++++++++++ 2 files changed, 62 insertions(+), 1 deletion(-) create mode 100644 Standards/scs-0104-w1-standard-images-implementation.md diff --git a/Standards/scs-0104-v1-standard-images.md b/Standards/scs-0104-v1-standard-images.md index 220add31b..af845a271 100644 --- a/Standards/scs-0104-v1-standard-images.md +++ b/Standards/scs-0104-v1-standard-images.md @@ -1,6 +1,6 @@ --- title: SCS Standard Images -type: Standard +type: Procedural status: Stable stabilized_at: 2024-02-21 track: IaaS diff --git a/Standards/scs-0104-w1-standard-images-implementation.md b/Standards/scs-0104-w1-standard-images-implementation.md new file mode 100644 index 000000000..9806285e4 --- /dev/null +++ b/Standards/scs-0104-w1-standard-images-implementation.md @@ -0,0 +1,61 @@ +--- +title: "SCS Standard Images Standard: Implementation 
Notes" +type: Supplement +track: IaaS +status: Proposal +supplements: + - scs-0104-v1-standard-images.md +--- + +## Introduction + +The SCS standard on standard images does not in itself lay down what images are actually +required or recommended; rather it specifies the format of a YAML file that in turn serves +said purpose. The particular YAML file that an implementer (a cloud service provider or operator) +has to comply with is given in the respective version of the certificate scope "SCS-compatible IaaS" +as a parameter to the standard. This document is intended to give implementers a +step-by-step guide on how to comply with the SCS certificate scope. + +## Step-by-step walkthrough + +### Option A: pragmatic + +Run the test script on your environment and check the error messages :) + +1. Check out the [standards repository](https://github.com/SovereignCloudStack/standards). + + ```shell + git clone https://github.com/SovereignCloudStack/standards.git + cd standards + ``` + +2. Install requirements: + + ```shell + python3 -m venv .venv && source .venv/bin/activate + pip install -r requirements.txt + ``` + +3. Make sure that your `OS_CLOUD` environment variable is set. +4. Run the main check script: + + ```shell + python3 ./Tests/scs-compliance-check.py ./Tests/scs-compatible-iaas.yaml -t standard-images-check \ + -s $OS_CLOUD -a os_cloud=$OS_CLOUD -o report.yaml -C + ``` + +5. Inspect console output (stderr) for error messages. + +### Option B: principled + +1. Find your intended version of the certificate scope in the [overview table](https://docs.scs.community/standards/scs-compatible-iaas). It will most likely be one whose 'State' is 'Effective' or 'Stable'. +2. In (or below) the row labeled 'scs-0104: Standard images', you find a link to the YAML file that lists mandatory and recommended images, such as [scs-0104-v1-images.yaml](https://github.com/SovereignCloudStack/standards/blob/main/Tests/iaas/scs-0104-v1-images.yaml) for v4 of the certificate scope. +3. For each entry under `images`, ensure the following (either manually or by using the OpenStack Image Manager described in the section "Operational Tooling"): + - if the entry says `status: mandatory`, your environment MUST provide this image, i.e., an image whose name matches the `name_scheme` or (in absence of a name scheme) the `name`. + - every actual image in your environment _that matches the `name_scheme` or (in absence of a name scheme) the `name`_ has the correct `image_source` property: its value MUST start with one of the prefixes listed under `source`. + +## Operational Tooling + +The [openstack-image-manager](https://github.com/osism/openstack-image-manager) is able to +create all standard, mandatory SCS images for you given image definitions from a YAML file. +Please see [its documentation](https://docs.scs.community/docs/iaas/components/image-manager/) for details. 
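
If a single image needs to be adjusted manually instead, the `image_source` property mentioned in Option B can also be set directly via the OpenStack CLI; a sketch with an assumed image name and source URL:

```shell
# Sketch: set the image_source property on an existing image
# (image name and URL are placeholders).
openstack image set --property image_source="https://cloud-images.ubuntu.com/jammy/" "Ubuntu 22.04"
```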
From d76fa9fa27c2371a65c095a93369f6aef891fcea Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Matthias=20B=C3=BCchse?= Date: Tue, 25 Jun 2024 15:11:13 +0200 Subject: [PATCH 5/9] Bugfix: do not leak secrets into the log; rotate affected api key (#646) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Matthias Büchse --- .zuul.d/secure.yaml | 40 +++++++++++++++---------------- compliance-monitor/bootstrap.yaml | 2 +- playbooks/compliance_check.yaml | 1 + 3 files changed, 22 insertions(+), 21 deletions(-) diff --git a/.zuul.d/secure.yaml b/.zuul.d/secure.yaml index 2afcdd29c..6abe89cf9 100644 --- a/.zuul.d/secure.yaml +++ b/.zuul.d/secure.yaml @@ -3,27 +3,27 @@ name: SECRET_STANDARDS data: zuul_ci_api_key: !encrypted/pkcs1-oaep - - bsOAcG0vC2ZM7L3rvHuav2c7FS0a5exfbAQ5s134bn4idh5olm7a1tNO+bTmtmz5DiFRJ - j/N2nkxjcP4n9ISPh0bXwFjzK3AsRD3AVqiFBkPZ5UBj6WGwhIzAZVkf79GH8CWV0E2Rz - 9JxqhjzlreCrGBuFtmS7HSKqW0Wn5OgQlDXUdRQmDb9Za2MPVwKXDkKPEL7+SSs+1YiJC - JKTDJ5wXnnG6axjtU1MD9aGxX3PEaffJBcyZjYuQd9ipSSnBhR+8Ze3dpi2GVl72xvAdi - 88mAVT39ru0/QQCc8NppToynG19+AAIi3kGaol8mropi+Q3K/uWKoVh8mQ1IA5sJg/1Kq - KDBfoL8xzOUk42tav5Rb8C7A7pNi8u0WscQXhoDlSxJLyQ7/LlpovH+H8KVHCnW4xC3MV - zhdWGsR2RJK8B5hUgfXJsV4dDJU6fE5XZHCA+pJ69Lya6+aGtVwACYaSLXYbhb496V6W+ - FAbJmmHOXqWsJyjTouLMxKa/OcbAi+4TClF33qqSEda/hzgGNsUjhbhFQ9HZqsyoZzpVs - t/za1YIN1CSZRRofqIqUx1L+YpFyUqyhuxMkQJPllp9Uxzcffyxy1vYEsyplnwhJM3uXa - SoDs5fxitV1OnLBTcWvXWvDMY6dOvdTHhPu3wuAw+YiWfXegp58kVn3z0oLXBo= + - jUzccSZN0bjWMX8BPntW4dct5gT5YtlNPQUbQRELCKBlce7y1Ao41g4u06CXBMfzUZIYG + eWoxhDsn2QfMnN8RLxkWEjGBG/+n7ql2N1CC+FuSBqU8FkDliSPNZdlC75BFBQYksVeG/ + XbnP90D4u3cOzE6nK72Ftr5hK90Is/PquIfcStXjxoirZjO4tsgz8kI4elBGGg2q96guu + 5LTVejYcxH/iwr78AM4YvRSnLto7L0tNxfTI3K6l9gFLWHY52DINYhmZq18MmGIx99Yat + wvnLefMeX634FzlS+qzemC029T5FKot+rNG4zD2JbpyN7Raqt6HBLRh1SIyte7RLqTmmA + sONOho8I+LhRvb0no3vjslNm/OJU1SSdcTPilFv6Vyvr0wt892ihsqHn6F0ZT3HAXpkeI + h5Vax81vIl6T+ZUCqgqea7zgQHr1RntqOc1tu8nx1PtrzHRySZBS+CQwo9GKbXR0HdLOk + ANItov3dMRqgDewGQPoX7UyI23zfa5JmFAkDkM016l+r4xDfo4v/eGVArloyTEF362gMh + o3NDMUlGaL2kNehk6k/ol8HyWilVRceGFDVRZjYlGWFU+jFTNYi0vqq8mP1gv9CIVOe/b + K6dgp6XZo/apeESp4e8xlfJA2Tf+UpylmBtsgcHXjOWPNH9JaalA8pv6LWjcf8= zuul_ci_basic_auth: !encrypted/pkcs1-oaep - - T8f5tOKbDFKlxJaj7FbB0U6G61zUNrxxXZmrAQnzfhoXUMd9eWhrzXjUKbFUEa+3IaNnF - Uyu5VYeIMRxZ3dBltDpHCUDs+3ei+LYm7D/d8W1x5wg03HrLUv8OOeiMM7PrvQMINXz5x - cEC0d16sCcapmN20Yleamfi7JWUCh1M4zibWCSCCbaXgKeHvUQEMIy4pLr1F6dN98W+Tq - ecinhrSJcbqQUg5hdt0Qn3UKCm1ncl9smvY4TvRABO14LZv0NSvzpwXvPq/OiNwviPPSh - MU5KERRAyDBlJYLOtdXqDTwJ26K83h52jNAHWH2CEi03864ORNQgt8eRxfsgSXkdk1EXF - HHFwJfqztaPOqpKBXmNr8oe4E/iuWa/buLhknYknA/HeGMOZWFvFtv9iuLKBNd9W0/Asn - p3ZkxTSx17sS1/mLJyINi3p/g+bmZ/R4qSwVsvFUl93evddg/dFrxg6/S6cMAj8iUMEsD - KTt/5smVB+LTW4ZTbCzBq9ygrVOKOTlY0d43AwlYbVu65+8YLJAgmKWQ3G1LISXzlIqoA - yynW3UafFYZUDCWAFO9acabAHlpsaivDjvgG+2S6Fj3GeQic+7vKZzoCUAJFZmgRDXNBb - LuNzERqm/m4ZeN4D4HHbJAmtza+sT5DM4ZvpdGLJaWQUs+ki+fes2h8FQhBMYg= + - cGUlUyavav0SwvkMeaZu5VZs/FD7qMUXx+vkskQULii1Kjmob9wpIpwBasB6C5wjkz7Yu + SZc9kDeYLGNlLS0liGEnNzQEHtSToVCLmEsuEQQWVLKgjVFHhuYG92D//OTnPhwGdrWR0 + YbN++e6dfdoUTFgNomE9yQ/AiP31frb5xsnkzBOd4Yck7ctlCaoEFqsLDpg9fCvmHuTOD + rI5lwaxHjaNCsBdWBvGrV/y06wWz3Dd4DI/mK9gzQT8LQUJb14WuM7Skif0piig/3X2+p + 4AFYEj9ZsxTaGL+IhHsMQKHQOxZ8qfDsBTVYjaQxo7qGrGtZ9sZcRcmhScA5OgSiMI3kA + jrFtm3TMT15SZVaaG/fqmaYykGl9JXD5jBvpwIY6oPmjjY9hNmc5bm0tmsJ/RF6f8ZChP + sR5e/wOJUnYbUxTyuId1ZrFL7kFQxfc1HWg1VCwRTfeYwFal5K3CqXWu9u6O/6foa/DFn + uHYP40Lq1Rd4LRQjT3TW9TybgM2Jvt6sd7sCKM7KQ4/fGyj8BswRThBNzuJMN83QmmAiq + 
phthkp6X2A7ELMd22wjrOy7ruwZfXObhagJNis4x/t55fdDpnZcW3KeeqOJAv2xkD4StF + 32RSKLdIRbtWLsouOYPNloVFwPAykbrFkDfH2lyy0LJS9gWyK7t6u4Ks3P7hUE= gx_scs_key: !encrypted/pkcs1-oaep - gbtzWcQo4LytBGfTskgeFs0bFXzAZo2R33ljlNFAfNzdzlrPDnrEljyys+Bp9+yjEfcG/ rq8YeW0FVVYYulnmkasfgbUP8lMmsMHli5AwCLB00QjqCsy6Ixc1ELUr+KTTSYXyU8qhT diff --git a/compliance-monitor/bootstrap.yaml b/compliance-monitor/bootstrap.yaml index b8a990180..3228aa40c 100644 --- a/compliance-monitor/bootstrap.yaml +++ b/compliance-monitor/bootstrap.yaml @@ -13,7 +13,7 @@ accounts: - admin - subject: zuul_ci api_keys: - - "$argon2id$v=19$m=65536,t=3,p=4$m3OOcQ4BoNR6L0VoDUFIyQ$Y/cCKggkIuyHHFGv1Ng5ijMIIJjyvfkjvdASL2LFiPA" + - "$argon2id$v=19$m=65536,t=3,p=4$1/o/RwihlFIKAaAUolQKAQ$4MAuy6myIaVNofSW9KLlf81/y7WotHCfRl8dxKJ2rjQ" roles: - append_any - subject: gx-scs diff --git a/playbooks/compliance_check.yaml b/playbooks/compliance_check.yaml index 4d7305fd2..ce2662847 100644 --- a/playbooks/compliance_check.yaml +++ b/playbooks/compliance_check.yaml @@ -15,6 +15,7 @@ msg: "{{ result.stdout }} {{ result.stderr }}" - name: sign result YAML and upload to compliance monitor + no_log: true # do not leak the secret in the shell part below ansible.builtin.shell: | ssh-keygen -Y sign -f ~/id_subject -n report report.yaml curl --data-binary @report.yaml.sig --data-binary @report.yaml -H "Content-Type: application/x-signed-yaml" -H "Authorization: Basic {{ clouds_conf.zuul_ci_basic_auth }}" https://compliance.sovereignit.cloud/reports From e6dbea418a6030519e21a4bd25b102bed058bf10 Mon Sep 17 00:00:00 2001 From: cah-hbaum <95478065+cah-hbaum@users.noreply.github.com> Date: Wed, 26 Jun 2024 09:34:31 +0200 Subject: [PATCH 6/9] Cosmetic file changes (#644) * Cosmetic changes Incorporate some cosmetic changes, mainly grammar and wording. Made some small adjustments according to the things mentiond by @mbuechse. 
--------- Signed-off-by: Hannes Baum --- .../scs-0001-v1-sovereign-cloud-standards.md | 8 ++-- Standards/scs-0002-v2-standards-docs-org.md | 2 +- ...-0003-v1-sovereign-cloud-standards-yaml.md | 6 +-- .../scs-0004-v1-achieving-certification.md | 2 +- Standards/scs-0100-v1-flavor-naming.md | 32 ++++++------- Standards/scs-0100-v2-flavor-naming.md | 16 +++---- Standards/scs-0100-v3-flavor-naming.md | 12 ++--- Standards/scs-0101-v1-entropy.md | 4 +- Standards/scs-0102-v1-image-metadata.md | 12 ++--- Standards/scs-0103-v1-standard-flavors.md | 4 +- Standards/scs-0104-v1-standard-images.md | 2 +- Standards/scs-0110-v1-ssd-flavors.md | 12 ++--- .../scs-0111-v1-volume-type-decisions.md | 26 +++++------ Standards/scs-0112-v1-sonic.md | 16 +++---- ...0113-v1-security-groups-decision-record.md | 16 +++---- Standards/scs-0114-v1-volume-type-standard.md | 8 ++-- ...15-v1-default-rules-for-security-groups.md | 16 +++---- .../scs-0210-v1-k8s-new-version-policy.md | 4 +- Standards/scs-0210-v2-k8s-version-policy.md | 8 ++-- .../scs-0213-v1-k8s-nodes-anti-affinity.md | 46 ++++++++++--------- Standards/scs-0215-v1-robustness-features.md | 2 +- ...requirements-for-testing-cluster-stacks.md | 2 +- ...equirements-for-sso-identity-federation.md | 40 ++++++++-------- Standards/scs-0301-v1-naming-conventions.md | 14 +++--- Standards/scs-0302-v1-domain-manager-role.md | 14 +++--- ...scs-0400-v1-status-page-create-decision.md | 18 ++++---- ...-page-reference-implementation-decision.md | 8 ++-- ...02-v1-status-page-openapi-spec-decision.md | 18 ++++---- ...cs-0403-v1-csp-kaas-observability-stack.md | 12 ++--- ...cs-0410-v1-gnocchi-as-metering-database.md | 2 +- 30 files changed, 192 insertions(+), 190 deletions(-) diff --git a/Standards/scs-0001-v1-sovereign-cloud-standards.md b/Standards/scs-0001-v1-sovereign-cloud-standards.md index c6916151b..911c6a35a 100644 --- a/Standards/scs-0001-v1-sovereign-cloud-standards.md +++ b/Standards/scs-0001-v1-sovereign-cloud-standards.md @@ -18,12 +18,12 @@ It strives for interoperable and sovereign cloud stacks which can be deployed and used by a wide range of organizations and individuals. Wherever feasible, transparency and openness both in respect to the inner workings of the platforms standardised by SCS, -as well as the SCS organisation itself +as well as the SCS organization itself are a paradigm we intend to live. ## Requirements -The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119). +The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119). In addition, "FORBIDDEN" is to be interpreted equivalent to "MUST NOT". @@ -297,7 +297,7 @@ it can be deprecated. Obsoletions SHOULD be announced ahead of their execution by setting the `deprecated_at` field to a future date and moving the `status` to `Deprecated`. This signals current and future implementors -that the subject matter of the document +that the subject of the document is not considered necessary or state of the art anymore. If one or more replacement documents for the document exists, @@ -349,7 +349,7 @@ The advantages of such an approach are: The disadvantages of that approach are: - It is possible to make breaking changes after stabilization. 
- Potentially, an hypothetical SCS-1234 document might refer to something completely different + Potentially, a hypothetical SCS-1234 document might refer to something completely different in a hypothetical R15 release than what it meant in R5, if there have been sufficient, gradual breaking changes to the document. diff --git a/Standards/scs-0002-v2-standards-docs-org.md b/Standards/scs-0002-v2-standards-docs-org.md index 71583ceef..0a9be93f5 100644 --- a/Standards/scs-0002-v2-standards-docs-org.md +++ b/Standards/scs-0002-v2-standards-docs-org.md @@ -155,7 +155,7 @@ Docusaurus' robust toolkit assists in crafting and maintaining quality documenta #### Special Implementation Details -SCS's unique architecture necessitates a unique approach to documentation. To ensure seamless integration of reference documentation for Components and components developed for SCS, we have created a custom workflow. This workflow automatically syncs upstream repositories, pulling the most recent documentation at regular intervals. +The unique architecture of SCS necessitates a unique approach to documentation. To ensure seamless integration of reference documentation for Components and components developed for SCS, we have created a custom workflow. This workflow automatically syncs upstream repositories, pulling the most recent documentation at regular intervals. We have accomplished this by utilizing a Node.js post-install script found [here](https://github.com/SovereignCloudStack/docs-page/blob/main/getDocs.js). diff --git a/Standards/scs-0003-v1-sovereign-cloud-standards-yaml.md b/Standards/scs-0003-v1-sovereign-cloud-standards-yaml.md index 5db52ab16..43d8219ed 100644 --- a/Standards/scs-0003-v1-sovereign-cloud-standards-yaml.md +++ b/Standards/scs-0003-v1-sovereign-cloud-standards-yaml.md @@ -21,7 +21,7 @@ SCS plans to offer six kinds of certificates with varying scope. These scopes ca - SCS-open - SCS-sovereign 2. _cloud layer_, of which there are two: - - infastructure as a service (IaaS) + - infrastructure as a service (IaaS) - Kubernetes as a service (KaaS) So, for instance, a certificate can have the scope _SCS-compatible IaaS_ or _SCS-sovereign KaaS_. @@ -245,13 +245,13 @@ must be announced 14 days in advance via the corresponding mailing list. ### File format -In order to have a document that can be processed by a wide range of tools, we need to opt for a simple but yet well supported format. +In order to have a document that can be processed by a wide range of tools, we need to opt for a simple but yet well-supported format. YAML offers readability for humans as well as good support by many frameworks. Since YAML is heavily used in the cloud and container domain, the choice is obvious. ### Dependency graph for certifications -This standard only allows exactly one depending certification, otherwise we would need to use a list of mappings. Since this is +This standard only allows depending on exactly one certification, otherwise we would need to use a list of mappings. Since this is in accordance to the current plan of the SIG Standardization & Certification, we can safely ignore multiple dependency of certification for now. diff --git a/Standards/scs-0004-v1-achieving-certification.md b/Standards/scs-0004-v1-achieving-certification.md index 682accfbc..cb6f39a89 100644 --- a/Standards/scs-0004-v1-achieving-certification.md +++ b/Standards/scs-0004-v1-achieving-certification.md @@ -41,7 +41,7 @@ As operator, I want to obtain a certificate with the scope SCS-compatible IaaS o 6. 
Once the certificate is granted by the SCS certification assessment body, the operator SHOULD use the corresponding logo and publicly state the certified "SCS compatibility" on the respective layer for the time of the validity of the certification. In case of a public cloud, this public display is even REQUIRED. In any case, the logo MUST be accompanied by a hyperlink (a QR code for printed assets) to the respective certificate status page. -7. If the certificate is to be revoked for any reason, it will be included in a publicly available Certificate Revokation List (CRL). This fact will also be reflected in the certificate status page. +7. If the certificate is to be revoked for any reason, it will be included in a publicly available Certificate Revocation List (CRL). This fact will also be reflected in the certificate status page. 8. If any of the automated tests or manual checks fail after the certificate has been issued, the certificate is not immediately revoked. Rather, the automated tests MUST pass 99.x % of the runs, and the operator SHALL be notified at the second failed attempt in a row at the latest. In case a manual check fails, it has to be repeated at a date to be negotiated with SCS. It MAY NOT fail more than two times in a row. diff --git a/Standards/scs-0100-v1-flavor-naming.md b/Standards/scs-0100-v1-flavor-naming.md index cc4c31bfc..d874cdf53 100644 --- a/Standards/scs-0100-v1-flavor-naming.md +++ b/Standards/scs-0100-v1-flavor-naming.md @@ -94,7 +94,7 @@ the lack of workload management that would prevent worst case performance < 20% #### Insufficient microcode Not using these mitigations must be indicated by an additional `i suffix` for insecure -(weak protection against CPU vulns through insufficient microcode, lack of disabled hyperthreading +(weak protection against CPU vulnerabilities through insufficient microcode, lack of disabled hyperthreading on L1TF susceptible CPUs w/o effective core scheduling or disabled protections on the host/hypervisor). #### Examples @@ -299,22 +299,22 @@ The optional `h` suffix to the comput unit count indicates high-performance (e.g high bandwidth gfx memory such as HBM); `h` can be duplicated for even higher performance. -`-ib` indicates Inifinband networking. +`-ib` indicates Infiniband networking. More extensions will be forthcoming. -Extensions need to be specified in the above mentioned order. +Extensions need to be specified in the above-mentioned order. 
## Proposal Examples -| Example | Decoding | -| ------------------------- | ----------------------------------------------------------------------------------------------- | -| SCS-2C:4:10n | 2 dedicated cores (x86-64), 4GiB RAM, 10GB network disk | -| SCS-8Ti:32:50p-i1 | 8 dedicated hyperthreads (insecure), Skylake, 32GiB RAM, 50GB local NVMe | -| SCS-1L:1u:5 | 1 vCPU (heavily oversubscribed), 1GiB Ram (no ECC), 5GB disk (unspecific) | -| SCS-16T:64:200s-GNa:64-ib | 16 dedicated threads, 64GiB RAM, 200GB local SSD, Inifiniband, 64 Passthrough nVidia Ampere SMs | -| SCS-4C:16:2x200p-a1 | 4 dedicated Arm64 cores (A78 class), 16GiB RAM, 2x200GB local NVMe drives | -| SCS-1V:0.5 | 1 vCPU, 0.5GiB RAM, no disk (boot from cinder volume) | +| Example | Decoding | +| ------------------------- | ---------------------------------------------------------------------------------------------- | +| SCS-2C:4:10n | 2 dedicated cores (x86-64), 4GiB RAM, 10GB network disk | +| SCS-8Ti:32:50p-i1 | 8 dedicated hyperthreads (insecure), Skylake, 32GiB RAM, 50GB local NVMe | +| SCS-1L:1u:5 | 1 vCPU (heavily oversubscribed), 1GiB Ram (no ECC), 5GB disk (unspecific) | +| SCS-16T:64:200s-GNa:64-ib | 16 dedicated threads, 64GiB RAM, 200GB local SSD, Infiniband, 64 Passthrough nVidia Ampere SMs | +| SCS-4C:16:2x200p-a1 | 4 dedicated Arm64 cores (A78 class), 16GiB RAM, 2x200GB local NVMe drives | +| SCS-1V:0.5 | 1 vCPU, 0.5GiB RAM, no disk (boot from cinder volume) | ## Standard SCS flavors @@ -376,14 +376,14 @@ for usability and easier portability, even beyond the mandated flavors. You must be very careful to expose low vCPU guarantees (`L` instead ov `V`), insecure hyperthreading/microcode `i`, non-ECC-RAM `u`, memory oversubscription `o`. Note that omitting these qualifiers is overstating your security, reliability or performance properties and may be reason for -clients to feel betrayed or claim damages. It might in extreme cases also cause SCS to withdraw certification +clients to feel betrayed or claim damages. It might, in extreme cases, also cause SCS to withdraw certification along with public statements. -You may offer additional SCS- flavors, following the naming scheme outlined here. +You may offer additional `SCS-` flavors, following the naming scheme outlined here. You may offer additional flavors, not following above scheme. -You must not offer flavors with the SCS- prefix which do not follow this naming scheme. +You must not offer flavors with the `SCS-` prefix which do not follow this naming scheme. You must not extend the SCS naming scheme with your own suffices; you are encouraged however to suggest extensions that we can discuss and add to the official scheme. @@ -434,8 +434,8 @@ on the flavor list compliance of the cloud environment. Some providers might offer VM services ("IaaS") without trying to adhere to SCS standards, yet still finding the flavor naming standards useful. The Gaia-X Technical Committee's -Provider Working Group (WG) would seem like a logical place for such dicussions then. +Provider Working Group (WG) would seem like a logical place for such discussions then. If so, we could -replace the SCS- prefix with a GX- prefix and transfer the naming scheme governance from +replace the `SCS-` prefix with a GX- prefix and transfer the naming scheme governance from the SCS project to the Gaia-X Provider WG (where we participate). SCS certification would then reference the Gaia-X flavor naming standard as a requirement. 
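For a quick plausibility check of names against the v1 grammar above, a rough sketch follows — the simplified regex covers only the colon-separated base form and treats any `-...` extension as opaque; it is not a replacement for the project's own conformance tooling under `Tests/` in this repository:

```bash
# Simplified sketch: SCS-<N><L/V/T/C>[i]:<ram>[u][o][:[Mx]<disk>[n/s/l/p]][-ext...]
re='^SCS-[0-9]+[LVTC]i?:[0-9]+(\.[0-9]+)?[uo]*(:([0-9]+x)?[0-9]+[nslp]?)?(-.*)?$'
for name in "SCS-2C:4:10n" "SCS-8Ti:32:50p-i1" "SCS-1V:0.5" "SCS-not-a-flavor"; do
    if [[ $name =~ $re ]]; then echo "plausible: $name"; else echo "no match: $name"; fi
done
```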
diff --git a/Standards/scs-0100-v2-flavor-naming.md b/Standards/scs-0100-v2-flavor-naming.md index ef18ef161..38d405828 100644 --- a/Standards/scs-0100-v2-flavor-naming.md +++ b/Standards/scs-0100-v2-flavor-naming.md @@ -40,8 +40,8 @@ Note that not all relevant properties of flavors can be discovered; creating a s to address this is a separate but related effort to the name standardization. Commonly used infrastructure-as-code tools do not provide a way to use discoverability features to express something like "I want a flavor with 2 vCPUs, 8GiB of RAM, a local -20GB SSD disk and Infiniband support but I don't care whether it's AMD or intel" in a -reasonable manner. Using flavor names to express this will thus continue to be useful +20GB SSD disk and Infiniband support, but I don't care whether it's AMD or intel" in a +reasonable manner. Using flavor names to express this will thus continue to be useful, and we don't expect the need for standardization of flavor names to go away until the commonly used IaC tools work on a higher abstraction layer than they currently do. @@ -75,7 +75,7 @@ encoding all details) as well as very detailed longer names. | `SCS-` | N`L/V/T/C`\[`i`\] | `-`N\[`u`\]\[`o`\] | \[`-`\[M`x`\]N\[`n/s/l/p`\]\] | \[`_`EXT\] | Note that `N` and `M` are placeholders for numbers here. -The optional fields are denoted in brackets (and have opt: in the header. +The optional fields are denoted in brackets (and have `opt:` in the header). See below for extensions. Note that all letters are case-sensitive. @@ -123,7 +123,7 @@ the lack of workload management that would prevent worst case performance < 20% #### Insufficient microcode Not using these mitigations must be indicated by an additional `i` suffix for insecure -(weak protection against CPU vulns through insufficient microcode, lack of disabled hyperthreading +(weak protection against CPU vulnerabilities through insufficient microcode, lack of disabled hyperthreading on L1TF susceptible CPUs w/o effective core scheduling or disabled protections on the host/hypervisor). #### Examples @@ -142,7 +142,7 @@ on L1TF susceptible CPUs w/o effective core scheduling or disabled protections o Cloud providers should use ECC memory. Memory oversubscription should not be used. -It is allowed to specify half GiBs (e.g. 3.5), though this is should not be done for larger memory sizes (>= 10GiB). +It is allowed to specify half GiBs (e.g. 3.5), though this should not be done for larger memory sizes (>= 10GiB). #### No ECC @@ -317,9 +317,9 @@ create all standard, mandatory SCS flavors for you. ## Extensions Extensions provide a possibility for providers that offer a very differentiated set -of flavors to indicate hypervisors, support for hardware/nested virtuatlization, +of flavors to indicate hypervisors, support for hardware/nested virtualization, CPU types and generations, high-frequency models, GPU support and GPU types as -well as Inifiniband support. (More extensions may be appended in the future.) +well as Infiniband support. (More extensions may be appended in the future.) Using the systematic naming approach ensures that two providers that offer flavors with the same specific features will use the same name for them, thus simplifying @@ -465,7 +465,7 @@ high bandwidth gfx memory such as HBM); More extensions may be forthcoming and appended in a later revision of this spec. -Extensions need to be specified in the above mentioned order. +Extensions need to be specified in the above-mentioned order. 
### Naming options advice diff --git a/Standards/scs-0100-v3-flavor-naming.md b/Standards/scs-0100-v3-flavor-naming.md index 990814aae..ce09dd0ee 100644 --- a/Standards/scs-0100-v3-flavor-naming.md +++ b/Standards/scs-0100-v3-flavor-naming.md @@ -41,8 +41,8 @@ Note that not all relevant properties of flavors can be discovered; creating a s to address this is a separate but related effort to the name standardization. Commonly used infrastructure-as-code tools do not provide a way to use discoverability features to express something like "I want a flavor with 2 vCPUs, 8GiB of RAM, a local -20GB SSD disk and Infiniband support but I don't care whether it's AMD or intel" in a -reasonable manner. Using flavor names to express this will thus continue to be useful +20GB SSD disk and Infiniband support, but I don't care whether it's AMD or intel" in a +reasonable manner. Using flavor names to express this will thus continue to be useful, and we don't expect the need for standardization of flavor names to go away until the commonly used IaC tools work on a higher abstraction layer than they currently do. @@ -76,7 +76,7 @@ encoding all details) as well as very detailed longer names. | `SCS-` | N`L/V/T/C`\[`i`\] | `-`N\[`u`\]\[`o`\] | \[`-`\[M`x`\]N\[`n/h/s/p`\]\] | \[`_`EXT\] | Note that N and M are placeholders for numbers here. -The optional fields are denoted in brackets (and have opt: in the header. +The optional fields are denoted in brackets (and have `opt:` in the header). See below for extensions. Note that all letters are case-sensitive. @@ -131,7 +131,7 @@ the lack of workload management that would prevent worst case performance < 20% #### Insufficient microcode Not using these mitigations must be indicated by an additional `i` suffix for insecure -(weak protection against CPU vulns through insufficient microcode, lack of disabled hyperthreading +(weak protection against CPU vulnerabilities through insufficient microcode, lack of disabled hyperthreading on L1TF susceptible CPUs w/o effective core scheduling or disabled protections on the host/hypervisor). #### Examples @@ -150,7 +150,7 @@ on L1TF susceptible CPUs w/o effective core scheduling or disabled protections o Cloud providers should use ECC memory. Memory oversubscription should not be used. -It is allowed to specify half GiBs (e.g. 3.5), though this is should not be done for larger memory sizes (>= 10GiB). +It is allowed to specify half GiBs (e.g. 3.5), though this should not be done for larger memory sizes (>= 10GiB). #### No ECC @@ -541,7 +541,7 @@ However, we have been reaching out to the OpenStack Public Cloud SIG and the ALA members to seek further alignment. Getting upstream OpenStack support for flavor aliases would provide more flexibility -and ease migrations between providers, also providers that don't offer the SCS- +and ease migrations between providers, also providers that don't offer the `SCS-` flavors. We also would like to see upstream `extra_specs` standardizing the discoverability of some diff --git a/Standards/scs-0101-v1-entropy.md b/Standards/scs-0101-v1-entropy.md index 7b9a10744..2b719079f 100644 --- a/Standards/scs-0101-v1-entropy.md +++ b/Standards/scs-0101-v1-entropy.md @@ -52,7 +52,7 @@ a HRNG, they are not treated as such by the kernel, i.e., they _do not_ appear as `/dev/hwrng`! The Linux kernel combines multiple sources of entropy into a pool. 
To this -end, it will use all of the sources discussed so far with one exception: +end, it will use all the sources discussed so far with one exception: the HRNG must be fed into the pool (if so desired) via the daemon `rngd`. The kernel converts the entropy from the pool into cryptographically secure random numbers that appear under `/dev/random` and `/dev/urandom`. @@ -78,7 +78,7 @@ be used to feed it into the kernel's entropy pool. On a side note, the kernel exposes available HRNGs via the special directory `/sys/devices/virtual/misc/hw_random`. In particular, the -file `rng_available` lists availabe HRNGs while the file `rng_current` +file `rng_available` lists available HRNGs while the file `rng_current` contains the HRNG currently used. In summary, with current kernels and CPUs entropy in virtual instances diff --git a/Standards/scs-0102-v1-image-metadata.md b/Standards/scs-0102-v1-image-metadata.md index 907da3751..a861591e0 100644 --- a/Standards/scs-0102-v1-image-metadata.md +++ b/Standards/scs-0102-v1-image-metadata.md @@ -16,7 +16,7 @@ description: | ## Motivation Many clouds offer standard Operating System images for their users' convenience. -To make them really useful, they should contain meta data (properties) to allow +To make them really useful, they should contain metadata (properties) to allow users to understand what they can expect using these images. The specification is targeting images that are managed by the service provider, @@ -164,13 +164,13 @@ The provider makes an effort to replace images upon critical security issues out - Mandatory: `image_source` needs to be a URL to point to a place from which the image can be downloaded. (Note: This may be set to the string "private" to indicate that the image can not be freely downloaded.) -- Mandatory: `image_description` needs to be an URL (or text) with release notes and other human readable +- Mandatory: `image_description` needs to be a URL (or text) with release notes and other human-readable data about the image. - Recommended _tag_: `managed_by_VENDOR` Note that for most images that come straight from an upstream source, `image_description` should point -to a an upstream web page where these images are described. If download links are available as well +to an upstream web page where these images are described. If download links are available as well on that page, `image_source` can point to the same page, otherwise a more direct link to the image should be used, e.g. directly linking the `.qcow2` or `.img` file. If providers have their own image building machinery or do some post-processing on top of @@ -187,7 +187,7 @@ upstream images, they should point to the place where they document and offer th the patch status. - Mandatory: `image_original_user` is the default login user for the operating system which can connect to the image via the injected SSH key or provided password. (This can be set to `none` if no default - user name exists for the operating system.) + username exists for the operating system.) - Optional: `patchlevel` can be set to an operating specific patch level that describes the patch status — typically we would expect the `image_build_date` to be sufficient. @@ -208,10 +208,10 @@ might not use any of these properties, except maybe `maintained_until`. Note tha Windows images would typically require `license_included`, `subscription_included`. A boolean property that is not present is considered to be `false`. 
-- Optional: `license_included` (boolean) indicates whether or not the flavor fee +- Optional: `license_included` (boolean) indicates whether the flavor fee includes the licenses required to use this image. This field is mandatory for images that contain software that requires commercial licenses. -- Optional: `license_required` (boolean) indicates whether or not a customer must bring +- Optional: `license_required` (boolean) indicates whether a customer must bring its own license to be license compliant. This can not be true at the same time as the previous setting. This field is mandatory IF customers need to bring their own license to use the image. diff --git a/Standards/scs-0103-v1-standard-flavors.md b/Standards/scs-0103-v1-standard-flavors.md index 46c4b7c7c..b29678581 100644 --- a/Standards/scs-0103-v1-standard-flavors.md +++ b/Standards/scs-0103-v1-standard-flavors.md @@ -34,8 +34,8 @@ The following extra specs are recognized, together with the respective semantics measured over the course of one month (1% is 7,2 h/month). The `cpu-type=shared-core` corresponds to the `V` cpu modifier in the [flavor-naming spec](./scs-0100-v3-flavor-naming.md), other options are `crowded-core` (`L`), `dedicated-thread` (`T`) and `dedicated-core` (`C`). -- `scs:diskN-type=ssd` (where `N` is a nonnegative integer, usually `0`) means that the - root disk `N` must support 1000 _sequential_ IOPS per VM and it must be equipped with +- `scs:diskN-type=ssd` (where `N` is a non-negative integer, usually `0`) means that the + root disk `N` must support 1000 _sequential_ IOPS per VM, and it must be equipped with power-loss protection; see [scs-0110-v1-ssd-flavors](./scs-0110-v1-ssd-flavors.md). The `disk`N`-type=ssd` setting corresponds to the `s` disk modifier, other options are `nvme` (`p`), `hdd` (`h`) and `network` (`n`). Only flavors without disk and diff --git a/Standards/scs-0104-v1-standard-images.md b/Standards/scs-0104-v1-standard-images.md index af845a271..d5cbbc4c6 100644 --- a/Standards/scs-0104-v1-standard-images.md +++ b/Standards/scs-0104-v1-standard-images.md @@ -107,7 +107,7 @@ The YAML file is generally located under [https://github.com/SovereignCloudStack/standards/blob/main/Tests/iaas/](https://github.com/SovereignCloudStack/standards/blob/main/Tests/iaas/). Any change that could render existing installations non-conformant (i.e., when new -specifications are added, when the name scheme of a specification is changed so as to +specifications are added, when the name scheme of a specification is changed to match more names than before, when the status of an existing specification changes to mandatory, or when some source prefix is removed) requires a new YAML file to be created. As a consequence, any currently valid certificates stay valid; the change can only take diff --git a/Standards/scs-0110-v1-ssd-flavors.md b/Standards/scs-0110-v1-ssd-flavors.md index 8819b143a..2a8716743 100644 --- a/Standards/scs-0110-v1-ssd-flavors.md +++ b/Standards/scs-0110-v1-ssd-flavors.md @@ -62,7 +62,7 @@ requires write latencies in the range of a single-digit ms (or better). #### One-node etcd (backed by redundant storage) -If k8s uses only one control plane node, there will only be only one etcd node, +If k8s uses only one control plane node, there will only be one etcd node, avoiding timed out heartbeats. Single node control planes are typically not recommended for production workloads though. 
They are limited with respect to control plane performance, have a higher chance to fail (as a single node failure @@ -107,7 +107,7 @@ which is not typically critical if done within reasonable limits. This change however does not fully address the issue — occasional write latencies above 100ms will still cause failed heartbeats, just less often. -This change has been implemented in SCS's +This change has been implemented in the [k8s-cluster-api-provider](https://etcd.io/docs/v3.5/op-guide/hardware/#example-hardware-configurations) reference implementation: The heartbeat has been changed from 1/100ms (10/s) to 1/250ms (4/s) and the reelection timeout from 1s to 2.5s. @@ -145,7 +145,7 @@ written out. Recovery from such a scenario can range from smooth to impossible. In a multi-node cluster, this may not be as bad as it sounds — if only one node is affected by a disruption, the crashed node can be recovered by resyncing the data from other nodes. In practice an inconsistent state would be considered -too risky and it should be preferred to set up a fresh node to join the +too risky, and it should be preferred to set up a fresh node to join the existing etcd cluster. This would need to be implemented to make this option less risky. @@ -222,9 +222,9 @@ Disk IO QoS is not part of this spec but may be considered in another one. Live-migration with local storage is significantly more difficult than with networked storage: The contents of the local disks also need to be replicated over to the new host. Live-migration for these VMs may thus take significantly -longer or not be possible at all, depending the configuration from the provider. +longer or not be possible at all, depending on the configuration from the provider. Not supporting live-migration is OK for flavors with local disks according -to the flavor naming spec — a capability to indicate whether or not +to the flavor naming spec — a capability to indicate whether live-migration is supported will be subject to a flavor-metadata discoverability spec that is planned for the future. @@ -252,7 +252,7 @@ to solve the latency requirements for databases and etcd may emerge. When we standardize QoS features there, we may amend this standard with QoS recommendations or possibly requirements. -A future flavor metadata discoverability standard will indicate whether or not +A future flavor metadata discoverability standard will indicate whether these flavors can be live-migrated. A future VM metadata standard will allow users to request live-migration and/or cold migration or restart to be or to not be performed. diff --git a/Standards/scs-0111-v1-volume-type-decisions.md b/Standards/scs-0111-v1-volume-type-decisions.md index 9b4ebf08c..28fc32e8b 100644 --- a/Standards/scs-0111-v1-volume-type-decisions.md +++ b/Standards/scs-0111-v1-volume-type-decisions.md @@ -9,27 +9,27 @@ track: IaaS Volumes in OpenStack are virtual drives. They are managed by the storage service Cinder, which abstracts creation and usage of many different storage backends. While it is possible to use a backend like lvm which can reside on the same host as the hypervisor, the SCS wants to make a more clear differentiation between volumes and the ephemeral storage of a virtual machine. For all SCS deployments we want to assume that volumes are always residing in a storage backend that is NOT on the same host as a hypervisor - in short terms: Volumes are network storage. Ephemeral storage on the other hand is the only storage residing on a compute host. 
It is created by creating a VM directly from an Image and is automatically los as soon as the VM cease to exist. Volumes on the other hand have to be created from Images and only after that can be used for VMs. They are persistent and will remain in the last state a VM has written on them before they cease to exit. Being persistent and not relying on the host where the VM resides, Volumes can easily be attached to another VM in case of a node outage and VMs be migrated way more easily, because only metadata and data in RAM has to be shifted to another host, accelerating any migration or evacuation of a VM. -Volume Types are used to classify volumes and provide a basic decision for what kind of volume should be created. These volume types can sometimes very be backend-specific and it might be hard for a user to choose the most suitable volume type, if there is more than one default type. Nevertheless the most of configuration is done in the backends themself, so volume types only work as a rough classification. +Volume Types are used to classify volumes and provide a basic decision for what kind of volume should be created. These volume types can sometimes be very backend-specific, and it might be hard for a user to choose the most suitable volume type, if there is more than one default type. Nevertheless, most of the configuration is done in the backends themselves, so volume types only work as a rough classification. ## Motivation -We want to standardize a few varieties of volume types. While a user can choose simple things like size when creating a volume, Volume Types define a few broader aspects of volume. Encryption of volumes for example is solely decided by the volume type. And whether the volume will be replicated is a mix between definiton in the volume type and backend specific configuration, but it's visiblity can only be reached in the volume type. +We want to standardize a few varieties of volume types. While a user can choose simple things like size when creating a volume, Volume Types define a few broader aspects of volume. Encryption of volumes for example is solely decided by the volume type. And whether the volume will be replicated is a mix between definition in the volume type and backend specific configuration, but its visibility can only be reached in the volume type. -In General: what the different volume types are capable of is highly dependend on both the used backend and the configurations of OpenStack. A few options are worth being at least recommended. +In General: what the different volume types are capable of is highly dependent on both the used backend and the configurations of OpenStack. A few options are worth being at least recommended. ## Design Considerations We want to have a discoverable Standard. So there should be no naming conventions as per request by operators. -This first decision will have impacts on upstream OpenStack development, as those things, that would be nice to discover, may not be currently dicoverable by users or not at all. +This first decision will have impacts on upstream OpenStack development, as those things, that would be nice to discover, may not be currently discoverable by users or not at all. -There are severel aspects of volume types, which will be discussed in the following: +There are several aspects of volume types, which will be discussed in the following: ### Options considered #### Encryption -Encryption for volumes is an option which has to be configured within the volume type. 
As an admin it is possible to set encryption-provider, key size, cipher and control location. As an admin it is also currently possible to see these configurations in a volume type with list and show commands. A user should not see these parameters in detail, but a boolean value that descibes whether encryption is used or not. Currently this is not possible in upstream OpenStack. +Encryption for volumes is an option which has to be configured within the volume type. As an admin it is possible to set encryption-provider, key size, cipher and control location. As an admin it is also currently possible to see these configurations in a volume type with list and show commands. A user should not see these parameters in detail, but a boolean value that describes whether encryption is used or not. Currently, this is not possible in upstream OpenStack. **Conclusion**: This is a solid aspect to be standardized. But it will need work on OpenStack, to have a boolean value presented to the users. @@ -41,7 +41,7 @@ OpenStack Cinder works with a lot of different backends. They all have some kind #### Availability Zones -Availability Zones are used in Nova and Cinder seperatly to provide an often also physical separation of compute hosts or storage nodes. This leads to two options to consider: +Availability Zones are used in Nova and Cinder separately to provide an often also physical separation of compute hosts or storage nodes. This leads to two options to consider: 1. Multiple Volume AZs: This might be used if there are different backends present in one IaaS structure. The different volume types are usually used for the different volume AZs. This makes migration between those AZs only be possible for administrators. @@ -49,24 +49,24 @@ Availability Zones are used in Nova and Cinder seperatly to provide an often als Another question is how many providers use one of these options or both. -**Conclusion**: The first part doesn't make much sense to standardize, as migration between the volume types can only be done by admins. However the second part might be noteable, but due to the variety of configuration options very hard to standardize. +**Conclusion**: The first part doesn't make much sense to standardize, as migration between the volume types can only be done by admins. However, the second part might be noteable, but due to the variety of configuration options very hard to standardize. #### Multiattach -It is possible in a few backends to attach a volume to multiple VMs. This has to be configured in the Volume Type and this information is also accessable for users. Nevertheless this option also needs a lot of work from users, as those types of volumes have to have a file system, that is capable of multiattach. Many providers don't provide multiattach. +It is possible in a few backends to attach a volume to multiple VMs. This has to be configured in the Volume Type and this information is also accessible for users. Nevertheless, this option also needs a lot of work from users, as those types of volumes have to have a file system, that is capable of multiattach. Many providers don't provide multiattach. **Conclusion**: It might be noteable, that this already is a discoverable option. #### Replication -Replication states, whether or not there are multiple replicas of a volume. Thus answers the question, whether the data could survive a node outage. Again there are different ways to achive replicated volumes. 
It can either be defined in the volume type and is discoverable also by normal users or it is configured in the backend. The last option is usually used with ceph for example. This makes it hard to discover, whether a volume is replicated or not. Another point is the number of replicas, that exist. +Replication states whether there are multiple replicas of a volume and thus answers the question whether the data could survive a node outage. Again, there are different ways to achieve replicated volumes. It can either be defined in the volume type and is discoverable also by normal users, or it is configured in the backend. The last option is usually used with ceph for example. This makes it hard to discover whether a volume is replicated or not. Another point is the number of replicas that exist. -**Conclusion**: Replication is a good option to be standardized. Whether this should be done as a boolean option or if the number of replicas is also something users need to know should still be discussed. Nevertheless due to the different options to configure replication this will be quite complex. +**Conclusion**: Replication is a good option to be standardized. Whether this should be done as a boolean option or if the number of replicas is also something users need to know should still be discussed. Nevertheless, due to the different options to configure replication this will be quite complex. #### QoS -Quality of Service parameters can be stated in a volume qos object. These objects can then be associated to a volume type (or directly to a volume as an admin only option). But this is optional and thus even good or very good volume QoS parameters that are aquired through hardware configuration and storage parameters, might go by unmentioned. -Furthermore the indirection makes it harder to discover the qos for a volume type. Only admins will see the associated qos ID and will have to take a closer look at the qos after discovering the volume type. PLUS: there can only be one qos association for one volume type. But a qos can be used for multiple volumes. +Quality of Service parameters can be stated in a volume qos object. These objects can then be associated to a volume type (or directly to a volume as an admin only option). But this is optional, and thus even good or very good volume QoS parameters that are acquired through hardware configuration and storage parameters might go unmentioned. +Furthermore, the indirection makes it harder to discover the qos for a volume type. Only admins will see the associated qos ID and will have to take a closer look at the qos after discovering the volume type. PLUS: there can only be one qos association for one volume type. But a qos can be used for multiple volumes. **Conclusion**: The benefit of displaying qos parameters is clear, thus this option should be noted. But are volume qos objects widely used? If not, standardization process would be too much work. diff --git a/Standards/scs-0112-v1-sonic.md b/Standards/scs-0112-v1-sonic.md index 06ac03c69..6bb00fda4 100644 --- a/Standards/scs-0112-v1-sonic.md +++ b/Standards/scs-0112-v1-sonic.md @@ -9,7 +9,7 @@ description: | ## Introduction -SONiC support in [SCS](https://scs.community) was considered within the context of [VP04 Networking](https://scs.community/tenders/lot4), sub-lot 1 SDN scalability. Different challenges and approaches to SDN scalability have been explored and more specifically those who require support in the underlay network. 
Using SONiC in the underlay can have benefits for SCS users by using a standardized OS for network devices and also having a clear path for network scalability when using SONiC. For this to work, we have to define how SONiC is used and supported architecturally in SCS. This document outlines the architectural decisions in regards to SONiC support and integration in SCS. +SONiC support in [SCS](https://scs.community) was considered within the context of [VP04 Networking](https://scs.community/tenders/lot4), sub-lot 1 SDN scalability. Different challenges and approaches to SDN scalability have been explored and more specifically those who require support in the underlay network. Using SONiC in the underlay can have benefits for SCS users by using a standardized OS for network devices and also having a clear path for network scalability when using SONiC. For this to work, we have to define how SONiC is used and supported architecturally in SCS. This document outlines the architectural decisions in regard to SONiC support and integration in SCS. ## Motivation @@ -19,7 +19,7 @@ In respect to SDN scalability improvements in Openstack and SCS, there are sever In many network designs for Openstack, configuration of the actual network hardware by Openstack Neutron service is required. The following network designs apply: -- VLANs. Uisng VLANs to segment tenant networks requires the network switch to be configured. This can be manual or dynamic configuration via the ML2 Neutron driver. +- VLANs. Using VLANs to segment tenant networks requires the network switch to be configured. This can be manual or dynamic configuration via the ML2 Neutron driver. - EVPN/VXLAN on the switch. In this use case, SONiC runs on leaf switches. Leafs terminate VXLAN endpoints and run BGP/EVPN for the control plane. Again, the ML2 Neutron driver is used to dynamically configure the network switch. The link between the switch and the service is regular VLAN. @@ -41,19 +41,19 @@ There are different ways SONiC support can be implemented in SCS, very similar t #### Option 1: SCS distribution of SONiC -With this approach, SCS will create it's own distribution of SONiC, similar to what Debian or Arch are for Linux. This distribution will be based on the SONiC community distribution, but will have SCS specific modules, which will be developed and maintained by SCS. SCS will contribute its code to dedicated SCS repositories and build its own SONiC images. The code can eventually be pushed upstream, but not as top priority. This approach will allow SCS to have a clear path for SONiC support and integration in SCS, but will also require SCS to maintain a distribution of SONiC, which is a significant effort. Upstream/downstream changes will have to be managed and maintained. However the advantage is that SCS will have full control over the distribution and can make changes as needed. Users will have to use the SCS distribution of SONiC, which will be based on the community distribution. If users already deploy community or enterprise SONiC, a migration path to SCS SONiC will be needed. +With this approach, SCS will create its own distribution of SONiC, similar to what Debian or Arch are for Linux. This distribution will be based on the SONiC community distribution, but will have SCS specific modules, which will be developed and maintained by SCS. SCS will contribute its code to dedicated SCS repositories and build its own SONiC images. The code can eventually be pushed upstream, but not as top priority. 
This approach will allow SCS to have a clear path for SONiC support and integration in SCS, but will also require SCS to maintain a distribution of SONiC, which is a significant effort. Upstream/downstream changes will have to be managed and maintained. However, the advantage is that SCS will have full control over the distribution and can make changes as needed. Users will have to use the SCS distribution of SONiC, which will be based on the community distribution. If users already deploy community or enterprise SONiC, a migration path to SCS SONiC will be needed. #### Option 2: SCS will support SONiC but will not change it -SCS supports enterprise ans community versions of SONiC but will not develop its own code for it. This will significantly limit the ability to develop new features for SDN, because all changes will be done in the tooling around SONiC and not in the OS itself. The advantages are that SCS will still improve SONiC support and will have minimal effort for this. The downside is that some features like OVN control plane for SONiC will not be possible. +SCS supports enterprise and community versions of SONiC but will not develop its own code for it. This will significantly limit the ability to develop new features for SDN, because all changes will be done in the tooling around SONiC and not in the OS itself. The advantages are that SCS will still improve SONiC support and will have minimal effort for this. The downside is that some features like OVN control plane for SONiC will not be possible. #### Option 3: SCS develops SCS-specific modules as add-on for any SONiC (Community or Enterprise) -In option 3, SCS will change SONiC by releasing its own modules for it. Those module can be provided as add-ons and installed on top of any version, community or enterprise. While compatability between the modules the SONiC releases will need to be maintained, there will be much broader support for SONiC and users will be able to pick and chose distributions based on their existing relationships and experience and use SCS independent of this. In cases where SCS provides contributions to core SONiC, those can be made in upstream Community repositories, so that the whole community including the propriatory vendors can adopt them eventually. +In option 3, SCS will change SONiC by releasing its own modules for it. Those modules can be provided as add-ons and installed on top of any version, community or enterprise. While compatibility between the modules and the SONiC releases will need to be maintained, there will be much broader support for SONiC and users will be able to pick and choose distributions based on their existing relationships and experience and use SCS independent of this. In cases where SCS provides contributions to core SONiC, those can be made in upstream Community repositories, so that the whole community including the proprietary vendors can adopt them eventually. #### Option 4: SCS does not adopt SONiC at all -This option entails no dedicated effort on SCS's part in supporting SONiC network equipement for it's users and software stack. Users can still use SONiC from what is available by other projects or if they invest the effort themselves. +This option entails no dedicated effort on SCS's part in supporting SONiC network equipment for its users and software stack. Users can still use SONiC from what is available by other projects or if they invest the effort themselves. 
This has several disadvantages: - SCS is not contributing to the SONiC community - the value for SCS by users who already use or plan to invest in SONiC is diminished @@ -76,7 +76,7 @@ Multiple vendor distributions. Expensive in general New tags appears on different periods, once 2 times per month, other 3 months between releases. -- adoption penetration - how many vendors use it? What type of venders (big, medium and large)? +- adoption penetration - how many vendors use it? What type of vendors (big, medium and large)? Good initial adoption: Microsoft, Target. Adoption requires time and money @@ -90,7 +90,7 @@ The SONiC community is healthy and growing, however progress is slower due to fa ## Decision -IaaS team recommends to use Option 3: SCS develops SCS-specific modules as add-on for any SONiC (Community or Enterprise). It has the best tradeoff between time and resource investment and benefits for the community. Adopting this strategy would allow SCS to be agile and quickly adopt SONiC, by providing users with clear path while allowing the freedom to chose different hardware and software vendors. SCS code can be packaged independently of each SONiC distribution and installed as add-on. Also SCS contributions to core SONiC will be done directly upstream, so that the whole community can benefit from them. +IaaS team recommends using Option 3: SCS develops SCS-specific modules as add-on for any SONiC (Community or Enterprise). It has the best tradeoff between time and resource investment and benefits for the community. Adopting this strategy would allow SCS to be agile and quickly adopt SONiC, by providing users with a clear path while allowing the freedom to choose different hardware and software vendors. SCS code can be packaged independently of each SONiC distribution and installed as add-on. Also, SCS contributions to core SONiC will be done directly upstream, so that the whole community can benefit from them. Work on hardware support in SONiC should be raised in upstream and SCS shouldn't make significant investments in this area. diff --git a/Standards/scs-0113-v1-security-groups-decision-record.md b/Standards/scs-0113-v1-security-groups-decision-record.md index 3b7c3c11c..dae926566 100644 --- a/Standards/scs-0113-v1-security-groups-decision-record.md +++ b/Standards/scs-0113-v1-security-groups-decision-record.md @@ -40,7 +40,7 @@ By design of OpenStack and when not changed, default rules in the default securi ### Reasons for and against a standard for security groups -Considering having most likely similiar security groups within different projects, it might make sense to standardize a few security groups for often used cases like ssh, http, https and maybe icmp. +Considering having most likely similar security groups within different projects, it might make sense to standardize a few security groups for often used cases like ssh, http, https and maybe icmp. What speaks for standardizing a certain set of security groups: 1. Having a set of correctly configured security groups could reduce misconfiguration from users 2. 
the central authority managing the groups does not necessarily know the usecase of the user, the user/operator must know best what kind of security their workload needs. What is a necessary port for 99% of deployments might be a security disaster for my deployment +4. the central authority managing the groups does not necessarily know the use case of the user, the user/operator must know best what kind of security their workload needs. What is a necessary port for 99% of deployments might be a security disaster for my deployment 5. Providing default groups could have the effect of stopping customers to think about their specific security needs and instead just copying default groups and or rules This leads to a conclusion, that a set of default security groups is only more valuable than harmful for users: @@ -91,12 +91,12 @@ stack@devstack:~/devstack$ openstack default security group rule list ``` Those rules can be edited, which may pose a security risk for customers consuming the default security group. -This should be adressed as a pre-requirement [here](https://github.com/SovereignCloudStack/standards/issues/521). +This should be addressed as a pre-requirement [here](https://github.com/SovereignCloudStack/standards/issues/521). ### Option 1: operator usage of network rbac -The `network rbac` endpoint[^2] manages the possibitity to share and access certain network-specific resources such as security groups. -For admins it is possible to use this endpoint to share a security group with ALL projects within the the cloud including ALL projects of ALL domains: +The `network rbac` endpoint[^2] manages the possibility to share and access certain network-specific resources such as security groups. +For admins, it is possible to use this endpoint to share a security group with ALL projects within the cloud including ALL projects of ALL domains: ```bash stack@devstack:~/devstack$ openstack network rbac create --target-all-projects --action access_as_shared --type security_group group-for-everyone - +-------------------+--------------------------------------+ ``` -This would fulfill our goal to grant access to predefined security groups for all projects and all groups recieved as shared do not count into the projects quota for security groups. +This would fulfill our goal to grant access to predefined security groups for all projects, and all groups received as shared do not count towards the project's quota for security groups. But there are a few downsides to this: 1. This should be strictly bound to the admin: no other user should be able to share security groups so to not confuse user. @@ -158,7 +158,7 @@ The biggest downside: As soon as a security group is shared, everyone from every Using and adhering the project scope of the security groups has the consequence, that: 1. either an admin has to set up security groups for each project -2. or the SCS project only provides a guide on how to setup and use some recommended security groups. +2. or the SCS project only provides a guide on how to set up and use some recommended security groups. As users are allowed to, will and should edit their security groups, there is no way to ensure, that a certain set of security groups with certain rules is always present in a project. So packing an extra burden on admins is unreasonable. 
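As a taste of what such a guide could contain, here is a minimal sketch of a recommended project-scoped group for one of the often-used cases (ssh) named earlier — the group name and rule values are illustrative only, not anything this decision record mandates:

```bash
# Illustrative project-scoped "ssh" group: allow inbound TCP/22 only.
openstack security group create ssh --description "allow inbound SSH"
openstack security group rule create ssh \
    --ingress --protocol tcp --dst-port 22 --remote-ip 0.0.0.0/0
```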
@@ -174,7 +174,7 @@ That would include identifying what kind of network permission a single VM needs The default Security Group Rules should be standardized as a pre-requirement (Option 0). Using the `network rbac` endpoint (Option 1) would not solve the idea of having pre-defined and administrator audited Security Groups, because it is possible for any user to edit the rules of shared Security Groups. -Instead the project-scope of the Security Groups should by focused and a guide prepared, that gives insight about creating and using Security Groups with a few examples but with a clear security focus (Mix of Option 2 and 3). +Instead, the project-scope of the Security Groups should be focused on and a guide prepared that gives insight into creating and using Security Groups with a few examples but with a clear security focus (Mix of Option 2 and 3). ## Consequences diff --git a/Standards/scs-0114-v1-volume-type-standard.md b/Standards/scs-0114-v1-volume-type-standard.md index 630ee63ec..9ed0d730c 100644 --- a/Standards/scs-0114-v1-volume-type-standard.md +++ b/Standards/scs-0114-v1-volume-type-standard.md @@ -7,7 +7,7 @@ track: IaaS ## Introduction -A volume is a virtual drive that is to be used by an instance (i. e., a virtual machine). With OpenStack, +A volume is a virtual drive that is to be used by an instance (i.e., a virtual machine). With OpenStack, each volume is created per a type that determines basic features of the volume as provided by the backend, such as encryption, replication, or quality of service. As of the writing of this document, presence or absence of these features can not be discovered with full certainty by non-privileged users via the OpenStack API. @@ -37,11 +37,11 @@ All considerations can be looked up in detail in the [Decision Record for the Vo ### Systematic Description of Volume Types -To test whether a deployment has volume types with certain aspects, the discoverability of the parameters in the volume type has to be given. As for the time this standard is created, there is no way for users to discover all aspects through OpenStack commands. Therefore the aspects, that are fulfilled within a volume type, should be stated in the beginning of the **description** of a volume type in the following manner: +To test whether a deployment has volume types with certain aspects, the discoverability of the parameters in the volume type has to be given. As for the time this standard is created, there is no way for users to discover all aspects through OpenStack commands. Therefore, the aspects that are fulfilled within a volume type should be stated in the beginning of the **description** of a volume type in the following manner: `[scs:aspect1, aspect2, ..., aspectN]...` -The mentioned aspects MUST be sorted alphebetically and every aspect should only be mentioned to the maximal amount of one. +The mentioned aspects MUST be sorted alphabetically and every aspect should be mentioned at most once. ### Standardized Aspects @@ -93,7 +93,7 @@ openstack volume type show LUKS ### Replication -Replication states whether or not there are multiple replicas of a volume. Thus, it answers the question whether the data could survive a node outage. Unfortunately there are two ways replication can be achieved: +Replication states whether or not there are multiple replicas of a volume, i.e., whether the data could survive a node outage. Unfortunately, there are two ways replication can be achieved: 1. In the configuration of a volume type. 
It then is visible as extra_spec in the properties of a volume type. 2. Via the used backend. Ceph for example provides automatic replication, that does not need to be specified in the volume type. This is currently not visible for users. diff --git a/Standards/scs-0115-v1-default-rules-for-security-groups.md b/Standards/scs-0115-v1-default-rules-for-security-groups.md index 9cae149d3..b118dcf1f 100644 --- a/Standards/scs-0115-v1-default-rules-for-security-groups.md +++ b/Standards/scs-0115-v1-default-rules-for-security-groups.md @@ -18,7 +18,7 @@ Security Group (abbr. SG) Set of ip table rules, used for tenant network security. Security Group Rule (abbr. SG Rule) - A single ip table rule, that is part of a SG. + A single ip table rule that is part of a Security Group. Administrator (abbr. Admin) Operator = User of an OpenStack cloud with the admin role. @@ -46,13 +46,13 @@ In recent OpenStack releases, both presets can be adjusted independently by admi ## Motivation The rules of a Security Group can be edited by default by any user with the member role within a project. -But when a Security Group is created it automatically incorporates a few SG rules that are configured as default rules. +But when a Security Group is created it automatically incorporates a few Security Group rules that are configured as default rules. Since the 2023.2 release, the default set of Security Group rules can be adjusted. This functionality is only available to administrators[^1][^2]. -In combination with the OpenStack behavior that when a VM is created with no Security Group specified, the default SG of the project is automatically applied to the ports of the VM, +In combination with the OpenStack behavior that when a VM is created with no Security Group specified, the default Security Group of the project is automatically applied to the ports of the VM, a user cannot be sure which firewall rules are applied to such a VM. -Therefore this standard proposes default Security Group rules that MUST be set by administrators to avoid divergence in default network security between different IaaS environments. +Therefore, this standard proposes default Security Group rules that MUST be set by administrators to avoid divergence in default network security between different IaaS environments. [^1]: [Tracking of development for editable default SG rules](https://bugs.launchpad.net/neutron/+bug/1983053) [^2]: [Release Notes of Neutron 2023.2](https://docs.openstack.org/releasenotes/neutron/2023.2.html) @@ -72,7 +72,7 @@ There are two ways to approach a standard for the default rules of Security Grou OpenStack's default rules for Security Groups already provide a good baseline for port security, because they allow all egress traffic and for the default Security Group only ingress traffic from the same group. Allowing more rules would not benefit the security level, while reducing or limiting the existing rules would barely improve it. - Nevertheless a standard could hold up the current security level against possible future release with more open default rules. + Nevertheless, a standard could hold up the current security level against possible future releases with more open default rules. Changing the default rules will not change the rules of any existing Security Groups. 2. 
**With the already strict OpenStack default rules users are required in most use cases to create and manage their own Security Groups.**
@@ -92,13 +92,13 @@ And it would make it necessary for users to check and change the rules of their
 This standard should only be applied onto versions of OpenStack that implement the new endpoint for the default Security Group rules, which would only include 2023.2 or higher releases.
-It is possible to have different default Security Group rules for the default SG and custom SGs.
+It is possible to have different default Security Group rules for the default Security Group and custom Security Groups.
 And it is arguable to have a more strict standard for default rules for the default Security Group than for the custom Security Groups,
 because the latter ones are not automatically applied to a VM but are always edited by the users to fit their requirements.
-The allowlisting concept of Security Group rules makes it hard to allow traffic with an exception of certain ports.
+The allowlisting concept of Security Group rules makes it hard to allow all traffic except for certain ports.
 It would be possible to just define many rules to achieve what a blocklist would achieve.
-But having many rules may confuse users and they may not disable unnecessary default rules in their SGs.
+But having many rules may confuse users, and they may not disable unnecessary default rules in their Security Groups.

 ## Standard

diff --git a/Standards/scs-0210-v1-k8s-new-version-policy.md b/Standards/scs-0210-v1-k8s-new-version-policy.md
index 69bcefd01..366fa8fae 100644
--- a/Standards/scs-0210-v1-k8s-new-version-policy.md
+++ b/Standards/scs-0210-v1-k8s-new-version-policy.md
@@ -17,7 +17,7 @@ description: |
 Here we will describe how fast providers need to keep up with the upstream Kubernetes version.

-To create a informed decision we summarize here the Kubernetes rules regarding versioning at the time of writing (2023-01-16):
+To enable an informed decision, we summarize here the Kubernetes rules regarding versioning at the time of writing (2023-01-16):

 Kubernetes usually provides about **3 minor** releases per year (see [Kubernetes Release Cycle][k8s-release-cycle]).
@@ -37,7 +37,7 @@ The remaining 2 months are only for:
 Kubernetes is a fast-paced project. We want to achieve that providers keep up to date with upstream and do not fall behind Kubernetes releases.
-This ensures that users are able to upgrade their clusters to address security issues, bug fixes and new features when using SCS compliant clusters in regards of Kubernetes.
+This ensures that, with regard to Kubernetes, users of SCS compliant clusters are able to upgrade their clusters to address security issues and bugs and to obtain new features.
 However, providers should have reasonable time to implement the new Kubernetes versions and test them.

 ## Decision

diff --git a/Standards/scs-0210-v2-k8s-version-policy.md b/Standards/scs-0210-v2-k8s-version-policy.md
index 88ed5738b..f09773034 100644
--- a/Standards/scs-0210-v2-k8s-version-policy.md
+++ b/Standards/scs-0210-v2-k8s-version-policy.md
@@ -24,14 +24,14 @@ More information can be found under [Kubernetes Support Period].

 The [Kubernetes release cycle][k8s-release-cycle] is set around 4 months, which usually results in about **3 minor** releases per year.
-Patches to these releases are provided monthly, with the exception of the first patch,
+Patches to these releases are provided monthly, except for the first patch,
 which is usually provided 1-2 weeks after the initial release (see [Patch Release Cadence][k8s-release-cadence]).

 ## Motivation

 Kubernetes is a living, fast-paced project, which follows a pre-defined release cycle.
-This enables forward planning with regards to releases and patches, but also implies a
+This enables forward planning with regard to releases and patches, but also implies a
 necessity to upgrade to newer versions quickly, since these often include new features, important security updates or especially if a previous version falls out of the support period window.
@@ -40,7 +40,7 @@ We want to achieve an up-to-date policy, meaning that providers should be
 mostly in sync with the upstream and don't fall behind the official Kubernetes releases.
 This is achievable, since new versions are released periodically on a well communicated schedule, enabling providers and users to set up processes around it.
-Being up to date ensures that security issues and bugs are addressed and new features
+Being up-to-date ensures that security issues and bugs are addressed and new features
 are made available when using SCS compliant clusters.

 It is nevertheless important to at least support all Kubernetes versions that are still
@@ -60,7 +60,7 @@ the provided Kubernetes versions should be kept up-to-date with new upstream rel
 - This time period MUST be even shorter for patches that fix critical CVEs.
   In this context, a critical CVE is a CVE with a CVSS base score >= 8 according to the CVSS version used in the original CVE record (e.g., CVSSv3.1).
-  It is RECOMMENDED to provide a new patch version in a 2 day time period after their release.
+  It is RECOMMENDED to provide a new patch version within a 2-day time period after its release.
 - New versions MUST be tested before being rolled out on productive infrastructure; at least the [CNCF E2E tests][cncf-conformance] should be passed beforehand.

diff --git a/Standards/scs-0213-v1-k8s-nodes-anti-affinity.md b/Standards/scs-0213-v1-k8s-nodes-anti-affinity.md
index a4c5231f4..6d2e10ad4 100644
--- a/Standards/scs-0213-v1-k8s-nodes-anti-affinity.md
+++ b/Standards/scs-0213-v1-k8s-nodes-anti-affinity.md
@@ -7,8 +7,8 @@ track: KaaS

 ## Introduction

-A Kubernetes instance is provided as a cluster, which consists of a set of worker machines,
-so called nodes. A cluster is composed of a control plane and at least one worker node.
+A Kubernetes instance is provided as a cluster, which consists of a set of worker machines, also called nodes.
+A cluster is composed of a control plane and at least one worker node.
 The control plane manages the worker nodes and therefore the pods in the cluster by making decisions about scheduling, event detection and global decisions. Inside the control plane, multiple components exist, which can be duplicated and distributed over multiple machines
@@ -36,20 +36,22 @@ could fail, they should be distributed over multiple nodes on different machines
 This can be steered with the Affinity or Anti Affinity features, which are separated by Kubernetes into two features:

-Node Affinity
-The Node Affinity feature allows to match pods according to logical matches of
-key-value-pairs referring to labels of nodes.
-These can be defined with different weights or preferences in order to allow fine-grained
-selection of nodes. The feature works similar to the Kubernetes nodeSelector.
-It is defined in the PodSpec using the nodeAffinity field in the affinity section.
-
-Pod Affinity
-Pod Affinity or Pod Anti Affinity allows the constraint of pod scheduling based on the
-labels of pods already running on a node.
-This means the constraint will match other pods on a node according to their labels key-value-pairs
-and then either schedule the pod to the same (Affinity) or another (Anti Affinity) node.
-This feature is also defined in the PodSpec using the podAffinity and podAntiAffinity
-fields in the affinity section. [3]
+- Node Affinity
+
+  The Node Affinity feature allows matching pods according to logical matches of
+  key-value pairs referring to labels of nodes.
+  These can be defined with different weights or preferences in order to allow fine-grained
+  selection of nodes. The feature works similarly to the Kubernetes nodeSelector.
+  It is defined in the PodSpec using the nodeAffinity field in the affinity section.
+
+- Pod Affinity
+
+  Pod Affinity or Pod Anti Affinity allows constraining pod scheduling based on the
+  labels of pods already running on a node.
+  This means the constraint will match other pods on a node according to the key-value pairs of their labels
+  and then either schedule the pod to the same (Affinity) or another (Anti Affinity) node.
+  This feature is also defined in the PodSpec using the podAffinity and podAntiAffinity
+  fields in the affinity section. [3]

 Both features allow the usage of "required" or "preferred" keywords, which create "hard" or "soft" affinities. By using a hard affinity, a pod would need to be scheduled
@@ -97,7 +99,7 @@ assign them to different nodes, but at this point, a redundant setup like presen
 So Anti Affinity in this context probably means more like distribution over multiple physical machines, which needs to be planned beforehand on the machine/server level.
-Therefore would it be preferred for the control plane to use a redundant setup, which
+Therefore, it would be preferred for the control plane to use a redundant setup, which
 is separated over different physical machines, meaning at least half of the control
 plane nodes run on a different physical machine than the rest. The currently used
 ClusterAPI enables this by establishing the concept of "failure domains". These are used to control
@@ -128,11 +130,11 @@ of them. This should provide at least the minimum requirements for a fault-toler
 For the standard, there is also a possibility to define multiple stages of distributed infrastructure and only make sensible ones a requirement and the rest optional, e.g.

-* non-distributed clusters
-* High-Availability clusters that are
-  * distributed over multiple machines/availability zones
-  * distributed over multiple clouds
-  * distributed over multiple physical locations/datacenters
+- non-distributed clusters
+- High-Availability clusters that are
+  - distributed over multiple machines/availability zones
+  - distributed over multiple clouds
+  - distributed over multiple physical locations/datacenters

 The worker nodes are RECOMMENDED to be distributed over different machines.
In order to provide clear information to the users, the nodes should be labeled to reflect the
diff --git a/Standards/scs-0215-v1-robustness-features.md b/Standards/scs-0215-v1-robustness-features.md
index e0ad3dc88..0085c3702 100644
--- a/Standards/scs-0215-v1-robustness-features.md
+++ b/Standards/scs-0215-v1-robustness-features.md
@@ -179,7 +179,7 @@ csr-9wvgt 112s kubernetes.io/kubelet-serving system:node:worker-1
 Further information and examples can be found in the Kubernetes documentation:
 [Kubeadm certs](https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/)
-[Kubelete TLS bootstrapping](https://kubernetes.io/docs/reference/access-authn-authz/kubelet-tls-bootstrapping/)
+[Kubelet TLS bootstrapping](https://kubernetes.io/docs/reference/access-authn-authz/kubelet-tls-bootstrapping/)

 ## Decision

diff --git a/Standards/scs-0216-v1-requirements-for-testing-cluster-stacks.md b/Standards/scs-0216-v1-requirements-for-testing-cluster-stacks.md
index d0b614239..82ea1856e 100644
--- a/Standards/scs-0216-v1-requirements-for-testing-cluster-stacks.md
+++ b/Standards/scs-0216-v1-requirements-for-testing-cluster-stacks.md
@@ -70,7 +70,7 @@ Two potential approaches for testing cluster stacks are the use of an IaaS provi
 - Challenges with monitoring and debugging.
 - Potential downtime and difficulty in running concurrent tests.

-### Local Environment (Docker, Kubevirt)
+### Local Environment (Docker, KubeVirt)

 #### Pros

diff --git a/Standards/scs-0300-v1-requirements-for-sso-identity-federation.md b/Standards/scs-0300-v1-requirements-for-sso-identity-federation.md
index 65eee33e2..70f3ddc47 100644
--- a/Standards/scs-0300-v1-requirements-for-sso-identity-federation.md
+++ b/Standards/scs-0300-v1-requirements-for-sso-identity-federation.md
@@ -25,7 +25,7 @@ premises or e.g. as an external 3rd party cloud service.
 To ease onboarding of customer employees (or e.g. customer contracted 3rd party admin staff) as SCS users, it would be good to be able to consume these external identities in SCS.
-For customers this avoids the neccessity to explicitly maintain an additional
+For customers this avoids the necessity to explicitly maintain an additional
 dedicated account in SCS and this also reduces what SCS needs to do with respect to taking care of persisting user account information.
@@ -34,7 +34,7 @@ authentication to external identity providers and map those users to roles in
 SCS that can be used for authorization decisions when users access SCS services.

 In addition to user identities we also see the necessity to support the
-use of "machine identites" (aka "workload identities" or "service accounts").
+use of "machine identities" (aka "workload identities" or "service accounts").
 These will probably be SCS-local accounts and have for example the purpose to grant CaaS workload access to storage resources served by the infrastructure layer. Exact architectural details for this are still in active discussion,
@@ -50,11 +50,11 @@ authorization.

 One thing these services have in common is that they are able to use SSO protocols like OAuth 2.0 or OpenID Connect (OIDC) on top of it to delegate authentication. They are service providers (SAML terminology) and can
-be relying parties (OIDC terminology) of a protocol compliant identity provider
+be relying parties (OIDC terminology) of a protocol-compliant identity provider
 (IdP).
So the idea is to run an SSO IdP as part of SCS to provide a dedicated point
-of entry for identites, which the SCS service layers can use as a common
+of entry for identities, which the SCS service layers can use as a common
 interface to consume external user identities.

 The purpose of this document is to specify what requirements a specific
@@ -66,10 +66,10 @@ in the context of SCS.

 As a central service for identity handling, the IdP service needs to be robust and reliable.
-Customers shall be able to access self service, so that
+Customers shall be able to access self-service, so that
 they can make reasonable adjustments e.g. to role mapping.
 At the time of writing this document it's still undecided
-if SCS has the requirement of a dedicated "self service" service
+if SCS has the requirement of a dedicated "self-service" service
 that serves as a frontend to provision and re-configure customer specific data, abstracting e.g. from IdP specific user interface particularities.
@@ -77,7 +77,7 @@ user interface particularities.
 Keycloak is currently being deployed as part of the IaaS reference implementation. Technically this IdP component shall be shifted from the management plane to be run on the basis of a "minimal" Kubernetes (e.g. K3S),
-e.g. to make use of the "self healing" and scaling features achievable
+e.g. to make use of the "self-healing" and scaling features achievable
 with that.

 So one of the considerations is whether the solution will work well on a
@@ -98,7 +98,7 @@ Quarkus instead of WildFly/JBoss.

 The project maintains several means of community contributions as listed on the [community section](https://www.keycloak.org/community)
-of the project website. It uses [Github issues](https://github.com/keycloak/keycloak/issues)
+of the project website. It uses [GitHub issues](https://github.com/keycloak/keycloak/issues)
 to track development.

 It offers a REST API for administration and there's a separately maintained
@@ -111,7 +111,7 @@ in adopting to protocol standard changes and extensions. This has been observed
 in the case of logout support (backend and frontend variants) in OIDC.

 It offers a concept of "Identity Brokering", where Keycloak is not just IdP
-but also "client" to other IdPs. This allows daisy chaining of identity
+but also "client" to other IdPs. This allows daisy-chaining of identity
 federation. In this configuration it can work as a point of protocol transition between different supported SSO protocols (SAML, OAuth 2.0, etc.).
@@ -122,7 +122,7 @@ e.g.).

 Keycloak's implementation makes some design decisions that are specific to it and have consequences for clients of the service.
 E.g. Keycloak has a concept of management "Realms", which have their own specific
-set of HTTP API entrypoints, both for administration as well as for IdP
+set of HTTP API entrypoints, both for administration and for IdP
 requests.

 Commonly Keycloak realms can be used to map them 1:1 to user domains,
@@ -145,9 +145,9 @@ for all aspects of its administration interface.

 For storage of Keycloak configuration and local user metadata (e.g. from which external IdP a user account originally came from)
-Keycloak supports several SQL backends through JDBC. Thus
+Keycloak supports several SQL backends through JDBC. Thus,
 it can be hooked up to a Postgres Database or to a
-MariaDB/Galera cluster e.g..
+MariaDB/Galera cluster, for example.

 As of April 11, 2023, Keycloak joined the CNCF as an incubating project.
@@ -157,9 +157,9 @@ Zitadel is a newer implementation of an SSO IdP.
It is implemented in Go and under active development and maintained by ZITADEL.
 The project is open for community [contributions](https://github.com/zitadel/zitadel/blob/main/CONTRIBUTING.md)
-to all parts of the eco system.
-Feature requests and bugs being tracked on [Github](https://github.com/orgs/zitadel/projects/2/views/5) for development.
-Community questions can be asked in the [public chat](https://zitadel.com/chat) or via [Github Discussions](https://github.com/zitadel/zitadel/discussions).
+to all parts of the ecosystem.
+Feature requests and bugs are tracked on [GitHub](https://github.com/orgs/zitadel/projects/2/views/5) for development.
+Community questions can be asked in the [public chat](https://zitadel.com/chat) or via [GitHub Discussions](https://github.com/zitadel/zitadel/discussions).

 ZITADEL offers support for the commonly used authentication and authorization protocols such as OIDC, OAuth2, SAML2. It is a compliant and certified OpenID Connect provider with support for various Grant Types for both human users and machine users.
 Compared to Keycloak SPIs, ZITADEL offers Actions to customize and integrate (eg, calling external APIs, Webhooks, customizing pre-built workflows, customizing tokens)
@@ -175,7 +175,7 @@ in the following areas:

 - For client services (single set of HTTP API endpoints).
 - For SCS operators for provisioning customer [organizations](https://zitadel.com/docs/concepts/structure/organizations)
-  and robust configuraton by using templated client, role and mapping
+  and robust configuration by using templated client, role and mapping
   configuration.
 - For SCS customers for a robust user experience for self servicing.
@@ -188,8 +188,8 @@ Managers that receive granted Projects can assign users permissions to use the p
 for multiple areas of use and configuration.

 It recently also added support for the [Device Authorization Grant](https://github.com/zitadel/oidc/issues/141),
-which, at time of writing, is a feauture that is relevant
-for SCS to be able use OpenStack CLI and APIs with federated
+which, at the time of writing, is a feature that is relevant
+for SCS to be able to use OpenStack CLI and APIs with federated
 identities ([Device Authorization Grant](https://github.com/SovereignCloudStack/issues/issues/221)).

 Support for consumption of LDAP backends is available since [Zitadel v2.23.0](https://github.com/zitadel/zitadel/releases/tag/v2.23.0)
@@ -203,7 +203,7 @@ to use Kubernetes (or similar like Knative) and CockroachDB.

 At the time of writing, a PoC "spike" is done to assess and verify the hopes connected with Zitadel in the context of the SCS testbed.
-Currently Zitadel is lacking the possibility to easily add custom claims.
+Currently, Zitadel lacks the possibility to easily add custom claims.
 It supports `urn:zitadel:iam:user:metadata`, but that is more suitable towards Kubernetes and cannot be parsed with the OpenStack mapping mechanism.
 [There is work going on](https://github.com/zitadel/zitadel/issues/3997) which
@@ -238,7 +238,7 @@ Keycloak currently supports the OAuth 2.0 grants that SCS wants
 to make use of (e.g. Device Authorization Grant). It is the implementation for which integration is currently documented in OpenStack and implemented in kolla-ansible. SCS currently deploys Keycloak and the IAM team has
-most hands on expecience with it, e.g. when it comes to colletaral questions
+most hands-on experience with it, e.g.
when it comes to collateral questions
 like how to make TLS and signing certificates available to the IdP that shall be used in federation to external domains.

diff --git a/Standards/scs-0301-v1-naming-conventions.md b/Standards/scs-0301-v1-naming-conventions.md
index 6540e35e2..bf909e573 100644
--- a/Standards/scs-0301-v1-naming-conventions.md
+++ b/Standards/scs-0301-v1-naming-conventions.md
@@ -33,10 +33,10 @@ OPTIONAL

 For naming the customers the suggestion from PS is the following:

-A prefix will be use to differenciate domain, project and user in
-the openstack environment. The project name is also added as a sufix.
+A prefix will be used to differentiate domain, project and user in
+the OpenStack environment. The project name is also added as a suffix.

-So the onboaring tool will create the following structure for a new
+So the onboarding tool will create the following structure for a new
 customer onboarded in the system.

 ```commandline
@@ -109,15 +109,15 @@ will be called "Customer A".

 There should be an OIDC client in each customer realm to allow the federation to the Proxy realm. Currently called OSISM on the testbed.

-On the proxy realm, it's needed to add this new customer realm as an idenity provider. During the creation of the identity
+On the proxy realm, this new customer realm needs to be added as an identity provider. During the creation of the identity
 provider for "Customer A", the field "Alias" should be set to ``. This will cause the users federated from
-realm "Customer A" to the proxy realm to be prefixed to avoid naming colisions, e.g. `d${ALIAS}-${CLAIM.preferred_username}`.
+realm "Customer A" to the proxy realm to be prefixed to avoid naming collisions, e.g. `d${ALIAS}-${CLAIM.preferred_username}`.

 Also, the identity federation should be configured to store the `` from that realm into the users. So it
-can be send to Keystone mapping to use it as `gd-member` and `gp--member`. There is
+can be sent to Keystone mapping to use it as `gd-member` and `gp--member`. There is
 also the necessity of a mapper to send the `openstack-default-project`.

-Add the aditional mappings for roles and groups as necessary to get the attributes from the customer realm into the OIDC
+Add the additional mappings for roles and groups as necessary to get the attributes from the customer realm into the OIDC
 userinfo that is put into the OIDC to the proxy realm and from there to Keystone.

 #### _Option 2_

diff --git a/Standards/scs-0302-v1-domain-manager-role.md b/Standards/scs-0302-v1-domain-manager-role.md
index 59702b3dc..29ffa5a7c 100644
--- a/Standards/scs-0302-v1-domain-manager-role.md
+++ b/Standards/scs-0302-v1-domain-manager-role.md
@@ -44,8 +44,8 @@ Omitting the provisioning of any Domain Manager users (i.e. not assigning the ne
 ## Motivation

 In the default configuration of Keystone, only users with the `admin` role may manage the IAM resources such as projects, groups and users and their relation through role assignments.
-The `admin` role in OpenStack Keystone is not properly scoped when assigned within a domain or project only as due to hard-coded architectural limitations in OpenStack, a user with the `admin` role may escalate their privileges outside of their assigned project or domain boundaries.
-Thus, it is not possible to properly give customers a self-service functionality in regards to project, group and user management with the default configuration.
+The `admin` role in OpenStack Keystone is not properly scoped when assigned within a domain or project only, as due to hard-coded architectural limitations in OpenStack, a user with the `admin` role may escalate their privileges outside their assigned project or domain boundaries.
+Thus, it is not possible to properly give customers a self-service functionality in regard to project, group and user management with the default configuration.

 To address this, this standard defines a new Domain Manager persona implemented using a domain-scoped `manager` role in conjunction with appropriate Keystone API policy adjustments to establish a standardized extension to the default Keystone configuration allowing for IAM self-service capabilities for customers within domains.
@@ -59,7 +59,7 @@ To address this, this standard defines a new Domain Manager persona implemented
 ## Design Considerations

 - the Domain Manager persona MUST support managing projects, groups and users within a specific domain
-- the Domain Manager persona MUST be properly scoped to a domain, it MUST NOT gain access to resources outside of its owning domain
+- the Domain Manager persona MUST be properly scoped to a domain; it MUST NOT gain access to resources outside its owning domain
 - the Domain Manager persona MUST NOT be able to manipulate existing roles or create new roles
 - the Domain Manager persona MUST only be able to assign specific non-administrative\* roles to their managed users where the applicable roles are defined by the CSP
 - Domain Managers MUST NOT be able to abuse the role assignment functionalities to escalate their own privileges or those of other users beyond the roles defined by the CSP
@@ -78,7 +78,7 @@ This results in special permissions being granted to users possessing the role w
 This poses severe security risks as the proper scoping of the `admin` role is impossible.
 **Due to this, this approach was discarded early.**

-Upstream (OpenStack) is in the process of addressing this across the services but it has not been fully implemented yet, especially for domains[^3].
+Upstream (OpenStack) is in the process of addressing this across the services, but it has not been fully implemented yet, especially for domains[^3].

 [^2]: [Launchpad bug: "admin"-ness not properly scoped](https://bugs.launchpad.net/keystone/+bug/968696)
@@ -124,7 +124,7 @@ The only parts of the policy definitions that may be changed are:

 ```yaml
 # SCS Domain Manager policy configuration

-# Section A: OpenStack base definitons
+# Section A: OpenStack base definitions
 # The entries beginning with "base_" should be exact copies of the
 # default "identity:" definitions for the target OpenStack release.
 # They will be extended upon for the manager role below this section.
@@ -240,7 +240,7 @@ They are used as a basis for the domain-manager-specific changes which are imple
 The section of "`base_*`" rules is meant for easy maintenance/update of default rules while keeping the domain-manager-specific rules separate.

 > **Note:**
-> The "`or rule:admin_required`" appendix to the rule defintions in "Section B" is included for backwards compatibility with environments not yet fully configured for the new secure RBAC standard[^6].
+> The "`or rule:admin_required`" appendix to the rule definitions in "Section B" is included for backwards compatibility with environments not yet fully configured for the new secure RBAC standard[^6].

 [^6]: [OpenStack Technical Committee Governance Documents: Consistent and Secure Default RBAC](https://governance.openstack.org/tc/goals/selected/consistent-and-secure-rbac.html)
@@ -374,4 +374,4 @@ Rationale:
 Links / Comments / References:

 - [SIG IAM meeting protocol entry](https://input.scs.community/2023-scs-sig-iam#Domain-Admin-rights-for-SCS-IaaS-Customers-184)
-- [issue commment about decision](https://github.com/SovereignCloudStack/issues/issues/184#issuecomment-1670985934)
+- [issue comment about decision](https://github.com/SovereignCloudStack/issues/issues/184#issuecomment-1670985934)

diff --git a/Standards/scs-0400-v1-status-page-create-decision.md b/Standards/scs-0400-v1-status-page-create-decision.md
index 139a05675..0443cd4b2 100644
--- a/Standards/scs-0400-v1-status-page-create-decision.md
+++ b/Standards/scs-0400-v1-status-page-create-decision.md
@@ -9,13 +9,13 @@ enhances: status-page-comparison.md
 ## Introduction

 Creating and maintaining IT infrastructure is a complex task.
-Any kind of consumer (e.g. operators, cutsomers) can
+Any kind of consumer (e.g. operators, customers) can
 be supported by presenting the status of all possible parts of the serving infrastructure. Whether a service is not reachable or the used hardware is having an outage, we want the consumers to be easily informed by using a "Status Page" application.
 The need for a "Status Page" came up early in the SCS project and the requirements a "Status Page" application
-has to fulfill were defined and written down on 2022-06-02 as a
+has to fulfill were defined and written down on 2022-06-02 as the
 [MVP-0 epic](https://github.com/SovereignCloudStack/issues/issues/123). The upcoming research on existing solutions came to the conclusion that we want to create a new "Status Page" application.
@@ -48,7 +48,7 @@ we pick up an existing project and try to get it in shape for our use case. It w
 own additional patches. So there will be a reference implementation that will match the requirements we have.
-In addition there will be an architecture design documentation. So if the reference
+In addition, there will be architecture design documentation. So if the reference
 implementation does not fit your needs, it will be possible to create your own application.

 ## Status Page Requirements
@@ -60,7 +60,7 @@ implementation may not fit to you, it will be possible to create your own applic
 - support that components are only visible to a subset of users
   - implies that there is a role that is read-only
   - On-Prem use case might be handled by having an authenticating reverse proxy in front
-- The status page applicaton should allow for simple and easy theming
+- The status page application should allow for simple and easy theming
 - Page = (Possibly simple) Web-UI
@@ -101,15 +101,15 @@ implementation may not fit to you, it will be possible to create your own applic
 - to minimize the probability of making errors, updating the status of a component should not be hard brainwork
 - updates can be both machine generated status changes (triggered e.g. by health monitoring)
-  as well as updates from human operators
+  and updates from human operators
 - updating a status should allow the CSP Operator to do that in a fashion that either pushes infos to the subscribers or just updates the status on the status page
 - updating the status can either be toggling the status of the component or can be accompanied by additional textual information.
- When updating a status with textual information the status page application should make it
-  easy for me as the CSP Operator to do in a way that if different people submit infos over time
-  they are presented in a similar way (eg. the status page application should guide so that the
-  resulting infos are presented in a identical way. Example: when updating infos of an incident
+  easy for me as the CSP Operator to do in a way that, if different people submit infos over time,
+  they are presented in a similar way (e.g. the status page application should provide guidance so that the
+  resulting infos are presented in an identical way). Example: when updating infos of an incident
   over time the timeline should automatically be sorted by the status page application so that it does not depend on the Operator whether the newest info is on top or at the bottom.
   This is a typical thing that varies if several people update items
@@ -153,7 +153,7 @@ With those requirements in mind the projects that initially were found, were eva
 | user management | ✅ | ❌ | ❌ | ❌ | ✅ by OIDC | ⁇ through github? | ❌ |
 | different output format on notification | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ |
 | external hosting | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ looks like you are limited to github | ✅ |
-| project healthy | ❌ last commit 17 months | ❌ last commit 3 years | ❌ last commit 5 months | ✅ last commit 2 months | ✅ recent activities | ✅ recent activities | ❌ archived and abondend by the owner |
+| project health | ❌ last commit 17 months | ❌ last commit 3 years | ❌ last commit 5 months | ✅ last commit 2 months | ✅ recent activities | ✅ recent activities | ❌ archived and abandoned by the owner |
 | documentation | ✅ API ❌ User Documentation | ❌ | ❌ | ❌ | ✅ | ⁇ | ❌ not reachable anymore |
 | git based | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ | ⁇ a netlify based installation is able to communicate with github |
 | project page | [project page](https://cachethq.io/) | [project page](https://github.com/weeblrpress/clearstatus) | [project page](https://www.brotandgames.com/ciao/) | [project page](https://cstate.netlify.app/) | [project page](https://gatus.io/) | [project page](https://github.com/tadhglewis/issue-status) | [project page](https://marquez.co/statusfy) |

diff --git a/Standards/scs-0401-v1-status-page-reference-implementation-decision.md b/Standards/scs-0401-v1-status-page-reference-implementation-decision.md
index eca9480ae..2f9eb5bdf 100644
--- a/Standards/scs-0401-v1-status-page-reference-implementation-decision.md
+++ b/Standards/scs-0401-v1-status-page-reference-implementation-decision.md
@@ -7,9 +7,9 @@ track: Ops

 ## Introduction

-For the reference implementation of the status page API defined by the [OpenAPI spec](https://github.com/SovereignCloudStack/status-page-openapi) some decision should be made to which technlogy to be used and why.
+For the reference implementation of the status page API defined by the [OpenAPI spec](https://github.com/SovereignCloudStack/status-page-openapi), some decisions should be made as to which technology should be used and why.

-A reference implementation should be of use to most of the intended group, but is not necsessarily applicable for every use case.
+A reference implementation should be of use to most of the intended group, but is not necessarily applicable for every use case.

 ## Motivation
@@ -19,9 +19,9 @@ For a reference implementation to be of any use, some common and widely used tec
 ### Programming Language

-The status page application consists of an api server as well as a frontend.
For implementing the [api server](https://github.com/SovereignCloudStack/status-page-api), which is generated from the [OpenAPI spec](https://github.com/SovereignCloudStack/status-page-openapi), [Go](https://go.dev/) was chosen, because of maturity and wide spread usage as industry standard. Go, in particular, is a modern programming language and is commonly used in network and cloud computing environments.
+The status page application consists of an API server as well as a frontend. For implementing the [api server](https://github.com/SovereignCloudStack/status-page-api), which is generated from the [OpenAPI spec](https://github.com/SovereignCloudStack/status-page-openapi), [Go](https://go.dev/) was chosen because of its maturity and widespread usage as an industry standard. Go, in particular, is a modern programming language and is commonly used in network and cloud computing environments.

 ### Database

 As database, [PostgreSQL](https://www.postgresql.org/) was chosen, since it is a mature, well-known database. PostgreSQL can be run in various environments from small setups to scaled setups.
-Furthermore PostgreSQL is a very healthy project with an active community and a solid license. It easily passed the [SCS OSS health check](https://github.com/SovereignCloudStack/standards/blob/main/Drafts/OSS-Health.md).
+Furthermore, PostgreSQL is a very healthy project with an active community and a solid license. It easily passed the [SCS OSS health check](https://github.com/SovereignCloudStack/standards/blob/main/Drafts/OSS-Health.md).

diff --git a/Standards/scs-0402-v1-status-page-openapi-spec-decision.md b/Standards/scs-0402-v1-status-page-openapi-spec-decision.md
index 00ba8a6dd..c1d28c5d4 100644
--- a/Standards/scs-0402-v1-status-page-openapi-spec-decision.md
+++ b/Standards/scs-0402-v1-status-page-openapi-spec-decision.md
@@ -11,7 +11,7 @@ While defining the [OpenAPI spec](https://github.com/SovereignCloudStack/status-

 ## Requirements

-The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119).
+The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119).

 In addition, "FORBIDDEN" is to be interpreted equivalent to "MUST NOT".
@@ -35,7 +35,7 @@ UUIDs are used, to ensure uniqueness. Also, they can be visually recognized as i

 #### Incremental

-An `Incremental` is used in combination with other identifiers to identify a sub resource of any kind. `Incremental`s themselves are not globally unique, but unique for every sub resource of an unique resource.
+An `Incremental` is used in combination with other identifiers to identify a sub resource of any kind. `Incremental`s themselves are not globally unique, but unique for every sub resource of a unique resource.

 #### Generation and order
@@ -43,11 +43,11 @@ An `Incremental` is used in combination with other identifiers to identify a sub

 #### SeverityValue

-A `SeverityValue` is an unsiged integer ranging from 0 to 100 inclusively. It MUST be utilized by an `Impact` when referenced by a requested `Component` to gauge the severity of the impact on that component. It MUST be added to an `Impact` when refereced by an `Incident`, when its created.
While being described as an unsiged integer, implementing this value MAY not require it to be an uint data type in any form, because its range even fits in a signed int8 (byte) data type.
+A `SeverityValue` is an unsigned integer ranging from 0 to 100 inclusively. It MUST be utilized by an `Impact` when referenced by a requested `Component` to gauge the severity of the impact on that component. It MUST be added to an `Impact` when referenced by an `Incident` when it is created. While being described as an unsigned integer, implementing this value MAY not require it to be a uint data type in any form, because its range even fits in a signed int8 (byte) data type.

 ### API objects

-All objects which are used as payload, either as request or response, are defined by schemas. This centralizes the maintanence of field names and types, for both requests and responses.
+All objects which are used as payload, either as request or response, are defined by schemas. This centralizes the maintenance of field names and types, for both requests and responses.

 ### API object fields
@@ -62,7 +62,7 @@ Requests to updating operations SHOULD contain the minimum of the changed fields

 ### Endpoint naming

-The endpoints are named in plural form, even when handeling single objects, to keep uniform paths.
+The endpoints are named in plural form, even when handling single objects, to keep uniform paths.

 ### Phase list
@@ -131,7 +131,7 @@ This means:

 A value of 100 is the maximum of the severity value.

-A severity with the value of 100 MUST always be supplied. This is the highest severity for the system. If no severity with a value of 100 exists, e.g. the highest severity value is set at 90, an `Impact` with a higher `SeverityValue` WILL be considered to be an _unkown_ severity.
+A severity with the value of 100 MUST always be supplied. This is the highest severity for the system. If no severity with a value of 100 exists, e.g. the highest severity value is set at 90, an `Impact` with a higher `SeverityValue` WILL be considered to be an _unknown_ severity.

 ### Component impacts
@@ -139,13 +139,13 @@ Components list their impacts, which they are affected by, as read only. Only an

 ### Return of `POST` requests

-Generally `POST` requests create new resources. These endpoints do not return the new resource, but a unique identifier to the resource e.g. an UUID.
+Generally, `POST` requests create new resources. These endpoints do not return the new resource, but a unique identifier for the resource, e.g. a UUID.

 In most cases the new resource won't be used directly after creation. Most often list calls are used. If the new resource is used directly, it can be retrieved by the returned identifier.

 Payloads to POST requests SHALL NOT include ID or `Incremental` typed fields; it lies in the responsibility of the API server to assign IDs and `Incremental`s to objects.

-### Return of `PATCH` requestes
+### Return of `PATCH` requests

 Most commonly `PATCH` requests are used to partially or fully change a resource. These requests do not respond with the changed resource, nor an identifier.
@@ -159,4 +159,4 @@ The `PUT` requests is most commonly used to update full objects, whereas `PATCH`

 ### Authentication and authorization

-The API spec does not include either authentication (AuthN) nor authorization (AuthZ) of any kind. The API server MUST be secured by an reverse/auth proxy.
+The API spec includes neither authentication (AuthN) nor authorization (AuthZ) of any kind. The API server MUST be secured by a reverse/auth proxy.
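To make the severity semantics described in the scs-0402 hunks above concrete (a `SeverityValue` from 0 to 100, where an `Impact` above the highest defined severity is treated as _unknown_), here is a minimal sketch in Python; the function name and the example severity names and thresholds are hypothetical illustrations, not part of the spec:

```python
# Minimal sketch, not part of the OpenAPI spec: maps a SeverityValue (0..100)
# to a named severity as described in scs-0402. The severity names and
# thresholds below are hypothetical examples; only the value 100 MUST exist.

def classify_severity(severity_value: int, severities: dict[int, str]) -> str:
    """Return the name of the first defined severity covering severity_value.

    `severities` maps an inclusive upper bound to a name, e.g.
    {50: "limited", 100: "broken"}.
    """
    if not 0 <= severity_value <= 100:
        raise ValueError("SeverityValue must be in the range 0..100")
    for upper_bound in sorted(severities):
        if severity_value <= upper_bound:
            return severities[upper_bound]
    # No defined severity covers the value (highest bound below severity_value):
    # per the spec text above, such an Impact is considered an unknown severity.
    return "unknown"

# If the highest defined severity value is 90, a SeverityValue of 95 is "unknown":
assert classify_severity(95, {50: "limited", 90: "broken"}) == "unknown"
assert classify_severity(40, {50: "limited", 100: "broken"}) == "limited"
```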
diff --git a/Standards/scs-0403-v1-csp-kaas-observability-stack.md b/Standards/scs-0403-v1-csp-kaas-observability-stack.md
index f8d0d3523..5b22881b1 100644
--- a/Standards/scs-0403-v1-csp-kaas-observability-stack.md
+++ b/Standards/scs-0403-v1-csp-kaas-observability-stack.md
@@ -83,7 +83,7 @@ Use a mix of [kubernetes-mixin alerts](https://github.com/kubernetes-monitoring/
   - S3 compatible bucket as a storage for long term metrics is configured
   - thanos query-frontend is deployed and configured
   - thanos query is deployed and configured
-  - thanos reciever is deployed and configured (simple deployment, non HA, without router)
+  - thanos receiver is deployed and configured (simple deployment, non HA, without router)
   - thanos ruler is deployed and configured
   - thanos compactor is deployed and configured
   - thanos bucket-web is deployed and configured
@@ -97,7 +97,7 @@ Use a mix of [kubernetes-mixin alerts](https://github.com/kubernetes-monitoring/
   - There exist Dashboards for KaaS Cluster Health
     - KaaS L0 dashboard counters are working correctly
     - Dedicated L0 dashboards are deployed for KaaS and for IaaS monitoring layers
-  - There exist Dashboards for SCS services endpoinds health (BlackBox exporter)
+  - There exist Dashboards for the health of SCS service endpoints (BlackBox exporter)
   - There exist Dashboards for IaaS layer health
 - Automatic Setup of Exporters for Observability of managed K8s clusters
   - KaaS service is mocked
@@ -117,13 +117,13 @@ Use a mix of [kubernetes-mixin alerts](https://github.com/kubernetes-monitoring/
 A survey was conducted to gather the needs and requirements of a CSP when providing Kubernetes as a Service. The results of the Survey (Questions with answers) were the following:

 1. What is your understanding of a managed Kubernetes Offering:
-   - Hassle-Free Installation and Maintainance (customer viewpoint); Providing Controlplane and worker nodes and responsibility for correct function but agnostic to workload
-   - Day0, 1 and 2 (~planning, provisioning, operations) full lifecyle management or let customer manages some parts of that, depending on customer contract
+   - Hassle-Free Installation and Maintenance (customer viewpoint); Providing control plane and worker nodes and responsibility for correct function but agnostic to workload
+   - Day0, 1 and 2 (~planning, provisioning, operations) full lifecycle management, or letting the customer manage some parts of that, depending on customer contract

 2. What Type and Depth of observability is needed
-   - CPU, RAM, HDD and Network usage, Health and Function of Cluster Nodes, Controlplane and if desired Customer Workload
+   - CPU, RAM, HDD and Network usage, Health and Function of Cluster Nodes, control plane and if desired Customer Workload

-3. Do you have an observabiltiy infrastructure, if yes, how it is built
+3. Do you have an observability infrastructure? If yes, how is it built?
   - Grafana/Thanos/Prometheus/Loki/Promtail/Alertmanager Stack, i.e. [Example Infrastructure](https://raw.githubusercontent.com/dNationCloud/kubernetes-monitoring-stack/main/thanos-deployment-architecture.svg)

 4. Data Must haves

diff --git a/Standards/scs-0410-v1-gnocchi-as-metering-database.md b/Standards/scs-0410-v1-gnocchi-as-metering-database.md
index 848c33df9..9fce901ed 100644
--- a/Standards/scs-0410-v1-gnocchi-as-metering-database.md
+++ b/Standards/scs-0410-v1-gnocchi-as-metering-database.md
@@ -19,7 +19,7 @@ when it is supposed to be used for billing purposes.

 This document discusses how such metering data should be stored within the SCS.
-In partiuclar, +In particular, it provides rationale for the choice of Gnocchi as time-series database for metering data within SCS. From 7b6a4a616207ed5b2f3da49dcbe82e6fc0583004 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Matthias=20B=C3=BCchse?= Date: Wed, 26 Jun 2024 11:31:15 +0200 Subject: [PATCH 7/9] Bugfix: add missing pco-prod4 (#647) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Matthias Büchse --- .zuul.d/secure.yaml | 22 ++++++++++++++++++++++ playbooks/clouds.yaml.j2 | 9 +++++++++ 2 files changed, 31 insertions(+) diff --git a/.zuul.d/secure.yaml b/.zuul.d/secure.yaml index 6abe89cf9..c4489dcdc 100644 --- a/.zuul.d/secure.yaml +++ b/.zuul.d/secure.yaml @@ -266,6 +266,28 @@ VfHQSOSy9hDC+tbt87xwpWNbWIMV9wVFPNLMowb9oAmqbjEEV2qMR9t+XJFHGQ3ta+nFM IPiM1gXvIZEXHRW23JBckC4dgxWZ43F2u7TM6emB2gW1DKgnWOn/tRlIcNKZxeZ6daJ9Y s8QePn/Z5DS6DaADyyLGaNbIAUXTvEgCEZvq6xAknTF0PT0zqw33PF3mwNoclM= + pco_prod4_ac_id: !encrypted/pkcs1-oaep + - L4tJH+zPSVZweHeg7FjSVgeDZdumMqhyEU9Amf6lUKqrHGz7llHgDp0InKyjrFe/CwWkG + Y3hySGiEvsrdqywYWRq3y1gfxCvdJ7RMIO7j0xH2oJtCa+v1MpJYLG7FwC34YNt4aphgg + VWdL6HgmBTwxmQZhRGMykqSoPTKRT0jInUZwKUg/xbAUj6WzMId0sfxM0C+q4zdSbyxhT + sRT9J7ewHbOBpOnO+RTjNP1yHhU8ZqZvJ8RoVHLGuu3i8mGjSdr5cnUrZj7bxdCv9Xh8h + QzirkJJ0MN7oiyvAjQdzC6fZQvlTGaH3ifzLZFWl/1ipwOsDDvb58011KxIjA4RpwoBU/ + fdLWZZnsLDGk1I/j1XZULipRHVqBZxCotIfXjMQhbbuRRC8nAADaS9jbB703gjgN90nxO + Adbp9kRj7MHLc0F1JRs8AbadCu4+VVxIPQFzg2LtfN200tXDYJ0XwXUZ899fGJkfXTJgi + Dy55LTZ8Dvumi+5AU5fuQ9cqeGGjG21878vuopaG9qwoEo6gcpHAQ77WpqfLYfN13jUUg + 9zFfpmzPJ7/307QVSMMdRdogEjAFkJ0TzwFYOysVTdv+wbfc5VTBAiX2HFLmyiE2G5F7g + 2dWrNS4ahwIlNXtG1PVvQ+kcT7gdx5WViHCUc4qwLwIgzkRguVLUIcokW2R6CI= + pco_prod4_ac_secret: !encrypted/pkcs1-oaep + - Poj5AZd4iE9iSZpUTizRgup9PshitKyN/hScYPee/NJmfF8qHKkpXEWK1YvnfCcL8xOuE + cgVAKWkWxpggBAYYRen7AdkGZR4zldCqHQ26xnjmXRvjEv0ncUL96pWg9yj6GeZjFyLon + /mFS7fc+cTDgPjJ2zgKi2uT4MV1LVAiARa5RXgXRZ64vg1F6UT1kKIuLUmM3iu83KImsh + AJgXjR0xsBS8qxbQ3l85+ybSBglXRp0ETOinxrVfyS7rpSnXepGLE2s2evSHVntybEgsy + TNCCtOti8phaGh2WSEyA/YZekMpMNhSq5bYS6J3ttF9fkpBE4Xsgbu7Z5yL4BTkEDOXBG + I8nNV8ICq8i5VEcaMByPWetJwFUxlYuQ07dOaqQk7XohQKd6+XMeUAkKlag9Vosb8K1kX + lX9EyR1Y+C28tc4soeXsg/TkE702JnCpJ7I3aQqSjbhUm0yDWOEwzT4TOoN1j35iXzD1K + +sTx+tASbZ9UobexgC+3hyMa1CanFzPPjgMm3UYyrMmnvi96zImau6Q/CpJhQg3tZ8vLz + 4BnqOQklRAJxZA5btw8SFAb7GB2TCeEs/+dt/XqLrY2XkeaR9lGBl3Bftvkr9vFVfsVmx + 7IMobRXhnMOdUZQo7JBc5BV2CB0ZhBn0phUCHQtD4BGQZb/YIl0wO1wyJdk4A0= regio_a_key: !encrypted/pkcs1-oaep - UEDFCkodx6dlfe9bidIhoPdXqEY4vBT9rwJLXXveBmPY9Q3cnQkQRjz4D/o7VHkyfCpkj hzWgvxpsFKnVBkHgLNCbXH8YUhhDTfNGJeLvgVNMo1sk/3JdfUynvgPNAWo1IA9hxxgXN diff --git a/playbooks/clouds.yaml.j2 b/playbooks/clouds.yaml.j2 index 39475c456..da0d3602d 100644 --- a/playbooks/clouds.yaml.j2 +++ b/playbooks/clouds.yaml.j2 @@ -46,6 +46,15 @@ clouds: application_credential_id: "{{ clouds_conf.pco_prod3_ac_id }}" application_credential_secret: "{{ clouds_conf.pco_prod3_ac_secret }}" auth_type: "v3applicationcredential" + pco-prod4: + region_name: "prod4" + interface: "public" + identity_api_version: 3 + auth: + auth_url: https://prod4.api.pco.get-cloud.io:5000 + application_credential_id: "{{ clouds_conf.pco_prod4_ac_id }}" + application_credential_secret: "{{ clouds_conf.pco_prod4_ac_secret }}" + auth_type: "v3applicationcredential" poc-kdo: interface: public identity_api_verion: 3 From 96983a68de69768ff8bb7d7cab6337651a8ca98a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Matthias=20B=C3=BCchse?= Date: Wed, 26 Jun 2024 13:02:00 +0200 Subject: [PATCH 8/9] 
Introduce mechanism to parametrize standards (#595) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Matthias Büchse --- .../scs-0003-v1-sovereign-cloud-standards-yaml.md | 15 ++++++++++++++- Tests/iaas/standard-images/images-openstack.py | 9 +++++++++ Tests/scs-compatible-iaas.yaml | 4 +++- Tests/scs-compliance-check.py | 9 ++++++--- 4 files changed, 32 insertions(+), 5 deletions(-) diff --git a/Standards/scs-0003-v1-sovereign-cloud-standards-yaml.md b/Standards/scs-0003-v1-sovereign-cloud-standards-yaml.md index 43d8219ed..c94d6304e 100644 --- a/Standards/scs-0003-v1-sovereign-cloud-standards-yaml.md +++ b/Standards/scs-0003-v1-sovereign-cloud-standards-yaml.md @@ -137,8 +137,13 @@ Every list of standards consists of several standards that – altogether – de | `name` | String | Full name of the particular standard | _Flavor naming_ | | `url` | String | Valid URL to the latest raw version of the particular standard | _[Flavor naming](https://raw.githubusercontent.com/SovereignCloudStack/standards/main/Standards/scs-0100-v2-flavor-naming.md)_ | | `condition` | String | State of the particular standard, currently either `mandatory` or `optional`, default is `mandatory` | _mandatory_ | +| `parameters` | Map | Maps parameter names to parameter values | | | `checks` | Array | List of all checks that must pass; each entry being a check descriptor | | +The parameters specified here will be added to the variable assignment for all check tools that belong to this standard, so they will be substituted in the same way. +The advantage is that these parameters may show up in the automatically generated documentation, whereas the check tools themselves probably won't. +See the "Standard images" standard in the larger basic example below for a possible use case. + ### Check descriptor The following fields are valid for every check descriptor: @@ -194,7 +199,7 @@ versions: id: flavor-name-check lifetime: day - name: Image metadata - url: https://raw.githubusercontent.com/SovereignCloudStack/Docs/main/Standards/SCS-0004-v1-image-metadata.md + url: https://raw.githubusercontent.com/SovereignCloudStack/standards/main/Standards/scs-0102-v1-image-metadata.md condition: mandatory checks: - executable: image-md-check.py @@ -205,6 +210,14 @@ versions: condition: optional id: image-md-check-2 lifetime: day + - name: Standard images + url: https://raw.githubusercontent.com/SovereignCloudStack/standards/main/Standards/scs-0104-v1-standard-images.md + parameters: + image_spec: https://raw.githubusercontent.com/SovereignCloudStack/standards/main/Tests/iaas/scs-0104-v1-images.yaml + checks: + - executable: ./iaas/standard-images/images-openstack.py + args: -c {os_cloud} -d {image_spec} + id: standard-images-check - version: v4 # This is the upcoming version with a given target date. 
No further changes should be done to this set of standards stabilized_at: 2022-04-01 standards: diff --git a/Tests/iaas/standard-images/images-openstack.py b/Tests/iaas/standard-images/images-openstack.py index 22182fe80..1bfbdff5c 100755 --- a/Tests/iaas/standard-images/images-openstack.py +++ b/Tests/iaas/standard-images/images-openstack.py @@ -87,6 +87,15 @@ def main(argv): logger.critical("You need to have OS_CLOUD set or pass --os-cloud=CLOUD.") return 1 + # we only support local files; but we allow specifying the following URLs for the sake of + # better documentation + prefix = next(p for p in ( + 'https://raw.githubusercontent.com/SovereignCloudStack/standards/main/Tests/', + 'https://github.com/SovereignCloudStack/standards/blob/main/Tests/', + '', # sentinel (do not remove!) + ) if yaml_path.startswith(p)) + if prefix: + yaml_path = yaml_path[len(prefix):] try: with open(yaml_path, "rb") as fileobj: image_data = yaml.safe_load(fileobj) diff --git a/Tests/scs-compatible-iaas.yaml b/Tests/scs-compatible-iaas.yaml index b40929a41..e721a8139 100644 --- a/Tests/scs-compatible-iaas.yaml +++ b/Tests/scs-compatible-iaas.yaml @@ -37,9 +37,11 @@ versions: id: standard-flavors-check - name: Standard images url: https://raw.githubusercontent.com/SovereignCloudStack/standards/main/Standards/scs-0104-v1-standard-images.md + parameters: + image_spec: https://raw.githubusercontent.com/SovereignCloudStack/standards/main/Tests/iaas/scs-0104-v1-images.yaml checks: - executable: ./iaas/standard-images/images-openstack.py - args: -c {os_cloud} -d ./iaas/scs-0104-v1-images.yaml + args: -c {os_cloud} -d {image_spec} id: standard-images-check - version: v3 stabilized_at: 2023-06-15 diff --git a/Tests/scs-compliance-check.py b/Tests/scs-compliance-check.py index aa7ecd667..6d2ec118d 100755 --- a/Tests/scs-compliance-check.py +++ b/Tests/scs-compliance-check.py @@ -35,7 +35,7 @@ KEYWORDS = { 'spec': ('uuid', 'name', 'url', 'versions', 'prerequisite', 'variables'), 'version': ('version', 'standards', 'stabilized_at', 'deprecated_at'), - 'standard': ('checks', 'url', 'name', 'condition'), + 'standard': ('checks', 'url', 'name', 'condition', 'parameters'), 'check': ('executable', 'env', 'args', 'condition', 'lifetime', 'id', 'section'), } @@ -309,8 +309,11 @@ def main(argv): if config.sections and section not in config.sections: print(f"skipping check '{id_}': not in selected sections") continue - args = check.get('args', '').format(**config.assignment) - env = {key: value.format(**config.assignment) for key, value in check.get('env', {}).items()} + assignment = config.assignment + if "parameters" in standard: + assignment = {**assignment, **standard['parameters']} + args = check.get('args', '').format(**assignment) + env = {key: value.format(**assignment) for key, value in check.get('env', {}).items()} env_str = " ".join(f"{key}={value}" for key, value in env.items()) memo_key = f"{env_str} {check['executable']} {args}".strip() invokation = memo.get(memo_key) From ef991585075cbe10e8ce4fc3b11ca9c780e2a84d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Matthias=20B=C3=BCchse?= Date: Wed, 26 Jun 2024 22:11:01 +0200 Subject: [PATCH 9/9] Fix templates: new documents start out in Propoosal state (#626) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Matthias Büchse --- Standards/scs-XXXX-vN-decision-record-template.md | 2 +- Standards/scs-XXXX-vN-standard-template.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git 
a/Standards/scs-XXXX-vN-decision-record-template.md b/Standards/scs-XXXX-vN-decision-record-template.md index 774bd10b6..4b73c1ca0 100644 --- a/Standards/scs-XXXX-vN-decision-record-template.md +++ b/Standards/scs-XXXX-vN-decision-record-template.md @@ -1,7 +1,7 @@ --- title: _Descriptive title_ type: Decision Record -status: Draft +status: Proposal track: Global # | IaaS | Ops | KaaS | IAM --- diff --git a/Standards/scs-XXXX-vN-standard-template.md b/Standards/scs-XXXX-vN-standard-template.md index 1b8afaf22..52a4e7c6e 100644 --- a/Standards/scs-XXXX-vN-standard-template.md +++ b/Standards/scs-XXXX-vN-standard-template.md @@ -1,7 +1,7 @@ --- title: _Descriptive title_ type: Standard # | Procedural -status: Draft +status: Proposal track: Global # | IaaS | Ops | KaaS | IAM ---
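To make the parametrization mechanism introduced by PATCH 8/9 concrete, here is a minimal, self-contained sketch of how the per-standard `parameters` map is merged over the global variable assignment and substituted into a check's `args` via `str.format`, mirroring the change to `Tests/scs-compliance-check.py` above; the standard/check data and the cloud name below are illustrative values, not verbatim repository content:

```python
# Minimal sketch of the parameter substitution added to scs-compliance-check.py
# in PATCH 8/9. The data below is an illustrative excerpt, not a verbatim copy
# of Tests/scs-compatible-iaas.yaml.

standard = {
    "name": "Standard images",
    "parameters": {
        "image_spec": "https://raw.githubusercontent.com/SovereignCloudStack/standards/main/Tests/iaas/scs-0104-v1-images.yaml",
    },
    "checks": [
        {"executable": "./iaas/standard-images/images-openstack.py",
         "args": "-c {os_cloud} -d {image_spec}"},
    ],
}
config_assignment = {"os_cloud": "my-cloud"}  # normally taken from the spec's "variables"

# Per-standard parameters are merged over the global assignment, so they are
# substituted into {placeholders} exactly like ordinary variables:
assignment = {**config_assignment, **standard.get("parameters", {})}
for check in standard["checks"]:
    args = check.get("args", "").format(**assignment)
    print(check["executable"], args)
    # -> ./iaas/standard-images/images-openstack.py -c my-cloud -d https://raw.githubusercontent.com/.../scs-0104-v1-images.yaml
```

This also shows why the check tool in PATCH 8/9 strips the documentation URL prefixes: the substituted `image_spec` value is a URL for documentation purposes, while `images-openstack.py` only reads local files.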