
OCM(OriginClusterMaster): Use an external application for the Origin Cluster to discover services #1607

Closed
winlinvip opened this issue Feb 15, 2020 · 11 comments

winlinvip commented Feb 15, 2020

As described in #1501, when an origin cluster is deployed with Docker, SRS does not actually know the IP address it is reachable at: after starting inside Docker, SRS only sees its NAT address, and only Docker knows the externally accessible address.

The solution, described in #1501, is to add a coworker parameter to the internal requests between origins in the cluster. For example:


+----------------------------------+                       +-----------------------------------+
|   +------------------------+     |                       |   +------------------------+      |
|   |  Origin A(172.16.0.10) |     | /api/v1/clusters?     |   |  Origin B(172.16.0.11) |      |
|   +------------------------+     | coworker=192.168.0.21 |   +------------------------+      |
|                                  +-->--------------------+                                   |
|     docker A(192.168.0.20)       +----------<------------+     docker B(192.168.0.21)        |
|   coworkers [ B<192.168.0.21> ]  | origin=192.168.0.21   |   coworkers [ A<192.168.0.20> ]   |
+----------------------------------+                       +-----------------------------------+

This solution has a prerequisite: the addresses of Docker A and B must be known in advance. Moreover, whenever an address changes, every other origin has to be updated, which is troublesome. This is, in effect, manual service discovery and updating.

Even in frameworks like K8s that provide automatic service discovery, this manual step is still required when deploying an origin cluster with multiple origins. Manual discovery is not suitable there; the problem should instead be solved by a dedicated service-discovery service for the origin cluster, called OCM (OriginClusterMaster). For example:

  1. Start a set of origins such as OriginA, OriginB, OriginC, etc., and configure their coworkers with the address or domain name of OCM, for example coworkers: ocm-service; multiple entries can be configured.
  2. Start the OCM (OriginClusterMaster) service and expose it as a K8s service named ocm-service, so this DNS name can be resolved from inside the cluster.
  3. Configure the origins (OriginA, OriginB, OriginC, etc.) to report stream information to ocm-service, for example via on_publish and on_unpublish, and possibly other events added later.

Note: In this OCM solution, each Origin serves its streams itself, so each needs its own K8s Service and DNS name (which is simply the K8s Service name). Each Origin must be configured with its own service name and report it to OCM.
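The steps above can be sketched as a minimal in-memory registry; this is not SRS code, and names like "origina-service" follow the illustrative examples above. OCM records which origin service is publishing each stream, fed by the on_publish/on_unpublish callbacks:

```python
# Minimal sketch (not SRS code) of the OCM role described above:
# track which origin K8s service is publishing each stream.

class OcmRegistry:
    def __init__(self):
        # stream url -> K8s service name of the origin serving it
        self.streams = {}

    def on_publish(self, stream, origin_service):
        # Called when an origin starts serving a stream.
        self.streams[stream] = origin_service

    def on_unpublish(self, stream):
        # Called when the stream ends; drop the mapping.
        self.streams.pop(stream, None)

    def lookup(self, stream):
        # An origin or edge asks OCM where a stream lives.
        return self.streams.get(stream)


ocm = OcmRegistry()
ocm.on_publish("live/livestream", "origina-service")
```

An origin that receives a play request for a stream it does not hold would ask OCM via lookup() and redirect the client, instead of probing each coworker as in the current Origin Cluster.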

TRANS_BY_GPT3

winlinvip commented Feb 15, 2020

The design goal of the OCM scheme is to support 100,000 streams.

Since each origin serves its streams independently, OCM acts only as service discovery. Each origin therefore needs to know, or be able to learn, the address or name of its external service; this may have to be set in configuration and then reported by each origin to OCM.

For example, if the K8s service corresponding to OriginA is "origina-service", then "origina-service" needs to be configured as the service name for OriginA, and then OriginA will pass it to OCM.

Each Origin can serve around 1,000 streams, so 10,000 streams require 10 Origins and 100,000 streams require 100 Origins. SRS is positioned at the 100,000-stream level, so this solution is acceptable. Generally, within 10,000 streams, an origin cluster of fewer than 10 Origins is enough.

Million-stream solution: if an extremely large origin cluster is needed, say a million streams, consistent hashing can be used over 10 origin clusters of 100,000 streams each. This requires modifying the edge nodes to support consistent hashing, so that streams are distributed across the different origin clusters of the system.
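The edge-side consistent hashing mentioned above could look like the following sketch (cluster names are illustrative): each stream URL hashes onto a ring of virtual nodes, so adding a cluster moves only a fraction of the streams.

```python
# Sketch of consistent hashing an edge could use to pick one of the
# ~10 origin clusters for a given stream. Cluster names are examples.
import bisect
import hashlib


class ConsistentHash:
    def __init__(self, clusters, vnodes=100):
        # Place vnodes virtual points per cluster on the hash ring.
        self._ring = sorted(
            (self._hash(f"{c}#{i}"), c)
            for c in clusters for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def cluster_for(self, stream):
        # First ring point clockwise from the stream's hash.
        i = bisect.bisect(self._keys, self._hash(stream)) % len(self._keys)
        return self._ring[i][1]


ring = ConsistentHash([f"origin-cluster-{n}" for n in range(10)])
```

With this, the same stream always maps to the same origin cluster, and resizing the set of clusters only remaps streams adjacent to the new or removed virtual nodes.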

Cross-regional disaster recovery: OCM also assumes that the origin clusters are deployed within one region of a cloud vendor's data centers and can reach each other. If cross-region disaster recovery is needed, a cold-standby solution can be considered, since an entire cloud region going down is very unlikely. Another option is a cross-region origin cluster, which would require the public network and customization.


winlinvip commented Feb 15, 2020

Below 5k streams, you can use the Origin Cluster directly, including in K8s; refer to #1501 (comment).


winlinvip commented Feb 15, 2020

The Origin Cluster Master (OCM) also needs another important capability: providing a unified API. Since each origin in the cluster has its own independent API and console, the cluster as a whole has no unified external API.

Currently, OriginCluster and OriginClusterMaster only cover stream discovery and redirect. For example, the console and the 1985 API work against a single origin, but information about the rest of the cluster is lost. With OriginCluster, only one origin can be selected, because each origin is a separate service and multiple services cannot be exposed on the same port of the same SLB.

In addition, edge clusters have the same problem. The root cause may be that the fundamental goal of origin and edge clusters is to be distributed, with multiple nodes sharing the load, while the API is centralized and aims at an overview of the entire cluster, which conflicts with that goal. The cluster-wide API may need to be designed separately rather than folded into OCM.


winlinvip commented Feb 15, 2020

For a cluster with more than 30 origins, the edges would also have to be configured with all 30 origins, so the way edges fetch from origins needs improvement. Currently, edge clusters are configured with a list of origin domain names. A better approach would be to configure OCM's API and query it for the available origins, so that only OCM's domain name needs to be configured.
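As a sketch of the edge-side change suggested above: instead of a static list of 30 domain names, the edge could pull the live origin list from OCM. The endpoint shape and JSON fields here are assumptions, not an existing SRS API.

```python
# Hypothetical sketch: an edge derives its origin list from an OCM
# query rather than from static configuration. The "origins"/"alive"
# JSON shape is an assumption for illustration.

def origins_from_ocm(fetch):
    """fetch() stands in for an HTTP GET of OCM's origin-list API."""
    data = fetch()
    # Keep only origins OCM currently reports as alive.
    return [o["service"] for o in data["origins"] if o.get("alive")]


demo = {"origins": [
    {"service": "origina-service", "alive": True},
    {"service": "originb-service", "alive": True},
    {"service": "originc-service", "alive": False},
]}
# An edge would then round-robin or hash over origins_from_ocm(...).
```

Scaling the origin cluster up or down then only changes what OCM returns; the edge configuration stays a single OCM domain name.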


winlinvip commented Feb 17, 2020

When releasing the origin cluster, the ideal behavior is to wait for old streams to finish before stopping a Pod: the old Pods keep serving established streams while accepting no new ones, so new and old Pods serve side by side during the rollout.

The current practice is to restart directly, whether it is an upgrade or a rollback. Since the origin cluster generally sits behind the edges, the edges re-fetch after an origin restarts, so users are not disconnected, though there may be some impact.


winlinvip commented Feb 18, 2020

Streaming media must be stateful: a stream lives on a particular origin, and its state cannot be moved into a database (a database cannot decide which origin a stream is on). Whether the topology is an Origin-Edge tree or a loose mesh (where any node can be both Origin and Edge), the essence is the same: the stream has state, and fetching it from origin must go to the process where the stream actually lives.


The Edge is stateless, because the client effectively issues only one request, playing the stream (even though RTMP sends several commands). The stream itself does not live on the Edge, so it can be played from any edge server, and every edge fetches the stream from origin. This solves downstream scalability: when many viewers watch one stream, the edges can be scaled out.


For scenarios with very many streams, such as surveillance cameras and conferencing, publishers and players may be comparable in number, or publishers may even outnumber players. This touches the fundamental issue of streaming media: streams have state, which is why conferencing services are harder. Publishing is actually stateless as well, since TCP publishing is a single request, publishing the stream. UDP publishing, however, may produce a new request when the IP address changes, which is a problem in mobile scenarios (for live streaming the usual answer to a network switch is to reconnect, but in conferencing the cost of reconnecting is too high to be acceptable).


Consider first publishing to a fixed address, so that publishing can be treated as a single request. Then publishing itself is stateless, and publishing to any server is feasible. Playing is also stateless, since it does not matter which edge the stream is played from. But the stream itself has state: it matters which origin an edge fetches the stream from, and this is the only place state exists. In other words, the state of an SRS cluster is where the stream is located; publishing and playing are themselves stateless.


In the current Origin Cluster solution, stream state is discovered by the origins querying each other, with origin addresses held in a configuration file that must be updated when scaling. In the OCM (OriginClusterMaster) solution, origin addresses are reported to OCM, which stores them in a backend service such as KV (Key-Value) storage, solving the state problem for origin addresses.
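One way to read the KV idea above: OCM itself stays stateless by pushing origin-address state into the KV backend, so any OCM replica can answer lookups. A sketch, with a plain dict standing in for the KV store and all class/method names invented for illustration:

```python
# Sketch: OCM kept stateless by storing origin addresses in a KV
# backend (a dict here; something like Redis/etcd in practice).
# All names are illustrative, not SRS APIs.

class KvStore:
    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class StatelessOcm:
    # Any OCM replica can serve lookups, since state lives in the KV.
    def __init__(self, kv):
        self.kv = kv

    def report_origin(self, origin_id, address):
        # Called by an origin to report its reachable address.
        self.kv.set(f"origin/{origin_id}", address)

    def resolve(self, origin_id):
        return self.kv.get(f"origin/{origin_id}")
```

Because the mapping lives behind OCM rather than in each origin's config file, scaling the cluster no longer requires updating every origin's configuration.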


Besides the origin addresses, OriginCluster also assumes that the origins can reach each other directly, which requires them to be on the same internal network, for example using the StatefulSet+HeadlessService approach, where each origin has its own service domain name and address. Deploying with Deployment+ClusterIP instead effectively puts each origin behind an SLB (Server Load Balancer), which limits scalability, makes frequent scaling awkward, and complicates deployment.


The most complete approach for the origin cluster is to stop exposing origin addresses directly, that is, to make the publishing addresses stateless. This means not only keeping the storage of those addresses stateless, but also keeping the servers behind them stateless. One possible solution:


  • The origin cluster is divided into federations: clusters are combined into larger clusters.
  • Within each origin cluster, the origins are deployed as StatefulSets, providing addressing internally.
  • Within each origin cluster, multiple edges attached to an SLB or ClusterIP serve as the cluster's entry.
  • When OCM does addressing, it must also resolve the federation addresses.


The final solution is shown in the following diagram:

(architecture diagram omitted)

Key technical points:

  • OCM (Origin Cluster Master) is a globally stateless service that manages stream addresses.
  • OriginCluster must take Origin Cluster Federation into account, and the address of each origin cluster within the federation must be identifiable.
  • EdgeCluster also connects to OCM to find the correct entry of the OriginCluster.

Remark: This solution is designed for large-scale origin and edge clusters, with the ability to upgrade, roll back, and stay in service gradually. Businesses that do not reach this scale may not need it and can consider a simpler solution, which I will describe later.


winlinvip commented Feb 18, 2020

For typical applications, the traffic will not reach millions of streams and millions of concurrent users. In that case a simpler OCM (OriginClusterMaster) approach can be chosen, mainly addressing:

  • OCM serves as a global service solving OriginCluster addressing: both origins and edges resolve addresses via OCM instead of via configuration.
  • Origins can serve directly, meaning an edge can connect straight to an origin.

(diagram omitted)

Remark: This solution supports publishing at the scale of tens of thousands of streams and playback at the scale of millions. The origin's scalability is not especially strong, but not small either, and it fully meets typical scenarios.


winlinvip commented Dec 1, 2020

For the time being, we are not considering the OCM solution; the current Origin Cluster is sufficient for the open-source use case.

Is it?


zhegexiaohuozi commented Feb 3, 2021

The OCM-based design of the origin cluster looks more ideal and elegant.



VectorJin commented May 9, 2022

Similar to a microservice registry, Consul can be used for stream registration and discovery. It moves stream state into the registry, eliminating the need to deploy a separate master service.

The stream registration and discovery capability can be abstracted into an API layer, making it easy to replace the underlying implementation.

Additionally, it is recommended not to rely too heavily on the capabilities provided by Kubernetes (k8s).
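The suggested abstraction could look like this sketch: a registry interface that callers code against, with the backend (in-memory, Consul, etcd, or a built-in OCM) swappable behind it. All names here are illustrative.

```python
# Sketch of the suggested API layer: stream register/discover as an
# interface, so the backing store can be replaced without touching
# callers. Names are illustrative, not an SRS or Consul API.
from abc import ABC, abstractmethod


class StreamRegistry(ABC):
    @abstractmethod
    def register(self, stream, origin):
        ...

    @abstractmethod
    def discover(self, stream):
        ...


class InMemoryRegistry(StreamRegistry):
    # Simplest backend; a ConsulRegistry implementing the same
    # interface via Consul's HTTP API could be swapped in.
    def __init__(self):
        self._streams = {}

    def register(self, stream, origin):
        self._streams[stream] = origin

    def discover(self, stream):
        return self._streams.get(stream)
```

Keeping the interface this small also honors the advice above about not depending too much on K8s: the registry layer, not the orchestrator, owns stream discovery.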

