
OCM(OriginClusterMaster): Use an external application for the Origin Cluster to discover services #1607

Closed
winlinvip opened this issue Feb 15, 2020 · 11 comments

winlinvip commented Feb 15, 2020

As described in #1501, when an origin cluster is deployed with Docker, SRS does not actually know the IP address it is reachable at: after starting inside Docker, SRS only sees its NAT address, and only Docker knows the externally accessible address.

The solution, described in #1501, is to add a coworker parameter to the internal requests between origins in the cluster. For example:


+----------------------------------+                       +-----------------------------------+
|   +------------------------+     |                       |   +------------------------+      |
|   |  Origin A(172.16.0.10) |     | /api/v1/clusters?     |   |  Origin B(172.16.0.11) |      |
|   +------------------------+     | coworker=192.168.0.21 |   +------------------------+      |
|                                  +-->--------------------+                                   |
|     docker A(192.168.0.20)       +----------<------------+     docker B(192.168.0.21)        |
|   coworkers [ B<192.168.0.21> ]  | origin=192.168.0.21   |   coworkers [ A<192.168.0.20> ]   |
+----------------------------------+                       +-----------------------------------+

This solution has a prerequisite: the addresses of Docker A and B must be known in advance. Moreover, whenever an address changes, every other origin has to be updated, which is troublesome. This is, in effect, manual service discovery and updating.

Even in frameworks like K8s that provide automatic service discovery, this manual step is still required when deploying an origin cluster with multiple origins. Manual discovery is not suitable there; the problem should instead be solved by a dedicated service-discovery service for the origin cluster, called OCM (OriginClusterMaster). For example:

  1. Start a set of origins such as OriginA, OriginB, OriginC, etc., and configure their coworkers with the address or domain name of OCM, for example coworkers: ocm-service; multiple entries can be configured.
  2. Start the OCM (OriginClusterMaster) service and expose it as a K8s service named ocm-service, so this DNS name can be resolved from inside the cluster.
  3. Configure the origins (OriginA, OriginB, OriginC, etc.) to report stream information to ocm-service, for example via on_publish and on_unpublish, and possibly other events added later.

Note: In this OCM solution, each Origin serves its streams itself, so each needs its own K8s Service and DNS name (which is simply the K8s Service name). Each Origin must be configured with its own service name and report it to OCM.
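The steps above can be sketched as a minimal in-memory registry; this is not SRS code, and names like "origina-service" follow the illustrative examples above. OCM records which origin service is publishing each stream, fed by the on_publish/on_unpublish callbacks:

```python
# Minimal sketch (not SRS code) of the OCM role described above:
# track which origin K8s service is publishing each stream.

class OcmRegistry:
    def __init__(self):
        # stream url -> K8s service name of the origin serving it
        self.streams = {}

    def on_publish(self, stream, origin_service):
        # Called when an origin starts serving a stream.
        self.streams[stream] = origin_service

    def on_unpublish(self, stream):
        # Called when the stream ends; drop the mapping.
        self.streams.pop(stream, None)

    def lookup(self, stream):
        # An origin or edge asks OCM where a stream lives.
        return self.streams.get(stream)


ocm = OcmRegistry()
ocm.on_publish("live/livestream", "origina-service")
```

An origin that receives a play request for a stream it does not hold would ask OCM via lookup() and redirect the client, instead of probing each coworker as in the current Origin Cluster.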

TRANS_BY_GPT3

winlinvip commented Feb 15, 2020

The design goal of the OCM scheme is to support 100,000 streams.

Since each origin serves its streams independently, OCM acts only as service discovery. Each origin therefore needs to know, or be able to learn, the address or name of its external service; this may have to be set in configuration and then reported by each origin to OCM.

For example, if the K8s service corresponding to OriginA is "origina-service", then "origina-service" needs to be configured as the service name for OriginA, and then OriginA will pass it to OCM.

Each Origin can serve around 1,000 streams, so 10,000 streams require 10 Origins and 100,000 streams require 100 Origins. SRS is positioned at the 100,000-stream level, so this solution is acceptable. Generally, within 10,000 streams, an origin cluster of fewer than 10 Origins is enough.

Million-stream solution: if an extremely large origin cluster is needed, say a million streams, consistent hashing can be used over 10 origin clusters of 100,000 streams each. This requires modifying the edge nodes to support consistent hashing, so that streams are distributed across the different origin clusters of the system.
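The edge-side consistent hashing mentioned above could look like the following sketch (cluster names are illustrative): each stream URL hashes onto a ring of virtual nodes, so adding a cluster moves only a fraction of the streams.

```python
# Sketch of consistent hashing an edge could use to pick one of the
# ~10 origin clusters for a given stream. Cluster names are examples.
import bisect
import hashlib


class ConsistentHash:
    def __init__(self, clusters, vnodes=100):
        # Place vnodes virtual points per cluster on the hash ring.
        self._ring = sorted(
            (self._hash(f"{c}#{i}"), c)
            for c in clusters for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def cluster_for(self, stream):
        # First ring point clockwise from the stream's hash.
        i = bisect.bisect(self._keys, self._hash(stream)) % len(self._keys)
        return self._ring[i][1]


ring = ConsistentHash([f"origin-cluster-{n}" for n in range(10)])
```

With this, the same stream always maps to the same origin cluster, and resizing the set of clusters only remaps streams adjacent to the new or removed virtual nodes.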

Cross-regional disaster recovery: OCM also assumes that the origin clusters are deployed within one region of a cloud vendor's data centers and can reach each other. If cross-region disaster recovery is needed, a cold-standby solution can be considered, since an entire cloud region going down is very unlikely. Another option is a cross-region origin cluster, which would require the public network and customization.


winlinvip commented Feb 15, 2020

Below 5k streams, you can use the Origin Cluster directly, including in K8s; refer to #1501 (comment).


winlinvip commented Feb 15, 2020

The Origin Cluster Master (OCM) also needs another important capability: providing a unified API. Since each origin in the cluster has its own independent API and console, the cluster as a whole has no unified external API.

Currently, OriginCluster and OriginClusterMaster only cover stream discovery and redirect. For example, the console and the 1985 API work against a single origin, but information about the rest of the cluster is lost. With OriginCluster, only one origin can be selected, because each origin is a separate service and multiple services cannot be exposed on the same port of the same SLB.

In addition, edge clusters have the same problem. The root cause may be that the fundamental goal of origin and edge clusters is to be distributed, with multiple nodes sharing the load, while the API is centralized and aims at an overview of the entire cluster, which conflicts with that goal. The cluster-wide API may need to be designed separately rather than folded into OCM.


winlinvip commented Feb 15, 2020

For a cluster with more than 30 origins, the edges would also have to be configured with all 30 origins, so the way edges fetch from origins needs improvement. Currently, edge clusters are configured with a list of origin domain names. A better approach would be to configure OCM's API and query it for the available origins, so that only OCM's domain name needs to be configured.
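As a sketch of the edge-side change suggested above: instead of a static list of 30 domain names, the edge could pull the live origin list from OCM. The endpoint shape and JSON fields here are assumptions, not an existing SRS API.

```python
# Hypothetical sketch: an edge derives its origin list from an OCM
# query rather than from static configuration. The "origins"/"alive"
# JSON shape is an assumption for illustration.

def origins_from_ocm(fetch):
    """fetch() stands in for an HTTP GET of OCM's origin-list API."""
    data = fetch()
    # Keep only origins OCM currently reports as alive.
    return [o["service"] for o in data["origins"] if o.get("alive")]


demo = {"origins": [
    {"service": "origina-service", "alive": True},
    {"service": "originb-service", "alive": True},
    {"service": "originc-service", "alive": False},
]}
# An edge would then round-robin or hash over origins_from_ocm(...).
```

Scaling the origin cluster up or down then only changes what OCM returns; the edge configuration stays a single OCM domain name.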


winlinvip commented Feb 17, 2020

When releasing the origin cluster, the ideal behavior is to wait for old streams to finish before stopping a Pod: the old Pods keep serving established streams while accepting no new ones, so new and old Pods serve side by side during the rollout.

The current practice is to restart directly, whether it is an upgrade or a rollback. Since the origin cluster generally sits behind the edges, the edges re-fetch after an origin restarts, so users are not disconnected, though there may be some impact.


winlinvip commented Feb 18, 2020

Streaming media must be stateful: a stream lives on a particular origin, and its state cannot be moved into a database (a database cannot decide which origin a stream is on). Whether the topology is an Origin-Edge tree or a loose mesh (where any node can be both Origin and Edge), the essence is the same: the stream has state, and fetching it from origin must go to the process where the stream actually lives.


The Edge is stateless, because the client effectively issues only one request, playing the stream (even though RTMP sends several commands). The stream itself does not live on the Edge, so it can be played from any edge server, and every edge fetches the stream from origin. This solves downstream scalability: when many viewers watch one stream, the edges can be scaled out.


For scenarios with very many streams, such as surveillance cameras and conferencing, publishers and players may be comparable in number, or publishers may even outnumber players. This touches the fundamental issue of streaming media: streams have state, which is why conferencing services are harder. Publishing is actually stateless as well, since TCP publishing is a single request, publishing the stream. UDP publishing, however, may produce a new request when the IP address changes, which is a problem in mobile scenarios (for live streaming the usual answer to a network switch is to reconnect, but in conferencing the cost of reconnecting is too high to be acceptable).


Consider first publishing to a fixed address, so that publishing can be treated as a single request. Then publishing itself is stateless, and publishing to any server is feasible. Playing is also stateless, since it does not matter which edge the stream is played from. But the stream itself has state: it matters which origin an edge fetches the stream from, and this is the only place state exists. In other words, the state of an SRS cluster is where the stream is located; publishing and playing are themselves stateless.


In the current Origin Cluster solution, stream state is discovered by the origins querying each other, with origin addresses held in a configuration file that must be updated when scaling. In the OCM (OriginClusterMaster) solution, origin addresses are reported to OCM, which stores them in a backend service such as KV (Key-Value) storage, solving the state problem for origin addresses.
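One way to read the KV idea above: OCM itself stays stateless by pushing origin-address state into the KV backend, so any OCM replica can answer lookups. A sketch, with a plain dict standing in for the KV store and all class/method names invented for illustration:

```python
# Sketch: OCM kept stateless by storing origin addresses in a KV
# backend (a dict here; something like Redis/etcd in practice).
# All names are illustrative, not SRS APIs.

class KvStore:
    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class StatelessOcm:
    # Any OCM replica can serve lookups, since state lives in the KV.
    def __init__(self, kv):
        self.kv = kv

    def report_origin(self, origin_id, address):
        # Called by an origin to report its reachable address.
        self.kv.set(f"origin/{origin_id}", address)

    def resolve(self, origin_id):
        return self.kv.get(f"origin/{origin_id}")
```

Because the mapping lives behind OCM rather than in each origin's config file, scaling the cluster no longer requires updating every origin's configuration.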


Besides the origin addresses, OriginCluster also assumes that the origins can reach each other directly, which requires them to be on the same internal network, for example using the StatefulSet+HeadlessService approach, where each origin has its own service domain name and address. Deploying with Deployment+ClusterIP instead effectively puts each origin behind an SLB (Server Load Balancer), which limits scalability, makes frequent scaling awkward, and complicates deployment.


The most complete approach for the origin cluster is to stop exposing origin addresses directly, that is, to make the publishing addresses stateless. This means not only keeping the storage of those addresses stateless, but also keeping the servers behind them stateless. One possible solution:


  • The origin cluster is divided into federations: clusters are combined into larger clusters.
  • Within each origin cluster, the origins are deployed as StatefulSets, providing addressing internally.
  • Within each origin cluster, multiple edges attached to an SLB or ClusterIP serve as the cluster's entry.
  • When OCM does addressing, it must also resolve the federation addresses.


The final solution is shown in the following diagram:

(architecture diagram omitted)

Key technical points:

  • OCM (Origin Cluster Master) is a globally stateless service that manages stream addresses.
  • OriginCluster must take Origin Cluster Federation into account, and the address of each origin cluster within the federation must be identifiable.
  • EdgeCluster also connects to OCM to find the correct entry of the OriginCluster.

Remark: This solution is designed for large-scale origin and edge clusters, with the ability to upgrade, roll back, and stay in service gradually. Businesses that do not reach this scale may not need it and can consider a simpler solution, which I will describe later.


winlinvip commented Feb 18, 2020

For typical applications, the traffic will not reach millions of streams and millions of concurrent users. In that case a simpler OCM (OriginClusterMaster) approach can be chosen, mainly addressing:

  • OCM serves as a global service solving OriginCluster addressing: both origins and edges resolve addresses via OCM instead of via configuration.
  • Origins can serve directly, meaning an edge can connect straight to an origin.

(diagram omitted)

Remark: This solution supports publishing at the scale of tens of thousands of streams and playback at the scale of millions. The origin's scalability is not especially strong, but not small either, and it fully meets typical scenarios.


winlinvip commented Dec 1, 2020

For the time being, we are not considering the OCM solution; the current Origin Cluster is sufficient for the open-source use case.

Is it?


zhegexiaohuozi commented Feb 3, 2021

The OCM-based design of the origin cluster looks more ideal and elegant.



VectorJin commented May 9, 2022

Similar to a microservice registry, Consul can be used for stream registration and discovery. It moves stream state into the registry, eliminating the need to deploy a separate master service.

The stream registration and discovery capability can be abstracted into an API layer, making it easy to replace the underlying implementation.

Additionally, it is recommended not to rely too heavily on the capabilities provided by Kubernetes (k8s).
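The suggested abstraction could look like this sketch: a registry interface that callers code against, with the backend (in-memory, Consul, etcd, or a built-in OCM) swappable behind it. All names here are illustrative.

```python
# Sketch of the suggested API layer: stream register/discover as an
# interface, so the backing store can be replaced without touching
# callers. Names are illustrative, not an SRS or Consul API.
from abc import ABC, abstractmethod


class StreamRegistry(ABC):
    @abstractmethod
    def register(self, stream, origin):
        ...

    @abstractmethod
    def discover(self, stream):
        ...


class InMemoryRegistry(StreamRegistry):
    # Simplest backend; a ConsulRegistry implementing the same
    # interface via Consul's HTTP API could be swapped in.
    def __init__(self):
        self._streams = {}

    def register(self, stream, origin):
        self._streams[stream] = origin

    def discover(self, stream):
        return self._streams.get(stream)
```

Keeping the interface this small also honors the advice above about not depending too much on K8s: the registry layer, not the orchestrator, owns stream discovery.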

