
Commit

[FLINK-34738][cdc][docs-zh] "Deployment - YARN" Page for Flink CDC Chinese Documentation (#3205)

* [FLINK-34738][cdc][docs-zh] "Deployment - YARN" Page for Flink CDC Chinese Documentation

* [FLINK-34738][cdc][docs-zh] Optimization for "Deployment - YARN" Page's Chinese Documentation
Vincent-Woo authored Sep 29, 2024
1 parent 17c0dc4 commit 4b13c49
Showing 1 changed file with 31 additions and 31 deletions.
62 changes: 31 additions & 31 deletions docs/content.zh/docs/deployment/yarn.md
@@ -24,44 +24,44 @@ specific language governing permissions and limitations
under the License.
-->

# Introduction
# Introduction

[Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) is a resource provider popular with many data processing frameworks.
Flink services are submitted to YARN's ResourceManager, which spawns containers on machines managed by YARN NodeManagers. Flink deploys its JobManager and TaskManager instances into such containers.
[Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) is a resource provider favored by many data processing frameworks.
Flink services are submitted to YARN's ResourceManager, which spawns containers on machines managed by YARN NodeManagers. Flink deploys its JobManager and TaskManager instances into these containers.

Flink can dynamically allocate and de-allocate TaskManager resources depending on the number of processing slots required by the job(s) running on the JobManager.
Flink can dynamically allocate and release TaskManager resources according to the number of processing slots required by the job(s) running on the JobManager.

## Preparation
## Preparation

This *Getting Started* section assumes a functional YARN environment, starting from version 2.10.2. YARN environments are provided most conveniently through services such as Amazon EMR, Google Cloud DataProc or products like Cloudera. [Manually setting up a YARN environment locally](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html) or [on a cluster](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html) is not recommended for following through this *Getting Started* tutorial.
This *Getting Started* section assumes a functional YARN environment, starting from version 2.10.2. A YARN environment can be set up through services such as Amazon EMR and Google Cloud DataProc, or through products such as Cloudera. Manually setting up a YARN environment [locally](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html) or [on a cluster](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html) is not recommended for following this *Getting Started* tutorial.

- Make sure your YARN cluster is ready for accepting Flink applications by running `yarn top`. It should show no error messages.
- Download a recent Flink distribution from the [download page](https://flink.apache.org/downloads/) and unpack it.
- **Important** Make sure that the `HADOOP_CLASSPATH` environment variable is set up (it can be checked by running `echo $HADOOP_CLASSPATH`). If not, set it up using
- Run `yarn top` and confirm that it shows no error messages, to make sure your YARN cluster is ready to accept Flink applications.
- Download the latest Flink distribution from the [download page](https://flink.apache.org/downloads/) and unpack it.
- Be sure that the `HADOOP_CLASSPATH` environment variable is set (you can check by running `echo $HADOOP_CLASSPATH`). If it is not, set it with the following command.

```bash
export HADOOP_CLASSPATH=`hadoop classpath`
```
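
As a quick sanity check of the checklist above, both commands should succeed before you continue (a verification sketch only, not part of the official setup):

```bash
# YARN should accept commands; 'yarn top' should start without error messages (press q to quit)
yarn top

# the Hadoop classpath should be set and non-empty
echo $HADOOP_CLASSPATH
```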

## Session Mode
## Session Mode

Flink runs on all UNIX-like environments, i.e. Linux, Mac OS X, and Cygwin (for Windows).
You can refer to the [overview]({{< ref "docs/connectors/pipeline-connectors/overview" >}}) to check supported versions and download [the binary release](https://flink.apache.org/downloads/) of Flink,
then extract the archive:
Flink runs on all UNIX-like environments, i.e. Linux, Mac OS X, and (for Windows) Cygwin.
You can refer to the [overview]({{< ref "docs/connectors/pipeline-connectors/overview" >}}) to check the supported versions and download [the Flink binary release](https://flink.apache.org/downloads/),
then extract the archive:

```bash
tar -xzf flink-*.tgz
```

You should set the `FLINK_HOME` environment variable, for example:
You need to set the `FLINK_HOME` environment variable, for example:

```bash
export FLINK_HOME=/path/flink-*
```

### Starting a Flink Session on YARN
### Starting a Flink Session on YARN

Once you've made sure that the `HADOOP_CLASSPATH` environment variable is set, you can launch a Flink on YARN session:
Once the `HADOOP_CLASSPATH` environment variable is set, you can launch a Flink session on YARN:

```bash
# we assume to be in the root directory of
@@ -78,9 +78,9 @@ export HADOOP_CLASSPATH=`hadoop classpath`
echo "stop" | ./bin/yarn-session.sh -id application_XXXXX_XXX
```
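
For reference, a minimal sketch of starting a detached session with the `yarn-session.sh` script shipped in the Flink distribution (the `--detached` flag hands control back to the shell once the session is up); the `echo "stop" | ...` line above then shuts it down again:

```bash
# run from the root of the unpacked Flink distribution ($FLINK_HOME)
cd $FLINK_HOME

# start a Flink session on YARN in detached mode
./bin/yarn-session.sh --detached
```

The application ID printed by this command is the value to use with `-id` when stopping the session and for `yarn.application.id` below.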

After starting YARN session, you can now access the Flink Web UI through the URL printed in the last lines of the command output, or through the YARN ResourceManager web UI.
After the YARN session starts, you can access the Flink Web UI through the URL printed in the last line of the command output, or through the YARN ResourceManager Web UI.

Then, you need to add some configs to your flink-conf.yaml:
Then, you need to add some configurations to flink-conf.yaml:

```yaml
rest.bind-port: {{REST_PORT}}
@@ -89,22 +89,22 @@ execution.target: yarn-session
yarn.application.id: {{YARN_APPLICATION_ID}}
```
{{REST_PORT}} and {{NODE_IP}} should be replaced by the actual values of your JobManager Web Interface, and {{YARN_APPLICATION_ID}} should be replaced by the actual YARN application ID of Flink.
{{REST_PORT}} and {{NODE_IP}} should be replaced with the actual values of your JobManager Web interface, and {{YARN_APPLICATION_ID}} should be replaced with the actual YARN application ID of your Flink application.
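
If the application ID is not at hand, it can usually be looked up with the YARN CLI; a small sketch, assuming the `yarn` client is on your `PATH`:

```bash
# list running YARN applications and filter for the Flink session
yarn application -list | grep -i flink
```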
### Set up Flink CDC
Download the tar file of Flink CDC from [release page](https://github.com/apache/flink-cdc/releases), then extract the archive:
### Set up Flink CDC
Download the Flink CDC tar file from the [release page](https://github.com/apache/flink-cdc/releases), then extract the archive:
```bash
tar -xzf flink-cdc-*.tar.gz
```

The extracted `flink-cdc` contains four directories: `bin`, `lib`, `log`, and `conf`.
The extracted `flink-cdc` directory contains four directories: `bin`, `lib`, `log`, and `conf`.
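
A quick way to confirm the layout after extraction (purely a verification sketch):

```bash
# the four directories mentioned above should be present
ls flink-cdc-*/
```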

Download the connector jars from the [release page](https://github.com/apache/flink-cdc/releases), and move them to the `lib` directory.
Download links are available only for stable releases; SNAPSHOT dependencies need to be built from the corresponding branch by yourself.
[发布页面](https://github.com/apache/flink-cdc/releases)下载连接器 jar,并将其移动至 `lib` 目录中。
下载链接仅适用于稳定版本,SNAPSHOT 依赖项需要自己基于特定分支进行构建。
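
For example, placing the downloaded pipeline connector jars into `lib` could look like this; the jar names are illustrative and depend on the connectors and version you downloaded:

```bash
# move the downloaded connector jars into the Flink CDC lib directory
# (file names below are examples; use the files you actually downloaded)
mv flink-cdc-pipeline-connector-mysql-*.jar flink-cdc-*/lib/
mv flink-cdc-pipeline-connector-doris-*.jar flink-cdc-*/lib/
```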

### Submit a Flink CDC Job
Here is an example file for synchronizing the entire database `mysql-to-doris.yaml`:
### Submit a Flink CDC Job
Here is an example file `mysql-to-doris.yaml` for synchronizing an entire database:

```yaml
################################################################################
@@ -132,22 +132,22 @@ pipeline:

```

You need to modify the configuration file according to your needs.
Finally, submit the job to the Flink cluster running on YARN using the CLI.
You can modify the configuration file as needed.
Finally, submit the job to the Flink cluster running on YARN via the CLI.

```bash
cd /path/flink-cdc-*
./bin/flink-cdc.sh mysql-to-doris.yaml
```

After successful submission, the return information is as follows:
After a successful submission, the returned information is as follows:

```bash
Pipeline has been submitted to cluster.
Job ID: ae30f4580f1918bebf16752d4963dc54
Job Description: Sync MySQL Database to Doris
```

You can find a job named `Sync MySQL Database to Doris` running through Flink Web UI.
You can find a job named `Sync MySQL Database to Doris` running through the Flink Web UI.
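
Alternatively, a sketch of checking the job from the command line with the Flink CLI, reusing the session target and application ID configured in flink-conf.yaml above (replace the placeholder ID with your own):

```bash
# list running jobs on the YARN session cluster
$FLINK_HOME/bin/flink list -t yarn-session -Dyarn.application.id=application_XXXXX_XXX
```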

Please note that submitting to application-mode clusters and per-job-mode clusters is not supported for now.
Please note that submitting to application-mode clusters and per-job-mode clusters is not yet supported.
