
[FLINK-34738][cdc][docs-zh] "Deployment - YARN" Page for Flink CDC Chinese Documentation #3205

Merged (3 commits) on Sep 29, 2024.
62 changes: 31 additions & 31 deletions docs/content.zh/docs/deployment/yarn.md
specific language governing permissions and limitations
under the License.
-->

# Introduction

[Apache Hadoop YARN](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) is a resource provider popular with many data processing frameworks.
Flink services are submitted to YARN's ResourceManager, which spawns containers on machines managed by YARN NodeManagers. Flink deploys its JobManager and TaskManager instances into such containers.

Flink can dynamically allocate and de-allocate TaskManager resources depending on the number of processing slots required by the job(s) running on the JobManager.

## Preparation

This *Getting Started* section assumes a functional YARN environment, version 2.10.2 or later. YARN environments are provided most conveniently through services such as Amazon EMR or Google Cloud DataProc, or products like Cloudera. [Manually setting up a YARN environment locally](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html) or [on a cluster](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html) is not recommended for following this *Getting Started* tutorial.

- Make sure your YARN cluster is ready to accept Flink applications by running `yarn top`. It should show no error messages.
- Download a recent Flink distribution from the [download page](https://flink.apache.org/downloads/) and unpack it.
- **Important** Make sure that the `HADOOP_CLASSPATH` environment variable is set (it can be checked by running `echo $HADOOP_CLASSPATH`). If not, set it up using

```bash
export HADOOP_CLASSPATH=`hadoop classpath`
```
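As a small sketch of the check above — the fallback path here is a placeholder assumption for illustration, not a real Hadoop layout — a script can export `HADOOP_CLASSPATH` only when it is missing:

```shell
# Sketch: set HADOOP_CLASSPATH only if it is not already set.
# Assumption: /opt/hadoop/etc/hadoop is a placeholder fallback for illustration.
if [ -z "${HADOOP_CLASSPATH:-}" ]; then
  if command -v hadoop >/dev/null 2>&1; then
    # Preferred: derive the classpath from the local Hadoop installation.
    export HADOOP_CLASSPATH=$(hadoop classpath)
  else
    export HADOOP_CLASSPATH="/opt/hadoop/etc/hadoop"
  fi
fi
echo "HADOOP_CLASSPATH=$HADOOP_CLASSPATH"
```

Guarding the export this way keeps an already-configured environment untouched while still failing visibly (an empty value) when neither source is available.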

## Session Mode

Flink runs on all UNIX-like environments, i.e. Linux, Mac OS X, and Cygwin (for Windows).
You can refer to the [overview]({{< ref "docs/connectors/pipeline-connectors/overview" >}}) to check supported versions and download [the binary release](https://flink.apache.org/downloads/) of Flink,
then extract the archive:

```bash
tar -xzf flink-*.tgz
```

You should set the `FLINK_HOME` environment variable, for example:

```bash
export FLINK_HOME=/path/flink-*
```

### Starting a Flink Session on YARN

Once you've made sure that the `HADOOP_CLASSPATH` environment variable is set, you can launch a Flink on YARN session:

```bash
# we assume to be in the root directory of the unzipped Flink distribution

# export HADOOP_CLASSPATH
export HADOOP_CLASSPATH=`hadoop classpath`

# Start a YARN session
./bin/yarn-session.sh --detached

# Stop the YARN session (replace the application id based on the output of the yarn-session.sh command)
echo "stop" | ./bin/yarn-session.sh -id application_XXXXX_XXX
```

After starting the YARN session, you can access the Flink Web UI through the URL printed in the last lines of the command output, or through the YARN ResourceManager web UI.

Then, you need to add some configuration to your `flink-conf.yaml`:

```yaml
rest.bind-port: {{REST_PORT}}
rest.address: {{NODE_IP}}
execution.target: yarn-session
yarn.application.id: {{YARN_APPLICATION_ID}}
```

{{REST_PORT}} and {{NODE_IP}} should be replaced by the actual values of your JobManager Web Interface, and {{YARN_APPLICATION_ID}} should be replaced by the actual YARN application ID of Flink.
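For instance, with assumed placeholder values (not taken from a real deployment), the filled-in configuration might look like:

```yaml
# Hypothetical values for illustration only.
rest.bind-port: 30100
rest.address: 192.168.1.10
execution.target: yarn-session
yarn.application.id: application_1695980000000_0001
```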

### Set up Flink CDC
Download the tar file of Flink CDC from the [release page](https://github.com/apache/flink-cdc/releases), then extract the archive:

```bash
tar -xzf flink-cdc-*.tar.gz
```

The extracted `flink-cdc` directory contains four subdirectories: `bin`, `lib`, `log`, and `conf`.

Download the connector jars from the [release page](https://github.com/apache/flink-cdc/releases), and move them to the `lib` directory.
Download links are available only for stable releases; SNAPSHOT dependencies need to be built from the corresponding branch by yourself.

### Submit a Flink CDC Job
Here is an example file, `mysql-to-doris.yaml`, for synchronizing an entire database:

```yaml
################################################################################
# (license header and full example file content collapsed in this diff view)
################################################################################
pipeline:
```
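As a hedged sketch of what such a pipeline definition typically contains — all hostnames, credentials, and table patterns below are illustrative assumptions, not values from the original file:

```yaml
source:
  type: mysql
  hostname: localhost
  port: 3306
  username: root
  password: "123456"
  tables: app_db.\.*

sink:
  type: doris
  fenodes: 127.0.0.1:8030
  username: root
  password: ""

pipeline:
  name: Sync MySQL Database to Doris
  parallelism: 2
```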

You need to modify the configuration file according to your needs.
Finally, submit the job to the Flink cluster using the CLI.

```bash
cd /path/flink-cdc-*
./bin/flink-cdc.sh mysql-to-doris.yaml
```

After a successful submission, the return information is as follows:

```bash
Pipeline has been submitted to cluster.
Job ID: ae30f4580f1918bebf16752d4963dc54
Job Description: Sync MySQL Database to Doris
```
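When the submission is scripted, the Job ID line can be captured for later use. A minimal sketch, assuming the output format shown above (the sample text is inlined so the snippet is self-contained):

```shell
# Sketch: extract the Job ID from flink-cdc.sh output.
# The sample text below is the example output shown above.
output='Pipeline has been submitted to cluster.
Job ID: ae30f4580f1918bebf16752d4963dc54
Job Description: Sync MySQL Database to Doris'

# Print only the value that follows the "Job ID: " prefix.
job_id=$(printf '%s\n' "$output" | sed -n 's/^Job ID: //p')
echo "$job_id"
```

In a real script, `output` would instead be set to the captured stdout of `./bin/flink-cdc.sh mysql-to-doris.yaml`.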

You can find a job named `Sync MySQL Database to Doris` running through the Flink Web UI.

Please note that submitting to application mode clusters and per-job mode clusters is not supported for now.