
Flink learning #1

Open

wants to merge 84 commits into master

Conversation

wenbingshen
Owner

What is the purpose of the change

(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring the task information on each deployment (during recovery).)

Brief change log

(for example:)

  • The TaskInfo is stored in the blob store at job creation time as a persistent artifact
  • The deployment RPC transmits only the blob storage reference
  • TaskManagers retrieve the TaskInfo from the blob cache (see the sketch below)
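
A minimal, hypothetical sketch of the flow described above (the class, field, and method names are invented for illustration and are not the actual Flink blob-store API): the large task descriptor is persisted once, only a small key travels over RPC, and TaskManagers resolve and cache it locally.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative stand-in for blob-server based deployment: store the
 * serialized TaskInfo once, ship only its key over RPC, and let the
 * TaskManager resolve the payload through a local blob cache.
 */
final class BlobDeploymentSketch {

    /** Persistent blob store on the JobManager side (key -> serialized TaskInfo). */
    private final Map<String, byte[]> blobStore = new ConcurrentHashMap<>();

    /** Local blob cache on the TaskManager side. */
    private final Map<String, byte[]> blobCache = new ConcurrentHashMap<>();

    /** On job creation: persist the TaskInfo once and keep only its key. */
    String storeOnJobCreation(byte[] serializedTaskInfo) {
        String blobKey = UUID.randomUUID().toString();
        blobStore.put(blobKey, serializedTaskInfo);
        return blobKey; // the deployment RPC only carries this reference
    }

    /** On deployment: the TaskManager fetches the TaskInfo via the blob cache. */
    byte[] fetchOnTaskManager(String blobKey) {
        // A cache hit on re-deployments (e.g. during recovery) avoids re-transfer.
        return blobCache.computeIfAbsent(blobKey, blobStore::get);
    }
}
```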

Verifying this change

Please make sure both new and modified tests in this PR follow the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing

(Please pick one of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (100MB)
  • Extended integration test for recovery after master (JobManager) failure
  • Added test that validates that TaskInfo is transferred only once across recoveries
  • Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

zentol and others added 30 commits August 29, 2022 15:07
…ocessingTimeRepeatedCompleteOrderedWithRetry

This closes apache#20702.
…ample" in "User-defined Sources & Sinks" page
… some hive udf required constant parameters with implicit constant passed

This closes apache#18975
…ssues caused by CURATOR-645

CURATOR-645 covers a bug in the LeaderLatch implementation that causes a race condition if a child node participating in the leader election is removed too quickly. This results in a different code branch being executed, which triggers a reset of the LeaderLatch instead of re-collecting the children to determine the next leader.
The issue occurs because LeaderLatch#checkLeadership is not executed transactionally, i.e. retrieving the children and setting up the watcher for the predecessor is not done atomically. This leads to the race condition where a child node (the previous leader's node) is removed before the watcher is set up, which results in the situation being handled incorrectly via a reset.
Adding some sleep here (simulating the leader actually doing something) reduces the risk of hitting the race condition because it gives the concurrently running LeaderLatch instances more time to set up their watchers properly.

This is only meant as a temporary solution until CURATOR-645 is resolved and the curator dependency on the Flink side is upgraded.
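
As a rough, hypothetical illustration of that temporary workaround (the listener class and sleep duration below are invented for this sketch, not the actual Flink test code):

```java
import java.time.Duration;
import org.apache.curator.framework.recipes.leader.LeaderLatchListener;

/**
 * Hypothetical listener illustrating the stop-gap: keeping leadership for a
 * short while before the latch node may be removed again gives the other
 * LeaderLatch instances time to register their watchers, making the
 * CURATOR-645 race much less likely to be hit.
 */
final class SlowedDownLeaderListener implements LeaderLatchListener {

    private static final Duration SIMULATED_WORK = Duration.ofMillis(100);

    @Override
    public void isLeader() {
        try {
            // Simulate the leader actually doing something; this is the
            // "some sleep" from the commit message, not a fix for CURATOR-645.
            Thread.sleep(SIMULATED_WORK.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void notLeader() {
        // no-op in this sketch
    }
}
```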
dannycranmer and others added 30 commits September 6, 2022 16:58
…nt when constructing Async Client for Kinesis EFO
…Chinese documentation to bring them back in sync
…g in the HiveServer2 Endpoint when openSession

 This closes apache#20714
…eupException in KafkaConsumerThread

KafkaConsumerThread calls wakeup() on the KafkaConsumer on offset commit to wake up a potentially blocking KafkaConsumer.poll(). However, the wakeup might happen while the consumer is not polling. The wakeup is then remembered by the consumer and re-examined while committing the offset asynchronously, which leads to an unnecessary WakeupException.
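
A hedged sketch of one way to avoid the stale wakeup (the class, flag, and callback below are hypothetical and not the actual KafkaConsumerThread code): only call wakeup() while a poll() may actually be blocking, and treat the resulting WakeupException as an expected signal.

```java
import java.time.Duration;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

/** Hypothetical consumer loop that only issues wakeup() while poll() may be blocking. */
final class GuardedWakeupLoop {

    private final KafkaConsumer<byte[], byte[]> consumer;
    private final AtomicBoolean pollInProgress = new AtomicBoolean(false);
    private volatile boolean running = true;

    GuardedWakeupLoop(Properties props) {
        this.consumer = new KafkaConsumer<>(props);
    }

    /** Called from the offset-commit path on another thread. */
    void onOffsetCommitRequested() {
        // Only wake the consumer if it may actually be blocked in poll();
        // otherwise the wakeup would be remembered and surface later as an
        // unnecessary WakeupException (the problem described above).
        if (pollInProgress.get()) {
            consumer.wakeup();
        }
    }

    void runLoop() {
        while (running) {
            try {
                pollInProgress.set(true);
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(100));
                pollInProgress.set(false);
                // ... hand the records over to the processing pipeline ...
            } catch (WakeupException e) {
                pollInProgress.set(false);
                // Expected when a commit woke us up; simply continue the loop.
            }
        }
        consumer.close();
    }

    void stop() {
        running = false;
    }
}
```

This does not make the flag and the poll atomic, so it narrows rather than fully closes the window, but any leftover wakeup is then handled as an expected signal instead of escaping as an error.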
…e.flink.shaded prefix in flink-kubernetes

To support a stepDecorators SPI (pluggable decorators), we propose packaging the implementation class and its associated dependencies into a plugin jar.
We therefore need to load those dependencies from the parent class loader, since most or all plugin decorators depend on the fabric8 Kubernetes dependency, e.g. they rely on the Kubernetes models/client from fabric8.
So we need to shade all of those classes in flink-kubernetes and flink-dist.
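
As a hypothetical illustration of why the plugin decorators need the fabric8 classes from the parent class loader (the SPI interface and implementation below are invented for this sketch, not the actual Flink decorator API): the plugin jar links against the fabric8 model classes but does not bundle them.

```java
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.api.model.PodBuilder;

/** Hypothetical SPI contract for pluggable step decorators. */
interface PodStepDecorator {
    Pod decorate(Pod pod);
}

/**
 * Hypothetical plugin implementation packaged in its own plugin jar.
 * It references the fabric8 model classes but does not ship them: at
 * runtime those classes (shaded in flink-kubernetes/flink-dist) must be
 * visible through the parent class loader, which is why they have to be
 * loadable parent-first rather than bundled into every plugin.
 */
final class LabelAddingStepDecorator implements PodStepDecorator {

    @Override
    public Pod decorate(Pod pod) {
        // Add a label to the pod spec using the fabric8 builder DSL.
        return new PodBuilder(pod)
                .editOrNewMetadata()
                .addToLabels("example-label", "added-by-plugin")
                .endMetadata()
                .build();
    }
}
```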