Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fractional {job,task}manager.heap.size crashes Flink #97

Closed
JonnyIncognito opened this issue Sep 4, 2019 · 8 comments
Closed

Fractional {job,task}manager.heap.size crashes Flink #97

JonnyIncognito opened this issue Sep 4, 2019 · 8 comments

Comments

@JonnyIncognito
Copy link
Contributor

The offHeapMemoryFraction can cause fractional values to be written to flink-conf.yaml, which crashes Flink on startup:

/opt/flink/bin/config.sh: line 261: [ERROR] >> 20: syntax error: operand expected (error token is "[ERROR] >> 20")
[ERROR] Configured JobManager memory size is not a valid value. Please set 'jobmanager.heap.size' in flink-conf.yaml.

For example, the following FlinkApplication...

  jobManagerConfig:
    resources:
      requests:
        memory: "4Gi"
        cpu: "1.0"
    offHeapMemoryFraction: 0.15
    replicas: 1
  taskManagerConfig:
    taskSlots: 1
    resources:
      requests:
        memory: "4Gi"
        cpu: "1.0"
    offHeapMemoryFraction: 0.3

... causes the following to be generated:

jobmanager.heap.size: 3481.6
taskmanager.heap.size: 2867.2

Above is using v0.2.0

@anandswaminathan
Copy link
Contributor

@JonnyIncognito

Aah yes, it is is float https://github.com/lyft/flinkk8soperator/blob/master/pkg/controller/flink/config.go#L87

Want to contribute a fix ?

@JonnyIncognito
Copy link
Contributor Author

Sure! I'm a Go newbie but seems like a nice simple task to start with.

@anandswaminathan
Copy link
Contributor

Also we have a slack channel - slack

Feel free to join.

@JonnyIncognito
Copy link
Contributor Author

Addition to this: I noticed that the memory characteristics weren't correct in our metrics, which eventually led me to realise that the two config parameters are using the wrong units. Same generated example as above:

jobmanager.heap.size: 3481.6
taskmanager.heap.size: 2867.2

The xyz.heap.size config parameters without a unit suffix are interpreted as bytes, not the FlinkK8sOperator's intended MiB. This leads to taskmanager.sh skipping the -Xms...M, -Xmx...M and -XX:MaxDirectMemorySize=... JVM arguments. The generated config values should either have a "m" suffix (therefore string type instead of float64) so that config.sh:parseBytes() can interpret them correctly or the config keys xyz.heap.mb should be used instead.

@anandswaminathan
Copy link
Contributor

^ cc @mwylde

@mwylde
Copy link
Contributor

mwylde commented Sep 25, 2019

Sorry for the long delay, I have a fix out in #106. And thanks for the report!

mwylde added a commit that referenced this issue Sep 27, 2019
@mwylde
Copy link
Contributor

mwylde commented Sep 27, 2019

The fix has been merged. Thanks for reporting!

@mwylde mwylde closed this as completed Sep 27, 2019
@JonnyIncognito
Copy link
Contributor Author

That's great, many thanks! The power of open-source: I filed this fully expecting to and planning to fix it myself some time this coming month, but happy you guys got there first and it's fixed :) I'll upgrade our clusters as soon as I can.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants