Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JVM IPC Deserialization uses BinaryFormatter, which is now Deprecated for OWASP CWE #1131

Closed
cutecycle opened this issue Jan 6, 2023 · 3 comments · May be fixed by #1166
Closed

JVM IPC Deserialization uses BinaryFormatter, which is now Deprecated for OWASP CWE #1131

cutecycle opened this issue Jan 6, 2023 · 3 comments · May be fixed by #1166
Labels
bug Something isn't working

Comments

@cutecycle
Copy link
Contributor

cutecycle commented Jan 6, 2023

Describe the bug
We'll be getting SYSLIB0011 errors for the way the Broadcast, Worker, and RDD are formatting streams.

My current understanding:

  • .NET Spark communicates with a spark workers and makes broadcast variables JVM compatible by using a system of converting CLR objects to JVM objects.
  • This is all currently performed raw via BinaryFormatter.
  • Untrusted binary deserialization of data with undefined schema can result in RCE.
  • The guidance is "go to xml or json/bson", but we can't just do IPC with those formats?
  • In theory, protobuf/Arrow could allow us whatever level of control over the individual bytes may be necessary to continue byte-level IPC with the JVM
  • The missing piece is a definition of the stream format that would go over the socket?

Historical example: jni4net (also uses BinaryFormatter, but has a some level of struct definition:
*https://github.com/jni4net/jni4net/blob/ac2189c37253710e7b729797631419b0bf3b8559/jni4net.tested.n/src/generated/net/sf/jni4net/tested/JavaInstanceFields.generated.cs#L16

Notes:
Can we write directly to MemoryStream?
#1112 (comment)

"passing protobufs between Java and C using JNI":
https://medium.com/@dhaval.durve/passing-protobufs-between-java-and-native-c-code-using-jni-9808b60f6d2c

An equivalent of this CVE, and the object filter used to resolve it
https://security.snyk.io/vuln/SNYK-PYTHON-PYSPARK-3021140

https://github.com/apache/spark/pull/18166/files#diff-6a1d1601920af68466d7c30dc02170468abbe408138734c00d50d2ba1b81ba35R179

BinaryFormatter Guidance:
https://learn.microsoft.com/en-us/dotnet/standard/serialization/binaryformatter-security-guide

Arrow buffers:
https://arrow.apache.org/docs/python/ipc.html

BinaryFormatter Marshaller in ProtoBuf.net:
https://github.com/protobuf-net/protobuf-net.Grpc/blob/main/tests/protobuf-net.Grpc.Test.Integration/CustomMarshaller.cs

Protobuf scalar bytes for arbitrary byte lengths:
https://developers.google.com/protocol-buffers/docs/proto3#scalar

Wind down plan in dotnet:
dotnet/designs@bd0a066

"Is binary serialization inherently unsafe?"
https://stackoverflow.com/a/66825699

pyspark's implementation of this is based on py4j; they were going to use protobuf but opted for strings
https://github.com/py4j/py4j/blob/b4514ecd40ea121a35f9cf50bbf2ccea95354245/py4j-python/src/py4j/protocol.py#L9
https://github.com/py4j/py4j/blob/1f8a0b6dc216f16092d9c1b2556897eec8653a62/py4j-python/src/py4j/java_gateway.py#L1737

Though I will say... things seem to be BinaryFormatter all the way down?
https://github.com/protobuf-net/protobuf-net/search?q=binaryformatter

@cutecycle cutecycle added the bug Something isn't working label Jan 6, 2023
@cutecycle cutecycle changed the title [BUG]: JVM IPC Deserialization uses BinaryFormatter, which is now Deprecated for OWASP CWE JVM IPC Deserialization uses BinaryFormatter, which is now Deprecated for OWASP CWE Jan 6, 2023
@cutecycle
Copy link
Contributor Author

referencing #795

@arsdragonfly
Copy link

This is entirely pointless for UDFs. UDFs execute arbitrary code remotely on workers by design. The whole entire point of UDF on Spark is distributed RCE. Whatever formatter you use does not change the fact that the payload is, and has to be arbitrary computation.

@cutecycle
Copy link
Contributor Author

That makes sense!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants