[SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API #25007
@@ -0,0 +1,49 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.shuffle.api;

import org.apache.spark.annotation.Private;

/**
 * :: Private ::
 * An interface for plugging in modules for storing and reading temporary shuffle data.
 * <p>
 * This is the root of a plugin system for storing shuffle bytes to arbitrary storage
 * backends in the sort-based shuffle algorithm implemented by the
 * {@link org.apache.spark.shuffle.sort.SortShuffleManager}. If another shuffle algorithm is
Review comment: It would be good to check how these links render in the final documentation, since as I mentioned that package is removed from public docs.
 * needed instead of sort-based shuffle, one should implement
 * {@link org.apache.spark.shuffle.ShuffleManager} instead.
 * <p>
 * A single instance of this module is loaded per process in the Spark application.
 * The default implementation reads and writes shuffle data from the local disks of
 * the executor, and is the implementation of shuffle file storage that has remained
 * consistent throughout most of Spark's history.
 * <p>
 * Alternative implementations of shuffle data storage can be loaded via setting
 * <code>spark.shuffle.sort.io.plugin.class</code>.
 *
 * @since 3.0.0
 */
@Private
Review comment: Question from SPARK-28568: is it an API or not? Looks so, given the PR description.

Review comment: @HyukjinKwon it'll all eventually be … Looks like we forgot to file a follow-up JIRA about that; I just filed https://issues.apache.org/jira/browse/SPARK-28592

Review comment: Ah, okie. That's good.
public interface ShuffleDataIO {
Review comment (marked as resolved).

Review comment: I'll add some JavaDoc explaining the difference between the ShuffleManager plugin and this plugin system.

Review comment: A question that may be naive: why do we choose Java over Scala? I see Spark classes, except the ones dealing with underlying memory, written in Scala...

Review comment: As a public interface, it is better to use Java, so that other users can implement it in Java, Scala, or other JVM languages. If we defined the APIs in Scala, users could mostly only implement them in Scala, unless the APIs were carefully designed to avoid Scala-specific features so that they could also be used from Java.

  /**
   * Called once on executor processes to bootstrap the shuffle data storage modules that
   * are only invoked on the executors.
   */
  ShuffleExecutorComponents executor();
}
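To make the shape of the plugin root concrete, here is a minimal, self-contained sketch. Note that the names (ShuffleDataIOSketch, ExecutorComponentsSketch, LoggingShuffleDataIO) are local stand-ins invented for this example so it compiles without Spark on the classpath; the real types live in org.apache.spark.shuffle.api and carry additional methods.

```java
// Local stand-ins for the plugin interfaces, so this sketch compiles on its
// own (hypothetical simplifications of the org.apache.spark.shuffle.api types).
interface ExecutorComponentsSketch {
    void initializeExecutor(String appId, String execId);
}

interface ShuffleDataIOSketch {
    ExecutorComponentsSketch executor();
}

// One instance of the plugin root is loaded per process; it only hands out
// the executor-side module, which the framework then bootstraps with the
// application and executor identity.
class LoggingShuffleDataIO implements ShuffleDataIOSketch {
    static final class LoggingExecutorComponents implements ExecutorComponentsSketch {
        String appId;
        String execId;

        @Override
        public void initializeExecutor(String appId, String execId) {
            // Record the identity this executor runs under.
            this.appId = appId;
            this.execId = execId;
        }
    }

    private final LoggingExecutorComponents components = new LoggingExecutorComponents();

    @Override
    public ExecutorComponentsSketch executor() {
        return components;
    }
}
```

In the real system, the implementation class name would be supplied via `spark.shuffle.sort.io.plugin.class` and instantiated reflectively.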
@@ -0,0 +1,55 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.shuffle.api;

import java.io.IOException;
Review comment: nit: white space between different import groups.

import org.apache.spark.annotation.Private;

/**
 * :: Private ::
 * An interface for building shuffle support for Executors.
 *
 * @since 3.0.0
 */
@Private
public interface ShuffleExecutorComponents {

  /**
   * Called once per executor to bootstrap this module with state that is specific to
   * that executor, specifically the application ID and executor ID.
   */
  void initializeExecutor(String appId, String execId);

  /**
   * Called once per map task to create a writer that will be responsible for persisting all the
   * partitioned bytes written by that map task.
   *
   * @param shuffleId Unique identifier for the shuffle the map task is a part of
   * @param mapId Within the shuffle, the identifier of the map task
   * @param mapTaskAttemptId Identifier of the task attempt. Multiple attempts of the same map task
   *                         with the same (shuffleId, mapId) pair can be distinguished by the
   *                         different values of mapTaskAttemptId.
   * @param numPartitions The number of partitions that will be written by the map task. Some of
   *                      these partitions may be empty.
   */
  ShuffleMapOutputWriter createMapOutputWriter(
Review comment: During the fix of SPARK-25341, we need to pass more params into the shuffle writer and shuffle block resolver; see #25361 for the quick API change review. Thanks :)
      int shuffleId,
      int mapId,
      long mapTaskAttemptId,
      int numPartitions) throws IOException;
}
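A note on the parameters above: createMapOutputWriter takes a mapTaskAttemptId in addition to (shuffleId, mapId) because task retries reuse the same shuffle and map ids. A small, hypothetical key class (not part of the proposed API, names invented for illustration) makes the distinction explicit:

```java
import java.util.Objects;

// Hypothetical identity key for one map task attempt's output. Two attempts
// of the same map task share (shuffleId, mapId); only mapTaskAttemptId
// tells their outputs apart.
final class MapOutputKey {
    final int shuffleId;
    final int mapId;
    final long mapTaskAttemptId;

    MapOutputKey(int shuffleId, int mapId, long mapTaskAttemptId) {
        this.shuffleId = shuffleId;
        this.mapId = mapId;
        this.mapTaskAttemptId = mapTaskAttemptId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MapOutputKey)) {
            return false;
        }
        MapOutputKey other = (MapOutputKey) o;
        return shuffleId == other.shuffleId
            && mapId == other.mapId
            && mapTaskAttemptId == other.mapTaskAttemptId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(shuffleId, mapId, mapTaskAttemptId);
    }
}
```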
@@ -0,0 +1,71 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.shuffle.api;

import java.io.IOException;

import org.apache.spark.annotation.Private;

/**
 * :: Private ::
 * A top-level writer that returns child writers for persisting the output of a map task,
 * and then commits all of the writes as one atomic operation.
 *
 * @since 3.0.0
 */
@Private
public interface ShuffleMapOutputWriter {

  /**
   * Creates a writer that can open an output stream to persist bytes targeted for a given reduce
   * partition id.
   * <p>
Review comment (marked as resolved).

Review comment: I think we want these to have line breaks in the generated HTML. But I'm not sure what the stance is across the rest of the codebase; we can remove these if pretty-formatting with line breaks isn't necessary.

Review comment: Yeah, I think it is needed for javadoc, though it's not needed for scaladoc. IMO it's worth keeping them. https://www.oracle.com/technetwork/java/javase/documentation/index-137868.html#format
   * The chunk corresponds to bytes in the given reduce partition. This will not be called twice
   * for the same partition within any given map task. The partition identifier will be in the
   * range of precisely 0 (inclusive) to numPartitions (exclusive), where numPartitions was
Review comment: Should we mention …?

Review comment: I made the docs more thorough, indicating ordering and also indicating how there's no guarantee that this will be called for an empty partition.
   * provided upon the creation of this map output writer via
   * {@link ShuffleExecutorComponents#createMapOutputWriter(int, int, long, int)}.
   * <p>
   * Calls to this method will be invoked with monotonically increasing reducePartitionIds; each
Review comment: How useful is this? I think we can make Spark shuffle more flexible if we don't guarantee this. Do you have a concrete example of how an implementation can leverage this guarantee?

Review comment: Spark's existing implementation makes this assumption: the index and data files assume partitions are in sequential order. Though it would be really easy to change the index format to allow the order to be random (just include a start and an end, rather than having the end be implicit).
   * call to this method will be called with a reducePartitionId that is strictly greater than
   * the reducePartitionIds given to any previous call to this method. This method is not
   * guaranteed to be called for every partition id in the above described range. In particular,
   * no guarantees are made as to whether or not this method will be called for empty partitions.
   */
  ShufflePartitionWriter getPartitionWriter(int reducePartitionId) throws IOException;
Review comment: Why "calls to this method will be invoked with monotonically increasing reducePartitionIds"? This may cause potential issues in the future and put a burden on implementations. For example, if people want to implement multiple partition writers and write shuffle data in parallel, they cannot guarantee monotonically increasing reducePartitionIds.

Review comment: People using this will be using it with …

  /**
   * Commits the writes done by all partition writers returned by all calls to this object's
   * {@link #getPartitionWriter(int)}.
   * <p>
   * This should ensure that the writes conducted by this module's partition writers are
   * available to downstream reduce tasks. If this method throws any exception, this module's
   * {@link #abort(Throwable)} method will be invoked before propagating the exception.
   * <p>
   * This can also close any resources and clean up temporary state if necessary.
   */
  void commitAllPartitions() throws IOException;
Review comment: Shouldn't this return …?

Review comment: @gczsjdy any reason to return …?

Review comment: We ended up adjusting the API for shuffle locations. This will come later.

Review comment: I believe the SPIP has the latest API.

Review comment: @jerryshao @mccheah has explained it well, because …

Review comment: Something to that effect, yeah. It also has implications on the reader API, but those are concerns to be addressed in subsequent patches.

Review comment: Got it. : )

  /**
   * Abort all of the writes done by any writers returned by {@link #getPartitionWriter(int)}.
   * <p>
   * This should invalidate the results of writing bytes. This can also close any resources and
   * clean up temporary state if necessary.
   */
  void abort(Throwable error) throws IOException;
}
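A rough in-memory sketch of the lifecycle described above. This is not the real API shape: for brevity it returns OutputStreams directly rather than ShufflePartitionWriter instances, and the class name is invented. It enforces the strictly-increasing partition id guarantee and makes nothing visible until commitAllPartitions() succeeds.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory map output writer: partition writers must be
// requested with strictly increasing ids in [0, numPartitions), and writes
// become visible to "readers" only when commitAllPartitions() is called.
class InMemoryMapOutputWriter {
    private final int numPartitions;
    private final List<ByteArrayOutputStream> pending = new ArrayList<>();
    private final List<byte[]> committed = new ArrayList<>();
    private int lastPartitionId = -1;

    InMemoryMapOutputWriter(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    OutputStream getPartitionWriter(int reducePartitionId) throws IOException {
        // Enforce the documented contract: ids are strictly increasing and
        // in range; not every id in the range is necessarily requested.
        if (reducePartitionId <= lastPartitionId || reducePartitionId >= numPartitions) {
            throw new IOException(
                "reducePartitionId must be strictly increasing and < numPartitions");
        }
        lastPartitionId = reducePartitionId;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        pending.add(out);
        return out;
    }

    void commitAllPartitions() {
        // Make all buffered writes visible as one atomic step.
        for (ByteArrayOutputStream out : pending) {
            committed.add(out.toByteArray());
        }
        pending.clear();
    }

    void abort(Throwable error) {
        // Invalidate everything written so far.
        pending.clear();
    }

    List<byte[]> committedPartitions() {
        return committed;
    }
}
```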
@@ -0,0 +1,98 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.shuffle.api;

import java.io.IOException;
import java.io.OutputStream;
import java.util.Optional;

import org.apache.spark.annotation.Private;

/**
 * :: Private ::
 * An interface for opening streams to persist partition bytes to a backing data store.
Review comment: I'd add that this stores bytes for one (mapper, reducer) pair, which corresponds to one ShuffleBlock.
 * <p>
 * This writer stores bytes for one (mapper, reducer) pair, corresponding to one shuffle
 * block.
 *
 * @since 3.0.0
 */
@Private
public interface ShufflePartitionWriter {

  /**
   * Open and return an {@link OutputStream} that can write bytes to the underlying
   * data store.
   * <p>
   * This method will only be called once on this partition writer in the map task, to write the
   * bytes to the partition. The output stream will only be used to write the bytes for this
   * partition. The map task closes this output stream upon writing all the bytes for this
   * block, or if the write fails for any reason.
   * <p>
   * Implementations that intend on combining the bytes for all the partitions written by this
   * map task should reuse the same OutputStream instance across all the partition writers provided
   * by the parent {@link ShuffleMapOutputWriter}. If one does so, ensure that
   * {@link OutputStream#close()} does not close the resource, since it will be reused across
   * partition writes. The underlying resources should be cleaned up in
   * {@link ShuffleMapOutputWriter#commitAllPartitions()} and
   * {@link ShuffleMapOutputWriter#abort(Throwable)}.
   */
  OutputStream openStream() throws IOException;
Review comment: I think we need to say more here about the lifecycle of this output stream. In particular: (a) the framework will only keep one of these output streams open at a time per map task; (b) the framework ensures that the output streams are closed, even if there are any exceptions; and (c) if an individual implementation wants to keep all the output for one map task together (like the index/data file organization of local shuffle output), then it may want to reuse the real underlying output stream across all …

Review comment: I added more docs.

  /**
   * Opens and returns a {@link WritableByteChannelWrapper} for transferring bytes from
   * input byte channels to the underlying shuffle data store.
   * <p>
   * This method will only be called once on this partition writer in the map task, to write the
   * bytes to the partition. The channel will only be used to write the bytes for this
   * partition. The map task closes this channel upon writing all the bytes for this
   * block, or if the write fails for any reason.
   * <p>
   * Implementations that intend on combining the bytes for all the partitions written by this
   * map task should reuse the same channel instance across all the partition writers provided
   * by the parent {@link ShuffleMapOutputWriter}. If one does so, ensure that
   * {@link WritableByteChannelWrapper#close()} does not close the resource, since the channel
   * will be reused across partition writes. The underlying resources should be cleaned up in
   * {@link ShuffleMapOutputWriter#commitAllPartitions()} and
   * {@link ShuffleMapOutputWriter#abort(Throwable)}.
   * <p>
   * This method is primarily for advanced optimizations where bytes can be copied from the input
   * spill files to the output channel without copying data into memory. If such optimizations are
   * not supported, the implementation should return {@link Optional#empty()}. By default, the
   * implementation returns {@link Optional#empty()}.
   * <p>
   * Note that the returned {@link WritableByteChannelWrapper} itself is closed, but not the
   * underlying channel that is returned by {@link WritableByteChannelWrapper#channel()}. Ensure
   * that the underlying channel is cleaned up in {@link WritableByteChannelWrapper#close()},
   * {@link ShuffleMapOutputWriter#commitAllPartitions()}, or
   * {@link ShuffleMapOutputWriter#abort(Throwable)}.
   */
  default Optional<WritableByteChannelWrapper> openChannelWrapper() throws IOException {
    return Optional.empty();
  }

  /**
   * Returns the number of bytes written either by this writer's output stream opened by
   * {@link #openStream()} or the byte channel opened by {@link #openChannelWrapper()}.
   * <p>
   * This can be different from the number of bytes given by the caller. For example, the
   * stream might compress or encrypt the bytes before persisting the data to the backing
   * data store.
   */
  long getNumBytesWritten();
Review comment: This class delegates writing to the OutputStream returned by openStream(). Will getNumBytesWritten() in this class access internal state inside that OutputStream? How about letting the OutputStream track the number of bytes written, so this class does not need to access the OutputStream? One possible solution is to add a subclass of OutputStream that tracks the number of bytes, something like the existing TimeTrackingOutputStream class in Spark, which extends OutputStream.

Review comment: The idea is that if the implementation also supports creating a custom …

Review comment: Ah, I also remember why we didn't attach it to the output stream: it's particularly because of the lifecycle. If we have an output stream for the partition that pads bytes upon closing the stream, it's unclear that one will continue to call methods on the output stream object after it has been closed. That's why we have the contract: …

Review comment: In this case, the OutputStream returned by openStream() is tightly coupled with ShufflePartitionWriter. Could we merge them together into one class, e.g. …

Review comment: Oh, I mean the OutputStream returned by openStream() is tightly coupled with ShufflePartitionWriter, thus I suggest merging them together. For example, rename ShufflePartitionWriter to ShufflePartitionWriterStream, which extends OutputStream. In this case, the user does not need to create a ShufflePartitionWriter and then call its openStream() method to get an OutputStream; instead, the user creates a ShufflePartitionWriterStream, which is already an OutputStream.

Review comment: But again, do we call …?
}
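The getNumBytesWritten() contract can be illustrated with a small standalone sketch. CountingPartitionWriter is a hypothetical name, and this example counts raw bytes; a real implementation that compresses or encrypts would report the transformed byte count it actually persists, which is exactly why the count can differ from what the caller wrote.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical partition writer that counts the bytes actually persisted to
// the backing store. The count lives on the writer, not the stream, so it can
// still be read after the per-partition stream has been closed.
class CountingPartitionWriter {
    private final ByteArrayOutputStream store = new ByteArrayOutputStream();
    private long numBytesWritten = 0;

    OutputStream openStream() {
        // FilterOutputStream funnels all writes through write(int), so every
        // persisted byte is counted.
        return new FilterOutputStream(store) {
            @Override
            public void write(int b) throws IOException {
                super.write(b);
                numBytesWritten++;
            }
        };
    }

    long getNumBytesWritten() {
        return numBytesWritten;
    }
}
```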
@@ -0,0 +1,42 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.shuffle.api;

import java.io.Closeable;
import java.nio.channels.WritableByteChannel;

import org.apache.spark.annotation.Private;

/**
 * :: Private ::
 *
 * A thin wrapper around a {@link WritableByteChannel}.
 * <p>
 * This is primarily provided for the local disk shuffle implementation to provide a
 * {@link java.nio.channels.FileChannel} that keeps the channel open across partition writes.
 *
 * @since 3.0.0
 */
@Private
public interface WritableByteChannelWrapper extends Closeable {
Review comment: Why do we only need a wrapper for …?

Review comment: We need to return the … This has come up in #25007 (comment) and palantir#535, and especially palantir#535 (comment). Given that this has come up as a question a number of times, I wonder if there's a better way we can make the semantics more accessible. I don't see a way to improve the architecture itself, but perhaps better documentation in the right places explaining why we went about this the way we did is warranted.

  /**
   * The underlying channel to write bytes into.
   */
  WritableByteChannel channel();
}
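A minimal sketch of the reuse pattern this wrapper enables, with invented names and no Spark dependencies: close() tears down only the per-partition wrapper and deliberately leaves the shared underlying channel open, so one channel can serve every partition written by the map task until the owning map output writer commits or aborts.

```java
import java.io.Closeable;
import java.nio.channels.WritableByteChannel;

// Hypothetical wrapper mirroring the WritableByteChannelWrapper pattern:
// close() is a no-op on the shared channel, which is closed exactly once by
// the owning map output writer at commit or abort.
class ReusableChannelWrapper implements Closeable {
    private final WritableByteChannel underlying;

    ReusableChannelWrapper(WritableByteChannel underlying) {
        this.underlying = underlying;
    }

    WritableByteChannel channel() {
        return underlying;
    }

    @Override
    public void close() {
        // Intentionally do not close the underlying channel here; it is
        // reused across partition writes.
    }
}
```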
Review comment: I saw it was recommended that this package be used instead of o.a.s.api. The problem is that org.apache.spark.shuffle is explicitly removed from the documentation, and we want this to (eventually) be documented. So we either need to go back to the old package, or tweak SparkBuild.scala to not filter this sub-package.

Review comment: I'm concerned about conflicts with the other kinds of APIs in the org.apache.spark.api.* namespace, particularly because these are all related to other language bindings, e.g. org.apache.spark.api.java.function.Function and org.apache.spark.api.r.RRDD. Let's modify SparkBuild.scala instead; I'll look into what that would require. Can that be done in a follow-up PR?

Review comment: Sure. Better to be proactive and file a bug to make these interfaces non-Private and, at the same time, make sure they're showing up properly in documentation.

Review comment: https://issues.apache.org/jira/browse/SPARK-28568