[SYSTEMDS-3729] Add roll reorg operations in FED #2126
Here, I have outlined my implementation intentions, current progress, issues, and questions. I realize that the content might be a bit extensive, but after thinking it through on my own for a long time, I still find myself uncertain. Therefore, I kindly ask for guidance.

1. Original Implementation Intent
I aimed to implement the Federated Roll Function as follows:

2. Current Implementation Method
Due to uncertainties in implementing step (3) as intended, I had to deviate and implement it as follows:

3. Problems and Questions
Problem 1: Data Interference Issues in Split
I'm planning to reimplement the first problem using the For the second problem, I haven't found a suitable reference yet, but I'll study more to resolve it.
Sorry for the delay. Ad 3: you can send a single CP rightindex instruction to the specific federated worker that we need to handle. Ad 2: I would recommend doing something as follows:
Not at all. Thank you very much for your advice! I will try to proceed according to your suggestion.
@mboehm7 I sincerely apologize for asking again, but I would like to ask once more about problem 1. When performing a fedmap shift, a new fedmap element is added and shifted, as shown in the code below. However, the newly added fedmap element points to a different range but refers to the same file (data), which seems to cause a problem.

/*Common*/
FederationMap outFedMapID1 = mo1.getFedMapping().copyWithNewID(outID1);
// outFedMap=[<[(0,10),(25,10)], fileA_ID1>, <[(25,10),(50,10)], fileB_ID1>, <[(50,10),(75,10)], fileC_ID1>,
// <[(75,10),(100,10)], fileD_ID1>]
outFedMap.rollFedMap(shift=5);
// outFedMap=[<[(5,10),(30,10)], fileA_ID1>, <[(30,10),(55,10)], fileB_ID1>, <[(55,10),(80,10)], fileC_ID1>,
// <[(80,10),(100,10)], fileD_ID1>, <[(0,10),(5,10)], fileD_ID1>]

Based on my understanding, there seem to be two possible approaches, but each has its own issue:

/*CASE 1*/
Future<FederatedResponse>[] ffr = outFedMap.executeRollEnd(getTID(), true, fr, frEndID1, frStartID1);
// 1) frEnd: Slice only <[(75,10),(100,10)], fileD_ID0> into <[(80,10),(100,10)], fileD_ID1>
// 2) frStart: Slice only <[(75,10),(100,10)], fileD_ID0> into <[(0,10),(5,10)], fileD_ID1> (Overwrite)
MatrixObject out = ec.getMatrixObject(output);
out.setFedMapping(outFedMapID1);

In CASE 1, fr is performed on two fedmap elements with the same ID, but since the file is the same, one of the fr results is overwritten and lost.

/*CASE 2*/
Future<FederatedResponse>[] ffr1 = outFedMap.executeRollEnd(getTID(), true, frID1, frEndID1);
// 1) frEnd: Slice only <[(75,10),(100,10)], fileD_ID0> into <[(80,10),(100,10)], fileD_ID1>
FederationMap outFedMapID2 = outFedMapID1.copyWithNewID(outID2);
Future<FederatedResponse>[] ffr2 = outFedMap.executeRollEnd(getTID(), true, frID2, frStartID2);
// 2) frStart: Slice only <[(75,10),(100,10)], fileD_ID0> into <[(0,10),(5,10)], fileD_ID2>
MatrixObject out = ec.getMatrixObject(output);
out.setFedMapping(outFedMapID1); // lost <[(0,10),(5,10)], fileD_ID2>

In CASE 2, fr is performed on two fedmap elements with different IDs, but elements with different IDs cannot be mapped together.
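The range arithmetic in the example above can be sketched in isolation. This is only an illustrative model, not SystemDS code: `RollRangesSketch`, `RowRange`, and `roll` are hypothetical names, ranges are simplified to 0-based half-open row ranges, and the `data` label stands in for the worker-side file. It shows that only the partition crossing the last row needs to be split into two entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of rolling federated row ranges (not the SystemDS API).
final class RollRangesSketch {
    /** Half-open row range [begin, end) backed by one worker-side data object. */
    static final class RowRange {
        final long begin, end;
        final String data; // assumed label for the worker-side data, e.g. "D"
        RowRange(long begin, long end, String data) {
            this.begin = begin; this.end = end; this.data = data;
        }
        @Override public String toString() {
            return "[" + begin + "," + end + ")@" + data;
        }
    }

    /** Shift every range by `shift` rows and split a range that crosses the last row. */
    static List<RowRange> roll(List<RowRange> in, long nRows, long shift) {
        long s = Math.floorMod(shift, nRows); // normalize negative or oversized shifts
        List<RowRange> out = new ArrayList<>();
        for (RowRange r : in) {
            long b = r.begin + s, e = r.end + s;
            if (e <= nRows) {              // still fits entirely
                out.add(new RowRange(b, e, r.data));
            } else if (b >= nRows) {       // wraps around entirely
                out.add(new RowRange(b - nRows, e - nRows, r.data));
            } else {                       // crosses the end: two entries, same source data
                out.add(new RowRange(b, nRows, r.data));
                out.add(new RowRange(0, e - nRows, r.data));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<RowRange> in = Arrays.asList(
            new RowRange(0, 25, "A"), new RowRange(25, 50, "B"),
            new RowRange(50, 75, "C"), new RowRange(75, 100, "D"));
        System.out.println(roll(in, 100, 5));
    }
}
```

With four 25-row partitions and a shift of 5, the last partition yields two entries that both refer to the same source data, which is exactly the situation where a single shared output ID leads to one slice overwriting the other.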
Yes, we need to send two rightindex instructions to the federated worker and bind them to different IDs (which will automatically create temporary file names), and then update the federated ranges accordingly at the coordinator. This requires custom handling instead of
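As a rough sketch of the suggestion to bind the two slices to different IDs, the coordinator could first compute a per-partition slice plan. Everything here is an assumption for illustration: `RollSlicePlanSketch`, `Slice`, `plan`, and the `firstFreeId` parameter are hypothetical and not part of SystemDS, and the sketch uses 0-based half-open row ranges, so converting to the index convention of the actual rightindex instruction is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical slice planning for one federated partition under a row roll.
final class RollSlicePlanSketch {
    /** Local rows [localBegin, localEnd) of the partition, bound to outId,
     *  covering rows [outBegin, outEnd) of the rolled output. */
    static final class Slice {
        final long localBegin, localEnd, outBegin, outEnd, outId;
        Slice(long localBegin, long localEnd, long outBegin, long outEnd, long outId) {
            this.localBegin = localBegin; this.localEnd = localEnd;
            this.outBegin = outBegin; this.outEnd = outEnd; this.outId = outId;
        }
    }

    /**
     * Plan the slices for the partition holding global rows [gBegin, gEnd).
     * Assumes 0 <= gBegin < gEnd <= nRows. A partition that crosses the last
     * row after the shift gets two slices with two distinct output IDs.
     */
    static List<Slice> plan(long gBegin, long gEnd, long nRows, long shift, long firstFreeId) {
        long s = Math.floorMod(shift, nRows);
        long len = gEnd - gBegin;
        long b = (gBegin + s) % nRows; // output row of the partition's first local row
        List<Slice> out = new ArrayList<>();
        if (b + len <= nRows) {
            // no wrap inside this partition: one slice, one ID
            out.add(new Slice(0, len, b, b + len, firstFreeId));
        } else {
            long keep = nRows - b; // local rows that land before the wrap point
            out.add(new Slice(0, keep, b, nRows, firstFreeId));
            out.add(new Slice(keep, len, 0, len - keep, firstFreeId + 1));
        }
        return out;
    }
}
```

For the partition holding rows 75 to 100 of a 100-row matrix and a shift of 5, the plan contains local rows [0,20) mapped to output rows [80,100) under one ID and local rows [20,25) mapped to output rows [0,5) under the next ID, so neither result overwrites the other and the coordinator can register two separate range entries.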
Thank you for the quick response. I now have a clear understanding of the problem. I had hoped there might be an existing API to handle this, but it seems a new implementation is needed. I’ll give it my best shot and see it through!
I am submitting this question through the PR to provide better context by including the relevant code. Currently, only FED using CP INST works correctly, while FED using SP INST encounters errors. The detailed implementation of

Question 1.
Although step 2 was implemented, the result of Therefore, I modified the

public List<Pair<FederatedRange, Future<FederatedResponse>>> requestFederatedData() {
    List<Pair<FederatedRange, Future<FederatedResponse>>> readResponses = new ArrayList<>();
    // FederatedRequest request = new FederatedRequest(RequestType.GET_VAR, _ID); // previous
    for(Pair<FederatedRange, FederatedData> e : _fedMap) {
        // request each entry's own variable ID instead of the single map-wide _ID
        FederatedRequest request = new FederatedRequest(RequestType.GET_VAR, e.getValue().getVarID());
        readResponses.add(Pair.of(e.getKey(), e.getValue().executeFederatedOperation(request)));
    }
    return readResponses;
}

Question 2.
The current issue with FED using SP INST is that it fails to read Since both CP and SP proceed using the same Below are the
I haven't fully completed the implementation of the federated version yet, but I am creating this PR along with the code to ask some questions.