
Adding new parties and retraining #52

Open
Enrique-Marmol opened this issue Feb 8, 2021 · 6 comments
Labels
question Further information is requested

Comments

@Enrique-Marmol

Hi! I was wondering what IBMFL can do after a training process. When aggregator.start_training() finishes, is it possible to call start_training() again so that the model leverages what was trained before? Between one start_training() and the next, is it possible to add new parties and remove others? And lastly, between calls to start_training(), is it possible to change the data of one party? I mean, maybe that party gets more data during the training phase and wants to use that data in the next training phases.

Regards and thanks in advance.

@Yi-Zoey Yi-Zoey added the question Further information is requested label Feb 8, 2021
@Yi-Zoey
Member

Yi-Zoey commented Feb 8, 2021

Hi, thanks for using IBM Federated Learning Library!

is it possible to make start_training() again and the model leverage what was make before?

Yes, that's possible. You can issue TRAIN again once the global training finishes; the aggregator will continue training for additional rounds (depending on your configuration).
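Roughly, the driver-side flow could look like the sketch below. This is a minimal sketch based on the example notebooks; the module path, the `config_file` argument, and the `start()` call are assumptions and may differ in your version.

```python
# Minimal sketch, assuming the example-style Aggregator API
# (module path, config_file argument and start() are assumptions).
from ibmfl.aggregator.aggregator import Aggregator

agg = Aggregator(config_file='config_agg.yml')
agg.start()              # START: bring the aggregator up

# ... each party issues START and REGISTER here ...

agg.start_training()     # first TRAIN: runs the configured number of rounds
agg.start_training()     # second TRAIN: continues from the current global model
```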

between start_training() and start_training() is it possible to add new parties and quit others?

Parties can drop out at any time during training; as long as the quorum is met, the training process keeps going. New parties can join between TRAIN commands: once the new party starts, use the REGISTER command to join the training.
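For a party that joins between TRAIN commands, the party-side steps would look roughly like this. The method names below mirror the START / REGISTER / STOP commands and are assumptions based on the example notebooks; your version may expose a different interface.

```python
# Hypothetical party-side sketch mirroring the START / REGISTER / STOP commands.
from ibmfl.party.party import Party

new_party = Party(config_file='config_party10.yml')  # config file name is illustrative
new_party.start()            # START: connect to the aggregator
new_party.register_party()   # REGISTER: join before the next TRAIN command

# Later, if this party wants to leave between rounds:
# new_party.stop()           # STOP: drop out of the federation
```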

between start_training() and start_training() is it possible to change the data of one party?

This really depends on how the party loads its dataset. If the data handler looks for and loads the local dataset each time the IBM FL local training module calls get_data() to access the local training dataset, then the party can use new data in the current training round, since the data handler reloads the dataset instead of keeping it in memory. See the sketch below.
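For illustration, a data handler along these lines would pick up data added between TRAIN commands. The base-class path, the constructor argument, and the return shape are assumptions modeled on the data handler examples; `train_file` and `test_file` are hypothetical config keys.

```python
# Sketch of a data handler that re-reads the local dataset on every call,
# so samples added between TRAIN commands are used in the next round.
import numpy as np
from ibmfl.data.data_handler import DataHandler  # base-class path is an assumption


class ReloadingDataHandler(DataHandler):
    def __init__(self, data_config=None):
        super().__init__()
        # hypothetical keys pointing at files the party can update on disk
        self.train_file = data_config['train_file']
        self.test_file = data_config['test_file']

    def get_data(self):
        # Reload from disk each time instead of caching in memory.
        train = np.load(self.train_file)
        test = np.load(self.test_file)
        return (train['x'], train['y']), (test['x'], test['y'])
```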

Let us know if you have further questions.

@Enrique-Marmol
Author

Hi, I have a problem related to what I mentioned above. I registered 10 parties and started the training, and everything went perfectly. Then I chose one party and issued STOP. After that I called start_training() again to train the model with the remaining 9 parties, leveraging the work done before (I did not stop the aggregator). However, I got an error saying it could not connect to one party, and above it printed:
(screenshot of the aggregator log)
It shows the responses of the 9 parties that remain, but it still has the 10 parties from the beginning registered. Is there a way to solve this issue?

Thanks in advance

@Yi-Zoey
Member

Yi-Zoey commented Feb 18, 2021

Hi, I see you are using IBM FL 1.0.2. Can you try upgrading to version 1.0.3 and see if the issue still exists? Thanks.

@Yi-Zoey
Member

Yi-Zoey commented Feb 18, 2021

The logic in IBM FL is that the aggregator waits until max_timeout for everyone to reply, even after the quorum is met. Therefore, if you did not set a max_timeout variable, the aggregator will keep waiting for all registered parties to reply, since it uses max_timeout to identify dropped-out parties.
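If it helps, a quick way to check whether max_timeout is set is to load config_agg.yml and look under the global hyperparameters. The exact key layout below is an assumption; compare it against the config shipped with the examples.

```python
# Hypothetical check of where max_timeout would sit in config_agg.yml
# (the 'hyperparams' / 'global' layout is an assumption).
import yaml

with open('config_agg.yml') as f:
    cfg = yaml.safe_load(f)

global_hp = cfg.get('hyperparams', {}).get('global', {})
print('num_parties:', global_hp.get('num_parties'))
print('rounds:     ', global_hp.get('rounds'))
print('max_timeout:', global_hp.get('max_timeout'))  # seconds to wait for stragglers
```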

@Enrique-Marmol
Author

Enrique-Marmol commented Mar 1, 2021

Hi again! I upgraded to version 1.0.3 and the problem persists. Reading the previous responses, I think I did not explain myself well. What I am trying to say is that I start with 10 parties, I register all of them, and then I have the aggregator start training. The training finishes successfully. Now, for instance, I disconnect party 7, party7.stop(), and it disconnects successfully. Party 7 disconnects after the training, not during the training. Finally, I have the aggregator start training again, this time with the 9 remaining parties, and that is when the issue comes up and the training cannot be completed. I would like to know whether this can be done, or whether it cannot be done at the moment.
Thanks in advance.

@Yi-Zoey
Member

Yi-Zoey commented Mar 22, 2021

Hi, can you share the config_agg.yml you are using? I just want to check the setting you selected for the quorum check.
