Allow users to use Colaboratory's TPU for finetuning #3
Comments
According to what I see here, the 8x speed-up only applies if you have a batch_size of 8 or more, since the batches are distributed among the cores. However, if you're already using a batch_size of 2, training should be about 4x faster if you change batch_size to 8, which is still a very nice speed-up. All the documentation I can find on using the TPUs seems to use TensorFlow's Keras API, like this example, so the model might have to be converted to that.
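For reference, a minimal sketch of what that kind of conversion looked like under TF 1.x (the keras_to_tpu_model path described in the Colab docs of the time). The model here is a toy placeholder, not GPT-2, and the exact contrib API may vary by TensorFlow version:

```python
import os
import tensorflow as tf

# Build (or load) a tf.keras model as usual on the Colab VM.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(64,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='sparse_categorical_crossentropy')

# Convert to a TPU model; each global batch is split across the 8 cores,
# so batch_size should be a multiple of 8 to keep every core busy.
resolver = tf.contrib.cluster_resolver.TPUClusterResolver(
    tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    model, strategy=tf.contrib.tpu.TPUDistributionStrategy(resolver))

# tpu_model.fit(x_train, y_train, batch_size=8 * 16, epochs=1)
```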
I do not use Keras. Workflows that use the TPU do use tf.keras.
Hi all! I've been playing with the idea of making this run on a Colaboratory TPU. So far, no luck, but I seem to be really close. The code is a mess right now -- my approach was first to make it work and then simplify and clean up.

I'm currently stuck at the point of loading the initialized model so it can be finetuned. It complains that the local file system scheme is not implemented. I understand that the TPU is instructed (through the session target pointing at its gRPC address) to read the checkpoint itself, and since it is a remote worker it cannot see Colab's local filesystem.

This is where I'm currently at: https://colab.research.google.com/drive/1_WVxlRgUjfAVZ5im2LaBoQcA5XnpU0K6

A few notes on that code:
I might drop out of this effort for a couple of days (weeks?) unless someone has a quick approach I could take. Regardless, if someone benefits from my progress, it will have been worth it!

For reference, this is the error:

Full error and stack trace
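As a side note on the "local file system scheme" error above, this is roughly how the session target is pointed at the TPU in Colab, which is why file reads end up happening on the remote TPU worker rather than on the Colab VM (the bucket path below is a placeholder):

```python
import os
import tensorflow as tf

# Colab exposes the TPU worker's gRPC address through this environment variable.
tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']

# A session created with this target runs its ops on the TPU worker, so a
# checkpoint restore executes on that remote machine, which cannot see the
# local /content filesystem -- hence the "scheme not implemented" complaint.
with tf.Session(tpu_address) as sess:
    sess.run(tf.contrib.tpu.initialize_system())
    # Checkpoints therefore need to live somewhere the worker can reach, e.g.:
    # saver.restore(sess, 'gs://my-bucket/models/117M/model.ckpt')  # placeholder bucket
    sess.run(tf.contrib.tpu.shutdown_system())
```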
@AlphaGit Furthermore, you do not need to do that.
@Skylion007 Hi! Thanks for the response. My problem doesn't seem to really be saving the weights, but rather loading them in the first place. (At least... not so far.)

Yours is an interesting approach. I initially moved away from it because loading the model in memory and then transferring it to the TPU meant moving around 500 MB of data (word embeddings and all). But it might not be that bad; Colaboratory should be prepared to deal with bigger datasets all the time, right?

Regarding tf.train.Saver, I believe it is really tied to a filesystem. At least, that's what prevented me from using it against GCS... but I might have done it wrong. This is what I'm stuck on right now.
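For what it's worth, tf.train.Saver goes through TensorFlow's generic file-system layer, which does understand gs:// paths once the runtime is authenticated. A rough sketch under that assumption; the local models/117M directory and the bucket name are placeholders:

```python
import os
import tensorflow as tf
from google.colab import auth

# Make Colab's credentials available so TensorFlow's GCS filesystem can
# read and write the bucket (assumes the bucket already exists).
auth.authenticate_user()

SRC = 'models/117M'                 # local checkpoint directory (placeholder)
DST = 'gs://my-bucket/models/117M'  # placeholder bucket path

# tf.gfile handles both local paths and gs:// URLs, so the checkpoint can be
# mirrored into the bucket, where the TPU worker is able to read it.
for name in tf.gfile.ListDirectory(SRC):
    tf.gfile.Copy(os.path.join(SRC, name), DST + '/' + name, overwrite=True)

# A Saver restore can then point at the GCS copy instead of a local path:
# saver.restore(sess, DST + '/model.ckpt')
```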
Loading them is pretty straightforward. I almost have a working solution using https://github.com/CyberZHG/keras-gpt-2, but I still need to debug some keras vs. tf.keras issues. I did get it working with a fixed input shape, so it is at least possible to load it on the TPU, but I still need to fix the input and output layers.
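On the fixed input shape: TPUs compile the graph with XLA, which requires static shapes, so the token input has to be declared with a fixed sequence length rather than shape=(None,). A toy stand-in (not the real GPT-2 graph; the length, vocabulary size, and embedding width are just the 117M model's published numbers used for illustration):

```python
import tensorflow as tf

SEQ_LEN = 1024  # placeholder fixed length

# Declaring the input with a concrete length is what makes the model
# loadable on the TPU; a dynamic (None,) sequence dimension will not compile.
tokens = tf.keras.layers.Input(shape=(SEQ_LEN,), dtype='int32', name='tokens')
embeddings = tf.keras.layers.Embedding(50257, 768)(tokens)  # GPT-2 117M vocab / width
logits = tf.keras.layers.Dense(50257)(embeddings)
toy_model = tf.keras.Model(inputs=tokens, outputs=logits)
```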
@AlphaGit |
This alone will be the single-biggest improvement for gpt-2-simple (= 16x training speed).

Unfortunately, the documentation for using Colaboratory's TPU is a bit messy.