More "OpenAI Blog Post" Training | Depth 32 | Heads 8 | LR 5e-4 #86
@lucidrains btw, I've been going through the wandb.ai docs and found some nice extras you can add to train_dalle.py that will give you live updates on the transformer itself:

```python
config = wandb.config
config.depth = DEPTH
config.heads = HEADS
config.dim_head = DIM_HEAD
config.learning_rate = LEARNING_RATE
config.shuffle = SHUFFLE
config.resume = RESUME
config.batch_size = BATCH_SIZE
config.grad_clip_norm = GRAD_CLIP_NORM
config.reversible = REVERSIBLE
config.model_dim = MODEL_DIM
config.attn_types = ATTN_TYPES

wandb.init(project = PROJECT_NAME, resume = RESUME)
wandb.watch(dalle)  # updates a graph of gradients on wandb as soon as your model changes
```

In particular, that very last line is actually all you need to add. But attaching all the parameters the way I did also lets wandb track them properly, and it makes it much easier to create hyperparameter sweeps from existing projects later on.
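For anyone following along, here is a minimal, self-contained sketch of the same pattern. The project name, config values, and the tiny stand-in model are placeholders rather than the actual train_dalle.py settings:

```python
import torch
import wandb

# Placeholder values standing in for the constants defined at the top of train_dalle.py.
config = dict(
    depth = 32,
    heads = 8,
    dim_head = 64,
    learning_rate = 5e-4,
    batch_size = 8,
    grad_clip_norm = 0.5,
    reversible = True,
)

# mode='offline' keeps this demo from needing a wandb account; drop it to sync to wandb.ai.
run = wandb.init(project = 'dalle_pytorch_demo', config = config, mode = 'offline')

model = torch.nn.Linear(16, 16)                        # stand-in for the DALLE instance
wandb.watch(model, log = 'gradients', log_freq = 10)   # stream gradient histograms to the dashboard

for step in range(100):                                # stand-in training loop
    loss = model(torch.randn(4, 16)).pow(2).mean()
    loss.backward()
    wandb.log({'loss': loss.item()}, step = step)

run.finish()
```

Passing the config as a dict to wandb.init is equivalent to assigning the fields one by one, and the sweeps UI can pick those keys up directly.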
@afiaka87 Ohh got it! I'm circling back to DALL-E this week for some final instrumentation :) I'll be sure to add that! 🙏
@lucidrains Awesome, looking forward to it! Thanks for patching up big-sleep/deep-daze btw. I tried, but I'm so distracted with this project now lol.
@afiaka87 Yes, arguably getting a DALL-E model trained and released would be bigger than either big sleep or deep daze!
Thanks for doing this! It demonstrates that reversibility does work :)
@lucidrains For sure! I'm trying to be as open as I can about my training, code, results, etc., but I'm not seeing much else of that here. I'm aware it's prohibitively expensive for most, though, and I'm privileged to be able to run Depth=32 for a day or two. At any rate, looking forward to the 1024-token model from Germany! I know it's in there currently, but I was still having some trouble with it last I checked. All in due time.
There should be system usage info in the graphs on wandb.ai, but yeah, it does what it says on the label, lol. You definitely trade time for space. But that whole training session never went above 16 GiB of VRAM, so at least people can use Colab!
@afiaka87 Great to know! Also, do log the issue with the VQ-GAN VAE and I'll be sure to fix it this week. It seems to be working on my end, but I haven't tried testing it from a fresh install.
@lucidrains One last thing: the "image masking" feature is used pretty thoroughly in this dataset, and they even include the image used for the mask. Let me know as soon as that feature is implemented, as I would love to use those pairs as a baseline for it.
@afiaka87 Is that the feature where they have half an image and have it complete the other half?
@lucidrains Yes, the "the exact same cat on the top {insert style qualifier here} on the bottom" style ones. They're passing the top half in, as well as a prompt that acknowledges both pictures, and presumably forcing the top half to stay the same while it trains.
@afiaka87 Yup, I can build that :)
Great, let me know asap. The zero-shot style transfer stuff is so cool to me.
Agreed! Really awesome work by lucidrains in trying to replicate such an awesome tool as DALL-E! If only we could collaborate in a more efficient way, somehow like in a blockchain, where a few people improve the DALL-E model, the best version gets chosen after two days, gets distributed again, and a new search for better optimization begins... I think your hyperparameter session is a great step forward @afiaka87! I will have my big system running in a week, so I hope to contribute in a more significant way then. By the way, the Open Images V6 dataset (https://storage.googleapis.com/openimages/web/download.html) has "localized narratives", which might fit perfectly for training DALL-E! Maybe I will generate a downsampled version (256x256px) with captions in the format DALL-E requires; that would speed up the search for a training dataset and could improve collaboration.
@robvanvolt Yep, that's a perfect dataset. I found this dataloader for the narratives: https://github.com/google/localized-narratives/blob/master/localized_narratives.py And the downloader for the images: You should be able to modify the DataLoader to load the correct image for each localized narrative fairly easily. This would also lend itself well to Weights and Biases artifacts (you just map URLs to things and it downloads and caches them for you, pinning versions if things change). Let me know if you get started on this and need any help. I think this would produce a great result!
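On the artifacts idea, here is a rough sketch of what that could look like. The project name, artifact name, and URL are made up for illustration; only the wandb calls themselves are real API:

```python
import wandb

# Publish the dataset once, tracked by reference (wandb records versions without copying the data).
run = wandb.init(project = 'dalle_pytorch_data', job_type = 'dataset')
artifact = wandb.Artifact('localized-narratives-captions', type = 'dataset')
artifact.add_reference('https://example.com/ids_to_download.txt', checksum = False)  # placeholder URL
run.log_artifact(artifact)
run.finish()

# Later, a training run pulls a pinned version and gets a local, cached copy.
train_run = wandb.init(project = 'dalle_pytorch_data', job_type = 'train')
data_dir = train_run.use_artifact('localized-narratives-captions:latest').download()
print(data_dir)
train_run.finish()
```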
I went ahead and downloaded all 500,000 of their images with "localized annotations", and I'm training on them now! The download is not for the faint of heart though; it winds up being 169 GiB of data. Anyway, I can at least share the proper structure for the "*.txt" files as well as the "file_ids.txt" list of image ids to download:

```sh
wget https://www.dropbox.com/s/3s0saz480hlg651/ids_to_download.txt
wget https://www.dropbox.com/s/ni95in1k7wpetso/captions.tar.gz  # contains the structure for the localized annotations
# extract the captions folder and plop it next to the folder you put your images in
mkdir -p ~/project_name/captions
tar -xzf captions.tar.gz --directory ~/project_name/captions
```
Thanks a lot! However, I could not download captions.tar.gz from your Dropbox (maybe the link is broken, since ids_to_download.txt downloads fine). I also wonder how you reorganized the captions from the annotations. Did you use the class name as the caption of the image? Thanks again!
@Jinglei5 Hm, I'll see if I can fix that. Unfortunately my internet has just gone out halfway through training 🙄. I'm on my phone till it's back up, so it may be a bit.
Hm, Dropbox is insisting that I've set that file to be publicly shared. Would you mind trying again with this link? You'll have to rename the file, as it will include the ?dl=0 bit, but that's the only thing I can think of. If that still doesn't work, I'll host it elsewhere. @Jinglei5 As for how I reorganized the captions: the current DataLoader literally just expects every single unique png in your folder to have a correspondingly named txt containing its text descriptions. If you go to the "localized annotations" page, you'll find a .jsonl file containing a mapping of each text phrase to image ids. The rest is just some Python scripting to create a bunch of files with the same names as your images and fill them with the correct text descriptions. Here's my copy of the .jsonl, but it's probably best to find the original again. I'll be back with an edit.
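For anyone reproducing that step, here is a rough sketch of the scripting involved. The paths are placeholders, and it assumes each line of the .jsonl carries `image_id` and `caption` fields the way the localized-narratives files do:

```python
import json
from pathlib import Path

NARRATIVES_JSONL = Path('open_images_localized_narratives.jsonl')  # placeholder filename
IMAGES_DIR = Path('~/project_name/images').expanduser()
CAPTIONS_DIR = Path('~/project_name/captions').expanduser()
CAPTIONS_DIR.mkdir(parents=True, exist_ok=True)

with NARRATIVES_JSONL.open() as f:
    for line in f:
        record = json.loads(line)
        image_id, caption = record['image_id'], record['caption']
        # Only write a caption file for images that were actually downloaded.
        if (IMAGES_DIR / f'{image_id}.jpg').exists():
            # Append, since a single image can have more than one narrative.
            with (CAPTIONS_DIR / f'{image_id}.txt').open('a') as out:
                out.write(caption.strip() + '\n')
```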
Just a general heads up though: these captions aren't great. Because the annotators could use the mouse to "tell the dataset" which part of the image they were referring to, they often leave out explicit directions, knowing that the information will be in there anyway. The captions not only contain pretty glaring grammar mistakes, but the information about location is also missing from the prompts, because the annotator (labeler? what do we call that?) knows that the computer is getting that info from their mouse.
It works this time! Thanks!
@Jinglei5 I'm gonna try mixing it with COCO2018 to see if it can at least get an idea of what a regular prompt might look like.
@Jinglei5 I'm also currently in the (very lengthy) process of converting all of these to 256px jpegs so I can actually move them around a bit. Do you have an existing workflow for that? Right now I'm just using ImageMagick.
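In case a pure-Python route helps, here is a minimal resize sketch using Pillow instead of ImageMagick; the folder names are placeholders:

```python
from pathlib import Path

from PIL import Image

SRC = Path('~/project_name/images').expanduser()        # placeholder source folder
DST = Path('~/project_name/images_256').expanduser()    # placeholder output folder
DST.mkdir(parents=True, exist_ok=True)

for path in SRC.glob('*'):
    try:
        with Image.open(path) as im:
            im = im.convert('RGB')
            im.thumbnail((256, 256), Image.LANCZOS)      # keeps aspect ratio, longest side becomes 256
            im.save(DST / (path.stem + '.jpg'), quality=90)
    except OSError:
        print(f'skipping unreadable file: {path.name}')
```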
Hm, the annotations looked pretty solid to me at first, but we will see how the grammar mistakes and the missing orientation information get handled... A few other interesting points:
DALL-E was trained with redundancy in the captions, so this shouldn't be a problem, as I previously thought.
Incredible computation power was used by OpenAI; it will be tough to optimize enough to get anywhere near their results.
Also, the dataset they collected is insane: 250 million text-image pairs!
Wikipedia might be another solid source for text-image pairs. Also, we might need to establish a better filter that we all use for training.
And finally, aspect ratios: this might also be important, as I've seen a lot of images in different aspect ratios. On the other hand, we might have a better / faster transformer with the 1024-token VQGAN, which might speed things up a little bit.
Sorry, I don't have the workflow. I just sampled 10,000 of them to feed the model directly for a trial right now. ><
@robvanvolt Here are some early results from training on that dataset, by the way. I think we should definitely clean it up with the info from OpenAI. After about ~15k iterations I stopped training, added the COCO2018 dataset, and resumed from there for another ~6k steps.
I'll probably make another post once I'm finished training. I think I'm ultimately gonna go with a combination of all three datasets I've accrued so far: COCO2018, OpenImagesV6, and the ~1 million images from the OpenAI blog post. The size of OpenAI's dataset is definitely discouraging, though. @robvanvolt I'm assuming there's a relatively easy way to get captioned images from Wikipedia, no? That's what I'm after next.
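Combining the sources can be done at the dataset level with ConcatDataset; a minimal sketch with stand-in tensors in place of the real image-text datasets:

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Stand-ins for the three sources (COCO2018, OpenImagesV6, blog-post images).
coco        = TensorDataset(torch.randn(100, 3, 256, 256))
open_images = TensorDataset(torch.randn(100, 3, 256, 256))
blog_post   = TensorDataset(torch.randn(100, 3, 256, 256))

combined = ConcatDataset([coco, open_images, blog_post])
loader = DataLoader(combined, batch_size = 8, shuffle = True)  # shuffling mixes samples across all three

print(len(combined))  # 300
```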
Ha, I do that as well. It is insane to me the number of things that just straight up break when you're dealing with lots of files. It's all good though; I managed to figure it out.
Yes, it seems that on the 20th of March, 2021, there might be a solution which fits our needs exactly.
Moving these to discussions.
Funny how fast things change, eh?
Edit: Moved to discussions: #106
Hey, all. Some of you might know I'm practicing and learning about machine learning with dalle-pytorch and a dataset consisting of the images OpenAI presented in the DALL-E blog post. I honestly don't have the money to train on this whole dataset.
edit: this is no longer true. Using the 1024-token VQGAN from the "Taming Transformers" research, it's now quite possible to train on a full dataset of 1,000,000 image-text pairs, and I'm doing just that. I hope to have it finished in about a week. I assume someone else will release a dalle-pytorch model trained properly on COCO and other image sets before then, but if they don't, check here for updates.
Anyway, it ran for ~36,000 steps. As you can see, it... still really likes mannequins. I'm considering removing them from the dataset. But you'll also notice that the network has actually got a decent idea of the general sort of colors that belong in different types of prompts.
Some Samples from Near the End of Training
Every Text-Image Reconstruction
https://wandb.ai/afiaka87/dalle_pytorch_live_training/reports/dalle-pytorch-Test-Run-2--Vmlldzo1MzM5MjQ
Deliverables (my train_dalle.py)
https://gist.github.com/afiaka87/850fb3cc48edde8a7ed4cb1ce53b6bd2
This has some code in it that actually manages to deal with truncated images via try/except. Apparently detecting a corrupted PNG is harder than P vs NP: PIL's verify() function doesn't catch all of them, and Python's built-in imghdr library doesn't catch all of them either. So you just sort of catch OSError and return an item further along. Works well enough.
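A minimal sketch of that pattern, with a hypothetical class and folder layout rather than the exact code from the gist:

```python
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset


class ImageTextFolder(Dataset):
    """Folder of paired foo.png / foo.txt files that skips unreadable images."""

    def __init__(self, folder, transform = None):
        self.image_paths = sorted(Path(folder).glob('*.png'))
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        path = self.image_paths[index]
        try:
            image = Image.open(path).convert('RGB')
        except OSError:
            # A truncated or corrupt file slipped past verify() and imghdr;
            # return an item further along instead of crashing the epoch.
            return self.__getitem__((index + 1) % len(self))
        text = path.with_suffix('.txt').read_text()
        if self.transform is not None:
            image = self.transform(image)
        return text, image
```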
Parameters
Dataset Description
#61 (comment)