
Fine-tune with a new object point cloud dataset #68

Open
noahcao opened this issue Mar 25, 2024 · 5 comments

noahcao commented Mar 25, 2024

Hi @ZENGXH,

I was trying to fine-tune LION from the weights unconditional/all55/checkpoints/epoch_10999_iters_2100999.pt with the config file unconditional/all55/cfg.yml you provided.

My basic idea is to freeze the weights of the VAE encoder and decoder and only fine-tune the two priors, imitating the behavior in train_2prior.py. I did the necessary preprocessing of my data and used the pre-trained VAE to verify that the input point clouds can be reconstructed.
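For context, here is a minimal sketch of the freezing step I mean, assuming a LION-style model that exposes the VAE and the two priors as `nn.Module`s (the function name and arguments are my own, not the repo's API):

```python
import torch.nn as nn

def freeze_vae(vae: nn.Module, priors: list):
    """Freeze the VAE encoder/decoder; return only the prior parameters to optimize."""
    for p in vae.parameters():
        p.requires_grad_(False)
    vae.eval()  # also fixes BatchNorm/Dropout behavior inside the VAE
    return [p for prior in priors for p in prior.parameters()]
```

The returned list is what I hand to the optimizer, so gradients never touch the VAE.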

However, the training does not go well, and the final results generated by demo.py look like this:

Screenshot 2024-03-25 at 19 05 40

I attach the key components of my code here:

```python
timestep  # in [1, 1000]

def gain_x_t(timestep, noise, x0):
    # diffuse x0 to x_t, as in utils/diffusion_continuous.py
    t_p, var_t_p, m_t_p = self.iw_quantities(timestep)
    x_t = m_t_p * x0 + torch.sqrt(var_t_p) * noise
    return t_p, x_t

# VAE is the pre-trained LION VAE; latents are detached so only the priors train
x_start_obj_g, x_start_obj_l = LION.VAE.encode_obj(obj_points)
x_start_obj_g, x_start_obj_l = x_start_obj_g.detach(), x_start_obj_l.detach()
noise['obj_g'] = torch.randn_like(x_start_obj_g)  # Gaussian noise
noise['obj_l'] = torch.randn_like(x_start_obj_l)

t_p, x_t_obj_g = gain_x_t(timestep, noise['obj_g'], x_start_obj_g)
t_p, x_t_obj_l = gain_x_t(timestep, noise['obj_l'], x_start_obj_l)

# the local prior is conditioned on the global latent mapped through global2style
global_cond = LION.VAE.global2style(x_start_obj_g).detach()
pred_noise_g = LION.priors[0](x_t_obj_g, t_p, x0=None, clip_feat=None)
pred_noise_l = LION.priors[1](x_t_obj_l, t_p, x0=None, condition_input=global_cond, clip_feat=None)

# epsilon-prediction MSE losses for the two priors
loss_g = F.mse_loss(pred_noise_g.view(B, -1), noise['obj_g'].view(B, -1), reduction='mean')
loss_l = F.mse_loss(pred_noise_l.view(B, -1), noise['obj_l'].view(B, -1), reduction='mean')
```

I can't find an obvious error on my side, and the training losses seem fine to me. However, as shown above, the fine-tuned model can't generate valid point clouds.

Screenshot 2024-03-25 at 19 17 33

Also, my dataset contains multiple object categories, and I use no CLIP feature as the condition. I assume this should be fine, but can you confirm it? It would be great if you could share any ideas. Thanks!

ZENGXH (Collaborator) commented Mar 26, 2024 via email

noahcao (Author) commented Mar 26, 2024

Thanks for the prompt response!

Sure, this is a point cloud sampled from my dataset:

This is the reconstruction obtained with the function VAE.recont(points) (link):

Screenshot 2024-03-26 at 18 21 44

In my opinion, the reconstruction looks pretty good.
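To back up the visual check numerically, I also compare input and reconstruction with a symmetric Chamfer distance; a minimal sketch, assuming point clouds shaped `(B, N, 3)` (`chamfer_l2` is my own helper, not part of the LION codebase):

```python
import torch

def chamfer_l2(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between point clouds a, b of shape (B, N, 3)."""
    d = torch.cdist(a, b)  # pairwise distances, shape (B, N, N)
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()
```

For a good reconstruction, `chamfer_l2(points, VAE.recont(points))` should be small relative to the object scale.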

To your question:

> For the sampling, does it fail from the early iterations or only after some iterations? The sampled images from different iterations are logged by the training code as well.

The generation quality degrades very quickly. Before fine-tuning, I can sample reasonable point clouds with the pre-trained all55 weights you provided:

Screenshot 2024-03-26 at 18 24 53

noahcao (Author) commented Mar 26, 2024

Also, I didn't use the mixing_component function in my training. Will this matter that much?

noahcao (Author) commented Mar 26, 2024

Maybe this is the reason? #66 (comment).

I used the pre-trained all55 weights for the priors and the VAE and then fine-tuned them on my own dataset. However, I used the pointflow_datasets dataset definition for training.

ZENGXH (Collaborator) commented Mar 26, 2024

The same dataloader should work, but it would be good to check whether the loaded shapes are aligned with the validation point clouds of the all55 model.
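One quick way to compare alignment is to print centroid and scale for a cloud from each source; a minimal sketch, assuming clouds shaped `(N, 3)` (the helper name is illustrative, not from the repo):

```python
import torch

def cloud_stats(points: torch.Tensor):
    """Per-cloud centroid and scale for a point cloud of shape (N, 3)."""
    centroid = points.mean(dim=0)
    scale = (points - centroid).norm(dim=1).max()
    return centroid, scale
```

If the centroid or scale of your fine-tuning clouds differs markedly from the all55 validation clouds, the priors are being trained on latents outside the distribution they were pre-trained on.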

mixing_component might matter. It seems to be the missing piece when comparing with train_2prior.py.
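For reference, in the LSGM-style setup that LION builds on, the prior's output is blended with an analytic component of the standard-normal prior via a learned gate. A hedged sketch of the idea (function names follow the LSGM codebase; the exact signatures in this repo may differ):

```python
import torch

def mixing_component(x_t: torch.Tensor, var_t: torch.Tensor) -> torch.Tensor:
    """Analytic epsilon prediction of a standard-normal prior at the noisy latent x_t."""
    return x_t * torch.sqrt(var_t)

def get_mixed_prediction(pred: torch.Tensor, mixing_logit: torch.Tensor,
                         mix: torch.Tensor) -> torch.Tensor:
    """Blend the network prediction with the analytic component via a learned gate."""
    coeff = torch.sigmoid(mixing_logit)
    return (1.0 - coeff) * mix + coeff * pred
```

If fine-tuning uses the raw `pred_noise_*` in the MSE loss while the pre-trained priors were trained with the mixed prediction, the loss targets a different parameterization than the checkpoints expect, which could explain the degenerate samples.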
