
questions about implementation details #2

Open
yuanhangtangle opened this issue Mar 28, 2023 · 1 comment

@yuanhangtangle

Thank you for your repo. I ran into some problems while reading the code and the paper.

Q1. In fd_waterbird.py, I can't find the code corresponding to $P(R|X)$. It seems that you are simply dropping some elements with torch.distributions.binomial.Binomial rather than actually modeling $P(R|X)$. What is $P(R|X)$ for, then?

Q2. The code following the drop mentioned in Q1 is quite confusing:

for jj in range(args.samples - 1):
    if add_n:
        binomial = torch.distributions.binomial.Binomial(probs=1 - p)
        fea = feature * binomial.sample(feature.size()).cuda() * (1.0 / (1 - p))
    else:
        fea = feature
    logit_compose = logit_compose + classifier(
        fea, Xp[j * bs_m:(j + 1) * bs_m, :, :, :])  # TODO:

It seems that `logit_compose = logit_compose + classifier(fea, Xp[j * bs_m:(j + 1) * bs_m, :, :, :])` is run multiple times without any modification between iterations.

Q3. I can't find the variables representing $N_i$ and $N_j$ described in the paper (5.3, Experimental Settings: "We set Nj = 256 and Ni = 10 for all experiments and denotes it as Ours"). It seems that the code mentioned in Q1 and Q2 is the key to the implementation, but I don't see how it relates to the front-door formula derived in the paper.

Looking forward to your reply. Thanks in advance.

@ChengzhiCU (Contributor) commented Mar 29, 2023

Thank you for your questions.

Q1: $P(R|X)$ is a neural network. It can be a pretrained VAE, in which case R is the VAE's latent space. For a ResNet, R is the second-to-last layer's feature. Since that R is deterministic, we add random noise to make it "probabilistic", though with a ResNet there is no probabilistic guarantee. A VAE is a probabilistic model, so there R is precise.
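Concretely, the Binomial drop in the snippet from Q2 can be read as drawing one sample from this implicit $P(R|X)$. A minimal sketch, assuming `feature` holds the deterministic R = f(X) and `p` is the drop probability (the function name is illustrative, not from the repo):

import torch

def sample_R_given_X(feature: torch.Tensor, p: float) -> torch.Tensor:
    # One draw from the implicit P(R|X): randomly drop feature elements
    # (inverted dropout), then rescale by 1/(1-p) so the sample's
    # expectation equals the deterministic feature f(X).
    binomial = torch.distributions.binomial.Binomial(probs=1 - p)
    mask = binomial.sample(feature.size()).to(feature.device)
    return feature * mask * (1.0 / (1 - p))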

Q2, Q3: j is changing, which corresponds to $N_j$. jj performs the Monte Carlo sampling of $P(R|X)$, which corresponds to $N_i$. Marginalizing over j ($N_j$) is important; it is a key factor in the front-door adjustment.
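For concreteness, a minimal sketch of how the two nested loops could realize the front-door estimate. `classifier`, `feature`, `Xp`, `p`, and the `bs_m` slicing are taken from the snippet in Q2; `N_j` and `N_i` are the paper's sample counts, and the function name and the final averaging are illustrative assumptions, not the repo's exact code:

import torch

def front_door_logits(classifier, feature, Xp, p, N_j, N_i, bs_m):
    # Average logits over N_j slices of Xp (marginalizing over the
    # mediator samples X') and, per slice, over N_i dropout samples
    # of R ~ P(R|X).
    binomial = torch.distributions.binomial.Binomial(probs=1 - p)
    logit_compose = 0.0
    for j in range(N_j):                      # outer sum: N_j samples of X'
        Xp_j = Xp[j * bs_m:(j + 1) * bs_m]    # j-th slice of the X' batch
        for _ in range(N_i):                  # inner Monte Carlo over P(R|X)
            mask = binomial.sample(feature.size()).to(feature.device)
            fea = feature * mask / (1.0 - p)  # inverted-dropout sample of R
            logit_compose = logit_compose + classifier(fea, Xp_j)
    return logit_compose / (N_j * N_i)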

Let me know if you have other questions.
