Backfilling experiment #1130
#1128 is relevant here because if we can match color properly, it should persist better in the subset learner, giving us a "something" to fill in when we say "this is probably an apple or an orange." @lichtefeld and I plan on using the learner's patterns to fill in details based on the experimental affordance learning "it's probably an X" outputs (#1129).
Any perception graph is only observed after an ObjectLearner has processed it, so, tl;dr: Problem 2 is already mostly solved from Phase 1/2 work.
We could aim to tune a better threshold for confidence. If the shape confidence is below a configured threshold, we fail and resort to backfilling as our default rather than asserting a low-confidence answer. (Quasi precision-vs.-recall tuning.)
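For concreteness, here is a minimal sketch of that thresholding-with-fallback logic. Everything in it (the `Recognition` class, `resolve_concept`, the 0.7 value) is a hypothetical stand-in rather than an existing ADAM API; the real decision would live wherever the GNN's decode is consumed.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical configured threshold; this is the knob we would tune
# (the quasi precision-vs.-recall trade-off mentioned above).
SHAPE_CONFIDENCE_THRESHOLD = 0.7


@dataclass
class Recognition:
    concept: str       # e.g. "apple"
    confidence: float  # the GNN's confidence, assumed to be in [0, 1]


def resolve_concept(recognition: Optional[Recognition]) -> Optional[str]:
    """Accept the GNN's answer only when it clears the threshold.

    Returning None means "fail and backfill by default" rather than
    asserting a low-confidence answer.
    """
    if recognition is None or recognition.confidence < SHAPE_CONFIDENCE_THRESHOLD:
        return None
    return recognition.concept


print(resolve_concept(Recognition("apple", 0.42)))  # None -> backfill
print(resolve_concept(Recognition("apple", 0.91)))  # "apple"
```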
I'd forgotten about that, that's good to know.
Hmm, maybe. Are we talking about the GNN's confidence outputs? I am hesitant to rely too much on a neural model's confidence outputs this way, because in my work on MICS I've been addressing a problem where T5's (a language model's) confidences are completely useless by default (see Figure 1 (a), which shows that T5's question-answering accuracy is uncorrelated with its confidence). If the baseline confidence output were as meaningless as it is in that context, fixing problems with confidence would be out of scope, and thresholding may not help. But UnifiedQA does a little better in that figure, so maybe it's not hopeless. Also, the GNN is being used "as intended" in a sense rather than being contrived to do an entirely different task, so it might have a better shot. So maybe the GNN's confidence outputs are good enough.
Correct, I am referencing the GNN's confidence outputs. I don't want to claim that just using the confidence values 'solves' the problem of knowing when to call a sample novel, but barring other object feature extraction it's what we have to work with. Discovering that 'GNN output confidence doesn't correlate well with detecting novel object concepts' would still be a useful result, even if it leaves us no closer to solving the problem.
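One cheap way to find out which case we're in would be to score the confidence values directly as a novelty detector on scenes where we know which objects are novel. A minimal sketch, assuming hypothetical per-object confidences and ground-truth novelty labels, using scikit-learn for the metric:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical evaluation data, one entry per decoded object:
# `confidence` is the GNN's confidence in its predicted concept,
# `is_novel` is 1 when the true concept was never seen in training.
confidence = np.array([0.93, 0.88, 0.41, 0.97, 0.35, 0.52])
is_novel = np.array([0, 0, 1, 0, 1, 1])

# If confidence is informative, low confidence should predict novelty,
# so score (1 - confidence) as the novelty detector.
auc = roc_auc_score(is_novel, 1.0 - confidence)
print(f"AUROC of (1 - confidence) as a novelty detector: {auc:.2f}")
# ~0.5 would support "confidence doesn't correlate with novel concepts";
# closer to 1.0 would support the thresholding approach.
```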
Not a reply to the above, but re:
I'm noting a third alternative -- for in-domain objects like apple or orange, (3) the GNN might well recognize the object correctly, in which case we have nothing to backfill. |
We would like to run an experiment in backfilling. @marjorief gave a motivating example: Suppose we're looking at a black-and-white photo of a person eating an apple. We should be able to say, this is probably an apple or an orange. And we should be able to fill in from there, it is probably red or green or orange in color. The idea here is: Train an action learner as usual, then at test time cause object recognition to ~fail, either by ablating features or directly. We would then measure how well we are able to backfill using the affordance learner (#1129).
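For the feature-ablation variant (elaborated below), the color-removal step itself could be a simple preprocessing pass over the curriculum images. A minimal sketch, assuming plain image files and OpenCV; the paths and function name are illustrative, not part of the existing pipeline:

```python
from pathlib import Path

import cv2


def ablate_color(src_dir: Path, dst_dir: Path) -> None:
    """Convert every image in src_dir to grayscale so that color
    features are unavailable to the decode downstream."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    for image_path in sorted(src_dir.glob("*.png")):
        image = cv2.imread(str(image_path))
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # Write back as 3-channel so code expecting RGB input still runs.
        cv2.imwrite(str(dst_dir / image_path.name),
                    cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR))


ablate_color(Path("curriculum/rgb"), Path("curriculum/grayscale"))
```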
In the "ablating features" instantiation of this which we discussed yesterday, @blakeharrison-ai would output grayscale images (or do postprocessing to remove color) before @shengcheng does the curriculum decode. An issue with this approach is that I expect the GNN to simply output the wrong object concept, in which case I think we will either (1) successfully match the wrong pattern/recognize the wrong thing, so no backfilling takes place, or (2) fail to match any pattern, so that the object never gets replaced and we fail to recognize the action (because the slot nodes can only match to `ObjectSemanticNode`s, not `ObjectClusterNode`s). I think (1) may not be problematic if we're okay with running backfilling regardless of whether we know what the object is, but (2) seems like more of a problem. @lichtefeld, do you have thoughts on this problem?

The other approach might be to directly replace the object semantic node with "I don't know." This is hard to do with our existing learners. However, I think we can do a hack where we define a new GNN recognizer that knows when it's being evaluated, and at test time chooses one of the objects in the scene to replace with an "I don't know" semantic node. (That would mean setting up a new "unrecognized object" concept.) That way we know what concept the object was labeled with and it is not the same as any "real" concept, but the object still has a semantic node so it can be recognized as taking part in the action. As a bonus, we also know we're ablating only one object.
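To make the second approach concrete, here is a rough sketch of the "evaluation-aware" recognizer hack. All names are hypothetical stand-ins (the real recognizer and semantic-node classes in the codebase differ); it only illustrates relabeling one object per scene with an "unrecognized object" concept at test time:

```python
import random
from dataclasses import dataclass
from typing import List

# The concept the ablated object gets labeled with. It is deliberately
# not any "real" trained concept, which is the signal to backfill.
UNRECOGNIZED_OBJECT = "unrecognized-object"


@dataclass
class ObjectSemanticNode:
    concept: str


class EvaluationAwareRecognizer:
    """Wraps a base GNN recognizer; at test time it relabels one
    recognized object per scene with the unrecognized-object concept."""

    def __init__(self, base_recognizer, evaluating: bool = False):
        self._base = base_recognizer
        self.evaluating = evaluating

    def recognize(self, scene) -> List[ObjectSemanticNode]:
        nodes = self._base.recognize(scene)
        if self.evaluating and nodes:
            victim = random.choice(nodes)
            # The object keeps its semantic node, so the action learner's
            # slot nodes can still match it; only its concept is ablated.
            victim.concept = UNRECOGNIZED_OBJECT
        return nodes
```

Because the ablated object still carries a semantic node, the action can still be recognized, and anything labeled with the placeholder concept is exactly the set of objects to backfill.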