First of all, thank you for open sourcing this code. It is excellent. I'm encountering an error during training where a leaf node can receive a NaN score. After this happens, training freezes. The error has to occur in the following block of code:
if (node_idx >= nodes_n / 2) {
// we are on a leaf node
const int idx = node_idx - nodes_n / 2;
double pos_w, neg_w;
pos_w = neg_w = c.esp;
for (int i = 0; i < pos_n; i++)
pos_w += pos.weights[pos_idx[i]];
for (int i = 0; i < neg_n; i++)
neg_w += neg.weights[neg_idx[i]];
float score = 0.5 * log(pos_w / neg_w);
scores[idx] = isnan(score) ? 0. : score;
return;
}
I added the NaN check above the return myself to work around the issue, but I'm not sure setting the score to 0 is the proper solution. Do you have any insight on better ways to avoid this problem?
@JordanCheney Pay attention to the leaf scores in the first CARTs; they shouldn't be too large. The problem is caused by an internal node split that can leave a leaf node with no face samples or no non-face samples. The score can then become unusually large and cause the weights to explode when exp() is calculated.
I understand that the math says a "pure" split (a leaf containing only face or only non-face samples) will cause the score to explode. This seems odd to me, though, since the goal of a tree is to split the data perfectly, no? Of course that should be very difficult to accomplish in practice. I suppose this isn't really a bug but a strange artifact of my data. For the record, I was able to train a full cascade using my fix above, but the scores weren't comparable to the paper, and I'm hoping this is the reason.