NaN leaf score #19

Open
JordanCheney opened this issue Feb 11, 2016 · 2 comments

@JordanCheney

Hello,

First of all, thank you for open sourcing this code. It is excellent. I'm encountering an error during training where a leaf node can receive a NaN score. After this happens, training freezes. The error has to occur in the following block of code:

if (node_idx >= nodes_n / 2) {
    // we are on a leaf node
    const int idx = node_idx - nodes_n / 2;
    double pos_w, neg_w;
    pos_w = neg_w = c.esp;  // start from a small epsilon to keep the ratio defined
    // accumulate the boosting weights of the face / non-face samples in this leaf
    for (int i = 0; i < pos_n; i++)
        pos_w += pos.weights[pos_idx[i]];
    for (int i = 0; i < neg_n; i++)
        neg_w += neg.weights[neg_idx[i]];

    float score = 0.5 * log(pos_w / neg_w);
    scores[idx] = isnan(score) ? 0. : score;  // NaN check added as a workaround (see below)

    return;
}

I added the NaN check above the return myself to work around the issue, but I'm not sure setting the score to 0 is the proper solution. Do you have any insight on better ways to avoid this problem?
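
For reference, an alternative would be to cap the magnitude of the leaf score rather than only replacing NaN with 0; a minimal sketch, assuming the same pos_w / neg_w accumulation as above (the helper name and the max_score bound are assumptions, not code from this repository):

#include <algorithm>
#include <cmath>

// Hypothetical helper, not code from this repository: compute the leaf score
// from the epsilon-smoothed weights, but cap its magnitude instead of only
// zeroing NaN results. max_score = 4 is an arbitrary assumed bound.
static float LeafScore(double pos_w, double neg_w, float max_score = 4.0f) {
    float score = 0.5f * static_cast<float>(std::log(pos_w / neg_w));
    if (std::isnan(score)) score = 0.0f;  // degenerate case (e.g. inf / inf)
    return std::max(-max_score, std::min(max_score, score));
}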

@luoyetx
Owner

luoyetx commented Feb 13, 2016

@JordanCheney Pay attention to the leaf scores of the first carts; they shouldn't be too large. The problem is caused by an internal node split that can leave a leaf node with no face samples or no non-face samples. The resulting score can be unusually large and make the weights explode when exp() is calculated.
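
For illustration, a minimal sketch of that failure chain, assuming a RealBoost-style weight update of the form w *= exp(score) (the variable names and the update form are assumptions, not the repository's exact code):

#include <cmath>
#include <cstdio>

// Sketch: a very large cart score makes exp() overflow a double to +inf,
// and the next leaf-score computation then evaluates log(inf / inf) = NaN.
int main() {
    double weight = 1.0;
    double huge_score = 800.0;                    // e.g. produced by a "pure" leaf
    weight *= std::exp(huge_score);               // overflows to +inf
    double pos_w = weight, neg_w = weight;        // both sums dominated by inf weights
    double leaf = 0.5 * std::log(pos_w / neg_w);  // inf / inf -> NaN
    std::printf("weight = %g, leaf = %g\n", weight, leaf);
    return 0;
}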

@JordanCheney
Author

I understand that the math says "pure" splits (all face or all non-face) will basically cause the scores to explode. This seems odd to me, though, since the goal of a tree should be to split the data perfectly, no? Of course that should be very difficult to accomplish in practice. I suppose this isn't really a bug but a strange artifact of my data. For the record, I was able to train a full cascade using my fix above, but the scores weren't comparable to the paper, and I'm hoping this is the reason.
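
To make the pure-split case concrete, here is a small illustration (the value of the c.esp smoothing constant is assumed, not taken from the repository's config):

#include <cmath>
#include <cstdio>

// Illustration: if a leaf receives only face samples, neg_w stays at the
// epsilon term, so the log ratio is already huge before any overflow occurs.
int main() {
    double esp = 1e-10;                        // assumed smoothing constant
    double pos_w = esp + 0.5;                  // accumulated face weight
    double neg_w = esp;                        // no non-face samples in this leaf
    double score = 0.5 * std::log(pos_w / neg_w);
    std::printf("score = %g\n", score);        // ~11.2, already large for a single cart
    return 0;
}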
