Exhaustive chaid #112

Closed
KamilGos opened this issue Oct 16, 2020 · 8 comments

Comments

@KamilGos

KamilGos commented Oct 16, 2020

What would need to change in your code to get an "Exhaustive CHAID" version of this algorithm?

Here is the explanation: ftp://ftp.software.ibm.com/software/analytics/spss/support/Stats/Docs/Statistics/Algorithms/13.0/TREE-CHAID.pdf

@Rambatino
Owner

Hi @KamilGos, the only difference is in how the predictor categories are merged, right? So the change would go in the best_splits() method, which would exhaustively search for the best pair.

The option would need to be passed down, or it could even be implemented as a new ExhaustiveCHAID class. Either way, it shouldn't be too difficult.
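A minimal, standalone sketch of the merging difference discussed here, assuming a nominal predictor and a continuous dependent variable compared with a one-way ANOVA F-test. As I read the linked SPSS document, standard CHAID stops merging once the most similar pair of categories is significantly different, while Exhaustive CHAID keeps merging the most similar pair until only two groups remain and then picks the grouping with the most significant test. This is not the library's best_splits() or best_con_split() code; `exhaustive_merge`, `anova_p` and `f_sf` are hypothetical names, and the Bonferroni adjustment and category-splitting checks of the full algorithm are omitted for brevity.

```python
import math
from itertools import combinations


def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    # Continued fraction for the incomplete beta function (Numerical Recipes style).
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the m-th step.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    # Regularized incomplete beta function I_x(a, b).
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    # Upper-tail probability P(F > f) of an F(df1, df2) distribution.
    if f <= 0.0:
        return 1.0
    return _betainc(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def anova_p(samples):
    # One-way ANOVA F-test p-value across a list of samples of y values.
    k = len(samples)
    n = sum(len(s) for s in samples)
    means = [sum(s) / len(s) for s in samples]
    grand = sum(sum(s) for s in samples) / n
    ssb = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    ssw = sum((y - m) ** 2 for s, m in zip(samples, means) for y in s)
    df1, df2 = k - 1, n - k
    if df1 < 1 or df2 < 1:
        return 1.0  # too little data to test; treat as "no difference"
    if ssw == 0.0:
        return 0.0 if ssb > 0.0 else 1.0
    return f_sf((ssb / df1) / (ssw / df2), df1, df2)


def exhaustive_merge(data):
    # data maps each (nominal) predictor category to its list of y values.
    def pooled(group):
        return [y for cat in group for y in data[cat]]

    def overall_p(groups):
        return anova_p([pooled(g) for g in groups])

    groups = [[cat] for cat in data]
    history = [([list(g) for g in groups], overall_p(groups))]
    # Unlike standard CHAID, keep merging until only two groups remain,
    # recording every intermediate grouping.
    while len(groups) > 2:
        # Merge the pair whose y values differ least significantly (largest p).
        i, j = max(combinations(range(len(groups)), 2),
                   key=lambda ij: anova_p([pooled(groups[ij[0]]),
                                           pooled(groups[ij[1]])]))
        merged = groups[i] + groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
        history.append(([list(g) for g in groups], overall_p(groups)))
    # Choose the grouping with the smallest p-value (no Bonferroni adjustment).
    best_grouping, _ = min(history, key=lambda item: item[1])
    return best_grouping, history


data = {"a": [1.0, 1.1, 0.9, 1.0], "b": [1.05, 0.95, 1.0, 1.1], "c": [5.0, 5.1, 4.9, 5.2]}
best, history = exhaustive_merge(data)
for grouping, p in history:
    print(grouping, p)
```

The F-distribution tail is computed with a continued fraction so the sketch needs only the standard library; with SciPy available, `scipy.stats.f_oneway` would give the same one-way ANOVA p-value.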

@KamilGos
Author

Yes, that is right.

OK, I will try then. It's just a little difficult for me to understand your best_con_split() function (I work with a continuous dependent variable), so I don't know exactly where I should put the changes, but I will try. Thanks.

@Rambatino
Owner

So, there are multiple ways to do this; I don't know off the top of my head which would be better (it's been a few years since I've looked at the code, as it's been production grade for a long time).

best_con_split() is a little verbose. But I wonder whether ind_var.possible_groupings(exhaustive=True) could be an easy solution (and then finding a way to pass that option down).

There's also room for using other statistical functions, more appropriate to your use case, when comparing two sets of continuous values. Not sure whether you've thought about that side of things?
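As one example of the kind of alternative statistic mentioned here, a permutation test compares two continuous samples on their difference in means without assuming normality. This is a swapped-in technique, not something the CHAID library is known to provide; `permutation_p` is a hypothetical helper, and whether it suits a given data set is a judgment call.

```python
import random
from itertools import combinations
from math import comb


def _mean(xs):
    return sum(xs) / len(xs)


def permutation_p(a, b, max_exact=20000, n_resamples=10000, seed=0):
    # Two-sided permutation test for a difference in means between samples a and b.
    pooled = list(a) + list(b)
    n, n_a = len(pooled), len(a)
    total = sum(pooled)
    observed = abs(_mean(a) - _mean(b))

    def diff(idx):
        # Absolute mean difference when the positions in idx are labelled "a".
        s_a = sum(pooled[i] for i in idx)
        return abs(s_a / n_a - (total - s_a) / (n - n_a))

    tol = 1e-12  # guards against floating-point ties with the observed split
    n_splits = comb(n, n_a)
    if n_splits <= max_exact:
        # Small samples: enumerate every relabelling exactly.
        hits = sum(diff(idx) >= observed - tol
                   for idx in combinations(range(n), n_a))
        return hits / n_splits
    # Larger samples: Monte Carlo estimate with a fixed seed for reproducibility.
    rng = random.Random(seed)
    hits = sum(diff(rng.sample(range(n), n_a)) >= observed - tol
               for _ in range(n_resamples))
    return (hits + 1) / (n_resamples + 1)


print(permutation_p([1, 2, 3, 4], [10, 11, 12, 13]))
```

Because it returns a p-value, a function like this could in principle stand in for a parametric test wherever a pairwise p-value decides which categories to merge; exact enumeration is only practical for small samples, hence the seeded Monte Carlo fallback.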

@KamilGos
Author

I'll try to use your idea.

Regarding your question: when I was working with your code using the simple CHAID method, it worked fine (exactly as expected), so I didn't think about using other statistical functions. Now I'm working with Exhaustive CHAID, and I thought I would just make a small change to the code and it would work, but I'm stuck analyzing the function I mentioned earlier. But yes, you may be right; there is probably a more appropriate approach (statistical function) for this. For now, I will try with what I have :)

Thanks

@Rambatino
Owner

Yeah, that function could probably be DRYed up a bit.

Let me know how you get on. I may have time to look at it at some point this week if you aren't able to crack it.

(although the unit tests should be useful for you)

@KamilGos
Author

Hi, I still can't figure it out. If you could try to make this modification, I would be very grateful! Thanks for thinking about it anyway.

@Rambatino
Owner

@KamilGos can you have a look at: #113 🙏

@KamilGos
Author

> @KamilGos can you have a look at: #113 🙏

It works as expected. Thank you so much! I really appreciate your effort.


2 participants