Raised error for one-hot encoded data with genetic method #446

Open
Gradwanderer opened this issue Oct 24, 2024 · 1 comment


@Gradwanderer

I have preprocessed a dataset with one-hot encoding for all categorical features. The data now has more than 30 binary (one-hot encoded, 0/1) features. If I try to run the "genetic" method, I get the error shown below.

Do I need to specify the (0,1)-encoded features explicitly?
Switching to the "random" method does produce CFEs, but I suspect that random sampling is not the best way to obtain the best possible CFEs.

  File "C:\CFE.py", line 158, in gen_cfe
    e1 = exp.generate_counterfactuals(downsampled_dataset, total_CFs=10, desired_class="opposite")# "opposite")
  File "C:.venv\Lib\site-packages\dice_ml\explainer_interfaces\explainer_base.py", line 186, in generate_counterfactuals
    res = self._generate_counterfactuals(
  File "C:.venv\Lib\site-packages\dice_ml\explainer_interfaces\dice_genetic.py", line 270, in _generate_counterfactuals
    self.num_output_nodes = self.model.get_num_output_nodes2(query_instance)
  File "C:.venv\Lib\site-packages\dice_ml\model_interfaces\base_model.py", line 70, in get_num_output_nodes2
    return self.get_output(input_instance).shape[1]
  File "C:.venv\Lib\site-packages\dice_ml\model_interfaces\base_model.py", line 54, in get_output
    return self.model.predict_proba(input_instance)
  File "C:.venv\Lib\site-packages\pyod\models\base.py", line 213, in predict_proba
    test_scores = self.decision_function(X)
  File "C:.venv\Lib\site-packages\pyod\models\pca.py", line 300, in decision_function
    cdist(X, self.selected_components_) / self.selected_w_components_,
  File "C:.venv\Lib\site-packages\scipy\spatial\distance.py", line 3006, in cdist
    return cdist_fn(XA, XB, out=out, **kwargs)
ValueError: Unsupported dtype object
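
The last frame indicates that the array reaching SciPy's `cdist` has dtype `object`, which can happen when the 0/1 columns of the query instance are stored as object or mixed types rather than as numbers. A minimal sketch of one possible workaround, assuming the detector only needs numeric input, is to cast the data to float before `predict_proba` is called. `FloatCastingModel` and `DummyDetector` are hypothetical names used for illustration only; they are not part of DiCE or PyOD.

```python
import numpy as np


class FloatCastingModel:
    """Hypothetical wrapper: casts input to float64 before delegating."""

    def __init__(self, model):
        self.model = model

    def predict_proba(self, X):
        # np.asarray accepts DataFrames, lists, and arrays. dtype=float turns
        # object-typed 0/1 values into float64 and raises on non-numeric values.
        return self.model.predict_proba(np.asarray(X, dtype=float))


class DummyDetector:
    """Stand-in for the real detector; rejects object arrays like cdist does."""

    def predict_proba(self, X):
        if X.dtype == object:
            raise ValueError("Unsupported dtype object")
        score = X.mean(axis=1)
        return np.column_stack([1.0 - score, score])


# One-hot columns stored with dtype object, as can happen after preprocessing.
X_obj = np.array([[0, 1, 1], [1, 0, 0]], dtype=object)
proba = FloatCastingModel(DummyDetector()).predict_proba(X_obj)
```

Under the same assumption, casting the one-hot columns of the DataFrame to a numeric dtype before handing it to DiCE may avoid the need for a wrapper.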

@Gradwanderer

Just to add some further "problems" with my setting.
When I switch to random sampling, the DiCE package generates CFEs that are not possible.
For example, I have the feature sex with the values (male, female, other), encoded as two binary columns, Male (0,1) and Female (0,1). With this encoding, the choice "other" corresponds to a data point with [0,0]; a data point with [1,1] does not exist. I still get such solutions, because I do not know how to specify which changes are allowed. Is there a solution to this specific problem, or do I need to block changes to such features?
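
One way to handle the impossible [1,1] combinations after generation is to discard any counterfactual in which more than one column of a one-hot group is set. The sketch below assumes the counterfactuals are available as plain dictionaries and that an all-zero group is valid (the "other" case described above). `violates_one_hot` and `filter_counterfactuals` are hypothetical helper names, not DiCE functions.

```python
def violates_one_hot(row, groups):
    """Return True if any group of mutually exclusive columns has more than one 1."""
    return any(sum(row[col] for col in cols) > 1 for cols in groups)


def filter_counterfactuals(rows, groups):
    """Keep only the rows that respect every one-hot group."""
    return [row for row in rows if not violates_one_hot(row, groups)]


# Each tuple lists the columns produced from one categorical feature.
groups = [("Male", "Female")]

candidates = [
    {"Male": 1, "Female": 0},  # male
    {"Male": 0, "Female": 0},  # "other", valid under this encoding
    {"Male": 1, "Female": 1},  # impossible combination
]

valid = filter_counterfactuals(candidates, groups)
```

To block changes to such columns altogether, `generate_counterfactuals` accepts a `features_to_vary` argument listing the only features DiCE may modify; leaving the Male and Female columns out of that list keeps them fixed.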
