Fiber Notebook line 15 - train test split #552
@wd15 I added a generic transformer to fix the issue here. When I use full PCA, it gives this error: ValueError: operands could not be broadcast together with shapes (882,882) (640,1) (882,882). It is the first GenericTransformer that causes the issue. Randomized PCA does not have the same issue, but it doesn't give good results. There is something wrong with the dask randomized PCA, since the sklearn randomized PCA doesn't have this problem. Any suggestions for the broadcast issue?
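For reference, the message itself comes from NumPy's broadcasting rule: shapes are compared from the last dimension backwards, and each pair of dimensions must be equal or one of them must be 1. The minimal sketch below checks that rule by hand on the shapes from the error. `can_broadcast` is a hypothetical helper, not part of NumPy or pymks, and the sketch only illustrates why (882,882) and (640,1) are incompatible; it does not reproduce the dask PCA code path.

```python
import numpy as np


def can_broadcast(shape_a, shape_b):
    """Hypothetical helper applying NumPy's rule: compare shapes right to
    left; each pair of dimensions must be equal or one of them must be 1."""
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True


print(can_broadcast((882, 882), (640, 1)))  # False: 882 vs 640 clash
print(can_broadcast((882, 882), (882, 1)))  # True: 882 vs 1 is allowed

# Multiplying arrays with the shapes from the error raises a ValueError
try:
    np.ones((882, 882)) * np.ones((640, 1))
except ValueError as err:
    print("ValueError:", err)
```

The trailing dimensions (882 and 1) are compatible; it is the leading pair (882 and 640) that cannot be reconciled.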
What's happening here is that the number of features is larger than the number of samples by the time the data reaches the PCA, and that breaks the "full" PCA when using dask. If you reduce the amount of data coming out of the correlations, it will work. Use this function as the reshape function:
and you'll see it. For example, with a cutoff of 5 and 2 correlations, the data reaching the PCA has 320 samples and 242 features.
Since 320 > 242, things work. If you change the settings so that the number of features exceeds 320, it breaks: for example, with 3 correlations or with cutoff=11.
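The feature count can be estimated before running the pipeline, assuming each two-point correlation is cropped to 2 × cutoff + 1 points along every spatial axis (consistent with 242 = 2 × 11² for cutoff=5 and 2 correlations). Both helper names below are hypothetical, and `flatten_correlations` is only a stand-in for a reshape function, not the one referred to above.

```python
import numpy as np


def n_correlation_features(cutoff, n_correlations, ndim=2):
    """Hypothetical helper: flattened feature count, ASSUMING each
    correlation is cropped to (2 * cutoff + 1) points per spatial axis."""
    return (2 * cutoff + 1) ** ndim * n_correlations


def flatten_correlations(x):
    """Hypothetical reshape function: (n_samples, ..., n_correlations)
    -> (n_samples, n_features), printing both shapes."""
    flat = x.reshape(x.shape[0], -1)
    print(x.shape, "->", flat.shape)
    return flat


n_samples = 320
for cutoff, n_corr in [(5, 2), (5, 3), (11, 2)]:
    n_feat = n_correlation_features(cutoff, n_corr)
    status = "ok" if n_samples > n_feat else "features exceed samples"
    print(f"cutoff={cutoff}, correlations={n_corr}: {n_feat} features ({status})")

# Dummy correlations with the cutoff=5, 2-correlation shape
flat = flatten_correlations(np.zeros((n_samples, 11, 11, 2)))
```

Under that assumption, the loop gives 242 features for the working case and 363 and 1058 for the two breaking cases.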
Should we raise a warning in the two-point correlations when the number of features is greater than the number of samples?
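One way such a warning could look, as a hypothetical sketch: count the flattened features per sample and warn when they exceed the number of samples. The function name and message are illustrative, not an existing pymks API, and the sketch assumes samples lie along axis 0.

```python
import warnings

import numpy as np


def check_features_vs_samples(x):
    """Hypothetical check: warn when the flattened feature count exceeds
    the sample count (axis 0). Returns True if the warning was raised."""
    n_samples = x.shape[0]
    n_features = int(np.prod(x.shape[1:]))
    too_many = n_features > n_samples
    if too_many:
        warnings.warn(
            f"{n_features} features exceed {n_samples} samples; "
            "a 'full' dask PCA downstream may fail. Consider a smaller "
            "cutoff or fewer correlations.",
            UserWarning,
        )
    return too_many
```

In the thread's example, an input of shape (320, 11, 11, 3) would trigger the warning, while (320, 11, 11, 2) would not.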
Looks like we have this repaired now.
(Line 15) We create the train/test split and get a flat array, which we then pass as the input to the pipeline. Ideally, the input should be a 2D microstructure. As it stands, we are converting the image to a 1D structure and running the pipeline on 1D microstructures.
We need to take a look at that.
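A minimal sketch of splitting while keeping each sample as a 2D microstructure, assuming the data is stored as an array of shape (n_samples, nx, ny). The array sizes, the 50/50 split, and `split_keep_2d` are hypothetical. scikit-learn's train_test_split also indexes along the first axis, so it should accept the 3D array directly as well. This sketch does not check whether the rest of the pipeline accepts 2D inputs at that step.

```python
import numpy as np


def split_keep_2d(x, y, test_size=0.5, seed=0):
    """Hypothetical helper: split along the sample axis (axis 0) without
    flattening, so each sample stays an (nx, ny) microstructure."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(x.shape[0])
    n_test = int(round(test_size * x.shape[0]))
    test, train = idx[:n_test], idx[n_test:]
    return x[train], x[test], y[train], y[test]


# Hypothetical data: 640 binary 21 x 21 microstructures, one scalar property each
rng = np.random.default_rng(42)
x = rng.integers(0, 2, size=(640, 21, 21))
y = rng.random(640)

x_train, x_test, y_train, y_test = split_keep_2d(x, y)
print(x_train.shape, x_test.shape)  # (320, 21, 21) (320, 21, 21)

# If a split has already produced flat rows, they can be restored to 2D
x_flat = x_train.reshape(x_train.shape[0], -1)
x_restored = x_flat.reshape(-1, 21, 21)
```

Reshaping back only works if the original image dimensions (here 21 × 21) are known, which is one reason to avoid flattening before the split.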