Strange error that only happens with certain input dimensions #262
Hi @Mithrillion, thanks for your detailed bug report: this is very unexpected indeed! Until then, and in any case, I wouldn't advise you to use KeOps with such high-dimensional features. As detailed in our NeurIPS paper, a good rule of thumb is that KeOps stops being that useful once the dimension D exceeds 50 or 100.
What do you think?
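One common way to follow this advice is to reduce the feature dimension before running the kNN. Below is a minimal NumPy sketch of PCA computed through a thin SVD. It is not code from this thread: the helper name `pca_reduce`, the target of 64 components and the random input are illustrative assumptions.

```python
import numpy as np


def pca_reduce(x: np.ndarray, n_components: int = 64) -> np.ndarray:
    """Project the rows of x, shape (N, D), onto their top principal components."""
    # PCA directions are defined for zero-mean data, so center each feature.
    x_centered = x - x.mean(axis=0, keepdims=True)
    # Thin SVD: the rows of vt are the principal directions, sorted by
    # decreasing singular value; only min(N, D) of them exist.
    _, _, vt = np.linalg.svd(x_centered, full_matrices=False)
    k = min(n_components, vt.shape[0])
    return x_centered @ vt[:k].T


# Hypothetical data: 204 samples with 2000 raw features each.
rng = np.random.default_rng(0)
x = rng.standard_normal((204, 2000))
x_low = pca_reduce(x, n_components=64)
print(x_low.shape)  # (204, 64)
```

Whether 64 components keep enough neighbourhood structure depends on the data. The explained-variance ratio (each squared singular value divided by their sum) is one way to check.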
@jeanfeydy Thanks for the response! Glad the cause of this strange problem has been found, and thanks for the advice on implementation. This is indeed more of a toy example. In practice, I have noticed similar issues at much lower dimensions (a few specific values between 64 and 512 for a particular input), and I usually work around them by changing the input dimension. The "huge" input dimension used here came up when I compared a naive kNN based on Euclidean distances between flattened inputs with one based on distances over a much more sensible feature set, using the same implementation; hence the unusual ratio of batch size to dimension. The original data is actually a time series with some redundant dimensions. But great advice still! I use PCA and UMAP frequently myself, and the batch size for my data is roughly in the range where KeOps is faster than index-based kNN methods. KeOps works well for me because I only need a sparse kNN matrix (the full pairwise matrix is far too large), and it manages memory and GPU utilisation better than any other tool I have. Thanks again!
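The thread does not say how the input dimension is changed in that workaround. One option that cannot alter the result, assuming plain Euclidean kNN, is to append all-zero feature columns: each one adds (0 - 0)^2 = 0 to every squared distance, so the kNN graph is unchanged up to floating-point rounding. The helpers `pad_features` and `sq_dists` are hypothetical, and whether padding actually sidesteps the KeOps error for a given D is an assumption that has not been verified here.

```python
import numpy as np


def pad_features(x: np.ndarray, extra: int = 1) -> np.ndarray:
    """Append `extra` all-zero feature columns: (N, D) -> (N, D + extra)."""
    # np.pad defaults to constant mode with value 0.
    return np.pad(x, ((0, 0), (0, extra)))


def sq_dists(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Dense (M, N) matrix of squared Euclidean distances (small inputs only)."""
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)


rng = np.random.default_rng(0)
x = rng.standard_normal((204, 30))
y = rng.standard_normal((150, 30))

d_orig = sq_dists(x, y)
d_pad = sq_dists(pad_features(x), pad_features(y))
# The zero columns contribute (0 - 0)**2 = 0, so distances agree up to rounding.
print(np.allclose(d_orig, d_pad))  # True
```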
Hi @Mithrillion,
Hello @Mithrillion and @jeanfeydy,
I have encountered an error that only happens when the input to a kNN formula has certain dimensions. Here is my test code:
This returns the following error (full trace below):
ValueError: [KeOps] Error : args must be c_variable or c_array instances (error at line 382 in file .../lib/python3.9/site-packages/keopscore/utils/code_gen_utils.py)
The same code runs for almost any other input dimensions, such as (204, 405 * 61+1), (204, 405 * 62) or (204, 405 * 60). The error persists if I change the formula to argKmin or Kmin only, and it happens with both the NumPy and torch bindings. Clearing the KeOps cache does not seem to help either. My PyKeOps version is the 2.1 release, my Python version is 3.9, and here is the nvcc message:
The system is running Ubuntu 22.04. The GPU used is an RTX3090.
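For cross-checking what a KeOps `Kmin`/`argKmin` kNN returns on shapes that do work, a plain NumPy brute-force reference can be handy. This is a minimal sketch under stated assumptions: squared Euclidean distance, the hypothetical helper name `knn_bruteforce`, and random data with one of the working shapes listed above. It is not the reporter's test code and does not call KeOps.

```python
import numpy as np


def knn_bruteforce(x: np.ndarray, y: np.ndarray, k: int, chunk: int = 64):
    """For each row of x (M, D), return the k smallest squared Euclidean
    distances to the rows of y (N, D) and their indices, both shaped (M, k)."""
    y_sq = (y ** 2).sum(axis=1)  # (N,)
    all_d, all_i = [], []
    for start in range(0, x.shape[0], chunk):
        xb = x[start:start + chunk]
        # |x - y|^2 = |x|^2 - 2 x.y + |y|^2, one block of rows at a time so the
        # dense distance matrix never exceeds (chunk, N).
        d = (xb ** 2).sum(axis=1)[:, None] - 2.0 * (xb @ y.T) + y_sq[None, :]
        d = np.maximum(d, 0.0)  # clip tiny negatives caused by cancellation
        # argpartition gathers the k smallest per row (unordered), then sort them.
        part = np.argpartition(d, k - 1, axis=1)[:, :k]
        part_d = np.take_along_axis(d, part, axis=1)
        order = np.argsort(part_d, axis=1)
        all_i.append(np.take_along_axis(part, order, axis=1))
        all_d.append(np.take_along_axis(part_d, order, axis=1))
    return np.concatenate(all_d), np.concatenate(all_i)


# Random data with one of the shapes reported to work: (204, 405 * 61 + 1).
rng = np.random.default_rng(0)
x = rng.standard_normal((204, 405 * 61 + 1))
dists, idx = knn_bruteforce(x, x, k=10)
print(dists.shape, idx.shape)  # (204, 10) (204, 10)
```

The chunking keeps peak memory proportional to `chunk * N` rather than `M * N`, echoing the thread's point that only the sparse kNN result, not the full pairwise matrix, is needed.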
Edit: the error is reproducible on Colab: https://colab.research.google.com/drive/1Zu93CDL6KTLOV_skg1uUJ1rcAdRoUJFI?usp=sharing
The full trace is: