[Batch Prediction] [Doc] Output jumps from 1/200 to 1/1 and "tench" output is suspicious #39028
Comments
Reproducible, this time I got
This time I printed out more sample predictions and saw the same memory address for two images which are definitely different:
So I wonder if the memory location must be the location of the batch or something, not the location of the image.
We have a related PR which may end up resolving the 1/1 progress bar issue: #39828
Ah, actually I think the 1/1 progress bar comes from a separate, unrelated call. Regarding the memory location, I don't think Ray is modifying the memory location here; we think it's purely related to PIL. Are you able to see otherwise when reading the images with just PIL and not through Ray? @architkulkarni
Sure, maybe the memory location is a distraction. The main point of the issue is that the output is baffling for a first-time user reading through the tutorial. It prints out five identical "tench" lines. If it's working as intended, then the doc or the sample script should be updated to make it more obvious that it worked, so the user can feel successful. Also, the progress bar jumping from 1/200 to 1/1 is confusing.
The 5 "tench" lines come from displaying each of the 5 examples from Would the most helpful addition here be to add something like "Successfully loaded 5 samples" at the end? Or any other suggestion to make it obvious it succeeded (without having to rely on displaying images on the screen, which may or may not be possible depending on how the user is running the code). I think |
I see, I missed the fact that you can scroll through the images (there's an invisible scroll bar). The progress bar is still confusing though, because it goes from 0/200 to 1/200 and then stops (even assuming the 1/1 is from an unrelated call, as you suggested).
Ideally there would be more than one type of fish in the output so we can see it classifying different fish... Not sure if there's a way to guarantee that though.
@architkulkarni can we just chat in person to determine priority?
I think updating the example will be pretty quick; we just need to understand what would be the best way to show that it succeeded.
What happened + What you expected to happen
Running the tutorial from https://docs.ray.io/en/latest/data/examples/huggingface_vit_batch_prediction.html, we want the output to confirm that the prediction worked and that it used GPUs. But the current output is a little suspicious:
The five sample images were all classified as "tench", and there are only two distinct memory locations out of the five (0x7B37546CF7F0 and 0x7B37546AE430). (No longer worried about this, but still curious how different images could have the same memory location.) I'm not sure if it's a setup issue, an issue with the script, a bug in Ray, or if it's working as expected.
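One possible (unconfirmed) explanation for the repeated addresses: the hex value in a PIL Image's repr is just CPython's id() of the object, and CPython can reuse the memory of an object that has already been freed. A minimal sketch (file names are placeholders):

```python
from PIL import Image

# PIL's Image.__repr__ embeds id(self) as the "at 0x..." hex value.
# If the first image is no longer referenced when the second is created,
# CPython may reuse the freed memory, so two different images can print
# the same address.
print(Image.open("a.jpg"))  # e.g. <PIL.Image.Image image mode=RGB ... at 0x7B37546CF7F0>
print(Image.open("b.jpg"))  # may show the same 0x... address
```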
Versions / Dependencies
Ray 2.6.3, Kuberay 0.6.0
Reproduction script
For more details about how it was run, see ray-project/kuberay#1361. I know the GPUs were available and being used (for example, we got CUDA out-of-memory errors until we reduced the batch size). The use case is a KubeRay tutorial that uses this workload as an example of how to run a real workload on KubeRay.
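For reference, a rough sketch of the kind of call we were running, adapted from the linked tutorial (the dataset path, actor pool size, and batch size below are assumptions for illustration, not the exact values we used; reducing `batch_size` is what made the CUDA OOM errors go away):

```python
from typing import Dict
import numpy as np
from PIL import Image
from transformers import pipeline
import ray

class ImageClassifier:
    def __init__(self):
        # ViT model used by the tutorial; device=0 pins the pipeline to the GPU.
        self.classifier = pipeline(
            "image-classification", model="google/vit-base-patch16-224", device=0
        )

    def __call__(self, batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
        images = [Image.fromarray(arr) for arr in batch["image"]]
        outputs = self.classifier(images, top_k=1)
        batch["label"] = np.array([out[0]["label"] for out in outputs])
        return batch

# Placeholder dataset path; the tutorial reads an ImageNet-style image folder from S3.
ds = ray.data.read_images("s3://anonymous@air-example-data-2/imagenette2/val/")

predictions = ds.map_batches(
    ImageClassifier,
    compute=ray.data.ActorPoolStrategy(size=4),  # one actor per GPU worker (assumed pool size)
    num_gpus=1,                                  # give each actor a full GPU
    batch_size=16,                               # reduced from the default to avoid CUDA OOM
)
print(predictions.take(5))
```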
Issue Severity
None