Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch from suffix checks to archive checks #9963

Merged
merged 4 commits into from
Oct 28, 2022
Merged

Switch from suffix checks to archive checks #9963

merged 4 commits into from
Oct 28, 2022

Conversation

kalenmike
Copy link
Contributor

@kalenmike kalenmike commented Oct 28, 2022

When downloading an archive from a URL shortener service the extension is not added. As the archive checks where based on the extension they were not being unzipped.

This uses archive packages to check the file type and not rely on the extension.

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Improvement to dataset downloading by supporting various compression formats.

📊 Key Changes

  • Expanded dataset download capabilities with added support for tar files.
  • Refactored conditionals by utilizing is_zipfile and is_tarfile functions for file type checking instead of suffix string comparison.

🎯 Purpose & Impact

  • Purpose: Provide a more robust way to handle different archive file types when downloading datasets for YOLOv5, making the process more error-proof and accommodating a wider range of compressed file formats.
  • Impact: Users will benefit from a more flexible dataset downloading process, with the tool automatically recognizing zip and tar archives without relying on the file extension alone. This could prevent issues when files do not have the expected file extension but are valid archives.

@kalenmike
Copy link
Contributor Author

@glenn-jocher I saw you were converting data to a string before so I wasn't sure if it sometimes came in some strange format.

@glenn-jocher
Copy link
Member

@kalenmike ya I just checked now to verify, they accept either str or Path which is nice. I cleaned up the imports a bit and I think it's good to go!

I'll wait for CI and then merge to master and update ultralytics/HUB branch.

@glenn-jocher glenn-jocher merged commit 575055c into ultralytics:master Oct 28, 2022
@glenn-jocher
Copy link
Member

@glenn-jocher I saw you were converting data to a string before so I wasn't sure if it sometimes came in some strange format.

You're right by the way that many functions do not accept Path filenames, only strings, so I usually check beforehand. It's a bit annoying as we generally needs Path objects for all filenames for multi-os compatibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants