There is no failed activity log when previewing one invalid avro file #7478

Closed
v-xianya opened this issue Nov 3, 2023 · 6 comments
Labels
🪲 regression: Issue was working in a previous version
⚙️ adls gen2: Related to hierarchical namespaces (ADLS Gen 2)
⚙️ blobs: Related to blob storage
⚙️ files: Related to file storage
🧪 testing: Found through regular testing
✅ merged: A fix for this issue has been merged

Comments

@v-xianya
Member

v-xianya commented Nov 3, 2023

Storage Explorer Version: 1.33.0-dev (96)
Build Number: 20231103.1
Branch: main
Platform/OS: Windows 10/Linux Ubuntu 22.04/MacOS Sonoma 14.1 (Apple M1 Pro)
Architecture: x64/x64/arm64
How Found: From running test cases
Regression From: Previous release (1.32.0)

Steps to Reproduce

  1. Expand one storage account -> Blob Containers.
  2. Create a blob container -> Upload an invalid avro file.
  3. Select the file -> Click 'Preview'.
  4. Check whether there is a failed activity log.

Expected Experience

There is a failed activity log.
image

Actual Experience

There is no activity log.

@v-xianya v-xianya added 🧪 testing Found through regular testing ⚙️ blobs Related to blob storage ⚙️ files Related to file storage 🪲 regression Issue was working in a previous version ⚙️ adls gen2 Related to hierarchical namespaces (ADLS Gen 2) labels Nov 3, 2023
@MRayermannMSFT MRayermannMSFT added this to the 1.33.0 milestone Nov 3, 2023
@craxal
Contributor

craxal commented Nov 3, 2023

@v-xianya I changed the library we use to parse Avro data in order to get better support for date formatting. There are a couple of things I've discovered:

  • The library throws an error if there's a problem with the header data (missing magic bits, bad schema, etc.)
  • The library does not throw an error if the invalid data is at the end of the file.
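For reference, the "magic bits" mentioned above are the first four bytes of an Avro object container file: ASCII `O`, `b`, `j`, followed by the byte `1`. Below is a minimal sketch of checking for them before handing a file to a parser. `hasAvroMagic` is a hypothetical helper written for illustration; it is not part of Storage Explorer or of the parsing library.

```typescript
// Avro object container files start with the bytes 'O', 'b', 'j', 0x01.
const AVRO_MAGIC: number[] = [0x4f, 0x62, 0x6a, 0x01];

// Hypothetical helper (an assumption for illustration, not an existing API).
// Returns true only if the first four bytes match the Avro magic sequence.
function hasAvroMagic(bytes: Uint8Array): boolean {
  if (bytes.length < AVRO_MAGIC.length) {
    return false;
  }
  return AVRO_MAGIC.every((expected, i) => bytes[i] === expected);
}
```

A check like this only covers the missing-magic case. A bad schema or corrupt data blocks later in the file would still need to be caught by the parser itself.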

Can you:

  1. Share the file that should be resulting in a failed activity.
  2. Tell me what the preview tab actually shows if there is no error.
  3. Describe the contents of the file and how the data was generated.
    • What parts of the data are valid? What parts are invalid?
    • Is the schema at the head of the file able to be parsed? Is the invalid part of the data at the end of the file?

It's possible the newer library is more robust than the previous one, which would mean fewer errors.

@v-xianya
Member Author

v-xianya commented Nov 6, 2023

@craxal I shared the invalid Avro file on Teams.
The preview tab shows the following:
image

@craxal
Contributor

craxal commented Jan 30, 2024

@v-xianya Some changes were made recently. Does this still repro?

@v-xianya
Member Author

Hi @craxal, I verified this issue on the main build 20240130.14. It still reproduces.

@craxal
Contributor

craxal commented Jan 31, 2024

After debugging into the Avro parsing library, it would seem that the invalid Avro file has a bad header. I found that the library mysteriously does not throw errors in this case. As a result, we simply don't get a schema object, and an empty set of columns and rows is returned.

To work around this issue, I think it's best to throw our own error. If the schema remains undefined when the library returns, it's likely the header was invalid, and so an error would be appropriate.
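The workaround described above could look roughly like the sketch below. This is not the merged fix; the `ParsedAvro` shape and the `requireSchema` function are hypothetical names introduced for illustration, and the actual preview code and library types are not shown in this thread.

```typescript
// Hypothetical result shape (an assumption): what our code holds after the
// parsing library returns. The schema stays undefined if no header was read.
interface ParsedAvro {
  schema?: unknown;
  columns: string[];
  rows: unknown[][];
}

// Hypothetical guard: if the library returned without producing a schema,
// treat that as an invalid header and throw our own error so the failure
// can be reported instead of showing an empty preview.
function requireSchema(result: ParsedAvro): ParsedAvro {
  if (result.schema === undefined) {
    throw new Error("Unable to read the Avro schema. The file header is likely invalid.");
  }
  return result;
}
```

The guard runs after parsing finishes, so it does not depend on the library reporting the problem itself. A valid file with zero records would still pass, because its schema is defined even though the row set is empty.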

I've opened an issue for the library: mtth/avsc#448.

@craxal craxal added the ✅ merged A fix for this issue has been merged label Feb 1, 2024
@craxal craxal closed this as completed Feb 1, 2024
@v-xianya
Member Author

v-xianya commented Feb 1, 2024

Verified this issue on the main build 20240201.4. Fixed.
image
