-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add ByteStream
metadata and other metadata to Documents
created by HTMLToDocument
#6304
feat: Add ByteStream
metadata and other metadata to Documents
created by HTMLToDocument
#6304
Conversation
Hey @awinml thanks so much for the contribution 🚀 |
Hey @awinml , this looks like a good contribution; thank you! I'm wondering if you spoke with someone who asked you to add progress bars - as far as I know, we don't add them (at least for now) and having only this component with progress bars would introduce inconsistencies. If you added progress bars on your own initiative, would you please update this PR to remove them? |
Thanks! @vblagoje I saw that I understand how this would introduce inconsistencies with the other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation-wise, looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason these unit tests are marked as integration? If not, would you please update the integration tests into unit tests by removing the existing markers. In future, tests will be classified as unit tests by default unless marked as integration tests.
Additionally, consider using truthiness checks (e.g., if some_variable:) instead of is not None checks - if that's applicable. Seems like the code would work with both, but I just wanted to double check with you. And they are more compact and easier to read.
@vblagoje, You're right! I've made the updates you suggested. I have marked the tests as unit tests and replaced the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, looks good, let's 🚢 this one @awinml
…ated by `HTMLToDocument` (#6304) * Refactor HTMLToDocument * Add release notes * Add additional tests * remove progress bar * Add additional test for metadata * remove progress bar from release notes * Update tests * Use truthiness checks instead of is not None
Related Issues
HTMLToDocument
to addByteStream
metadata to Document #6275Proposed Changes:
ByteStream
metadata toDocument
metadata on creation of documents.meta
parameter in therun()
method, that allows the users to pass additional metadata to theDocuments
.How did you test it?
Unit tests, Integration tests and manual verification.
Notes for the reviewer
Usage examples showcasing the various methods for adding metadata to the documents can be viewed in this Colab Notebook. It also has an example utilizing HTMLToDocument with LinkContentFetcher.
Checklist
fix:
,feat:
,build:
,chore:
,ci:
,docs:
,style:
,refactor:
,perf:
,test:
.