Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul pdfobj #176

Closed
wants to merge 17 commits into from
Closed

Conversation

cormacrelf
Copy link
Collaborator

@cormacrelf cormacrelf commented Oct 29, 2019

Depends on #175

  • 🎉 PdfDict uses an insertion-order-preserving hash map, from indexmap 🎉
  • PdfName uses CString under the hood
  • Streams and arrays get a much more convenient API, and more things
    operate directly on streams
  • Rustified filter_decoded

Note that review is a bit annoying as I abstracted away *mut pdf_obj behing a type alias, PdfObjRef. You can basically click 'viewed' on every file except dpx_pdfobj.rs.

@chirsz-ever
Copy link

Do we have a plan to use String everywhere in the near future?

@cormacrelf
Copy link
Collaborator Author

We need to provide &'a CStr sometimes, so that's a non-local change.

@burrbull
Copy link
Collaborator

Do we have a plan to use String everywhere in the near future?

Pdf is byte-oriented format, so:

*const i8 -> CString(CStr) -> Vec<u8>(&[u8])

String(&str) where it is appropriate

@burrbull
Copy link
Collaborator

lopdf uses LinkedHashMap

@cormacrelf
Copy link
Collaborator Author

lopdf uses LinkedHashMap

linked-hash-map is unmaintained, but they do the same thing, so I went for indexmap. It only takes a few moments to swap them over. I'll push and see what happens to the tests.

@cormacrelf
Copy link
Collaborator Author

Ok, it's much worse. Switching back.

@mulimoen
Copy link
Collaborator

I thing the hashmap can hold multiple values with the same key. You could try multimap

@cormacrelf
Copy link
Collaborator Author

@mulimoen Hm? The old linked list code would loop until it found the same key, and replace it, and pdf_obj_release the existing value. If it didn't find any, it would tack it on the end. There were never any dupes. IndexMap has the same properties. It's just a hashmap that preserves insertion order.

@mulimoen
Copy link
Collaborator

@cormacrelf I assumed it was using ht_lookup_table, but seems I was wrong.

(Takes on `png` dependency -- this will be shared by `image`, eventually, hopefully.)
dpx/src/dpx_pdfobj.rs Outdated Show resolved Hide resolved
@cormacrelf
Copy link
Collaborator Author

Ok, I think that's going to pass, we'll see in 125 minutes. But I'm mostly happy.

@cormacrelf cormacrelf marked this pull request as ready for review October 30, 2019 16:33
@cormacrelf
Copy link
Collaborator Author

Well, if anyone can figure out what's wrong, let me know, otherwise I'll try bisecting into multiple PRs.

@burrbull
Copy link
Collaborator

It's better to split this PR on several in any case.

@@ -123,9 +129,49 @@ impl pdf_obj {
pub fn is_array(&self) -> bool {
PdfObjType::from(self.typ) == PdfObjType::ARRAY
}
pub fn get_array_mut(&mut self) -> &mut PdfArray {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe better as_array_mut like in lopdf?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants