
✨ Page dimensions analysis task #349

Merged (19 commits into stumpapp:experimental) on Jun 10, 2024

Conversation

@JMicheli (Collaborator) commented on Jun 3, 2024:

This pull request aims to close #180.

I've added a new type of task, AnalyzePageDimensions, to the media analysis job created in #307. This task attempts to determine the dimensions of each page in a media item and then stores its findings in the database so that they can be accessed through the server's API.

The database structure for storing the dimensions data is shown below:

model PageDimensions {
  id String @id @default(cuid())
  dimensions String
  metadata MediaMetadata @relation(fields: [metadata_id], references: [id], onDelete: Cascade)
  metadata_id String @unique

  @@map("page_dimensions")
}

An individual height/width pair is represented in Rust by the PageDimension type.
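
For reference, here is a minimal sketch of what such a type might look like; the derives and field types here are my assumptions, not necessarily the PR's actual definition:

// A sketch of the PageDimension type described above (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub struct PageDimension {
  pub height: u32,
  pub width: u32,
}

impl PageDimension {
  pub fn new(height: u32, width: u32) -> Self {
    Self { height, width }
  }
}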

One of the concerns discussed when planning this addition was that a very long book would produce a very long list of dimensions. That possibility is what motivated the separate table, which keeps dimensions optional when loading metadata. For the same reason, though, the string encoding a vector of PageDimensions could potentially be very large.

To avoid database bloat for users with large libraries/long books, I've implemented a simple compression scheme that serializes Vec<PageDimension> as follows:

// Comma and semicolon-separated serialization of height/width pairs.
let list = vec![
  PageDimension::new(800, 600),
  PageDimension::new(1920, 1080),
  PageDimension::new(800, 600),
];
let serialized_list = dimension_vec_to_string(list);
assert_eq!(serialized_list, "800,600;1920,1080;800,600");

// Repeated values are compressed with shorthand.
let list = vec![
  PageDimension::new(800, 600),
  PageDimension::new(800, 600),
  PageDimension::new(800, 600),
];
let serialized_list = dimension_vec_to_string(list);
assert_eq!(serialized_list, "3>800,600");

I think this should be effective, since most books have uniform page sizes (excepting the cover and the occasional two-page spread), so most books won't actually need to store much data in the database. The deserialization function I wrote is copy-free and fast, so I don't think it will introduce any unwanted overhead.
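
To make the scheme concrete, here is a minimal sketch of how the two directions could be implemented, assuming run-length encoding of consecutive identical pages; the function names follow the example above, but the bodies are illustrative rather than the PR's exact code:

// Sketch: run-length encode consecutive identical pages as "N>h,w",
// joining entries with ';'; a unique page serializes as plain "h,w".
fn dimension_vec_to_string(list: Vec<PageDimension>) -> String {
  let mut parts = Vec::new();
  let mut iter = list.into_iter().peekable();
  while let Some(dim) = iter.next() {
    let mut count = 1;
    while iter.peek() == Some(&dim) {
      iter.next();
      count += 1;
    }
    if count > 1 {
      parts.push(format!("{}>{},{}", count, dim.height, dim.width));
    } else {
      parts.push(format!("{},{}", dim.height, dim.width));
    }
  }
  parts.join(";")
}

// Sketch: decode without copying string data; every token is a borrowed
// &str slice into the input. Returns None on malformed input.
fn dimension_vec_from_string(s: &str) -> Option<Vec<PageDimension>> {
  let mut out = Vec::new();
  for part in s.split(';') {
    let (count, pair) = match part.split_once('>') {
      Some((n, rest)) => (n.parse::<usize>().ok()?, rest),
      None => (1, part),
    };
    let (h, w) = pair.split_once(',')?;
    let dim = PageDimension::new(h.parse().ok()?, w.parse().ok()?);
    out.extend(std::iter::repeat(dim).take(count));
  }
  Some(out)
}

Because split and split_once return borrowed slices into the input string, decoding needs no intermediate string allocations beyond the output vector itself.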

Finally, the API matches the first option discussed on the issue:

- curl -X GET http://stump.cloud/api/v1/media/:id/page/:page/dimensions returns the dimensions of a single page.
- curl -X GET http://stump.cloud/api/v1/media/:id/dimensions returns the list of dimensions for every page (0-indexed).

@aaronleopold (Collaborator) commented:

@JMicheli The compression scheme you outlined is simple and smart! I'm excited to give this a review. I'll try to make time after work sometime this week, but it might land on the weekend. As always, thanks for your time and contributions. I really appreciate it!

@JMicheli (Collaborator, Author) left a review comment:


Adding some comments re: concerns I had while writing this code.

Inline review comments (resolved):
- core/src/filesystem/media/analyze_media_job/utils.rs
- core/src/filesystem/media/analyze_media_job/mod.rs
@aaronleopold (Collaborator) left a review:


I had a few really low-priority comments/questions, but this looks great!

I'll give it another review for my approval stamp once the conflicts are resolved ✅

Edited to add: I meant to call out all of your comments throughout the code. They are very helpful during review and will be beneficial in the long run. I appreciate it!

Inline review comments (resolved):
- core/src/db/entity/metadata/page_dimension.rs (3 threads)
- core/src/filesystem/media/analyze_media_job/mod.rs
- core/src/db/entity/metadata/media_metadata.rs
- apps/server/src/routers/api/v1/media.rs (2 threads)
@JMicheli (Collaborator, Author) commented on Jun 8, 2024:

I'm working on getting things ready to merge. As is tradition, this part is often the most difficult for me.

@aaronleopold (Collaborator) commented:

No rush on my part! Reach out if you need anything, but I would guess all you need to do is update your feature branch with this repo's experimental branch, either directly if you have a remote set up, or by updating your fork's experimental branch and then merging that in.

@JMicheli (Collaborator, Author) commented on Jun 10, 2024:

Alright, I think that should do it. You were right, it just needed a little cargo clean to get things playing nicely again.

@aaronleopold (Collaborator) commented:

I fixed a couple of clippy lints and a missing Err, but this should be good to go once the experimental branch finishes building. I'll merge it shortly, thanks again!

@aaronleopold changed the title from "Add page dimensions analysis task" to "✨ Page dimensions analysis task" on Jun 10, 2024
@aaronleopold merged commit 3f447d5 into stumpapp:experimental on Jun 10, 2024 (3 checks passed)
@aaronleopold mentioned this pull request on Aug 15, 2024