Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File storage rewrite #2676

Merged
merged 89 commits into from
Jul 13, 2022
Merged

Conversation

WithoutPants
Copy link
Collaborator

@WithoutPants WithoutPants commented Jun 16, 2022

This PR targets the files-refactor branch, but it represents the culmination of all the previous work.

This PR is a significant rewrite of the underlying object model for scenes, images and galleries.

A simplified view of the object model is as follows:

image

File-specific fields have been removed or deprecated from scenes, images and galleries, replaced with relationships to files and folders. There is still a significant amount of code that assumes that these objects have one single file. In most cases, this is replaced with a call to PrimaryFile, which returns the primary file for the object, or nil if it doesn't have one.

The only major user-visible change as a result of this is handling of duplicate files. When a file is added, a scene/gallery/image will be added linking to the new file. If a scene/gallery/image linked to a file with the same primary fingerprint as the new file, then the new file will be added to the existing scene/gallery/image. If multiple files are attached to a single scene/gallery/image, then this will be reflected in the file info tab for the object:

image

The accordion view shows the details for each of the files.

Viewing the file always defaults to the primary file. The primary file cannot currently be changed to another linked file, but this functionality will be added in a future PR.

Known issues:

  • import/export functionality is currently disabled. Needs further design.
  • missing covers are not currently regenerated. Need to consider further, especially around scene cover redesign.

This PR is likely to have many regressions and will need extensive testing. This branch will be rebased when necessary to remain up to date against the develop branch.

This PR will allow future support for the following functionality:

  • file explorer view
  • scene to partial file association (for single movie files)
  • movie to file association

Note: please exercise caution when testing this PR, especially against existing systems.

Note also that this requires a rescan after migrating an existing system.

@WithoutPants WithoutPants added the feature Pull requests that add a new feature label Jun 16, 2022
@bnkai
Copy link
Collaborator

bnkai commented Jun 20, 2022

Very short test on a clean install (linux binary from the pr, install on cwd, all settings to default)


The Duplicated (phash) scene filter is broken

error querying aggregate fields: executing query: SELECT COUNT(temp.id) as total, SUM(temp.duration) as duration, SUM(temp.size) as size FROM (SELECT DISTINCT scenes.id, COALESCE(scenes_query.duration, 0) as duration, COALESCE(scenes_query.size, 0) as size FROM scenes INNER JOIN scenes_query ON scenes.id = scenes_query.id INNER JOIN (SELECT id FROM scenes JOIN (SELECT phash FROM scenes GROUP BY phash HAVING COUNT(phash) > 1) dupes on scenes.phash = dupes.phash) AS scph ON scenes.id = scph.id) as temp [[]]: no such column: phash

The FindDuplicateScenes from the scene duplicate checker is also broken

{"errors":[{"message":"no such column: phash","path":["findDuplicateScenes"]}],"data":null}

Not sure if it related to the PR
On the new install the default scan options are not updated
For example
2022-06-21_01-48
the scan is done according to the above options, the options reset to

2022-06-21_02-02

Not sure if its expected behaviour
If a duplicate (by oshash) is found during scan and phash generation is enabled then it's phash is calculated

Calculating fingerprints for /home/redacted path/file (copy 1).mkv ... 
/home/redacted path/file (copy 1).mkv  doesn't exist. Creating new file entry... 
Adding /home/redacted path/file (copy 1).mkv  to scene file.mkv 
[generator] generating phash sprite for  /home/redacted path/file (copy 1).mkv  

@bnkai
Copy link
Collaborator

bnkai commented Jun 21, 2022

More testing using latest binary after 76fffa5 and still the same clean install


Identify with stashdb as the source is broken

ERRO[2022-06-21 23:44:54] error scraping from stash-box https://stashdb.org/graphql: error querying stash-box using scene ID 36: already in transaction 
DEBU[2022-06-21 23:44:54] Unable to identify %!s(func() string=0xce4600) 
ERRO[2022-06-21 23:44:54] goroutine 12888 [running]:
runtime/debug.Stack()
	runtime/debug/stack.go:24 +0x65
github.com/stashapp/stash/pkg/sqlite.(*Database).Begin(0xc00043b6d0, {0x1b87520, 0xc0003579b0})
	github.com/stashapp/stash/pkg/sqlite/transaction.go:23 +0x45
github.com/stashapp/stash/pkg/txn.WithTxn({0x1b87520, 0xc0003579b0}, {0x1b8c438, 0xc0005960d0}, 0xc0004a70f8)
	github.com/stashapp/stash/pkg/txn/transaction.go:18 +0x72
github.com/stashapp/stash/pkg/scraper/stashbox.Client.FindStashBoxScenesByFingerprints({0xc00090a010, {0x1b8c438, 0xc0005960d0}, {{0x7fffc8580f68, 0xc000446e00}, {0x7fffc8580f88, 0x24d7b00}, {0x7fffc8580fd0, 0x24d7b60}, {0x7fffc8580ff8, ...}}, ...}, ...)
	github.com/stashapp/stash/pkg/scraper/stashbox/stash_box.go:130 +0xd6
github.com/stashapp/stash/pkg/scraper/stashbox.Client.FindStashBoxSceneByFingerprints({0xc00090a010, {0x1b8c438, 0xc0005960d0}, {{0x7fffc8580f68, 0xc000446e00}, {0x7fffc8580f88, 0x24d7b00}, {0x7fffc8580fd0, 0x24d7b60}, {0x7fffc8580ff8, ...}}, ...}, ...)
	github.com/stashapp/stash/pkg/scraper/stashbox/stash_box.go:117 +0x8b
github.com/stashapp/stash/internal/manager.stashboxSource.ScrapeScene({0xc00087e090, {0xc0008b2520, 0x0}}, {0x1b87520, 0xc0003579b0}, 0xc0005caea0)
	github.com/stashapp/stash/internal/manager/task_identify.go:255 +0xca
github.com/stashapp/stash/internal/identify.(*SceneIdentifier).scrapeScene(0xc0005caea0, {0x1b87520, 0xc0003579b0}, 0xc0002970e0)
	github.com/stashapp/stash/internal/identify/identify.go:70 +0xf8
github.com/stashapp/stash/internal/identify.(*SceneIdentifier).Identify(0x1b8c438, {0x1b87520, 0xc0003579b0}, {0x1b8c438, 0xc0005caea0}, 0xc0002970e0)
	github.com/stashapp/stash/internal/identify/identify.go:43 +0x47
github.com/stashapp/stash/internal/manager.(*IdentifyJob).identifyScene.func1()
	github.com/stashapp/stash/internal/manager/task_identify.go:149 +0x29d
github.com/stashapp/stash/pkg/job.(*Progress).ExecuteTask(0xc000332000, {0xc000051700, 0x74}, 0xc0004a76a0)
	github.com/stashapp/stash/pkg/job/progress.go:175 +0xcd
github.com/stashapp/stash/internal/manager.(*IdentifyJob).identifyScene(0xc000a8b500, {0x1b87520, 0xc0003579b0}, 0xc0002970e0, {0xc0008d7080, 0x1, 0x1})
	github.com/stashapp/stash/internal/manager/task_identify.go:133 +0x1ee
github.com/stashapp/stash/internal/manager.(*IdentifyJob).identifyAllScenes.func1(0x1b87520)
	github.com/stashapp/stash/internal/manager/task_identify.go:122 +0x95
github.com/stashapp/stash/pkg/scene.BatchProcess({0x1b87520, 0xc0003579b0}, {0x7fffc84f93a8, 0xc000446e00}, 0x30, 0xc0001f6200, 0xc0004a7ce8)
	github.com/stashapp/stash/pkg/scene/query.go:87 +0x1f6
github.com/stashapp/stash/internal/manager.(*IdentifyJob).identifyAllScenes(0xc000a8b500, {0x1b87520, 0xc0003579b0}, {0xc0008d7080, 0x1, 0x1})
	github.com/stashapp/stash/internal/manager/task_identify.go:117 +0x2c5
github.com/stashapp/stash/internal/manager.(*IdentifyJob).Execute.func1({0x1b87520, 0xc0003579b0})
	github.com/stashapp/stash/internal/manager/task_identify.go:56 +0x111
github.com/stashapp/stash/pkg/txn.WithTxn({0x1b87478, 0xc0008d6980}, {0x1b8c438, 0xc0005965b0}, 0xc000df7ed0)
	github.com/stashapp/stash/pkg/txn/transaction.go:39 +0x10f
github.com/stashapp/stash/internal/manager.(*IdentifyJob).Execute(0xc000a8b500, {0x1b87478, 0xc0008d6980}, 0xc000332000)
	github.com/stashapp/stash/internal/manager/task_identify.go:54 +0x18c
github.com/stashapp/stash/pkg/job.(*Manager).executeJob(0xc000485420, {0x1b87478, 0xc0008d6980}, 0xc000ca60a0, 0xc00066c180)
	github.com/stashapp/stash/pkg/job/manager.go:208 +0x19e
created by github.com/stashapp/stash/pkg/job.(*Manager).dispatch
	github.com/stashapp/stash/pkg/job/manager.go:185 +0x1bf 
ERRO[2022-06-21 23:44:54] error scraping from stash-box https://stashdb.org/graphql: error querying stash-box using scene ID 37: already in transaction 
DEBU[2022-06-21 23:44:54] Unable to identify %!s(func() string=0xce4600) 

Deleting a scene removes the scene from the database but not the associated files. That means that in the next scan the scene wont be recreated because the files are already present in the database and so the user as it is now has no way from the UI to fix this. (deleting the files directly from the files table using sqlite and rescanning is needed atm).
Edit: Same applies to zip galleries


More of a TODO than a bug probably
Scene Duplicate Checker needs to take account of the exact (as in checksum) duplicates. For scenes that have a file duplicate it shows the scene twice with the same filename (the primary one)


I am getting a few warnings while browsing some images from a zip gallery

WARN[2022-06-22 00:40:17] read transaction failure while trying to read image by id: beginning transaction: context canceled 
WARN[2022-06-22 00:40:17] read transaction failure while trying to read image by id: beginning transaction: context canceled 
WARN[2022-06-22 00:40:25] couldn't execute getting a performer image from read transaction: beginning transaction: context canceled 
WARN[2022-06-22 00:40:25] error serving image: write tcp 127.0.0.1:9999->127.0.0.1:46596: write: broken pipe 
WARN[2022-06-22 00:41:21] failure while serving image (wrote 3425 bytes out of 603445): write tcp 127.0.0.1:9999->127.0.0.1:46606: write: broken pipe 
WARN[2022-06-22 00:41:21] failure while serving image (wrote 3425 bytes out of 710008): write tcp 127.0.0.1:9999->127.0.0.1:46604: write: broken pipe 
WARN[2022-06-22 00:41:21] failure while serving image (wrote 3425 bytes out of 648741): write tcp 127.0.0.1:9999->127.0.0.1:46602: write: broken pipe 
WARN[2022-06-22 00:41:59] read transaction failure while trying to read image by id: beginning transaction: context canceled 
WARN[2022-06-22 00:42:00] failure while serving image (wrote 3425 bytes out of 948469): write tcp 127.0.0.1:9999->127.0.0.1:46614: write: broken pipe 
WARN[2022-06-22 00:42:00] failure while serving image (wrote 3424 bytes out of 1137676): write tcp 127.0.0.1:9999->127.0.0.1:46622: write: broken pipe 
WARN[2022-06-22 00:42:00] failure while serving image (wrote 3424 bytes out of 1190740): write tcp 127.0.0.1:9999->127.0.0.1:46624: write: broken pipe

Everything seems working though, the images seem to be displayed fine

@WithoutPants
Copy link
Collaborator Author

Thanks for the continued testing.

  • Identify should now be fixed.
  • Deleting objects should now destroy the associated file entries in the database, so that they can be rescanned. In future, we should probably change this behaviour so that the files are left in the database and the associated objects can be recreated during the scan process. Something for when we allow finer control over the file entries in the database.
  • Duplicate checker issues should be resolved.

The scan options resetting appears to be a regression in the develop branch, so I'll look into a separate fix for that.

@bnkai
Copy link
Collaborator

bnkai commented Jun 28, 2022

* Deleting objects should now destroy the associated file entries in the database, so that they can be rescanned. In future, we should probably change this behaviour so that the files are left in the database and the associated objects can be recreated during the scan process. Something for when we allow finer control over the file entries in the database.

That was exactly my thought. It is a lot easier to just delete for now and leave the proper implementation for when the UI can handle the file entries.

I will check the changes and update when i test using an existing database.

* Refactor and separate image model
* Refactor image query builder
* Handle relationships in image query builder
* Remove relationship management methods
* Refactor gallery model/query builder
* Add scenes to gallery model
* Convert scene model
* Refactor scene models
* Remove unused methods
* Add unit tests for gallery
* Add image tests
* Add scene tests
* Convert unnecessary scene value pointers to values
* Convert unnecessary pointer values to values
* Refactor scene partial
* Add scene partial tests
* Refactor ImagePartial
* Add image partial tests
* Refactor gallery partial update
* Add partial gallery update tests
* Use zero/null package for null values
@WithoutPants
Copy link
Collaborator Author

Merging to files-refactor for a more generalised release.

Please report bugs to #2737.

@WithoutPants WithoutPants merged commit 62a2224 into stashapp:files-refactor Jul 13, 2022
WithoutPants added a commit that referenced this pull request Jul 18, 2022
* Restructure data layer part 2 (#2599)
* Refactor and separate image model
* Refactor image query builder
* Handle relationships in image query builder
* Remove relationship management methods
* Refactor gallery model/query builder
* Add scenes to gallery model
* Convert scene model
* Refactor scene models
* Remove unused methods
* Add unit tests for gallery
* Add image tests
* Add scene tests
* Convert unnecessary scene value pointers to values
* Convert unnecessary pointer values to values
* Refactor scene partial
* Add scene partial tests
* Refactor ImagePartial
* Add image partial tests
* Refactor gallery partial update
* Add partial gallery update tests
* Use zero/null package for null values
* Add files and scan system
* Add sqlite implementation for files/folders
* Add unit tests for files/folders
* Image refactors
* Update image data layer
* Refactor gallery model and creation
* Refactor scene model
* Refactor scenes
* Don't set title from filename
* Allow galleries to freely add/remove images
* Add multiple scene file support to graphql and UI
* Add multiple file support for images in graphql/UI
* Add multiple file for galleries in graphql/UI
* Remove use of some deprecated fields
* Remove scene path usage
* Remove gallery path usage
* Remove path from image
* Move funscript to video file
* Refactor caption detection
* Migrate existing data
* Add post commit/rollback hook system
* Lint. Comment out import/export tests
* Add WithDatabase read only wrapper
* Prepend tasks to list
* Add 32 pre-migration
* Add warnings in release and migration notes
WithoutPants added a commit that referenced this pull request Jul 27, 2022
* Restructure data layer part 2 (#2599)
* Refactor and separate image model
* Refactor image query builder
* Handle relationships in image query builder
* Remove relationship management methods
* Refactor gallery model/query builder
* Add scenes to gallery model
* Convert scene model
* Refactor scene models
* Remove unused methods
* Add unit tests for gallery
* Add image tests
* Add scene tests
* Convert unnecessary scene value pointers to values
* Convert unnecessary pointer values to values
* Refactor scene partial
* Add scene partial tests
* Refactor ImagePartial
* Add image partial tests
* Refactor gallery partial update
* Add partial gallery update tests
* Use zero/null package for null values
* Add files and scan system
* Add sqlite implementation for files/folders
* Add unit tests for files/folders
* Image refactors
* Update image data layer
* Refactor gallery model and creation
* Refactor scene model
* Refactor scenes
* Don't set title from filename
* Allow galleries to freely add/remove images
* Add multiple scene file support to graphql and UI
* Add multiple file support for images in graphql/UI
* Add multiple file for galleries in graphql/UI
* Remove use of some deprecated fields
* Remove scene path usage
* Remove gallery path usage
* Remove path from image
* Move funscript to video file
* Refactor caption detection
* Migrate existing data
* Add post commit/rollback hook system
* Lint. Comment out import/export tests
* Add WithDatabase read only wrapper
* Prepend tasks to list
* Add 32 pre-migration
* Add warnings in release and migration notes
WithoutPants added a commit that referenced this pull request Sep 6, 2022
* Restructure data layer part 2 (#2599)
* Refactor and separate image model
* Refactor image query builder
* Handle relationships in image query builder
* Remove relationship management methods
* Refactor gallery model/query builder
* Add scenes to gallery model
* Convert scene model
* Refactor scene models
* Remove unused methods
* Add unit tests for gallery
* Add image tests
* Add scene tests
* Convert unnecessary scene value pointers to values
* Convert unnecessary pointer values to values
* Refactor scene partial
* Add scene partial tests
* Refactor ImagePartial
* Add image partial tests
* Refactor gallery partial update
* Add partial gallery update tests
* Use zero/null package for null values
* Add files and scan system
* Add sqlite implementation for files/folders
* Add unit tests for files/folders
* Image refactors
* Update image data layer
* Refactor gallery model and creation
* Refactor scene model
* Refactor scenes
* Don't set title from filename
* Allow galleries to freely add/remove images
* Add multiple scene file support to graphql and UI
* Add multiple file support for images in graphql/UI
* Add multiple file for galleries in graphql/UI
* Remove use of some deprecated fields
* Remove scene path usage
* Remove gallery path usage
* Remove path from image
* Move funscript to video file
* Refactor caption detection
* Migrate existing data
* Add post commit/rollback hook system
* Lint. Comment out import/export tests
* Add WithDatabase read only wrapper
* Prepend tasks to list
* Add 32 pre-migration
* Add warnings in release and migration notes
@WithoutPants
Copy link
Collaborator Author

$100 bounty placed (contribution no. 603395)

@WithoutPants WithoutPants added the bounty This issue has a bounty on it in the OpenCollective label Mar 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bounty This issue has a bounty on it in the OpenCollective feature Pull requests that add a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants