[Kemono] Downloading revisions similar to the webpage #5013
What do you mean?
I don’t know how to use different ‘directory’ settings for the current and historical versions of the Kemono page. Is there a way to set the ‘directory’ condition to ‘current version’? It seems like there is no such parameter.
I tried to write the settings like this, but it didn’t work: for the current version, [{revision_id}] is still rendered as [0]. The output looks like this, with the wrong name for the current version and many unnecessary historical versions:
This is Kemono's fault. They refuse to show the ID of the current version to discourage scraping, because if they were to ban it outright they'd face more backlash than they can handle.
I found that the ‘edited’ time seems to achieve a similar effect to what is described above, but I don’t know how to make gallery-dl select content by ‘edited’ time. For example, the revision with the latest ‘edited’ time would be treated as the current version, and each distinct ‘edited’ time would identify one historical version.
Oh no, it seems that some services do not have an ‘edited’ time at all: https://kemono.su/patreon/user/3295915/post/88413981 So it seems the only way to tell versions apart is to compare their content.
A SHA1 hexdigest of other relevant metadata fields like title, content, file and attachment URLs. This value does NOT reflect which revisions are listed on the website. Neither does 'edited' or any other metadata field (combinations).
I looked a bit deeper into this whole revisions thing and found that
Commit 3d68eda adds a
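To make the idea of a content hash concrete, here is a minimal sketch of computing a SHA1 hexdigest over the fields mentioned above (title, content, file and attachment URLs). The field names, the `path` key, and the JSON serialization are assumptions made for illustration; this is not gallery-dl's actual implementation, and the hashes it produces will not match gallery-dl's.

```python
import hashlib
import json


def revision_hash(post):
    """Return a SHA1 hexdigest over the fields that define a revision's content.

    Hypothetical sketch: field names and serialization are assumptions.
    """
    relevant = {
        "title": post.get("title"),
        "content": post.get("content"),
        # 'file' may be missing or empty, so fall back to an empty dict
        "file": (post.get("file") or {}).get("path"),
        "attachments": [a.get("path") for a in post.get("attachments") or []],
    }
    # sort_keys gives a stable serialization, so equal content yields equal hashes
    payload = json.dumps(relevant, sort_keys=True, ensure_ascii=False)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()
```

Because 'edited' and 'revision_id' are deliberately left out of the hashed fields, two revisions that differ only in those values get the same hash.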
I don’t think we need to pay much attention to which revisions the website lists; we just need to save every distinct version (including all files, images, and content changes). I would like a switch that automatically merges revisions with identical content, keeps the earliest one in each group of identical revisions, and treats the latest group as the ‘current version’. A post with different revisions
(I didn’t carefully compare the differences between the versions above, but the download link the author provided in the first version of this post did disappear in the latest version. Here I assume that different ‘edited’ times mean different versions.) I am not sure how to use ‘revision_hash’ to achieve this, other than appending it to the file name. Could you please give me some hints? Do I need to write it to the database and set up some skip strategy? In addition, some authors delete published images after a period of time. If you need an example, I can find one for you.
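The merging behaviour requested above could be sketched roughly as follows. This is plain Python operating on a list of revision dicts, not a gallery-dl feature; the function name `merge_revisions` is hypothetical, and it assumes the list is already ordered from oldest to newest and that each entry carries a 'revision_hash' value.

```python
def merge_revisions(revisions, key="revision_hash"):
    """Split revisions into (current, history), one entry per distinct hash.

    Assumes 'revisions' is ordered oldest to newest (an assumption about
    the input, not something the API is known to guarantee).
    """
    if not revisions:
        return None, []
    earliest = {}
    for rev in revisions:
        # setdefault keeps the first (earliest) revision seen for each hash
        earliest.setdefault(rev[key], rev)
    # the group that contains the newest revision is the 'current version'
    current = earliest.pop(revisions[-1][key])
    return current, list(earliest.values())
```

One consequence of grouping purely by hash: if an author reverts a post to an earlier state, the current version is represented by the earliest revision with that content, which may be older than some entries in the history list.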
I believe it's better to only download unique files from the post. This is what I'm using for only downloading unique files from the post to
set 'revisions' to '"unique"' to have it ignore duplicate revisions
It is now possible to filter duplicate revisions by setting
I have carefully examined the “revisions” provided by the API and found that the website actually merges “revisions” with the same “edited” time, treating them as the same and using the earliest “revision_id”. I don’t know how to use a similar strategy to merge historical versions in gallery-dl. Can you support automatic merging of historical records with the same “edited” time in the new version (theoretically, their content should be exactly the same)?
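For illustration, the merging rule described above (same 'edited' time means same version, represented by the earliest 'revision_id') might look like this when applied to a list of revision dicts. It is a sketch outside of gallery-dl: the function name `merge_by_edited` is hypothetical, "earliest" is assumed to mean the numerically smallest 'revision_id', and revisions without an 'edited' value are assumed to stay unmerged.

```python
def merge_by_edited(revisions):
    """Collapse revisions sharing an 'edited' time to the one with the smallest ID."""
    merged = {}
    for rev in revisions:
        edited = rev.get("edited")
        # Revisions without an 'edited' time cannot be merged on it,
        # so each one forms its own group keyed by its revision ID.
        group = ("edited", edited) if edited else ("id", rev["revision_id"])
        best = merged.get(group)
        if best is None or rev["revision_id"] < best["revision_id"]:
            merged[group] = rev
    return sorted(merged.values(), key=lambda r: r["revision_id"])
```

Since some services provide no 'edited' time at all, this rule alone would leave their revisions untouched; comparing content is the only option there.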
#4706
#4727
In addition, I hope that the current version folder and historical versions will have different names, for example:
For the current version: "directory": ["[{service}]{username}", "[{date:Olocal/%Y%m%d}][{id}]{title}"],
For historical versions: "directory": ["[{service}]{username}", "[{date:Olocal/%Y%m%d}][{id}][{revision_id}]{title}"]
How can I write the configuration file correctly to achieve this goal?
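The naming rule being asked for can be stated as a small piece of selection logic. The sketch below is plain Python, not gallery-dl configuration syntax. It assumes the current version is the one whose 'revision_id' is 0 (as reported in this thread), replaces the `{date:Olocal/%Y%m%d}` field with a pre-formatted `{date}` string, and uses the hypothetical names `CURRENT_DIR`, `HISTORY_DIR`, and `directory_for`.

```python
# Simplified templates: '{date}' stands in for gallery-dl's '{date:Olocal/%Y%m%d}'
CURRENT_DIR = ["[{service}]{username}", "[{date}][{id}]{title}"]
HISTORY_DIR = ["[{service}]{username}", "[{date}][{id}][{revision_id}]{title}"]


def directory_for(meta):
    """Return the directory path segments for one post or revision."""
    # Assumption: a missing or zero revision_id marks the current version
    is_current = not meta.get("revision_id")
    templates = CURRENT_DIR if is_current else HISTORY_DIR
    return [t.format(**meta) for t in templates]
```

For example, metadata with 'revision_id' 0 produces a folder name without a revision tag, while any non-zero 'revision_id' adds it between the post ID and the title.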