deviantart literature items are no longer being downloaded, the html ends up nearly empty #6207
Duplicate of #6196 |
This is not quite a duplicate. I meant literature items in the gallery, not statuses or journals, like the ones at https://www.deviantart.com/tag/literature. If you open something random from that search result, you get something like this (NSFW warning): https://www.deviantart.com/milflover5335/gallery/93420151/stories-literature. They're gallery items, not journals or posts. But yeah, they're likely broken in the same manner, I guess? Still, my issue is a slight expansion of the other comments. |
Also, it should totally be fixable: gallery literature items can be downloaded manually in Chrome by using "Save as" on the entire page and saving it as HTML. Maybe posts and journals can't be fixed because they're not inside the gallery at all, but literature deviations are? |
I didn't realize DA has "literature items", which are somehow not the same as journals. Nonetheless, they do internally get processed the same way as journals do, meaning they use the same API endpoint to get their full text content, which is currently broken.
I'll probably find some workaround, but this wouldn't be necessary in the first place if DA didn't break their site ... |
When this gets fixed, would it be possible to add an option to overwrite journals, statuses and literature that were empty when they were downloaded? |
Testing for emptiness is probably just a waste of effort. You could just delete all of them from the past few months and/or overwrite all of them; they're small files, just text HTML files. |
Okay, so how do I rescrape an account and have it skip all submissions that aren't text/html? |
In general
but it might be better to directly use a user's |
For informational purposes to anyone having to delete broken html-downloads - I don't know when the site-change took place exactly, but it must've been after August 20th, as I still have proper journal downloads for that day. If anybody has a later date for reference, feel free to post it here for the benefit of those having to delete broken html files for re-download. |
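For anyone who wants to script the cleanup described above, here is a minimal sketch, not an official gallery-dl feature. It lists `.htm`/`.html` files under a download folder that were modified on or after a cutoff date and only deletes them when explicitly asked to. The function names `find_recent_html` and `delete_files` are made up for this example, and it assumes the file modification time reflects when the file was downloaded, which may not hold for every configuration.

```python
from datetime import datetime
from pathlib import Path


def find_recent_html(root, cutoff):
    """Return .htm/.html files under root modified on or after cutoff.

    Assumption: the file's modification time reflects its download time.
    """
    cutoff_ts = cutoff.timestamp()
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in {".htm", ".html"}:
            if path.stat().st_mtime >= cutoff_ts:
                hits.append(path)
    return sorted(hits)


def delete_files(paths, dry_run=True):
    """Print each path; only remove files when dry_run is False."""
    for path in paths:
        prefix = "would delete: " if dry_run else "deleting: "
        print(prefix + str(path))
        if not dry_run:
            path.unlink()
```

Run `delete_files(find_recent_html(folder, cutoff))` first to review the list, with `cutoff` set to the day after your last known-good download, and pass `dry_run=False` only once the list looks right. If you use a download archive, deleting the files alone may not be enough to trigger a re-download.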
This issue is not fixed. It might be fixed for posts, but NOT for literature deviations. The fix has changed the error, though, from a blank story to roughly 2 paragraphs out of the 12 that the literature deviation might have. I cannot find any examples of lengthy literature deviations that work with this new fix. In essence, this fix took me from grabbing 1-kilobyte .htm files to grabbing 4-kilobyte .htm files when what was needed was a 24-kilobyte .htm file, if that makes any sense. The newest release might have fixed the issue that this one is a duplicate of (journal posts), but it didn't fix literature deviation posts (possibly because journal posts are typically far shorter?). |
Yes, I can corroborate this. The thing is, this also happens to me when I just browse DA normally, so I have to reload the page. I think it's because the workaround mimics normal browsing rather than going through the API, so you might just have to download at non-peak hours of the site to mitigate the literature being only 2 paragraphs. |
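To get a rough idea of which downloads may be truncated, something like the following sketch could list small HTML files for manual review. The helper name `small_html_files` and the 5 KiB default threshold are assumptions picked to match the 1 KB and 4 KB sizes reported here. Size alone is only a heuristic, since a short story is legitimately small, so treat the output as a list of candidates to check rather than a verdict.

```python
from pathlib import Path


def small_html_files(root, max_bytes=5 * 1024):
    """Return (size, path) pairs for .htm/.html files of at most max_bytes.

    The threshold is a guess; adjust it to the galleries being checked.
    """
    results = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in {".htm", ".html"}:
            size = path.stat().st_size
            if size <= max_bytes:
                results.append((size, path))
    # Smallest files first, since those are the most likely to be empty.
    return sorted(results)
```

Printing the result for a download folder gives the smallest files first, which makes it easy to spot the nearly empty ones.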
Literature submissions in stash will also cause an error message and not be downloaded at all. This may have been overlooked since it isn't possible anymore to make new literature uploads into stash since the Eclipse update, but that doesn't mean those don't still exist, example here: https://sta.sh/09z3557z648 |
I mean, if this even happens for you on DA in the browser... The site has just been a broken mess for a year or so. |
I downloaded the nightly Windows build, but this problem is still not completely fixed on DA. Some HTML literature files in galleries download completely, but some still download as incomplete 4k files. This happens in the same gallery, and there doesn't seem to be any pattern. There are also a lot of errors about: |
"This deviation has been labeled as containing themes not suitable for all deviants." Guess I really do have to process
This is a workaround and wouldn't be necessary at all if DA didn't break its website yet again. |
"This is a workaround and wouldn't be necessary at all if DA didn't break its website yet again." I totally sympathize. Gallery-dl is a great tool and I totally appreciate the work that you're putting into it. Sincerely, thanks! And yes, DA sux donkey balls. |
Generating HTML from The generated HTML is not 100% accurate (some whitespace is somehow different, maybe |
Thanks for the fix. However, when I tried to run it, I immediately got an error: |
- support literature link embeds
- support @ mentions
- support more text styles
@geoffk777 fixed in cfb7b3d. All literature of https://www.deviantart.com/Springbokkx is now downloadable without errors. |
Is there a way to download an exe that uses commit cfb7b3d or do I have to wait a week for the next release? I swear there used to be a way to download builds from github via some obscure link to click on that I can no longer remember or find in the UI (I don't have the tools to build myself at this time.) |
@left1000 https://github.com/gdl-org/builds/releases If you are already using an exe with version 1.27.0 or higher, you can use
@Mlkf Thanks!! I downloaded a number of different DA literature galleries and confirmed that the current build seems to fix all of the problems. So this issue can finally be closed. Until DA screws it up again.... |
Remind me, please: what is the flag to ignore archive.sqlite3 and recheck all files without redownloading files that already exist? |
Actually, this fix not only covers the past 55 or so days when it was totally broken, it also downloads a better .htm file than the ones it's been downloading for years (more formatting data or some such; it looks a bit better and isn't weirdly taking up 70% of the screen). So what's the flag to download all .htm files regardless of whether they're repeats?
or is it |
It now includes DA's current gallery-dl/gallery_dl/extractor/deviantart.py Lines 2040 to 2042 in cfb7b3d
Ahh |
deviantart literature items are no longer being downloaded, the html ends up nearly empty
There's a tiny bit of HTML for the page layout, but the entire content of the literature post is missing. It worked for years and broke sometime in the past 1-2 months. I kept running rip-updates without noticing, since the files weren't totally blank. Not sure when it broke or what broke it.
Anyone know what I'm talking about?