DeviantArt extractor not fetching huge amount of images #2488

Closed
Twi-Hard opened this issue Apr 10, 2022 · 8 comments

@Twi-Hard

Twi-Hard commented Apr 10, 2022

I went to find a specific image in my downloads from an artist I downloaded but it wasn't there. Every artist I've downloaded was updated with --abort 9999999 within the last month. The artist has "2.6K Deviations" according to their deviantart page. I only had 1386 images. After downloading the artist again (with --abort 9999999) it downloaded much more (2471 images total). Downloading yet another time didn't get any more than it had before.

I then went to check via the api how many images there are total compared to how many there are locally across all 16433 artists.

# amount of normal local files = 4451103
# amount of local scraps = 249207
# amount of local stash files = 114347
# total normal, scrap and stash file count = 4814657
# total files for the artists I've downloaded according to the api = 5410507

So if you add up all of the files, including scraps and stash (I'm not sure stash would even be included in the image count in the API), I'm missing 595850 files, which is 11% of them (an average of 36 per artist).
When comparing old and new folders in the past, I've noticed that newer downloads were sometimes missing many files that the old folder had and that were still on the site, but I didn't look into it.
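
The arithmetic above can be re-checked with a few lines of Python. This is only a sketch that plugs in the counts quoted in this comment; the variable names are made up for illustration.

```python
# Counts copied from the figures quoted above.
local_normal = 4_451_103
local_scraps = 249_207
local_stash = 114_347
api_total = 5_410_507
artists = 16_433

local_total = local_normal + local_scraps + local_stash
missing = api_total - local_total
missing_pct = 100 * missing / api_total
missing_per_artist = missing / artists

print(local_total)                # 4814657
print(missing)                    # 595850
print(round(missing_pct))         # 11
print(round(missing_per_artist))  # 36
```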

The file I was looking for that I mentioned in the beginning is this one which was uploaded on "Jan 16, 2013" according to the website https://www.deviantart.com/dm29/art/Squashed-Twilight-Bookmark-348896588
After I updated the artist download with --abort 9999999 yesterday that file was downloaded but with the modification date of "01-08-2018 8:26 AM"
The last file downloaded yesterday was this image, uploaded on "Oct 28, 2004":
https://www.deviantart.com/dm29/art/IScream-11800313
The modification time gallery-dl gave it was "12-14-2017 7:15 PM"
Even after updating the download twice it failed to fetch many images (2471 normal files, 21 scraps, 69 stash, 2561 total).

Here's my config (I don't know what category-transfer does so it might not make sense where I put it, also the "folder" and "watch" sections are incomplete):

"deviantart":
        {
            "archive": null,
            "extra": false,
            "cookies": "/mnt/main/downloads/gallery-dl/cookies/deviantart-cookies.txt",
            "cookies-update": true,
            "comments": true,
            "auto-watch": false,
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.27 Safari/537.36",
            "include": [
                "gallery",
                "scraps"
            ],
            "folders": false,
            "download": true,
            "journals": "html",
            "mature": true,
            "metadata": true,
            "original": true,
            "quality": 100,
            "sleep-extractor": 5,
            "retries": -1,
            "refresh-token": null,
            "parent-metadata": "parent",
            "proxy": null,
            "oauth":
            {
                "browser": false,
                "cache": true,
                "port": 6414
            },
            "client-id": REDACTED,
            "client-secret": REDACTED,
            "base-directory": REDACTED,
            "category-transfer": true,
            "parent-directory": false,
            "skip": true,
            "directory": [
                "error"
            ],
            "deviation":
            {
                "category-transfer": false,
                "directory": [
                    "users",
                    "{author[username]}"
                ],
                "download": true,
                "extra": true,
                "flat": true,
                "folders": true,
                "metadata": true,
                "skip": "abort"
            },
            "gallery":
            {
                "category-transfer": false,
                "directory": [
                    "users",
                    "{author[username]}"
                ],
                "download": true,
                "extra": true,
                "flat": true,
                "folders": true,
                "metadata": true,
                "skip": "abort"
            },
            "user":
            {
                "category-transfer": false,
                "directory": [
                    "users",
                    "{author[username]}"
                ],
                "download": true,
                "extra": true,
                "flat": true,
                "folders": true,
                "metadata": true,
                "skip": "abort"
            },
            "journal":
            {
                "category-transfer": false,
                "directory": [
                    "users",
                    "{author[username]}",
                    "Journal"
                ],
                "download": true,
                "extra": true,
                "flat": true,
                "folders": true,
                "metadata": true,
                "skip": "abort"
            },
            "scraps":
            {
                "category-transfer": false,
                "directory": [
                    "users",
                    "{author[username]}",
                    "Scraps"
                ],
                "download": true,
                "extra": true,
                "flat": true,
                "folders": true,
                "metadata": true,
                "skip": "abort"
            },
            "stash":
            {
                "category-transfer": false,
                "directory":
                {
                    "'parent' in locals() and parent['username'] == author['username']": [
                        "users",
                        "{author[username]}",
                        "stash"
                    ],
                    "'parent' in locals() and parent['username'] != author['username']": [
                        "users",
                        "_stash",
                        "{author[username]}"
                    ],
                    "": [
                        "users",
                        "_stash",
                        "_error",
                        "{author[username]}"
                    ]
                },
                "download": true,
                "extra": true,
                "flat": true,
                "folders": true,
                "metadata": true,
                "skip": true
            },
            "favorite":
            {
                "category-transfer": false,
                "directory": [
                    "favorites",
                    "{author[username]}",
                    "Favorites"
                ],
                "download": true,
                "extra": false,
                "flat": false,
                "folders": false,
                "metadata": true,
                "skip": true
            },
            "collection":
            {
                "directory": [
                    "favorites",
                    "{collection[owner]}",
                    "{collection[title]}",
                    "{author[username]}"
                ],
                "download": true,
                "extra": false,
                "flat": true,
                "folders": false,
                "metadata": true,
                "skip": true
            },
            "folder":
            {
                "#directory": [
                    "groups",
                    "{folder[owner]}",
                    "{folder[title]}",
                    "{author[username]}"
                ],
                "directory":
                {
                    "folder['owner'] == username and author['type'] == 'banned'": [
                        "test"
                    ],
                    "'parent' in locals() and parent['username'] != author['username']": [
                        "aaausers",
                        "_stash",
                        "{author[username]}"
                    ],
                    "": [
                        "aaausers",
                        "_stash",
                        "_error",
                        "{author[username]}"
                    ]
                },
                "download": true,
                "extra": true,
                "flat": true,
                "folders": false,
                "metadata": true,
                "skip": true
            },
            "watch":
            {
                "directory": [
                    "users",
                    "{author[username]}"
                ],
                "download": true,
                "extra": true,
                "flat": true,
                "folders": true,
                "metadata": true,
                "skip": "abort"
            },
            "following":
            {}
        },

Edit: I forgot to mention that the debug log looks like it ended properly, no error or warning.

@AlttiRi

AlttiRi commented Apr 10, 2022

I assume the count shown on the site also includes deleted/banned images.

You can compute the real number of images available to download yourself:

https://www.deviantart.com/dm29/gallery/all?page=103
24 (per page) * 102 + 23 (on the last page) === 2471 total. (while on the site "2478 deviations" is displayed)
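
As a quick check of that calculation, here is a minimal sketch. The helper name `available_count` is hypothetical; the page size, page count, and displayed total are the ones quoted above.

```python
def available_count(full_pages: int, per_page: int, last_page_items: int) -> int:
    """Items reachable through the paginated gallery view."""
    return full_pages * per_page + last_page_items

total = available_count(full_pages=102, per_page=24, last_page_items=23)
displayed = 2478  # "2478 deviations" as shown on the site

print(total)              # 2471
print(displayed - total)  # 7 entries counted by the site but not listed
```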


https://www.deviantart.com/dm29/art/Squashed-Twilight-Bookmark-348896588
https://www.deviantart.com/dm29/art/IScream-11800313

I have no problem with mtime for these images.

@Twi-Hard
Author

Downloads stopping prematurely is still definitely a problem. I've seen it a bunch of times in the past. It also took 3 full downloads of this artist to get the rest of the 1000+ images.

@Hrxn
Contributor

Hrxn commented Apr 11, 2022

What even is the point of --abort 9999999?

Simply use "skip": true instead..

Also, I think you should be using an archive file for dA, especially if there are 1000+ files involved etc.

@Twi-Hard
Author

Twi-Hard commented Apr 11, 2022

I normally want it to abort when it comes across a file that was already downloaded. Instead of editing the config to "skip": true on the very rare occasion I'd want to skip files, I could just use --abort 9999999 to override the config.
I've been planning on making an archive file but haven't gotten to it yet because the computer my data is on was in the middle of an upgrade since November. And then there were network issues so I couldn't access it reliably until I replaced the NIC. The downloads were updated when there weren't these problems anymore.
Neither of these things has anything to do with the issue. There have been many cases where trying to download an artist only got part of their gallery, even though the missing files still existed on the site.
Edit:

> Also, I think you should be using an archive file for dA, especially if there are 1000+ files involved etc.

It only has to come across one file that's already been downloaded before it stops. Skipping past all of these files is only needed because of the problem this issue is about.
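
To illustrate why the abort behaviour matters here, below is a rough simulation. It is not gallery-dl's code; the function `run` and the file IDs are invented, and it assumes "abort" means stopping after N consecutive skipped files, with the counter resetting after each successful download.

```python
def run(remote_ids, local_ids, abort_after=None):
    """Return the IDs that would be downloaded in one run.

    abort_after=None -> skip existing files and keep going (like "skip": true)
    abort_after=N    -> stop after N consecutive skips (like --abort N)
    """
    downloaded = []
    consecutive_skips = 0
    for file_id in remote_ids:
        if file_id in local_ids:
            consecutive_skips += 1
            if abort_after is not None and consecutive_skips >= abort_after:
                break
            continue
        consecutive_skips = 0  # assumption: a download resets the counter
        downloaded.append(file_id)
    return downloaded

remote = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]  # newest first
local = {10, 9, 8, 7, 6}                  # an earlier run stopped prematurely

print(run(remote, local, abort_after=1))        # []
print(run(remote, local, abort_after=9999999))  # [5, 4, 3, 2, 1]
print(run(remote, local))                       # [5, 4, 3, 2, 1]
```

In this model, once an earlier run has stopped prematurely, a plain abort-on-first-skip update never reaches the older files, which is why a very large abort value (or plain skipping) is needed to recover them.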

@Twi-Hard
Author

Twi-Hard commented Apr 27, 2022

I wish I got a notification for that. I'll see if it fixes it. Thank you :)

@Twi-Hard
Author

It's not working for me. It increases the offset by 1 per request and keeps going indefinitely.

Here's the output for an account with only 11 images as an example:

❯ gdl https://www.deviantart.com/uasmi
[gallery-dl][debug] Starting DownloadJob for 'https://www.deviantart.com/uasmi'
[deviantart][debug] Using DeviantartUserExtractor for 'https://www.deviantart.com/uasmi'
[deviantart][debug] Using DeviantartGalleryExtractor for 'https://www.deviantart.com/uasmi/gallery'
[deviantart][debug] Using custom API credentials (client-id REDACTED)
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.deviantart.com:443
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/user/profile/uasmi HTTP/1.1" 200 384
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/all?username=uasmi&offset=0&limit=24&mature_content=true HTTP/1.1" 200 4828
[deviantart][debug] Switching to private access token
[deviantart][info] Refreshing private access token
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "POST /oauth2/token HTTP/1.1" 200 180
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/all?username=uasmi&offset=0&limit=24&mature_content=true HTTP/1.1" 200 4836
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/deviation/metadata?deviationids%5B0%5D=83E1485E-6021-BC06-C6B1-073CF2DB0F4F&deviationids%5B1%5D=1EC54298-EEBA-7B0C-24A7-F67D90EBFB13&deviationids%5B2%5D=9293C64F-5B40-02F4-8BE2-D1C777FE2D96&deviationids%5B3%5D=67C94C3F-2BDB-D7BC-F6BD-07A47A77E960&deviationids%5B4%5D=5F28FCA8-26AB-D7FA-8DA4-96448FE057C0&deviationids%5B5%5D=F4C6FC0D-4664-51CC-0637-112B10FEB122&deviationids%5B6%5D=6848C6C7-A28F-85DF-7347-1929B3F53A60&deviationids%5B7%5D=09C019FB-A0DE-F44C-8FDC-CEADE0BDEB0C&deviationids%5B8%5D=FB9EC7FC-9073-0F9A-E079-32ECB87FCFC7&deviationids%5B9%5D=50DCD628-14F1-808B-4E5F-DC2FB7CC57B5&deviationids%5B10%5D=71C0D438-40E4-3833-A2CB-00CC6EA520F6&mature_content=true HTTP/1.1" 200 797
[deviantart][info] Collecting folder information for 'uasmi'
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=0&limit=50&mature_content=true HTTP/1.1" 200 1790
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=2&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=3&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=4&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=5&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=6&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=7&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=8&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=9&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=10&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=11&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=12&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=13&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=14&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=15&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=16&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=17&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=18&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=19&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=20&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=21&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=22&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=23&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=24&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=25&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=26&limit=50&mature_content=true HTTP/1.1" 200 1329
[urllib3.connectionpool][debug] https://www.deviantart.com:443 "GET /api/v1/oauth2/gallery/folders?username=uasmi&offset=27&limit=50&mature_content=true HTTP/1.1" 200 1329

The issue seems to be the "folders" option. That option is really important for what I'm doing so hopefully this is fixable.
Thanks :)

@Twi-Hard
Author

I said it worked and didn't work at the same time which might have been confusing. The change you made doesn't work with the "folders" option set to true which is important for me. Is that possible to fix?

@mikf
Owner

mikf commented May 27, 2022

I apologize for taking this long to properly respond, too many things going on IRL ...

Anyway, the issue with folder listings is fixed in 4f7fe9b. I should have been testing with your config to begin with, and not just with default settings, sorry.

I'm also not sure if "pagination": "manual" even helps with this issue in the first place. The only thing that I can think of that would cause gallery-dl to stop prematurely is if the has_more value reported by dA's API is wrong.

This option is meant as a workaround for that, but I haven't encountered a case where it would be necessary myself. Every account with 3k+ deviations that I've tried returned as many with gallery-dl as can be viewed on the website itself.
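
The difference between the two pagination strategies can be sketched as follows. This is an illustration only, not gallery-dl's implementation: `paginate`, `fake_fetch`, and the simulated fault are invented, and the response fields `results` and `next_offset` are assumed alongside the `has_more` value mentioned above.

```python
ITEMS = list(range(60))  # pretend gallery with 60 deviations

def fake_fetch(offset, limit):
    """Stand-in for an API call; has_more turns False one page too early."""
    results = ITEMS[offset:offset + limit]
    end = offset + len(results)
    has_more = end + limit < len(ITEMS)  # simulated fault
    return {"results": results, "has_more": has_more,
            "next_offset": end if has_more else None}

def paginate(fetch, limit=24, mode="api"):
    offset = 0
    while True:
        page = fetch(offset, limit)
        results = page["results"]
        yield from results
        if mode == "api":
            # Trust the API: stop as soon as it reports no more results.
            if not page["has_more"]:
                return
            offset = page["next_offset"]
        else:
            # Manual: ignore has_more and stop only on an empty page.
            if not results:
                return
            offset += len(results)

print(len(list(paginate(fake_fetch, mode="api"))))     # 48
print(len(list(paginate(fake_fetch, mode="manual"))))  # 60
```

The manual strategy costs one extra request per listing, since it has to see an empty page before it stops.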

4 participants