Fanbox gallery images downloading out of order (alphabetical hash) #2718

Open
b51de opened this issue Jun 29, 2022 · 6 comments

Comments

@b51de

b51de commented Jun 29, 2022

I put the proper command in and it downloads the contents fine, but in the wrong order: it follows the 1-9, A-Z alphabetical order of the Fanbox-side filenames, which are all hashes, so the output ([xxxxxxx]_1 and so on) comes out totally out of order (the thumbnails come out as _0 just fine). I'll just include what I can and hope the solution is some totally simple filename option I've overlooked like an idiot. Sorry if the formatting isn't what it should look like.
gallery-dl -v "https://www.fanbox.cc/@foobar/posts/!@#$%^" --cookies C:\Users\xxx\Desktop\cookies.txt

[gallery-dl][debug] Version 1.20.4 - Executable
[gallery-dl][debug] Python 3.7.9 - Windows-10-10.0.18362
[gallery-dl][debug] requests 2.27.1 - urllib3 1.26.8
[gallery-dl][debug] Starting DownloadJob for 'https://www.fanbox.cc/@foobar/posts/!@#$%^'
[fanbox][debug] Using FanboxPostExtractor for 'https://www.fanbox.cc/@foobar/posts/!@#$%^'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): api.fanbox.cc:443
[urllib3.connectionpool][debug] https://api.fanbox.cc:443 "GET /post.info?postId=!@#$%^ HTTP/1.1" 200 3567
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): pixiv.pximg.net:443
[urllib3.connectionpool][debug] https://pixiv.pximg.net:443 "GET /c/1200x630_90_a2_g5/fanbox/public/images/post/!@#$%^/cover/dI89euiV6NES3s9I4ZCXNYRC.jpeg HTTP/1.1" 200 364868

  • .\gallery-dl\fanbox\foobar!@#$%^_0.jpg
    [urllib3.connectionpool][debug] Starting new HTTPS connection (1): downloads.fanbox.cc:443
    [urllib3.connectionpool][debug] https://downloads.fanbox.cc:443 "GET /images/post/1833133/140KfbnjlFMU5jAe7IgxqHKG.jpeg HTTP/1.1" 200 None
  • .\gallery-dl\fanbox\foobar!@#$%^_1.jpg
    [urllib3.connectionpool][debug] https://downloads.fanbox.cc:443 "GET /images/post/1833133/31qnHq0LNGqWEUMHB7XGE56d.jpeg HTTP/1.1" 200 None
  • .\gallery-dl\fanbox\foobar!@#$%^_2.jpg
    {and so on. Again, it first downloaded the images whose hash filenames start with 1, then those starting with 3}

# Netscape HTTP Cookie File

.fanbox.cc TRUE / TRUE 1659054592 FANBOXSESSID {omit}
.fanbox.cc TRUE / FALSE 1719534064 _ga GA1.2.1664558252.1656429579
.fanbox.cc TRUE / FALSE 1656462128 _gat_gtag_UA_1830249_145 1
.fanbox.cc TRUE / FALSE 1664205577 _gcl_au 1.1.1001063367.1656429577
.fanbox.cc TRUE / FALSE 1656548464 _gid GA1.2.1550279713.1656429579
.fanbox.cc TRUE / TRUE 1814109576 p_ab_d_id 141724759
.fanbox.cc TRUE / TRUE 1814109576 p_ab_id 9
.fanbox.cc TRUE / TRUE 1814109576 p_ab_id_2 0
.fanbox.cc TRUE / TRUE 1719501844 privacy_policy_agreement 3
.fanbox.cc TRUE / TRUE 1719534592 privacy_policy_notification 0

@mikf
Owner

mikf commented Jul 1, 2022

> in 1-9,A-Z alphabetical order

That's just a coincidence. Take https://www.fanbox.cc/@xub/posts/1910054 (NSFW) as a counterexample: all files there are downloaded in the same order as on Fanbox itself, while their hashes are in reverse alphabetical order.

[urllib3.connectionpool][debug] https://downloads.fanbox.cc:443 "GET /images/post/1910054/H9tzdb9dUvibBnPGl25z0CDb.png HTTP/1.1" 200 1325597
/tmp/fanbox/xub/1910054_0.png
[urllib3.connectionpool][debug] https://downloads.fanbox.cc:443 "GET /images/post/1910054/7wbWoWIg41Vgba7SQZxoJrXp.png HTTP/1.1" 200 1312868
/tmp/fanbox/xub/1910054_1.png
[urllib3.connectionpool][debug] https://downloads.fanbox.cc:443 "GET /images/post/1910054/5lpx9SqlnPzbwHrfkHgjnD0D.png HTTP/1.1" 200 1899017
/tmp/fanbox/xub/1910054_2.png

gallery-dl first downloads the cover image, followed by HTML embeds, images, files, and external embeds.

Maybe that order is wrong for certain posts?
An option to customize the order should be easy enough to implement.
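An order option like the one suggested above could be sketched roughly as follows. This is not gallery-dl code: the group names, the `ordered_files` function, and its `order` parameter are hypothetical, and each group is assumed to already list its URLs in the right order. The default reproduces the order described above (cover, HTML embeds, images, files, external embeds).

```python
from typing import Dict, Iterator, List, Sequence

# Hypothetical group names mirroring the current order:
# cover, HTML embeds, images, files, external embeds.
DEFAULT_ORDER = ("cover", "html", "images", "files", "embeds")


def ordered_files(groups: Dict[str, List[str]],
                  order: Sequence[str] = DEFAULT_ORDER) -> Iterator[str]:
    """Yield URLs group by group, following `order`.

    Groups that `order` does not mention are yielded afterwards,
    in their original order, so a partial order never drops files.
    """
    seen = set()
    for name in list(order) + [g for g in groups if g not in order]:
        if name in seen:  # ignore duplicate names in a user-supplied order
            continue
        seen.add(name)
        yield from groups.get(name, ())


groups = {"images": ["i1", "i2"], "cover": ["c"], "files": ["f"]}
print(list(ordered_files(groups)))                        # ['c', 'i1', 'i2', 'f']
print(list(ordered_files(groups, ("images", "cover"))))   # ['i1', 'i2', 'c', 'f']
```

Appending unlisted groups at the end is a design choice: a user who only cares about moving images first does not have to spell out every other group.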

@b51de
Author

b51de commented Jul 5, 2022

Well, I lost my log from the dozens of galleries I downloaded and I'm not subbed to anything this month. Can't exactly verify anything.

Is this a change that was made in the last few months and just nobody has called attention to it? Or is there some kind of command line way to download in a proper sequential order?

@mikf
Owner

mikf commented Jul 8, 2022

The file order was like this since fanbox support got added in #1459.

There have only been 2 real commits to fanbox.py which changed anything of importance since then, and they didn't touch anything related to file order:

* f31ab0d2 [fanbox] fetch data for each individual post (fixes #2388)
* 22b04339 [fanbox] support pixiv redirects (closes #2122)

> to download in a proper sequential order?

What is the proper order? From all the examples I have seen, the current order is fine as is, and I do not know what to change unless you (or someone else) can give a proper example.

@shrublet

shrublet commented Jan 1, 2023

Recently also ran into this issue. I'm not really sure how to diagnose it, but I found a post with a reproducible incorrect order: https://mochirong.fanbox.cc/posts/3746116 (NSFW) downloads in alphabetical hash order instead of the order the images appear in the article. While printing out content_body in the fanbox extractor, I noticed that imageMap seems to be ordered alphabetically, whereas the image blocks are in the correct order. Hope this helps.
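To check the mismatch described above on a given post, a small helper could compare the key order of imageMap with the order of the blocks that carry an imageId. This is a sketch, not extractor code: `image_orders` is a hypothetical name, the toy data uses made-up IDs, and it assumes the response was parsed with Python's json module, which keeps object keys in the order they appear in the response, so the imageMap order is the one the API sent.

```python
from typing import Any, Dict, List, Tuple


def image_orders(content_body: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (imageMap key order, image block order) for an article body.

    Only uses the fields mentioned in this thread: `blocks` entries that
    may contain "imageId", and `imageMap` keyed by those image IDs.
    """
    map_order = list(content_body.get("imageMap", {}))
    block_order = [block["imageId"]
                   for block in content_body.get("blocks", ())
                   if "imageId" in block]
    return map_order, block_order


# Toy data with made-up IDs: the map is keyed alphabetically,
# while the blocks list the images in article order.
body = {
    "blocks": [{"imageId": "zz"}, {"text": "caption"}, {"imageId": "aa"}],
    "imageMap": {"aa": {}, "zz": {}},
}
map_order, block_order = image_orders(body)
print(map_order)    # ['aa', 'zz']
print(block_order)  # ['zz', 'aa']
```

If the two lists differ for a post, iterating over imageMap alone will download that post's images out of article order.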

@shrublet

shrublet commented Jan 1, 2023

OK, I made a really shitty workaround that I'm sure somebody can retrofit or implement in a smarter way. In short, I created a new post entry called "order": a list of the imageIds from the blocks in content_body, in the order those blocks appear. I then use this list of image IDs to reorder the imageMap. I don't think this solution is great, but I'll include the relevant snippets just in case.

Changed these lines to this -

if content_body:
    if "html" in content_body:
        post["html"] = content_body["html"]
    if post["type"] == "article":
        post["articleBody"] = content_body.copy()
    if "blocks" in content_body:
        content = []
        order = []  # imageIds in the order their blocks appear in the article
        append = content.append
        for block in content_body["blocks"]:
            if "text" in block:
                append(block["text"])
            if "links" in block:
                for link in block["links"]:
                    append(link["url"])
            if "imageId" in block:
                order.append(block["imageId"])
        post["content"] = "\n".join(content)
        post["order"] = order  # new entry, used below to reorder imageMap

Added this snippet above this line -

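if "imageMap" in content_body:
    image_map = content_body["imageMap"]
    # follow the order of the image blocks; images without a block go last
    order = [k for k in post.get("order", ()) if k in image_map]
    order += [k for k in image_map if k not in order]
    content_body["imageMap"] = {k: image_map[k] for k in order}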

@mikf
Owner

mikf commented Jan 5, 2023

Thanks for the example post and code. This should be fixed in 7d6c846 for images, but I think this might also be an issue with fileMap files. They are most likely handled in the same way as imageMap ones, but I'm not entirely sure.
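A generalized version of the reordering, applied to both imageMap and fileMap, might look like the following. It is a sketch rather than the change in 7d6c846: `reorder_by_blocks` is a hypothetical helper, the toy data uses made-up IDs, and it assumes file blocks reference fileMap entries through a "fileId" field, which this thread does not confirm.

```python
from typing import Any, Dict, List


def reorder_by_blocks(mapping: Dict[str, Any], blocks: List[Dict[str, Any]],
                      id_key: str) -> Dict[str, Any]:
    """Return a copy of `mapping` ordered like the blocks that reference it.

    `id_key` is the block field holding the ID ("imageId" for imageMap).
    Entries that no block references are kept, in their original
    relative order, after the referenced ones.
    """
    ordered: Dict[str, Any] = {}
    for block in blocks:
        key = block.get(id_key)
        if key in mapping and key not in ordered:
            ordered[key] = mapping[key]
    for key, value in mapping.items():
        ordered.setdefault(key, value)  # append unreferenced entries
    return ordered


# Toy article body with made-up IDs. Using "fileId" as the block key
# for fileMap entries is an assumption.
body = {
    "blocks": [{"imageId": "zz"}, {"fileId": "f2"},
               {"imageId": "aa"}, {"fileId": "f1"}],
    "imageMap": {"aa": {}, "mm": {}, "zz": {}},
    "fileMap": {"f1": {}, "f2": {}},
}
body["imageMap"] = reorder_by_blocks(body["imageMap"], body["blocks"], "imageId")
body["fileMap"] = reorder_by_blocks(body["fileMap"], body["blocks"], "fileId")
print(list(body["imageMap"]))  # ['zz', 'aa', 'mm']
print(list(body["fileMap"]))   # ['f2', 'f1']
```

Keeping unreferenced entries instead of dropping them means a map entry without a matching block is still downloaded, just after the ones whose position is known.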
