[fantia] gallery-dl does not download images from new type 'Blog Post' content #2381

SupaStaer · 2022-03-09T16:54:16Z

Fantia has recently added 'Blog Post', which is a new type of content for creators to use.

'Blog Posts' allow creators to embed images within their blog posts.
Currently, gallery-dl does not download any images embedded in 'Blog Posts'.

I have prepared an example of a post that uses the new 'Blog Post' content.
https://fantia.jp/posts/1166373

This post contains the following:
An eyecatcher in the required public portion of the post.
A public image gallery containing 2 images.
A public blog post with 2 images embedded.
A fan-only (free plan) blog post with 2 images embedded.

From this post, gallery-dl downloads only the eyecatcher and the two images in the public image gallery.

The 4 images embedded in the two blog posts are not downloaded.

thatfuckingbird · 2022-03-09T18:04:07Z

fantia.patch.txt

Here is a quick & dirty patch (generated by git diff) that adds support for blog posts + images.
I don't have time to make a proper PR now, but this hopefully makes @mikf's job easier.
I haven't tried it with the non-public post, but it gets the 2 images from the public blog post fine, so I think it should work for non-public posts too, given cookies.

Is it possible to add other types of content to a blog post (files, embeds)? If so, that will probably need additional handling. Can you check and make a test post for those too, @SupaStaer?

Implementation details:
It looks like blog posts use the "comment" field of the content entry to store a JSON document (as text) that describes how to build the actual content of the blog post. So, for the blog post content type, we need to parse the "comment" field as JSON, then iterate over the parsed data and extract (1) the image URLs and (2) the actual text of the blog post.
Since these can be interleaved, I first extract all the text and then all the images, so that the full text can be added to the metadata file for each image. I also save the original value of the "comment" field as "content_comment", similar to the existing content_* keys.
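The two-pass approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patch: it assumes the parsed JSON has an "ops" array whose entries carry either a string "insert" (text) or a dict "insert" with an "image" URL, as in the sample quoted later in this thread. The function name `split_blog_comment`, the "blogpost_text" key, and the example.org URL are hypothetical; only "content_comment" is named in the comment above.

```python
import json


def split_blog_comment(comment):
    """Return (text, image_urls) from a blog post's "comment" JSON string.

    Hypothetical helper; assumes an "ops" array as quoted later in the thread.
    """
    ops = json.loads(comment).get("ops", [])

    # First pass: join every plain-text insert into the full post text.
    text = "".join(
        op["insert"] for op in ops if isinstance(op.get("insert"), str)
    )

    # Second pass: collect image URLs from dict-valued inserts.
    images = [
        op["insert"]["image"]
        for op in ops
        if isinstance(op.get("insert"), dict) and "image" in op["insert"]
    ]
    return text, images


# Made-up content entry with text and an image interleaved.
content = {
    "comment": json.dumps({"ops": [
        {"insert": "First paragraph\n"},
        {"insert": {"image": "https://example.org/a.png"}},
        {"insert": "Second paragraph\n"},
    ]})
}

# Keep the raw field as well, mirroring the "content_comment" key.
metadata = {"content_comment": content["comment"]}
metadata["blogpost_text"], urls = split_blog_comment(content["comment"])
```

Extracting the text before the images means every image's metadata can carry the complete post text, regardless of where the image sits in the post.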

SupaStaer · 2022-03-09T19:01:51Z

Thank you for the swift feedback!

I tried to add other types of files to the blog post, but was unable to do so.
It looks like it currently supports .jpg, .jpeg, .gif, and .png. It might support additional image types.

It looks like there is an option to add html.
I was unable to cleanly use translation tools to see what this message says.

The final line mentions which html tags are available for use.
Since img tags are allowed, it seems to be possible to embed images from another site in a blog post.

I have added a new section at the bottom of the post to test how blog posts work with images uploaded to fantia and images loaded from html.
https://fantia.jp/posts/1166373

I used this html:
<img src="https://www.w3schools.com/images/lamp.jpg" alt="Lamp" width="32" height="32">

Embedding tweets and YouTube videos does not seem to currently work due to tag restrictions.

Since files cannot be added to blog posts, and only specific html tags can be used when importing html, I think images are the only file type we should expect to extract from Fantia blog post content at this time.

thatfuckingbird · 2022-03-09T19:25:09Z

Thanks for checking. I feared there would be other types of attachments for zips and so on, so it's lucky that's not the case.
Fortunately, it looks like the images and links are easy to extract too. The relevant part of the ops array looks like this for those:

{
  "ops": [
    {
      "insert": "Link to video:\n"
    },
    {
      "attributes": {
        "link": "https://www.youtube.com/watch?v=5SSdvNcAagI"
      },
      "insert": "https://www.youtube.com/watch?v=5SSdvNcAagI"
    },
    {
      "insert": "\n\nhtml img from another site:\n"
    },
    {
      "insert": {
        "image": "https://www.w3schools.com/images/lamp.jpg"
      }
    }
  ]
}

So they can be processed similarly to the fantiaImage entries in my patch.
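As a rough sketch of how the entries in the sample above could be picked apart, the snippet below walks the same ops array and separates link targets from embedded image URLs. The function name `collect_links_and_images` is hypothetical, and the handling of `fantiaImage` entries from the patch is deliberately left out because their layout is not shown in this thread.

```python
# The ops array from the sample above, written as a Python literal.
SAMPLE = {
    "ops": [
        {"insert": "Link to video:\n"},
        {
            "attributes": {
                "link": "https://www.youtube.com/watch?v=5SSdvNcAagI"
            },
            "insert": "https://www.youtube.com/watch?v=5SSdvNcAagI",
        },
        {"insert": "\n\nhtml img from another site:\n"},
        {"insert": {"image": "https://www.w3schools.com/images/lamp.jpg"}},
    ]
}


def collect_links_and_images(ops):
    """Return (links, images) found in an ops array (hypothetical helper)."""
    links, images = [], []
    for op in ops:
        # A link is stored under "attributes", next to its display text.
        link = op.get("attributes", {}).get("link")
        if link:
            links.append(link)

        # An externally hosted image is a dict-valued "insert".
        insert = op.get("insert")
        if isinstance(insert, dict) and "image" in insert:
            images.append(insert["image"])
    return links, images


links, images = collect_links_and_images(SAMPLE["ops"])
```

Keeping links and images in separate lists lets a caller decide independently whether to download the external images and whether to hand the links to another extractor.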

mikf added the site:feature label Mar 11, 2022

mikf added a commit that referenced this issue Mar 11, 2022

[fantia] apply patch (#2381)

e64c2b8

from @thatfuckingbird with small adjustments #2381 (comment)

mikf added the partially-done label Mar 14, 2022
