[fantia] gallery-dl does not download images from new type 'Blog Post' content #2381

Open
SupaStaer opened this issue Mar 9, 2022 · 3 comments

Comments

@SupaStaer

Fantia has recently added 'Blog Post', which is a new type of content for creators to use.
image

'Blog Posts' allow creators to embed images within their blog posts.
Currently, gallery-dl does not download any images embedded in 'Blog Posts'.

I have prepared an example of a post that uses the new 'Blog Post' content.
https://fantia.jp/posts/1166373

This post contains the following:
An eyecatcher in the required public portion of the post.
A public image gallery containing 2 images.
A public blog post with 2 images embedded.
A fan only (free plan) blog post with 2 images embedded.

Of the images in the post, only the eyecatcher and the two images in the public image gallery are downloaded.
image
The other 4 images are not downloaded.

@thatfuckingbird
Contributor

fantia.patch.txt

Here is a quick & dirty patch (generated by git diff) that adds support for blog posts + images.
I don't have time to make a proper PR now, but this hopefully makes @mikf's job easier.
I haven't tried the non-public one, but it gets the 2 images from the public post fine, so I think it should work for non-public posts too, given cookies. Is it possible to add other types of content to a blog post (files, embeds)? If so, that will probably need additional handling. Can you check and make a test post for those too, @SupaStaer?

Implementation details:
It looks like blog posts use the "comment" field of the content entry to store a JSON document (as text) that describes how to build the actual content of the blog post. So we need to parse the "comment" field as JSON (for the blog post content type), then iterate over the parsed data and extract (1) the image URLs and (2) the actual text of the blog post.
Since these can be interleaved, I first extract all the text and then all the images, so the full text can be added to the metadata file for each image. I also save the original value of the "comment" field as "content_comment", similar to the existing content_* keys.
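To make the two-pass idea concrete, here is a minimal sketch, not the attached patch itself. It assumes the parsed "comment" has the `ops` layout quoted later in this thread (string inserts for text, `{"image": url}` inserts for images). The function name and the `url` and `blog_text` keys are hypothetical; only `content_comment` comes from the description above. The fantiaImage entries handled by the patch are left out because their layout is not shown in this thread.

```python
import json


def blog_image_metadata(content):
    """Build one metadata dict per embedded image of a blog-type content entry.

    `content` is assumed to be a dict whose "comment" value is JSON text.
    """
    comment = content["comment"]
    ops = json.loads(comment).get("ops", [])

    # Pass 1: join every plain-text insert into the full blog text.
    text = "".join(
        op["insert"] for op in ops if isinstance(op.get("insert"), str)
    )

    # Pass 2: emit one record per image, each carrying the full text.
    records = []
    for op in ops:
        insert = op.get("insert")
        if isinstance(insert, dict) and "image" in insert:
            records.append({
                "url": insert["image"],      # hypothetical key name
                "blog_text": text,           # hypothetical key name
                "content_comment": comment,  # original raw field value
            })
    return records
```

Collecting the text before the images means every image record can carry the complete post text, even when text follows the image in the document.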
image

@SupaStaer
Author

Thank you for the swift feedback!

I tried to add other types of files to the blog post, but was unable to do so.
It looks like it currently supports .jpg, .jpeg, .gif and .png. It might support additional image types.

It looks like there is an option to add html.
I was unable to cleanly use translation tools to see what this message says.
image

The final line mentions which HTML tags are available for use.
Since img tags are allowed, it seems to be possible to embed images from another site in a blog post.

I have added a new section at the bottom of the post to test how blog posts work with images uploaded to fantia and images loaded from html.
https://fantia.jp/posts/1166373

I used this html:
<img src="https://www.w3schools.com/images/lamp.jpg" alt="Lamp" width="32" height="32">

Embedding tweets and YouTube videos does not seem to currently work due to tag restrictions.

Since files cannot be added to blog posts, and only specific HTML tags can be used when importing HTML, I think images are the only file type we should expect to extract from Fantia blog post content at this time.

@thatfuckingbird
Contributor

Thanks for checking. I feared there would be other types of attachments for zips and so on; luckily that's not the case.
Fortunately, it looks like the images/links are easy to extract too. The relevant part of the ops array looks like this for those:

{
  "ops": [
    {
      "insert": "Link to video:\n"
    },
    {
      "attributes": {
        "link": "https://www.youtube.com/watch?v=5SSdvNcAagI"
      },
      "insert": "https://www.youtube.com/watch?v=5SSdvNcAagI"
    },
    {
      "insert": "\n\nhtml img from another site:\n"
    },
    {
      "insert": {
        "image": "https://www.w3schools.com/images/lamp.jpg"
      }
    }
  ]
}

So they can be processed similarly to the fantiaImage entries in my patch.
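As a rough illustration of how entries like these could be picked out, the sketch below walks the `ops` array from the example above and yields link and image URLs in document order. The function name `iter_blog_urls` and the `SAMPLE_OPS` constant are made up for this example; this is not the code from the patch.

```python
SAMPLE_OPS = [
    {"insert": "Link to video:\n"},
    {
        "attributes": {"link": "https://www.youtube.com/watch?v=5SSdvNcAagI"},
        "insert": "https://www.youtube.com/watch?v=5SSdvNcAagI",
    },
    {"insert": "\n\nhtml img from another site:\n"},
    {"insert": {"image": "https://www.w3schools.com/images/lamp.jpg"}},
]


def iter_blog_urls(ops):
    """Yield (kind, url) pairs for link and image entries, in document order."""
    for op in ops:
        # Links are stored as an attribute on a text insert.
        link = op.get("attributes", {}).get("link")
        if link:
            yield "link", link

        # Images are stored as a dict insert with an "image" key.
        insert = op.get("insert")
        if isinstance(insert, dict) and "image" in insert:
            yield "image", insert["image"]


for kind, url in iter_blog_urls(SAMPLE_OPS):
    print(kind, url)
```

Plain-text entries without a link attribute are skipped, so only the YouTube link and the externally hosted image are reported for the sample.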

mikf added a commit that referenced this issue Mar 11, 2022