
Twitter: "content" metadata information does not include emoticons #338

Closed
Defrost4528 opened this issue Jul 16, 2019 · 2 comments

@Defrost4528

First of all, thank you very much for the fast implementation of the other request! Much appreciated.

I've been testing out the latest version, and it seems to work almost perfectly. Just a few things I noticed:

(can be observed using https://twitter.com/yumi_san0112/status/1151144618936823808)

  1. Emoticons are not properly copied and are replaced with a space. Looking at the HTML code, they seem to be represented by an img element, with the proper unicode equivalent in the "alt" attribute, so maybe that can be utilized.
  2. Line breaks are replaced by a space. This isn't a big deal, and honestly I'm not well versed enough in JSON to know if there's an elegant representation for this, so it's just an observation.
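A minimal sketch of the alt-attribute idea from point 1, using Python's standard `html.parser`. It assumes the tweet text arrives as an HTML fragment in which each emoticon is an `<img>` whose `alt` attribute holds the Unicode character, as described above. It also assumes, without confirmation from this thread, that line breaks appear as `<br>` tags. `TweetTextParser` and `tweet_text` are hypothetical names, not gallery-dl's actual implementation.

```python
from html.parser import HTMLParser


class TweetTextParser(HTMLParser):
    """Hypothetical helper: rebuild plain tweet text from its HTML fragment."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <img ... /> and <br/> also end up here,
        # because HTMLParser's default handle_startendtag calls this method.
        if tag == "img":
            # Emoticon images carry the Unicode character in their alt attribute
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(alt)
        elif tag == "br":
            # Assumption: line breaks are rendered as <br> tags
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def tweet_text(fragment):
    parser = TweetTextParser()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(tweet_text('Good morning<img class="Emoji" alt="😊" src="e.png"><br>See you'))
```

Taking the character from `alt` avoids maintaining a mapping from emoji image URLs to code points.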

Once again, thank you! With the addition of the content text, this program is the perfect solution for backing up Twitter artists, which is especially important since some accounts seem to come and go. This has made my life so much easier.

mikf added a commit that referenced this issue Jul 17, 2019
- include emoticons
- leave newlines intact
- remove pic.twitter.com/ links at the end
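The third item in the commit message, dropping the trailing pic.twitter.com link, could look roughly like the sketch below. The regular expression and the `strip_trailing_pic_url` helper are hypothetical illustrations, not the code from the commit. The sketch assumes the link is the last token in the text and that the part after `pic.twitter.com/` consists only of word characters.

```python
import re

# Hypothetical pattern: optional whitespace, optional scheme, then a
# pic.twitter.com/<id> link that must sit at the very end of the text.
TRAILING_PIC_URL = re.compile(r"\s*(?:https?://)?pic\.twitter\.com/\w+\s*$")


def strip_trailing_pic_url(text):
    """Remove a pic.twitter.com link only if it ends the tweet text."""
    return TRAILING_PIC_URL.sub("", text)


print(strip_trailing_pic_url("Hello\nWorld pic.twitter.com/AbC123"))
```

Anchoring the pattern with `$` leaves any pic.twitter.com link in the middle of a tweet untouched.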
@mikf
Owner

mikf commented Jul 17, 2019

Emoticons and newlines are now preserved, and it also removes the pic.twitter.com URL at the end. Since the content parsing has become a bit more complicated and is probably not used by the majority of users, I've put it behind the twitter.content option.

Please test the current implementation and see if this better suits your expectations.
Also, thank you for providing an example; it makes this a lot easier for me.

And since you are talking about Twitter backup: make sure you are (also) using twitter.com/<user>/media as input, and remember that gallery-dl can't necessarily get all tweets of a timeline due to Twitter's own restrictions (see #186 and #314). Projects like twint, or whatever else you find for Twitter backup on GitHub, might be better suited to preserving Twitter timelines.

@Defrost4528
Author

I've tested it thoroughly, and it seems to work great now! Since newlines were added too, I also tested the backslash escaping just to be safe, and it works fine. Thanks!
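For reference, JSON has a standard way to carry line breaks inside a string: a newline is written as the escape `\n`, and a literal backslash is doubled to `\\`, so both survive a round trip. The sketch below shows this with Python's `json` module. It is only an illustration and not necessarily how gallery-dl writes its metadata files; in particular, passing `ensure_ascii=False` (which keeps emoticons readable instead of emitting `\uXXXX` escapes) is an assumption here.

```python
import json

# Tweet text with an emoticon, a real line break, and a literal backslash
content = "first line 😊\nsecond line with a \\ backslash"

encoded = json.dumps({"content": content}, ensure_ascii=False)
print(encoded)
# {"content": "first line 😊\nsecond line with a \\ backslash"}

# Decoding restores the original characters exactly
assert json.loads(encoded)["content"] == content
```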

As for twint, I'll definitely look at that. However, at first glance it seems like a complex method for my simple need (and probably takes some tweaking to get it to download images). My focus is on batch archiving artists' images in an organized way, which your program already does practically perfectly.
