[Feature Request] Unshorten shortened URLs when using --write-metadata? #1532
Comments
Something I noticed a while ago when looking for mediafire links on Twitter is that Twitter knows where the URLs redirect to (including bit.ly links).
In For example, this tweet has an Though for some reason, the tweet I mentioned has a second URL in |
Done (41457db), and thanks for pointing out that the data provided by Twitter already has the expanded URLs, otherwise I might have used HEAD requests for this.
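For anyone following along, here is a rough sketch of the idea in Python. It is not the code from 41457db. It assumes the Tweet data is a dict with an `entities.urls` list whose entries have `url` (the t.co link) and `expanded_url` fields, as in Twitter's v1.1 API responses. `expand_tco` is a made-up helper name and the sample Tweet is invented.

```python
def expand_tco(tweet):
    """Return the Tweet's text with t.co links replaced by their targets.

    Assumption: 'tweet' is a dict shaped like a Twitter API v1.1 Tweet
    object, i.e. entities.urls[*] has 'url' and 'expanded_url' keys.
    """
    text = tweet.get("full_text") or tweet.get("text") or ""
    for entry in tweet.get("entities", {}).get("urls", []):
        short = entry.get("url")
        target = entry.get("expanded_url")
        if short and target:
            # Plain string replacement; no network request is needed
            # because the expanded URL is already part of the data.
            text = text.replace(short, target)
    return text


# Invented sample data, for illustration only.
tweet = {
    "full_text": "files are up: https://t.co/abc123",
    "entities": {"urls": [
        {"url": "https://t.co/abc123",
         "expanded_url": "https://www.mediafire.com/file/example"},
    ]},
}
print(expand_tco(tweet))
```

Since everything comes from the response itself, this avoids one HEAD request per link.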
Each Tweet with images or videos has a t.co URL that links to itself for some reason.
Damn, this looks like a useful change. I wish there were a way to go back and resolve all the t.co URLs in all the metadata files I currently have. And yeah, the useless URLs should definitely be removed.
They probably should be, since anyone going through and parsing it would find it a pain to both figure out where these URLs are coming from and reliably remove them.
Is |
The 'full_text' of Tweets with media content usually ends with a t.co link to itself. This commit removes those.
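As a rough illustration of that idea (again, not the commit's actual code): the sketch below assumes each item in the Tweet's `entities.media` list has a `url` field holding the t.co self-link, as in Twitter's v1.1 API responses. `strip_media_link` is a hypothetical helper name.

```python
def strip_media_link(tweet):
    """Drop the trailing t.co self-link from a media Tweet's text.

    Assumption: entities.media[*]['url'] is the t.co link that the
    Tweet's 'full_text' ends with. Tweets without media are untouched,
    so a real link at the end of the text is never removed.
    """
    text = tweet.get("full_text", "")
    for media in tweet.get("entities", {}).get("media", []):
        url = media.get("url")
        if url and text.endswith(url):
            # Cut the link and the whitespace that preceded it.
            text = text[:-len(url)].rstrip()
    return text
```

Matching against the media entries instead of blindly cutting the last t.co link keeps ordinary links at the end of a text-only Tweet intact.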
So it turns out that t.co links aren't expanded in, say, author descriptions. I'm not sure if it's as simple to fix those as it was for tweet contents, but I'd imagine it is.
Yep, in I'll work on the patch sometime tomorrow if you don't get to it first (Sidenote: Should there be a |
You mean one that triggers for all events and not just files? There needs to be a default filename format string for things that aren't files first, e.g. one for a Tweet's metadata independent of any actual files.
No, unless the metadata post processor is configured to trigger on event
Just use |
My main reason for wanting this is in case t.co links no longer work when the tweet that created them gets deleted. It'd be nice to preserve not just the images but also the links in the description.
I know this is possible for at least Twitter, since I made a Tampermonkey script to go through the <a> element's children to find the full URL (see bottom). This should also be trivial for websites that do xyz.com/redirect?url=abc.com. Whether or not it should support bit.ly and stuff is for someone else to decide.
Ideally it'd add a new key like "raw_description" to the written JSON file to preserve backwards compatibility.
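To show what I mean by "trivial" for the redirect-style links, here is a minimal Python sketch (separate from the Tampermonkey script). It assumes the target sits in a query parameter named `url`; other sites may use a different name, so it is left configurable. `unwrap_redirect` is a made-up name.

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_redirect(url, param="url"):
    """Return the target of a 'redirect?url=...' style link.

    Assumption: the destination is stored in the query parameter
    'param'. If that parameter is missing, 'url' is returned unchanged.
    """
    query = parse_qs(urlsplit(url).query)
    targets = query.get(param)
    # parse_qs already percent-decodes the value.
    return targets[0] if targets else url


print(unwrap_redirect("xyz.com/redirect?url=abc.com"))
```

This needs no network access. Shorteners like bit.ly carry no target in the URL itself, so they would need an actual request instead.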
Relevant part of the tampermonkey script I mentioned (pardon the jank):