Re-fix Redgifs #659

Merged
merged 7 commits into from
Sep 20, 2022

Conversation

Soulsuck24
Contributor

@Soulsuck24 Soulsuck24 commented Sep 13, 2022

The API seems to return an incorrect signature value when the header is sent. The other fixes seem to have worked temporarily but have since stopped working, so they have been removed.

This should fully fix #652 and #661.

@KylarZeppeli

I replaced my local copy of the repo's current site_downloaders/redgifs.py with the one from this pull request.
It makes downloads from RedGifs possible again, but there is a program-breaking bug.
In my testing, the program either crashes or downloads an HTML file when it attempts to download an image post from RedGifs.
Only one other RedGifs download in my test run was an image post, and for that one it downloaded the HTML of the page rather than the images on it.
The link appears to be a direct link to a JPEG, but it redirects to the webpage.

https://i.redgifs.com/i/oddballfemalehousefly.jpg --> https://www.redgifs.com/watch/oddballfemalehousefly (BDFR downloaded the HTML of the page instead of the images on it; 2 images here)

https://www.redgifs.com/watch/willingwindyyellowbelliedmarmot (BDFR crashed when attempting to download from this link; 4 images here)

The videos appear to be downloading perfectly fine.

python -m bdfr download --log ./Log -S new -s tgirls .

[ 350 successful downloads ]

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.10/site-packages/bdfr/__main__.py", line 160, in <module>
    cli()
  File "/usr/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/lib/python3.10/site-packages/bdfr/__main__.py", line 89, in cli_download
    reddit_downloader.download()
  File "/usr/lib/python3.10/site-packages/bdfr/downloader.py", line 45, in download
    self._download_submission(submission)
  File "/usr/lib/python3.10/site-packages/bdfr/downloader.py", line 92, in _download_submission
    content = downloader.find_resources(self.authenticator)
  File "/usr/lib/python3.10/site-packages/bdfr/site_downloaders/redgifs.py", line 21, in find_resources
    media_urls = self._get_link(self.post.url)
  File "/usr/lib/python3.10/site-packages/bdfr/site_downloaders/redgifs.py", line 49, in _get_link
    headers=headers,
NameError: name 'headers' is not defined

@Soulsuck24
Contributor Author

I will look into that shortly. I haven't run into it yet and didn't think to test it, as it looked like it should work from what I had seen in the API. Can you post the Reddit IDs for those links? (6 characters, likely starting with x)

@Soulsuck24
Contributor Author

Okay, seems I pulled a stupid and only removed the headers from one of the two retrieve_url calls. Can you try the new changes to make sure it works correctly for you?

@KylarZeppeli

HTML was downloaded instead of the image: xextbj (has a .jpg file extension but the actual content is HTML)
Crashed BDFR: xc49v8

I just updated the redgifs.py file.
The one that crashed BDFR doesn't crash it anymore, and the image now downloads successfully. The other one is still downloaded as HTML, but that's a minor issue, I suppose.

@twentyonerooms87

twentyonerooms87 commented Sep 16, 2022

Fix 652 worked for me, but I only tested with videos, not images. Thank you Soulsuck24! I will test 659 as well.

@Soulsuck24
Contributor Author

> HTML was downloaded instead of image: xextbj (has .jpg file extension but actual contents is HTML)
> Crashed BDFR: xc49v8
>
> Just updated the redgifs.py file
> The one that crashed BDFR doesn't crash it anymore & the image now downloads successfully. The other one is still downloaded as HTML but that's a minor issue I suppose

Looks like they're both galleries, so if one works the other should too. Hmm, I'll see what I can figure out.

@KylarZeppeli

KylarZeppeli commented Sep 16, 2022

To be fair, I've had the HTML issue for over a year. It's not just RedGifs, though: I remember getting HTML files from xvideos, onlyfans and random other websites before. Some of them occasionally had .com extensions. It's not really a big deal.

@KylarZeppeli

I did some more tests and found 24 more HTML files disguised as JPEG files. Every single one of them came from a link that APPEARED to be a direct link to a file but was actually a redirect.
BDFR uses the direct downloader instead of the RedGifs or Imgur downloader because the URL appears to point directly to a file, when it actually redirects to a regular webpage on RedGifs or Imgur.

[2022-09-16 15:51:02,137 - bdfr.downloader - DEBUG] - Using Direct with url https://imgur.com/a/OGGGZkd.jpg
[2022-09-16 15:51:42,845 - bdfr.downloader - DEBUG] - Using Direct with url https://www.redgifs.com/watch/smartvillainousspiketail.jpg
[2022-09-16 15:52:03,256 - bdfr.downloader - DEBUG] - Using Direct with url https://i.redgifs.com/i/allunselfishchameleon.jpg

The https://www.redgifs.com links appear to be broken links that redirect to a nonexistent webpage.
All the https://imgur.com and https://i.redgifs.com links, however, redirect to a real webpage.

@Soulsuck24
Contributor Author

Soulsuck24 commented Sep 16, 2022

For now I'd say use --skip-domain=redgifs.com until it's fully solved. But progress is being made, at least. It seems the URL is being stripped of the new `expires`, `signature` and `for` parameters before the download is attempted, so I just need to find out how to correct that without breaking other things.

Maybe, I'm not sure if that's what it is anymore... sigh
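
For readers unfamiliar with what "stripped of the parameters" would mean in practice, here is a minimal sketch using only the standard library. The URL, host and parameter values are made up for illustration and are not taken from the RedGifs API; only the three parameter names come from the comment above.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

# Hypothetical signed media URL; host, path and values are invented.
signed = (
    "https://media.example.com/clip.mp4"
    "?expires=1663300000&signature=abc123&for=203.0.113.7"
)

parts = urlsplit(signed)
params = parse_qs(parts.query)
print(sorted(params))  # ['expires', 'for', 'signature']

# Rebuilding the URL without its query string discards all three parameters.
stripped = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
print(stripped)  # https://media.example.com/clip.mp4
```

If a server validates the signature, a request for the stripped URL would presumably be rejected or redirected, which is the behaviour being suspected at this point in the thread.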

@KylarZeppeli

I thought the problem was that BDFR uses the direct downloader instead of the site-specific downloader. The link appears to be a direct link to a file, so the direct downloader gets selected, but the link is actually a redirect, which the direct downloader can't handle. Was I wrong?
Content from RedGifs can't be downloaded with something that can only download files directly, so it makes sense that the only thing it could grab would be the HTML page.
If you run curl or wget on a RedGifs post or a Reddit post, you're not going to get the image/video content; you're going to get the HTML of the webpage.
I'm assuming that's what's going on here.
I'm not a programmer, but I assume you'd basically have to make it ignore direct links to a file (or ones that appear to be such), which could break the downloader.
Like I said, it's not a big deal and it's actually fairly rare. And if I really want the post, I can download it directly with yt-dlp anyway.

@KylarZeppeli

The command below downloads exactly the same file that BDFR downloads for the Reddit post with the ID xextbj:
curl -Lo "xextbj" "https://i.redgifs.com/i/oddballfemalehousefly.jpg"
BDFR gives it a .jpg extension because the URL has a .jpg extension.
Why would it have a .jpg extension if it isn't a JPEG?
Because RedGifs has decided to make its redirect links look like direct links to a file, in this instance a JPEG.
I hope I'm being clear about what I believe the issue is.
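
One way to confirm that a downloaded ".jpg" is really HTML is to look at its first bytes instead of trusting the URL. The sketch below is only an illustration; `sniff_payload` is a hypothetical helper and is not part of BDFR. It relies on the fact that JPEG files start with the bytes FF D8 FF, while an HTML page normally starts with a doctype or an `<html>` tag.

```python
def sniff_payload(data: bytes) -> str:
    """Classify a downloaded payload by its leading bytes, not by its URL."""
    # JPEG files begin with the marker bytes FF D8 FF.
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # HTML pages usually open with a doctype or an <html> tag,
    # possibly preceded by whitespace and in any letter case.
    head = data[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"


print(sniff_payload(b"\xff\xd8\xff\xe0\x00\x10JFIF"))    # jpeg
print(sniff_payload(b"\n<!DOCTYPE html><html></html>"))  # html
```

Running this kind of check on the file saved by the curl command above would show whether the payload is an image or a webpage, regardless of the extension in the URL.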

@Soulsuck24
Contributor Author

It seems to be a combination of both. Still working away at it. I thought I had sorted it out, but it's still doing weird things.

@KylarZeppeli

I just noticed that https://i.redgifs.com/i/oddballfemalehousefly.jpg is the link that was submitted in the post xextbj.
So the error only happens when people post the redirect link instead of the actual link.
I think that's all I have to say about this issue. Hope you figure out how to fix it!

If this doesn't work then I give up...
@Soulsuck24
Contributor Author

Okay, I do believe I've got it fixed now. It turns out it had nothing to do with the parameters. I needed to move the Redgifs option up in the download factory and then figure out why the re.match wasn't working the way it was written (honestly, no idea why it needed that change to work with those links). So this should fix the Redgifs module. Later, I will look into the Imgur one that was posted. I'm done with this repo for the night lol
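
To illustrate why the order of checks in a factory matters, here is a minimal sketch. It is not BDFR's actual download factory: `pick_downloader`, its patterns and its return values are hypothetical. The point is only that a generic "URL ends in a file extension" check will claim a RedGifs or Imgur redirect link unless the site-specific checks run first.

```python
import re
from urllib.parse import urlsplit


def pick_downloader(url: str) -> str:
    """Hypothetical factory: return the name of the downloader to use."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Site-specific checks run first, so a URL such as
    # https://i.redgifs.com/i/<name>.jpg is not claimed by the generic check.
    if re.match(r"(.*\.)?redgifs\.com$", host):
        return "Redgifs"
    if re.match(r"(.*\.)?imgur\.com$", host):
        return "Imgur"
    # Generic fallback: the path ends in something that looks like an extension.
    if re.search(r"\.\w{3,4}$", parts.path):
        return "Direct"
    return "Unknown"


print(pick_downloader("https://i.redgifs.com/i/allunselfishchameleon.jpg"))  # Redgifs
print(pick_downloader("https://imgur.com/a/OGGGZkd.jpg"))                    # Imgur
print(pick_downloader("https://example.com/picture.jpg"))                    # Direct
```

If the extension check were placed first, all three URLs would be routed to the direct downloader. Separately, note that `re.match` only matches at the start of the string, whereas `re.search` scans the whole string; the thread does not say whether that difference was the cause of the regex trouble here.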

@KylarZeppeli

As far as I can tell, RedGifs downloads are working perfectly now.
Would fixing the Imgur downloader be a similar procedure?

@Serene-Arc
Owner

Hey thanks for all this work, I've been a bit swamped. I gotta update the tests but if they go fine I can merge it.

@Soulsuck24
Contributor Author

Soulsuck24 commented Sep 17, 2022

No trouble. It seems it was mostly edge cases that weren't being handled neatly. I'll be the first to admit I don't really run the tests in the repo, so I would be more likely to break them than update them properly; I'll leave that part to you.

I'll move on to the imgur edge cases that Kylar posted and see if I can sort them out as well.

@KylarZeppeli

Have you figured out anything about why the Imgur downloader wasn't working on some posts?

Cover edge cases that shouldn't ever happen but probably will sometime.

Also included Imgur changes to cover similar situations of malformed/redirected links.
@Soulsuck24
Contributor Author

This should hopefully get both sites working properly and cover all the edge cases I can think of. I wasn't able to dig up any Reddit IDs for the Imgur redirects, so I couldn't test them myself, but the logic in this version works the same way as it does for Redgifs, so there shouldn't be any issues.

@KylarZeppeli

KylarZeppeli commented Sep 18, 2022

One more thing to mention: during one of my tests yesterday, a run stalled indefinitely with no indication of progress when it came across a post with a French pornhub link, while English pornhub links worked just fine.
There was no test.mp4 file in /tmp like there's supposed to be.
max_wait_time was set to 60 seconds, but the run stalled for multiple minutes.

Edit: I just did some tests with the latest commit, and it now just says "Could not download submission xg5404: No downloader module exists for url" and moves on to the next post, which I am content with. It's probably the only French pornhub link I'll ever come across anyway.

@KylarZeppeli

With the latest commit, all of the Imgur downloads are being saved as HTML files with a .gifv extension.

This reverts commit 2f2b5b7.
Cover edge cases that shouldn't ever happen but probably will sometime.
@Soulsuck24
Contributor Author

Soulsuck24 commented Sep 18, 2022

regex is a scourge

Since this pull request was for Redgifs and that seems complete, I'm going to move the Imgur stuff to #662 and work on it separately, which I should have done in the first place.

@Serene-Arc
Owner

I love regex lol I'll get to work on the tests...with regex

@Serene-Arc Serene-Arc self-assigned this Sep 20, 2022
@Serene-Arc
Owner

Serene-Arc commented Sep 20, 2022

Hey @Soulsuck24 can you please give me push permissions so I can add my tests commit a la here.

(Also for your other branch if it's the same.)

@Soulsuck24
Contributor Author

Soulsuck24 commented Sep 20, 2022

@Serene-Arc As far as I can tell, you should be able to for both. That checkbox is checked in both PRs.

I unchecked and rechecked both; hopefully that fixed it if it wasn't working. Worst case, you can just close both and make a new one for yourself with everything.

@Serene-Arc
Owner

Hm I will investigate

@Serene-Arc Serene-Arc merged commit f4598c4 into Serene-Arc:development Sep 20, 2022
Development

Successfully merging this pull request may close these issues.

[SITE] RedGifs API changes
4 participants