Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[reddit] Support of standalone submissions on personal pages of users #2301

Merged
merged 3 commits into from
Feb 13, 2022
Merged

[reddit] Support of standalone submissions on personal pages of users #2301

merged 3 commits into from
Feb 13, 2022

Conversation

Ailothaen
Copy link
Contributor

@Ailothaen Ailothaen commented Feb 13, 2022

While trying to download a submission on a user's personal page (these are submissions that are not bound to a subreddit but to an user, such as this one: https://www.reddit.com/user/TheSpiritTree/comments/srilyf/this_is_a_random_image_posted_for_automation/ ), I noticed that there was no sub-extractor of Reddit for this purpose (there is one for a whole user's page section, but not for single submissions)

This commit adds a class RedditUserSubmissionExtractor which takes care of this, providing the pattern for this situation.

Examples before then after the changes:

root@hestia:~ # python3 -m gallery_dl "https://www.reddit.com/user/TheSpiritTree" --verbose
[gallery-dl][debug] Version 1.20.4
[gallery-dl][debug] Python 3.7.3 - Linux-5.4.51-v7+-armv7l-with-debian-10.6
[gallery-dl][debug] requests 2.27.1 - urllib3 1.26.8
[gallery-dl][debug] Starting DownloadJob for 'https://www.reddit.com/user/TheSpiritTree'
[reddit][debug] Using RedditUserExtractor for 'https://www.reddit.com/user/TheSpiritTree'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): oauth.reddit.com:443
[urllib3.connectionpool][debug] https://oauth.reddit.com:443 "GET /user/TheSpiritTree/.json?limit=100&raw_json=1 HTTP/1.1" 200 5734
./gallery-dl/reddit/u_TheSpiritTree/srilyf This is a random image posted for automation testing purposes..jpg

root@hestia:~ # python3 -m gallery_dl "https://www.reddit.com/user/TheSpiritTree/comments/srilyf/this_is_a_random_image_posted_for_automation/" --verbose
[gallery-dl][debug] Version 1.20.4
[gallery-dl][debug] Python 3.7.3 - Linux-5.4.51-v7+-armv7l-with-debian-10.6
[gallery-dl][debug] requests 2.27.1 - urllib3 1.26.8
[gallery-dl][debug] Starting DownloadJob for 'https://www.reddit.com/user/TheSpiritTree/comments/srilyf/this_is_a_random_image_posted_for_automation/'
[reddit][debug] Using RedditUserExtractor for 'https://www.reddit.com/user/TheSpiritTree/comments/srilyf/this_is_a_random_image_posted_for_automation/'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): oauth.reddit.com:443
[urllib3.connectionpool][debug] https://oauth.reddit.com:443 "GET /user/TheSpiritTree/comments/.json?limit=100&raw_json=1 HTTP/1.1" 200 4488
[reddit][info] No results for https://www.reddit.com/user/TheSpiritTree/comments/srilyf/this_is_a_random_image_posted_for_automation/
root@hestia:~ # python3 -m gallery_dl "https://www.reddit.com/user/TheSpiritTree" --verbose
[gallery-dl][debug] Version 1.20.4
[gallery-dl][debug] Python 3.7.3 - Linux-5.4.51-v7+-armv7l-with-debian-10.6
[gallery-dl][debug] requests 2.27.1 - urllib3 1.26.8
[gallery-dl][debug] Starting DownloadJob for 'https://www.reddit.com/user/TheSpiritTree'
[reddit][debug] Using RedditUserExtractor for 'https://www.reddit.com/user/TheSpiritTree'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): oauth.reddit.com:443
[urllib3.connectionpool][debug] https://oauth.reddit.com:443 "GET /user/TheSpiritTree/.json?limit=100&raw_json=1 HTTP/1.1" 200 5734
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): i.redd.it:443
[urllib3.connectionpool][debug] https://i.redd.it:443 "GET /8fpgv17yqlh81.jpg HTTP/1.1" 200 975588
./gallery-dl/reddit/u_TheSpiritTree/srilyf This is a random image posted for automation testing purposes..jpg

root@hestia:~ # python3 -m gallery_dl "https://www.reddit.com/user/TheSpiritTree/comments/srilyf/this_is_a_random_image_posted_for_automation/" --verbose
[gallery-dl][debug] Version 1.20.4
[gallery-dl][debug] Python 3.7.3 - Linux-5.4.51-v7+-armv7l-with-debian-10.6
[gallery-dl][debug] requests 2.27.1 - urllib3 1.26.8
[gallery-dl][debug] Starting DownloadJob for 'https://www.reddit.com/user/TheSpiritTree/comments/srilyf/this_is_a_random_image_posted_for_automation/'
[reddit][debug] Using RedditUserSubmissionExtractor for 'https://www.reddit.com/user/TheSpiritTree/comments/srilyf/this_is_a_random_image_posted_for_automation/'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): oauth.reddit.com:443
[urllib3.connectionpool][debug] https://oauth.reddit.com:443 "GET /comments/srilyf/.json?limit=0&raw_json=1 HTTP/1.1" 200 1676
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): i.redd.it:443
[urllib3.connectionpool][debug] https://i.redd.it:443 "GET /8fpgv17yqlh81.jpg HTTP/1.1" 200 975588

I had to add a $ at the end of the regex for RedditUserExtractor since its regex was catching the URL before the new class otherwise.

Please note that I filled the test variable for this class, but:

  • I am unsure about how to launch the tests and how to fill them as well, so I could not test it, this surely has to be changed
  • The URL I implemented is bound to the account "TheSpiritTree" (which I own). I am not going to remove this submission if it is going to be used in the code, but still feel free to change the URL to something more "resilient".

@mikf
Copy link
Owner

mikf commented Feb 13, 2022

You can run tests with make test in the root directory, or by directly running each test_*.py file in the test directory. Running extractor result tests only works like that (running all takes more than 30 minutes), but you can select specific extractor categories with the second argument:

$ python test/test_results.py reddit
$ python test/test_results.py reddit:usersubmission

The one test you added runs successfully when testing it on my machine.

Also, maybe you could just extend the regex pattern of the existing submission extractor instead of creating a new one using basically the same code.

I think that would just be changing

pattern = (r"(?:https?://)?(?:"
r"(?:\w+\.)?reddit\.com/(?:r/[^/?#]+/comments|gallery)"
r"|redd\.it)/([a-z0-9]+)")

to

    pattern = (r"(?:https?://)?(?:"
               r"(?:\w+\.)?reddit\.com/(?:(?:r|u|user)/[^/?#]+"
               "/comments|gallery)|redd\.it)/([a-z0-9]+)")

(r to (?:r|u|user))

The tests on GitHub failed because the comment in https://github.com/mikf/gallery-dl/pull/2301/files#diff-638fcf254368bdeef37f0ad5e3150e8575e2d5619f4815134f6ea2d6e880a923R249 is too long. flake8 complains about all lines longer than 79 characters.

@Ailothaen
Copy link
Contributor Author

Thank you for the suggestions. I definitely agree that modifying the pattern of RedditSubmissionExtractor is a better design, since in the end, a user submission still remains a submission. (The only point of concern would be that one would not currently be able to distinguish between subreddits and usernames, so it may cause confusion if someone downloads from both usernames and subreddits with the same names)

I am going to push a commit that goes on this design (and corrects the linter issue)

Thank you as well for the guidance on how to run tests. I feel like this should be useful to put on a wiki page/documentation related to development (such as the answer of rachmadaniHaryono related to my question with the Extractor class back then)

@mikf mikf merged commit 203a04a into mikf:master Feb 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants