v2.6 #640

Merged
merged 131 commits into from
Sep 27, 2022
99fe331
Bind socket to '0.0.0.0' rather than 'localhost' to allow for more fl…
BlipRanger May 12, 2021
4fd903c
Merge branch 'aliparlakci:master' into master
BlipRanger May 12, 2021
850faff
Add PowerShell scripts
thayol Jan 5, 2022
ac3a8e9
Fix wrong offset
thayol Jan 5, 2022
8ec45a9
Fix Bash script: Failed to write
thayol Jan 6, 2022
3811ec3
Fix offset and remove substring
thayol Jan 6, 2022
49d1626
Merge pull request #581 from Thayol/fix-bash-script-failed-id
Serene-Arc Jan 6, 2022
57b3bb3
Merge pull request #579 from Thayol/add-powershell-scripts
Serene-Arc Jan 6, 2022
0177b43
Add --subscribed option
Serene-Arc Feb 18, 2022
85b2165
Merge pull request #600 from Serene-Arc/enhancement_574
Serene-Arc Feb 18, 2022
9deef63
Add support for Redgifs images and galleries
Serene-Arc Feb 18, 2022
5adb9f9
Merge pull request #601 from Serene-Arc/bug_fix_598
Serene-Arc Feb 18, 2022
6e0c642
Add file scheme naming for archiver
Serene-Arc Feb 18, 2022
6b7e551
Merge pull request #602 from Serene-Arc/bug_fix_531
Serene-Arc Feb 18, 2022
71f8442
Increase version number
Serene-Arc Feb 18, 2022
7645319
Fix gfycat after redgifs changes
Serene-Arc Feb 18, 2022
160ee37
Update test hashes
Serene-Arc Feb 18, 2022
06988c4
Switch redgifs to dynamic file extensions
Serene-Arc Feb 20, 2022
5f779c7
Merge pull request #606 from Serene-Arc/bug_fix_605
Serene-Arc Feb 20, 2022
81b7fe8
Update Readme with some command clarifications
sinclairkosh Mar 21, 2022
a2aa739
Merge pull request #619 from sinclairkosh/patch-1
Serene-Arc Mar 24, 2022
806bd76
Strip any newline characters from names
Serene-Arc Mar 25, 2022
5a3ff88
Add second test case
Serene-Arc Mar 25, 2022
2c93537
Merge pull request #621 from Serene-Arc/bug_fix_616
Serene-Arc Mar 25, 2022
b921d03
Use stdout
chapmanjacobd Apr 18, 2022
68e3674
readme: make --search info a bit more clear
chapmanjacobd Apr 18, 2022
90b6809
Merge pull request #628 from chapmanjacobd/patch-2
Serene-Arc Apr 19, 2022
4e050c5
okay
chapmanjacobd Apr 19, 2022
484bde9
oh there's another one
chapmanjacobd Apr 19, 2022
5775c0a
and this one too
chapmanjacobd Apr 19, 2022
4917fae
Merge branch 'aliparlakci:master' into master
BlipRanger Apr 25, 2022
dbd0c6c
Add support for v.reddit links.
BlipRanger Apr 25, 2022
d64acc2
Add tests, fix style.
BlipRanger Apr 25, 2022
2744075
Fix one test
BlipRanger Apr 25, 2022
1ad2b68
fix: Redirect to /subreddits/search
chapmanjacobd Apr 29, 2022
c410682
Merge pull request #626 from chapmanjacobd/patch-1
Serene-Arc May 8, 2022
81c49de
Replace old Vidble test cases
Serene-Arc May 8, 2022
bfd4817
Update test_connector.py
chapmanjacobd May 8, 2022
ac8855b
Add test case
Serene-Arc Jul 6, 2022
2e68850
Fix some test cases
Serene-Arc Jul 6, 2022
f57590c
Add exclusion options to archiver
Serene-Arc Jul 6, 2022
aede4d5
Merge pull request #642 from Serene-Arc/bug_fix_618
Serene-Arc Jul 6, 2022
12104d5
Update some test hashes
Serene-Arc Jul 7, 2022
a694098
Update test parameter
Serene-Arc Jul 7, 2022
919abb0
Remove bugged test case
Serene-Arc Jul 15, 2022
a620ae9
Add --subscribed option
Serene-Arc Feb 18, 2022
90a2eac
Add support for Redgifs images and galleries
Serene-Arc Feb 18, 2022
e8d7670
Add file scheme naming for archiver
Serene-Arc Feb 18, 2022
a599169
Increase version number
Serene-Arc Feb 18, 2022
f49a1d7
Fix gfycat after redgifs changes
Serene-Arc Feb 18, 2022
1abb776
Update test hashes
Serene-Arc Feb 18, 2022
12982c0
Switch redgifs to dynamic file extensions
Serene-Arc Feb 20, 2022
2bdeaf2
Update Readme with some command clarifications
sinclairkosh Mar 21, 2022
9f3dcec
Strip any newline characters from names
Serene-Arc Mar 25, 2022
53d7ce2
Add second test case
Serene-Arc Mar 25, 2022
e068c9c
readme: make --search info a bit more clear
chapmanjacobd Apr 18, 2022
ad17284
Use stdout
chapmanjacobd Apr 18, 2022
e4a44f1
okay
chapmanjacobd Apr 19, 2022
eb8f9d5
oh there's another one
chapmanjacobd Apr 19, 2022
efea01e
and this one too
chapmanjacobd Apr 19, 2022
decb13b
Replace old Vidble test cases
Serene-Arc May 8, 2022
7d49169
Add test case
Serene-Arc Jul 6, 2022
2d365b6
Fix some test cases
Serene-Arc Jul 6, 2022
8c59329
Add exclusion options to archiver
Serene-Arc Jul 6, 2022
3fd5bad
Update some test hashes
Serene-Arc Jul 7, 2022
7315afe
Update test parameter
Serene-Arc Jul 7, 2022
4f876ee
Remove bugged test case
Serene-Arc Jul 15, 2022
7d4eb47
Rename class
Serene-Arc Jul 15, 2022
9277903
Base VReddit class off of Youtube class
Serene-Arc Jul 15, 2022
86e451d
Fix test case
Serene-Arc Jul 15, 2022
1157c31
Remove bad test case
Serene-Arc Jul 15, 2022
febad9c
Remove bad test case
Serene-Arc Jul 15, 2022
36e32d4
Merge branch 'feature/v.reddit' into development
Serene-Arc Jul 15, 2022
59e57ce
Create protect_master.yml
Serene-Arc Jul 16, 2022
7100291
forgot comma
chapmanjacobd Jul 16, 2022
8ab13b4
Merge pull request #633 from chapmanjacobd/patch-3
Serene-Arc Jul 17, 2022
798ed72
yaml for options
stared Mar 27, 2022
ef82387
underscores in YAML
stared Mar 27, 2022
395bf91
explicit warnings for non-exisitng args
stared Mar 27, 2022
0731de7
instructions for YAML options
stared Mar 27, 2022
5f443fd
a better check for opts
stared Mar 27, 2022
cb3415c
Extract YAML function
Serene-Arc Jul 22, 2022
23e20e6
Rename variable
Serene-Arc Jul 22, 2022
af3f98f
Change logger message level
Serene-Arc Jul 22, 2022
27ca92e
Add simple test
Serene-Arc Jul 22, 2022
7ae318f
Merge pull request #622 from stared/yaml-options
Serene-Arc Jul 22, 2022
1f1e7dc
Fix file path for test
Serene-Arc Jul 23, 2022
607d963
Change file paths for test resource
Serene-Arc Jul 23, 2022
4fc0d5d
Add score filtering
chapmanjacobd May 5, 2022
89653c4
Update README.md
chapmanjacobd May 5, 2022
9545407
Update __main__.py
chapmanjacobd May 5, 2022
7eb2ab6
Update configuration.py
chapmanjacobd May 5, 2022
5d76fcd
Update downloader.py
chapmanjacobd May 5, 2022
f22a8ae
Fix line length
Serene-Arc Jul 23, 2022
2bbf1b6
Change logging message
Serene-Arc Jul 23, 2022
9d63125
Add tests for downloader
Serene-Arc Jul 23, 2022
b47b90f
Add integration tests
Serene-Arc Jul 23, 2022
55c9549
Fix test structure
Serene-Arc Jul 23, 2022
44e4c16
Update bash script
Serene-Arc Jul 23, 2022
4b160c2
Add missing flag
Serene-Arc Jul 23, 2022
cd6bcd8
Merge pull request #634 from chapmanjacobd/patch-4
Serene-Arc Jul 23, 2022
d60b4e7
Fix Redgifs module
Serene-Arc Sep 1, 2022
0767da1
Fix clone integration test setup
Serene-Arc Sep 3, 2022
5dbb4d0
Remove dead link tests
Serene-Arc Sep 3, 2022
35645da
Add missing mark
Serene-Arc Sep 3, 2022
e0a36f4
Re-fix Redgifs
Soulsuck24 Sep 13, 2022
0a9ecac
Redgif image fixes
Soulsuck24 Sep 16, 2022
9574958
Redgifs fixed?
Soulsuck24 Sep 17, 2022
2f2b5b7
Edge case coverage
Soulsuck24 Sep 18, 2022
d4f7dea
Revert "Edge case coverage"
Soulsuck24 Sep 18, 2022
7bd957a
Redo edge case coverage for Redgifs
Soulsuck24 Sep 18, 2022
106d759
Imgur updates
Soulsuck24 Sep 19, 2022
5c343ef
Fix Redgifs tests
Serene-Arc Sep 20, 2022
c4a9da0
Add dev requirements file
Serene-Arc Sep 20, 2022
f4598c4
Merge pull request #659 from Soulsuck24/development
Serene-Arc Sep 20, 2022
0681609
Merge branch 'development' into imgur
Serene-Arc Sep 20, 2022
1dff750
Remove duplicate entries
Serene-Arc Sep 20, 2022
cd05bc3
Fix tests
Serene-Arc Sep 20, 2022
398f7b2
Merge pull request #664 from Soulsuck24/imgur
Serene-Arc Sep 20, 2022
7fef6c4
Update test_clone_integration.py
OMEGARAZER Sep 23, 2022
3906386
Update test_download_integration.py
OMEGARAZER Sep 23, 2022
9c067ad
Update test_connector.py
OMEGARAZER Sep 23, 2022
ca33dee
Update test_download_integration.py
OMEGARAZER Sep 23, 2022
e57932a
Merge pull request #667 from OMEGARAZER/development
Serene-Arc Sep 23, 2022
57e59db
Update Erome link regex
Serene-Arc Sep 22, 2022
7bb2a9a
Remove obsolete test
Serene-Arc Sep 22, 2022
c834314
Update hash
Serene-Arc Sep 22, 2022
3b5f8bc
Update hashes
Serene-Arc Sep 22, 2022
d4664d7
Update yt-dlp requirement version
Serene-Arc Sep 27, 2022
b7d2116
Update test
Serene-Arc Sep 27, 2022
0ce2585
Update path so tests do not skip
Serene-Arc Sep 27, 2022
2 changes: 2 additions & 0 deletions .gitattributes
@@ -0,0 +1,2 @@
# Declare files that will always have CRLF line endings on checkout.
*.ps1 text eol=crlf
13 changes: 13 additions & 0 deletions .github/workflows/protect_master.yml
@@ -0,0 +1,13 @@
name: Protect master branch

on:
  pull_request:
    branches:
      - master
jobs:
  merge_check:
    runs-on: ubuntu-latest
    steps:
      - name: Check if the pull request is mergeable to master
        run: |
          if [[ "$GITHUB_HEAD_REF" == 'development' && "$GITHUB_REPOSITORY" == 'aliparlakci/bulk-downloader-for-reddit' ]]; then exit 0; else exit 1; fi;
53 changes: 50 additions & 3 deletions README.md
@@ -53,6 +53,12 @@ However, these commands are not enough. You should chain parameters in [Options]
```bash
python3 -m bdfr download ./path/to/output --subreddit Python -L 10
```
```bash
python3 -m bdfr download ./path/to/output --user reddituser --submitted -L 100
```
```bash
python3 -m bdfr download ./path/to/output --user reddituser --submitted --all-comments --comment-context
```
```bash
python3 -m bdfr download ./path/to/output --user me --saved --authenticate -L 25 --file-scheme '{POSTID}'
```
```bash
@@ -62,6 +68,31 @@ python3 -m bdfr download ./path/to/output --subreddit 'Python, all, mindustry' -
python3 -m bdfr archive ./path/to/output --subreddit all --format yaml -L 500 --folder-scheme ''
```

Alternatively, you can pass options through a YAML file.

```bash
python3 -m bdfr download ./path/to/output --opts my_opts.yaml
```

For example, running it with the following file

```yaml
skip: [mp4, avi]
file_scheme: "{UPVOTES}_{REDDITOR}_{POSTID}_{DATE}"
limit: 10
sort: top
subreddit:
- EarthPorn
- CityPorn
```

would be equivalent to the following (note that the YAML key is `file_scheme`, not `file-scheme`):
```bash
python3 -m bdfr download ./path/to/output --skip mp4 --skip avi --file-scheme "{UPVOTES}_{REDDITOR}_{POSTID}_{DATE}" -L 10 -S top --subreddit EarthPorn --subreddit CityPorn
```

When the same option is specified both in the YAML file and as a command-line argument, the command-line argument takes priority.

## Options

The following options are common between both the `archive` and `download` commands of the BDFR.
@@ -74,6 +105,10 @@ The following options are common between both the `archive` and `download` comma
- `--config`
  - If the path to a configuration file is supplied with this option, the BDFR will use the specified config
  - See [Configuration Files](#configuration) for more details
- `--opts`
  - Load options from a YAML file.
  - Has higher priority than the global config file but lower than command-line arguments.
  - See [opts_example.yaml](./opts_example.yaml) for an example file.
- `--disable-module`
  - Can be specified multiple times
  - Disables certain modules from being used
@@ -92,8 +127,8 @@ The following options are common between both the `archive` and `download` comma
  - This option will make the BDFR use the supplied user's saved posts list as a download source
  - This requires an authenticated Reddit instance, using the `--authenticate` flag, as well as `--user` set to `me`
- `--search`
  - This will apply the specified search term to specific lists when scraping submissions
  - A search term can only be applied to subreddits and multireddits, supplied with the `- s` and `-m` flags respectively
  - This will apply the input search term to specific lists when scraping submissions
  - A search term can only be applied when using the `--subreddit` and `--multireddit` flags
- `--submitted`
  - This will use a user's submissions as a source
  - A user must be specified with `--user`
@@ -192,6 +227,15 @@ The following options apply only to the `download` command. This command downloa
  - This skips all submissions from the specified subreddit
  - Can be specified multiple times
  - Also accepts CSV subreddit names
- `--min-score`
  - This skips all submissions which have fewer upvotes than the specified value
- `--max-score`
  - This skips all submissions which have more upvotes than the specified value
- `--min-score-ratio`
  - This skips all submissions which have an upvote ratio lower than the specified value
- `--max-score-ratio`
  - This skips all submissions which have an upvote ratio higher than the specified value


### Archiver Options

@@ -215,7 +259,10 @@ The `clone` command can take all the options listed above for both the `archive`

## Common Command Tricks

A common use case is for subreddits/users to be loaded from a file. The BDFR doesn't support this directly but it is simple enough to do through the command-line. Consider a list of usernames to download; they can be passed through to the BDFR with the following command, assuming that the usernames are in a text file:
A common use case is for subreddits/users to be loaded from a file. The BDFR supports this via YAML file options (`--opts my_opts.yaml`).

Alternatively, you can use the command-line [xargs](https://en.wikipedia.org/wiki/Xargs) function.
For a list of users `users.txt` (one user per line), type:

```bash
cat users.txt | xargs -L 1 echo --user | xargs -L 50 python3 -m bdfr download <ARGS>
```
14 changes: 10 additions & 4 deletions bdfr/__main__.py
@@ -16,13 +16,19 @@
click.argument('directory', type=str),
click.option('--authenticate', is_flag=True, default=None),
click.option('--config', type=str, default=None),
click.option('--opts', type=str, default=None),
click.option('--disable-module', multiple=True, default=None, type=str),
click.option('--exclude-id', default=None, multiple=True),
click.option('--exclude-id-file', default=None, multiple=True),
click.option('--file-scheme', default=None, type=str),
click.option('--folder-scheme', default=None, type=str),
click.option('--ignore-user', type=str, multiple=True, default=None),
click.option('--include-id-file', multiple=True, default=None),
click.option('--log', type=str, default=None),
click.option('--saved', is_flag=True, default=None),
click.option('--search', default=None, type=str),
click.option('--submitted', is_flag=True, default=None),
click.option('--subscribed', is_flag=True, default=None),
click.option('--time-format', type=str, default=None),
click.option('--upvoted', is_flag=True, default=None),
click.option('-L', '--limit', default=None, type=int),
@@ -37,17 +43,17 @@
]

_downloader_options = [
click.option('--file-scheme', default=None, type=str),
click.option('--folder-scheme', default=None, type=str),
click.option('--make-hard-links', is_flag=True, default=None),
click.option('--max-wait-time', type=int, default=None),
click.option('--no-dupes', is_flag=True, default=None),
click.option('--search-existing', is_flag=True, default=None),
click.option('--exclude-id', default=None, multiple=True),
click.option('--exclude-id-file', default=None, multiple=True),
click.option('--skip', default=None, multiple=True),
click.option('--skip-domain', default=None, multiple=True),
click.option('--skip-subreddit', default=None, multiple=True),
click.option('--min-score', type=int, default=None),
click.option('--max-score', type=int, default=None),
click.option('--min-score-ratio', type=float, default=None),
click.option('--max-score-ratio', type=float, default=None),
]

_archiver_options = [
3 changes: 3 additions & 0 deletions bdfr/archiver.py
@@ -34,6 +34,9 @@ def download(self):
f'Submission {submission.id} in {submission.subreddit.display_name} skipped'
f' due to {submission.author.name if submission.author else "DELETED"} being an ignored user')
continue
if submission.id in self.excluded_submission_ids:
logger.debug(f'Object {submission.id} in exclusion list, skipping')
continue
logger.debug(f'Attempting to archive submission {submission.id}')
self.write_entry(submission)

39 changes: 37 additions & 2 deletions bdfr/configuration.py
@@ -2,16 +2,21 @@
# coding=utf-8

from argparse import Namespace
from pathlib import Path
from typing import Optional
import logging

import click
import yaml

logger = logging.getLogger(__name__)

class Configuration(Namespace):
def __init__(self):
super(Configuration, self).__init__()
self.authenticate = False
self.config = None
self.opts: Optional[str] = None
self.directory: str = '.'
self.disable_module: list[str] = []
self.exclude_id = []
@@ -33,8 +38,13 @@ def __init__(self):
self.skip: list[str] = []
self.skip_domain: list[str] = []
self.skip_subreddit: list[str] = []
self.min_score = None
self.max_score = None
self.min_score_ratio = None
self.max_score_ratio = None
self.sort: str = 'hot'
self.submitted: bool = False
self.subscribed: bool = False
self.subreddit: list[str] = []
self.time: str = 'all'
self.time_format = None
@@ -48,6 +58,31 @@ def __init__(self):
self.comment_context: bool = False

def process_click_arguments(self, context: click.Context):
if context.params.get('opts') is not None:
self.parse_yaml_options(context.params['opts'])
for arg_key in context.params.keys():
if arg_key in vars(self) and context.params[arg_key] is not None:
vars(self)[arg_key] = context.params[arg_key]
if not hasattr(self, arg_key):
logger.warning(f'Ignoring an unknown CLI argument: {arg_key}')
continue
val = context.params[arg_key]
if val is None or val == ():
# don't overwrite with an empty value
continue
setattr(self, arg_key, val)

def parse_yaml_options(self, file_path: str):
yaml_file_loc = Path(file_path)
if not yaml_file_loc.exists():
logger.error(f'No YAML file found at {yaml_file_loc}')
return
with open(yaml_file_loc) as file:
try:
opts = yaml.load(file, Loader=yaml.FullLoader)
except yaml.YAMLError as e:
logger.error(f'Could not parse YAML options file: {e}')
return
for arg_key, val in opts.items():
if not hasattr(self, arg_key):
logger.warning(f'Ignoring an unknown YAML argument: {arg_key}')
continue
setattr(self, arg_key, val)
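
Review note: the precedence implemented above (built-in defaults, then the `--opts` YAML file, then command-line arguments) can be sketched without `click` or PyYAML. This is a minimal illustration rather than the BDFR code: the `Options` class, `apply_layer`, and `resolve` are hypothetical names, and plain dicts stand in for the parsed YAML file and for `context.params`. Unlike `parse_yaml_options` in the diff, the sketch also skips empty values that come from the file.

```python
from argparse import Namespace


class Options(Namespace):
    """Hypothetical stand-in for bdfr's Configuration, reduced to three fields."""

    def __init__(self):
        super().__init__()
        self.limit = None
        self.sort = 'hot'
        self.skip = []

    def apply_layer(self, values: dict, source: str) -> list[str]:
        """Copy known, non-empty values onto self and return the keys that were ignored."""
        ignored = []
        for key, val in values.items():
            if not hasattr(self, key):
                # Unknown keys are reported instead of being set silently.
                ignored.append(f'{source}:{key}')
                continue
            if val is None or val == ():
                # An unset option must not overwrite a value from a lower layer.
                continue
            setattr(self, key, val)
        return ignored


def resolve(yaml_opts: dict, cli_params: dict) -> tuple[Options, list[str]]:
    """Apply the YAML layer first and the CLI layer second, so the CLI wins."""
    opts = Options()
    ignored = opts.apply_layer(yaml_opts, 'yaml')
    ignored += opts.apply_layer(cli_params, 'cli')
    return opts, ignored
```

With `limit` set to 10 in the file and 25 on the command line, the resolved value is 25, while a CLI `sort` left unset keeps the file's `top`. A hyphenated key such as `file-scheme` is reported as ignored, which is why the YAML keys use underscores.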
24 changes: 17 additions & 7 deletions bdfr/connector.py
@@ -243,9 +243,19 @@ def split_args_input(entries: list[str]) -> set[str]:
return set(all_entries)

def get_subreddits(self) -> list[praw.models.ListingGenerator]:
if self.args.subreddit:
out = []
for reddit in self.split_args_input(self.args.subreddit):
out = []
subscribed_subreddits = set()
if self.args.subscribed:
if self.args.authenticate:
try:
subscribed_subreddits = list(self.reddit_instance.user.subreddits(limit=None))
subscribed_subreddits = set([s.display_name for s in subscribed_subreddits])
except prawcore.InsufficientScope:
logger.error('BDFR has insufficient scope to access subreddit lists')
else:
logger.error('Cannot find subscribed subreddits without an authenticated instance')
if self.args.subreddit or subscribed_subreddits:
for reddit in self.split_args_input(self.args.subreddit) | subscribed_subreddits:
if reddit == 'friends' and self.authenticated is False:
logger.error('Cannot read friends subreddit without an authenticated instance')
continue
@@ -270,9 +280,7 @@ def get_subreddits(self) -> list[praw.models.ListingGenerator]:
logger.debug(f'Added submissions from subreddit {reddit}')
except (errors.BulkDownloaderException, praw.exceptions.PRAWException) as e:
logger.error(f'Failed to get submissions for subreddit {reddit}: {e}')
return out
else:
return []
return out
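
Review note: the reworked `get_subreddits` relies on a set union of the explicitly requested names and the subscribed names, so duplicates collapse and an empty result simply skips the loop. The sketch below shows that merge in isolation. `merge_sources` is a hypothetical name, and this `split_args_input` is a simplified assumption about the connector's helper (it only splits on commas and semicolons), not its actual body.

```python
import re


def split_args_input(entries: list[str]) -> set[str]:
    """Split comma- or semicolon-separated names; simplified stand-in for the real helper."""
    names = set()
    for entry in entries:
        for name in re.split(r'[,;]\s*', entry):
            name = name.strip()
            if name:
                names.add(name)
    return names


def merge_sources(requested: list[str], subscribed: set[str]) -> set[str]:
    """Union of --subreddit entries and subscribed names, without duplicates."""
    return split_args_input(requested) | subscribed
```

A subreddit that is both passed with `--subreddit` and present in the subscription list is therefore scraped once, not twice.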

def resolve_user_name(self, in_name: str) -> str:
if in_name == 'me':
Expand Down Expand Up @@ -406,7 +414,9 @@ def check_subreddit_status(subreddit: praw.models.Subreddit):
try:
assert subreddit.id
except prawcore.NotFound:
raise errors.BulkDownloaderException(f'Source {subreddit.display_name} does not exist or cannot be found')
raise errors.BulkDownloaderException(f"Source {subreddit.display_name} cannot be found")
except prawcore.Redirect:
raise errors.BulkDownloaderException(f"Source {subreddit.display_name} does not exist")
except prawcore.Forbidden:
raise errors.BulkDownloaderException(f'Source {subreddit.display_name} is private and cannot be scraped')

2 changes: 1 addition & 1 deletion bdfr/default_config.cfg
@@ -1,7 +1,7 @@
[DEFAULT]
client_id = U-6gk4ZCh3IeNQ
client_secret = 7CZHY6AmKweZME5s50SfDGylaPg
scopes = identity, history, read, save
scopes = identity, history, read, save, mysubreddits
backup_log_count = 3
max_wait_time = 120
time_format = ISO
13 changes: 13 additions & 0 deletions bdfr/downloader.py
@@ -57,6 +57,19 @@ def _download_submission(self, submission: praw.models.Submission):
f'Submission {submission.id} in {submission.subreddit.display_name} skipped'
f' due to {submission.author.name if submission.author else "DELETED"} being an ignored user')
return
elif self.args.min_score and submission.score < self.args.min_score:
logger.debug(
f"Submission {submission.id} filtered due to score {submission.score} < [{self.args.min_score}]")
return
elif self.args.max_score and self.args.max_score < submission.score:
logger.debug(
f"Submission {submission.id} filtered due to score {submission.score} > [{self.args.max_score}]")
return
elif (self.args.min_score_ratio and submission.upvote_ratio < self.args.min_score_ratio) or (
self.args.max_score_ratio and self.args.max_score_ratio < submission.upvote_ratio
):
logger.debug(f"Submission {submission.id} filtered due to score ratio ({submission.upvote_ratio})")
return
elif not isinstance(submission, praw.models.Submission):
logger.warning(f'{submission.id} is not a submission')
return
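
Review note: the four score checks added above can be read as one predicate. The sketch below is a standalone illustration, with `passes_score_filter` as a hypothetical name. One difference is deliberate: the diff tests each threshold by truthiness, so a threshold of `0` behaves as if it were unset, whereas this sketch compares against `None` so that `--min-score 0` would still filter negative scores.

```python
from typing import Optional


def passes_score_filter(
    score: int,
    upvote_ratio: float,
    min_score: Optional[int] = None,
    max_score: Optional[int] = None,
    min_score_ratio: Optional[float] = None,
    max_score_ratio: Optional[float] = None,
) -> bool:
    """Return True when the submission is inside every bound that was set."""
    if min_score is not None and score < min_score:
        return False
    if max_score is not None and score > max_score:
        return False
    if min_score_ratio is not None and upvote_ratio < min_score_ratio:
        return False
    if max_score_ratio is not None and upvote_ratio > max_score_ratio:
        return False
    return True
```

All bounds are inclusive here, matching the strict `<` comparisons in the diff: a submission whose score equals `min_score` or `max_score` is kept.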
3 changes: 3 additions & 0 deletions bdfr/file_name_formatter.py
@@ -111,6 +111,9 @@ def format_path(
if not resource.extension:
raise BulkDownloaderException(f'Resource from {resource.url} has no extension')
file_name = str(self._format_name(resource.source_submission, self.file_format_string))

file_name = re.sub(r'\n', ' ', file_name)

if not re.match(r'.*\.$', file_name) and not re.match(r'^\..*', resource.extension):
ending = index + '.' + resource.extension
else:
11 changes: 6 additions & 5 deletions bdfr/site_downloaders/download_factory.py
@@ -17,15 +17,18 @@
from bdfr.site_downloaders.redgifs import Redgifs
from bdfr.site_downloaders.self_post import SelfPost
from bdfr.site_downloaders.vidble import Vidble
from bdfr.site_downloaders.vreddit import VReddit
from bdfr.site_downloaders.youtube import Youtube


class DownloadFactory:
@staticmethod
def pull_lever(url: str) -> Type[BaseDownloader]:
sanitised_url = DownloadFactory.sanitise_url(url)
if re.match(r'(i\.)?imgur.*\.gif.+$', sanitised_url):
if re.match(r'(i\.|m\.)?imgur', sanitised_url):
return Imgur
elif re.match(r'(i\.)?(redgifs|gifdeliverynetwork)', sanitised_url):
return Redgifs
elif re.match(r'.*/.*\.\w{3,4}(\?[\w;&=]*)?$', sanitised_url) and \
not DownloadFactory.is_web_resource(sanitised_url):
return Direct
@@ -37,16 +40,14 @@ def pull_lever(url: str) -> Type[BaseDownloader]:
return Gallery
elif re.match(r'gfycat\.', sanitised_url):
return Gfycat
elif re.match(r'(m\.)?imgur.*', sanitised_url):
return Imgur
elif re.match(r'(redgifs|gifdeliverynetwork)', sanitised_url):
return Redgifs
elif re.match(r'reddit\.com/r/', sanitised_url):
return SelfPost
elif re.match(r'(m\.)?youtu\.?be', sanitised_url):
return Youtube
elif re.match(r'i\.redd\.it.*', sanitised_url):
return Direct
elif re.match(r'v\.redd\.it.*', sanitised_url):
return VReddit
elif re.match(r'pornhub\.com.*', sanitised_url):
return PornHub
elif re.match(r'vidble\.com', sanitised_url):
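
Review note: the reordering in `pull_lever` matters because the first matching pattern wins. Moving the Imgur and Redgifs checks ahead of the generic `Direct` pattern appears intended to send every link on those hosts, including ones with a file extension, to the site-specific downloader. The sketch below reproduces only that ordering with a few of the patterns from the diff. `pick_downloader` and `_RULES` are hypothetical names, it returns class names as strings, it omits the `is_web_resource` check, and `sanitise_url` is a simplified assumption about the real helper.

```python
import re

# Order matters: the first matching pattern wins.
_RULES = [
    (r'(i\.|m\.)?imgur', 'Imgur'),
    (r'(i\.)?(redgifs|gifdeliverynetwork)', 'Redgifs'),
    (r'.*/.*\.\w{3,4}(\?[\w;&=]*)?$', 'Direct'),
    (r'gfycat\.', 'Gfycat'),
    (r'v\.redd\.it.*', 'VReddit'),
]


def sanitise_url(url: str) -> str:
    """Drop the scheme and a leading 'www.' (assumed behaviour of the real helper)."""
    url = re.sub(r'^https?://', '', url)
    return re.sub(r'^www\.', '', url)


def pick_downloader(url: str) -> str:
    """Return the name of the first rule whose pattern matches the sanitised URL."""
    sanitised = sanitise_url(url)
    for pattern, name in _RULES:
        if re.match(pattern, sanitised):
            return name
    raise ValueError(f'No downloader matches {url}')
```

For example, `https://i.imgur.com/abc123.gifv` resolves to `Imgur` because that rule is checked before `Direct`, while `https://example.com/picture.png` still falls through to `Direct`.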
4 changes: 2 additions & 2 deletions bdfr/site_downloaders/gfycat.py
@@ -21,7 +21,7 @@ def find_resources(self, authenticator: Optional[SiteAuthenticator] = None) -> l
return super().find_resources(authenticator)

@staticmethod
def _get_link(url: str) -> str:
def _get_link(url: str) -> set[str]:
gfycat_id = re.match(r'.*/(.*?)/?$', url).group(1)
url = 'https://gfycat.com/' + gfycat_id

@@ -39,4 +39,4 @@ def _get_link(url: str) -> str:
raise SiteDownloaderError(f'Failed to download Gfycat link {url}: {e}')
except json.JSONDecodeError as e:
raise SiteDownloaderError(f'Did not receive valid JSON data: {e}')
return out
return {out,}
10 changes: 6 additions & 4 deletions bdfr/site_downloaders/imgur.py
@@ -41,10 +41,12 @@ def _compute_image_url(self, image: dict) -> Resource:

@staticmethod
def _get_data(link: str) -> dict:
link = link.rstrip('?')
if re.match(r'(?i).*\.gif.+$', link):
link = link.replace('i.imgur', 'imgur')
link = re.sub('(?i)\\.gif.+$', '', link)
try:
imgur_id = re.match(r'.*/(.*?)(\..{0,})?$', link).group(1)
gallery = 'a/' if re.search(r'.*/(.*?)(gallery/|a/)', link) else ''
link = f'https://imgur.com/{gallery}{imgur_id}'
except AttributeError:
raise SiteDownloaderError(f'Could not extract Imgur ID from {link}')

res = Imgur.retrieve_url(link, cookies={'over18': '1', 'postpagebeta': '0'})
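
Review note: the new link handling in `_get_data` extracts the Imgur ID, detects album or gallery links, and rebuilds a canonical `imgur.com` URL before fetching. The sketch below isolates that step using the same two regular expressions as the diff. `normalise_imgur_link` is a hypothetical name, and it raises `ValueError` where the diff raises `SiteDownloaderError`, so that the example stays self-contained.

```python
import re


def normalise_imgur_link(link: str) -> str:
    """Rebuild a canonical imgur.com URL from a direct, mobile, album, or gallery link."""
    link = link.rstrip('?')
    # Capture the last path segment and drop any file extension after it.
    match = re.match(r'.*/(.*?)(\..{0,})?$', link)
    if match is None:
        raise ValueError(f'Could not extract Imgur ID from {link}')
    imgur_id = match.group(1)
    # Album and gallery links are both rewritten to the 'a/' form.
    gallery = 'a/' if re.search(r'.*/(.*?)(gallery/|a/)', link) else ''
    return f'https://imgur.com/{gallery}{imgur_id}'
```

A direct link such as `https://i.imgur.com/abc123.gifv` becomes `https://imgur.com/abc123`, and `https://imgur.com/gallery/abc` becomes `https://imgur.com/a/abc`.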
