[input-file] Allow comments after a URL #2808

AlttiRi · 2022-08-07T23:12:48Z

I would like to add comments in the input file (--input-file) after a URL, that is, after " #" (one or more space characters followed by "#").

For example:

https://example.com/11 # 404
https://example.com/22 # important
https://example.com/3  # todo: download related media

For this input, gallery-dl currently prints the error:

[gallery-dl][error] No suitable extractor found for 'https://example.com/11 # 404'

The following format works, but it looks less readable:

# 404
https://example.com/11
# important
https://example.com/22
# todo: download related media 
https://example.com/33

Moreover, it already expects only one URL per line, so I can't do:

https://example.com/11 https://example.com/22
https://example.com/33

[gallery-dl][error] No suitable extractor found for 'https://example.com/11 https://example.com/22'

There is also no valid URL with a "plain" space character. The space character is present either with %20 or with + (in the search params) in a URL.

Addition (why I noted these two things):

...so it would not be a breaking change.
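The point about spaces can be checked with Python's standard library. This is a minimal sketch, assuming the URL is built from already percent-encoded parts; the path, query value, and fragment below are made up for illustration.

```python
from urllib.parse import quote, quote_plus, urlsplit

# Illustrative values only; none of these come from a real site.
path = quote("/my gallery/page 1")   # spaces in a path become %20
query = quote_plus("blue sky")       # spaces in a query value become +
url = "https://example.com" + path + "?q=" + query + "#top"

print(url)
# A properly encoded URL contains no whitespace, so " #" cannot occur in it,
# even when the URL carries a fragment.
print(" #" in url, "\t#" in url)
print(urlsplit(url).fragment)
```

Because the "#" of a fragment is never preceded by whitespace in an encoded URL, treating whitespace followed by "#" as the start of a comment should leave such URLs untouched.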


Hrxn · 2022-08-09T06:26:18Z

> The following format works, but it looks less readable:
> # 404
> https://example.com/11
> # important
> https://example.com/22
> # todo: download related media
> https://example.com/33

Debatable...

I think it's pretty straightforward to understand.
But it's for mikf to decide if he wants to extend parsing support here for trailing comments in line...

> Moreover, it already expects only one URL per line, so I can't do:
> https://example.com/11 https://example.com/22
> https://example.com/33

Yes, this is by design? I mean, some kind of delimiter has to be used.

> There is also no valid URL with a "plain" space character. The space character is present either with %20 or with + in a URL.

Yes, this is actually how it's supposed to be, though..
https://url.spec.whatwg.org/#url-units
https://url.spec.whatwg.org/#url-code-points


mikf · 2022-08-10T19:11:48Z

> The following format works, but it looks less readable:

Personally, I add blank lines between comment-URL blocks so that I can triple-click URLs without selecting anything extra:

# 404
https://example.com/11

# important
https://example.com/22

# todo: download related media 
https://example.com/33

> There is also no valid URL with a "plain" space character. The space character is present either with %20 or with + (in the search params) in a URL.

True, but gallery-dl can usually handle both URL-escaped and unescaped versions of a URL.

AlttiRi · 2022-08-10T20:14:26Z

Thanks.

I think \t also should be counted as a space.
Tabs are useful for aligning text with automatic padding (this works fine in Notepad and Notepad++).

When you have hundreds of URLs (for regularly downloading recently added content with --download-archive and --abort 10), it would be easier to mark a URL with a comment after it instead of breaking up the sequence of URLs with comment lines.

https://example.com/124

# other comment
# https://example.com/13

https://example.com/14345
https://example.com/15

# some comment
https://example.com/16

https://example.com/17

# 404
# https://example.com/18

https://example.com/19876

# 404
# https://example.com/404

https://example.com/4

With trailing comments, the same list would be:

https://example.com/124
# https://example.com/13   # other comment
https://example.com/14345
https://example.com/15
https://example.com/16     # some comment
https://example.com/17
# https://example.com/18   # 404
https://example.com/19876
# https://example.com/404  # 404
https://example.com/4

AlttiRi · 2022-09-08T15:50:45Z

Thank you.

Works fine with some extractors.

However, it does not work with Pixiv if there is more than one space character before the # sign:

gallery-dl: No suitable extractor found for 'https://www.pixiv.net/en/users/42083333    '

gallery-dl/gallery_dl/__init__.py

Lines 98 to 101 in 2ed5802

    if " #" in line:
        line = line.partition(" #")[0]
    elif "\t#" in line:
        line = line.partition("\t#")[0]

An example fix:

if " #" in line or "\t#" in line: 
    line = re.split("[ \t]+#", line, 1)[0]
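To make the difference concrete, here is a small self-contained sketch that compares the partition-based logic quoted above with the regex-based suggestion. The function names are hypothetical and exist only for this illustration; they are not part of gallery-dl.

```python
import re

# Hypothetical helper names, for illustration only.

def strip_with_partition(line):
    # Same logic as the quoted lines: cut at the first " #" or tab + "#".
    if " #" in line:
        line = line.partition(" #")[0]
    elif "\t#" in line:
        line = line.partition("\t#")[0]
    return line

def strip_with_regex(line):
    # Cut at the first run of spaces/tabs that is followed by "#".
    return re.split(r"[ \t]+#", line, maxsplit=1)[0]

url = "https://www.pixiv.net/en/users/42083333"
line = url + " " * 5 + "# some comment"

# partition() only removes the last space before "#",
# so the other four spaces stay attached to the URL.
print(repr(strip_with_partition(line)))

# The regex consumes the whole whitespace run, leaving a clean URL.
print(repr(strip_with_regex(line)))
```

The regex variant also leaves a URL with a fragment such as https://example.com/page#frag unchanged, because there is no whitespace in front of the "#".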

mikf · 2022-09-09T16:17:18Z

Sorry for the oversight. I must've only tested with URLs where extra whitespace/characters after a URL don't matter and just assumed it would work for all.

Fixed in bdad9c4

AlttiRi · 2022-10-06T20:33:53Z

There is a minor bug (with bdad9c4) when both "\t#" and " #" appear in one line:

https://example.com/123      #not-followed #3d
                       ^ \t is here

gallery-dl: No suitable extractor found for 'https://example.com/123      #not-followed'
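The reported output is consistent with a check that looks for " #" before it looks for "\t#". The sketch below is a reconstruction under that assumption, not the actual code from bdad9c4, and both function names are hypothetical. It also shows one possible way to avoid the order dependence by cutting at whichever delimiter occurs first.

```python
# Hypothetical reconstruction; assumes " #" is tested before "\t#".
def strip_comment_ordered(line):
    if " #" in line:
        return line.partition(" #")[0].rstrip()
    if "\t#" in line:
        return line.partition("\t#")[0].rstrip()
    return line

# One possible alternative: cut at the earliest of the two delimiters.
def strip_comment_earliest(line):
    positions = [p for p in (line.find(" #"), line.find("\t#")) if p != -1]
    if not positions:
        return line  # no trailing comment found
    return line[:min(positions)].rstrip()

url = "https://example.com/123"
line = url + "\t#not-followed #3d"

# The later " #" wins, so the tab and "#not-followed" stay in the result.
print(repr(strip_comment_ordered(line)))

# The tab + "#" comes first in the line, so everything after the URL is dropped.
print(repr(strip_comment_earliest(line)))
```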


mikf added a commit that referenced this issue Aug 10, 2022

allow comments after input file URLs (#2808)

d0adc13

everything after the first " #" (space + hash) gets ignored

mikf added the feature-request label Aug 10, 2022

mikf closed this as completed Aug 10, 2022

mikf added a commit that referenced this issue Aug 12, 2022

allow tabstops as whitespace before input file comments (#2808)

764906e

mikf added a commit that referenced this issue Sep 9, 2022

remove whitespace before comments in input file URLs (#2808)

bdad9c4

mikf added a commit that referenced this issue Oct 8, 2022

fix bug when processing input file comments (#2808)

a6e2d96

and move 'parse_inputfile()' to util.py
