How to format a string metadata? #318

Closed
paulolimac opened this issue Jun 23, 2019 · 20 comments

@paulolimac

I'm trying to format a metadata string so that it is:

  • lowercased, and
  • has its spaces replaced with underscores.

I tried Python f-string-style method calls, without success:
{title.lower().replace(' ', '_')}

Example:

My command:

$ gallery-dl --config=./my_config.json --simulate https://www.flickr.com/photos/mellbin/7085146945/

My my_config.json:

{
    "extractor": {
        "flickr": {
            "filename": "{title!l}.{extension}"
        }
    }
}

My terminal output:

/copenhagen bikehaven by mellbin - bike cycle bicycle - 2012 - 6010.jpg

My desired output (lowercased, with spaces replaced by underscores):

/copenhagen_bikehaven_by_mellbin_-_bike_cycle_bicycle_-_2012_-_6010.jpg
@mikf
Owner

mikf commented Jun 23, 2019

There is now a format method to replace a (sub)string with another (95b1e4c).
Whenever you get to use that code, change your format string to "{title!l:R /_/}.{extension}" and you'll get your desired output.
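
For readers who think in Python, the combined effect of `!l` and `R /_/` can be approximated as shown here. This only illustrates the behaviour described in this comment and is not gallery-dl's implementation; the `title` value is made up to match the example output.

```python
# Hypothetical metadata value; the real one comes from the extractor.
title = "Copenhagen Bikehaven by Mellbin - Bike Cycle Bicycle - 2012 - 6010"

# "!l" lowercases the value; "R /_/" then replaces " " with "_".
formatted = title.lower().replace(" ", "_")

filename = "{}.{}".format(formatted, "jpg")
print(filename)
```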

There are in general at least two "problems" with this approach, but hey, it works.

  1. Like with all other "special" formatting options, you can only use it once. With the current system there is no way of combining multiple replacements and/or other formatting methods.
  2. You'd have to technically specify the space-with-underscore replacement for every format field, since every one of them could potentially have spaces in them. Maybe in this case it would be better to have a global replace-x-with-y mapping (that could eventually be used as argument for str.translate())
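
A minimal sketch of the global-mapping idea from point 2, assuming a hypothetical `REPLACEMENTS` dict and `translate_fields` helper (neither exists in gallery-dl). It shows how one `str.translate()` table could be applied to every string field instead of repeating `R /_/` per field.

```python
# Hypothetical global replace-x-with-y mapping.
REPLACEMENTS = {" ": "_"}
TABLE = str.maketrans(REPLACEMENTS)


def translate_fields(metadata):
    """Return a copy of metadata with the mapping applied to all string values."""
    return {
        key: value.translate(TABLE) if isinstance(value, str) else value
        for key, value in metadata.items()
    }


# Made-up metadata for illustration.
metadata = {"title": "bike cycle bicycle", "owner": "some user", "id": 7085146945}
clean = translate_fields(metadata)
print("{title}_{owner}_{id}".format(**clean))
```

Non-string values such as the numeric `id` are passed through untouched, so standard format specifiers would still work on them.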

@mikf
Owner

mikf commented Jun 23, 2019

And before I forget: f-strings look really nice, but they can't be used here. They must be placed in the actual source code and have to be read and evaluated by a Python interpreter to work. Storing them in config files is therefore out of the question, but being able to do stuff like {title.lower().replace(' ', '_')} would be amazing.
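
The difference can be seen in plain Python, independent of gallery-dl. In this sketch the metadata values are made up: an f-string in source code is compiled by the interpreter, while a template string read from a file goes through `str.format()`, which only supports attribute and index lookups.

```python
metadata = {"title": "Bike Cycle"}

# In source code, an f-string is compiled by the interpreter, so method calls work.
title = metadata["title"]
from_fstring = f"{title.lower().replace(' ', '_')}"

# A format string loaded from a config file is plain data. str.format() treats
# "lower()" as an attribute name, and that lookup fails.
template = "{title.lower()}"
try:
    template.format(**metadata)
    error = None
except AttributeError as exc:
    error = exc

print(from_fstring)
print(type(error).__name__)
```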

@paulolimac
Author

It works like a charm, @mikf. Thank you. 👍
And about f-string, I only tried without commitment. 😏

@Hrxn
Contributor

Hrxn commented Jun 29, 2019

Okay, just to get this straight..

Like with all other "special" formatting options, you can only use it once. With the current system there is no way of combining multiple replacements and/or other formatting methods.

This means that I cannot use something like this
"filename": "{title:R /_/}{hash}.{extension}"
in conjunction with this
"filename": "{title:?/ /}{hash}.{extension}"
, right?
(Example taken from my Imgur extractor setting)

@mikf
Owner

mikf commented Jun 29, 2019

You cannot use R and ?, for example, in the same replacement field:
"filename": "{title:?/_/R /_/}{hash}.{extension}"
-> [imgur][error] Applying filename format string failed: ValueError: Invalid format specifier

but you can use them across multiple different fields, albeit only one per, and even put the standard format specifiers at the end:
"filename": "{title:?/ /}{hash:R0/O/*^9}.{extension:Rjpeg/jpg/}"

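As a rough Python equivalent of the three fields in that last format string: the sketch below assumes `?<start>/<end>/` wraps a value only when it is non-empty, which is how the examples in this thread read. The `optional` helper and the metadata values are hypothetical.

```python
def optional(value, start="", end=""):
    """Mimic "?<start>/<end>/": wrap the value only if it is non-empty."""
    return f"{start}{value}{end}" if value else ""


# Made-up metadata values.
metadata = {"title": "My Album", "hash": "a0b0c", "extension": "jpeg"}

# {title:?/ /}: append a space, but only if there is a title.
title = optional(metadata["title"], end=" ")
# {hash:R0/O/*^9}: replace "0" with "O", then centre in 9 columns padded with "*".
hash_ = format(metadata["hash"].replace("0", "O"), "*^9")
# {extension:Rjpeg/jpg/}: rename the extension.
ext = metadata["extension"].replace("jpeg", "jpg")

filename = f"{title}{hash_}.{ext}"
print(filename)
```

The standard specifier `*^9` comes last, after the special option, mirroring the order in the format string.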
@Hrxn
Contributor

Hrxn commented Jun 30, 2019

Okay, thanks. That's what I assumed, just wanted to be sure.

Also, I just realized that my example does not really make a lot of sense.
I mean, instead of using "filename": "{title:?/ /}{hash}.{extension}" one could simply just use "filename": "{title:R /_/}{hash}.{extension}", or actually rather "filename": "{title:R /_/}_{hash}.{extension}".

Adding a whitespace with {x:?/ /} and replacing them with {x:R /_/} at the same time is... kinda stupid. 😄

@God-damnit-all
Contributor

@mikf Is it possible to make it to where all of the following extensions are converted to jpg? jfif, jpeg, jpe, jif, jfi

@mikf
Owner

mikf commented Sep 26, 2020

Possible, but rather cumbersome.

Since 90e4c64, it is possible to use multiple replacement operations among other things, so you could theoretically use {extension:Rjfif/jpg/Rjpeg/jpg/Rjpe/jpg/Rjif/jpg/Rjfi/jpg/}, but that's obviously not a good solution.

I'll probably add an option that allows replacing/renaming filename extensions in a better way.

@God-damnit-all
Contributor

God-damnit-all commented Oct 1, 2020

@mikf Thanks for the tip. I have one more thing I want to do, this time with titles.
Due to Windows filename restrictions, I want to replace ? with Ɂ, " with '', and : with ꞉. However, when I try to use a backslash to escape the characters, it just gives me an Invalid \escape error. Is there a way to do this?

Additionally, I sometimes notice the title in the filename is truncated, presumably to avoid long filenames, but my system is configured for long path support, is there any way to override that?

@mikf
Owner

mikf commented Oct 8, 2020

@ImportTaste You don't need to backslash escape ? and :, only ":
"{title:R?/Ɂ/R:/꞉/R\"/''/}"

There is also a general character replacement option: path-restrict

"replace invalid path characters with unicode alternatives": null,
"path-restrict": {
    "\\": "⧹",
    "/" : "⧸",
    "|" : "│",
    ":" : "꞉",
    "*" : "∗",
    "?" : "？",
    "\"": "″",
    "<" : "﹤",
    ">" : "﹥"
}
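
On the escaping point: assuming the config file is parsed with Python's `json` module (which the "Invalid \escape" message suggests), the behaviour can be reproduced directly. JSON only defines a fixed set of backslash escapes, so `\"` is accepted while `\?` and `\:` are rejected.

```python
import json

# Raw strings keep the backslashes exactly as they would appear in the file.
valid_text = r'''"{title:R?/Ɂ/R:/꞉/R\"/''/}"'''
invalid_text = r'''"{title:R\?/Ɂ/R\:/꞉/R\"/''/}"'''

# Only the double quote needs escaping; the parser turns \" into ".
decoded = json.loads(valid_text)

# Escaping "?" or ":" is not valid JSON and makes the whole string fail.
try:
    json.loads(invalid_text)
    error = None
except json.JSONDecodeError as exc:
    error = exc

print(decoded)
print(error)
```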

@God-damnit-all
Contributor

Oh, I missed that, thank you. Is there any way to get it to stop truncating the title? I have long paths enabled so I don't really need them shortened.

@mikf
Owner

mikf commented Oct 8, 2020

File or directory names generally don't get truncated. There are some exceptions in the default format strings for exhentai, reddit, and all *reactor sites, because the resulting name could get longer than 256 characters in some edge cases and someone opened an issue reporting that. You can override the default filename/directory format strings for those sites so that nothing gets trimmed.

For example filenames for Reddit default to "{id}{num:? //>02} {title[:220]}.{extension}", but you can change that to "{id}{num:? //>02} {title}.{extension}" and have it use the full title.
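
The `[:220]` part behaves like an ordinary Python slice. A small sketch with a hypothetical `reddit_filename` helper and made-up post data (the `{num}` field is left out for brevity) shows the effect on the resulting name length:

```python
def reddit_filename(post, limit=220):
    """Build "{id} {title}.{extension}", optionally truncating the title."""
    title = post["title"] if limit is None else post["title"][:limit]
    return f'{post["id"]} {title}.{post["extension"]}'


# Made-up post with an overly long title.
post = {"id": "abc123", "title": "x" * 300, "extension": "jpg"}

trimmed = reddit_filename(post)            # like {title[:220]}
full = reddit_filename(post, limit=None)   # like {title}

print(len(trimmed), len(full))
```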

@God-damnit-all
Contributor

Thanks once again, that was very helpful.

@God-damnit-all
Contributor

@mikf Is there a 'trim' option? Sometimes titles have leading spaces or trailing spaces that I'd like to be rid of.

@mikf
Owner

mikf commented Oct 15, 2020

Not yet, but that should be quite easy to implement.

mikf added a commit that referenced this issue Nov 2, 2020
@God-damnit-all
Contributor

@mikf Does extension-map have to be defined for every extractor?

@mikf
Owner

mikf commented Nov 3, 2020

Like with all/most generic extractor options, you can set a general value and/or a specific value for certain extractors. And if it isn't set or null, it'll use the default (currently a noop, but it's going to get changed to at least replace jpeg with jpg in 1.16.0).

For example

{
    "extractor": {
        "extension-map": {"a": "b"},

        "danbooru": {
            "extension-map": {"c": "d"}
        },

        "pixiv": {
            "extension-map": null
        }
    }
}

will use the default for Pixiv, {"c": "d"} for Danbooru, and {"a": "b"} for everything else
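
The lookup order described above can be sketched as follows. This is a simplified illustration, not gallery-dl's actual config code: `resolve_option` and `DEFAULT_MAP` are hypothetical names, and the default is assumed to be an empty mapping (the "noop" mentioned above).

```python
_MISSING = object()
DEFAULT_MAP = {}  # assumed default: no renaming

config = {
    "extractor": {
        "extension-map": {"a": "b"},
        "danbooru": {"extension-map": {"c": "d"}},
        "pixiv": {"extension-map": None},
    }
}


def resolve_option(config, extractor, key, default):
    """Prefer the extractor-specific value, then the general one, then the default."""
    section = config.get("extractor", {})
    value = section.get(extractor, {}).get(key, _MISSING)
    if value is _MISSING:
        value = section.get(key, _MISSING)
    # Both "not set" and an explicit null fall back to the default.
    if value is _MISSING or value is None:
        return default
    return value


for name in ("pixiv", "danbooru", "flickr"):
    print(name, resolve_option(config, name, "extension-map", DEFAULT_MAP))
```

An explicit `null` at the extractor level wins over the general value, which is why Pixiv ends up with the default rather than `{"a": "b"}`.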

@Hrxn
Contributor

Hrxn commented Jun 26, 2021

FYI:
Edit: Used examples from my old gallery-dl.conf template, which used an incorrect field name for Imgur, that has been changed by a site update in the meantime: {hash} -> {id}
Fixed now below

@mikf Okay, now that filename has an object with Python expressions to check against, my earlier example in here combining :R and :? formatters for a single name field should be viable, I think.

Let's see, the idea is to combine both of these somehow:
"filename": "{title:R /_/}{id}.{extension}"
"filename": "{title:?/ /}{id}.{extension}"

Okay, maybe something like this:

{
    "title == ''" : "{id}.{extension}",
    ""            : "{title:R /_/}{id}.{extension}"
}

Would this work? (Assuming you can check that a title is present this way)

Although I just realized there may be another way to address this.
Considering this name for directory which I use in "imgur" for Albums:
"directory": ["Imgur", "Albums", "{album[title]!t:R /_/}{album[id]}"]

Because if album[title] already evaluates to nothing, then both the !t and :R options are meaningless, and the resulting filename should only be album[id], or not?
Because if this would work, the functionality from my example would have already been present all along..sigh.

@mikf
Owner

mikf commented Jun 26, 2021

@Hrxn it is possible to combine these "special" formatting options since 90e4c64 (v1.13.0), i.e. "{title:?/ /R /_/}{hash}.{extension}". Using conditional filenames would work as well, as would using the path-restrict option to replace spaces with underscores.

Assuming you can check that a title is present this way

I'd rather do "not title", or even "not locals().get('title')". The latter also works when there is no title variable present at all. Trying to access title in that case would result in a NameError exception.
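
The three conditions can be compared in plain Python. This sketch assumes the expression is evaluated with the metadata fields as local variables, which is what the `locals()` suggestion implies; the `check` helper and the sample dicts are hypothetical.

```python
def check(expression, metadata):
    """Evaluate a condition with the metadata fields as local variables."""
    return eval(expression, {}, dict(metadata))


empty = {"title": ""}
null = {"title": None}
missing = {"id": "abc"}  # no "title" key at all

# title == '' only matches the empty string, not None.
print(check("title == ''", empty), check("title == ''", null))

# "not title" also covers None, but needs the name to exist.
print(check("not title", null))
try:
    check("not title", missing)
    error = None
except NameError as exc:
    error = exc
print(type(error).__name__)

# locals().get() works in all three cases.
safe = [check("not locals().get('title')", m) for m in (empty, null, missing)]
print(safe)
```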

Because if album[title] already evaluates to nothing, then both the !t and :R options are meaningless, and the resulting filename should only be album[hash], or not?

As long as this nothing is an empty string and not None/null, then yes, {album[title]!t:R /_/} would be "ignored" and only the value from {album[hash]} gets used.

@Hrxn
Contributor

Hrxn commented Jun 26, 2021

@Hrxn it is possible to combine these "special" formatting options since 90e4c64 (v1.13.0), i.e. "{title:?/ /R /_/}{hash}.{extension}". Using conditional filenames would work as well, as would using the path-restrict option to replace spaces with underscores.

Good to know, thanks!
And I totally forgot that I was already using
"path-restrict": "/ \\\\|/<>:\"?*"
at the base config level. Silly me.

Oh well, checking this again is probably not much of a slowdown anyway, I assume.

Assuming you can check that a title is present this way

I'd rather do "not title", or even "not locals().get('title')". The latter also works when there is no title variable present at all. Trying to access title in that case results in a NameError exception.

I see. So it's best to use "not locals().get('title')", because it basically works all the time? 😄

Because if album[title] already evaluates to nothing, then both the !t and :R options are meaningless, and the resulting filename should only be album[hash], or not?

As long as this nothing is an empty string and not None/null, then yes, {album[title]!t:R /_/} would be "ignored" and only the value from {album[hash]} gets used.

Yeah, that's what I meant. I tried that with an example - after writing this comment, whoops - and it is indeed working like that.
