[feature request] User-defined booleans (and special values) #1651

Poshi · 2024-09-18T10:40:42Z

The booleans in Miller are clearly defined as:

boolean: literals true and false; results of ==, <, >, etc.

The results of comparisons are internal and will always be booleans, but external data can only be "true" or "false" to be considered a boolean.
The world is vast and many other languages (or businesses) have other representations. For example, I work quite a lot with Python, and printing a boolean yields "True", not "true":

$ python -c "print(1==1)"
True

Some other system or business could decide to use "on/off", or "up/down", or "TRUE/FALSE"... Covering all the possibilities by default is impossible, but being able to configure which strings are treated as truth values would be a nice addition. Someone working with a system that usually uses "True/False" could then handle it natively, without conversions or string comparisons.
Beware that some formats have these values fixed: in JSON files, you can only use bare true/false (no quotes) to represent booleans. My request only applies to formats where the boolean values are not well defined, like CSV.

I'm aware of "No autoconvert to boolean", but I don't think it is related to this issue: I'm just asking for user-configurable boolean literals, since we already have a couple of literals for that duty.

Thanks for your effort!


johnkerl · 2024-09-18T13:09:18Z

@Poshi this is awesome!!

Two options I can think of:

1. Have this inference be done within the file-format readers -- users would need to pass in the desired values-that-are-true and values-that-are-false from a command-line flag probably ...
2. Have this be offered as an opt-in verb -- mlr boolify perhaps -- again maybe with flags like mlr boolify --true True,on,yes --false False,off,no ...
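To make the second idea concrete, here is a minimal sketch in Python (not Miller code) of what such a conversion step might do. The function name `boolify` and the `true_values`/`false_values` parameters are hypothetical and simply mirror the flags floated above; neither the verb nor the flags exist in Miller as far as this thread shows.

```python
import csv
import io


def boolify(rows, true_values, false_values):
    """Replace configured strings with real booleans; leave all other values untouched.

    Hypothetical helper: it imitates the proposed `mlr boolify` verb, which
    does not exist in Miller at the time of this discussion.
    """
    true_set = set(true_values)
    false_set = set(false_values)
    for row in rows:
        converted = {}
        for key, value in row.items():
            if value in true_set:
                converted[key] = True
            elif value in false_set:
                converted[key] = False
            else:
                # Not a configured truth value: keep the original string.
                converted[key] = value
        yield converted


# Made-up sample data, only for illustration.
data = "host,active\nalpha,True\nbeta,off\ngamma,maybe\n"
rows = list(
    boolify(
        csv.DictReader(io.StringIO(data)),
        true_values=["True", "on", "yes"],
        false_values=["False", "off", "no"],
    )
)
for row in rows:
    print(row)
```

The match here is exact and case-sensitive, so "True" and "TRUE" would have to be listed separately; whether a real implementation should fold case is an open design question.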

Poshi · 2024-09-18T14:44:38Z

I would personally go for option 1. Option 2, going for a verb, doesn't feel to me like the way to go.
If this has been working for the "true" and "false" literals, why should it work differently from now on?

For option 1, the values that are true/false can be provided in the command line, but having them in the .mlrrc would be more comfortable for routine use. Environment vars can also be set up. With the usual precedence: config file<<env vars<<command line.
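As a rough illustration of that precedence, the following Python sketch merges three layers of settings, with later sources overriding earlier ones. The function `resolve_bool_literals` and the dictionary keys are hypothetical; no actual .mlrrc keys, environment variable names, or command-line flags are assumed here.

```python
def resolve_bool_literals(config_file=None, env=None, cli=None):
    """Merge settings so that config file < env vars < command line.

    Each argument is an optional dict that may contain "true" and/or "false"
    lists. All names are hypothetical, for illustration only.
    """
    # Defaults mirror the literals Miller already recognizes.
    resolved = {"true": ["true"], "false": ["false"]}
    # Apply sources from lowest to highest precedence.
    for source in (config_file, env, cli):
        if not source:
            continue
        for key in ("true", "false"):
            if key in source:
                resolved[key] = list(source[key])
    return resolved


settings = resolve_bool_literals(
    config_file={"true": ["True"], "false": ["False"]},
    env={"true": ["on"]},
    cli={"false": ["off", "no"]},
)
print(settings)
```

In this sketch a higher-precedence source replaces the whole list for a key rather than extending it; appending instead would be an equally plausible choice.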

aborruso · 2024-09-18T16:08:28Z

> Have this be offered as an opt-in verb -- mlr boolify perhaps -- again maybe with flags like mlr boolify --true True,on,yes --false False,off,no ...

I love the verb option.

In any case, thanks to @Poshi for the idea.
