[feature request] User-defined booleans (and special values) #1651

Open
Poshi opened this issue Sep 18, 2024 · 3 comments

Comments

@Poshi
Contributor

Poshi commented Sep 18, 2024

Dear @johnkerl,

The booleans in Miller are clearly defined as

boolean: literals true and false; results of ==, <, >, etc.

The results of comparisons are internal and will always be booleans, but external data is only considered boolean if it is exactly "true" or "false".
The world is vast, and many other languages (or businesses) have other representations. For example, I work quite a lot with Python, and printing a boolean there yields "True", not "true":

$ python -c "print(1==1)"
True

Some other system or business could decide to use "on/off", "up/down", or "TRUE/FALSE"... Covering every possibility by default is impossible, but being able to configure which strings are considered truth values would be a nice addition: someone working with a system that usually uses "True/False" could then handle those values natively, without conversions or string comparisons.
Beware that some formats have these values fixed: in JSON files, only bare true/false (no quotes) can represent booleans. My proposal applies only to formats where boolean values are not well defined, such as CSV.
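To make the request concrete, here is a minimal Python sketch of what "user-configurable boolean literals" could mean for a text format like CSV. It assumes exact, case-sensitive matching against two user-supplied sets of strings, with the defaults mirroring today's "true"/"false". `BooleanLiterals` and `infer_value` are hypothetical names for illustration only, not Miller code or APIs.

```python
# Hypothetical sketch: user-configurable boolean literals for text formats
# such as CSV. BooleanLiterals and infer_value are illustrative names, not Miller APIs.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BooleanLiterals:
    # Defaults mirror Miller's current literals, "true" and "false".
    true_values: frozenset = field(default_factory=lambda: frozenset({"true"}))
    false_values: frozenset = field(default_factory=lambda: frozenset({"false"}))

    def __post_init__(self) -> None:
        # A string must not map to both truth values.
        overlap = self.true_values & self.false_values
        if overlap:
            raise ValueError(f"values listed as both true and false: {sorted(overlap)}")


def infer_value(text: str, literals: BooleanLiterals):
    """Return True/False for an exact (case-sensitive) match, else the original string."""
    if text in literals.true_values:
        return True
    if text in literals.false_values:
        return False
    return text


# Example: a CSV row produced by a Python program.
python_style = BooleanLiterals(frozenset({"True"}), frozenset({"False"}))
row = {"id": "1", "active": "True", "deleted": "False", "note": "true"}
typed = {key: infer_value(value, python_style) for key, value in row.items()}
print(typed)  # {'id': '1', 'active': True, 'deleted': False, 'note': 'true'}
```

Numeric inference is deliberately left out of the sketch, so "1" stays a string, and because the configured sets replace the defaults, the lowercase "true" in `note` is left alone.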

I'm aware of "No autoconvert to boolean", but I don't think it is related to this issue: I'm only asking for user-configurable boolean literals, since we already have a pair of literals for that purpose.

Thanks for your effort!

@johnkerl
Owner

@Poshi this is awesome!!

Two options I can think of:

  • Have this inference be done within the file-format readers -- users would need to pass in the desired values-that-are-true and values-that-are-false from a command-line flag probably ...
  • Have this be offered as an opt-in verb -- mlr boolify perhaps -- again maybe with flags like mlr boolify --true True,on,yes --false False,off,no ...
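Option 2 can be pictured as a filter over a stream of records. The Python sketch below assumes the flag shape suggested above (`--true` and `--false`, each taking a comma-separated list) and models records as dicts of strings. `boolify` and its flags are only the proposal from this thread; they are not an existing Miller verb, and the Python names are hypothetical.

```python
# Hypothetical sketch of the proposed opt-in verb, e.g.
#   mlr boolify --true True,on,yes --false False,off,no
# "boolify" and its flags are only the proposal above, not an existing Miller verb.
import argparse
from typing import Dict, Iterable, Iterator, List, Union


def parse_list(text: str) -> frozenset:
    # "True,on,yes" -> frozenset({"True", "on", "yes"}); empty items are dropped.
    return frozenset(item for item in text.split(",") if item)


def parse_boolify_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="boolify")
    # Defaults keep today's behavior: only "true" and "false" are booleans.
    parser.add_argument("--true", dest="true_values", type=parse_list,
                        default=frozenset({"true"}))
    parser.add_argument("--false", dest="false_values", type=parse_list,
                        default=frozenset({"false"}))
    return parser.parse_args(argv)


def boolify(records: Iterable[Dict[str, str]], true_values: frozenset,
            false_values: frozenset) -> Iterator[Dict[str, Union[str, bool]]]:
    # Rewrites every field value that matches a configured literal;
    # a string present in both lists is treated as true.
    for record in records:
        out: Dict[str, Union[str, bool]] = {}
        for key, value in record.items():
            if value in true_values:
                out[key] = True
            elif value in false_values:
                out[key] = False
            else:
                out[key] = value
        yield out


args = parse_boolify_args(["--true", "True,on,yes", "--false", "False,off,no"])
records = [{"switch": "on", "flag": "False"}, {"switch": "maybe", "flag": "yes"}]
result = list(boolify(records, args.true_values, args.false_values))
print(result)  # [{'switch': True, 'flag': False}, {'switch': 'maybe', 'flag': True}]
```

The mapping logic is the same under either option; the difference is where it runs. A verb only affects the records it is given when the user asks for it, whereas option 1 would apply the mapping inside the readers to every field of every input.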

@Poshi
Contributor Author

Poshi commented Sep 18, 2024

I would personally go for option 1. Option 2, a verb, doesn't feel to me like the right approach.
If this has been working for the "true" and "false" literals, why should it work differently from now on?

For option 1, the true/false values could be provided on the command line, but having them in .mlrrc would be more convenient for routine use. Environment variables could also be supported, with the usual precedence: config file << env vars << command line.
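The precedence rule described here ("config file << env vars << command line") amounts to checking the highest-precedence source first and taking the first value found. Below is a minimal Python sketch of that lookup. The setting name `true-values` and the `MLR_TRUE_VALUES` environment variable are hypothetical examples, not existing Miller settings.

```python
# Hypothetical sketch of "config file << env vars << command line" precedence.
# The setting name "true-values" and the MLR_TRUE_VALUES variable are illustrative only.
from typing import Mapping, Optional


def resolve_setting(name: str,
                    config_file: Mapping[str, str],
                    environ: Mapping[str, str],
                    command_line: Mapping[str, str],
                    default: Optional[str] = None) -> Optional[str]:
    # "true-values" -> "MLR_TRUE_VALUES"
    env_name = "MLR_" + name.upper().replace("-", "_")
    # Check the highest-precedence source first; the first hit wins.
    for source, key in ((command_line, name), (environ, env_name), (config_file, name)):
        if key in source:
            return source[key]
    return default


# In real use, environ would be os.environ and config_file would come from .mlrrc.
mlrrc = {"true-values": "True,yes", "false-values": "False,no"}
env = {"MLR_TRUE_VALUES": "True,on"}
print(resolve_setting("true-values", mlrrc, env, {}))                     # True,on
print(resolve_setting("false-values", mlrrc, env, {}))                    # False,no
print(resolve_setting("true-values", mlrrc, env, {"true-values": "T"}))   # T
```

Resolving each setting independently means a user can keep, say, the false values from .mlrrc while overriding only the true values for one session.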

@aborruso
Contributor

  • Have this be offered as an opt-in verb -- mlr boolify perhaps -- again maybe with flags like mlr boolify --true True,on,yes --false False,off,no ...

I love the verb option.

In any case, thanks to @Poshi for the idea.
