Converting a zero-padded string/number using int() auto-detects as octal #1241

Closed
adren opened this issue Mar 23, 2023 · 9 comments

Comments

@adren

adren commented Mar 23, 2023

Assuming a file created with the following command:

(echo "id1";seq -f "%07g" 1 20) > /tmp/ids.tsv

which contains a list of zero-padded numbers

id1
0000001
0000002
0000003
0000004
0000005
...

When trying to convert the numbers into a new column

mlr --tsvlite put '$id2=int($id1)' /tmp/ids.tsv

some lines return errors, and the remaining conversions appear to be shifted

id1     id2
0000001 1
0000002 2
0000003 3
0000004 4
0000005 5
0000006 6
0000007 7
0000008 (error)
0000009 (error)
0000010 8
0000011 9
0000012 10
0000013 11
0000014 12
0000015 13
0000016 14
0000017 15
0000018 (error)
0000019 (error)
0000020 16

This was tested with Miller 5.10 (which gives a different output), 6.4.0, and 6.6.0.

@johnkerl
Owner

@adren the leading 0 indicates octal input.

I can modify the int function to take a second argument for the desired base (here, 10), or add a separate function which always assumes decimal input.
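
To make the explanation above concrete, here is a minimal Python sketch that mimics a parser treating a leading 0 as an octal prefix, next to one that forces base 10. It is not Miller's implementation; the function names are hypothetical and chosen only for illustration. It reproduces the pattern in the reported output: values containing the digits 8 or 9 fail, and 0000010 comes out as 8.

```python
def parse_leading_zero_as_octal(text: str) -> int:
    """Hypothetical parser: a leading 0 on a multi-digit value means octal."""
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)  # raises ValueError when a digit 8 or 9 appears
    return int(text, 10)


def parse_decimal(text: str) -> int:
    """Force base 10; leading zeros are simply ignored."""
    return int(text, 10)


def render(text: str, parser) -> str:
    """Return the parsed value as text, or "(error)" if parsing fails."""
    try:
        return str(parser(text))
    except ValueError:
        return "(error)"


# Same values as `seq -f "%07g" 1 20` in the report.
ids = [f"{n:07d}" for n in range(1, 21)]
inferred = [render(s, parse_leading_zero_as_octal) for s in ids]
forced = [render(s, parse_decimal) for s in ids]

for s, octal_guess, decimal in zip(ids, inferred, forced):
    print(s, octal_guess, decimal)
```

With inferred octal, 0000008 and 0000009 are not valid numbers and 0000010 is 8, which is why the output looks shifted. With the base forced to 10, every row converts to the expected value.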

@adren
Author

adren commented Mar 23, 2023

Thanks @johnkerl for the explanation.
That makes sense, and the "shifting" should have pointed me towards the octal auto-detection.

Indeed, being able to force the base could be the solution, as I don't know how to force base-10 detection.

Since the auto-detection appears to be applied to each value individually, forcing a specific base would really help avoid misinterpreting such numbers.

Alternatively, I would have stripped the leading zeros myself if the lstrip function accepted a character to trim instead of only spaces.
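
The stripping idea above can be sketched in Python, whose `str.lstrip` already accepts the characters to trim. This only illustrates the approach and is not a Miller function; `strip_leading_zeros` is a hypothetical helper name. One assumption worth stating: an all-zero input should become "0" rather than an empty string.

```python
def strip_leading_zeros(text: str) -> str:
    """Trim leading '0' characters, keeping a single '0' for all-zero input."""
    if not text:
        return text  # nothing to trim
    stripped = text.lstrip("0")
    return stripped if stripped else "0"


print(int(strip_leading_zeros("0000008")))  # 8
print(int(strip_leading_zeros("0000010")))  # 10
print(int(strip_leading_zeros("0000000")))  # 0
```

Once the zeros are gone, the value has no leading 0 left for a base-inferring parser to read as an octal prefix.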

@johnkerl
Owner

Alternatively, I would have stripped the leading zeros myself if the lstrip function accepted a character to trim instead of only spaces.

Wow, great idea -- thanks! :)

@adren
Author

adren commented Mar 23, 2023

Nevertheless, forcing a base would also be useful to avoid incorrect auto-detection.

I don't know the logic used to tell an octal number from a binary one, but such an enhancement might slightly speed up conversions on large inputs for a given base, and could also open the way to other types (hexadecimal).

@aborruso
Contributor

It's not what you want:

(echo "id1";seq -f "%07g" 1 20) | mlr --c2j put '$new=int(sub($id1,"^0*",""))'
{
  "id1": "0000001",
  "new": 1
},
{
  "id1": "0000002",
  "new": 2
},
{
  "id1": "0000003",
  "new": 3
}
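
One edge case of the `^0*` pattern is that an all-zero value becomes an empty string. The Python sketch below shows this, along with a variant that keeps the last digit by using a capture group. It is an illustration in Python's `re` module, not a tested Miller expression, and `drop_leading_zeros` is a hypothetical name.

```python
import re

# Strip leading zeros but always keep at least one digit.
LEADING_ZEROS = re.compile(r"^0+(\d)")


def drop_leading_zeros(text: str) -> str:
    """Replace the leading zeros and the digit after them with that digit."""
    return LEADING_ZEROS.sub(r"\1", text)


print(repr(re.sub(r"^0*", "", "0000000")))  # '' (nothing left to convert)
print(repr(drop_leading_zeros("0000000")))  # '0'
print(repr(drop_leading_zeros("0000010")))  # '10'
```

The capture-group form avoids lookahead, so the same pattern should carry over to regex engines that lack it, though that has not been checked here against Miller's `sub`.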

@adren
Author

adren commented Mar 23, 2023

It's not what you want:

(echo "id1";seq -f "%07g" 1 20) | mlr --c2j put '$new=int(sub($id1,"^0*",""))'

Yes indeed, this works fine, although it might take longer to compute on a large file than a plain lstrip.
TIMTOWTDI ;-)

Thanks @aborruso

@johnkerl
Owner

@adren #1244 is merged

adren closed this as completed Mar 24, 2023
@adren
Author

adren commented Mar 24, 2023

Not a bug

...Maybe the title should be changed to "Converting a zero-padded string/number auto-detects as octal"

johnkerl changed the title from "Converting an integer from a string ("int" function) returns errors and shift output" to "Converting a zero-padded string/number using int() auto-detects as octal" on Mar 24, 2023
@johnkerl
Owner

Thanks @adren ! :)
