-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
utils/astring: add expected control characters #134
base: main
Are you sure you want to change the base?
Conversation
With !p and ]104 issue #133 doesn't happen and my test passes. |
It looks to me the sequences are:
I don't have details about the OS command Ref. https://gist.github.com/justinmk/a5102f9a0c1810437885a04a07ef0a91 |
@@ -37,6 +37,7 @@ def strip_console_codes(output, custom_codes=None): | |||
output = f"\x1b[m{output}" | |||
console_codes = "%[G@8]|\\[[@A-HJ-MPXa-hl-nqrsu\\`]" | |||
console_codes += "|\\[[\\d;]+[HJKgqnrm]|#8|\\([B0UK]|\\)|\\[\\?2004[lh]" | |||
console_codes += "|[c]|[!p]|[\\]104]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the problem with "c" (and actually with all the existing and added codes) is that we only assume they are matched from the beginning, but we don't force that. This means something like \x1bthis-is-a-broken-text-c
would now be accepted and result in output his-is-a-broken-text-c
.
Note this is not introduced by your change, it just makes it more visible (as matching a single "c" is quite probable).
The fix for this should be IMO to match from the beginning (especially since we erase characters from the beginning). So your fix should add |^c|^!p|^]104
to ensure only \x1bc
/\x1b!p
and \x1b]104
is matched.
As for the brackets, it's a regexp so [c]
matches any character out of the brackets. At this point we don't expect any other standalone characters so I'd suggest just the c
without the brackets.
Similarly the [!p]
which actually matches only !
out of the !pfoo
string as it matches one of the characters inside the brackets. So that one really has to be without the brackets.
The last one is actually generic and vendors are free to specify any ]\d+
strings so here I'd probably consider ^]\d+
rather than one of the variants, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it seems even worse then I though. Astring blindly ignores many issues, I have fixed some of them and added a selftest here: #135 but IMO it'd be even better to simply replace it with a regexp to substitute all console codes with ""
instead of this custom handling.
Anyway I understand that this code is here and the behaviour was odd for a long time. So the question is whether we really want to do something about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case my comments were not clear, I'm ready to ack this patch once your part of the regexp works properly, therefore something like console_codes += "|^c|^!p|^]104
or console_codes += "|^c|^!p|^]\d+
. @clebergnu you had some concerns, would this work for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try this, thanks for the review, @ldoktor AFAICR @clebergnu 's concern was that we first confirm those are actually console codes. I did it in all cases IMO, just not sure what the OS operation 104 is supposed to mean. As you might have seen I have reached out to many people in the OS, kernel space and no answer so far. But if the test passes, it passes, and the control characters all start with \x1b so I think it's safe to assume we're not breaking 'normal strings'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case my comments were not clear, I'm ready to ack this patch once your part of the regexp works properly, therefore something like console_codes += "|^c|^!p|^]104 or console_codes += "|^c|^!p|^]\d+.
Either I'm misunderstanding something or your proposal doesn't work. With the indicated regex the algorithm raises at "!p".
The problematic line that I reported starts with "ESC c" but no line starts with "ESC !p". The control character doesn't imply that the lines start with these characters. We see the soft reset command isn't the first, so for the same right "c" doesn't need to be the first. Also, in this xterm reference there's no indicated it must be the first https://gist.github.com/justinmk/a5102f9a0c1810437885a04a07ef0a91 Maybe the algorithm works a bit different from what we read in the code? I read out the tmp_word. Its value changes as listed below.
However, my original regex works and it doesn't filter out 'c' if there's no ESC in front of it as we can see in terms such as "Connected" or "avocado" right in the first line.
I think we should merge this as-is and not accept #135 for the outlined reasons.
Please let me know if you disagree. Otherwise I hope we can merge this because there's no workaround besides this patch for my tests.
tmp_word: [mConnected to domain 'avocado-vt-vm1'
Escape character is ^] (Ctrl + ])
LOADPARM=[ ]
Using virtio-blk.
Using SCSI scheme.
s390-ccw Enumerated Boot Menu.
[0] default
...
[ 0.558508] Checked W+X mappings: passed, no W+X pages found
[ 0.558514] Run /init as init process
[ 0.563984] systemd[1]: Successfully made /usr/ read-only.
2024-08-30 09:09:55,831 astring L0055 DEBUG| tmp_word: c
2024-08-30 09:09:55,831 astring L0055 DEBUG| tmp_word: [!p
2024-08-30 09:09:55,831 astring L0055 DEBUG| tmp_word: ]104^G
2024-08-30 09:09:55,831 astring L0055 DEBUG| tmp_word: [?7h[ 0.564148] systemd[1]: systemd 256-13.el10 running in system mode (+PAM +AUDIT +SELINU
X -APPARMOR +IMA +SMACK +SECCOMP -GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBCRYPTSETUP_PLUGINS +LIB
FDISK +PCRE2 +PWQUALITY +P11KIT -QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT +LIBARCHIVE)
[ 0.564153] systemd[1]: Detected virtualization kvm.
[ 0.564157] systemd[1]: Detected architecture s390x.
[ 0.564159] systemd[1]: Running in initrd.
Welcome to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clebergnu IIUC it should be OSC command: https://en.wikipedia.org/wiki/ANSI_escape_code#OSC which are not strictly defined (which is why I suggested the \d
to catch them all).
As for this version I mentioned the brackets are unnecessary as you only define one or a few characters (it's regexp so the []
brackets means one of the characters.
And the c
itself sounds a bit too vague without the ^
because it simply matches 1 character whenever c
is present after the initial text. Take this as an example: aexpect.utils.astring.strip_console_codes("0\x1b[0m1\x1b2345\x1b67890123456789")
fails but aexpect.utils.astring.strip_console_codes("0\x1b[0m1\x1b23c45\x1b67890123456789")
succeeds and reports the c
as part of the output, stripping the 2
simply because it cuts of the first character.
So yes, the #135 should rapidly change the situation but even before that one (and I think your commit should go first) shouldn't introduce more of the odd behaviours.
As for the !p
, thanks for sharing the snippet (would be nice to get the full original output). From there it looks like your getting \x1b[!p
and not just \x1b!p
, which is why my regexp does not work. In that case it should be |^c|^\\[!p|^]104
or (IMO better) |^c|^\\[!p|^]\d+
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clebergnu with #135 as-is the error just reproduces. But I ran another test that uses the console and that one passes. I'm trying to figure out now what the right expression should be according to #135, |^c|^\\[!p|^]\d+
doesn't work. Python itself complains about the invalid escape sequence \d
but |^c|^\\[!p|^]\\d+
doesn't work either.
I am still not sure I understand why the "^", I have not understood the algorithm yet. I hope to have a moment for this soon.
Fixes: avocado-framework#133 In recent kernel boots, stripping raised error because of missing control characters. Add them: `c` reset screen `!p` soft reset terminal `]104` - execute OS command '104' These control characters are listed in the xterm control sequence cheat sheet at https://gist.github.com/justinmk/a5102f9a0c1810437885a04a07ef0a91 Signed-off-by: Sebastian Mitterle <[email protected]>
Hello @smitterl any updates on this one? The |
Fixes: #133
In recent kernel boots, stripping raised error because of missing control characters.
Add them:
c
reset screen!p
soft reset terminal]104
- execute OS command '104'These control characters are listed in the xterm control sequence cheat
sheet at
https://gist.github.com/justinmk/a5102f9a0c1810437885a04a07ef0a91