Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When tty in use the short running exec can miss part of the output #8483

Open
afazekas opened this issue Aug 15, 2024 · 6 comments
Open

When tty in use the short running exec can miss part of the output #8483

afazekas opened this issue Aug 15, 2024 · 6 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@afazekas
Copy link

afazekas commented Aug 15, 2024

What happened?

e2e end testuite used oc rsh to verify parts of the running system,
the expected standard output was missing ~ 1% of the time.

What did you expect to happen?

The full stdout/stderr of a command is visible even when exec -it used 100% percent of the time.

How can we reproduce it (as minimally and precisely as possible)?

Have any container running for instance busybox/fedora/.. running the sleep infinity.
oc rsh mycontainer echo foobar
or
crictl exec -it mycontainer echo foobar

Did not had issue:

without -i or without -t

oc exec mycontainer -- echo foobar

waiting at the end

kubectl exec -it mycontainer -- sh -c "echo mytext_$i; sleep 1"`

Anything else we need to know?

Simple script:

i=0
while true; do
	(( i++ ))
	out=`kubectl exec -it pod/fedora2 -- echo "mytext_$i"`
	err=$?
        if ! echo $out | grep mytext; then
		echo OOPS
		echo "XX${out}YY"
		echo "err: $err"
		echo "i: $i"
		exit 1
	fi
done

Bumped cri-o version did not solved the issue.
podman with crun/runc did not had similar issue.

CRI-O and Kubernetes version

$ crio --version
Version:        1.29.5-5.rhaos4.16.git7032128.el9
GitCommit:      unknown
GitCommitDate:  unknown
GitTreeState:   clean
GoVersion:      go1.21.9 (Red Hat 1.21.9-1.el9_4)
Compiler:       gc
Platform:       linux/amd64
Linkmode:       dynamic
BuildTags:      
  rpm_crashtraceback
  libtrust_openssl
  selinux
  seccomp
  exclude_graphdriver_devicemapper
  exclude_graphdriver_btrfs
  containers_image_ostree_stub
LDFlags:           -compressdwarf=false -B 0x63612eab69102abdedde2ea20f4d8f5c4a4d5cc2 -extldflags '-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 '
SeccompEnabled:   true
AppArmorEnabled:  false
$ crio --version
crio version 1.30.4-5.rhaos4.17.git95e494c.el9
Version:        1.30.4-5.rhaos4.17.git95e494c.el9
GitCommit:      unknown
GitCommitDate:  unknown
GitTreeState:   clean
GoVersion:      go1.22.5 (Red Hat 1.22.5-1.el9)
Compiler:       gc
Platform:       linux/amd64
Linkmode:       dynamic
BuildTags:      
  rpm_crashtraceback
  libtrust_openssl
  selinux
  seccomp
  exclude_graphdriver_devicemapper
  exclude_graphdriver_btrfs
  containers_image_ostree_stub
LDFlags:           -compressdwarf=false -B 0x5d6defb9f9569fe331c2f65ba1b4a27f7b82de22 -extldflags '-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 '
SeccompEnabled:   true
AppArmorEnabled:  false
$ kubectl version --output=json
{
  "clientVersion": {
    "major": "1",
    "minor": "29",
    "gitVersion": "v1.29.6",
    "gitCommit": "fa846511dbeb7e08cf77265056397283c6c896f9",
    "gitTreeState": "clean",
    "buildDate": "2024-07-11T10:58:42Z",
    "goVersion": "go1.21.11 (Red Hat 1.21.11-1.el9_4) X:strictfipsruntime",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "kustomizeVersion": "v5.0.4-0.20230601165947-6ce0bf390ce3",
  "serverVersion": {
    "major": "1",
    "minor": "29",
    "gitVersion": "v1.29.5+29c95f3",
    "gitCommit": "a0d6f21a1c9beefaa8600c62861da3b014b804aa",
    "gitTreeState": "clean",
    "buildDate": "2024-06-17T02:43:13Z",
    "goVersion": "go1.21.9 (Red Hat 1.21.9-1.el9_4) X:strictfipsruntime",
    "compiler": "gc",
    "platform": "linux/amd64"
  }
}

OS version

# On Linux:
$ cat /etc/os-release
NAME="Red Hat Enterprise Linux CoreOS"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION="416.94.202406172220-0"
VERSION_ID="4.16"
VARIANT="CoreOS"
VARIANT_ID=coreos
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 416.94.202406172220-0"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos::coreos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://docs.okd.io/latest/welcome/index.html"
BUG_REPORT_URL="https://access.redhat.com/labs/rhir/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.16"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.16"
OPENSHIFT_VERSION="4.16"
RHEL_VERSION=9.4
OSTREE_VERSION="416.94.202406172220-0"

$ uname -a
Linux master-0.ocp.openstack.lab 5.14.0-427.22.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Jun 10 09:23:36 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux

Additional environment details (AWS, VirtualBox, physical, etc.)

dev-scripts created compact openshift env . Also seen in crc.
@afazekas afazekas added the kind/bug Categorizes issue or PR as related to a bug. label Aug 15, 2024
@saschagrunert
Copy link
Member

saschagrunert commented Aug 15, 2024

Thank you for the issue report @afazekas! Hm, so a kubectl exec -it should result in something like this in the logs:

DEBU[…] Request: &ExecRequest{ContainerId:1293ee29666010f3a4c94a614b64f31d773f72919629fe12396620ecd2f64271,Cmd:[echo hi],Tty:true,Stdin:true,Stdout:true,Stderr:false,}  file="otel-collector/interceptors.go:63" id=1ea0bbf9-4cbf-46d1-973e-effa346decea name=/runtime.v1.RuntimeService/Exec
DEBU[…] Response: &ExecResponse{Url:http://127.0.0.1:45717/exec/Er4AHscM,}  file="otel-collector/interceptors.go:75" id=1ea0bbf9-4cbf-46d1-973e-effa346decea name=/runtime.v1.RuntimeService/Exec
DEBU[…] Starting to track exec PID 490199 for container 1293ee29666010f3a4c94a614b64f31d773f72919629fe12396620ecd2f64271 (should kill = true) ...  file="oci/container.go:773"

Unfortunately the issue not reproducible for me locally. If it's for you, do you think you could extract the CRI-O debug logs to see if there is anything unusual? The journal logs around that time may also provide some more insights.

I also assume that you use runc and conmon, is that correct?

@afazekas
Copy link
Author

Thanks for the quick replay.

I am using the system defaults so runc .

The different on the strace, when used kubectl/oc.
$ strace -f -p <crio_pid> -v -s 4096 -tt -o /tmp/crio_traces2

[core@master-0 ~]$ grep  mytext_128 /tmp/crio_traces2 | cut -c 1-120
3178  13:40:29.288050 <... read resumed>"\0\0\4\10\0\1\3372\t\0\2\0\0\0\0\4\10\0\0\0\0\0\0\0021E\0\0\t\1\4\1\3372\r\203\
3182  13:40:30.542290 write(355, "{\"terminal\":true,\"user\":{\"uid\":0,\"gid\":0,\"additionalGids\":[0]},\"args\":[\"e
542532 13:40:31.670033 <... read resumed>"{\"terminal\":true,\"user\":{\"uid\":0,\"gid\":0,\"additionalGids\":[0]},\"arg
542744 13:40:32.738801 write(11, "{\"args\":[\"echo\",\"mytext_128\"],\"env\":[\"PATH=/usr/local/sbin:/usr/local/bin:/us
542754 13:40:33.124418 <... read resumed>"{\"args\":[\"echo\",\"mytext_128\"],\"env\":[\"PATH=/usr/local/sbin:/usr/local
542754 13:40:33.613695 execve("/usr/bin/echo", ["echo", "mytext_128"], ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/
542754 13:40:33.674204 write(1, "mytext_128\n", 11 <unfinished ...>
542744 13:40:33.681543 <... read resumed>"mytext_128\n", 32768) = 11
542744 13:40:33.682206 write(1, "mytext_128\n", 11 <unfinished ...>
3178  13:40:33.683174 <... read resumed>"mytext_128\r\n", 32768) = 12
3178  13:40:33.687257 write(337, "mytext_128\r\n", 12 <unfinished ...>
[core@master-0 ~]$ grep  mytext_129 /tmp/crio_traces2 | cut -c 1-120
17804 13:40:33.983960 <... read resumed>"\0\0\t\1\4\1\3372S\203\206\370\354\353\352\351\350\347\0\0_\0\1\1\3372S\0\0\0\0
18206 13:40:34.928224 write(321, "{\"terminal\":true,\"user\":{\"uid\":0,\"gid\":0,\"additionalGids\":[0]},\"args\":[\"e
542976 13:40:35.921860 <... read resumed>"{\"terminal\":true,\"user\":{\"uid\":0,\"gid\":0,\"additionalGids\":[0]},\"arg
543055 13:40:36.886683 write(11, "{\"args\":[\"echo\",\"mytext_129\"],\"env\":[\"PATH=/usr/local/sbin:/usr/local/bin:/us
543114 13:40:37.308786 <... read resumed>"{\"args\":[\"echo\",\"mytext_129\"],\"env\":[\"PATH=/usr/local/sbin:/usr/local
543114 13:40:37.889139 execve("/usr/bin/echo", ["echo", "mytext_129"], ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/
543114 13:40:37.962068 write(1, "mytext_129\n", 11 <unfinished ...>
543196 13:40:37.974473 <... read resumed>"mytext_129\n", 32768) = 11
543196 13:40:37.975805 write(1, "mytext_129\n", 11 <unfinished ...>
17976 13:40:37.977743 <... read resumed>"mytext_129\r\n", 32768) = 12

It is visible the echo did a write(2) in both cases, but the buffer content was not moved with mytext_129 ,
line like 3178 13:40:33.687257 write(337, "mytext_128\r\n", 12 <unfinished ...> is not there.

@afazekas
Copy link
Author

afazekas commented Aug 16, 2024

log has something strange which sounds related. Probably not just from the bad cases.

Aug 16 09:24:52 master-0.ocp.openstack.lab crio[3710914]: time="2024-08-16 09:24:52.108278647Z" level=debug msg="Response: &ExecSyncResponse{Stdout:[FILTERED],Stderr:[],ExitCode:0,}" file="otel-collector/interceptors.go:74" id=2e92020e-4>
Aug 16 09:24:52 master-0.ocp.openstack.lab crio[3710914]: time="2024-08-16 09:24:52.121558962Z" level=warning msg="Stdout copy error: read /dev/ptmx: input/output error" file="oci/oci_unix.go:69"
Aug 16 09:24:52 master-0.ocp.openstack.lab crio[3710914]: time="2024-08-16 09:24:52.126164725Z" level=debug msg="Request: &ExecSyncRequest{ContainerId:030ad477e162a94bbb1cdd1f043214b3904ac5da11b24be654d7ed1f8d229df0,Cmd:[/bin/bash -c set>

@afazekas
Copy link
Author

Can you try in a "CRC VM for OpenShift 4.15.17", or can you suggest a crc version when it is not reproducible ?

Copy link

A friendly reminder that this issue had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 19, 2024
@kwilczynski kwilczynski removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 30, 2024
Copy link

A friendly reminder that this issue had no activity for 30 days.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
Projects
None yet
Development

No branches or pull requests

3 participants