Performance tuning a simple shell pipeline (are we missing SIGPIPE?) #373

Closed
rw opened this issue Mar 13, 2017 · 10 comments

rw commented Mar 13, 2017

I'm trying to convert a bash script to use sh, version sh==1.12.10. Here's an example command that I'm trying:

sh.head(sh.cat('/usr/share/dict/words', _piped='direct'))

The above takes about 2.81 seconds on my system (timed with %timeit in an IPython shell).

Running the same pipeline in bash with time (cat /usr/share/dict/words | head) is of course much quicker: about 4 milliseconds.

Am I using sh correctly? The docs don't seem to cover troubleshooting pipeline performance issues.

rw commented Mar 13, 2017

My best workaround is to do:

sh.sh('-c', 'cat /usr/share/dict/words | head')

But according to %timeit that's still much slower than bash, at about 15 ms per iteration.

amoffat commented Mar 13, 2017

So _piped='direct' isn't a thing anymore. Direct piping is now the default, so you just want one of the values the docs say are allowed: True or "out". In fact, passing "direct" causes piping to not work at all (possibly a side bug), since it's not a valid value, which accounts for your slowdown.
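
Concretely, the corrected pipeline would look something like:

sh.head(sh.cat('/usr/share/dict/words', _piped=True))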

However, when you use _piped=True, cat immediately fails with a broken pipe, because sh.head hasn't started yet. cat seems to be the only program that has this issue, since this works fine:

sh.head(sh.head('/usr/share/dict/words', _piped=True, n=10000))

Not sure if this is a bug with sh.

amoffat commented Mar 13, 2017

Also, as you can probably see from the sh.head(sh.head(file, ...)) example, you can just use sh.head once on your file :)
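
In other words, something like this by itself should do it:

sh.head('/usr/share/dict/words', n=10)  # equivalent to: head -n 10 /usr/share/dict/words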

rw commented Mar 13, 2017

Thanks for the quick response. Are you using master? I ran your example and got a SIGPIPE exception:

head: error writing 'standard output': Broken pipe

amoffat commented Mar 13, 2017

I'm on 1.12.10. Running it a few times now, it fails for me sometimes too. If I change it to n=100000 it fails consistently. This looks like a timing issue in sh; I'll need to dig into it further.
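
That is, something along the lines of:

sh.head(sh.head('/usr/share/dict/words', _piped=True, n=100000))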

amoffat commented Mar 14, 2017

Ok, this issue should be fixed in 1.12.11. Basically the problem was related to SIGPIPE, as you originally suggested. Python sets SIGPIPE to SIG_IGN on startup, so spawned processes were ignoring SIGPIPE. But they were still dying (with exit code 1) when a write reported EPIPE.

However, a race existed because sometimes the piping source process finished before the piping destination process, so there was never a "hang up" on the fd (and therefore never an EPIPE)...the data just stayed in the pipe buffer until the destination process could read it. I imagine that's why the tests never caught it. Your sample code caught it because cat was longer-lived than head, and head didn't consume all the data.

The fix was to make sure spawned processes saw SIGPIPE, and then suppress any exception generated by _piped processes that received a SIGPIPE.
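
For reference, here is a rough sketch of the general technique (illustrative only, written with plain subprocess rather than sh's internals, and the helper name is made up):

import signal
import subprocess

def restore_sigpipe():
    # runs in the child between fork() and exec(): undo Python's SIG_IGN so the
    # child dies from SIGPIPE rather than erroring out on EPIPE writes
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)

source = subprocess.Popen(['cat', '/usr/share/dict/words'],
                          stdout=subprocess.PIPE, preexec_fn=restore_sigpipe)
dest = subprocess.Popen(['head'], stdin=source.stdout,
                        preexec_fn=restore_sigpipe)
source.stdout.close()  # so cat sees the pipe close once head exits
dest.wait()
source.wait()
# cat typically exits here from SIGPIPE (returncode == -signal.SIGPIPE), and sh
# now treats that exit as OK for a _piped source instead of raising an exception.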

Anyways, good find. Confirm that it works for you and let me know.

rw commented Mar 14, 2017

Works great, thanks for fixing this so quickly. I'm glad my hunch about SIGPIPE was useful :-)

rw commented Mar 14, 2017

Note that sh takes about double the time of invoking the same command with just bash. In the following example, I try to minimize overhead by resolving the commands only once:

import sh
head = sh.head
cat = sh.cat
%timeit -n 100 -r 5 head(cat('/usr/share/dict/words', _piped=True))
100 loops, best of 5: 30.8 ms per loop

And, here, I use the aforementioned kludge of invoking the entire command in a single sh call:

import sh
shell = sh.sh
%timeit -n 100 -r 5 shell('-c', 'cat /usr/share/dict/words | head')
100 loops, best of 5: 17.2 ms per loop

Finally, here it is timed directly from the shell:

$ time sh -c 'cat /usr/share/dict/words | head'
A
a
aa
aal
aalii
aam
Aani
aardvark
aardwolf
Aaron

real    0m0.011s
user    0m0.004s
sys     0m0.007s

amoffat commented Mar 14, 2017

Does that performance gap shrink the longer the processes run? Try piping to head -n 50000
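
For example, reusing the head/cat setup from the timing run above, something like:

%timeit -n 100 -r 5 head(cat('/usr/share/dict/words', _piped=True), n=50000)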

rw commented Mar 14, 2017

The difference doesn't shrink on my system as n grows.

More importantly, thanks for fixing this, it's much better!

0-wiz-0 added a commit to NetBSD/pkgsrc-wip that referenced this issue Apr 19, 2017
*   pypi readme doc bugfix [PR#377](amoffat/sh#377)

*   bugfix for relative paths to `sh.Command` not expanding to absolute paths [#372](amoffat/sh#372)
*   updated for python 3.6
*   bugfix for SIGPIPE not being handled correctly on pipelined processes [#373](amoffat/sh#373)