This file contains a steadily growing set of shell script constructs that are usable in real-world applications. Whenever any of these are not portable to reasonable traditional shell environments, that is stated explicitly.
The most reliable way to start a shell script is to have the string #!/bin/sh as the first line. In that case portability should be taken care of very well. Perhaps these days the requirement to support traditional Bourne shell constructs can (in most cases) be skipped (and ported whenever necessary, if ever). Modern shell constructs are just so much more convenient to use, and better tested.
#!/bin/sh may be dash (Linux), bash (Linux), or something that works pretty much like dash on BSD systems. I’d guess newer Solaris derivatives use something ksh-related. Android devices have something based on mksh. macOS uses bash 3.2.
To make the shell script more robust, the next line should be:

set -eu

Every shell script should start with this line. set -e makes the script exit in case any command returns nonzero (with exceptions), and set -u makes referencing an undefined variable an error (which makes the script exit, at least together with -e).
The use of these is somewhat comparable to warning options in C compilation and to use strict; use warnings in a Perl program.
With set -e the script must take care of the cases where a command may return nonzero. The methods are e.g. if command; then ... fi, command [ && ... ] || ..., while command; do ... done, and so on…
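As a minimal sketch (the grep patterns and file names are arbitrary placeholders), each of these forms keeps a nonzero exit status from terminating a set -e script:

set -e
if grep -q root /etc/passwd
then
    echo 'root found'
fi
grep -q nosuchuser /etc/passwd || echo 'not found, but the script continues'
i=0
while test "$i" -lt 3    # the loop ends when test returns nonzero
do
    i=$((i + 1))
done
echo "$i"                # prints 3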
Some commands, like expr, return nonzero in special ways, which is somewhat unfortunate in this context.
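For example, expr exits 1 when the value of the expression is zero or empty, even though nothing actually went wrong:

$ expr 1 - 1; echo $?
0
1
$ expr 2 - 1; echo $?
1
0

Under set -e this means e.g. i=`expr $i - 1` terminates the script as soon as the result reaches zero, unless the exit status is handled.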
But in some cases the nonzero exit value of a command does not cause the shell to exit. A non-last command in a pipeline is the obvious case, but there are also at least two command substitution cases: command arguments (printf %s\\n `false`; echo $?) and variables declared with e.g. `export` and `readonly` (readonly rov=`false`). (There are more of these, e.g. local – but that one does not work in all modern shells…)
New to me (2021-05): the following does not cause the shell to exit: false && true (i.e. not false && true || true) between executed commands. If false && true is the last command in a function and the caller does not handle the nonzero return value of the function, the shell will exit (i.e. funcall vs. funcall && : – the difference from funcall || : is that the former sets $? to some nonzero value, while with || : the $? will be zero (0)).
Ok, let’s look at a few samples (before a short note on the all-important set -u):
$ sh -ec 'fn() { false && :; }; fn; echo foo'
zsh: exit 1     sh -ec 'fn() { false; }; fn; echo $?'
$ sh -ec 'fn() { false; }; fn && :; echo $?'
1
$ sh -ec 'fn() { false; }; fn || :; echo $?'
0
$ sh -ec 'fn() { false && :; :; }; fn; echo $?'
0
set -u just means that all variables need to be defined before they are referenced; with positional parameters the ${1-} "fallback" syntax can be used when there are just no other suitable options.
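For example (the fallback text is arbitrary):

$ sh -euc 'printf %s\\n "${1-fallback}"' sh
fallback
$ sh -euc 'printf %s\\n "${1-fallback}"' sh hello
hello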
set -f makes the script not do filename expansion using glob patterns. This can make the script more robust – it is just that initially (like with -eu) developers forget this, and the filename expansions they then try fail…
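A quick illustration (the pattern is arbitrary; without -f it would be expanded whenever matching files exist):

$ sh -fc 'printf %s\\n *.tar.gz'
*.tar.gz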
In case the script is bash, appending --posix to the command line gives a chance that the script works more like the simpler modern shells. That also prohibits some shell builtins from being overridden by bash functions defined in the environment (like unset () { echo rm -rf /; }), which is at least a bit of a security enhancement.
Systems have /bin/sh for sure, but e.g. bash may be somewhere else in $PATH. The common convention is to use #!/usr/bin/env to run the other shell. With that, no further options can be added to the command line (but look for ideas in the templates/ directory for that, if needed).
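For example:

#!/usr/bin/env bash
set -eu

(The options go inside the script, since the shebang line cannot carry them portably this way.)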
Latest info: since bash 4.3, after declaring (associative) arrays, some accesses of those give ‘unbound variable’ errors under set -u. To fix these, add =() to the declare line; as an example: declare -A associative_array=()
Just tested:
$ bash -c 'set -u; declare -a a; echo ${#a[*]}'
bash: a: unbound variable
$ bash -c 'set -u; declare -a a=(); echo ${#a[*]}'
0
In the year 2002 I was writing a shell script on a Solaris box. For convenience it had e.g. GNU coreutils installed in /opt/gnu/bin (which was included in $PATH). After I got the script ready, I shipped it to another machine, where it – (un)surprisingly – failed to work properly.
As all of you have guessed, the problem was using GNU extensions in the script. Since then I’ve had
PATH='/sbin:/usr/sbin:/bin:/usr/bin'; export PATH
(with some variations) near the beginning of my shell scripts…
Whenever needed on some systems the PATH= line is commented out, but in general it is there.
Another thing that is usually there is:
LANG=C LC_ALL=C; export LANG LC_ALL; unset LANGUAGE
Too bad there are no locales other than C guaranteed to be there by default. My latest try was C.UTF-8, but that still fails to exist on some systems. Previously I set en_US.UTF-8, but there are some pain points with e.g. script(1) printing dates in 12-hour format (awful!). I’d use en_IE.UTF-8, but that is even more uncommon than C.UTF-8.
With C, python3(1) suffers. With any unknown locale, perl(1) shrieks. Based on the needs I have to set the locale variables accordingly, change them mid-script or (sometimes) drop the setting altogether.
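Putting the pieces so far together, the beginning of a script might look roughly like this (a sketch, to be adjusted case by case):

#!/bin/sh
set -eu
PATH='/sbin:/usr/sbin:/bin:/usr/bin'; export PATH
LANG=C LC_ALL=C; export LANG LC_ALL; unset LANGUAGE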
Sometimes the environment variables must be set exactly; then the only way may be:

test "${__CLEAR_ENV-}" = yep ||
    exec /usr/bin/env -i __CLEAR_ENV=yep "USER=$USER" ... /bin/sh "$0" "$@"
unset __CLEAR_ENV
Ulimit and umask may matter. Note that in bash umask can take the same symbolic modes as chmod(1), but e.g. dash cannot (dash: 1: umask: Illegal mode: umask).
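For example (the symbolic mode here is arbitrary, and the exact error text varies):

$ bash -c 'umask u=rwx,g=rx,o=; umask'
0027
$ dash -c 'umask u=rwx,g=rx,o='
dash: 1: umask: Illegal mode: u=rwx,g=rx,o=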
Command substitution allows the output of a command to be substituted in place of the command name itself.
There are 2 formats for command substitution; the legacy “backquoted”:
`command [args]`
and the new:
$(command [args])
I like to use the `legacy` version because (whenever the command line is “simple”):
- it looks clearer
- it is portable to older shells
- it works on Amigashell ;)
These formats work almost the same, but not exactly, e.g.:
$ printf %s\\n `printf %s 's/\\(.\\)/\\1/'`
s/\(.\)/\1/
$ printf %s\\n $(printf %s 's/\\(.\\)/\\1/')
s/\\(.\\)/\\1/
More or less the rest of this file is work in progress – I decided not to delete it, in case someone finds it useful…
IFS – the internal field separator – is a variable containing the (ASCII) characters that are used to split unquoted $variables into separate arguments to a function or command. By default it contains the characters space, tab and newline (in this order) (but to make things more complicated, e.g. zsh appends \0 to this list, making e.g. extraction of single characters from this variable harder…)
By changing the contents of $IFS to something else, many things can be done within a shell script. To be able to restore IFS to its original value, the following lines can be added to the beginning of the shell script:
saved_IFS=$IFS
readonly saved_IFS
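As a small sketch of what a changed $IFS allows – splitting a colon-separated value into words and then restoring the original value (the variable names are arbitrary):

path_list='/bin:/usr/bin:/usr/local/bin'
IFS=:
set -f                      # keep glob characters in the list from expanding
for dir in $path_list
do
    printf %s\\n "$dir"
done
set +f
IFS=$saved_IFS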
XXX move next content elsewhere, and continue with what can be done with a changed $IFS
Note that the syntax readonly var=$val was not used; it is not portable to the traditional shell. Also, that format has subtle behavior differences(*), and for example readonly $var=$val will assign val to the variable named by the expanded value of var (this applies also to export $var=$val and so on).
(*) In bash, readonly var=$IFS would not assign the full IFS to var; $IFS would need to be quoted there (which is not normally needed).
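To illustrate the latter point (the variable names are made up for this example):

name=greeting
readonly $name=hello        # assigns (and makes readonly) the variable named by $name
printf %s\\n "$greeting"    # prints: hello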
Use set +xv instead of set -.
Traditionally set - works like set +xv (but it is not echoed!) – just that zsh in native mode also clears the positional parameters.. AARGH! In zsh, set - "$@" would work, too…
to test:
for shell in zsh bash ksh dash sh
do
    $shell -c 'set a b c; echo $#; set -; echo $#'
    $shell -c 'set a b c; echo $#; set - "$@"; echo $#'
done
I could not find a zsh option to “disable” this feature (sh_option_letters did not do it); emulate ksh does it.
(New info 2023-03: zsh command -v outputs aliases as they are defined, and just the names of functions, so it cannot be used to reliably find the path to an executable. That is unfortunate. See (++) below.)
Often one needs to know whether a particular executable is available on the system. On many systems which, hash and command -v could be used to figure this out, but…
which, while often printing the path to the executable and exiting zero when the command is found, on some systems prints output to stdout even when the command is not found, and on some systems does not exit nonzero even if the command is not found. which is also not usually a shell builtin (only in zsh), requiring the shell to do an execve(2) in addition to 1 or 2 fork(2)s.
The user-visible hash behaviour is its exit value, being zero when the command is found and most often nonzero when not – but ksh exits zero even when the command is not found.
command -v outputs the path when found and exits nonzero when not – except on zsh – see (++) above. Therefore I am dropping the suggestion to use command -v to store the path of an executable (perhaps use which on zsh and command -v on other shells – or use iwhich shown below). One note about the -v option of command on /Modern/(*) shells below.
(*) http://pubs.opengroup.org/onlinepubs/009695399/utilities/command.html mentions that the -v option might not be available in all shells that claim to have POSIX compatibility – all Modern shells I’ve tested have this feature, though.
To have which-like functionality that works in all shells, one could use the following (which may even be the fastest, as no forks are required).
iwhich () {
    case $1 in */*)
        test -x "$1" || return 1
        case $# in
            3) eval $3=\$1; return 0 ;;
            *) return 1 ;;
        esac
    esac
    IFS=:
    for _v in $PATH
    do
        test -x "$_v/$1" || continue
        _v=$_v/$1
        case $# in
            3) eval $3=\$_v ;;
            *) eval $1=\$_v ;;
        esac
        IFS=$saved_IFS
        return 0
    done
    IFS=$saved_IFS
    return 1
}
Now, iwhich ls would assign the path of ls to the variable $ls. And iwhich ssh-agent as ssh_agent would assign the path of ssh-agent to the variable $ssh_agent.