The Serverless Shell (sshell) brings the power of shell scripting to the serverless world.
It is particularly convenient for mining large public data sets available on the internet.
A typical use case is to download a data sample to the local machine, write the code logic against this sample, then execute the same code massively in parallel on a serverless infrastructure.
Currently, only AWS Lambda is supported as a serverless backend. Work is in progress to include other platforms.
A white paper describing sshell is available online.
wget https://github.com/crucial-project/serverless-shell/releases/download/2.2/serverless-shell-2.2-linux-bin.tar.gz
tar zxvf serverless-shell-2.2-linux-bin.tar.gz
export ACCOUNT=%AWS_ACCOUNT_ID%
export ROLE_NAME=%AWS_IAM_ROLE_NAME% # the role must have access to AWS Lambda (e.g., AWSLambda_FullAccess)
export REGION=%AWS_REGION%
sed -e "s,ACCOUNT,${ACCOUNT},g" -e "s,ROLE_NAME,${ROLE_NAME},g" -e "s,REGION,${REGION},g" serverless-shell-2.2/config.properties.tmpl > config.properties
export CONFIG_DIR=.; sed -i s,FUNCTION_ARN,$(./serverless-shell-2.2/deploy.sh -create | grep FunctionArn | awk '{print $2}' | sed 's/[",]//g'),g config.properties
source ./serverless-shell-2.2/utils.sh
sshell ls # check that everything works
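The step that fills in FUNCTION_ARN chains grep, awk and sed over the output of deploy.sh -create. As a rough illustration of what those filters do, the sketch below applies them to a made-up output line. The JSON layout and the ARN value are assumptions for illustration, not actual output of the script.

```shell
# Hypothetical line, assumed to resemble what deploy.sh -create prints.
sample='    "FunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:example",'

# grep keeps the line holding the ARN, awk takes its second field,
# and sed strips the double quotes and the trailing comma.
arn=$(printf '%s\n' "$sample" | grep FunctionArn | awk '{print $2}' | sed 's/[",]//g')
echo "$arn"
```

The resulting bare ARN is what gets substituted for the FUNCTION_ARN placeholder in config.properties.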
Presentation of sshell at Middleware'21:
https://www.youtube.com/watch?v=PFodHPlF0wE
Slides of the video:
https://docs.google.com/presentation/d/1I54PuDrdlzGc7y5FRAFknHAtxuM8lJwHUQOfBE5VzQY/edit?usp=sharing
The syntax of the serverless shell is straightforward.
Use sshell followed by the command(s) to execute.
It is also possible to pass a script file with sshell -f.
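As a rough sketch of the two invocation styles, the snippet below writes a small script and runs it locally with bash, then shows in comments how commands or a script file might be handed to sshell. The script name and contents are made up for illustration, and the sshell lines are commented out because they need a deployed function.

```shell
# Hypothetical script; its name and contents are illustrative only.
script=$(mktemp)
cat > "$script" <<'EOF'
printf 'one\ntwo\nthree\n' | wc -l
EOF

# Local dry run of the script (tr removes the padding some wc versions add).
local_out=$(bash "$script" | tr -d ' ')
echo "$local_out"

# With a deployed function, the serverless equivalents would be:
#   sshell ls              # inline command
#   sshell -f "$script"    # script file

rm -f "$script"
```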
Several examples are provided in the examples directory.
In particular, some unit tests can be run under examples/test.sh.
More involved examples are available in examples/commoncrawl.sh.
These examples cover the use of the Common Crawl data set.
For instance, the average function in commoncrawl.sh evaluates the average web page size over a chunk of this data set.
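To give a feel for this kind of computation, here is a minimal local sketch that averages a column of sizes with awk. It is not the average function from commoncrawl.sh: the helper name avg_size and the input values are invented for illustration.

```shell
# avg_size: hypothetical helper. Reads one size in bytes per line on stdin
# and prints the integer part of the arithmetic mean.
avg_size() {
  awk '{ sum += $1; n++ } END { if (n > 0) printf "%d\n", sum / n }'
}

# Made-up page sizes in bytes.
result=$(printf '1200\n800\n1000\n' | avg_size)
echo "$result"
```

Once such logic behaves as expected on a local sample, the same commands can be run through sshell over many chunks in parallel.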
It is possible to mount a distributed file system in sshell.
To date, only AWS EFS is supported.
To mount it, the user has to fill in the fields aws.efs.accesspointid and aws.efs.localmountpath in config.properties.
The script deploy.sh checks whether these fields are present and, if so, mounts the distributed file system where appropriate.
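As an illustration, the sketch below writes the two fields into a scratch properties file and checks that both are present, loosely mirroring the kind of test described for deploy.sh. The access point ID and the mount path are placeholders, the key=value layout is assumed, and the check is not the script's actual code.

```shell
# Scratch file standing in for config.properties (values are placeholders).
# AWS Lambda expects the local mount path to start with /mnt/.
conf=$(mktemp)
cat > "$conf" <<'EOF'
aws.efs.accesspointid=fsap-0123456789abcdef0
aws.efs.localmountpath=/mnt/efs
EOF

# Assumed check: both keys must be defined with a non-empty value.
if grep -q '^aws\.efs\.accesspointid=..*' "$conf" &&
   grep -q '^aws\.efs\.localmountpath=..*' "$conf"; then
  efs_status="configured"
else
  efs_status="missing"
fi
echo "$efs_status"
rm -f "$conf"
```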
Commands executed with sshell can have side effects through a shared objects layer (DSO).
To access this feature, it is necessary to launch one (or more) DSO servers, as explained in the installation guide.
Once this is done, set the variable crucial.server in config.properties to the entry point of the DSO layer (its default value is IP:PORT).
Then, re-install the serverless shell (deploy.sh -delete; deploy.sh -create).
To check that everything works fine, you may run examples/tests-stateful.sh.
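The configuration step above could look roughly like the following sketch, which edits a scratch copy rather than the real config.properties. The endpoint address and port are hypothetical, and the key=value layout of the file is an assumption.

```shell
# Scratch copy standing in for config.properties, with the default value.
conf=$(mktemp)
printf 'crucial.server=IP:PORT\n' > "$conf"

# Hypothetical DSO endpoint (documentation-range address, made-up port).
dso_endpoint="203.0.113.10:11222"

# Replace the default value with the endpoint.
sed "s,^crucial\.server=.*,crucial.server=${dso_endpoint}," "$conf" > "$conf.new" &&
  mv "$conf.new" "$conf"

new_conf=$(cat "$conf")
echo "$new_conf"
rm -f "$conf"

# Then redeploy so that the function picks up the new setting:
#   deploy.sh -delete; deploy.sh -create
```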
The syntax to access a data type is of the form type -n name operation, where type is the data type and name its storage key in the DSO layer.
For instance, counter -n my_counter -1 1 increments by 1 the counter named my_counter.
The data types currently available are listed under serverless-shell/src/main/bin/aliases.sh.