Debian packaging for Apache Storm distributed realtime computation system.
The goal of this project is to provide a flexible tool for building Debian packages that follow Debian standards and use the default configs supplied with the Storm release. Packaged Storm is as easy to use as a Storm zip unpacked elsewhere and, at the same time, can be configured for reliable and convenient long-term, high-load production use.
Storm provides several services (nimbus, supervisor, drpc, ...). This project provides a separate package for each service, with a corresponding systemd unit file. Following Debian convention, the systemd services are enabled and started right after package installation.
Packages for the following services are provided:
- `storm-drpc`
- `storm-logviewer`
- `storm-nimbus`
- `storm-supervisor`
- `storm-ui`
There is also a `storm-common` package, which is a dependency of the service packages.
In addition, the `storm` package installs and starts all provided services, which may be useful for a single-node setup.
Previously, init scripts, Upstart configuration, and runit files were provided. Now only systemd is supported. See the History section below for details.
See the `./STORM_VERSION` file for the supported Storm version.
If you just need a storm package, there are some in Releases.
- Compatibility
- Building a package
- Using a package
- Details
- Dependencies and Requirements
- License
- Links
- These packages are meant to be used with Debian Jessie. Presumably they can be run on any other Debian-based distribution that supports systemd.
- There are previous versions (up to 0.9.1) built with FPM here. See also tags/branches and forks for different versions of storm.
- Clone the repository.
- Edit `apache-storm/debian/changelog` to set the packaging version and maintainer to your preferred values, so that you are the one contacted if other people use the package you compiled.
- Make sure the desired version is specified in both the `./STORM_VERSION` file and `apache-storm/debian/changelog`.
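A quick consistency check can catch a mismatch between the two files before a build starts. The following is only a sketch: it assumes `./STORM_VERSION` contains a single bare version string (for example `1.2.2`) and that the first changelog line follows the usual `package (version) distribution; urgency=...` layout. The function names are made up for this example and are not part of the repository.

```shell
#!/bin/sh
# Sketch: compare ./STORM_VERSION with the upstream version in debian/changelog.
# Assumption: the first changelog line looks like
#   apache-storm (1.2.2-1) unstable; urgency=low

# Print the upstream part of the version found on the first changelog line.
changelog_upstream_version() {
    # $1: path to a debian/changelog file
    # First sed: take the text between the parentheses on line 1.
    # Second sed: drop an optional epoch ("1:") and the Debian revision ("-1").
    sed -n '1s/^[^(]*(\([^)]*\)).*$/\1/p' "$1" |
        sed -e 's/^[0-9]*://' -e 's/-[^-]*$//'
}

# Succeed only when both files name the same upstream version.
check_storm_version() {
    # $1: STORM_VERSION file, $2: debian/changelog file
    wanted=$(tr -d ' \t\r\n' < "$1")
    packaged=$(changelog_upstream_version "$2")
    if [ "$wanted" = "$packaged" ]; then
        echo "OK: both files specify $wanted"
    else
        echo "MISMATCH: STORM_VERSION=$wanted changelog=$packaged" >&2
        return 1
    fi
}
```

From the repository root this would be called as `check_storm_version ./STORM_VERSION apache-storm/debian/changelog`. Where `dpkg-parsechangelog` is available, it is likely a more robust way to read the changelog version than the `sed` expressions used here.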
If you do not have Debian running locally, a Docker container can be used.
For that, `docker` and `make` must be installed on your system.
Run `make docker_package`, and the packages will be built.
- Install the necessary dependencies (see `Dockerfile` or `build.sh`).
- Call `make orig`; this downloads and prepares the upstream tarball. If you want to build a `SNAPSHOT` or a modified Storm version, follow the instructions for building from source further below.
- Run `build.sh`. It changes into the nested `apache-storm` folder, which contains `debian`, and executes the command that builds the packages. The packages are created in the project root folder.
- [Optional] After you have built a package, you can inspect its contents with the following command, which displays the package layout. Pass in your package name and version:

      # example for storm-common
      dpkg -c ./storm-common_*.deb

- [Optional] Clean up the file tree.

      cd ./apache-storm
      dpkg-buildpackage -rfakeroot -Tclean
Vagrant can be used to automatically provision a machine for building the packages.
# prepare upstream tarball
make orig
# prepare and enter vm (debian)
vagrant up debian
vagrant ssh debian
# to build in ubuntu use `vagrant up ubuntu && vagrant ssh ubuntu`
cd /vagrant
# then run
sudo ./build.sh
Other Debian-based distributions can probably be used as well. See `./Vagrantfile`.
Follow instructions in storm/DEVELOPER.md to create a storm distribution.
# First, build the code.
# You may skip tests with `-DskipTests=true` to save time
$ mvn clean install
# Create the binary distribution.
$ cd storm-dist/binary && mvn package
Then convert `storm-dist/binary/target/apache-storm-<version>.zip` to `*.tar.gz` and move it to `downloads`. Update the `STORM_VERSION` and `debian/changelog` files. Then proceed as with the provided upstream tarball, as described above.
According to the Storm 1.2.2 guide, the following must be installed:
- Java 7+ (Apache Storm 1.x is tested through travis ci against both java 7 and java 8 JDKs)
- Python 2.6.6 (Python 3.x should work too, but is not tested as part of our CI environment)
During installation, the package also creates the `storm` user or, if that user already exists, enables it.
- After you install the desired packages, e.g. supervisor and logviewer, edit `/etc/storm/storm.yaml` to specify the nimbus and zookeeper addresses.
- Note that the services are enabled and started automatically after installation, since this is common practice in Debian and is also fine on updates. You may want to stop them to do some initial configuration:

      systemctl status storm-nimbus
      systemctl stop storm-nimbus

  NOTE: automatic restart is configured in the `*.service` unit files. When crashed or killed, the services are started again by systemd! (Earlier this was done with `runit`.)
- Configure Storm the way you need using `/etc/storm/storm_env.ini`. Hint: use software configuration management tools.
- Set limits in `/etc/security/limits.conf`. Earlier they were set in `/etc/default/storm`.

      # /etc/security/limits.conf
      # ...
      #<domain>  <type>  <item>  <value>
      storm      hard    nofile  15000

- (On the first installation) Remember to start the services once the configuration is done.

      systemctl start storm-nimbus
      # and also other services
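Since several service packages are usually installed on one node, the stop, configure, and start cycle described above can be wrapped in a small helper. This is a sketch, not something shipped with the packages: the `storm_services` function and the `SYSTEMCTL` override variable are hypothetical names, and the override exists only so the loop can be dry-run with `echo`.

```shell
#!/bin/sh
# Sketch: run one systemctl action for several Storm services in turn.
# SYSTEMCTL is a hypothetical override, e.g. SYSTEMCTL=echo for a dry run.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

storm_services() {
    # $1: action (start, stop, restart, ...); remaining args: service names
    action=$1
    shift
    for service in "$@"; do
        # Stop at the first service for which the action fails.
        "$SYSTEMCTL" "$action" "$service" || return 1
    done
}
```

For example, `storm_services stop storm-supervisor storm-logviewer` before editing the configuration, and the same call with `start` afterwards. `systemctl` also accepts several unit names in a single call; the loop is used here only to make the per-service failure handling explicit.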
At some point, it is a good idea to use software configuration management tools to manage the configuration of Storm clusters. Check out SaltStack, Chef, Puppet, or Ansible.
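Until such tooling is in place, the `storm.yaml` edit from the steps above can at least be scripted. The sketch below is an assumption-heavy starting point rather than the packaged configuration: `write_storm_yaml` is a hypothetical helper, the host names are placeholders you pass in, `nimbus.seeds` is the setting name used by Storm 1.x, and `/var/lib/storm` is chosen as `storm.local.dir` only because it matches the layout described in this README.

```shell
#!/bin/sh
# Sketch: write a minimal storm.yaml for a worker node.
# Assumes Storm 1.x (nimbus.seeds); all host names are supplied by the caller.
write_storm_yaml() {
    # $1: output file, $2: ZooKeeper host, $3: Nimbus host
    cat > "$1" <<EOF
storm.zookeeper.servers:
    - "$2"
nimbus.seeds: ["$3"]
storm.local.dir: "/var/lib/storm"
EOF
}
```

It is probably safer to write to a scratch file first, for instance `write_storm_yaml /tmp/storm.yaml zk1.example.com nimbus1.example.com`, and merge the result into `/etc/storm/storm.yaml` by hand than to overwrite the packaged default.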
Basically, there are two folders (apart from configs, logs, and init scripts):
- `$STORM_HOME`: created by the package; stores all the libs and Storm executables in the `lib` and `bin` subfolders.
- `storm.local.dir`: should be created by the user and set in `storm.yaml`; by default `$STORM_HOME/storm-local` is used.
Looking at the history of this FPM project, `$STORM_HOME` was initially `/opt/storm`.
Then some of the forks used `/usr/lib/storm`,
then the original maintainer used `/var/lib/storm`,
and other forks moved to `/opt/storm`...
So there was a bit of chaos (in 2014 :-).
The Storm distribution deviates from Debian packaging conventions and does not separate libs from executables.
So everything that has to do with Storm goes into one `$STORM_HOME` folder.
The dilemma is how to organize the package, given the different expectations of admins and (Storm) developers:
|                       | ADMINS (Debian)                      | DEVELOPERS
-----------------------------------------------------------------------------------------------
| Binary files          | /usr/bin/*                           | $STORM_HOME/bin/*
| Libraries             | /usr/lib/storm                       | $STORM_HOME/lib/*
| Configs               | /etc/storm/                          | $STORM_HOME/conf/*
| Logback config        | /etc/storm/logback.xml               | $STORM_HOME/logback/cluster.xml
| Logs                  | /var/log/storm                       | $STORM_HOME/logs/*
| storm.local.dir       | /var/lib/storm/*                     | ? (e.g. /mnt/storm, see Links)
| Supervisors (systemd) | /lib/systemd/system/storm-*.service  | N/A
Also, there are two concepts: software can be either packaged or not packaged.
The Filesystem Hierarchy Standard (FHS) says that `/opt` is for programs that are not packaged and do not follow the standards.
You would just put all the libraries there together with the program.
That is the case when you want to install something directly from an archive: unpack the archive into `/opt` and start thinking about service management ;-).
So... with the configuration files in this repository, Storm becomes packaged and starts to follow the FHS. This is achieved with symlinks.
See below what the `$STORM_HOME` folder looks like:
drwxr-xr-x 2 root root 4096 Jul 24 15:00 bin
-rw-r--r-- 1 root root 34239 Jun 12 22:46 CHANGELOG.md
lrwxrwxrwx 1 root root 10 Jul 24 14:39 conf -> /etc/storm
-rw-r--r-- 1 root root 538 Mar 13 00:17 DISCLAIMER
drwxr-xr-x 2 root root 4096 Jul 24 15:00 lib
-rw-r--r-- 1 root root 22822 Jun 11 18:07 LICENSE
lrwxrwxrwx 1 root root 10 Jul 24 14:39 logback -> /etc/storm
lrwxrwxrwx 1 root root 14 Jul 24 14:39 logs -> /var/log/storm
-rw-r--r-- 1 root root 981 Jun 10 15:10 NOTICE
drwxr-xr-x 5 root root 4096 Jul 24 15:00 public
-rw-r--r-- 1 root root 7445 Jun 9 16:24 README.markdown
-rw-r--r-- 1 root root 17 Jun 16 14:22 RELEASE
-rw-r--r-- 1 root root 3581 May 29 14:20 SECURITY.md
lrwxrwxrwx 1 root root 14 Jul 24 15:37 storm-local -> /var/lib/storm
`/var/log/storm` and `/var/lib/storm` are owned by the storm user, so processes that run as the storm user can write state and logs.
Also, `/usr/bin/storm` points to `/usr/lib/storm/bin/storm`, so after installation `storm` is accessible from the command line.
This gives precise control over configuration, log files, and binaries while following the FHS. Such a scheme also satisfies both the developers' and the admins' paradigms.
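The layout can be verified on an installed system with a few `readlink` calls. This is a minimal sketch, assuming `$STORM_HOME` is `/usr/lib/storm` (as the `/usr/bin/storm` symlink target suggests) and that the four symlinks match the listing above; `check_storm_layout` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: verify that the symlinks inside the Storm home point to FHS locations.
check_storm_layout() {
    # $1: the Storm home directory, e.g. /usr/lib/storm
    home=$1
    rc=0
    for pair in conf:/etc/storm logback:/etc/storm \
                logs:/var/log/storm storm-local:/var/lib/storm; do
        name=${pair%%:*}      # symlink name inside the Storm home
        expected=${pair#*:}   # location it should point to
        # readlink prints nothing (and fails) when the path is not a symlink.
        actual=$(readlink "$home/$name" || true)
        if [ "$actual" = "$expected" ]; then
            echo "ok   $name -> $actual"
        else
            echo "FAIL $name -> ${actual:-not a symlink}, expected $expected"
            rc=1
        fi
    done
    return "$rc"
}
```

Calling `check_storm_layout /usr/lib/storm` prints one line per symlink and returns a non-zero status if any of them points somewhere else.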
By default, Storm ships pre-configured to log into `${storm.home}/logs/`.
This is configured in `logback.xml`.
Because `${STORM_HOME}/logs/` is symlinked to `/var/log/storm`, the logs end up where admins expect them.
The provisioning script `bootstrap.sh` installs all the dependencies needed to build a package on a Debian-based distribution.
The same script is used to provision the Vagrant environment.
I previously used FPM to build Storm 0.8 through 0.9.1. It was hard to maintain and messy, and its only benefits were potential ones: parametrizing the build for Ubuntu (Upstart) and, theoretically, RPM.
Also, before 0.9.1 building storm involved building zmq and jzmq packages. That was a pain, details here. Now these dependencies are long gone and storm flies with netty by default.
In recent years, all the major distributions have moved away from SysVinit and started using systemd. So did this project.
Upstart was supported at some point, but by now (2018) Ubuntu has defaulted to systemd for some time.
runit was supported at some point (in 2014), but automatic restart is now handled by systemd out of the box, and runit is no longer supported.
Apache License 2.0, the same as the Apache Storm project.
You may also be interested in the following projects:
- Storm framework for Mesos with Debian packaging
- Wirbelsturm - a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data related infrastructure.
- storm-deploy
- Tutorial on how to install Storm on an .rpm-based distribution: Running a multi-node Storm cluster, by Michael Noll
- Forks of storm-deb-packaging scripts that use FPM
Also, some interesting materials related to this repository:
- According to this discussion, a Debian package should not delete any users when it is removed. The recommended behaviour is to disable the user.
- This is a good answer to the question "where should software be installed".