Document ways for starting Luigi inside Python code (#2301)
1. Rename command_line file to running_luigi
2. Add description how to start luigi tasks using luigi.build function from luigi.interface module
nryanov authored and Tarrasch committed Feb 26, 2018
1 parent c8f4497 commit 037eb71
Showing 3 changed files with 110 additions and 39 deletions.
38 changes: 0 additions & 38 deletions doc/command_line.rst

This file was deleted.

2 changes: 1 addition & 1 deletion doc/index.rst
@@ -15,7 +15,7 @@ Table of Contents
workflows.rst
tasks.rst
parameters.rst
-command_line.rst
+running_luigi.rst
central_scheduler.rst
execution_model.rst
luigi_patterns.rst
109 changes: 109 additions & 0 deletions doc/running_luigi.rst
@@ -0,0 +1,109 @@
.. _RunningLuigi:

Running from the Command Line
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The preferred way to run Luigi tasks is through the ``luigi`` command line tool,
which is installed with the pip package.

.. code-block:: python

    # my_module.py, available in your sys.path
    import luigi

    class MyTask(luigi.Task):
        x = luigi.IntParameter()
        y = luigi.IntParameter(default=45)

        def run(self):
            print(self.x + self.y)

It should be run like this:

.. code-block:: console

    $ luigi --module my_module MyTask --x 123 --y 456 --local-scheduler

Or alternatively like this:

.. code-block:: console

    $ python -m luigi --module my_module MyTask --x 100 --local-scheduler

Note that if a parameter name contains an underscore (``_``), it must be written with a dash (``-``) on the command line.
For example, if ``MyTask`` had a parameter called ``my_parameter``:

.. code-block:: console

    $ luigi --module my_module MyTask --my-parameter 100 --local-scheduler

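
When a ``luigi`` command is assembled from Python, for example to launch it with ``subprocess``, the same underscore-to-dash rule can be applied mechanically. The ``luigi_argv`` helper below is hypothetical and not part of Luigi; it is a minimal sketch that assumes every parameter takes a value, so boolean flags would need separate handling:

.. code-block:: python

    def luigi_argv(module, task_name, params, local_scheduler=True):
        """Build an argument list for the ``luigi`` command line tool.

        Hypothetical helper: Python parameter names use underscores, so
        each one is rewritten in the dashed form the command line expects.
        """
        argv = ["luigi", "--module", module, task_name]
        for name, value in params.items():
            argv += ["--" + name.replace("_", "-"), str(value)]
        if local_scheduler:
            argv.append("--local-scheduler")
        return argv


    # The resulting list could be handed to subprocess.run().
    print(luigi_argv("my_module", "MyTask", {"my_parameter": 100}))
    # → ['luigi', '--module', 'my_module', 'MyTask', '--my-parameter', '100', '--local-scheduler']
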

Running from Python code
^^^^^^^^^^^^^^^^^^^^^^^^

You can also start tasks from Python code with ``luigi.build(tasks, worker_scheduler_factory=None, **env_params)``
from the ``luigi.interface`` module, which is also available as ``luigi.build``.

This way of running Luigi tasks is useful if you want to obtain parameters dynamically from another
source, such as a database, or to run additional logic before you start the tasks.
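
As a minimal sketch of that use case, the example below creates one task per value returned by ``fetch_report_dates()``, a hypothetical stand-in for a database query, and passes the resulting list to ``luigi.build``. The ``DailyReport`` task, its output paths and the temporary output directory are illustrative assumptions; ``local_scheduler=True`` keeps the run in-process so no central scheduler is needed:

.. code-block:: python

    import os
    import tempfile

    import luigi

    # Hypothetical stand-in for "another source, such as a database";
    # a real pipeline might run a query here instead.
    def fetch_report_dates():
        return ["2018-02-01", "2018-02-02"]

    # Throwaway output directory so the sketch does not touch the working directory.
    OUTPUT_DIR = tempfile.mkdtemp()


    class DailyReport(luigi.Task):
        date = luigi.Parameter()

        def output(self):
            return luigi.LocalTarget(os.path.join(OUTPUT_DIR, "report-%s.txt" % self.date))

        def run(self):
            with self.output().open("w") as f:
                f.write("report for %s\n" % self.date)


    def build_reports():
        # Parameter values are only known at runtime, so the task list is built here.
        tasks = [DailyReport(date=d) for d in fetch_report_dates()]
        return luigi.build(tasks, local_scheduler=True)


    if __name__ == "__main__":
        build_reports()

In a real pipeline the outputs would live in a fixed location rather than a temporary directory, so that Luigi can see which reports already exist and skip them.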

One notable difference from the command line is that ``build`` does not use the identical process lock by default.
To enable it, pass ``no_lock=False``.


.. code-block:: python

    import luigi

    class MyTask1(luigi.Task):
        x = luigi.IntParameter()
        y = luigi.IntParameter(default=0)

        def run(self):
            print(self.x + self.y)

    class MyTask2(luigi.Task):
        x = luigi.IntParameter()
        y = luigi.IntParameter(default=1)
        z = luigi.IntParameter(default=2)

        def run(self):
            print(self.x * self.y * self.z)

    if __name__ == '__main__':
        luigi.build([MyTask1(x=10), MyTask2(x=15, z=3)])

You can also pass additional parameters to ``build``, such as ``scheduler_host``, ``scheduler_port``, ``workers`` and ``local_scheduler``:

.. code-block:: python

    if __name__ == '__main__':
        luigi.build([MyTask1(x=1)], workers=5, local_scheduler=True)

For special requirements, you can pass your own ``worker_scheduler_factory`` to ``build``.
The factory creates your own worker and/or scheduler implementations:

.. code-block:: python

    import luigi
    from luigi import rpc, scheduler
    from luigi.worker import Worker


    class MyWorker(Worker):
        pass  # some custom logic


    class MyFactory(object):
        def create_local_scheduler(self):
            return scheduler.Scheduler(prune_on_get_work=True, record_task_history=False)

        def create_remote_scheduler(self, url):
            return rpc.RemoteScheduler(url)

        def create_worker(self, scheduler, worker_processes, assistant=False):
            # return your worker instance
            return MyWorker(
                scheduler=scheduler, worker_processes=worker_processes, assistant=assistant)


    if __name__ == '__main__':
        # MyTask1 as defined in the earlier example
        luigi.build([MyTask1(x=1)], worker_scheduler_factory=MyFactory())

This can be useful in special cases, for example when Luigi tasks are started from a task queue.
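
The factory skeleton above leaves the custom worker logic open. One possible filling is shown in the self-contained sketch below: a hypothetical ``CountingWorker`` counts how often Luigi calls its ``run()`` method, which makes it easy to check that ``build`` really uses the factory. ``Hello``, ``CountingWorker`` and ``CountingFactory`` are illustrative names, not part of Luigi; the scheduler and worker construction mirrors the factory skeleton:

.. code-block:: python

    import os
    import tempfile

    import luigi
    from luigi import rpc, scheduler, worker

    # Throwaway output directory for the illustrative task.
    OUTPUT_DIR = tempfile.mkdtemp()


    class Hello(luigi.Task):
        name = luigi.Parameter()

        def output(self):
            return luigi.LocalTarget(os.path.join(OUTPUT_DIR, "hello-%s.txt" % self.name))

        def run(self):
            with self.output().open("w") as f:
                f.write("hello %s\n" % self.name)


    class CountingWorker(worker.Worker):
        # Hypothetical custom logic: count calls to run() across all instances.
        run_calls = 0

        def run(self):
            CountingWorker.run_calls += 1
            return super(CountingWorker, self).run()


    class CountingFactory(object):
        def create_local_scheduler(self):
            return scheduler.Scheduler(prune_on_get_work=True, record_task_history=False)

        def create_remote_scheduler(self, url):
            return rpc.RemoteScheduler(url)

        def create_worker(self, scheduler, worker_processes, assistant=False):
            # Here ``scheduler`` is the scheduler instance Luigi passes in,
            # shadowing the module of the same name inside this method.
            return CountingWorker(
                scheduler=scheduler, worker_processes=worker_processes, assistant=assistant)


    if __name__ == "__main__":
        luigi.build([Hello(name="world")], worker_scheduler_factory=CountingFactory(), local_scheduler=True)
        print(CountingWorker.run_calls)
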
