apache · tqchen · Mar 30, 2020 · Mar 29, 2020 · Mar 29, 2020 · Mar 29, 2020
diff --git a/docker/README.md b/docker/README.md
@@ -40,8 +40,8 @@ The helper bash script can be useful to build demo sessions.
 
 ## Prebuilt Docker Images
 
-We provide several pre-built images for doing quick exploration with TVM installed.
-For example, you can run the following command to get ```tvmai/demo-cpu``` image.
+You can use third-party pre-built images for quick exploration with TVM installed.
+For example, you can run the following command to launch the ```tvmai/demo-cpu``` image.
 
 ```bash
 /path/to/tvm/docker/bash.sh tvmai/demo-cpu
@@ -52,7 +52,8 @@ Then inside the docker container, you can type the following command to start th
 jupyter notebook
 ```
 
-Check out https://hub.docker.com/r/tvmai/ to get the full list of available prebuilt images.
+You can find some unofficial prebuilt images at https://hub.docker.com/r/tvmai/ .
+Note that these are convenience images and are not part of the ASF release.
 
 
 ## Use Local Build Script

diff --git a/docs/contribute/document.rst b/docs/contribute/document.rst
@@ -103,3 +103,17 @@ The tutorial code will run on our build server to generate the document page.
 So we may have a restriction like not being able to access a remote Raspberry Pi,
 in such case add a flag variable to the tutorial (e.g. `use_rasp`) and allow users to easily switch to the real device by changing one flag.
 Then use the existing environment to demonstrate the usage.
+
+
+Refer to Another Location in the Document
+-----------------------------------------
+Please use Sphinx's ``:ref:`` markup to refer to another location in the same doc.
+
+.. code-block:: rst
+
+   .. _document-my-section-tag:
+
+   My Section
+   ----------
+
+   You can use :ref:`document-my-section-tag` to refer to My Section.
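A label is only defined when the target line ends with a colon (`.. _label:`), which is easy to miss in review. The following is a minimal sketch, not part of this change, of a checker that flags target lines without any colon; the function name `find_malformed_targets` and the sample text are hypothetical.

```python
import re

# An explicit target starts with ".. _"; everything after that is captured as "rest".
TARGET_START = re.compile(r"^\s*\.\. _(?P<rest>.+)$")


def find_malformed_targets(rst_text):
    """Return (line_number, line) pairs for target lines that contain no colon."""
    problems = []
    for lineno, line in enumerate(rst_text.splitlines(), start=1):
        match = TARGET_START.match(line)
        if match is None:
            continue
        rest = match.group("rest").rstrip()
        # Accepted forms: "name:" (internal label) or "name: <url>" (external link).
        if ":" not in rest:
            problems.append((lineno, line.strip()))
    return problems


# Hypothetical sample: the first target is missing its colon.
sample = """\
.. _document-my-section-tag

My Section
----------

.. _tvm-runtime-system-packed-func:

.. _Block: https://example.invalid/block.html
"""

malformed = find_malformed_targets(sample)
for lineno, line in malformed:
    print(f"line {lineno}: missing ':' in target {line!r}")
```

This only catches the missing-colon case; it does not verify that every `:ref:` actually resolves, which Sphinx itself reports as a warning at build time.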
diff --git a/docs/dev/codebase_walkthrough.rst b/docs/dev/codebase_walkthrough.rst
@@ -46,7 +46,7 @@ One of the interesting aspects of the TVM codebase is that interoperability betw
 Vector Add Example
 *******************************************
 
-We use a simple example that uses the low level TVM API directly. The example is vector addition, which is covered in detail in `this tutorial <https://docs.tvm.ai/tutorials/get_started.html#sphx-glr-tutorials-get-started-py>`_.
+We use a simple example that uses the low level TVM API directly. The example is vector addition, which is covered in detail in :ref:`tutorial-tensor-expr-get-started`.
 
 ::
 
@@ -66,9 +66,9 @@ Here, types of ``A``, ``B``, ``C`` are ``tvm.tensor.Tensor``, defined in ``pytho
        def __call__(self, *indices):
           ...
 
-The object protocol is the basis of exposing C++ types to frontend languages, including Python. The way TVM implements Python wrapping is not straightforward. It is briefly covered in `this document <https://docs.tvm.ai/dev/runtime.html#tvm-node-and-compiler-stack>`_, and details are in ``python/tvm/_ffi/`` if you are interested.
+The object protocol is the basis of exposing C++ types to frontend languages, including Python. The way TVM implements Python wrapping is not straightforward. It is briefly covered in :ref:`tvm-runtime-system`, and details are in ``python/tvm/_ffi/`` if you are interested.
 
-We use the ``TVM_REGISTER_*`` macro to expose C++ functions to frontend languages, in the form of a `PackedFunc <https://docs.tvm.ai/dev/runtime.html#packedfunc>`_. A ``PackedFunc`` is another mechanism by which TVM implements interoperability between C++ and Python. In particular, this is what makes calling Python functions from the C++ codebase very easy.
+We use the ``TVM_REGISTER_*`` macro to expose C++ functions to frontend languages, in the form of a :ref:`tvm-runtime-system-packed-func`. A ``PackedFunc`` is another mechanism by which TVM implements interoperability between C++ and Python. In particular, this is what makes calling Python functions from the C++ codebase very easy.
 You can also checkout `FFI Navigator <https://github.com/tqchen/ffi-navigator>`_ which allows you to navigate between python and c++ FFI calls.
 
 A ``Tensor`` object has an ``Operation`` object associated with it, defined in ``python/tvm/te/tensor.py``, ``include/tvm/te/operation.h``, and ``src/tvm/te/operation`` subdirectory. A ``Tensor`` is an output of its ``Operation`` object. Each ``Operation`` object has in turn ``input_tensors()`` method, which returns a list of input ``Tensor`` to it. This way we can keep track of dependencies between ``Operation``.
@@ -121,9 +121,7 @@ Lowering is done by ``tvm.lower()`` function, defined in ``python/tvm/build_modu
       stmt = schedule.ScheduleOps(sch, bounds)
       ...
 
-Bound inference is the process where all loop bounds and sizes of intermediate buffers are inferred. If you target the CUDA backend and you use shared memory, its required minimum size is automatically determined here. Bound inference is implemented in ``src/te/schedule/bound.cc``, ``src/te/schedule/graph.cc`` and ``src/te/schedule/message_passing.cc``. For more information on how bound inference works, see `InferBound Pass`_.
-
-.. _InferBound Pass: http://docs.tvm.ai/dev/inferbound.html
+Bound inference is the process where all loop bounds and sizes of intermediate buffers are inferred. If you target the CUDA backend and you use shared memory, its required minimum size is automatically determined here. Bound inference is implemented in ``src/te/schedule/bound.cc``, ``src/te/schedule/graph.cc`` and ``src/te/schedule/message_passing.cc``. For more information on how bound inference works, see :ref:`dev-InferBound-Pass`.
 
 
 ``stmt``, which is the output of ``ScheduleOps()``, represents an initial loop nest structure. If you have applied ``reorder`` or ``split`` primitives to your schedule, then the initial loop nest already reflects those changes. ``ScheduleOps()`` is defined in ``src/te/schedule/schedule_ops.cc``.
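To make the bound inference step above more concrete, here is a toy sketch of what "inferring loop bounds" means for a single `split`. It is not the logic in `src/te/schedule/bound.cc`; it assumes a split behaves like ceiling division, represents a range as a `(min, extent)` tuple, and uses hypothetical helper names.

```python
def infer_split_bounds(parent_extent, factor):
    """Toy model: extents of the outer and inner loops after splitting by `factor`."""
    if parent_extent <= 0 or factor <= 0:
        raise ValueError("extent and factor must be positive")
    # Ceiling division so that outer * inner covers the whole parent range.
    outer_extent = (parent_extent + factor - 1) // factor
    inner_extent = factor
    return outer_extent, inner_extent


def infer_bounds(root_extents, splits):
    """Build a map from loop variable name to (min, extent).

    root_extents: {name: extent} for the root loop variables.
    splits: list of (parent, outer, inner, factor) tuples, applied in order.
    """
    bounds = {name: (0, extent) for name, extent in root_extents.items()}
    for parent, outer, inner, factor in splits:
        _, parent_extent = bounds[parent]
        outer_extent, inner_extent = infer_split_bounds(parent_extent, factor)
        bounds[outer] = (0, outer_extent)
        bounds[inner] = (0, inner_extent)
    return bounds


# A loop of extent 100 split by 32 needs 4 outer iterations of 32 inner ones.
bounds = infer_bounds({"i": 100}, [("i", "i.outer", "i.inner", 32)])
print(bounds)
```

Because 4 * 32 exceeds 100, a real lowering also has to guard the out-of-range iterations; the sketch only covers the extents.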

diff --git a/docs/dev/inferbound.rst b/docs/dev/inferbound.rst
@@ -15,10 +15,13 @@
     specific language governing permissions and limitations
     under the License.
 
+.. _dev-InferBound-Pass:
+
 *******************************************
 InferBound Pass
 *******************************************
 
+
 The InferBound pass is run after normalize, and before ScheduleOps `build_module.py <https://github.com/apache/incubator-tvm/blob/master/python/tvm/build_module.py>`_. The main job of InferBound is to create the bounds map, which specifies a Range for each IterVar in the program. These bounds are then passed to ScheduleOps, where they are used to set the extents of For loops, see `MakeLoopNest <https://github.com/apache/incubator-tvm/blob/master/src/op/op_util.cc>`_, and to set the sizes of allocated buffers (`BuildRealize <https://github.com/apache/incubator-tvm/blob/master/src/op/compute_op.cc>`_), among other uses.
 
 The output of InferBound is a map from IterVar to Range:
@@ -83,14 +86,14 @@ A TVM schedule is composed of Stages. Each stage has exactly one Operation, e.g.
    		Array<IterVarRelation> relations;
    		// remainder omitted
    	};
-   	
+
    	class OperationNode : public Node {
    	public:
    		virtual Array<IterVar> root_iter_vars();
    		virtual Array<Tensor> InputTensors();
    		// remainder omitted
    	};
-   	
+
    	class ComputeOpNode : public OperationNode {
    	public:
    		Array<IterVar> axis;

diff --git a/docs/dev/relay_pass_infra.rst b/docs/dev/relay_pass_infra.rst
@@ -169,12 +169,12 @@ subclasses at the level of modules, functions, or sequences of passes..
 
     class PassNode : RelayNode {
       virtual PassInfo Info() const = 0;
-      virtual Module operator()(const Module& mod
+      virtual Module operator()(const IRModule& mod,
                                 const PassContext& pass_ctx) const = 0;
     };
 
-The functor shows how a pass must be realized, i.e. it always works on a `Relay
-module`_ under a certain context. All passes are designed in a ``Module`` to ``Module``
+The functor shows how a pass must be realized, i.e. it always works on a
+:py:class:`IRModule` under a certain context. All passes are designed in an ``IRModule`` to ``IRModule``
 manner. Therefore, optimizations governed by the pass infra will
 always update the whole module.
 
@@ -649,8 +649,6 @@ For more pass infra related examples in Python and C++, please refer to
 
 .. _Block: https://mxnet.incubator.apache.org/api/python/docs/api/gluon/block.html#gluon-block
 
-.. _Relay module: https://docs.tvm.ai/langref/relay_expr.html#module-and-global-functions
-
 .. _include/tvm/ir/transform.h: https://github.com/apache/incubator-tvm/blob/master/include/tvm/ir/transform.h
 
 .. _src/relay/ir/transform.cc: https://github.com/apache/incubator-tvm/blob/master/src/relay/ir/transform.cc
@@ -665,4 +663,4 @@ For more pass infra related examples in Python and C++, please refer to
 
 .. _tests/cpp/relay_transform_sequential.cc: https://github.com/apache/incubator-tvm/blob/master/tests/cpp/relay_transform_sequential.cc
 
-.. _include/tvm/relay/transform.h: https://github.com/apache/incubator-tvm/blob/master/include/tvm/relay/transform.h
+.. _include/tvm/relay/transform.h: https://github.com/apache/incubator-tvm/blob/master/include/tvm/relay/transform.h
diff --git a/docs/dev/runtime.rst b/docs/dev/runtime.rst
@@ -37,6 +37,8 @@ We need to satisfy quite a few interesting requirements:
 We want to be able to define a function from any language and call from another.
 We also want the runtime core to be minimal to deploy to embedded devices.
 
+.. _tvm-runtime-system-packed-func:
+
 PackedFunc
 ----------
 
@@ -176,9 +178,8 @@ Under the hood, we have an RPCModule that serializes the arguments to do the dat
 
 The RPC server itself is minimum and can be bundled into the runtime. We can start a minimum TVM
 RPC server on iPhone/android/raspberry pi or even the browser. The cross compilation on server and shipping of the module for testing can be done in the same script. Checkout
-`Cross compilation and RPC tutorial`_ for more details.
+:ref:`tutorial-cross-compilation-and-rpc` for more details.
 
-.. _Cross compilation and RPC tutorial: https://docs.tvm.ai/tutorials/cross_compilation_and_rpc.html#sphx-glr-tutorials-cross-compilation-and-rpc-py
 
 This instant feedback gives us a lot of advantages. For example, to test the correctness of generated code on iPhone, we no longer have to write test-cases in swift/objective-c from scratch -- We can use RPC to execute on iPhone, copy the result back and do verification on the host via numpy. We can also do the profiling using the same script.
 

diff --git a/docs/install/docker.rst b/docs/install/docker.rst
@@ -19,28 +19,26 @@
 
 Docker Images
 =============
-We provide several prebuilt docker images to quickly try out TVM.
-These images are also helpful run through TVM demo and tutorials.
-You can get the docker images via the following steps.
+We provide docker utility scripts to help developers set up a development environment.
+They are also helpful for running through the TVM demos and tutorials.
 We need `docker <https://docs.docker.com/engine/installation/>`_ and
 `nvidia-docker <https://github.com/NVIDIA/nvidia-docker/>`_ if we want to use cuda.
 
-First, clone TVM repo to get the auxiliary scripts
+Get a TVM source distribution or clone the GitHub repo to get the auxiliary scripts:
 
 .. code:: bash
 
     git clone --recursive https://github.com/apache/incubator-tvm tvm
 
 
-We can then use the following command to launch a `tvmai/demo-cpu` image.
+We can then use the following command to launch a docker image.
 
 .. code:: bash
 
-    /path/to/tvm/docker/bash.sh tvmai/demo-cpu
-
-You can also change `demo-cpu` to `demo-gpu` to get a CUDA enabled image.
-You can find all the prebuilt images in `<https://hub.docker.com/r/tvmai/>`_
+    /path/to/tvm/docker/bash.sh <image-name>
 
+Here `<image-name>` can be a local docker image name, e.g. `tvm.ci_cpu` after you have done
+the local build, or a pre-built third-party image (e.g. `tvmai/demo-cpu` or `tvmai/ci-gpu`).
 
 This auxiliary script does the following things:
 
@@ -67,7 +65,10 @@ Note that on macOS, because we use bridge network, jupyter notebook will be repo
 at an URL like ``http://{container_hostname}:8888/?token=...``. You should replace the ``container_hostname``
 with ``localhost`` when pasting it into browser.
 
+You can find some unofficial prebuilt images at `<https://hub.docker.com/r/tvmai/>`_.
+Note that these are convenience images and are not part of the ASF release.
+
 Docker Source
 -------------
-Check out `<https://github.com/apache/incubator-tvm/tree/master/docker>`_ if you are interested in
+Check out `the docker source <https://github.com/apache/incubator-tvm/tree/master/docker>`_ if you are interested in
 building your own docker images.
diff --git a/docs/install/from_source.rst b/docs/install/from_source.rst
@@ -25,7 +25,12 @@ scratch on various systems. It consists of two steps:
 1. First build the shared library from the C++ codes (`libtvm.so` for linux, `libtvm.dylib` for macOS and `libtvm.dll` for windows).
 2. Setup for the language packages (e.g. Python Package).
 
-To get started, clone TVM repo from github. It is important to clone the submodules along, with ``--recursive`` option.
+To get started, download the TVM source code from the `Download Page <https://tvm.apache.org/download>`_.
+
+Developers: Get Source from Github
+----------------------------------
+You can also choose to clone the source repo from GitHub.
+It is important to clone the submodules as well, using the ``--recursive`` option.
 
 .. code:: bash
 

diff --git a/docs/vta/index.rst b/docs/vta/index.rst
@@ -22,7 +22,7 @@ VTA: Deep Learning Accelerator Stack
 
 The Versatile Tensor Accelerator (VTA) is an open, generic, and customizable deep learning accelerator with a complete TVM-based compiler stack. We designed VTA to expose the most salient and common characteristics of mainstream deep learning accelerators. Together TVM and VTA form an end-to-end hardware-software deep learning system stack that includes hardware design, drivers, a JIT runtime, and an optimizing compiler stack based on TVM.
 
-.. image:: http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png
+.. image:: http://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_overview.png
    :align: center
    :width: 60%
 

diff --git a/tests/scripts/task_python_docs.sh b/tests/scripts/task_python_docs.sh
@@ -35,7 +35,7 @@ find . -type f -path "*.log" | xargs rm -f
 # C++ doc
 make doc
 rm -f docs/doxygen/html/*.map docs/doxygen/html/*.md5
-mv docs/doxygen docs/_build/html/doxygen
+mv docs/doxygen/html docs/_build/html/doxygen
 
 # JS doc
 jsdoc -c web/.jsdoc_conf.json web/tvm_runtime.js web/README.md
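The `mv` change above decides where the Doxygen `index.html` ends up. The sketch below reproduces both variants in a temporary directory with Python's `shutil.move`. It assumes, as in the script, that `docs/_build/html` exists and `docs/_build/html/doxygen` does not; the helper name `built_layout` is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def built_layout(move_html_subdir):
    """Simulate the doc tree, do the move, and return where index.html lands
    relative to docs/_build/html."""
    with tempfile.TemporaryDirectory() as tmp:
        docs = Path(tmp) / "docs"
        html_dir = docs / "doxygen" / "html"
        html_dir.mkdir(parents=True)
        (html_dir / "index.html").write_text("<html></html>")

        target_root = docs / "_build" / "html"
        target_root.mkdir(parents=True)

        # Old script: move docs/doxygen. New script: move docs/doxygen/html.
        source = html_dir if move_html_subdir else docs / "doxygen"
        shutil.move(str(source), str(target_root / "doxygen"))

        index = next(target_root.rglob("index.html"))
        return index.relative_to(target_root).as_posix()


old_layout = built_layout(move_html_subdir=False)
new_layout = built_layout(move_html_subdir=True)
print("before:", old_layout)
print("after: ", new_layout)
```

With the old command the pages sit one level deeper (`doxygen/html/index.html`), so links that expect `doxygen/index.html` would not resolve.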

diff --git a/tutorials/dev/relay_pass_infra.py b/tutorials/dev/relay_pass_infra.py
@@ -27,24 +27,15 @@
 introduced an infrastructure to manage the optimization passes.
 
 The optimizations of a Relay program could be applied at various granularity,
-namely function-level and module-level using `FunctionPass`_ and `ModulePass`_
-respectively. Or users can rely on `Sequential`_ to apply a sequence of passes
+namely function-level and module-level using :py:class:`tvm.relay.transform.FunctionPass`
+and :py:class:`tvm.relay.transform.ModulePass`
+respectively. Or users can rely on :py:class:`tvm.relay.transform.Sequential` to apply a sequence of passes
 on a Relay program where the dependencies between passes can be resolved by the
 pass infra. For more details about each type of these passes, please refer to
-the `pass infra doc`_.
+the :ref:`relay-pass-infra`.
 
 This tutorial demostrates how developers can use the Relay pass infra to perform
 a certain optimization and create an optimization pipeline.
-
-.. _FunctionPass: https://docs.tvm.ai/api/python/relay/transform.html#tvm.relay.transform.FunctionPass
-
-.. _ModulePass: https://docs.tvm.ai/api/python/relay/transform.html#tvm.relay.transform.ModulePass
-
-.. _Sequential: https://docs.tvm.ai/api/python/relay/transform.html#tvm.relay.transform.Sequential
-
-.. _pass infra doc: https://docs.tvm.ai/dev/relay_pass_infra.html
-
-.. _ToANormalForm: https://docs.tvm.ai/api/python/relay/transform.html#tvm.relay.transform.ToANormalForm
 """
 
 import numpy as np
@@ -130,27 +121,27 @@ def alter_conv2d(attrs, inputs, tinfos, out_type):
 print(mod)
 
 ###############################################################################
-# Use `Sequential`_ to Apply a Sequence of Passes
-# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# Use Sequential to Apply a Sequence of Passes
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 # Applying passes as above is actually tedious and it may require users to have
 # better understanding about the dependencies between them. For example, fusion
 # currently doesn't work well on let bindings. Therefore, we would not be able
-# to fuse operators that were fusable if `ToANormalForm`_ is applied before
+# to fuse operators that were fusable if :py:func:`tvm.relay.transform.ToANormalForm` is applied before
 # fusion, as this pass generates let bindings for each expression to
 # canonicalize a Relay program.
 #
-# Relay, hence, provides `Sequential`_ to alleviate developers from handling
+# Relay, hence, provides :py:class:`tvm.relay.transform.Sequential` to relieve developers of handling
 # these issues explicitly by specifying the required passes of each pass and
 # packing them as a whole to execute. For example, the same passes can now be
-# applied using the sequential style as the following. `Sequential`_ is
+# applied using the sequential style as the following. :py:class:`tvm.relay.transform.Sequential` is
 # similiar to `torch.nn.sequential <https://pytorch.org/docs/stable/nn.html#torch.nn.Sequential>`_
 # and `mxnet.gluon.block <https://mxnet.incubator.apache.org/api/python/docs/_modules/mxnet/gluon/block.html>`_.
 # For example, `torch.nn.sequential` is used to contain a sequence of PyTorch
 # `Modules` that will be added to build a network. It focuses on the network
-# layers. Instead, the `Sequential`_ in our pass infra works on the optimizing
+# layers. Instead, the :py:class:`tvm.relay.transform.Sequential` in our pass infra works on the optimizing
 # pass.
 
-# Now let's execute some passes through `Sequential`_
+# Now let's execute some passes through :py:class:`tvm.relay.transform.Sequential`
 f = example()
 mod = tvm.IRModule.from_expr(f)
 # Glob the interested passes.
@@ -165,7 +156,8 @@ def alter_conv2d(attrs, inputs, tinfos, out_type):
 # identical addition operations. This is because `EliminateCommonSubexpr`
 # was not actually performed. The reason is because only the passes that have
 # optimization level less or equal to 2 will be executed by default under
-# `Sequential`_. The pass infra, however, provides a configuration interface
+# :py:class:`tvm.relay.transform.Sequential`. The pass infra,
+# however, provides a configuration interface
 # for users to customize the optimization level that they want to execute.
 
 with relay.build_config(opt_level=3):
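The optimization-level rule described in this hunk can be modeled without TVM. The sketch below is a toy, not the pass infra's implementation: `ToyPass`, `run_sequential` and the "program" as a list of strings are hypothetical, and the pass levels are illustrative values chosen to match the behavior described above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyPass:
    name: str
    opt_level: int
    transform: Callable[[List[str]], List[str]]


def run_sequential(passes, program, opt_level=2):
    """Apply passes in order, skipping any whose opt_level exceeds the configured one."""
    executed = []
    for p in passes:
        if p.opt_level > opt_level:
            continue  # pass is disabled at this optimization level
        program = p.transform(program)
        executed.append(p.name)
    return program, executed


def drop_duplicates(ops):
    """Stand-in for common subexpression elimination: keep the first copy of each op."""
    result = []
    for op in ops:
        if op not in result:
            result.append(op)
    return result


passes = [
    ToyPass("ToyIdentity", 1, lambda ops: list(ops)),
    ToyPass("ToyEliminateCommonSubexpr", 3, drop_duplicates),
]
program = ["add(x, c)", "add(x, c)", "mul(y, 2)"]

default_program, default_executed = run_sequential(passes, program)
tuned_program, tuned_executed = run_sequential(passes, program, opt_level=3)
print(default_executed, default_program)
print(tuned_executed, tuned_program)
```

At the default level the duplicate `add` survives because the level-3 pass is skipped; raising the level to 3 lets it run, which mirrors the role of `relay.build_config(opt_level=3)` in the tutorial.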

diff --git a/tutorials/language/tensorize.py b/tutorials/language/tensorize.py
@@ -304,7 +304,7 @@ def _reduce_update():
 # For example, INT8 quantization on Intel CPUs uses tensorization
 # to invoke AVX instruction directly.
 # It also enables TVM to compile to ASICs -
-# checkout `VTA <https://docs.tvm.ai/vta/index.html>`_ for details.
+# check out :ref:`vta-index` for details.
 # We also demonstrates how to use inline assembly importing,
 # which helps users inject asm easily into the schedule.
 #
diff --git a/tutorials/tensor_expr_get_started.py b/tutorials/tensor_expr_get_started.py
@@ -15,6 +15,8 @@
 # specific language governing permissions and limitations
 # under the License.
 """
+.. _tutorial-tensor-expr-get-started:
+
 Get Started with Tensor Expression
 ==================================
 **Author**: `Tianqi Chen <https://tqchen.github.io>`_

diff --git a/vta/tutorials/autotvm/tune_relay_vta.py b/vta/tutorials/autotvm/tune_relay_vta.py
@@ -141,7 +141,7 @@ def compile_network(env, target, model, start_pack, stop_pack):
 # Now we can register our devices to the tracker. The first step is to
 # build the TVM runtime for the Pynq devices.
 #
-# Follow `this section <https://docs.tvm.ai/vta/install.html#pynq-side-rpc-server-build-deployment>`_
+# Follow :ref:`vta-index`
 # to build the TVM runtime on the device. Then register the device to the tracker with:
 #
 # .. code-block:: bash

diff --git a/vta/tutorials/frontend/deploy_classification.py b/vta/tutorials/frontend/deploy_classification.py
@@ -245,7 +245,7 @@
 m.set_input('data', image)
 
 # Perform inference and gather execution statistics
-# More on: https://docs.tvm.ai/api/python/module.html#tvm.runtime.Module.time_evaluator
+# More on: :py:meth:`tvm.runtime.Module.time_evaluator`
 num = 4 # number of times we run module for a single measurement
 rep = 3 # number of measurements (we derive std dev from this)
 timer = m.module.time_evaluator("run", ctx, number=num, repeat=rep)

diff --git a/vta/tutorials/frontend/deploy_detection.py b/vta/tutorials/frontend/deploy_detection.py
@@ -270,7 +270,7 @@
 m.set_input(**params)
 
 # Perform inference and gather execution statistics
-# More on: https://docs.tvm.ai/api/python/module.html#tvm.runtime.Module.time_evaluator
+# More on: :py:meth:`tvm.runtime.Module.time_evaluator`
 num = 4 # number of times we run module for a single measurement
 rep = 3 # number of measurements (we derive std dev from this)
 timer = m.module.time_evaluator("run", ctx, number=num, repeat=rep)
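The comments on `num` and `rep` in both tutorials describe a two-level timing loop. The following pure-Python sketch models that scheme under stated assumptions: `toy_time_evaluator` is a hypothetical stand-in, not TVM's `time_evaluator`, and it ignores details such as warm-up runs and device synchronization.

```python
import statistics
import time


def toy_time_evaluator(func, number, repeat):
    """Return `repeat` measurements; each is the mean seconds per call
    over `number` back-to-back calls of `func`."""
    results = []
    for _ in range(repeat):
        start = time.perf_counter()
        for _ in range(number):
            func()
        elapsed = time.perf_counter() - start
        results.append(elapsed / number)
    return results


calls = []  # records every invocation so the call count can be checked
num = 4     # calls per measurement
rep = 3     # number of measurements

results = toy_time_evaluator(lambda: calls.append(1), number=num, repeat=rep)
mean = statistics.mean(results)
std = statistics.pstdev(results)
print(f"{len(calls)} calls, mean {mean:.3e} s, std {std:.3e} s")
```

Averaging over `number` calls smooths out timer resolution, while `repeat` gives several independent samples from which a standard deviation can be derived.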