
[MINOR][DOCS] Document 'without' value for HADOOP_VERSION in pip installation #30436

Closed · wants to merge 2 commits
11 changes: 8 additions & 3 deletions python/docs/source/getting_started/install.rst
@@ -48,7 +48,7 @@ If you want to install extra dependencies for a specific component, you can ins

pip install pyspark[sql]

For PySpark with a different Hadoop version, you can install it by using ``HADOOP_VERSION`` environment variables as below:
For PySpark with/without a specific Hadoop version, you can install it by using the ``HADOOP_VERSION`` environment variable as below:

.. code-block:: bash

@@ -68,8 +68,13 @@ It is recommended to use ``-v`` option in ``pip`` to track the installation and

HADOOP_VERSION=2.7 pip install pyspark -v

Supported versions of Hadoop are ``HADOOP_VERSION=2.7`` and ``HADOOP_VERSION=3.2`` (default).
Note that this installation of PySpark with a different version of Hadoop is experimental. It can change or be removed between minor releases.
Supported values in ``HADOOP_VERSION`` are:

- ``without``: Spark pre-built with user-provided Apache Hadoop
- ``2.7``: Spark pre-built for Apache Hadoop 2.7
- ``3.2``: Spark pre-built for Apache Hadoop 3.2 and later (default)
Comment on lines +73 to +75 (Member): Awesome!


Note that this way of installing PySpark with/without a specific Hadoop version is experimental. It can change or be removed between minor releases.
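For example, a minimal sketch combining the new ``without`` value with the ``pip`` command shown above (the same mechanism, just a different ``HADOOP_VERSION`` value) looks like this:

.. code-block:: bash

    # Sketch based on the values listed above: install PySpark without a
    # bundled Hadoop, relying on a Hadoop distribution the user provides.
    # The -v flag surfaces download/build progress, as recommended above.
    HADOOP_VERSION=without pip install pyspark -v

Since no Hadoop jars are bundled in this case, the user-provided Hadoop classpath has to be wired in separately; Spark's "Hadoop free" build documentation describes ``SPARK_DIST_CLASSPATH`` for this purpose.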


Using Conda