Create nebula-plato.md #1055

Merged: 5 commits, Jan 19, 2022
8 changes: 8 additions & 0 deletions docs-2.0/20.appendix/6.eco-tool-version.md
@@ -112,6 +112,14 @@ Nebula Algorithm (Algorithm for short) is a Spark application based on [GraphX](
|:---|:---|
| {{ nebula.release }} | {{algorithm.release}}(2c61ca5) |

## Nebula Plato

Nebula Plato is an application that integrates the open-source Plato Graph Computing Framework and uses it to perform graph computations on data in the Nebula Graph database. For details, see [What is Nebula Plato](../nebula-plato.md).

|Nebula Graph version|Plato version(commit id)|
|:---|:---|
| {{ nebula.release }} | {{plato.release}}(d895634) |

## Nebula Console

Nebula Console is the native CLI client of Nebula Graph. For how to use it, see [Connect Nebula Graph](../2.quick-start/3.connect-to-nebula-graph.md).
231 changes: 231 additions & 0 deletions docs-2.0/nebula-plato.md
@@ -0,0 +1,231 @@
# Nebula Plato

[Nebula Plato](https://github.com/vesoft-inc/plato) is an application that integrates the open-source Plato Graph Computing Framework and uses it to perform graph computations on data in the Nebula Graph database.

!!! enterpriseonly

Only available for the Nebula Graph Enterprise Edition.

## Scenarios

You can import data into Nebula Plato from data sources such as Nebula Graph clusters, CSV files on HDFS, or local CSV files, and export the graph computation results from Nebula Plato to the same types of targets.


## Limitations

When you import Nebula Graph cluster data into Nebula Plato and export the graph computation results back to a Nebula Graph cluster, the results can only be exported to the graph space where the source data is located.

## Version compatibility

The version correspondence between Nebula Plato and Nebula Graph is as follows.

|Nebula Plato|Nebula Graph|
|:---|:---|
|{{plato.release}}|{{nebula.release}}|

## Graph algorithms

Nebula Plato supports the following graph algorithms.

| Algorithm |Description |Category |
|:----------------------|:----------------|:-----------|
| APSP | All Pair Shortest Path | Path |
| SSSP | Single Source Shortest Path | Path |
| BFS | Breadth-first search | Path |
| PageRank | Ranks vertices by their link structure; originally used to rank web pages. | Node importance measurement |
| KCore | k-Cores | Node importance measurement |
| DegreeCentrality | It is a simple count of the total number of connections linked to a vertex. | Node importance measurement |
| TriangleCount | It counts the number of triangles. | Graph feature |
| LPA | Label Propagation Algorithm | Community discovery |
| WCC | Weakly Connected Components | Community discovery |
| LOUVAIN | It detects communities in large networks. | Community discovery |
| HANP | Hop attenuation & Node Preference | Community discovery |
| Clustering Coefficient| It is a measure of the degree to which nodes in a graph tend to cluster together. | Clustering |
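
For reference, the PageRank run scripts later in this document expose `DAMPING` and `EPS` parameters. In the standard formulation (sketched here, not Plato-specific), the rank of a vertex $v$ is iterated as:

```latex
PR_{t+1}(v) = \frac{1 - d}{N} + d \sum_{u \in In(v)} \frac{PR_t(u)}{\deg_{out}(u)}
```

where $d$ is the damping factor (`DAMPING`), $N$ is the number of vertices, and $In(v)$ is the set of in-neighbors of $v$. Iteration stops after `ITERATIONS` rounds or when the total change per round falls below the convergence threshold (`EPS`).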

## Install Nebula Plato

When deploying Nebula Plato as a cluster on multiple nodes, you need to install Nebula Plato to the same path on each node and set up passwordless SSH login between the nodes.
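
The passwordless SSH setup can be sketched as follows. The key path is the common default and the node IPs are placeholders; adjust both to your environment. The `ssh-copy-id` commands are shown commented out because they contact the other nodes interactively.

```bash
# Generate a key pair on the node that launches jobs, if one does not exist yet.
mkdir -p "$HOME/.ssh"
KEY="$HOME/.ssh/id_rsa"
[ -f "$KEY" ] || ssh-keygen -t rsa -N "" -f "$KEY" -q
# Then copy the public key to every other Nebula Plato node (hypothetical IPs):
# ssh-copy-id -i "$KEY.pub" 192.168.8.201
# ssh-copy-id -i "$KEY.pub" 192.168.8.202
```

After the public key is copied, `ssh 192.168.8.201` from the launching node should no longer prompt for a password.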

### Install Nebula Plato with RPM packages

```bash
sudo rpm -i nebula-plato-1.0-centos.x86_64.rpm --prefix /home/xxx/nebula-plato
```

### Install Nebula Plato with the source code

The preparations for compiling Nebula Plato are similar to compiling Nebula Graph. For details, see [Resource preparations](4.deployment-and-installation/1.resource-preparations.md).

1. Clone the `plato` repository.

```bash
$ git clone -b {{plato.branch}} https://github.com/vesoft-inc/plato.git
```

2. Access the `plato` directory.

```bash
$ cd plato
```

3. Execute the following script to install compile dependencies.

```bash
$ sudo ./docker/install-dependencies.sh
```

4. Download a static library and compile it.

```bash
$ ./3rdtools.sh distclean && ./3rdtools.sh install
```

5. Compile Nebula Plato.

```bash
$ ./build.sh
```

## How to use Nebula Plato

After installation, you can set parameters of different algorithms and then execute a script to obtain the results of the algorithms and export them to the specified format.
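
Note that the run scripts assign every setting with the shell `${VAR:=default}` idiom, so each setting can also be overridden from the environment when launching a script instead of editing the file. A minimal sketch of the idiom (the variable name mirrors the PageRank script):

```bash
# ${VAR:=default} keeps an existing environment value and only fills in the
# default when the variable is unset; this is how run_pagerank.sh reads its settings.
ITERATIONS=${ITERATIONS:=100}
echo "iterations=$ITERATIONS"
```

Saved as `demo.sh` (a hypothetical name), `./demo.sh` prints `iterations=100`, while `ITERATIONS=50 ./demo.sh` prints `iterations=50`.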

1. Select one node from the Nebula Plato cluster and then access the `scripts` directory.

```bash
$ cd scripts
```

2. Confirm the data source and export path. Configuration steps are as follows.

- Nebula Graph clusters as the data source

1. Modify the configuration file `nebula.conf` to configure the Nebula Graph cluster.

```bash
# The number of retries connecting to Nebula Graph.
--retry=3
# The name of the graph space where you read or write data.
--space=basketballplayer

# Read data from Nebula Graph.
# The metad process address.
--meta_server_addrs=192.168.8.100:9559,192.168.8.101:9559,192.168.8.102:9559
# The name of edges.
--edges=LIKES
# The name of the property to be read as the weight of the edge. It can be either a property name or _rank.
#--edge_data_fields
# The number of rows read per scan.
--read_batch_size=10000

# Write data to Nebula Graph.
# The graphd process address.
--graph_server_addrs=192.168.8.100:9669
# The account to log into Nebula Graph.
--user=root
# The password to log into Nebula Graph.
--password=nebula
# The pattern used to write data back to Nebula Graph: insert or update.
--mode=insert
# The tag name written back to Nebula Graph.
--tag=pagerank
# The property name corresponding to the tag.
--prop=pr
# The property type corresponding to the tag.
--type=double
# The number of rows per write.
--write_batch_size=1000
# The file path where the data failed to be written back to Nebula Graph is stored.
--err_file=/home/jie.wang/plato/err.txt
```

2. Modify the related parameters in the script to be used, such as `run_pagerank.sh`.

```bash
# The total number of processes across all machines in the cluster. The number of machines, or of NUMA nodes, is recommended.
WNUM=3
# The number of threads per process. At most the number of hardware threads of the machine is recommended.
WCORES=4
# The path to the data source.
# Set to read data from Nebula Graph via the nebula.conf file.
INPUT=${INPUT:="nebula:$PROJECT/scripts/nebula.conf"}
# Set to read data from the CSV files on HDFS or on local directories.
# INPUT=${INPUT:="$PROJECT/data/graph/v100_e2150_ua_c3.csv"}

# The export path to the graph computation results.
# Data can be exported to a Nebula Graph. If the data source is also a Nebula Graph, the results will be exported to the graph space specified in nebula.conf.
OUTPUT=${OUTPUT:="nebula:$PROJECT/scripts/nebula.conf"}
# Data can also be exported to the CSV files on HDFS or on local directories.
# OUTPUT=${OUTPUT:='hdfs://192.168.8.100:9000/_test/output'}

# If the value is true, the graph is directed; if false, it is undirected.
IS_DIRECTED=${IS_DIRECTED:=true}
# Set whether to encode ID or not.
NEED_ENCODE=${NEED_ENCODE:=true}
# The ID type of the data source vertices. For example string, int32, and int64.
VTYPE=${VTYPE:=int32}
# Encoding type. The value distributed specifies the distributed vertex ID encoding. The value single specifies the single-machine vertex ID encoding.
ENCODER=${ENCODER:="distributed"}
# The parameter for the PageRank algorithm. Algorithms differ in parameters.
EPS=${EPS:=0.0001}
DAMPING=${DAMPING:=0.85}
# The number of iterations.
ITERATIONS=${ITERATIONS:=100}
```

- Local or HDFS CSV files as the data source

Modify parameters in the script to be used, such as `run_pagerank.sh`.

```bash
# The total number of processes across all machines in the cluster. The number of machines, or of NUMA nodes, is recommended.
WNUM=3
# The number of threads per process. At most the number of hardware threads of the machine is recommended.
WCORES=4
# The path to the data source.
# Set to read data from Nebula Graph via the nebula.conf file.
# INPUT=${INPUT:="nebula:$PROJECT/scripts/nebula.conf"}
# Set to read data from the CSV files on HDFS or on local directories.
INPUT=${INPUT:="$PROJECT/data/graph/v100_e2150_ua_c3.csv"}

# The export path to the graph computation results.
# Data can be exported to a Nebula Graph. If the data source is also a Nebula Graph, the results will be exported to the graph space specified in nebula.conf.
# OUTPUT=${OUTPUT:="nebula:$PROJECT/scripts/nebula.conf"}
# Data can also be exported to the CSV files on HDFS or on local directories.
OUTPUT=${OUTPUT:='hdfs://192.168.8.100:9000/_test/output'}

# If the value is true, the graph is directed; if false, it is undirected.
IS_DIRECTED=${IS_DIRECTED:=true}
# Set whether to encode ID or not.
NEED_ENCODE=${NEED_ENCODE:=true}
# The ID type of the data source vertices. For example string, int32, and int64.
VTYPE=${VTYPE:=int32}
# The value distributed specifies the distributed vertex ID encoding. The value single specifies the single-machine vertex ID encoding.
ENCODER=${ENCODER:="distributed"}
# The parameter for the PageRank algorithm. Algorithms differ in parameters.
EPS=${EPS:=0.0001}
DAMPING=${DAMPING:=0.85}
# The number of iterations.
ITERATIONS=${ITERATIONS:=100}
```

3. Modify the configuration file `cluster` to set the Nebula Plato cluster nodes and task assignment weights for executing the algorithm.

```bash
# Format: <Nebula Plato cluster node IP address>:<task assignment weight>
192.168.8.200:1
192.168.8.201:1
192.168.8.202:1
```

4. Run the algorithm script. For example:

```bash
./run_pagerank.sh
```

5. View the graph computation results in the export path.

- If the results are exported to a Nebula Graph cluster, check them according to the settings in `nebula.conf`.

- If the results are exported to CSV files on HDFS or a local directory, check them under the path set in `OUTPUT`. The results are compressed files in the `.gz` format.
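
As a sketch of that last check, HDFS output can be pulled down and read with standard tools. Everything below is illustrative: the HDFS path matches the `OUTPUT` example above, and the sample file stands in for a real result file.

```bash
# Pull the result files from HDFS (requires a configured hadoop client):
# hadoop fs -get hdfs://192.168.8.100:9000/_test/output/* /tmp/plato-output/
# Each result file is gzip-compressed; create a stand-in file to illustrate:
mkdir -p /tmp/plato-output
printf '100,0.015\n101,0.029\n' | gzip > /tmp/plato-output/demo.csv.gz
# Inspect the first rows without unpacking everything:
gunzip -c /tmp/plato-output/demo.csv.gz | head -n 5
```
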
5 changes: 5 additions & 0 deletions mkdocs.yml
Expand Up @@ -71,6 +71,9 @@ extra:
algorithm:
release: 2.6.2
branch: v2.6
plato:
release: 1.0.0
branch: v1.0.0
sparkconnector:
release: 2.6.1
branch: v2.6
@@ -492,6 +495,8 @@ nav:

- Nebula Algorithm: nebula-algorithm.md

- Nebula Plato: nebula-plato.md

- Nebula Spark Connector: nebula-spark-connector.md

- Nebula Flink Connector: nebula-flink-connector.md