How to connect from outside Spark cluster? #220

Closed
iAlex97 opened this issue Jul 22, 2023 · 3 comments

iAlex97 commented Jul 22, 2023

Hello, I just got started with Nebula and used the latest Nebula Operator Helm chart to deploy a cluster on a 3-node Kubernetes cluster. The overall process was seamless, and I could connect from my local machine to the exposed graphd NodePort using NebulaCli.

However, to be able to import and process data I would like to use the Spark Connector and interact with the Nebula cluster from an external Spark cluster. So I built the jars for Spark 3.x and set out to use it, only to realise that it needs access to the metad thrift port (9559).
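For reference, my read path looks roughly like the sketch below (a minimal sketch based on the connector's README; the metad address, space, tag and property names are placeholders, and I'm assuming the builder defaults are sufficient). The connection config takes the metad address rather than the graphd one, which is why port 9559 has to be reachable from the Spark executors:

```scala
import org.apache.spark.sql.SparkSession
import com.vesoft.nebula.connector.connector.NebulaDataFrameReader
import com.vesoft.nebula.connector.{NebulaConnectionConfig, ReadNebulaConfig}

val spark = SparkSession.builder().appName("nebula-read-sketch").getOrCreate()

// Connection config points at metad (thrift, 9559), not graphd (9669).
// "192.168.1.10:9559" is a placeholder and must be reachable from every executor.
val connectionConfig = NebulaConnectionConfig
  .builder()
  .withMetaAddress("192.168.1.10:9559")
  .build()

// Read config for scanning one tag; note there is no user/password option on the read path.
val readVertexConfig = ReadNebulaConfig
  .builder()
  .withSpace("my_space")          // placeholder space name
  .withLabel("person")            // placeholder tag name
  .withNoColumn(false)
  .withReturnCols(List("name"))   // placeholder property
  .withPartitionNum(10)
  .build()

val vertexDf = spark.read.nebula(connectionConfig, readVertexConfig).loadVerticesToDF()
vertexDf.show()
```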

General Question
I checked the chart values and there isn't a way to expose the metadata service as a NodePort. I also read the docs on how to Connect to NebulaGraph databases from outside a NebulaGraph cluster via Ingress, and I think I could forward an external port to the internal thrift port, but this seems like a "hack" rather than a solution. Furthermore, the Spark Connector documentation and code don't include any way of specifying credentials for READ operations, which further makes me believe the metad port shouldn't be publicly exposed.

How can I connect to the metadata service from outside the Nebula cluster network in a secure way?

Is Nebula designed to be accessed only from clients in the same private network?


wey-gu commented Jul 23, 2023

Dear @iAlex97

Thanks!

In 99% of cases, graphd is the only exposed endpoint of the system, so the k8s operator was designed with traffic to metad and storaged treated as internal-only (and no auth is performed on it, either).

Metad/storaged only need to be accessed from outside the NebulaGraph cluster for analytical tasks (scanning all edges and tags from all storaged instances, bypassing graphd).

The tricky thing here is that the client gets the storaged host list from metad during service discovery, so simply exposing the ports isn't enough; we need a further hack so that the client can reach storaged at exactly the same domain names and ports that are advertised inside the cluster.

I previously created such a "hack", too.

For now, we encourage running the Spark reading job inside the NebulaGraph cluster's namespace so it can reach the internal endpoints directly, or deploying NebulaGraph on bare-metal infrastructure and assigning metad/storaged hostnames/IPs that are routable from outside the cluster.
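To make the first option concrete, a Spark job running in the same namespace would just point the connector at the in-cluster metad service, something like the sketch below (the service name `nebula-metad-headless.nebula.svc.cluster.local` is an assumption; the actual name depends on your NebulaCluster resource name and namespace, so please check it with `kubectl get svc`):

```scala
import com.vesoft.nebula.connector.NebulaConnectionConfig

// Assumed in-cluster DNS name of the metad headless service created by the operator;
// verify the real name with `kubectl get svc -n <namespace>` before using it.
val inClusterConfig = NebulaConnectionConfig
  .builder()
  .withMetaAddress("nebula-metad-headless.nebula.svc.cluster.local:9559")
  .build()

// Because service discovery then returns storaged addresses that resolve inside the
// cluster network, no extra port-forwarding or hostname rewriting is needed.
```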


iAlex97 commented Jul 23, 2023

Dear @wey-gu,

Thank you so much for taking the time to write such a detailed answer and shedding some light on this matter.

I think this issue can be closed now.

iAlex97 closed this as completed Jul 23, 2023

wey-gu commented Jul 24, 2023

Dear @iAlex97,

Thanks again for the issue and welcome to the NebulaGraph community!
