How to connect from outside Spark cluster? #220

Closed
iAlex97 opened this issue Jul 22, 2023 · 3 comments

iAlex97 commented Jul 22, 2023

Hello, I just got started with Nebula and used the latest Nebula Operator Helm chart to deploy a cluster on a 3-node Kubernetes cluster. The overall process was seamless, and I could connect from my local machine to the exposed graphd NodePort using NebulaCli.

However, to be able to import and process data I would like to use the Spark Connector and interact with the Nebula cluster from an external Spark cluster. So I built the jars for Spark 3.x and set out to use it, only to realise that it needs access to the metad thrift port (9559).
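For reference, my read path looks roughly like the sketch below (a minimal sketch based on the connector's README; the metad address, space, tag and property names are placeholders, and I'm assuming the builder defaults are sufficient). The connection config takes the metad address rather than the graphd one, which is why port 9559 has to be reachable from the Spark executors:

```scala
import org.apache.spark.sql.SparkSession
import com.vesoft.nebula.connector.connector.NebulaDataFrameReader
import com.vesoft.nebula.connector.{NebulaConnectionConfig, ReadNebulaConfig}

val spark = SparkSession.builder().appName("nebula-read-sketch").getOrCreate()

// Connection config points at metad (thrift, 9559), not graphd (9669).
// "192.168.1.10:9559" is a placeholder and must be reachable from every executor.
val connectionConfig = NebulaConnectionConfig
  .builder()
  .withMetaAddress("192.168.1.10:9559")
  .build()

// Read config for scanning one tag; note there is no user/password option on the read path.
val readVertexConfig = ReadNebulaConfig
  .builder()
  .withSpace("my_space")          // placeholder space name
  .withLabel("person")            // placeholder tag name
  .withNoColumn(false)
  .withReturnCols(List("name"))   // placeholder property
  .withPartitionNum(10)
  .build()

val vertexDf = spark.read.nebula(connectionConfig, readVertexConfig).loadVerticesToDF()
vertexDf.show()
```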

General Question
I checked the chart values and there isn't a way to expose the metadata service as a NodePort. I also read the docs on how to Connect to NebulaGraph databases from outside a NebulaGraph cluster via Ingress, and I think I could forward an external port to the internal thrift port, but this seems like a "hack" rather than a solution. Furthermore, the Spark Connector documentation and code don't include any way of specifying credentials for READ operations, which further makes me believe the metad port shouldn't be publicly exposed.

How can I connect to the metadata service from outside the Nebula cluster network in a secure way?

Is Nebula designed to be accessed only from clients in the same private network?


wey-gu commented Jul 23, 2023

Dear @iAlex97

Thanks!

In 99% of cases, graphd is the only exposed endpoint of the system, so the k8s operator was designed with traffic to metad and storaged treated as internal-only (and no auth is performed on it, either).

Metad/storaged only need to be accessed from outside the NebulaGraph cluster for analytical tasks (scanning all edges and tags from all storaged instances, bypassing graphd).

The tricky thing here is that the client gets the storaged host list from metad during service discovery, so simply exposing the ports isn't enough; we need a further hack so that the client can reach storaged at exactly the same domain names and ports that are advertised inside the cluster.

I previously created such a "hack", too.

For now, we encourage running the Spark reading job inside the NebulaGraph cluster's namespace so it can reach the internal endpoints directly, or deploying NebulaGraph on bare-metal infrastructure and assigning metad/storaged hostnames/IPs that are routable from outside the cluster.
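To make the first option concrete, a Spark job running in the same namespace would just point the connector at the in-cluster metad service, something like the sketch below (the service name `nebula-metad-headless.nebula.svc.cluster.local` is an assumption; the actual name depends on your NebulaCluster resource name and namespace, so please check it with `kubectl get svc`):

```scala
import com.vesoft.nebula.connector.NebulaConnectionConfig

// Assumed in-cluster DNS name of the metad headless service created by the operator;
// verify the real name with `kubectl get svc -n <namespace>` before using it.
val inClusterConfig = NebulaConnectionConfig
  .builder()
  .withMetaAddress("nebula-metad-headless.nebula.svc.cluster.local:9559")
  .build()

// Because service discovery then returns storaged addresses that resolve inside the
// cluster network, no extra port-forwarding or hostname rewriting is needed.
```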


iAlex97 commented Jul 23, 2023

Dear @wey-gu,

Thank you so much for taking the time to write such a detailed answer and shedding some light on this matter.

I think this issue can be closed now.

iAlex97 closed this as completed Jul 23, 2023

wey-gu commented Jul 24, 2023

Dear @iAlex97,

Thanks again for the issue and welcome to the NebulaGraph community!
