Help with Apache Ozone #4890
Replies: 3 comments · 9 replies
-
Hi, I do not see any explicit errors in the cited logs, apart from the fact that the client does not have a token: Please take a look at the Spark documentation about the configuration option:
-
Hi @fapifta,
-
I have never tried to run such a setup myself, but as I understand the Spark docs, this option tells Spark where to get delegation tokens from. Based on the message I cited from SaslRpcClient, either the service does not support tokens, which is not true in the case of Ozone, or the client does not have one. If the client does not have one, then I believe it should get one, and based on the documentation this is the way to make the client request a delegation token from the Ozone service.
The Ozone configuration needs to be present in an HA setup, because it contains the hosts that provide the Ozone Manager HA service. It is used by our filesystem client, which should be on the Spark job's classpath in order to be able to connect to Ozone via Spark's internal filesystem access mechanisms.
I do not have much exposure to Spark either. In this mode it might be another configuration option (I haven't found any other with a quick search), but I am pretty certain the issue is the lack of a delegation token on the workers' side.
-
Thanks for your deep dive into the issue! I'll get back to you shortly with a reply. I just want to say that it feels like we need to turn on some option in Spark to facilitate getting delegation tokens (as well as block access tokens). But it's deeply hidden from us.
-
Hi @fapifta, I've tried the option:
23/06/19 08:21:27 DEBUG HadoopFSDelegationTokenProvider: Delegation token renewer is: jovyan@REALM
23/06/19 08:21:31 DEBUG SaslRpcClient: Get token info proto:interface org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolPB info:@org.apache.hadoop.security.token.TokenInfo(value=org.apache.hadoop.ozone.security.OzoneDelegationTokenSelector.class)
23/06/19 08:21:32 DEBUG SaslRpcClient: Sending sasl message state: RESPONSE
23/06/19 08:21:32 DEBUG SaslRpcClient: Sending sasl message state: RESPONSE
23/06/19 08:21:32 DEBUG OMFailoverProxyProviderBase: Failing over OM from om1:0 to om1:0
23/06/19 08:21:32 DEBUG UserGroupInformation: Failed to get groups for user jovyan
What are your thoughts on that part? It's confusing that the client didn't receive the tokens but then tried to authenticate itself via Kerberos. Is it a new method of authentication or a continuation of the previous one? Thanks in advance!
-
The first important piece of the puzzle seems to be this message:
This means that Spark got the token with a renewer and stored it for further use. Note that the token URI is
The next interesting line is where we try to get the token for the service:
We get null for service 10.246.0.182, which seems legitimate, as we got a token for om-0.om.default.svc.realm. What I would check next is whether reverse and forward DNS resolution works properly for the service URL in all the pods that are in play here (Spark driver, Spark worker, Ozone Manager).
The group resolution issue does not break the functionality, I believe, but it will affect how Ozone assigns permissions, and it might have implications for data access, so you might want to have your user created within the same pods.
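The mismatch described above can be illustrated with a small sketch. It is a simplified model and not Ozone's or Hadoop's actual selector code: it assumes tokens are stored under a "host:port" service key and are only found when the lookup key matches exactly. The `Token` class, the helper names, the token kind string and the port on the IP address are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Token:
    kind: str     # token kind, e.g. "OzoneToken" (assumed for illustration)
    service: str  # "host:port" key the token was stored under


def select_token(tokens: Sequence[Token], kind: str, service: str) -> Optional[Token]:
    """Return the first token whose kind and service key both match exactly."""
    for token in tokens:
        if token.kind == kind and token.service == service:
            return token
    return None


def choose_auth(tokens: Sequence[Token], kind: str, service: str,
                has_kerberos_ticket: bool) -> str:
    """Prefer token auth; fall back to Kerberos only if a ticket is available."""
    if select_token(tokens, kind, service) is not None:
        return "TOKEN"
    if has_kerberos_ticket:
        return "KERBEROS"
    raise PermissionError(f"no token or Kerberos credentials for {service}")


# The token was issued for the host name, but the lookup uses the IP address.
credentials = [Token("OzoneToken", "om-0.om.default.svc.realm:9862")]

print(select_token(credentials, "OzoneToken", "10.246.0.182:9862"))
print(choose_auth(credentials, "OzoneToken", "10.246.0.182:9862", True))
print(choose_auth(credentials, "OzoneToken", "om-0.om.default.svc.realm:9862", False))
```

Under these assumptions, a process that holds a Kerberos ticket can still fall back to Kerberos when the lookup misses, while a process that only carries the token has nothing left to authenticate with. That would be consistent with the driver succeeding and the workers failing, but it is only one possible reading of the logs.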
-
@fapifta Thanks for the guiding thoughts! But could we just exclude DNS resolution altogether and use only IPs?
23/06/20 05:24:49 DEBUG HadoopFSDelegationTokenProvider: Delegation token renewer is: jovyan@REALM
I've been wondering whether "getting" means that the token has been obtained?
-
HadoopFSDelegationTokenProvider is a class in the Spark Scala code. What it does, I believe, is obtain the token from the underlying filesystem. In the Hadoop FileSystem API there is a method getDelegationToken that is defined by the DelegationTokenIssuer interface and implemented by the FileSystem implementations. That is the regular way to obtain tokens from Ozone or from HDFS, and the same DelegationTokenIssuer is also implemented for HBase and for YARN, for example, in their client APIs.
I am not sure if the IP addresses were working, was the behaviour changed after using the IP address of the service? As internally Ozone uses URLs for this, it should be OK to use IP addresses, as the Java URL/URI classes can deal with an IP address/port combination just as well as with a domain name/port combination. I am not sure, though, whether there is any functionality where we obtain a domain name via a reverse DNS lookup in this case; that might still cause trouble...
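One way to check whether forward and reverse DNS resolution agree inside a given pod is a small round-trip test like the sketch below. It uses only the Python standard library and is independent of Ozone and Spark. The helper names are hypothetical, and the resolvers are injectable so the logic can be tried with made-up records; the fake address and PTR name in the example are illustrative, not taken from the cluster.

```python
import socket
from typing import Callable, Dict, List


def forward_lookup(host: str) -> List[str]:
    """Resolve a host name to its IPv4 addresses."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def reverse_lookup(ip: str) -> str:
    """Resolve an IP address back to its primary host name."""
    return socket.gethostbyaddr(ip)[0]


def check_round_trip(host: str,
                     forward: Callable[[str], List[str]] = forward_lookup,
                     reverse: Callable[[str], str] = reverse_lookup) -> Dict[str, str]:
    """Map each address of `host` to the name it resolves back to."""
    result = {}
    for ip in forward(host):
        try:
            result[ip] = reverse(ip)
        except (OSError, KeyError) as exc:  # socket.herror/gaierror are OSError subclasses
            result[ip] = f"<reverse lookup failed: {exc}>"
    return result


def is_consistent(host: str, **resolvers) -> bool:
    """True only if every address of `host` resolves back to exactly `host`."""
    round_trip = check_round_trip(host, **resolvers)
    return bool(round_trip) and all(name == host for name in round_trip.values())


# Illustrative fake records: the PTR name differs from the service name.
fake_dns = {"om-0.om.default.svc.realm": ["10.246.0.182"]}
fake_ptr = {"10.246.0.182": "10-246-0-182.om.default.svc.realm"}

print(is_consistent("om-0.om.default.svc.realm",
                    forward=fake_dns.__getitem__,
                    reverse=fake_ptr.__getitem__))
```

Calling `check_round_trip("om-0.om.default.svc.realm")` without the fake resolvers performs real lookups, so it would need to be run separately in the Spark driver, Spark worker and Ozone Manager pods to compare what each of them sees.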
-
As to this question: "I am not sure if the IP addresses were working, was the behaviour changed after using the IP address of the service?" Additionally, getting a token for the IP address of the service appeared in the logs:
23/06/20 05:24:49 DEBUG HadoopFSDelegationTokenProvider: Delegation token renewer is: jovyan@REALM
-
I am not sure, but you might have misunderstood the question... It is OK, and visible in the logs you have added, that the token is obtained based on the IP address. But did that help? Is the job able to run with this approach, or is it still failing?
-
Unfortunately, it's still failing. @fapifta, to wrap it up, basically we can say that Ozone Manager issues the token but it somehow disappears on the client side. Can we count on Ozone Manager being able to issue delegation tokens? Thanks in advance!
-
Dear @fapifta, let me show you some updates. I found a way to force the use of DNS names instead of IP addresses. However, I am still seeing the same message in the logs regarding the null token. Here is the news:
-
I would greatly appreciate it if you could assist me with my concern about Apache Ozone setup.
Let me introduce myself in a few words. I am a Senior MLOps engineer working for a worldwide enterprise with 50K+ employees. The project is an ML platform for data scientists (under NDA). We use a cutting-edge setup: Apache Ozone 1.3.0 + Spark 3.2.3 (cluster mode) + Kerberos + Kubernetes.
The primary concern is that Spark workers can't authenticate to Ozone Manager. Meanwhile, the Spark driver alone can do it.
After debugging, it became clear that SaslRpcServer could not send tokens to SaslRpcClient.
Spark driver logs (Sasl client):
23/06/12 04:35:18 DEBUG SaslRpcClient: Get token info proto:interface org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolPB info:@org.apache.hadoop.security.token.TokenInfo(value=org.apache.hadoop.ozone.security.OzoneDelegationTokenSelector.class)
23/06/12 04:35:18 DEBUG OzoneDelegationTokenSelector: Got tokens: null for service 10.246.0.181:9862
23/06/12 04:35:18 DEBUG SaslRpcClient: tokens aren't supported for this protocol or user doesn't have one
23/06/12 04:35:18 DEBUG SaslRpcClient: Get kerberos info proto:interface org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolPB info:@org.apache.hadoop.security.KerberosInfo(clientPrincipal="", serverPrincipal="ozone.om.kerberos.principal")
23/06/12 04:35:18 DEBUG SaslRpcClient: getting serverKey: ozone.om.kerberos.principal conf value: OM/_HOST@REALM principal: OM/om-0.om.default.svc.realm@REALM
23/06/12 04:35:18 DEBUG SaslRpcClient: RPC Server's Kerberos principal name for protocol=org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolPB is OM/om-0.om.default.svc.realm@REALM
23
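For longer driver or executor logs, the lines of interest can be pulled out mechanically. The following is a minimal sketch, assuming the log format shown above; the function name is hypothetical, and the regular expression only covers the `OzoneDelegationTokenSelector` message that appears in this excerpt.

```python
import re
from typing import List

# Matches the selector's debug message and captures the service key it looked up.
NULL_TOKEN = re.compile(
    r"OzoneDelegationTokenSelector: Got tokens: null for service (\S+)"
)


def services_without_token(log_text: str) -> List[str]:
    """Return each service key for which the selector reported no token, in first-seen order."""
    seen: List[str] = []
    for line in log_text.splitlines():
        match = NULL_TOKEN.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = (
    "23/06/12 04:35:18 DEBUG OzoneDelegationTokenSelector: "
    "Got tokens: null for service 10.246.0.181:9862\n"
    "23/06/12 04:35:18 DEBUG SaslRpcClient: tokens aren't supported "
    "for this protocol or user doesn't have one\n"
)

print(services_without_token(sample))
```

Running it over both driver and worker logs gives a quick list of the service keys that the client looked up without finding a token, which can then be compared with the service name the token was issued for.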
Ozone Manager logs (Sasl server):
om-0.om.default.svc.realm@REALM
23/06/12 04:35:13 DEBUG UserGroupInformation: PrivilegedAction [as: OM/om-0.om.default.svc.realm@REALM (auth:KERBEROS)][action: org.apache.hadoop.security.SaslRpcServer$1@89eef25]
java.lang.Exception
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.security.SaslRpcServer.create(SaslRpcServer.java:153)
at org.apache.hadoop.ipc.Server$Connection.createSaslServer(Server.java:2403)
at org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:2148)
at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:2042)
at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1984)
at org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:2786)
at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:2584)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:2333)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:1449)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:1304)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:1275)
23/06/12 04:35:13 DEBUG SaslRpcServer: Created SASL server with mechanism = GSSAPI
23/06/12 04:35:13 DEBUG Server: Have read input token of size 569 for processing by saslServer.evaluateResponse()
23/06/12 04:35:13 DEBUG Server: Will send CHALLENGE token of size 110 from saslServer.
23/06/12 04:35:13 DEBUG Server: Socket Reader #1 for port 9862: responding to Call#-33 Retry#-1 null from 10.48.24.92:47315
23/06/12 04:35:13 DEBUG Server: Socket Reader #1 for port 9862: responding to Call#-33 Retry#-1 null from 10.48.24.92:47315 Wrote 134 bytes.
23/06/12 04:35:13 DEBUG Server: got #-33
23/06/12 04:35:13 DEBUG Server: Have read input token of size 0 for processing by saslServer.evaluateResponse()
23/06/12 04:35:13 DEBUG Server: Will send CHALLENGE token of size 65 from saslServer.
23/06/12 04:35:13 DEBUG Server: Socket Reader #1 for port 9862: responding to Call#-33 Retry#-1 null from 10.48.24.92:47315
23/06/12 04:35:13 DEBUG Server: Socket Reader #1 for port 9862: responding to Call#-33 Retry#-1 null from 10.48.24.92:47315 Wrote 89 bytes.
23/06/12 04:35:13 DEBUG Server: got #-33
23/06/12 04:35:13 DEBUG Server: Have read input token of size 65 for processing by saslServer.evaluateResponse()
23/06/12 04:35:13 DEBUG SaslRpcServer: SASL server GSSAPI callback: setting canonicalized client ID: jovyan@REALM
23/06/12 04:35:13 DEBUG Server: Will send SUCCESS token of size null from saslServer.
23/06/12 04:35:13 DEBUG Server: SASL server context established. Negotiated QoP is auth
23/06/12 04:35:13 DEBUG Server: SASL server successfully authenticated client: jovyan@REALM (auth:KERBEROS)
23/06/12 04:35:13 INFO Server: Auth successful for jovyan@REALM (auth:KERBEROS) from 10.48.24.92:47315
23/06/12 04:35:13 DEBUG Server: Socket Reader #1 for port 9862: responding to Call#-33 Retry#-1 null from 10.48.24.92:47315
23/06/12 04:35:13 DEBUG Server: Socket Reader #1 for port 9862: responding to Call#-33 Retry#-1 null from 10.48.24.92:47315 Wrote 22 bytes.
23/06/12 04:35:13 DEBUG Server: got #-3
23/06/12 04:35:13 DEBUG Server: Successfully authorized userInfo {
effectiveUser: "jovyan@REALM"
}
protocol: "org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol"
23/06/12 04:35:13 DEBUG Server: got #0
23/06/12 04:35:13 DEBUG Server: IPC Server handler 26 on default port 9862: Call#0 Retry#0 org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from 10.48.24.92:47315 for RpcKind RPC_PROTOCOL_BUFFER
23/06/12 04:35:13 DEBUG UserGroupInformation: PrivilegedAction [as: jovyan@REALM (auth:KERBEROS)][action: Call#0 Retry#0 org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from 10.48.24.92:47315]
java.lang.Exception
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2976)
23/06/12 04:35:13 DEBUG OzoneManagerProtocolServerSideTranslatorPB: OzoneProtocol ServiceList request is received
23/06/12 04:35:13 DEBUG OzoneManagerRequestHandler: Received OMRequest: cmdType: ServiceList
traceID: ""
clientId: "client-DFC69A08DEE9"
version: 3
serviceListRequest {
}
,
23/06/12 04:35:13 DEBUG Server: IPC Server handler 26 on default port 9862: responding to Call#0 Retry#0 org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from 10.48.24.92:47315
23/06/12 04:35:13 DEBUG Server: IPC Server handler 26 on default port 9862: responding to Call#0 Retry#0 org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from 10.48.24.92:47315 Wrote 4104 bytes.
23/06/12 04:35:13 DEBUG ProcessingDetails: Served: [Call#0 Retry#0 org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from 10.48.24.92:47315] name=submitRequest user=jovyan@REALM (auth:KERBEROS) details=enqueueTime=81151 queueTime=74660 handlerTime=1483591 processingTime=17113267 lockfreeTime=17113267 lockwaitTime=0 locksharedTime=0 lockexclusiveTime=0 responseTime=949496
23/06/12 04:35:18 DEBUG Server: Server connection from 10.48.24.92:42870; # active connections: 2; # queued calls: 0
23/06/12 04:35:18 DEBUG Server: got #-33
23/06/12 04:35:18 DEBUG SaslRpcServer: Created SASL server with mechanism = DIGEST-MD5
23/06/12 04:35:18 DEBUG Server: Socket Reader #1 for port 9862: responding to Call#-33 Retry#-1 null from 10.48.24.92:42870
23/06/12 04:35:18 DEBUG Server: Socket Reader #1 for port 9862: responding to Call#-33 Retry#-1 null from 10.48.24.92:42870 Wrote 217 bytes.
23/06/12 04:35:18 DEBUG Server: got #-33
23/06/12 04:35:18 DEBUG SaslRpcServer: Kerberos principal name is OM/om-0.om.default.svc.realm@REALM
23/06/12 04:35:18 DEBUG UserGroupInformation: PrivilegedAction [as: OM/om-0.om.default.svc.realm@REALM (auth:KERBEROS)][action: org.apache.hadoop.security.SaslRpcServer$1@df6879b]
java.lang.Exception
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.security.SaslRpcServer.create(SaslRpcServer.java:153)
at org.apache.hadoop.ipc.Server$Connection.createSaslServer(Server.java:2403)
at org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:2148)
at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:2042)
at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1984)
at org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:2786)
at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:2584)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:2333)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:1449)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:1304)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:1275)
23/06/12 04:35:18 DEBUG SaslRpcServer: Created SASL server with mechanism = GSSAPI
23/06/12 04:35:18 DEBUG Server: Have read input token of size 587 for processing by saslServer.evaluateResponse()
23/06/12 04:35:18 DEBUG Server: Will send CHALLENGE token of size 110 from saslServer.
23/06/12 04:35:18 DEBUG Server: Socket Reader #1 for port 9862: responding to Call#-33 Retry#-1 null from 10.48.24.92:42870
23/06/12 04:35:18 DEBUG Server: Socket Reader #1 for port 9862: responding to Call#-33 Retry#-1 null from 10.48.24.92:42870 Wrote 134 bytes.
23/06/12 04:35:18 DEBUG Server: got #-33
23/06/12 04:35:18 DEBUG Server: Have read input token of size 0 for processing by saslServer.evaluateResponse()
23/06/12 04:35:18 DEBUG Server: Will send CHALLENGE token of size 65 from saslServer.
23/06/12 04:35:18 DEBUG Server: Socket Reader #1 for port 9862: responding to Call#-33 Retry#-1 null from 10.48.24.92:42870
23/06/12 04:35:18 DEBUG Server: Socket Reader #1 for port 9862: responding to Call#-33 Retry#-1 null from 10.48.24.92:42870 Wrote 89 bytes.
23/06/12 04:35:18 DEBUG Server: got #-33
23/06/12 04:35:18 DEBUG Server: Have read input token of size 65 for processing by saslServer.evaluateResponse()
23/06/12 04:35:18 DEBUG SaslRpcServer: SASL server GSSAPI callback: setting canonicalized client ID: jovyan@REALM
23/06/12 04:35:18 DEBUG Server: Will send SUCCESS token of size null from saslServer.
23/06/12 04:35:18 DEBUG Server: SASL server context established. Negotiated QoP is auth
23/06/12 04:35:18 DEBUG Server: SASL server successfully authenticated client: jovyan@REALM (auth:KERBEROS)
I look forward to receiving your reply as soon as possible. Thank you in advance for your kind cooperation.
P.S.: I would appreciate it if we could collaborate on the above.