Does Elasticsearch-Hadoop support HTTPS proxy connections ? #2230

Open
1 of 2 tasks
AVN9399 opened this issue May 21, 2024 · 0 comments


AVN9399 commented May 21, 2024

What kind of issue is this?

  • Bug report. If you’ve found a bug, please provide a code snippet or test to reproduce it below.
    The easier it is to track down the bug, the faster it is solved.
  • Feature Request. Start by telling us what problem you’re trying to solve.
    Often a solution already exists! Don’t send pull requests to implement new features without
    first getting our support. Sometimes we leave features out on purpose to keep the project small.

Issue description

In my virtual machine, my Spark application pushes data to an Elasticsearch server using the JavaEsSparkSQL.saveToEs() function. The traffic has to pass through an HTTPS proxy. Spark appears to connect to the proxy server, but it does not seem to pick up the proxy's username and password.

The proxy is fully accessible using curl or from other parts of my Java code.

I've tried numerous approaches, but none have been successful. I'm unsure if Elasticsearch-Hadoop supports HTTPS proxy connections.

Steps to reproduce

Code:

My spark-defaults.conf

spark.executor.extraJavaOptions   -Dhttps.proxyHost=proxyhost -Dhttps.proxyPort=proxyport -Dhttps.proxyUser=XXXX  -Dhttps.proxyPassword=XXXX -Djdk.http.auth.tunneling.disabledSchemes=  -Djdk.http.auth.proxying.disabledSchemes=
spark.driver.extraJavaOptions     -Dderby.system.home=/tmp/derby/ -Dhttps.proxyHost=proxyhost  -Dhttps.proxyPort=proxyport -Dhttps.proxyUser=XXXX  -Dhttps.proxyPassword=XXXX  -Djdk.http.auth.tunneling.disabledSchemes=   -Djdk.http.auth.proxying.disabledSchemes=
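
For comparison with the JVM flags above, here is a minimal sketch of passing the proxy credentials through the connector's own settings instead. The `es.net.proxy.https.*`, `es.net.ssl` and `es.nodes.wan.only` keys are taken from the es-hadoop configuration documentation; the class name `EsProxyConf`, the helper `proxySettings`, and the port `3128` are hypothetical placeholders. I have not confirmed that this resolves the 407 below, so treat it as something to try rather than a known fix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of es-hadoop): collects connector-level
// proxy settings so they can be handed to saveToEs() as a config map.
public class EsProxyConf {

    // Builds es-hadoop settings for an authenticated HTTPS proxy.
    // The keys are documented es-hadoop options; the values are supplied by the caller.
    public static Map<String, String> proxySettings(String host, int port,
                                                    String user, String pass) {
        Map<String, String> cfg = new LinkedHashMap<>();
        cfg.put("es.nodes.wan.only", "true");   // talk only to the declared nodes
        cfg.put("es.net.ssl", "true");          // target cluster is reached over HTTPS
        cfg.put("es.net.proxy.https.host", host);
        cfg.put("es.net.proxy.https.port", Integer.toString(port));
        cfg.put("es.net.proxy.https.user", user);
        cfg.put("es.net.proxy.https.pass", pass);
        // Assumption: prefer the explicit values above over -Dhttps.proxy* JVM properties.
        cfg.put("es.net.proxy.https.use.system.props", "false");
        return cfg;
    }

    public static void main(String[] args) {
        // Placeholder values, matching the anonymised names used in this report.
        Map<String, String> cfg = proxySettings("proxyhost", 3128, "XXXX", "XXXX");
        // Print the settings, masking the password.
        cfg.forEach((k, v) ->
            System.out.println(k + "=" + (k.endsWith(".pass") ? "****" : v)));
    }
}
```

The resulting map could then be passed as the third argument of `JavaEsSparkSQL.saveToEs(dataset, resource, cfg)`, or the same keys could be set on the `SparkConf` with a `spark.` prefix.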

Stack trace:

24/05/21 17:08:17 DEBUG HeaderProcessor: Added HTTP Headers to method: [X-Opaque-ID: [spark] [portail] [Projet_P16016_SEARCH ENGINE To Elastic Search Data] [app-20240521170655-0092]
, User-Agent: elasticsearch-hadoop/8.12.0 spark/3.1.1
, Content-Type: application/json
, Accept: application/json
]
24/05/21 17:08:17 DEBUG CommonsHttpTransport: Using regular user provider to wrap rest request
24/05/21 17:08:17 TRACE CommonsHttpTransport: Tx [HTTPS proxyhost:proxyport][GET]@[elasticsearchserver:443][]?[null] w/ payload [null]
24/05/21 17:08:17 WARN HttpMethodDirector: Required proxy credentials not available for BASIC <any realm>@proxyhost:proxyport
24/05/21 17:08:17 WARN HttpMethodDirector: Preemptive authentication requested but no default proxy credentials available
24/05/21 17:08:17 INFO AuthChallengeProcessor: Basic authentication scheme selected
24/05/21 17:08:17 INFO HttpMethodDirector: Failure authenticating with BASIC 'ECH'@proxyhost:proxyport
24/05/21 17:08:17 TRACE CommonsHttpTransport: Rx [HTTPS proxy proxyhost:proxyport]@[10.86.XX.XXX] [407-Proxy Authentication Required]
...
...
...
24/05/21 17:08:17 TRACE CommonsHttpTransport: Closing HTTP transport to 109es125.fr1.esaas.tech.orange:443
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
        at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:403)
        at org.elasticsearch.spark.sql.EsSparkSQL$.saveToEs(EsSparkSQL.scala:99)
        at org.elasticsearch.spark.sql.EsSparkSQL$.saveToEs(EsSparkSQL.scala:81)
        at org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL$.saveToEs(JavaEsSparkSQL.scala:51)
        at org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL.saveToEs(JavaEsSparkSQL.scala)
        at com.orange.bigdata.app.elk.v2.IndexationSEBruteOptimiseParEntite.main(IndexationSEBruteOptimiseParEntite.java:273)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: [GET] on [] failed; server[109es125.fr1.esaas.tech.orange:443] returned [407|Proxy Authentication Required]
...
...
        at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:487)
        at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:444)
        at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:438)
        at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:406)
        at org.elasticsearch.hadoop.rest.RestClient.mainInfo(RestClient.java:755)
        at org.elasticsearch.hadoop.rest.InitializationUtils.discoverClusterInfo(InitializationUtils.java:393)
        ... 17 more

Version Info

OS: Linux
JVM: Temurin-jdk-8
Hadoop/Spark: Spark3
ES-Hadoop: 8.12.0
ES: 7.17.2

Feature description
