Feature/aws setup - [WIP] #523

Open · wants to merge 36 commits into base: 1.0.0x
Conversation

@dos65 (Contributor) commented Sep 30, 2018

This is a working implementation of a built-in integration with EMR. Cluster configuration may be placed into context settings (examples), and Mist can dynamically spawn EMR clusters.

This introduces new mini-submodules:

  • agent - a simple application that runs on the EMR master node to guarantee cluster shutdown in case the Mist instance is stopped
  • aws-init-setup - performs the initial AWS environment setup (creates roles, security groups, and SSH keys if necessary) and generates a Mist config with this information

This branch is ready for preview and is already published here, along with a CloudFormation template.

What's left:

  • handle context updates in a more elegant way: not every update should lead to starting a new cluster. Some EMR and Mist settings could be updated on a working cluster (maxJobs, instanceCount)
  • the HTTP API for workers was commented out; it needs to be reimplemented
  • a REST API for clusters
  • provide a way to put several worker-runners into the launcher-settings section and allow contexts to use them
  • provide a way to configure the manual launcher, similar to what we have for workers
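The last two items refer to the launcher-settings concept that contexts reference by name. For illustration, a minimal sketch of what a named launcher-settings entry might look like; every key name here (launcher-settings, region, sshKeyPair, the role names and values) is an assumption for illustration, not this branch's actual schema:

```hocon
# Hypothetical sketch, not the real schema: a named settings entry in the
# master config that contexts can select by name.
mist.launcher-settings {
  default_emr {
    type = "aws-emr"
    # AWS environment details of the kind aws-init-setup would generate
    region = "us-east-1"
    sshKeyPair = "mist-emr-key"
    emrRole = "EMR_DefaultRole"
    emrEc2Role = "EMR_EC2_DefaultRole"
  }
}
```

A context would then pick these settings up via a launcherSettingsName = "default_emr" field in its launchData block.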

@Avik1993 commented May 9, 2020

Hi @dos65,
I was trying out this patch, but starting a job with the required context fails with:

ERROR 2020-05-09T11:59:33.324 [d5400064-2e35-413b-8b17-32261d930310] FailedEvent with Error:
 java.lang.RuntimeException: Context is broken
	at io.hydrosphere.mist.master.execution.JobActor$$anonfun$io$hydrosphere$mist$master$execution$JobActor$$initial$1.applyOrElse(JobActor.scala:59)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
	at io.hydrosphere.mist.master.execution.JobActor.akka$actor$Timers$$super$aroundReceive(JobActor.scala:24)
	at akka.actor.Timers$class.aroundReceive(Timers.scala:44)
	at io.hydrosphere.mist.master.execution.JobActor.aroundReceive(JobActor.scala:24)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:527)
	at akka.actor.ActorCell.invoke(ActorCell.scala:496)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
	at akka.dispatch.Mailbox.run(Mailbox.scala:224)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.RuntimeException: Unknown settings name default_emr for ctx root_emr_ctx
	at io.hydrosphere.mist.master.execution.ClustersService$$anon$1.start(ClustersService.scala:46)
	at io.hydrosphere.mist.master.execution.ExecutionService$$anonfun$2$$anonfun$3.apply(ExecutionService.scala:119)
	at io.hydrosphere.mist.master.execution.ExecutionService$$anonfun$2$$anonfun$3.apply(ExecutionService.scala:119)
	at io.hydrosphere.mist.master.execution.ContextFrontend.io$hydrosphere$mist$master$execution$ContextFrontend$$gotoStartingConnector(ContextFrontend.scala:345)
	at io.hydrosphere.mist.master.execution.ContextFrontend$$anonfun$awaitRequest$1.applyOrElse(ContextFrontend.scala:70)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
	at io.hydrosphere.mist.master.execution.ContextFrontend.akka$actor$Timers$$super$aroundReceive(ContextFrontend.scala:39)
	at akka.actor.Timers$class.aroundReceive(Timers.scala:44)
	at io.hydrosphere.mist.master.execution.ContextFrontend.aroundReceive(ContextFrontend.scala:39)
	... 9 more

I have kept the conf folder the same as in the release, only modifying the context to pick up the emr_ctx context instead of default.

Any pointers?

@blvp (Contributor) commented May 9, 2020

Could you please share your context.conf here? I think there might be some issues with it.

@Avik1993 commented May 9, 2020

Hi @blvp,
Thanks for the reply.

I am using the following config:

model = Context
name = emr_ctx
data {
  launchData {
    type = "aws-emr",
    launcherSettingsName = "default_emr"
    releaseLabel = "emr-5.17.0"
    instances = [
      {
         instanceGroupType = "master"
         instanceType = "m4.large"
         name = "my-master"
         instanceCount = 1
         market = "onDemand"
      },
      {
         instanceGroupType = "core"
         instanceType = "m4.large"
         name = "my-core"
         instanceCount = 2
         market = "spot"
         bidPrice = "0.035"
      }
    ]
  }
  maxJobs = 1
  maxConnFailures = 1
  workerMode = "exclusive"
  precreated = false
  sparkConf {
    spark.executor.instances = 2
    spark.submit.deployMode = "cluster"
    spark.master = "yarn"
    spark.executor.memory = "1G"
    spark.executor.cores = 2
  }
  downtime = "1200s"
  streamingDuration = "1s"
}

and using 20func.conf as follows:

model = Function
name = hello-mist-scala
data {
    path = hello-mist-scala_0.0.1.jar
    class-name = "HelloMist$"
    #  context = default
    context = emr_ctx
    # context = emr_autoscale_ctx
}

The function is created correctly and picks up the required context, but triggering it fails with the above error.

@blvp (Contributor) commented May 9, 2020

Thank you. I will keep you in the loop. Also, could you please join Gitter so we can communicate there instead of in the PR? :)

@Avik1993 commented May 9, 2020

Yes @blvp! Thanks

@Avik1993

Hey @blvp @dos65
Did you guys get a chance to look at this error?
