
java.lang.ArrayIndexOutOfBoundsException #377

Closed
zurk opened this issue Apr 18, 2018 · 9 comments
@zurk (Contributor) commented Apr 18, 2018

I saw a similar issue with an old engine version, but it seems to have reappeared in 0.5.7. I am not sure about earlier versions.

I ran this on the staging cluster:

pyspark -c 'spark.local.dir=/spark-temp-data' --total-executor-cores 100 --driver-memory 16g --executor-memory 4g --master "spark://konst-spark-spark-master:7077" --packages "tech.sourced:engine:0.5.7"

(the same happens with --master "spark://p-spark-master:7077"),
and then in Python:

from sourced.engine import Engine
r = "hdfs://hdfs-namenode/apps/borges/10k/"
engine = Engine(spark, r, "siva")
head_blobs = engine.repositories.filter("is_fork = false").references.filter("is_remote = true").head_ref.commits.tree_entries.blobs.count()

Expected Behavior

The count is returned.

Current Behavior

It fails at some point with this error:

18/04/18 12:38:55 WARN TaskSetManager: Lost task 143.0 in stage 2.0 (TID 441, 10.2.20.9, executor 0): java.lang.ArrayIndexOutOfBoundsException
	at org.eclipse.jgit.internal.storage.pack.BinaryDelta.apply(BinaryDelta.java:196)
	at org.eclipse.jgit.internal.storage.file.PackFile.load(PackFile.java:887)
	at org.eclipse.jgit.internal.storage.file.PackFile.get(PackFile.java:275)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:471)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:429)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:420)
	at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:159)
	at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:234)
	at tech.sourced.engine.iterator.BlobIterator$.readFile(BlobIterator.scala:95)
	at tech.sourced.engine.iterator.BlobIterator.mapColumns(BlobIterator.scala:59)
	at tech.sourced.engine.iterator.BlobIterator.mapColumns(BlobIterator.scala:18)
	at tech.sourced.engine.iterator.ChainableIterator.next(ChainableIterator.scala:111)
	at tech.sourced.engine.iterator.ChainableIterator.next(ChainableIterator.scala:17)
	at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
	at tech.sourced.engine.iterator.CleanupIterator.next(CleanupIterator.scala:49)
	at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Worker logs for the same issue from one of the experiments:
worker_logs.zip
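The failing frame is org.eclipse.jgit.internal.storage.pack.BinaryDelta.apply, which suggests a delta is being applied against a corrupt or truncated packfile. One way to check whether an extracted repository's packs are intact is git's own pack verifier; this is a minimal sketch that builds a throwaway repository just to demonstrate the check itself (for an actual siva-extracted repo the pack paths would differ):

```shell
# Build a throwaway repo so the check is self-contained;
# on a real case, point verify-pack at the extracted repo's .idx files.
tmp=$(mktemp -d)
git init -q "$tmp"
cd "$tmp"
echo 'hello' > f.txt
git add f.txt
git -c user.name=t -c user.email=t@t commit -qm init
git gc --quiet   # packs loose objects into .git/objects/pack/

# verify-pack walks every object, applying deltas; a corrupt
# delta chain fails here instead of deep inside jgit.
for idx in .git/objects/pack/*.idx; do
    git verify-pack "$idx" && echo "pack ok: $idx"
done
```

A pack whose delta chains are damaged makes verify-pack exit non-zero, which would separate "bad data on disk" from a bug in the engine's readers.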

@erizocosmico (Contributor)

Which Spark and PySpark version?

@zurk (Contributor, Author) commented Apr 20, 2018

2.2.0

erizocosmico self-assigned this Apr 20, 2018
@erizocosmico (Contributor) commented Apr 20, 2018

I cannot reproduce this error. I did get some java.lang.OutOfMemoryError: Java heap space errors, so maybe it is related to that.

Can you reproduce it with what you pasted? If not, can you provide a reproduction case?

@r0mainK (Contributor) commented Apr 23, 2018

I got the same error when running apollo preprocess on the PGA subfolder hdfs://hdfs-namenode/siva/latest/0a, about 15 seconds into execution. This is the command I ran:

apollo preprocess -r hdfs://hdfs-namenode/siva/latest/0a -o hdfs://hdfs-namenode/parquet/0a -s spark://l-spark-spark-master:7077 --bblfsh babel-bblfshd -l Go Java Python Bash JavaScript Ruby --persist MEMORY_AND_DISK --dep-zip --spark-local-dir /spark-temp-data --config spark.worker.cleanup.enabled=True spark.worker.cleanup.appDataTtl=1800 spark.default.parallelism=360 spark.executor.memory=2176m spark.driver.memory=10G spark.driver.maxResultSize=4G spark.executor.cores=6 spark.cores.max=90

You can find my config in this issue.

@erizocosmico (Contributor)

@r0mainK can you paste the complete logs for your crash?

@r0mainK (Contributor) commented Apr 23, 2018

Sure, here you go:

[Stage 0:>                                                        (0 + 90) / 91] 18/04/23 09:35:58 WARN TaskSetManager: Lost task 2.0 in stage 0.0 (TID 3, 10.2.38.82, executor 9): java.lang.ArrayIndexOutOfBoundsException
	at org.eclipse.jgit.internal.storage.pack.BinaryDelta.apply(BinaryDelta.java:196)
	at org.eclipse.jgit.internal.storage.file.PackFile.load(PackFile.java:887)
	at org.eclipse.jgit.internal.storage.file.PackFile.get(PackFile.java:275)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:471)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:429)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:420)
	at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:159)
	at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:234)
	at tech.sourced.engine.iterator.BlobIterator$.readFile(BlobIterator.scala:90)
	at tech.sourced.engine.iterator.BlobIterator.mapColumns(BlobIterator.scala:56)
	at tech.sourced.engine.iterator.BlobIterator.mapColumns(BlobIterator.scala:16)
	at tech.sourced.engine.iterator.ChainableIterator.next(ChainableIterator.scala:109)
	at tech.sourced.engine.iterator.ChainableIterator.next(ChainableIterator.scala:16)
	at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
	at tech.sourced.engine.iterator.CleanupIterator.next(CleanupIterator.scala:36)
	at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
	at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:80)
	at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:77)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

[Stage 0:====>                                                    (7 + 84) / 91]18/04/23 09:36:07 ERROR TaskSetManager: Task 2 in stage 0.0 failed 4 times; aborting job
18/04/23 09:36:07 ERROR FileFormatWriter: Aborting job null.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 0.0 (TID 93, 10.2.38.82, executor 5): java.lang.ArrayIndexOutOfBoundsException
	at org.eclipse.jgit.internal.storage.pack.BinaryDelta.apply(BinaryDelta.java:196)
	at org.eclipse.jgit.internal.storage.file.PackFile.load(PackFile.java:887)
	at org.eclipse.jgit.internal.storage.file.PackFile.get(PackFile.java:275)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:471)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:429)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:420)
	at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:159)
	at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:234)
	at tech.sourced.engine.iterator.BlobIterator$.readFile(BlobIterator.scala:90)
	at tech.sourced.engine.iterator.BlobIterator.mapColumns(BlobIterator.scala:56)
	at tech.sourced.engine.iterator.BlobIterator.mapColumns(BlobIterator.scala:16)
	at tech.sourced.engine.iterator.ChainableIterator.next(ChainableIterator.scala:109)
	at tech.sourced.engine.iterator.ChainableIterator.next(ChainableIterator.scala:16)
	at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
	at tech.sourced.engine.iterator.CleanupIterator.next(CleanupIterator.scala:36)
	at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
	at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:80)
	at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:77)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:188)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
	at org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:438)
	at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:474)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217)
	at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:509)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ArrayIndexOutOfBoundsException
	at org.eclipse.jgit.internal.storage.pack.BinaryDelta.apply(BinaryDelta.java:196)
	at org.eclipse.jgit.internal.storage.file.PackFile.load(PackFile.java:887)
	at org.eclipse.jgit.internal.storage.file.PackFile.get(PackFile.java:275)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:471)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:429)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:420)
	at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:159)
	at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:234)
	at tech.sourced.engine.iterator.BlobIterator$.readFile(BlobIterator.scala:90)
	at tech.sourced.engine.iterator.BlobIterator.mapColumns(BlobIterator.scala:56)
	at tech.sourced.engine.iterator.BlobIterator.mapColumns(BlobIterator.scala:16)
	at tech.sourced.engine.iterator.ChainableIterator.next(ChainableIterator.scala:109)
	at tech.sourced.engine.iterator.ChainableIterator.next(ChainableIterator.scala:16)
	at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
	at tech.sourced.engine.iterator.CleanupIterator.next(CleanupIterator.scala:36)
	at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
	at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:80)
	at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:77)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 45.0 in stage 0.0 (TID 46, 10.2.38.82, executor 5): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 25.0 in stage 0.0 (TID 26, 10.2.38.82, executor 7): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 30.0 in stage 0.0 (TID 31, 10.2.38.82, executor 5): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 15.0 in stage 0.0 (TID 16, 10.2.38.82, executor 5): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 69.0 in stage 0.0 (TID 70, 10.2.38.82, executor 6): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 39.0 in stage 0.0 (TID 40, 10.2.38.82, executor 6): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 70.0 in stage 0.0 (TID 71, 10.2.38.82, executor 7): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 24.0 in stage 0.0 (TID 25, 10.2.38.82, executor 6): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 89.0 in stage 0.0 (TID 90, 10.2.38.82, executor 9): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 47.0 in stage 0.0 (TID 48, 10.2.38.82, executor 9): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 18.0 in stage 0.0 (TID 19, 10.2.7.90, executor 0): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 33.0 in stage 0.0 (TID 34, 10.2.7.90, executor 0): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 11.0 in stage 0.0 (TID 12, 10.2.7.90, executor 2): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 12.0 in stage 0.0 (TID 13, 10.2.38.82, executor 8): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 81.0 in stage 0.0 (TID 82, 10.2.7.90, executor 4): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 71.0 in stage 0.0 (TID 72, 10.2.7.90, executor 2): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 52.0 in stage 0.0 (TID 53, 10.2.7.90, executor 4): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 57.0 in stage 0.0 (TID 58, 10.2.38.82, executor 8): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 9.0 in stage 0.0 (TID 10, 10.2.38.82, executor 6): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 86.0 in stage 0.0 (TID 87, 10.2.38.82, executor 8): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 72.0 in stage 0.0 (TID 73, 10.2.38.82, executor 8): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 75.0 in stage 0.0 (TID 76, 10.2.7.90, executor 3): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 44.0 in stage 0.0 (TID 45, 10.2.13.51, executor 11): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 59.0 in stage 0.0 (TID 60, 10.2.13.51, executor 11): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 6, 10.2.13.51, executor 10): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 20.0 in stage 0.0 (TID 21, 10.2.13.51, executor 10): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 53.0 in stage 0.0 (TID 54, 10.2.13.51, executor 12): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 68.0 in stage 0.0 (TID 69, 10.2.13.51, executor 12): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 6.0 in stage 0.0 (TID 7, 10.2.13.51, executor 14): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 34.0 in stage 0.0 (TID 35, 10.2.7.90, executor 1): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 82.0 in stage 0.0 (TID 83, 10.2.13.51, executor 12): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 13.0 in stage 0.0 (TID 14, 10.2.13.51, executor 13): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 58.0 in stage 0.0 (TID 59, 10.2.13.51, executor 13): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 87.0 in stage 0.0 (TID 88, 10.2.13.51, executor 13): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 80.0 in stage 0.0 (TID 81, 10.2.13.51, executor 14): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 74.0 in stage 0.0 (TID 75, 10.2.13.51, executor 11): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 14.0 in stage 0.0 (TID 15, 10.2.13.51, executor 11): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 32.0 in stage 0.0 (TID 33, 10.2.38.82, executor 9): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 43.0 in stage 0.0 (TID 44, 10.2.13.51, executor 13): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 21.0 in stage 0.0 (TID 22, 10.2.13.51, executor 14): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 16.0 in stage 0.0 (TID 17, 10.2.7.90, executor 3): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 73.0 in stage 0.0 (TID 74, 10.2.13.51, executor 13): TaskKilled (stage cancelled)
Traceback (most recent call last):
  File "/usr/local/bin/apollo", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.4/dist-packages/apollo/__main__.py", line 236, in main
    return handler(args)
  File "/usr/local/lib/python3.4/dist-packages/sourced/ml/utils/engine.py", line 70, in wrapped_pause
    return func(cmdline_args, *args, **kwargs)
  File "/usr/local/lib/python3.4/dist-packages/apollo/bags.py", line 84, in preprocess_source
    .execute()
  File "/usr/local/lib/python3.4/dist-packages/sourced/ml/transformers/transformer.py", line 95, in execute
    head = node(head)
  File "/usr/local/lib/python3.4/dist-packages/sourced/ml/transformers/basic.py", line 171, in __call__
    df.write.parquet(self.save_loc)
  File "/usr/local/lib/python3.4/dist-packages/pyspark/sql/readwriter.py", line 691, in parquet
    self._jwrite.parquet(path)
  File "/usr/local/lib/python3.4/dist-packages/py4j/java_gateway.py", line 1133, in __call__
    18/04/23 09:36:07 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 1, 10.2.38.82, executor 5): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 85.0 in stage 0.0 (TID 86, 10.2.7.90, executor 2): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 63.0 in stage 0.0 (TID 64, 10.2.7.90, executor 0): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 36.0 in stage 0.0 (TID 37, 10.2.13.51, executor 14): TaskKilled (stage cancelled)
answer, self.gateway_client, self.target_id, self.name)
  File "/usr/local/lib/python3.4/dist-packages/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/lib/python3.4/dist-packages/py4j/protocol.py", line 319, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o164.parquet.
: org.apache.spark.SparkException: Job aborted.
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:215)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
	at org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:438)
	at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:474)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217)
	at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:509)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:214)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 0.0 (TID 93, 10.2.38.82, executor 5): java.lang.ArrayIndexOutOfBoundsException
	at org.eclipse.jgit.internal.storage.pack.BinaryDelta.apply(BinaryDelta.java:196)
	at org.eclipse.jgit.internal.storage.file.PackFile.load(PackFile.java:887)
	at org.eclipse.jgit.internal.storage.file.PackFile.get(PackFile.java:275)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:471)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:429)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:420)
	at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:159)
	at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:234)
	at tech.sourced.engine.iterator.BlobIterator$.readFile(BlobIterator.scala:90)
	at tech.sourced.engine.iterator.BlobIterator.mapColumns(BlobIterator.scala:56)
	at tech.sourced.engine.iterator.BlobIterator.mapColumns(BlobIterator.scala:16)
	at tech.sourced.engine.iterator.ChainableIterator.next(ChainableIterator.scala:109)
	at tech.sourced.engine.iterator.ChainableIterator.next(ChainableIterator.scala:16)
	at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
	at tech.sourced.engine.iterator.CleanupIterator.next(CleanupIterator.scala:36)
	at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
	at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:80)
	at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:77)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:188)
	... 45 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
	at org.eclipse.jgit.internal.storage.pack.BinaryDelta.apply(BinaryDelta.java:196)
	at org.eclipse.jgit.internal.storage.file.PackFile.load(PackFile.java:887)
	at org.eclipse.jgit.internal.storage.file.PackFile.get(PackFile.java:275)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedObject(ObjectDirectory.java:471)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openPackedFromSelfOrAlternate(ObjectDirectory.java:429)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:420)
	at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:159)
	at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:234)
	at tech.sourced.engine.iterator.BlobIterator$.readFile(BlobIterator.scala:90)
	at tech.sourced.engine.iterator.BlobIterator.mapColumns(BlobIterator.scala:56)
	at tech.sourced.engine.iterator.BlobIterator.mapColumns(BlobIterator.scala:16)
	at tech.sourced.engine.iterator.ChainableIterator.next(ChainableIterator.scala:109)
	at tech.sourced.engine.iterator.ChainableIterator.next(ChainableIterator.scala:16)
	at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
	at tech.sourced.engine.iterator.CleanupIterator.next(CleanupIterator.scala:36)
	at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
	at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:80)
	at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:77)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:108)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

18/04/23 09:36:07 WARN TaskSetManager: Lost task 67.0 in stage 0.0 (TID 68, 10.2.7.90, executor 4): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 8.0 in stage 0.0 (TID 9, 10.2.13.51, executor 12): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 38.0 in stage 0.0 (TID 39, 10.2.13.51, executor 12): TaskKilled (stage cancelled)
18/04/23 09:36:07 WARN TaskSetManager: Lost task 88.0 in stage 0.0 (TID 89, 10.2.13.51, executor 11): TaskKilled (stage cancelled)

@erizocosmico (Contributor)

Managed to reproduce it. I'm trying to identify the siva file and the blob that cause this, to see whether the problem is in the siva file itself or in the engine.

@erizocosmico (Contributor) commented Apr 23, 2018

Opening the offending blob from that siva file triggers the error with jgit, but go-git reads it fine, so it looked like some kind of jgit bug. I'm going to put together a reproduction case and report it to jgit.

UPDATE: siva file contains corrupted objects.
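Since the root cause turned out to be corrupted objects inside the siva file, one quick sanity check is to run `git fsck` over the repository. This is a sketch, not part of the original report: it assumes the siva file has already been unpacked into a bare git repository layout (for example with go-siva's `siva unpack` command), and the path below is a placeholder.

```shell
# Placeholder path: point --git-dir at the directory produced by
# unpacking the siva file (assumption: go-siva's `siva unpack` was
# used to extract it into a bare repository layout).
# `git fsck --full` walks every object, applying deltas along the way,
# so a corrupted packed object surfaces as an explicit fsck error
# instead of an ArrayIndexOutOfBoundsException deep inside jgit.
git --git-dir=/path/to/unpacked/repo fsck --full
```

On a healthy repository this exits 0; on a repository with broken delta chains or damaged objects it exits non-zero and typically prints `error:` lines naming the affected objects.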

@erizocosmico (Contributor)

I'm closing this, since the problem is in the siva file and not in the engine. I opened an issue on borges to track it: src-d/borges#264
