Bound total memory used by HiveSplitSource #9119
Conversation
See Travis failures
-CompletableFuture<List<ConnectorSplit>> future = queue.getBatchAsync(maxSize);
+CompletableFuture<List<ConnectorSplit>> future = queue.getBatchAsync(maxSize).thenApply(internalSplits -> {
+    ImmutableList.Builder<ConnectorSplit> result = ImmutableList.builder();
+    for (InternalHiveSplit internalSplit : internalSplits) {
Why not use a stream?
The next commit introduces a side effect into the for-loop.
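For illustration, a hypothetical sketch of the kind of side effect that commit adds inside the loop; the estimatedSplitSizeInBytes counter appears later in this PR, while the toConnectorSplit helper name is an assumption, not the exact code:

CompletableFuture<List<ConnectorSplit>> future = queue.getBatchAsync(maxSize).thenApply(internalSplits -> {
    ImmutableList.Builder<ConnectorSplit> result = ImmutableList.builder();
    for (InternalHiveSplit internalSplit : internalSplits) {
        // Side effect: release the memory accounted for this split as it leaves the buffer.
        estimatedSplitSizeInBytes.addAndGet(-internalSplit.getEstimatedSizeInBytes());
        result.add(toConnectorSplit(internalSplit)); // assumed conversion helper
    }
    return result.build();
});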
public final class HiveTypeName
{
    private static final int INSTANCE_SIZE = ClassLayout.parseClass(HivePartitionKey.class).instanceSize() +
            ClassLayout.parseClass(String.class).instanceSize() * 2;
Why times two? There is only one String in the class.
    this.value = requireNonNull(value, "value is null");
}

public String toString()
Missing @Override
@@ -264,7 +264,7 @@ public HiveWriter createWriter(Page partitionColumns, int position, OptionalInt
        .collect(joining(",")));
schema.setProperty(META_TABLE_COLUMN_TYPES, dataColumns.stream()
        .map(DataColumn::getHiveType)
-       .map(HiveType::getHiveTypeName)
+       .map(hiveType -> hiveType.getHiveTypeName().toString())
Just add
.map(Object::toString)
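Applied to the diff above, the suggested pipeline would read roughly like this (a sketch of the suggestion, not the committed code):

schema.setProperty(META_TABLE_COLUMN_TYPES, dataColumns.stream()
        .map(DataColumn::getHiveType)
        .map(HiveType::getHiveTypeName)
        .map(Object::toString)
        .collect(joining(",")));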
@@ -463,7 +462,7 @@ private static Properties createSchema(HiveStorageFormat format, List<String> co
        .collect(joining(",")));
schema.setProperty(META_TABLE_COLUMN_TYPES, columnTypes.stream()
        .map(type -> toHiveType(typeTranslator, type))
-       .map(HiveType::getHiveTypeName)
+       .map(hiveType -> hiveType.getHiveTypeName().toString())
.map(Object::toString)
@@ -51,6 +55,11 @@ public String getValue()
    return value;
}

public int getEstimatedSizeInBytes()
{
    return INSTANCE_SIZE + name.length() * Character.BYTES + value.length() * Character.BYTES;
This is long and hard to read, add parentheses
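For comparison, the parenthesized form suggested here would look something like this (a sketch, not the committed code):

return INSTANCE_SIZE + (name.length() * Character.BYTES) + (value.length() * Character.BYTES);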
I disagree on this one. I find it harder to read with parentheses.
I will address all the other comments. I noticed the build failures and fixed them myself; apparently, I forgot to push the fix.
@@ -62,6 +73,7 @@
    this.tableName = requireNonNull(tableName, "tableName is null");
    this.compactEffectivePredicate = requireNonNull(compactEffectivePredicate, "compactEffectivePredicate is null");
    this.queue = new AsyncQueue<>(maxOutstandingSplits, executor);
+   this.maxOutstandingSplitsBytes = Ints.checkedCast(maxOutstandingSplitsSize.toBytes());
Use toIntExact
Using Ints.saturatedCast() might be better here. It means that if a very large limit is specified, the effective memory limit will simply be lower than requested, rather than the cast failing.
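A small illustration of the difference between the casts under discussion, using standard Guava and JDK behavior (the variable names are made up):

long configuredLimit = 10L * 1024 * 1024 * 1024; // e.g. a 10GB limit, larger than Integer.MAX_VALUE

// Ints.checkedCast and Math.toIntExact reject values that do not fit in an int:
// Ints.checkedCast(configuredLimit); // throws IllegalArgumentException
// Math.toIntExact(configuredLimit);  // throws ArithmeticException

// Ints.saturatedCast clamps instead, so an oversized limit silently becomes
// Integer.MAX_VALUE (about 2GB) rather than failing at construction time:
int effectiveLimit = Ints.saturatedCast(configuredLimit); // == Integer.MAX_VALUE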
@@ -84,9 +96,16 @@ int getOutstandingSplitCount()
CompletableFuture<?> addToQueue(InternalHiveSplit split)
{
    if (throwable.get() == null) {
        if (estimatedSplitSizeInBytes.addAndGet(split.getEstimatedSizeInBytes()) > maxOutstandingSplitsBytes) {
This can overflow if the limit is close to Integer.MAX_VALUE, and/or if there are many threads incrementing at once.
We should probably make this a long, just to be safe
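A minimal sketch of the long-based variant being suggested, with assumed field names (an AtomicLong counter and a long limit, so concurrent addAndGet calls cannot overflow an int):

// Keep both the running total and the limit as longs instead of ints.
private final AtomicLong estimatedSplitSizeInBytes = new AtomicLong();
private final long maxOutstandingSplitsBytes; // set from maxOutstandingSplitsSize.toBytes(), no narrowing cast needed

CompletableFuture<?> addToQueue(InternalHiveSplit split)
{
    if (estimatedSplitSizeInBytes.addAndGet(split.getEstimatedSizeInBytes()) > maxOutstandingSplitsBytes) {
        // over the limit: this is where the "takes too much memory" error shown below would be thrown
    }
    return queue.offer(split); // assumed AsyncQueue method
}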
// This limit should never be hit given there is a limit of maxOutstandingSplits.
// If it's hit, it means individual splits are huge.
throw new PrestoException(GENERIC_INTERNAL_ERROR, format(
        "Split buffering for %s.%s takes too much memory (%s bytes limit). %s splits are buffered.",
format(
"Split buffering for %s.%s exceeded memory limit (%s). %s splits are buffered.",
..., succinctBytes(maxOutstandingSplitsBytes), ...)
Force-pushed from 924793b to 51fc023.
InternalHiveSplit avoids unnecessary duplicate information and is more friendly to memory accounting. It is used for buffering discovered splits inside the Hive connector.