Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ConsList Performance Slowdown #188

Open
julien-truffaut opened this issue Feb 26, 2015 · 34 comments
Open

ConsList Performance Slowdown #188

julien-truffaut opened this issue Feb 26, 2015 · 34 comments

Comments

@julien-truffaut
Copy link

Benchmark with miniboxing

[info] Benchmark                     (size)  Mode  Cnt   Score   Error  Units
[info] ConsListBench.appendConsList      10  avgt   30   0.689 ± 0.022  us/op
[info] ConsListBench.appendConsList     100  avgt   30   5.862 ± 0.101  us/op
[info] ConsListBench.appendConsList    1000  avgt   30  56.735 ± 1.560  us/op
[info] ConsListBench.appendList          10  avgt   30   0.157 ± 0.010  us/op
[info] ConsListBench.appendList         100  avgt   30   1.328 ± 0.025  us/op
[info] ConsListBench.appendList        1000  avgt   30  12.906 ± 0.218  us/op

without miniboxing

[info] Benchmark                     (size)  Mode  Cnt   Score   Error  Units
[info] ConsListBench.appendConsList      10  avgt   30   0.130 ± 0.004  us/op
[info] ConsListBench.appendConsList     100  avgt   30   1.294 ± 0.041  us/op
[info] ConsListBench.appendConsList    1000  avgt   30  13.498 ± 0.585  us/op
[info] ConsListBench.appendList          10  avgt   30   0.149 ± 0.006  us/op
[info] ConsListBench.appendList         100  avgt   30   1.117 ± 0.024  us/op
[info] ConsListBench.appendList        1000  avgt   30  12.934 ± 0.201  us/op

How to reproduce:
clone brique
run benchmark in branch master (non miniboxing) and branch miniboxing (with miniboxing)

sbt
project bench
 run -i 10 -wi 3 -f3 .*ConsListBench.append.*
@julien-truffaut
Copy link
Author

I tried in master to implement foldLeft like in miniboxing branch (with an inner method) but I don't see an impact on performance

@VladUreche
Copy link
Member

I'll have to understand how the ConsList is transformed in detail, but in principle the expected speedups are around 30% compared to the generic version (at least that's what we got for the Scala linked list with this source code.

@VladUreche VladUreche changed the title Performance ConsList Performance Slowdown Feb 26, 2015
@VladUreche VladUreche added this to the 0.4 milestone Feb 26, 2015
@VladUreche VladUreche self-assigned this Feb 26, 2015
@julien-truffaut
Copy link
Author

I refactored all tailrec methods with while loops in order to be sure they were not the cause of the performance drop. I benchmarked sum to be closer to how you benchmarked miniboxing and it seems to give a similar improvement:

// with miniboxing
[info] Benchmark                  (size)  Mode  Cnt  Score   Error  Units
[info] ConsListBench.sumConsList      10  avgt   30  0.055 ± 0.001  us/op
[info] ConsListBench.sumConsList     100  avgt   30  0.277 ± 0.003  us/op
[info] ConsListBench.sumConsList    1000  avgt   30  2.438 ± 0.041  us/op
[info] ConsListBench.sumList          10  avgt   30  0.045 ± 0.000  us/op
[info] ConsListBench.sumList         100  avgt   30  0.596 ± 0.012  us/op
[info] ConsListBench.sumList        1000  avgt   30  6.322 ± 0.079  us/op

// without miniboxing
[info] Benchmark                  (size)  Mode  Cnt  Score   Error  Units
[info] ConsListBench.sumConsList      10  avgt   30  0.046 ± 0.001  us/op
[info] ConsListBench.sumConsList     100  avgt   30  0.365 ± 0.004  us/op
[info] ConsListBench.sumConsList    1000  avgt   30  3.926 ± 0.062  us/op
[info] ConsListBench.sumList          10  avgt   30  0.046 ± 0.001  us/op
[info] ConsListBench.sumList         100  avgt   30  0.589 ± 0.005  us/op
[info] ConsListBench.sumList        1000  avgt   30  6.373 ± 0.068  us/op

However, I found out a case where performance drop by a factor of 10, here is a link to the code: https://github.com/julien-truffaut/brique/blob/bbb28176cd8310a2ccf2c01a8939f210604898bb/core/src/main/scala/brique/ConsList.scala#L154-L168

// with miniboxing
[info] Benchmark                       (size)  Mode  Cnt   Score   Error  Units
[info] ConsListBench.reverse2ConsList      10  avgt   30   0.071 ± 0.001  us/op
[info] ConsListBench.reverse2ConsList     100  avgt   30   0.487 ± 0.007  us/op
[info] ConsListBench.reverse2ConsList    1000  avgt   30   4.919 ± 0.072  us/op
[info] ConsListBench.reverseConsList       10  avgt   30   0.546 ± 0.007  us/op
[info] ConsListBench.reverseConsList      100  avgt   30   5.086 ± 0.033  us/op
[info] ConsListBench.reverseConsList     1000  avgt   30  50.056 ± 1.153  us/op
[info] ConsListBench.reverseList           10  avgt   30   0.039 ± 0.001  us/op
[info] ConsListBench.reverseList          100  avgt   30   0.347 ± 0.003  us/op
[info] ConsListBench.reverseList         1000  avgt   30   3.918 ± 0.044  us/op

@julien-truffaut
Copy link
Author

I run the entire benchmark with and without miniboxing, you can find the results here: https://docs.google.com/spreadsheets/d/1tTPKvkjqKVxXm6BeO0yLm1cFbTtkLGKrLg6LWrVyDZc/edit?usp=sharing

@VladUreche
Copy link
Member

Very strange, this is what I'm getting:

With miniboxing:

[info] Benchmark                      (size)  Mode  Cnt      Score      Error  Units
[info] ConsListBench.appendConsList       10  avgt   15      0.198 ±    0.013  us/op
[info] ConsListBench.appendConsList      100  avgt   15      1.350 ±    0.054  us/op
[info] ConsListBench.appendConsList     1000  avgt   15     13.756 ±    0.661  us/op
[info] ConsListBench.appendConsList    10000  avgt   15    111.328 ±    4.386  us/op
[info] ConsListBench.appendConsList   100000  avgt   15   1345.934 ±   34.768  us/op
[info] ConsListBench.appendConsList  1000000  avgt   15  20270.947 ± 1249.934  us/op
[info] ConsListBench.appendList           10  avgt   15      0.156 ±    0.009  us/op
[info] ConsListBench.appendList          100  avgt   15      1.251 ±    0.060  us/op
[info] ConsListBench.appendList         1000  avgt   15     11.972 ±    0.197  us/op
[info] ConsListBench.appendList        10000  avgt   15    244.344 ±    6.984  us/op
[info] ConsListBench.appendList       100000  avgt   15   2464.826 ±  171.291  us/op
[info] ConsListBench.appendList      1000000  avgt   15  11816.217 ± 1596.650  us/op

without miniboxing:

[info] # Run complete. Total time: 00:06:20
[info] 
[info] Benchmark                      (size)  Mode  Cnt      Score      Error  Units
[info] ConsListBench.appendConsList       10  avgt   15      0.090 ±    0.002  us/op
[info] ConsListBench.appendConsList      100  avgt   15      0.832 ±    0.031  us/op
[info] ConsListBench.appendConsList     1000  avgt   15      8.634 ±    0.510  us/op
[info] ConsListBench.appendConsList    10000  avgt   15     81.850 ±    5.412  us/op
[info] ConsListBench.appendConsList   100000  avgt   15   1031.133 ±   33.813  us/op
[info] ConsListBench.appendConsList  1000000  avgt   15  15298.762 ± 1205.139  us/op
[info] ConsListBench.appendList           10  avgt   15      0.143 ±    0.006  us/op
[info] ConsListBench.appendList          100  avgt   15      1.246 ±    0.044  us/op
[info] ConsListBench.appendList         1000  avgt   15     11.854 ±    0.792  us/op
[info] ConsListBench.appendList        10000  avgt   15    236.025 ±    6.038  us/op
[info] ConsListBench.appendList       100000  avgt   15   2481.521 ±  165.411  us/op
[info] ConsListBench.appendList      1000000  avgt   15  11137.584 ± 1358.864  us/op

@DarkDimius
Copy link
Member

@julien-truffaut can you tell a bit more about your setup? Are you running JVM in client mode?

@julien-truffaut
Copy link
Author

@DarkDimius I am not sure, how can I check? I am running the benchmarks on my mac book pro

@VladUreche
Copy link
Member

I think I see the problem. Running with -P:minibox:warn-all to get all warnings, including for the Scala library, I get:

warn] /mnt/data1/Work/Workspace/dev/brique/core/src/main/scala/brique/ConsList.scala:272: 
The class scala.Tuple2 would benefit from miniboxing type parameter T1, since it is 
instantiated by miniboxed type parameter A of method unapply.
[warn]   final case class Cons[@miniboxed A](head: A, tail: ConsList[A]) extends ConsList[A]
[warn]                    ^

Translated to English: In every match/case such as case Cons(head, tail), the head value is boxed and then unboxed, since it is passed in a Tuple2, which isn't miniboxed. So the key to the problem is fixing #155 -- from there, we can debug the issue further.

I moved #155 to milestone 0.4, so I fix it by the end of March. But at least we have a lead :)

@VladUreche
Copy link
Member

An alternative approach, which we used in the linked lists, was to expose the head and tail explicitly. So the code would be transformed from:

  final def foldLeft[@miniboxed B](b: B)(f: (B, A) => B): B = {
    var acc = b
    var l = this
    while(true){
      l match {
        case Cons(h, t) =>
          acc = f(acc, h)
          l = t
        case CNil() => return acc
      }
    }
    acc
  }

to:

  final def foldLeft[@miniboxed B](b: B)(f: (B, A) => B): B = {
    var acc = b
    var l = this
    while(true){
      l match {
        case cons: Cons =>
          acc = f(acc, cons.head)
          l = cons.tail
        case CNil() => return acc
      }
    }
    acc
  }

Notice the two cases:

        case Cons(h, t) =>
          acc = f(acc, h)
          l = t

and

        case cons: Cons =>
          acc = f(acc, cons.head)
          l = cons.tail

@julien-truffaut
Copy link
Author

ah it makes sense, thank you @VladUreche

@VladUreche
Copy link
Member

Seems even with this, miniboxing is still slowed down:

[info] Benchmark                      (size)  Mode  Cnt      Score      Error  Units
[info] ConsListBench.appendConsList       10  avgt   15      0.179 ±    0.006  us/op
[info] ConsListBench.appendConsList      100  avgt   15      1.147 ±    0.021  us/op
[info] ConsListBench.appendConsList     1000  avgt   15     11.635 ±    0.171  us/op
[info] ConsListBench.appendConsList    10000  avgt   15    102.367 ±    1.427  us/op
[info] ConsListBench.appendConsList   100000  avgt   15   1238.197 ±   63.866  us/op
[info] ConsListBench.appendConsList  1000000  avgt   15  19298.500 ± 1295.293  us/op
[info] ConsListBench.appendList           10  avgt   15      0.154 ±    0.017  us/op
[info] ConsListBench.appendList          100  avgt   15      1.212 ±    0.065  us/op
[info] ConsListBench.appendList         1000  avgt   15     11.223 ±    0.074  us/op
[info] ConsListBench.appendList        10000  avgt   15    229.220 ±    1.908  us/op
[info] ConsListBench.appendList       100000  avgt   15   2269.490 ±  161.668  us/op
[info] ConsListBench.appendList      1000000  avgt   15   9624.083 ± 1374.385  us/op

The commits are: VladUreche/brique@a13de7b + VladUreche/brique@1f7389c

I enabled the Apples test and it works fine :)

@VladUreche
Copy link
Member

I guess the only solution is to take one of the benchmarks and see the bytecode, since I can't tell where the slowdown is coming from...

@julien-truffaut
Copy link
Author

I am not familiar with byte code, could you show me how to output the byte code of a particular method?

@non
Copy link

non commented Mar 4, 2015

If you do javap -c -private com.mydomain.MyClass you will get the full bytecode for a class (which will include the particular method.

@VladUreche
Copy link
Member

@julien-truffaut, if you have some time, the best thing you could do would be to take one of the benchmarks (one that exhibits the slowdown) and minimize it to a single file, eliminating all external dependencies -- this way, it's going to be much easier to see the bytecode.

@julien-truffaut
Copy link
Author

I created the following minimal example:
https://github.com/julien-truffaut/brique/blob/miniboxing/core/src/main/scala/brique/TList.scala

here are the results of the benchmarks with miniboxing:

[info] Benchmark                (size)  Mode  Cnt  Score   Error  Units
[info] TListBench.reverseList       10  avgt  100  0.047 ± 0.002  us/op
[info] TListBench.reverseList      100  avgt  100  0.375 ± 0.014  us/op
[info] TListBench.reverseList     1000  avgt  100  4.449 ± 0.152  us/op
[info] TListBench.reverseTList      10  avgt  100  0.082 ± 0.001  us/op
[info] TListBench.reverseTList     100  avgt  100  0.616 ± 0.023  us/op
[info] TListBench.reverseTList    1000  avgt  100  5.933 ± 0.133  us/op

and here is the byte code generated by TListJ:

public abstract class brique.TList_J<Asp> implements brique.TList<Asp> {
  private final byte brique|TList_J|A_TypeTag;

  public scala.collection.Iterator<java.lang.Object> productIterator();
    Code:
       0: aload_0       
       1: invokestatic  #17                 // Method scala/Product$class.productIterator:(Lscala/Product;)Lscala/collection/Iterator;
       4: areturn       

  public java.lang.String productPrefix();
    Code:
       0: aload_0       
       1: invokestatic  #24                 // Method scala/Product$class.productPrefix:(Lscala/Product;)Ljava/lang/String;
       4: areturn       

  public final brique.TList<Asp> reverse();
    Code:
       0: new           #28                 // class brique/TList$TNil_J
       3: dup           
       4: aload_0       
       5: getfield      #30                 // Field "brique|TList_J|A_TypeTag":B
       8: invokespecial #34                 // Method brique/TList$TNil_J."<init>":(B)V
      11: astore_1      
      12: aload_0       
      13: astore_2      
      14: aload_2       
      15: astore_3      
      16: aload_3       
      17: instanceof    #36                 // class brique/TList$TCons
      20: ifeq          69
      23: aload_3       
      24: checkcast     #36                 // class brique/TList$TCons
      27: astore        4
      29: new           #38                 // class brique/TList$TCons_J
      32: dup           
      33: aload_0       
      34: getfield      #30                 // Field "brique|TList_J|A_TypeTag":B
      37: aload         4
      39: aload_0       
      40: getfield      #30                 // Field "brique|TList_J|A_TypeTag":B
      43: invokeinterface #42,  2           // InterfaceMethod brique/TList$TCons.head_J:(B)J
      48: aload_1       
      49: invokespecial #45                 // Method brique/TList$TCons_J."<init>":(BJLbrique/TList;)V
      52: astore_1      
      53: aload         4
      55: invokeinterface #48,  1           // InterfaceMethod brique/TList$TCons.tail:()Lbrique/TList;
      60: astore_2      
      61: getstatic     #54                 // Field scala/runtime/BoxedUnit.UNIT:Lscala/runtime/BoxedUnit;
      64: astore        5
      66: goto          14
      69: aload_1       
      70: areturn       

  public brique.TList_J(byte);
    Code:
       0: aload_0       
       1: iload_1       
       2: putfield      #30                 // Field "brique|TList_J|A_TypeTag":B
       5: aload_0       
       6: invokespecial #60                 // Method java/lang/Object."<init>":()V
       9: aload_0       
      10: invokestatic  #64                 // Method scala/Product$class.$init$:(Lscala/Product;)V
      13: aload_0       
      14: invokestatic  #69                 // Method brique/TList$class.$init$:(Lbrique/TList;)V
      17: return        
}

I am not sure what is this line corresponding to:

61: getstatic     #54                 // Field scala/runtime/BoxedUnit.UNIT:Lscala/runtime/BoxedUnit;

@VladUreche
Copy link
Member

Thanks a lot @julien-truffaut! I will look into this by Saturday evening (SF time).

@julien-truffaut
Copy link
Author

@VladUreche did you get some time to look into this?

@VladUreche
Copy link
Member

@julien-truffaut no, sorry, I've been working full-time on the ScalaDays presentation. Let me have a quick look now.

@VladUreche
Copy link
Member

@julien-truffaut, I was able to isolate the problem down to a single file:

$ cat gh-bug-188.scala
package brique


import scala.miniboxed

/**
 * shortest example to exhibit slowdown with miniboxing
 */
sealed abstract class TList[@miniboxed A] extends scala.Product with scala.Serializable {
  import brique.TList._

  final def reverse: TList[A] = {
    var acc: TList[A] = TNil[A]()
    var l = this
    while(true){
      l match {
        case c: TCons[A] => acc = TCons(c.head , acc); l = c.tail
        case _ => return acc
      }
    }
    acc
  }

}

object TList {
  final case class TNil[@miniboxed A]() extends TList[A]
  final case class TCons[@miniboxed A](head: A, tail: TList[A]) extends TList[A]
}

object Test {

  import TList._

  def main(args: Array[String]): Unit = {
    var list = (1 to 1000000).foldRight[TList[Int]](TNil())(TCons(_, _))
    var i = 0
    var j = 0
    while (j < 30) {
      val start = System.currentTimeMillis
      while (i < 100) {
        list.reverse
        i+= 1
      }
      val end = System.currentTimeMillis
      println((end - start) + " ms for " + i + " iterations")
      i = 0
      j += 1
    }
    println("done")
  }
}

$ mb-scalac gh-bug-188.scala 

$ mb-scala brique.Test
1354 ms for 100 iterations
1119 ms for 100 iterations
1174 ms for 100 iterations
1023 ms for 100 iterations
1177 ms for 100 iterations
1166 ms for 100 iterations
1109 ms for 100 iterations
1123 ms for 100 iterations
1216 ms for 100 iterations
1143 ms for 100 iterations
1220 ms for 100 iterations
1239 ms for 100 iterations
1169 ms for 100 iterations
1193 ms for 100 iterations
1158 ms for 100 iterations
1200 ms for 100 iterations
1168 ms for 100 iterations
1198 ms for 100 iterations
1255 ms for 100 iterations
1156 ms for 100 iterations
1192 ms for 100 iterations
1167 ms for 100 iterations
1227 ms for 100 iterations
1147 ms for 100 iterations
1239 ms for 100 iterations
1142 ms for 100 iterations
1242 ms for 100 iterations
1217 ms for 100 iterations
1150 ms for 100 iterations
1184 ms for 100 iterations
done

$ mb-scalac gh-bug-188.scala -P:minibox:off
Miniboxing plugin warning: Turning off all minboxing specialization!
Miniboxing plugin warning: Optimizing `MbArray` is disabled, thus `MbArray`-s will be generic and will box.

$ mb-scala brique.Test
2123 ms for 100 iterations
1828 ms for 100 iterations
1126 ms for 100 iterations
1014 ms for 100 iterations
1021 ms for 100 iterations
1098 ms for 100 iterations
1134 ms for 100 iterations
1097 ms for 100 iterations
1053 ms for 100 iterations
1106 ms for 100 iterations
1033 ms for 100 iterations
1081 ms for 100 iterations
1049 ms for 100 iterations
1071 ms for 100 iterations
1029 ms for 100 iterations
1030 ms for 100 iterations
999 ms for 100 iterations
1049 ms for 100 iterations
991 ms for 100 iterations
1048 ms for 100 iterations
1037 ms for 100 iterations
1116 ms for 100 iterations
1134 ms for 100 iterations
1047 ms for 100 iterations
1102 ms for 100 iterations
1036 ms for 100 iterations
1035 ms for 100 iterations
1050 ms for 100 iterations
1079 ms for 100 iterations
979 ms for 100 iterations
done

@VladUreche
Copy link
Member

And yes, I can see the 10-20% slowdown. Looking into it now :)

@VladUreche
Copy link
Member

I looked at the bytecode and the JIT compilation process and they're identical across the generic and miniboxed versions of the code. The problem seems to be related to GC cycles: the miniboxed version of the code triggers more GC cycles, which in turn cause the 10-20% performance drop. I tried reasoning about the object size on the heap for miniboxed and generic TCons cells, but (at least on paper) they seem identical, despite the slightly different layout.

dsc04843

Right now, I would leave this pending since I don't see an obvious direction. You should get a much better performance when you start mapping over the TList, since there you'll have a clear benefit.

@julien-truffaut, WDYT?

@VladUreche
Copy link
Member

Since there was too much text, here's the link: https://gist.github.com/VladUreche/9faa054d78c00fd66f33

@VladUreche
Copy link
Member

Also the -verbose:gc results: https://gist.github.com/VladUreche/e8ce83f419534f378699

@julien-truffaut
Copy link
Author

@VladUreche thanks for looking into that. A few points, I would not rely on benchmarking with loops and System.currentTimeMillis it is very easy to get biased result. I will rerun the bench but I think the difference I observed is closer to 40-50%. Even if it was only a 20% slow down, I think it is a major problem because it is not limited to reverse, most List methods are impacted.

@VladUreche
Copy link
Member

@julien-truffaut, I have a proof that the slowdown is indeed caused by the different layout in miniboxed classes and its interaction with the GC.

Here's the proof: We can easily create a list using the miniboxed object-based versions of TNil and TCons, which are identical to the generic versions of TNil and TCons:

$ diff gh-bug-188-any.scala gh-bug-188.scala 
36c36
<     var list = (1 to 1000000).foldRight[TList[Any]](TNil())(TCons(_, _))
---
>     var list = (1 to 1000000).foldRight[TList[Int]](TNil())(TCons(_, _))

Then, running the benchmark on the generic version of the code (with -P:minibox:off) and the miniboxed object-based version produces the exact same numbers. Hence, the slowdown can be attributed to the miniboxed class layout compared to the generic and miniboxed but object-based TNil and TCons classes.

Now, I think you could use the same technique to debug slowdowns in the other benchmarks: you can change the input to generate the object miniboxed versions of the class in https://github.com/julien-truffaut/brique/blob/miniboxing/bench/src/main/scala/brique/bench/input/TListInput.scala:

  @Setup
  def setup(): Unit =
    tList = (range(size).foldRight[TList[Any]](TNil())(TCons(_, _))).asInstanceOf[TList[Int]]

I will leave the bug open until I have some time to look into all the slowdowns -- some may be legitimate, so I really want to understand the problem.

@VladUreche
Copy link
Member

A few points, I would not rely on benchmarking with loops and System.currentTimeMillis it is very easy to get biased result.

Sure, that makes sense. I wanted a self-contained (literally zero-dependency) benchmark to run in the debug build of the JVM.

I will rerun the bench but I think the difference I observed is closer to 40-50%.

Can you please run with the Any version as well for the input (described in the previous message)? I would be curious to see how much of the slowdown can be attributed to the in-memory class layout.
Maybe there are other slowdowns we can address.

Even if it was only a 20% slow down, I think it is a major problem because it is not limited to reverse, most List methods are impacted.

Yes, that's exactly why I left the bug open. Now I'm a bit pressed to write a paper (deadline is on the 26th of March, in 4 days from now) but as soon as I'm done with that I will look into the other slowdowns as well.

One more point: Keep in mind that Scala collections do all kinds of tricks to improve performance, for example the tail in List is a variable that can only be set by the builder. This does improve the performance when map-ping, filter-ing and flatMap-ping, since it removes the need to reconstruct the list: https://github.com/scala/scala/blob/2.11.x/src/library/scala/collection/immutable/List.scala#L439

@julien-truffaut
Copy link
Author

@VladUreche no problem take your time, I just wanted to be sure that we agreed on the importance of this issue.

Indeed scala list uses some tricks but we can outperform them almost everywhere (except map and take atm). I tried to use miniboxing in order to get even better performance but for the moment, it only improves sum, almost all other methods are negatively impacted :/

@VladUreche
Copy link
Member

Based on the numbers you gathered, I prepared a version of the spreadsheet comparing generic and miniboxed times:

https://docs.google.com/spreadsheets/d/1tTPKvkjqKVxXm6BeO0yLm1cFbTtkLGKrLg6LWrVyDZc/edit?usp=sharing

  • The last column shows miniboxed time divided by generic time
  • I color-coded the cells with green if the performance of miniboxed ops is better, yellow if the slowdown is at most 50% and red if it's above, to help us prioritize
  • I removed benchmarks with size = 10, because those tend to be too jittery to be useful (and are outliers most of the time)

Given the red areas, I think the main priority is to check drop, eq and flatMap.

And yes, I agree it's important to have a good idea of what is happening -- for some slowdowns we may not be able to do much, since they're not under our control, but at the very least I must provide a proper explanation of each slowdown and why it occurs.

Btw, I'm really looking forward to seeing the numbers gathered with TList[Any]. Those should lie in the middle between generic and miniboxing times and should give us an idea of how we stand on each benchmark.

@VladUreche
Copy link
Member

One more thing that came to mind: could you also prepare macro-benchmarks of the TList operations? What I've seen in some cases is that for microbenchmarks the JVM inlines methods aggressively and removes all overheads thanks to "clean" type profiles. But once you use the collection with multiple functions, types of values etc, the performance suddenly drops. So it would be really nice to have an idea of if/how the performance changes in macrobenchmarks.

@julien-truffaut
Copy link
Author

I will try to prepare some banchmarks this week or over the weekend.

@VladUreche
Copy link
Member

Thanks a lot @julien-truffaut!

@julien-truffaut
Copy link
Author

sorry pretty busy atm, I will have to postpone this.

@VladUreche
Copy link
Member

@julien-truffaut, no worries. I will have more time now, so I can isolate the benchmarks myself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants