Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FromCacheAsync not async all the way through #98

Closed
nphmuller opened this issue Dec 12, 2016 · 10 comments
Closed

FromCacheAsync not async all the way through #98

nphmuller opened this issue Dec 12, 2016 · 10 comments

Comments

@nphmuller
Copy link
Contributor

First of all: sorry for all the issues and comments lately. Been thoroughly checking EFPlus out to see if we can incorporate it into our projects.

I checked out the FromCache async extension methods (on EF6 net462), and I noticed two things:

  1. The cache call is wrapped in Task.Run, so the result is not retrieved async.
    This can be expected, because in netfull ObjectCache is used, which does have async overloads. And the only equivalent that does have async overloads is IDistributedCache, which is .NET Core only. Sadly this would make it impossible to implement, for example, Redis in an async way.
  2. If the cache hit misses, the actual database call is also not async (Example: ToList() is used instead of ToListAsync().

Am I correct here, or missing something? If I'm correct you can see this issue as a feature request. 😀

@zzzprojects zzzprojects self-assigned this Dec 12, 2016
@zzzprojects
Copy link
Collaborator

Hello @nphmuller ,

Firstly, don't be sorry for opening issues ;)

I believe you are partially right, but we will explain why we did it this way.

Task.Run
Using Task.Run make the code to run asynchronously. A task is created, and the main thread continues the job while the task is trying to retrieve the result in another thread.

Using Task.Run is similar to using an async method. So yes, the code retrieve the result asynchronously.

ToList vs ToListAsync
We did it on purpose. Async may look cool, but it also comes with some issue which can severely impact the application performance.

Since the call is using inside a Task.Run, that make the ToList method to run asynchronously but without having to use ToListAsync() which can cause performance issue.

It doesn't matter if "Async" method are called or not, as long as the method is invoked in another thread (using Task.Run) to make it asynchronously.

Here is why we don't want to use ToListAsync:
http://stackoverflow.com/questions/28543293/entity-framework-async-operation-takes-ten-times-as-long-to-complete/28619983#28619983

Let me know if I answered correctly to your both questions.

Best Regards,

Jonathan

@nphmuller
Copy link
Contributor Author

Sure, the calls will still be ran in another thread, and that's really handy for a frontend application (freeing up the gui thread). For a backend application however it does not really matter if the call is executed on thread 1 or thread 2. The depletion of the threadpool is way more important.

If I understand it correctly, Task.Run still uses a thread on the threadpool, but just another one. This has basically no benefit and only adds to overhead of creating the task objects and running the function on another thread. If used correctly (via ToListAsync or GetKeyAsync) the call would free up the threadpool, until the database or redis is ready and then use another thread (could be the same or a different one) to return the result. This is the true benefit of async/await, IMO. And, for me, it is a more important benefit than the downside of a small overhead.

Another benefit is that EF supports cancellation, which is also great for performance, since you can stop a call halfway-through if it turns out the client does not want the result anymore.

The link on performance you mentioned is an edge case in where data types are returned from the database that are pretty big (images, huge texts, etc). (At least, if I understand it correctly) Normally async's overhead is actually pretty small (en it can even be faster than sync on some insert or update queries).
See this link for the comments of EF's team on the issue. And this link for a performance test with smaller amounts of data (varchar(256), int).

And in the edge cases where there's this perfomance issue, the developer can choose to use the sync version of the call. I think it's only confusing if the async version actually behaves like the sync version under the hood.

Taking these point in account, do the async cache extension methods actually add any benefit for a backend application? It looks like I could just use the sync methods. Or, if I really want async all-the-way, retrieve my own results and add them to the cache with QueryCacheManager.Add, etc.

By the way, I'm not hating on your library. Again: it's looking great and could save me a LOT of time. Just trying to understand it correctly and to get it as best as possible. :)

@zzzprojects
Copy link
Collaborator

zzzprojects commented Dec 13, 2016

I wish asynchronous was easier to understand and explain.

I will re-read your post later today and make sure before answering you.

Meanwhile, here is a small example why I did it this way. I believe Async method (from some Microsoft method, not your) have so much overhead that's not worth using them and Task.Run give better performance.

Here is the result from the code below when using with VARCHAR(MAX) on 6000 entities
-ToList(): 97ms
-ToListAsync(): 1064ms
-Task.Run + ToList(): 88ms

ToList and Task.Run take essentially the same time. It's random which one will be the faster!

Here is the result from the code below (small modification) when using with VARCHAR(50) and 50 characters on 6000 entities

  • ToList(): 28ms
  • ToListAsync(): 45ms
  • Task.Run + ToList(): 17ms

I may have done a mistake, but it seems ToListAsync is already the worse method to use and become worse as the packet size growth.

The problem is NVARCHAR(MAX) is used by default for Code First user, and generally a table contains multiples columns, not only an Id and an NVARCHAR columns.

I will make more test but it seems the Task.Run overhead is way better/faster than using ToListAsync

using System;
using System.Collections.Generic;
using System.Data.Entity;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace Z.EntityFramework.Plus.Lab
{
    public partial class Form_Request_Async : Form
    {
        public Form_Request_Async()
        {
            InitializeComponent();
            button1_Click(null, null);
        }

        private void button1_Click(object sender, EventArgs e)
        {
            var clock1 = new Stopwatch();
            var clock2 = new Stopwatch();
            var clock3 = new Stopwatch();

            // CLEAN
            using (var ctx = new EntityContext())
            {
                ctx.EntitySimples.RemoveRange(ctx.EntitySimples);
                ctx.BulkSaveChanges();
            }

            // SEED
            using (var ctx = new EntityContext())
            {
                var list = new List<EntitySimple>();

                for (var i = 0; i < 6000; i++)
                {
                    // list.Add(new EntitySimple() {Test = new string('z', 50) }); // Async method take twice the time even with a varchar(50) columns
                    list.Add(new EntitySimple {Test = new string('z', i)}); // Async method take more than ten the times of non async method or Task.Run
                }

                ctx.EntitySimples.AddRange(list);
                ctx.BulkSaveChanges();
            }

            // TEST
            using (var ctx = new EntityContext())
            {
                // JIT Compile everything...
                {
                    var list = ctx.EntitySimples.ToList();
                    var list2 = ctx.EntitySimples.ToListAsync().Result;
                    var list3 = Task.Run(() => ctx.EntitySimples.ToList()).Result;
                }

                {
                    clock1.Start();
                    var list = ctx.EntitySimples.ToList();
                    clock1.Stop();

                    clock2.Start();
                    var list2 = ctx.EntitySimples.ToListAsync().Result;
                    clock2.Stop();

                    clock3.Start();
                    var list3 = Task.Run(() => ctx.EntitySimples.ToList()).Result;
                    clock3.Stop();
                }
            }

            StringBuilder sb = new StringBuilder();

            sb.AppendLine("ToList(): " + clock1.ElapsedMilliseconds);
            sb.AppendLine("ToListAsync(): " + clock2.ElapsedMilliseconds);
            sb.AppendLine("Task.Run + ToList(): " + clock3.ElapsedMilliseconds);

            this.richTextBox1.Text = sb.ToString();
        }


        public class EntityContext : DbContext
        {
            public EntityContext() : base("CodeFirstEntities")
            {
            }

            public DbSet<EntitySimple> EntitySimples { get; set; }


            protected override void OnModelCreating(DbModelBuilder modelBuilder)
            {
                modelBuilder.Types().Configure(x => x.ToTable(GetType().DeclaringType != null ? GetType().DeclaringType.FullName.Replace(".", "_") + "_" + x.ClrType.Name : ""));

                base.OnModelCreating(modelBuilder);
            }
        }

        public class EntitySimple
        {
            public int Id { get; set; }
            public string Test { get; set; }
        }
    }
}

@nphmuller
Copy link
Contributor Author

ToList and Task.Run take essentially the same time. It's random which one will be the faster!

That's because Task.Run is the same as ToList. It executes the same call. Wrapping it in Task.Run has basically no additional benefits (from a backend perspective), and actually introduces some efficiency problems. (see the 2nd link)

My suggestion at this point would be this:
-Offer sync methods, which you're already doing.
-Offer async methods, if they are truly async. Or don't offer them at all.

The async methods currently are only confusing for the user, because they imply it's truly async, while it's actually not. Leave the choice for picking the more efficient way (sync vs async) to the user, because there are cases where the async way has some big benefits. (like the threadpool, which doesn't deplete as quick, which I mentioned previously)

These might be some interesting reads:

PS: If you're looking at (at least for me) more daily use cases, you don't usually retrieve 6000 records at once. When you're only returning a few records, the difference in execution time is (imo) negligible.

Thanks for looking into it! :)

@zzzprojects
Copy link
Collaborator

Thank you for your link.

I will for sure read them. I just need to convince myself and make sure I understand correctly the difference and why I should always use async task over Task.Run.

Btw, this may not be your case but most applications I have seen, developers I have talked, stack overflow questions I have answered often use way more than 6000 records for the caching. But hey! They were already using ToListAsync before, so they will not really be penalized anyway

Well, I guess I have some reading to do now ;)

@nphmuller
Copy link
Contributor Author

I meant retrieving 6000 records at once, btw. Not caching 6000 records (which should be a lot more common). But you're right. I shouldn't assume my use case is the main use case. ;)
Looking forwards to your conclusions!

@zzzprojects
Copy link
Collaborator

I have read a lot about async vs. Task.Run, I still want to read some more, but so far, I agree with you.

I believe Task.Run is sometimes fine when too much complexity is involved. By example, Entity Framework Extensions also use Task.Run for BulkInsertAsync. Making all the code truly async is way too more complex. It works, but not optimized for the thread pool.

Even more, using truly async method often introduce some UI deadlock because one method with await miss the call to ConfigureAwait(false).

However, for simple functionality, like EF+ Query Cache, I agree with you. If a truly async method is already coded and the code will stay simple, better to use it!

That still bug me:

If I use ToListAsync method, the thread pool will be better managed. But some query may take 10x the times to execute... So have we really improved or decreased the thread management since a thread will take more time to be completed?


Async method can be faster than sync method

http://www.thereformedprogrammer.net/analysis-of-entity-framework-6-async-performance/

Be careful when reading posts with benchmark test. Like you know, the code often contains many major mistakes.

Using the same code as him, I can prove that my:

  • sync method is faster than my sync method…
  • async method is faster than my async method…

Only by calling the same method twice! The test didn’t take into consideration JIT Compilation, QueryCacheManager and other optimization done by Entity Framework. So obviously, sync method called first was penalized versus async method called last!

At the end, it makes no senses an async method go faster than the sync method. Unless the sync method is poorly coded since async method can only add some overhead!

With the same code as him, calling the same method twice I dramatically "improved" the performance:

EF, with 10 in database -----------------------
Ran List all, Ef Direct: total time = 18 ms (1.8 ms per action)
Ran List all, Ef Dto: total time = 188 ms (18.8 ms per action)
Ran Create, Ef Direct: total time = 25 ms (2.5 ms per action)
Ran Update, Ef Direct: total time = 124 ms (12.4 ms per action)
Ran Delete, Ef Direct: total time = 23 ms (2.3 ms per action)

EF, with 10 in database -----------------------
Ran List all, Ef Direct: total time = 2 ms (0.2 ms per action)
Ran List all, Ef Dto: total time = 10 ms (1.0 ms per action)
Ran Create, Ef Direct: total time = 19 ms (1.9 ms per action)
Ran Update, Ef Direct: total time = 28 ms (2.8 ms per action)
Ran Delete, Ef Direct: total time = 15 ms (1.5 ms per action)

@zzzprojects
Copy link
Collaborator

zzzprojects commented Dec 18, 2016

Hello @nphmuller ,

As you suggested, we modified the code for EF6 and EFCore to use async method instead. EF5 doesn't have async method, so we keep using Task.Run

The v1.4.17 has been released.

We will try to check other features as well when time permit.

Let us know if we can close this issue.

Best Regards,

Jonathan

@nphmuller
Copy link
Contributor Author

Yeah, you can close it.
Thanks for the effort! ;)

@zzzprojects
Copy link
Collaborator

Closing Comment: Confirmed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

1 participant