Slow synchronization with Mailkit for yahoo, hotmail #650

Closed
ekalchev opened this issue Feb 23, 2018 · 18 comments
Labels
enhancement New feature or request


@ekalchev
Contributor

ekalchev commented Feb 23, 2018

We notice a significant performance drop when using MailKit's IMAP client compared to Outlook or other email clients. I believe it is related to the latency between issuing an IMAP command and receiving the response, which is high for some servers (Yahoo, Hotmail). For example, with Yahoo IMAP there is a 1-3 sec delay between executing a command and getting back the response (tag FETCH 1 (BODY[])). We are trying to reduce the number of IMAP commands to speed up synchronization. Currently we use the GetStream method to download full messages, and we call it once for each message in the mailbox. For a mailbox with 200 emails that issues 200 IMAP FETCH commands, which add up to 200-600 sec of delay just waiting for the server's responses (not counting the transfer of the actual data).
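A rough model of where the time goes, as a sketch: assuming each FETCH command pays one fixed round-trip delay before its data starts flowing (the 1-3 sec quoted above) and ignoring transfer time, the waiting time scales with the number of commands, not the number of messages. The function name and the fixed-latency model are illustrative assumptions.

```python
import math


def round_trip_wait(n_messages: int, latency_s: float, batch_size: int = 1) -> float:
    """Seconds spent waiting on command round trips, ignoring transfer time.

    Assumes every FETCH command pays one fixed latency before its data arrives,
    and that a batch of up to `batch_size` messages is sent as one command.
    """
    if n_messages < 0 or batch_size < 1:
        raise ValueError("n_messages must be >= 0 and batch_size >= 1")
    commands = math.ceil(n_messages / batch_size)
    return commands * latency_s


# One FETCH per message (a per-message GetStream loop) vs. 20 messages per FETCH.
for latency in (1.0, 3.0):
    print(f"{latency:.0f} s latency, 1 msg/command:  {round_trip_wait(200, latency):.0f} s")
    print(f"{latency:.0f} s latency, 20 msg/command: {round_trip_wait(200, latency, 20):.0f} s")
```

Under this model, batching 20 messages per command cuts 200-600 sec of waiting down to 10-30 sec.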

I can see that commercial email clients like Outlook achieve 10-20 times faster synchronization when they are configured to download full messages (not only message summaries). I believe they download message bodies in batches, like
tag FETCH 1:20 (BODY[])
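To issue fetches like the one above from a list of message numbers, the numbers have to be collapsed into an IMAP sequence set (RFC 3501 syntax such as `1:20` or `1:5,8,10:12`) and split into bounded batches so that no single response grows without limit. A minimal sketch; the helper names, the `A001`-style tags, and the batch size of 20 are arbitrary choices, not MailKit behavior.

```python
from typing import Iterable, Iterator, List


def to_sequence_set(numbers: Iterable[int]) -> str:
    """Collapse message numbers into an IMAP sequence set, e.g. [1, 2, 3, 5] -> '1:3,5'."""
    nums = sorted(set(numbers))
    if not nums:
        raise ValueError("cannot build an empty sequence set")
    ranges: List[str] = []
    start = prev = nums[0]
    for n in nums[1:]:
        if n == prev + 1:
            prev = n  # still inside a contiguous run
            continue
        ranges.append(f"{start}:{prev}" if start != prev else str(start))
        start = prev = n
    ranges.append(f"{start}:{prev}" if start != prev else str(start))
    return ",".join(ranges)


def batched_fetch_commands(numbers: Iterable[int], batch_size: int = 20,
                           items: str = "(BODY[])") -> Iterator[str]:
    """Yield one tagged FETCH command per batch of at most `batch_size` messages."""
    nums = sorted(set(numbers))
    for i in range(0, len(nums), batch_size):
        tag = f"A{i // batch_size + 1:03d}"
        yield f"{tag} FETCH {to_sequence_set(nums[i:i + batch_size])} {items}"


for command in batched_fetch_commands(range(1, 46)):
    print(command)
# A001 FETCH 1:20 (BODY[])
# A002 FETCH 21:40 (BODY[])
# A003 FETCH 41:45 (BODY[])
```

Note that `BODY[]` implicitly sets the `\Seen` flag; `BODY.PEEK[]` is the RFC 3501 variant that leaves flags untouched, which a background sync usually wants.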

Some performance figures for Yahoo IMAP, using console commands:

tag FETCH 1 (BODY[]) - 6 sec to complete
tag FETCH 1:2 (BODY[]) - 7 sec to complete
tag FETCH 1:20 (BODY[]) - 11 sec to complete
tag FETCH 1:50 (BODY[]) - 25 sec to complete
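Dividing each measured time by the number of messages makes the effect explicit: the fixed per-command cost dominates small fetches, and the amortized cost drops to roughly half a second per message once the batch is large. The only data used are the four measurements above.

```python
from typing import List, Tuple

# (messages fetched, seconds to complete), taken from the Yahoo measurements above
MEASUREMENTS: List[Tuple[int, float]] = [(1, 6), (2, 7), (20, 11), (50, 25)]


def seconds_per_message(measurements: List[Tuple[int, float]]) -> List[float]:
    """Amortized cost per message for each (count, total seconds) measurement."""
    return [total / count for count, total in measurements]


for (count, total), per_msg in zip(MEASUREMENTS, seconds_per_message(MEASUREMENTS)):
    print(f"{count:>2} message(s): {total:>2} s total, {per_msg:.2f} s per message")
```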

Downloading bodies in batches is clearly much faster, but I can't find a way to do this with MailKit. I triple-checked the MailKit methods and can't find an overload that allows it. How hard would it be to implement such an overload myself? Or do you have any advice on how to speed up synchronization with high-latency servers?

@ekalchev ekalchev changed the title Slow synchronization with Mailkit Slow synchronization with Mailkit for yahoo, hotmail Feb 23, 2018
@jstedfast
Owner

What makes you think they are downloading full messages? You don't need to download all of the messages to show a message-list in an email client.

Use the Fetch() method to batch request the summary information.
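For comparison with the per-message body fetches above, a batched summary request is a single command whose data items are standard RFC 3501 FETCH attributes, and the server streams back one untagged FETCH response per message. Which attributes MailKit's Fetch() actually sends depends on the MessageSummaryItems flags passed in, so the field names and the mapping below are a simplified, hypothetical illustration.

```python
from typing import List

# Hypothetical field names mapped to standard RFC 3501 FETCH data items.
SUMMARY_ITEMS = {
    "uid": "UID",
    "flags": "FLAGS",
    "envelope": "ENVELOPE",
    "size": "RFC822.SIZE",
    "internal_date": "INTERNALDATE",
    "body_structure": "BODYSTRUCTURE",
}


def summary_fetch_command(tag: str, sequence_set: str, fields: List[str]) -> str:
    """Build one FETCH command returning summary data for every message in the set."""
    unknown = [f for f in fields if f not in SUMMARY_ITEMS]
    if unknown:
        raise ValueError(f"unknown summary fields: {unknown}")
    items = " ".join(SUMMARY_ITEMS[f] for f in fields)
    return f"{tag} FETCH {sequence_set} ({items})"


print(summary_fetch_command("A001", "1:200", ["uid", "flags", "envelope", "body_structure"]))
# A001 FETCH 1:200 (UID FLAGS ENVELOPE BODYSTRUCTURE)
```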

@ekalchev
Contributor Author

ekalchev commented Feb 23, 2018

We are implementing a mode where the user can choose to download full messages, not only the summaries. Most email clients have this option. We already have a synchronization mode that downloads message bodies on demand (when the user tries to open the actual message).

With Outlook it is easy to verify that the full message is downloaded: disconnect from the internet and check whether you still have access to the attachments and email text. Almost all email clients have this option.

@jstedfast
Owner

They download using multiple connections, not just 1.

The problem with offering an API to batch download an unlimited number of full messages is, well, you don't have infinite memory, and most users of my library do not understand this.

I've had complaints from users who download 1 message at a time, add each one to a List<MimeMessage>, and then complain that they eventually get an OutOfMemoryException.

I'm sure you can understand my hesitation with implementing this feature :)

@ekalchev
Contributor Author

Multiple connections was my first guess, but I looked at Outlook's connections with TCPView and there was clearly only one IMAP connection open. I believe I'll manage to add the functionality I need myself.

@jstedfast
Owner

It should be trivial to do. I modified the internal code to make this easier a few weeks ago.

@ekalchev
Contributor Author

This seems to be working: ekalchev@af7cd36
I'd be happy if you took a look. It is a proof of concept, so the interface is rough, but I need to know whether I am on the right path and not missing something.

I am getting 20-50 times faster synchronization with Yahoo when downloading full messages.

@jstedfast
Owner

Yep, looks like your code is functionally correct.

I thought about this a bit last night and realized that returning an IList<MimeMessage> would be terrible, because there'd be no way to know which message mapped to which UID and/or index, and that mapping is pretty important for IMAP clients (unless the "client" is just using IMAP like a POP3 server).

I also got to thinking that, when a caller requests a list of messages by index, it may be a good idea to also provide the UID for each of those messages, since the client will most likely want to cache these messages by UID. Caching by index is useless, and using the index to look up a UID in a previously cached IMessageSummary list is somewhat racy if you aren't correctly listening to all events, or if the API returned a list of messages rather than using a real-time callback approach.
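To see why caching by index is fragile, here is a toy folder model (hypothetical, not MailKit's types): a message sequence number is just a position, so expunging one message renumbers every message after it, while UIDs stay put.

```python
from typing import Dict, List


class ToyFolder:
    """Toy IMAP folder: sequence number = 1-based position; the UID is stable."""

    def __init__(self, uids: List[int]) -> None:
        self.uids = list(uids)  # uids[i] is the UID of sequence number i + 1

    def uid_at(self, seq: int) -> int:
        return self.uids[seq - 1]

    def expunge(self, seq: int) -> None:
        # Removing a message shifts every later message down by one position.
        del self.uids[seq - 1]


folder = ToyFolder([101, 102, 103, 104])

# Cache bodies keyed by UID (robust) and by sequence number (fragile).
by_uid: Dict[int, str] = {folder.uid_at(s): f"body of {folder.uid_at(s)}" for s in range(1, 5)}
by_index: Dict[int, str] = {s: f"body of {folder.uid_at(s)}" for s in range(1, 5)}

folder.expunge(2)  # UID 102 goes away; UIDs 103 and 104 become sequence numbers 2 and 3

print(by_uid[folder.uid_at(2)])  # body of 103  (correct message)
print(by_index[2])               # body of 102  (stale: wrong message)
```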

@ekalchev
Contributor Author

ekalchev commented Feb 24, 2018

Yeah, returning an IList is not a good idea. In addition to what you said, it also means a lot of allocated bytes. With streams it is different, though: if you don't return the streams but instead raise an event when each stream becomes available, the user can read the stream (save it to a file or database) and dispose of it, or reuse the same memory with Microsoft.IO.RecyclableMemoryStream. This way you won't waste much memory on a large number of messages and still keep the flexibility of fetching message parts with a single FETCH command. You'll need to pair each stream with its UID or index somehow, though.
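The event-per-stream idea can be sketched with a callback that receives the index, the UID, and a stream for each message; the stream is closed as soon as the callback returns, so the fetch loop itself keeps no reference to any body. The tuples below stand in for already-parsed untagged FETCH responses, and every name here is hypothetical, not MailKit's API (the commit later in this thread added MailKit's own callback-based method).

```python
import io
import os
import tempfile
from typing import BinaryIO, Callable, Iterable, Tuple

# (sequence number, UID, raw message bytes), as if parsed from untagged FETCH responses
Response = Tuple[int, int, bytes]
StreamCallback = Callable[[int, int, BinaryIO], None]


def fetch_streams(responses: Iterable[Response], callback: StreamCallback) -> int:
    """Hand each message to `callback` as a stream, then close it. Returns the count."""
    count = 0
    for index, uid, raw in responses:
        stream = io.BytesIO(raw)
        try:
            callback(index, uid, stream)  # caller pairs the body with index and UID
        finally:
            stream.close()  # the loop does not hold on to the body
        count += 1
    return count


def save_to_dir(directory: str) -> StreamCallback:
    """Build a callback that writes each message to <directory>/<uid>.eml."""
    def on_message(index: int, uid: int, stream: BinaryIO) -> None:
        with open(os.path.join(directory, f"{uid}.eml"), "wb") as out:
            out.write(stream.read())
    return on_message


fake_responses = [
    (1, 101, b"Subject: first\r\n\r\nhello\r\n"),
    (2, 105, b"Subject: second\r\n\r\nworld\r\n"),
]

with tempfile.TemporaryDirectory() as tmp:
    n = fetch_streams(fake_responses, save_to_dir(tmp))
    print(n, sorted(os.listdir(tmp)))  # 2 ['101.eml', '105.eml']
```

A real implementation would read each body from the socket rather than from an in-memory list; the point of the design is that memory use is bounded by what the callback chooses to keep.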

Here is another use case that cannot be done efficiently with MailKit, and it doesn't involve downloading a full message.

[Screenshot: email client message list with the message preview text highlighted in red]

See the message preview highlighted in red; it is part of most email clients. This preview is actually the first N characters of the TEXT body part of the message. To implement this with MailKit, I can pull the IMessageSummary items in a batch of 30 messages, which is very efficient. However, there is no API to pull those 30 TEXT parts as a single batch, so I need to execute 30 commands of the form FETCH index (BODY.PEEK[TEXT]). Waiting for those 30 fetches takes about 40-60 sec with Yahoo, so the user experiences a very slow synchronization even though I am not downloading the entire message. I understand that designing an API for this is challenging, but these are very common use cases that cannot be implemented efficiently with MailKit. I hope you can come up with something to cover this in the future.

@jstedfast
Owner

You probably don't want to use BODY.PEEK[TEXT] for that. I think what you actually want is to fetch the IMessageSummary.TextBody part's .TEXT part.

I've been trying to think of a way to do that and have been somewhat waiting for the IMAP working group to standardize a way to get this. There was talk a few years back about an IMAP extension for this, but I have yet to see a draft/RFC for it.

Maybe it's time to give up waiting...

jstedfast added a commit that referenced this issue Feb 24, 2018
@jstedfast jstedfast added the enhancement New feature or request label Feb 24, 2018
@jstedfast
Owner

I've just committed a patch to add support for batch requesting message streams using a callback approach. I think that will satisfy your needs.

About the message text blurb: how many characters do you actually need? I was thinking about adding a MessageSummaryItem that I could use to mean "get the message text blurb".

It would be a less awkward API than providing a GetStreams() method that takes a list of UIDs & BodyPart specifiers.

The problem is that it means I either need a new set of Fetch() methods that now also take a "blurbLength" argument or else I hard-code it to something (with the possibility of making it overridable if you subclass ImapClient or something?).

@jstedfast
Copy link
Owner

I was thinking of just requesting 1024 bytes, but that seems overkill. Maybe 256 bytes?
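For a fixed-size blurb, RFC 3501 partial fetches (`BODY.PEEK[section]<origin.octets>`) let the server return only the first N octets of a part. A sketch, assuming the text part's section specifier (for example `1` or `1.1`) is already known from BODYSTRUCTURE. Because the limit is in bytes, a cut can land in the middle of a multi-byte UTF-8 character, so the decoder has to tolerate that; decoding transfer encodings (quoted-printable, base64) is ignored here. Both function names are hypothetical.

```python
def preview_fetch_item(section: str, octets: int = 256) -> str:
    """FETCH data item for the first `octets` bytes of a body part, without setting the Seen flag."""
    if octets <= 0:
        raise ValueError("octets must be positive")
    return f"BODY.PEEK[{section}]<0.{octets}>"


def decode_preview(raw: bytes, max_chars: int = 100) -> str:
    """Decode a truncated UTF-8 chunk, ignoring undecodable bytes such as a
    character cut off at the end, then collapse whitespace and line breaks."""
    text = raw.decode("utf-8", errors="ignore")
    return " ".join(text.split())[:max_chars]


print(preview_fetch_item("1.1"))  # BODY.PEEK[1.1]<0.256>
# The first 3 bytes of "Grüße" end halfway through the 2-byte "ü".
print(decode_preview("Grüße aus Sofia".encode("utf-8")[:3]))  # Gr
```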

@ekalchev
Contributor Author

ekalchev commented Feb 24, 2018

Thanks for addressing this.

In my opinion you shouldn't plug that into the Fetch methods. It won't be clear that the Fetch method is actually issuing 2 IMAP FETCH commands.

Why not add a new GetBlurbs method that does that? It could take IMessageSummary items and populate them, or something along those lines; that would solve the index/UID mapping problem.

@jstedfast
Owner

The code was buggy, but it is now fixed and I've added unit tests for it.

@jstedfast
Owner

I've just committed a way to get the "preview text" of a message using the Fetch() APIs by passing in MessageSummaryItems.PreviewText.

@ekalchev
Contributor Author

ekalchev commented Feb 27, 2018

Great! I am glad you group FETCH commands for mail summaries that share the same part specifier for the TEXT part. That will be very efficient.
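The grouping described here can be sketched as follows: messages whose text part has the same section specifier can share one UID FETCH command, so the number of round trips equals the number of distinct specifiers rather than the number of messages. The input mapping (UID to section) is assumed to come from previously fetched BODYSTRUCTURE data; this is an illustration, not MailKit's internal code, and it writes UID sets as plain comma lists rather than compact ranges.

```python
from collections import defaultdict
from typing import Dict, List


def grouped_preview_commands(text_sections: Dict[int, str], octets: int = 256) -> List[str]:
    """One UID FETCH per distinct text-part section specifier."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for uid, section in text_sections.items():
        groups[section].append(uid)
    commands = []
    for n, (section, uids) in enumerate(sorted(groups.items()), start=1):
        uid_list = ",".join(str(u) for u in sorted(uids))  # valid, if not compact, UID set
        commands.append(f"A{n:03d} UID FETCH {uid_list} (BODY.PEEK[{section}]<0.{octets}>)")
    return commands


sections = {201: "1", 202: "1.1", 203: "1", 204: "1.1", 205: "1"}
for command in grouped_preview_commands(sections):
    print(command)
# A001 UID FETCH 201,203,205 (BODY.PEEK[1]<0.256>)
# A002 UID FETCH 202,204 (BODY.PEEK[1.1]<0.256>)
```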

My concern is the use of UID FETCH. As you know, FETCH is different from UID FETCH:

https://tools.ietf.org/html/rfc3501#page-73

Note: UID FETCH, UID STORE, and UID SEARCH are different
commands from FETCH, STORE, and SEARCH. An EXPUNGE
response MAY be sent during a UID command.

Until now, the ImapFolder.Fetch overload that accepts indexes guaranteed that the indexes of existing items would not change as a result of the call. Now, if you pass MessageSummaryItems.PreviewText, it will issue a hidden UID FETCH, during which the server can send EXPUNGE. Callers would then have to handle EXPUNGE, which is not apparent from the method's signature, and this will cause many issues in existing code that relies on indexes being preserved after a Fetch(index...) call.
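Handling an EXPUNGE that arrives during a UID FETCH comes down to renumbering. Per RFC 3501, `* n EXPUNGE` removes message n and immediately decrements the sequence number of every later message, and subsequent EXPUNGE responses already use the new numbering, so index-keyed results must be adjusted in arrival order. A sketch with a plain dict standing in for index-keyed summaries (hypothetical names, not MailKit's FetchSummaryContext):

```python
from typing import Dict, Iterable


def apply_expunges(by_index: Dict[int, str], expunged: Iterable[int]) -> Dict[int, str]:
    """Apply untagged EXPUNGE responses, in arrival order, to an index-keyed map."""
    current = dict(by_index)  # leave the caller's map untouched
    for seq in expunged:
        current.pop(seq, None)  # the expunged message itself disappears
        # every message with a higher sequence number shifts down by one
        current = {(i - 1 if i > seq else i): v for i, v in current.items()}
    return current


summaries = {1: "uid 101", 2: "uid 102", 3: "uid 103", 4: "uid 104"}
# "* 2 EXPUNGE" twice: the first removes uid 102, the second removes uid 103 (by then #2)
print(apply_expunges(summaries, [2, 2]))  # {1: 'uid 101', 2: 'uid 104'}
```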

@jstedfast
Owner

Yeah, I noticed this myself yesterday. It's something I will need to fix even without this new feature, since untagged FETCH responses could come in the middle of existing UID FETCH commands as well.

I think what I'll have to do is add an EXPUNGE event listener to the FetchSummaryContext and connect it for the UID FETCH requests.

In some ways I wish I had designed the Fetch() methods to take a callback instead of tying them to returning a list of summaries (which I did for simplicity of use).

ImapFolder does have a MessageSummaryFetched event which could be used (although I need to make a slight adjustment to when I emit those to make it immune to the EXPUNGE problem, but that's just 1 line of code).

@jstedfast
Owner

Ok, I think my latest commit should deal with that scenario.

I'll write unit tests later tonight when I have some free time.

@jstedfast
Owner

MailKit 2.0.2 has been released with these features.

2 participants