Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exponential performance decrease in Safari #82

Closed
digitalmoksha opened this issue Oct 2, 2015 · 15 comments
Closed

Exponential performance decrease in Safari #82

digitalmoksha opened this issue Oct 2, 2015 · 15 comments
Assignees
Labels
remark 🦋 type/enhancement This is great to have
Milestone

Comments

@digitalmoksha
Copy link

I'm using mdast (and mdast-ranges) to generate positional information in order to do syntax highlighting. In general it works really well.

However, as the file gets larger, both processing time and memory consumption seems to grow significantly, almost exponentially.

For example, if I take the text from the latest OS X developer release notes (https://developer.apple.com/library/mac/releasenotes/AppKit/RN-AppKit/) and run it through Markify to get Markdown (http://heckyesmarkdown.com), I can take that and run it through mdast.parse to get the ast. That take 1.3 secs.

If I paste another copy of the source, it takes 4.63 secs. Doing that again (so that there are now 3 copies of the original text), it jumps to 10.5 secs. I start seeing the same exponential increase in memory usage (on OS X).

Do you have any ideas? I thought about creating a plugin that converts into a minimal ast - with the absolute position information, you only need the leaf nodes. But I'm not sure if this would actually reduce the computation time or not.

@wooorm
Copy link
Member

wooorm commented Oct 2, 2015

Oh interesting! No I have no idea, I’d be interesting in knowing why though: I can imagine there being several points where mdast could be optimised.

If you’d like to check out some of this, you could be able to use your dev-tools’s timeline recorder (that’s what it’s called in Safari at least) to do some tests on http://mdast.js.org/demo.

@wooorm wooorm changed the title Syntax highlighting and building ast for large files... Exponential performance decrease Oct 2, 2015
@digitalmoksha
Copy link
Author

It's the strangest thing. In Safari (on the demo page), pasting my text in there, building the ast takes over 30 secs, pegs a cpu core, and uses over 1.6gb of memory.

Both Chrome and Firefox complete in under 1s.

Profiling on Safari shows time spent mostly in later() in debounce(). But doesn't look it has anything to do with the parsing.

@wooorm
Copy link
Member

wooorm commented Oct 3, 2015

Weird, Safari is indeed hugely bottlenecked; in fact, I don’t even see an exponential performance decrease in Chrome.

@digitalmoksha
Copy link
Author

Any ideas where it might be hanging up in Safari?

@wooorm
Copy link
Member

wooorm commented Oct 3, 2015

Nope, currently don’t have the time to look deeply into it 😦

@wooorm wooorm changed the title Exponential performance decrease Exponential performance decrease in Safari Oct 3, 2015
@digitalmoksha
Copy link
Author

So I've done some investigation on this. It looks like the slow down is from the regular expressions. Profiling in Safari, I found that the majority of time is in the RexExp.exec() call. From a 14sec run, 13.6 were spent in exec().

I took the gfm table regex, and put it into jsperf. That is showing the significant difference between Safari and Chrome.

I found one possible explanation (and a reported bug in webkit):

http://stackoverflow.com/questions/19680266/firefox-bad-regex-performance

https://bugs.webkit.org/show_bug.cgi?id=122891

Not being a regex ninja, I don't know if there is anyway to optimize or fix this. Unfortunately, I'm using mdast in a shipping OS X product (from a webview), and this issue is starting to cause me a lot of problems.

wooorm added a commit that referenced this issue Oct 6, 2015
@wooorm
Copy link
Member

wooorm commented Oct 6, 2015

I removed the variable capturing groups ()* and )+), but these didn’t occur in the tables regex.
I still see a large time in exec, which isn’t weird per sé as it’s the tokeniser. The exponential performance decrease in just Safari is however still very weird and I still have no clue why that is :/

@digitalmoksha
Copy link
Author

Cool, I'll check it out and see if I notice any performance difference in general.

btw, I forgot to mention in my previous comment that I created a JSPerf that illustrated the performance issue.

http://jsperf.com/complex-regexp-exec-between-safari-chrome

@digitalmoksha
Copy link
Author

I tested with a 1.x version, 2.0, and the current master, with Safari 9.0.1. For the data set I used, they all took 50s and 1.7gb (except master, which took 49secs). Chrome, nearly instantaneous. Ugh.

If you're really able to rewrite not using regex, as you mentioned in #42, it may very well solve this issue. But I can't imagine that would be trivial.

@wooorm
Copy link
Member

wooorm commented Oct 6, 2015

Thanks for the testing. Yes, that’s what I’m intending to do, and no, it won’t be easy (or fast) 😦

@wooorm
Copy link
Member

wooorm commented Dec 14, 2015

Thanks for waiting, @digitalmoksha. I did a lot of work recently on the parser, completely rewrote it. Could you try if you’re still seeing performance issues (especially on Safari?).

You can download the alpha version of the 3.0.0 release from GH, and npm: npm install mdast@next.

@wooorm wooorm self-assigned this Dec 14, 2015
@wooorm wooorm added this to the 3.0.0 milestone Dec 14, 2015
@wooorm
Copy link
Member

wooorm commented Dec 18, 2015

@digitalmoksha Ping!

@digitalmoksha
Copy link
Author

@wooorm Oh very cool! I'll try to test it out today or tomorrow. It's a busy time, but should be able to carve out some time over next couple of days.

@digitalmoksha
Copy link
Author

@wooorm , wow, night and day! It's blazingly fast now in Safari. Even a really large markdown file, it's incredibly fast, almost instantaneous. Fantastic job.

I do notice that memory seems to keep increasing after each run in the browser, like there is a leak or something. Like the entire ast is leaking.

And when incorporating the minified version in my OS X app, I get a "SyntaxError: Unexpected keyword 'var'. Parse error." when it's parsing the js file. non-minified doesn't have a problem, and I don't see this problem at all when using from a web page.

But in general, fantastic. Makes a huge difference.

Thanks!!

@wooorm
Copy link
Member

wooorm commented Dec 22, 2015

Cool, great to hear about the improvements! 😄

Weird about the memory leak, what code did you use to benchmark?

Also, regarding the SyntaxError, no idea either, hard to test, maybe that you didn’t specify utf-8 encoding? If that’s not the case, I must see the code to be able to help you debug.

Again, thanks so much for bringing this to light, and checking whether it was fixed! 👌

@wooorm wooorm closed this as completed in 25a26f2 Dec 24, 2015
@wooorm wooorm added 🦋 type/enhancement This is great to have remark labels Jan 10, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
remark 🦋 type/enhancement This is great to have
Development

No branches or pull requests

2 participants