static compile part 6: automated opt-in compilation #12462

Closed
stevengj opened this issue Aug 5, 2015 · 98 comments
Labels
compiler:precompilation Precompilation of modules needs decision A decision on this change is needed

Comments

@stevengj
Member

stevengj commented Aug 5, 2015

Given automated recompilation (#12259, hopefully coming soon via #12458), it would be good to eliminate the manual call to Base.compile for most users, which is both awkward and dangerous (since users won't know whether a module is compilable). I think we want a way for a module author to mark it as compilable so that using Foo will automatically call Base.compile if it has not been compiled yet.

It will be easy to implement this, once we decide on a syntax. Some options:

  • a magic comment (e.g. #@compilable or #pragma compilable) at the top of the file. Pro: backwards compatible with Julia 0.3, easy to check before parse and eval of the file, is per-file like compilation, simple and robust to implement (see PR #12475: add "#pragma compile [true]" for opt-in to automatic compilation). Con: not very Julian.
  • @compilable module Foo ... end: Pro: more Julian, can be checked after parsing but before eval. Con: hard to make backwards-compatible (a major problem), requires parsing (non-compiled files will be parsed twice every time they are included!), and per-module rather than per file.
  • @compilable statement inside the module. Pro: can be made backward-compatible via Compat. Con: must eval the module to execute it, which means that if it is compiled too then it gets evalled twice, and the compiled version is not the one that is loaded (since it has already been imported), and is per-module whereas compilation is per-file.

Updates:

  • via Pkg: e.g. a COMPILABLE file in the package directory would cause the package to be compiled when it is added (automated recompilation could handle the rest). Pro: backward compatible, no parse/eval. Con: auto-compilation only available for registered packages, does not allow compilation to be disabled for a module unless Base.compile hooks into Pkg.
  • @compilable; module Foo ... end: as @compilable module but per file, otherwise same pros and cons.

Any other options? I have to admit that I lean toward the magic comment, since avoiding parse/eval is attractive enough to me to compensate for the slight syntactical ugliness.
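To make the "easy to check before parse and eval" point concrete, here is a minimal sketch of how a magic-comment check might look. It is an illustration only, not the code from #12475: the function name has_compile_pragma, the maxlines limit, and the exact accepted spelling of the pragma are all assumptions.

```julia
# Hypothetical helper (not part of Base): look for "#pragma compile" among the
# leading comment lines of a source file, without invoking the parser.
function has_compile_pragma(io::IO; maxlines::Int=5)
    for (i, line) in enumerate(eachline(io))
        i > maxlines && break            # only inspect the first few lines
        s = strip(line)
        isempty(s) && continue           # skip blank lines in the header
        startswith(s, "#") || break      # first line of real code ends the header
        # Assumed spelling: "#pragma compile" with an optional "true"
        occursin(r"^#\s*pragma\s+compile(\s+true)?\s*$", s) && return true
    end
    return false
end

# Convenience method for a file on disk.
has_compile_pragma(path::AbstractString; kwargs...) =
    open(io -> has_compile_pragma(io; kwargs...), path)
```

Because only the leading comment lines are read, the cost is independent of the size of the module, which is the property the macro-based options lack.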

@stevengj stevengj added needs decision A decision on this change is needed compiler:precompilation Precompilation of modules labels Aug 5, 2015
@vtjnash
Sponsor Member

vtjnash commented Aug 5, 2015

a magic comment (e.g. #@compilable) at the top of the file. Pro: backwards compatible with Julia 0.3, easy to check before parse and eval of the file. Con: not very Julian.

#pragma compilable?

@compilable module Foo ... end: Pro: more Julian, can be checked after parsing but before eval. Con: not backwards-compatible (a major problem), requires parsing.

i did this for some time in the original #8745 PR. (at the time, I was also passing the parsed version directly to create_expr_cache, instead of passing a filepath). I dropped it after adding support in the deserializer to handle an arbitrary number of modules in one .ji file.

Any other options?

another option is to couple this step to the Pkg manager, so that Pkg.add / Pkg.update / Pkg.checkout / etc will manage some metadata for whether to call Base.compile. Arguably, this would make Pkg feel much slower, but make Julia feel much faster (since the next logical command after Pkg.add is using).
-- this assumes automated recompilation (#12259, hopefully coming soon via #12458) will take over from there so that Pkg is not integrally coupled to Base.require.

@wildart
Member

wildart commented Aug 5, 2015

As for package manager, it should not be a problem to put a compilation step after package update, somewhere here.

@stevengj
Member Author

stevengj commented Aug 5, 2015

Magic comments (I like #pragma compilable) have the advantage of working just as well for local modules as for Pkg ones, although I suppose you could always call Base.compile manually for the former.

Yes, a COMPILE file in the package directory sounds like the best alternative to a magic comment. Pkg would only need to call Base.compile on add and when the COMPILE file changes, not on every Pkg.update(), since automated recompilation would handle the rest.

@wildart
Member

wildart commented Aug 5, 2015

Or extend the package description, #11955. I would see it as a part of the requirements description in the REQUIRE file rather than as a separate empty file. You would need to touch COMPILE every time a new version tag is created, but with an explicit compilation requirement there is no need for a timestamp check. The package manager will know better when to compile on a package version update.

@stevengj
Member Author

stevengj commented Aug 5, 2015

@wildart, I disagree with the notion of using the package manager to manage _re_compilation; I think it would be brittle and inflexible (e.g. it wouldn't recompile when a developer is hacking on the package, or when you install a new Julia version). But if you want to talk about recompilation strategy, please go to #12259 ... this issue is about opting in for the initial compilation.

Obviously, if the structure of the package metadata changes, where you would put the compilation opt-in would change. But since I doubt #11955 will be resolved in time for 0.4, I would prefer not to rely on it here.

@IainNZ
Member

IainNZ commented Aug 5, 2015

#11955 really would be great, but is definitely not 0.4

@stevengj, is there any reason you would not want to compile? Could Pkg.add not simply try to compile every package? Is there a scenario where a package is seemingly compiled, but compilation was dangerous and has negative side effects?

@wildart
Member

wildart commented Aug 5, 2015

I would think of an opt-out option rather than opt-in. Disabling static compilation for modules or functions looks more reasonable than any opt-in option. Don't we want to compile everything? Wouldn't faster code loading improve the user experience?
@stevengj As you are trying to avoid manual compilation, you should consider involving the package manager. For the majority of users, the best time to do compilation is during package installation and later on package update. I'm less concerned about developers because they can always recompile code or rely on #12458.

@timholy
Sponsor Member

timholy commented Aug 5, 2015

There are definitely packages that won't work if compiled. http://docs.julialang.org/en/latest/manual/modules/#module-initialization-and-precompilation

@timholy
Sponsor Member

timholy commented Aug 5, 2015

I confess I kind of like the magic comment, too. As someone who has ~100 "packages" not managed by the package manager (i.e., lab-specific code), I'm a little less excited about coupling this to the package manager (but will cope if this is the outcome).

@rened
Member

rened commented Aug 5, 2015

+1 for not tying this to the package manager. also +1 for making this opt-out: in my work on SystemImageBuilder.jl I found only a few packages (of the 60+ I use) that can't be precompiled. My suggestion would be:

module MyModule
  #pragma nocompile
...

In a concerted effort we can quickly add this to the packages which can't be precompiled (yet), and it adds an incentive to actually make those packages precompilable.

@bicycle1885
Member

I think most packages are compilable, so the opt-out pragma would be better.
But I would love a @nocompile macro rather than any pragma because it's the more Julian way.
Why not have both? If a package author decides to support only v0.4 or later, the @nocompile macro is a better choice.

@stevengj
Member Author

stevengj commented Aug 5, 2015

Julia is a safe language by default, and making this opt-out would violate that contract and totally change the spirit of the language. Compilation has to be for people who know what they are doing; otherwise, this is a deathtrap for new developers.

@stevengj
Member Author

stevengj commented Aug 5, 2015

@bicycle1885, the fact that a macro requires parsing is a big disadvantage.

This is a classic use for a pragma directive, because it is effectively a compiler option rather than part of the language per se.

@stevengj
Member Author

stevengj commented Aug 5, 2015

(A long opt-in vs. opt-out discussion already took place in #8745.)

@IainNZ
Member

IainNZ commented Aug 5, 2015

Ah OK. Comment works for me.

@bicycle1885
Member

I see. Safe-default principle makes sense to me.

I think the pragma should take an argument (for example, yes/no) to make the author's intention explicit in both cases:

#pragma compile yes
#pragma compile no

@wildart
Member

wildart commented Aug 5, 2015

Unfortunately, I missed the first "opt-in vs opt-out" discussion, but I have a feeling that as soon as 0.4 is released every package developer will put a @compile macro or pragma in their package. I disagree that compilation should be a tool for skillful devs or that it is somehow unsafe to use. If a person is able to create a package, then that person is definitely able to follow static compilation guidelines. I see no problem in forcing package compilation. So I strongly encourage developing an opt-out option.

stevengj added a commit to stevengj/julia that referenced this issue Aug 5, 2015
stevengj added a commit to stevengj/julia that referenced this issue Aug 5, 2015
@tkelman
Contributor

tkelman commented Aug 5, 2015

Eww, -1 to significant comments. Comments are for humans, and get thrown away pretty early in the Julia parsing-to-execution pipeline.

stevengj added a commit to stevengj/julia that referenced this issue Aug 5, 2015
@stevengj
Member Author

stevengj commented Aug 5, 2015

@tkelman, the whole point of a pragma is that it occurs before the parsing-to-execution pipeline, and it is not part of the language per se — it is a compiler hint. Also, all the alternatives seem worse.

@tkelman
Contributor

tkelman commented Aug 5, 2015

So far most of the ways of interacting with the Julia compiler have been part of the language - it feels wrong to suddenly buck that trend. And pretty much all Julia code, with the exception of comments, has usually been treated equally whether it came from a program or a file.

@compilable module can be worked around in a backwards-compatible, though ugly, way via conditional inclusion. This would be a poster case for #7449 - actual pragmas as part of the language, not magic comments.

@stevengj
Member Author

stevengj commented Aug 5, 2015

@compilable module requires reading the entire file and invoking the parser (as opposed to just reading the first few lines), and backward compatibility would involve extremely ugly code in every single module that is going to be compiled (a lot of them) for as long as 0.3 is supported.

(Basically this means that every non-compilable module would get parsed twice every single time it is loaded: once to check for @compilable, and then again during include.)

Also, see @vtjnash's comment above: the compiler is technically per-file (it compiles every module in a file), not per-module, so @compilable module implies the wrong semantics.

@quinnj
Member

quinnj commented Aug 5, 2015

I think we either need to do #7449 or rely on Compat here. If we want #7449 to look like #pragma [option], then that's fine, but I feel like I'd prefer the original #7449 syntax using %if pragma. I also don't think it's the end of the world to rely on Compat here and deal with the double parsing.

If we're also planning on a much shorter 0.5 release, then maybe it won't be as bad either.

@StefanKarpinski
Sponsor Member

Would it make sense to put compilation opt-in into METADATA?

@stevengj
Member Author

stevengj commented Aug 5, 2015

@quinnj, the problem with relying on Compat is that Compat is included inside the module, and here we need something outside. See also the problems with double-parsing, and per-module rather than per-file semantics.

@JeffBezanson
Sponsor Member

I don't think compilability is significant enough to warrant a different file extension. Unless perhaps this veers off into, say, a statically-typed variant of the language.

@stevengj
Member Author

stevengj commented Aug 6, 2015

@ScottPJones, there is still a problem with the short-circuiting @compilable macro, now that I think of it. Suppose you have a non-compilable module Foo that includes a compilable module Bar. Then the evaluation of Foo could be 90% done before it gets to the import Bar statement and barfs. I guess you could avoid this by disabling the short-circuiting except at the top-level eval. But I'm worried that this strategy is getting more and more complicated to implement (hence more fragile) the more I think about it.

@ScottPJones
Contributor

Why would it barf at the import Bar? (sorry if it is a stupid question, or one from lack of sleep!)

@stevengj
Member Author

stevengj commented Aug 6, 2015

@ScottPJones, because import Bar includes Bar.jl (if Bar hasn't been compiled yet) and Bar.jl includes your @compilable macro which short-circuits parse/eval.

@ScottPJones
Contributor

but wouldn't it only short-circuit parse/eval of Bar.jl (and only if a cached version of Bar were found, and the timestamp/hash/whatever matched)?

@ScottPJones
Contributor

Another thing, can the Julia compiler always correctly determine if something is "compilable" or not?
(shouldn't that really be "cacheable" also)?
If so, why doesn't it always write out a .ji file, even if that is just a short special header indicating that the file is not "compilable", with the same timestamp/hash/etc. info, so that if the file later becomes "compilable", the change can be detected and the file cached?

@stevengj
Member Author

stevengj commented Aug 6, 2015

@ScottPJones, no, because it can't finish evaluating Foo if a module it depends on barfs on import.

And no, it can't determine automatically whether a module is (safely) compilable. (It can compile anything, but modules that don't follow certain rules will get compiled to incorrect code.) That's why it has to be a human opt-in.

@timholy
Sponsor Member

timholy commented Aug 6, 2015

@timholy, given the number of packages you work with, do you have a sense of whether such functionality would be useful or whether the single-module .ji file is more aligned with your usage?

Somewhere in between, but single-module is a bit better. There's still quite a lot of mix-and-match going on.

@stevengj
Member Author

stevengj commented Aug 6, 2015

@tkelman, include_string("@pragma compilable") looks like Julia code, but it isn't if you don't actually eval it when you are deciding whether to compile a module. (e.g. it can't be hidden inside another include). And I'm having a hard time believing that

if VERSION >= xxx
      include_string("@pragma compilable")
end

plus magic not-quite-julia parse-without-eval interpretation at require time, or relatively complicated short-circuiting semantics, is better than #pragma compile.

@timholy
Sponsor Member

timholy commented Aug 6, 2015

Maybe the syntax should just be a bit further separated from normal comments, like #%pragma or #pragma. You always need more underscores.

We haven't done a sufficiently good job of emulating bash's outstanding syntax choices. So I'm all in favor of #! /bin/compile_this_file

@stevengj
Member Author

stevengj commented Aug 6, 2015

#!@#$pragma!@#$ __CoMpILe__. Because #pragma is just too beautiful to use.

@ScottPJones
Contributor

@stevengj I guess I just don't understand.
I would think that import would:

  1. Look and see if there is a .ji file for Bar.
  2. If so, see if its timestamp/hash/whatever matches.
     a. If it matches, see if it was cacheable.
        i. If cacheable, load it and return.
        ii. Otherwise, parse/evaluate it and return.
     b. Otherwise, act as if no .ji file was found, and maybe print a "recompiling Bar" message if at the REPL.
  3. If there is no .ji file, parse/evaluate the file, see if it is cacheable, and save the information as a .ji file.
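The lookup steps above can be written down as a small decision function. This is only a sketch of the proposal as described in this comment, not of what Base.require actually does: the CacheEntry type, its fields, and the returned symbols are hypothetical names chosen for illustration.

```julia
# Hypothetical record of what a .ji header would store under this proposal.
struct CacheEntry
    stamp::Float64     # timestamp (or hash) of the source when the cache was written
    cacheable::Bool    # whether the source was found to be cacheable
end

# Decide what `import Bar` should do, given the cache entry (or `nothing` if
# there is no .ji file) and the current timestamp of the source file.
function import_action(entry::Union{CacheEntry,Nothing}, source_stamp::Float64)
    entry === nothing && return :eval_and_write_cache           # step 3
    entry.stamp == source_stamp || return :eval_and_write_cache  # step 2b: stale cache
    return entry.cacheable ? :load_cache : :eval_source          # steps 2a.i / 2a.ii
end
```

Note that the :eval_and_write_cache branch is where the proposal needs to learn whether the file is cacheable, which is the step that the rest of the thread argues cannot be automated.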

I also think that compilers are usually much better than humans at determining whether code is "following the rules". Where are all the rules that need to be followed?
I'm not sure I'd trust a system that depended on people getting something like that right or else silently generated incorrect code.
If you did as I've outlined above, people could add opt-out macros or function calls at the beginning of their modules if they were sure that it was not safe to cache, and that would simply let the compiler avoid any extra overhead (is there any?) of attempting to see if the module were cacheable or not.

stevengj added a commit to stevengj/julia that referenced this issue Aug 6, 2015
@stevengj
Member Author

stevengj commented Aug 6, 2015

@ScottPJones, the problem is the "parse/evaluate the file, see if it is cacheable" step. That's what we're discussing here. A long discussion of why compilability can't be determined automatically already occurred in #8745, and the rules are described in the precompiling section of the manual.

@stevengj
Member Author

stevengj commented Aug 6, 2015

Tomorrow, I'll try to put together an alternative PR based on @ScottPJones's short-circuit @compile true per-file macro suggestion. I think it can be made to work without major functional shortcomings, but I need to implement it to be sure that there aren't any gotchas. Then we can compare how it turns out with #pragma compile.

@ScottPJones
Contributor

OK, so you can still do as I've outlined above, but instead of being able to determine cacheability automatically (I had missed all of the old conversation, I'll have to read it tomorrow), it could just use the @compilable macro (wrapped in if VERSION ... if you need 0.3 compatibility) to decide.

What other problems do you see with that technique? It avoids looking at the source code, even to do a regex, if the .ji file is present and valid. If the .ji file says that it is not cacheable, it just does what it has always done, i.e. parse/eval the file.

@ScottPJones
Contributor

That would be great, thanks for having patience with a sleep deprived old PITA!

@stevengj
Member Author

stevengj commented Aug 6, 2015

@ScottPJones, it's just that the require code is more complicated than you think (it was more complicated than I expected, at least) once you take into account nested and parallel imports; there are subtleties.

@ScottPJones
Contributor

OK, I hadn't looked at the require code yet, and I still have to wrap my mind around the way only a single "master" node does all the compilation (if I understand that correctly). Thanks!

@vtjnash
Sponsor Member

vtjnash commented Aug 6, 2015

i think the key is that compilation of a module always occurs in a separate process, and never as a side effect of running code in the local instance

I added a number of examples of failure cases to the end of the Modules documentation:
http://docs.julialang.org/en/latest/manual/modules/#module-initialization-and-precompilation [Other known potential failure scenarios include]

Note that Ptr objects turn into C_NULL when they hit the serializer (in most contexts), so you can use this to detect whether it has been initialized (or get a friendly SEGV instead of runtime memory corruption if you blindly attempt to use it).
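As an illustration of the C_NULL check described above, here is a minimal sketch of lazy initialization guarded by a null test. The names buffer and ensure_buffer are hypothetical, and Libc.malloc merely stands in for whatever library call would produce the real handle; the sketch assumes, as stated above, that a pointer stored in a cached module comes back as C_NULL.

```julia
# Hypothetical module-level handle. If the module is loaded from a cache, any
# pointer stored here at compile time is assumed to have been reset to C_NULL.
const buffer = Ref{Ptr{UInt8}}(C_NULL)

# Return a valid pointer, (re)creating the resource if the handle is null.
function ensure_buffer()
    if buffer[] == C_NULL
        # Stand-in for the real initialization call (e.g. opening a library handle).
        buffer[] = convert(Ptr{UInt8}, Libc.malloc(16))
    end
    return buffer[]
end
```

Every use of the handle goes through ensure_buffer(), so a stale compile-time value is never dereferenced. A real module would also release the resource (here, with Libc.free) when it is no longer needed.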

Of those, I expect that #12010 may be the most serious and limiting. The others hopefully are fairly benign and expected. There are a few classes of failure scenarios that are pretty easy for the compiler to detect and complain about (and so it already does). Most of the others are broken only by determining the author's actual intent. There are many valid reasons to alter global state while compiling a module (reading files, opening libraries, creating counters, and manipulating shared structures like LOAD_PATH and ARGS), which makes it impossible for the compiler to guess at the user's intent.

For example, given the following script, what should the runtime value of baseversion give? (assume that Base.VERSION isn't defined as a constant for the purposes of this example)

module Z
  baseversion = Base.VERSION
end

There are two choices:

  1. the runtime value of Base.VERSION (create a reference)
  2. the compile time value of Base.VERSION (create a copy) – this is what it actually does

The problem (for the compiler), is that it isn't exactly obvious which one you intended, since there are valid use cases for both. Indeed, in the Base.GMP module, there is a test to ensure that the compile-time and runtime versions of the dynamic library are the same (so you want both!).
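A module author who wants both values can state the intent explicitly. The following is a minimal sketch, assuming the documented behavior that a module's __init__ function runs each time the module is loaded; the module name ZBoth and the variable names are hypothetical.

```julia
module ZBoth

# Evaluated when the module body runs; if the module is precompiled, this is
# the value that ends up stored in the cache (the "copy" choice).
const compiled_version = VERSION

# Placeholder that is filled in at load time (the "reference" choice).
const runtime_version = Ref{VersionNumber}()

function __init__()
    # Runs every time the module is loaded, including from a cache.
    runtime_version[] = VERSION
end

# Hypothetical check in the spirit of the GMP test mentioned above.
versions_match() = compiled_version == runtime_version[]

end # module
```

When the module is evaluated directly, as in a script, both values coincide and ZBoth.versions_match() returns true. They can differ only when a cached copy is loaded by a different Julia build.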

parsing the file is a pretty negligible cost, but if you want to avoid doing it twice while still having the benefits of parsed code, one option is to note that Base.compile is calling a function called Base.create_expr_cache, not create_cache_for_include_file. This is because it is actually taking a series of serialized expression objects, not a file, and compiling the result. One of the expressions just happens to be an include statement:

serialize(io, :(Base.include($(abspath(input)))))

In past versions of the code, this function actually just took an Expr (although, to be fair, in the past versions of the code, it only accepted an Expr(:module))

@stevengj
Member Author

stevengj commented Aug 6, 2015

Okay, no macro required, and no magic non-Julia syntax. Just compilethis() at the top of the file, or VERSION >= xxx && compilethis() if you want backward compatibility. Thanks to @ScottPJones for pointing out that this short-circuits the parser. See #12491.

@ScottPJones
Contributor

Cool! I hope this satisfies @tkelman's objections (it does mine). Great stuff no matter what the syntax, Steve!

@carnaval
Contributor

carnaval commented Aug 6, 2015

@vtjnash It is true that the compile/run time distinction is unclear right now. To be honest, I feel that for the next release we should aim at making it more explicit. I don't think the goal should be to try to transparently cache modules and pretend that using X is going to work the same, exactly for the reason you highlighted.

Instead, maybe we could have a compile_time block inside modules, so by default everything is deferred to runtime (=> __init__) and you cache an empty module. This is backward compatible. Then you can selectively add some code to the compile time like method definitions and some const things/precomputation.

Something like that ?

module Foo
const runtime_v = Base.VERSION
cached
   const compiled_v = Base.VERSION
end
end

I didn't think this through but I believe my general point stands: we should not be scared of changing module semantics to make this feature first class, instead of this "dangerous but everyone will still use it because load times are unbearable".

As for rehashing, it may be hairy to make it work but maybe we should use the julia serializer at least for user types to let them handle this ?

stevengj added a commit to stevengj/julia that referenced this issue Aug 6, 2015
@tkelman
Contributor

tkelman commented Aug 17, 2015

(another disadvantage of the #pragma syntax that I didn't predict in advance is it would have been more complicated to enable or disable precompilation selectively depending on platform, as in JuliaImages/Images.jl#338)
