nim: improve build time by 1.5x #11

Merged 1 commit into nordlow:master on Apr 24, 2021

timotheecour commented Apr 24, 2021

/cc @xflywind

note

the good news is that most of the time goes to the C backend (clang compiling the generated C), not to the nim frontend:

clang -o /tmp/z01 generated/c/main.c
2s

XDG_CONFIG_HOME= nim c --compileonly -o:/tmp/z04x --nimcache:/tmp/c07f --checks:off --stacktrace:off --opt:none --hints:off generated/nim/main.nim
1.18s

XDG_CONFIG_HOME= nim c -o:/tmp/z05 --nimcache:/tmp/c07f --checks:off --stacktrace:off --opt:none --hints:off generated/nim/main.nim
4.1s
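
To make the split explicit, here is a small sketch in C that subtracts the `--compileonly` time from the full build time. The helper names are hypothetical, and it assumes the two runs are otherwise comparable (same flags, same cache state), so treat the result as a rough estimate rather than a precise profile.

```c
#include <assert.h>
#include <stdio.h>

/* Timings quoted above, in seconds. */
#define NIM_COMPILE_ONLY_S 1.18 /* nim c --compileonly: frontend + C generation */
#define NIM_FULL_BUILD_S   4.10 /* nim c: frontend + C compiler + link */

/* Hypothetical helper: time not accounted for by the --compileonly step. */
double backend_seconds(double full_build_s, double compile_only_s) {
    assert(full_build_s >= compile_only_s);
    return full_build_s - compile_only_s;
}

/* Hypothetical helper: fraction of the full build spent outside the frontend. */
double backend_share(double full_build_s, double compile_only_s) {
    assert(full_build_s > 0.0);
    return backend_seconds(full_build_s, compile_only_s) / full_build_s;
}

/* Prints the breakdown for the numbers measured in this comment. */
void print_breakdown(void) {
    double backend = backend_seconds(NIM_FULL_BUILD_S, NIM_COMPILE_ONLY_S);
    double share = backend_share(NIM_FULL_BUILD_S, NIM_COMPILE_ONLY_S);
    printf("backend: %.2fs (%.0f%% of the full build)\n", backend, 100.0 * share);
}
```

With the numbers above this gives about 2.9s outside the frontend, roughly 71% of the build, which is in line with the 2.8s clang invocation reported in note 1.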

note 1

with --hint:cc --listcmd it shows:
clang -c -w -ferror-limit=3 -I/Users/timothee/git_clone/nim/Nim_devel/lib -I/Users/timothee/git_clone/nim/temp/compiler-benchmark/generated/nim -o /tmp/c08d/@mmain.nim.c.o /tmp/c08d/@mmain.nim.c
2.8s

note 2

#8 (comment)

I've added support for check and debug builds. Feel free to modify and propose pull requests as you wish. compiler-benchmark currently only tests debug build performance.

the benchmark should allow reporting more than just debug builds; eg nim enables lots of checks by default, which makes development and debugging easier but can slow down compilation; languages shouldn't be penalized for having more debugging checks on by default :)

note 3

--gc:arc is about 1.2x slower; the cgen contains this:

N_LIB_PRIVATE N_NIMCALL(NI64, add_int64_n77_h5_main_23116)(NI64 x) {
	NI64 result;
	NI64 T1_;
NIM_BOOL* nimErr_;
{nimErr_ = nimErrorFlag();
	result = (NI64)0;
	T1_ = (NI64)0;
	T1_ = add_int64_n77_h4_main_23113(x);
	if (NIM_UNLIKELY(*nimErr_)) goto BeforeRet_;
	result = (NI64)((NI64)(x + T1_) + IL64(47509));
	goto BeforeRet_;
	}BeforeRet_: ;
	return result;
}

which has more instructions compared to gc:refc:

N_LIB_PRIVATE N_NIMCALL(NI64, add_int64_n77_h5_main_23116)(NI64 x) {
	NI64 result;
	NI64 T1_;
{	result = (NI64)0;
	T1_ = (NI64)0;
	T1_ = add_int64_n77_h4_main_23113(x);
	result = (NI64)((NI64)(x + T1_) + IL64(47509));
	goto BeforeRet_;
	}BeforeRet_: ;
	return result;
}
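
For anyone who wants to compile the two shapes side by side, here is a self-contained sketch. The typedef, the global flag, `nimErrorFlag` and the callee are stand-ins written for this example (assumptions, not the definitions from `nimbase.h` or the Nim runtime); only the structure of the two functions follows the cgen output above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for Nim's runtime definitions (assumptions for this sketch). */
typedef int64_t NI64;
static bool g_err_flag = false;                       /* hypothetical error flag */
static bool *nimErrorFlag(void) { return &g_err_flag; }

/* Hypothetical callee standing in for add_int64_n77_h4_main_23113. */
static NI64 callee_i64(NI64 x) { return x + 1; }

/* Shape of the --gc:arc output: fetch the flag, test it after the call. */
NI64 add_arc_style(NI64 x) {
    NI64 result = 0;
    bool *nimErr_ = nimErrorFlag();
    NI64 T1_ = callee_i64(x);
    if (*nimErr_) goto BeforeRet_;    /* extra load and branch after the call */
    result = (x + T1_) + 47509;
BeforeRet_:
    return result;
}

/* Shape of the gc:refc output: no flag lookup, no branch. */
NI64 add_refc_style(NI64 x) {
    NI64 T1_ = callee_i64(x);
    return (x + T1_) + 47509;
}

/* Both shapes agree as long as the error flag stays clear. */
void check_same_result_i64(NI64 x) {
    assert(add_arc_style(x) == add_refc_style(x));
}
```

The arc-style function does the same arithmetic but carries one extra call, load and conditional jump per callee invocation, which is the additional work the C compiler has to process for every generated function.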

note 4

codegen generates:

N_LIB_PRIVATE N_NIMCALL(int, add_cint_n68_h73_main_20620)(int x) {
	int result;
	int T1_;
{	result = (int)0;
	T1_ = (int)0;
	T1_ = add_cint_n68_h72_main_20617(x);
	result = (NI32)((NI32)(x + T1_) + ((NI32) 42695));
	goto BeforeRet_;
	}BeforeRet_: ;
	return result;
}

if it generated the following simplified code instead:

N_LIB_PRIVATE N_NIMCALL(int, add_cint_n68_h73_main_20620)(int x) {
	int T1_ = add_cint_n68_h72_main_20617(x);
	return (NI32)((NI32)(x + T1_) + ((NI32) 42695));
}
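
As a quick sanity check that the two forms are interchangeable, the following sketch compiles both and compares their results. `NI32` and the callee are stand-ins defined for this example (assumptions, not Nim's own definitions); the function bodies mirror the two listings above.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t NI32;  /* stand-in for Nim's NI32 (assumption) */

/* Hypothetical callee standing in for add_cint_n68_h72_main_20617. */
static int callee_cint(int x) { return x * 2; }

/* Current shape: zero-initialised locals, a block, and a goto to the epilogue. */
int add_verbose(int x) {
    int result;
    int T1_;
    {   result = (int)0;
        T1_ = (int)0;
        T1_ = callee_cint(x);
        result = (NI32)((NI32)(x + T1_) + ((NI32)42695));
        goto BeforeRet_;
    }BeforeRet_: ;
    return result;
}

/* Proposed shape: same arithmetic, no dead stores, no label. */
int add_simplified(int x) {
    int T1_ = callee_cint(x);
    return (NI32)((NI32)(x + T1_) + ((NI32)42695));
}

/* The two shapes should always return the same value. */
void check_same_result_cint(int x) {
    assert(add_verbose(x) == add_simplified(x));
}
```

The simplified form only removes dead stores and the jump to the epilogue, so the result is unchanged and the C compiler has fewer statements to parse per function.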

it would bring down clang compilation time by ~1.15x; not sure whether it's worth it, since in practice other factors dominate (eg nim VM, IC etc)

@ringabout
Contributor

I can confirm the speedup.

@timotheecour timotheecour marked this pull request as ready for review April 24, 2021 07:47
@nordlow nordlow merged commit 83f7bfe into nordlow:master Apr 24, 2021
@nordlow
Owner

nordlow commented Apr 24, 2021

Thanks

@timotheecour
Contributor Author

timotheecour commented Apr 24, 2021

@nordlow i don't understand the numbers in the benchmark:

Nim | Build | No | 1051.5 | 613.0 [C] | 38 | 1.4.6 | nim

i can't run the benchmark locally because of https://github.com/nordlow/compiler-benchmark/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc (i'm on osx), but by doing it manually i get a 2X slowdown in compilation time compared to C, not 613X:

clang -o /tmp/z01 generated/c/main.c
2s

rm -rf /tmp/c07f # start from empty cache
XDG_CONFIG_HOME= nim c -o:/tmp/z05 --nimcache:/tmp/c07f --checks:off --stacktrace:off --opt:none --hints:off generated/nim/main.nim
4.1s

what am i missing?

@ringabout
Contributor

yeah on my PC

C++

| Lang-uage | Temp-lated | Check Time [us/func] | Build Time [us/func] | Run Time [us/func] | RSS Mem Max Usage | Exec Version | Exec Path | 
| :-------: | ---------- | :------------------: | :------------------: | :----------------: | :---------------: | :----------: | :-------: | 
| C++       | No         |  133.8 (  1.8 C++)   |    N/A               |    N/A             |       62,517,248  | 9.3.0        | `g++`     | 
| C++       | No         |   79.4 (  1.0 C++)   |    N/A               |    N/A             |       60,014,592  | 9.3.0        | `g++-9`   | 
| C++       | No         |  125.5 (  1.7 C++)   |    N/A               |    N/A             |       58,777,600  | 10.2.0       | `g++-10`  | 
| C++       | No         |   75.8 (  1.0 C++)   |    N/A               |    N/A             |       63,897,600  | 10.0.0-4ubuntu1 | `clang++-10` | 
| C++       | Yes        |  156.6 (  2.1 C++)   |    N/A               |    N/A             |       93,147,136  | 9.3.0        | `g++`     | 
| C++       | Yes        |  118.8 (  1.6 C++)   |    N/A               |    N/A             |       93,028,352  | 9.3.0        | `g++-9`   | 
| C++       | Yes        |  117.8 (  1.6 C++)   |    N/A               |    N/A             |       91,037,696  | 10.2.0       | `g++-10`  | 
| C++       | Yes        |  134.0 (  1.8 C++)   |    N/A               |    N/A             |       83,329,024  | 10.0.0-4ubuntu1 | `clang++-10` | 
| C++       | No         |    N/A               |  860.4 (  2.1 C++)   | 296 (  1.1 C++)    |      240,738,304  | 9.3.0        | `g++`     | 
| C++       | No         |    N/A               |  838.4 (  2.0 C++)   | 356 (  1.4 C++)    |      241,700,864  | 9.3.0        | `g++-9`   | 
| C++       | No         |    N/A               |  879.9 (  2.1 C++)   | 258 (  1.0 C++)    |      241,025,024  | 10.2.0       | `g++-10`  | 
| C++       | No         |    N/A               |  414.4 (  1.0 C++)   | 2757 ( 10.7 C++)   |      183,558,144  | 10.0.0-4ubuntu1 | `clang++-10` | 
| C++       | Yes        |    N/A               |  939.4 (  2.3 C++)   | 267 (  1.0 C++)    |      280,985,600  | 9.3.0        | `g++`     | 
| C++       | Yes        |    N/A               |  943.1 (  2.3 C++)   | 281 (  1.1 C++)    |      281,518,080  | 9.3.0        | `g++-9`   | 
| C++       | Yes        |    N/A               | 1010.1 (  2.4 C++)   | 281 (  1.1 C++)    |      278,781,952  | 10.2.0       | `g++-10`  | 
| C++       | Yes        |    N/A               |  494.9 (  1.2 C++)   | 4288 ( 16.6 C++)   |      230,973,440  | 10.0.0-4ubuntu1 | `clang++-10` | 

C

| Lang-uage | Temp-lated | Check Time [us/func] | Build Time [us/func] | Run Time [us/func] | RSS Mem Max Usage | Exec Version | Exec Path | 
| :-------: | ---------- | :------------------: | :------------------: | :----------------: | :---------------: | :----------: | :-------: | 
| C         | No         |   11.0 (  1.0 C)     |    N/A               |    N/A             |        6,529,024  | 0.9.27       | `tcc`     | 
| C         | No         |   55.0 (  5.0 C)     |    N/A               |    N/A             |       45,576,192  | 9.3.0        | `gcc`     | 
| C         | No         |   28.2 (  2.6 C)     |    N/A               |    N/A             |       37,679,104  | 7.5.0        | `gcc-7`   | 
| C         | No         |   23.2 (  2.1 C)     |    N/A               |    N/A             |       42,545,152  | 9.3.0        | `gcc-9`   | 
| C         | No         |   25.9 (  2.4 C)     |    N/A               |    N/A             |       43,794,432  | 10.2.0       | `gcc-10`  | 
| C         | No         |   66.5 (  6.1 C)     |    N/A               |    N/A             |       62,603,264  | 10.0.0-4ubuntu1 | `clang-10` | 
| C         | No         |    N/A               |    7.9 (  1.0 C)     | 243 (  1.0 C)      |        9,224,192  | 0.9.27       | `tcc`     | 
| C         | No         |    N/A               |  815.7 (103.7 C)     | 252 (  1.0 C)      |      225,079,296  | 9.3.0        | `gcc`     | 
| C         | No         |    N/A               |  686.3 ( 87.2 C)     | 242 (  1.0 C)      |      216,461,312  | 7.5.0        | `gcc-7`   | 
| C         | No         |    N/A               |  767.9 ( 97.6 C)     | 243 (  1.0 C)      |      225,349,632  | 9.3.0        | `gcc-9`   | 
| C         | No         |    N/A               |  799.5 (101.6 C)     | 245 (  1.0 C)      |      223,019,008  | 10.2.0       | `gcc-10`  | 
| C         | No         |    N/A               |  348.7 ( 44.3 C)     | 2245 (  9.3 C)     |      183,029,760  | 10.0.0-4ubuntu1 | `clang-10` | 

Nim

| Lang-uage | Temp-lated | Check Time [us/func] | Build Time [us/func] | Run Time [us/func] | RSS Mem Max Usage | Exec Version | Exec Path | 
| :-------: | ---------- | :------------------: | :------------------: | :----------------: | :---------------: | :----------: | :-------: | 
| Nim       | No         |  134.9 (  1.0 Nim)   |    N/A               |    N/A             |       62,787,584  | 1.5.1        | `nim`     | 
| Nim       | No         |    N/A               | 1252.4 (  5.0 Nim)   | 361 (  1.0 Nim)    |      382,427,136  | 1.5.1        | `nim`     | 
| Nim       | Yes        |    N/A               |  252.9 (  1.0 Nim)   | 367 (  1.0 Nim)    |      115,929,088  | 1.5.1        | `nim`     | 

@ringabout
Contributor

ringabout commented Apr 25, 2021

BTW Nim can use tcc, gcc, clang, js as backend too

@nordlow
Owner

nordlow commented Apr 25, 2021

The Tiny C Compiler is being used as the reference for C in the benchmark. It's incredibly fast.
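
A minimal sketch of how such a ratio can be read, assuming each bracketed figure is the entry divided by the fastest entry of the reference language (tcc for C). The helper name is hypothetical and this is not taken from the benchmark's own code. Under that assumption, a ratio measured against tcc can be far larger than one measured against clang.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: how many times slower `value` is than `reference`. */
double relative_to(double value, double reference) {
    assert(reference > 0.0);
    return value / reference;
}

void print_examples(void) {
    /* Manual timings from this thread: nim 4.1s vs clang 2s. */
    printf("nim vs clang (manual): %.2fx\n", relative_to(4.1, 2.0));
    /* Build times [us/func] from the C table above: gcc 815.7 vs tcc 7.9. */
    printf("gcc vs tcc (table):    %.1fx\n", relative_to(815.7, 7.9));
}
```

The second ratio comes out near 103, close to the 103.7 shown in the table; the small gap is presumably because the table divides unrounded timings.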

@nordlow
Owner

nordlow commented Apr 27, 2021

Nim now uses tcc as the build backend when it's found in the exe path.

@ringabout
Contributor

Cool!
