platform.txt: instruct GCC to perform more aggressive optimization #7770

Merged
merged 2 commits into from
Dec 22, 2020

Conversation

jjsuwa-sys3175
Contributor

@jjsuwa-sys3175 jjsuwa-sys3175 commented Dec 14, 2020

  • add -free -fipa-pta to GCC options (generates a bit smaller binary)

  • cosmetics

see arendst/Tasmota#9749 and arendst/Tasmota@e7cff92.

jjsuwa-sys3175 and others added 2 commits December 15, 2020 06:55
* add `-free -fipa-pta` to GCC options (generates a bit smaller binary)

* cosmetics
Collaborator

@d-a-v d-a-v left a comment


LGTM

@d-a-v
Collaborator

d-a-v commented Dec 22, 2020

gcc describes -fipa-pta option as

   -fipa-pta
      Perform interprocedural pointer analysis and interprocedural modification and
      reference analysis.  This option can cause excessive memory and compile-time
      usage on large compilation units.  It is not enabled by default at any
      optimization level.

(as @earlephilhower pointed out)
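
For readers unfamiliar with the option, here is a rough sketch of the kind of fact interprocedural points-to analysis can establish. The function and variable names are hypothetical and not taken from the ESP8266 core, and whether a given GCC version actually turns this into smaller code depends on the target and the other flags in use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration; it is not code from the ESP8266 core.
// On mainstream toolchains uint8_t is unsigned char, and character-type
// pointers may legally alias any object. Looking at this function alone, the
// compiler has to assume a store to dst[i] might modify *count, so it
// re-reads *count on every iteration.
static void fill(uint8_t* dst, const std::size_t* count, uint8_t value) {
    assert(dst != nullptr && count != nullptr);
    for (std::size_t i = 0; i < *count; ++i) {
        dst[i] = value;
    }
}

static uint8_t buffer[16];
static std::size_t buffer_len = sizeof(buffer);

// The only call site passes two distinct objects. A whole-unit points-to
// analysis can record "dst -> buffer, count -> buffer_len" and conclude that
// the two never overlap, which is the kind of fact that allows the loop bound
// to stay in a register.
unsigned fill_and_sum(uint8_t value) {
    fill(buffer, &buffer_len, value);
    unsigned sum = 0;
    for (std::size_t i = 0; i < buffer_len; ++i) {
        sum += buffer[i];
    }
    return sum;
}
```

For example, `fill_and_sum(3)` returns 48 (16 bytes, each set to 3). The flag does not change that result, only how much the compiler is allowed to know while generating the loop.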

Here are the numbers with arduino-cli and FSBrowser on my host:
Without:

Executable segment sizes:
ICACHE : 16384           - flash instruction cache 
IROM   : 327952          - code in flash         (default or ICACHE_FLASH_ATTR) 
IRAM   : 29388   / 49152 - code in IRAM          (ICACHE_RAM_ATTR, ISRs...) 
DATA   : 1300  )         - initialized variables (global, static) in RAM/HEAP 
RODATA : 2068  ) / 81920 - constants             (global, static) in RAM/HEAP 
BSS    : 27584 )         - zeroed variables      (global, static) in RAM/HEAP 
Sketch uses 360708 bytes (34%) of program storage space. Maximum is 1044464 bytes.
Global variables use 30952 bytes (37%) of dynamic memory, leaving 50968 bytes for local variables. Maximum is 81920 bytes.
real    0m51.596s
user    1m44.004s
sys     0m9.164s

With:

Executable segment sizes:
ICACHE : 16384           - flash instruction cache 
IROM   : 327920          - code in flash         (default or ICACHE_FLASH_ATTR) 
IRAM   : 29384   / 49152 - code in IRAM          (ICACHE_RAM_ATTR, ISRs...) 
DATA   : 1300  )         - initialized variables (global, static) in RAM/HEAP 
RODATA : 2068  ) / 81920 - constants             (global, static) in RAM/HEAP 
BSS    : 27584 )         - zeroed variables      (global, static) in RAM/HEAP 
Sketch uses 360672 bytes (34%) of program storage space. Maximum is 1044464 bytes.
Global variables use 30952 bytes (37%) of dynamic memory, leaving 50968 bytes for local variables. Maximum is 81920 bytes.
real    0m50.575s
user    1m43.626s
sys     0m9.216s
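
The totals in the two reports appear to follow directly from the segment sizes, so the saving can be checked by hand. The sketch below assumes, based only on the numbers quoted above, that "Sketch uses" is IROM + IRAM + DATA + RODATA and that "Global variables use" is DATA + RODATA + BSS; the type and function names are made up for illustration.

```cpp
#include <cassert>

// Segment sizes in bytes, copied from the two arduino-cli reports above.
struct SegmentSizes {
    long irom, iram, data, rodata, bss;
};

constexpr SegmentSizes kWithout{327952, 29388, 1300, 2068, 27584};
constexpr SegmentSizes kWith{327920, 29384, 1300, 2068, 27584};

// Assumption: program storage is the sum of everything stored in flash.
constexpr long sketch_bytes(const SegmentSizes& s) {
    return s.irom + s.iram + s.data + s.rodata;
}

// Assumption: static RAM usage is the sum of the three data segments.
constexpr long global_bytes(const SegmentSizes& s) {
    return s.data + s.rodata + s.bss;
}

// Checks the assumed formulas against the totals printed in the reports.
void verify_reports() {
    assert(sketch_bytes(kWithout) == 360708);
    assert(sketch_bytes(kWith) == 360672);
    assert(global_bytes(kWithout) == 30952);
    assert(global_bytes(kWith) == 30952);
}
```

With these figures the flags save 36 bytes of program storage (32 in IROM plus 4 in IRAM) and leave RAM usage unchanged.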

What are the benefits when compiling Tasmota @jjsuwa-sys3175 @Jason2866 ?

@jjsuwa-sys3175
Contributor Author

> What are the benefits when compiling Tasmota @jjsuwa-sys3175 @Jason2866 ?

It generally improves the compiled output, and of course that is not limited to Tasmota.
(arendst/Tasmota@e7cff92 reported that it saved a few tens of bytes.)

-IROM   : 327952          - code in flash         (default or ICACHE_FLASH_ATTR) 
-IRAM   : 29388   / 49152 - code in IRAM          (ICACHE_RAM_ATTR, ISRs...) 
+IROM   : 327920          - code in flash         (default or ICACHE_FLASH_ATTR) 
+IRAM   : 29384   / 49152 - code in IRAM          (ICACHE_RAM_ATTR, ISRs...) 

That looks like IROM/IRAM savings to me, doesn't it?

@earlephilhower
Collaborator

32 bytes over an entire 320K app is not a worthwhile optimization IMHO, especially if the compiler might need lots more memory or run slower as mentioned by the man pages. Not everyone is building on a top-spec box.

@jjsuwa-sys3175
Contributor Author

jjsuwa-sys3175 commented Dec 22, 2020

from GCC manpage:

> This option can cause excessive memory and compile-time usage on large compilation units.

That is just an exaggerated expression :) On my 10-year-old laptop there is almost no difference; so to speak, both builds are equally slow. And of course, most Arduino users don't write sketches that are "large" in the sense GCC means.

> 32 bytes over an entire 320K app is not a worthwhile optimization IMHO

I guess that small executable binaries are a virtue, especially in embedded systems.

@earlephilhower
Collaborator

earlephilhower commented Dec 22, 2020

> I guess that small executable binaries are a virtue, especially in embedded systems.

I tried an example that I thought might show some better results, BearSSL_Validation.

On the one hand, this has a lot of smaller functions both inside the .a and the WiFi core, so there is more opportunity for savings. On the other hand, it uses virtuals (both C++ and emulated in C), which means that interprocedural analysis can't be done since you don't know at build time what code will be called.
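
To make the point about virtuals concrete, here is a small sketch contrasting a direct call with the two kinds of indirect call mentioned above. All type and function names are hypothetical and are not taken from BearSSL or the core. Compilers can sometimes devirtualize such calls, so treat this as an illustration of the general difficulty, not as a statement about what GCC does here.

```cpp
#include <cassert>

// C++ virtual dispatch: the callee is looked up through the object's vtable
// at run time.
struct Hasher {
    virtual ~Hasher() = default;
    virtual int digest_size() const = 0;
};

struct Sha256Hasher : Hasher {
    int digest_size() const override { return 32; }
};

// "Emulated" virtuals in C style: a struct holding function pointers.
struct HashVtable {
    int (*digest_size)();
};

static int sha1_digest_size() { return 20; }
static const HashVtable kSha1Vtable = {sha1_digest_size};

// Direct call: the target is known at build time, so the compiler can look
// into the callee and reason across the call.
int direct_call() { return sha1_digest_size(); }

// Indirect calls: from these functions alone the target is unknown, because
// any Hasher subclass or any function pointer could arrive here.
int virtual_call(const Hasher& h) { return h.digest_size(); }

int table_call(const HashVtable* vt) {
    assert(vt != nullptr && vt->digest_size != nullptr);
    return vt->digest_size();
}
```

All three calls return the expected digest size; the difference is only in how much the compiler can see at the call site.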

Results are as follows:

master

IROM | 364600
IRAM | 27392
DATA | 1356
RODATA | 3844
BSS | 25640

With this patch

IROM | 364488
IRAM | 27396
DATA | 1356
RODATA | 3832
BSS | 25640

112-4 byte savings (IRAM 4 bytes bigger)

Rebuilding BearSSL .a with same settings + this patch

IROM | 364424
IRAM | 27396
DATA | 1356
RODATA | 3832
BSS | 25640

176-4 byte savings (IRAM again is 4 bytes larger w/the patch).
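
As a cross-check of the figures above, the per-segment differences can be tallied as in the sketch below. It assumes the flash image is IROM + IRAM + DATA + RODATA, which matches the FSBrowser reports earlier in this thread but is not stated for this build; the names are made up for illustration.

```cpp
#include <cassert>

// Segment sizes in bytes, copied from the three result tables above.
struct Sizes {
    long irom, iram, data, rodata, bss;
};

const Sizes kMaster  = {364600, 27392, 1356, 3844, 25640};
const Sizes kPatched = {364488, 27396, 1356, 3832, 25640};
const Sizes kRebuilt = {364424, 27396, 1356, 3832, 25640};

// Per-segment saving; a negative field means the segment grew.
Sizes saved(const Sizes& before, const Sizes& after) {
    return Sizes{before.irom - after.irom, before.iram - after.iram,
                 before.data - after.data, before.rodata - after.rodata,
                 before.bss - after.bss};
}

// Assumption: everything except BSS is stored in the flash image.
long net_image_saving(const Sizes& before, const Sizes& after) {
    const Sizes d = saved(before, after);
    return d.irom + d.iram + d.data + d.rodata;
}

// Confirms the IROM and IRAM figures quoted as "112-4" and "176-4".
void check_quoted_figures() {
    assert(saved(kMaster, kPatched).irom == 112);
    assert(saved(kMaster, kPatched).iram == -4);
    assert(saved(kMaster, kRebuilt).irom == 176);
    assert(saved(kMaster, kRebuilt).iram == -4);
}
```

Under that assumption the net saving works out to 120 bytes with the patch alone and 184 bytes with BearSSL rebuilt, since RODATA also shrinks by 12 bytes.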

I'm running a 1st gen Xeon E5 system and I didn't notice any time difference in builds, FWIW.

So, @jjsuwa-sys3175, I'll approve this and add it to BearSSL's build. We'll update the version when Newlib 4.0 comes in #7708.

@earlephilhower earlephilhower merged commit 8add1fd into esp8266:master Dec 22, 2020
@jjsuwa-sys3175
Contributor Author

> 176-4 byte savings

Nice savings, even without any code modification :)

> (IRAM again is 4 bytes larger w/the patch).

RODATA: 3844 -> 3832 :)

@jjsuwa-sys3175 jjsuwa-sys3175 deleted the gcc-opts-free-fipa-pta branch December 22, 2020 05:17
davisonja added a commit to davisonja/Arduino that referenced this pull request Dec 28, 2020
…lash

* upstream/master: (72 commits)
  Typo error in ESP8266WiFiGeneric.h (esp8266#7797)
  lwip2: use pvPortXalloc/vPortFree and "-free -fipa-pta" (esp8266#7793)
  Use smarter cache key, cache Arduino IDE (esp8266#7791)
  Update to SdFat 2.0.2, speed SD access (esp8266#7779)
  BREAKING - Upgrade to upstream newlib 4.0.0 release (esp8266#7708)
  mock: +hexdump() from debug.cpp (esp8266#7789)
  more lwIP physical interfaces (esp8266#6680)
  Rationalize File timestamp callback (esp8266#7785)
  Update to LittleFS v2.3 (esp8266#7787)
  WiFiServerSecure: Cache SSL sessions (esp8266#7774)
  platform.txt: instruct GCC to perform more aggressive optimization (esp8266#7770)
  LEAmDNS fixes (esp8266#7786)
  Move uzlib to master branch (esp8266#7782)
  Update to latest uzlib upstream (esp8266#7776)
  EspSoftwareSerial bug fix release 6.10.1: preciseDelay() could delay() for extremely long time, if period duration was exceeded on entry. (esp8266#7771)
  Fixed OOM double count in umm_realloc. (esp8266#7768)
  Added missing check for failure on umm_push_heap calls in Esp.cpp (esp8266#7767)
  Fix: cannot build after esp8266#7060 on Win64 (esp8266#7754)
  Add the missing 'rename' method wrapper in SD library. (esp8266#7766)
  i2s: adds i2s_rxtxdrive_begin(enableRx, enableTx, driveRxClocks, driveTxClocks) (esp8266#7748)
  ...
@valeros
Contributor

valeros commented May 18, 2021

Hi @jjsuwa-sys3175 @earlephilhower ! Sorry to bring it back, but is there any reason why these flags weren't added to the PlatformIO build script?

@earlephilhower
Collaborator

@valeros I think it was just an oversight. Can you open an issue on it to track for 3.0.1? Closed PRs don't get much visibility.
