Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On Windows at least building with a -j > 1 scons parameter fails with a file not found even though the file exists #5042

Closed
corekase opened this issue Jun 5, 2016 · 83 comments · Fixed by #17595

Comments

@corekase
Copy link

corekase commented Jun 5, 2016

Operating system or device - Godot version:

Windows 10 Pro 64-bit, Godot source 64-bit build (tested with master branch)

Visual Studio 2015 Community, Windows 10 SDK Kit 10586.

Python 2.7.11, Scons 2.5.0, and Pywin32 220 all these tools are 32-bit.

Issue description (what happened, and what was expected):

When I issue the command:

scons platform=windows target=release_debug -jx

Where "x" is a number > 1 then the build process will eventually give a file not found error. Going to where it says the file is not found it is present. Specifying -j1 on the build line works fine.

Steps to reproduce:

scons platform=windows target=release_debug -j2

@bojidar-bg
Copy link
Contributor

Can you give the build log, or at least the final part of it?

@corekase
Copy link
Author

corekase commented Jun 5, 2016

I didn't see an obvious log file anywhere in the directory so I copied the last bit of the command-prompt output:

http://pastebin.com/7wCqdF5B

Build line was:

scons platform=windows target=release_debug -j4

One thing to note: it never happens with the same file, always a different file so it might be some kind of race condition?

@SuperUserNameMan
Copy link
Contributor

@headkase : it might be a stupid question, but does it change something if you add an space between -j and 4 ?

@corekase
Copy link
Author

corekase commented Jun 5, 2016

Nope, just did this from a clean clone:

scons -j 4 platform=windows target=release_debug

Still errored on a random file.

@volzhs
Copy link
Contributor

volzhs commented Jun 5, 2016

I compile godot with this command on cmd as administrator permission.

"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\vcvarsall.bat" amd64 && call C:\Python27\Scripts\scons.bat -j 4 platform=windows target=release_debug

how about this?

@corekase
Copy link
Author

corekase commented Jun 5, 2016

Nope, that errored right away too.

@corekase
Copy link
Author

corekase commented Jun 5, 2016

And I re-ran it with Adminstrator privileges, still errored with a file not found.

@corekase
Copy link
Author

corekase commented Jun 5, 2016

@volzhs What versions of Python, SCons, and Pywin32 do you have installed? If different than mine I'll try those.

@Calinou
Copy link
Member

Calinou commented Jun 5, 2016

I can reproduce this with Python 2.7.11, SCons 2.5.0 and PyWin32 build 220, all in 64-bit, when compiling with VS 2015.

@volzhs
Copy link
Contributor

volzhs commented Jun 5, 2016

@headkase Python 2.7.11, SCons 2.4.1 and PyWin32 build 220

@corekase
Copy link
Author

corekase commented Jun 5, 2016

@volzhs @Calinou

I downgraded SCons to 2.4.1 and compiled from a clean clone each of two times. They built fine using -j4 as a parameter. So, SCons 2.5.0 looks to be the culprit.

@Griefchief
Copy link
Contributor

Griefchief commented Jun 5, 2016

@headkase No, it happens on 2.4.0 for me when compiling MS VisualC++ sometimes (very rarely though).

It's a race condition most probably, some other compiler process is accessing/locking the same file that cannot be accessed... I have not looked more into it though, not an expert on multithreaded race conditions.

I remmeber there being a setting (or scons using by default) to randomize build order so that race conditions do not happen where multiple jobs are trying to build the same file...

This random building is probably the reason why sometimes we hit the race condition, and sometimes we do not.

@corekase
Copy link
Author

corekase commented Jun 5, 2016

Well, with 2.5.0 I had 100% failure with more than one job. With 2.4.1 I had it complete successfully two times in a row.

@Griefchief
Copy link
Contributor

@headkase Random luck/unluck or 2.5.0 being really bad at randomizing build order...

@corekase
Copy link
Author

corekase commented Jun 5, 2016

As the issue has been demonstrated to be with SCons, not Godot itself, if there is nothing more to add then it may be closed. Thank you for the help everyone.

@volzhs
Copy link
Contributor

volzhs commented Jun 5, 2016

@Griefchief @headkase I have compiled hundreds times with this environment for months, and I never got failure compiling.

@Griefchief
Copy link
Contributor

Ah, OK, i'm wrong, SCons is not randomizing dependencies on build order...

@headkase can you add --random to your scons builds in 2.5.0 and test if it will error? Also without?

http://scons.org/doc/HTML/scons-user.html#idp1392274268

@corekase
Copy link
Author

corekase commented Jun 5, 2016

@Griefchief I have a working system now. Can you upgrade to 2.5.0 and try a multi-job compile with the --random option? Then add your results here.

@Griefchief
Copy link
Contributor

@headkase Hm, OK, I can probably do that, will be in a few hours, or a day or two though, got some work currently. Thanks.

@corekase
Copy link
Author

corekase commented Jun 5, 2016

Additional note: Rémi Verschelde on Facebook indicated that it should still be considered a bug in the build system as SCons 2.5.0 works correctly under Linux.

https://www.facebook.com/groups/godotengine/permalink/785635728239690/

So I recall the request to close this issue at this time.

@akien-mga
Copy link
Member

Well since the error is tools\editor\editor_fonts.cpp(33): fatal error C1083: Cannot open include file: 'builtin_fonts.h': No such file or directory, it's a buildsystem bug. This file is one of the first ones that the buildsystem should generate from the TTF files. It should not start building the editor before this one was done.

@corekase
Copy link
Author

corekase commented Jun 5, 2016

@akien-mga If there is a concrete test then I will gladly re-upgrade to 2.5.0 and build the test - that's the least I can do for the Godot project.

@Griefchief
Copy link
Contributor

Griefchief commented Jun 5, 2016

It seems that MSVC has it's own multithreaded build... maybe paralel builds arent thread safe without? maybe we can use this to avoid the race condition with -j by redirecting -j into the compiler itself (if -j can be overriden)...

I'm just pasting this link here for reference, i have no idea if this will aleviate the race condition... or help with speed at all... (I guess it might, a bit)

https://msdn.microsoft.com/en-us/library/bb385193.aspx

@gau-veldt
Copy link
Contributor

gau-veldt commented Jun 5, 2016

Investigation needs to be done to determine if scons uses process forking (creates separate copied environments similar to using & in sh in linux) or threading (runs in same process thus sharing resources and all classic threading issues apply) to implement -j.

My hunch is we are getting collisions on the algorithm or system call generating tempfile names because in Windows command lines need to be passed via files due to the system's limit on the length of command lines. However if scons uses threads then the potential error space expands substantially.

However with processes (distinct environments) the error space is much reduced and a weak tempfile naming algorithm that generates a high number of collisions in tempfile names would generate the very race situation where a later job overwrites an (shared due to opening same file on account of naming collision) argument file with different include specs before a prior job using the same file finishes.

I suspect this reason due to past experience a couple years back where the problem with building (in mingw at the time usually during links where the argument size is quite large) was due to the build process not using temporary argument files and this functionality had to be added to the project.

@Griefchief
Copy link
Contributor

Griefchief commented Jun 5, 2016

Agents, I have a hunch that maybe none of us have installed pywin32 properly into python? My SCons complains about python even though i think i installed it earlier (maybe into the wrong version)...

Is somebody not receiving a SCons warning related to pywin32 immediately when running a build (just after SCons stops processing SConstruct and starts to build targets)and getting these missing .h files?

1>scons : warning : you do not seem to have the pywin32 extensions installed;
1>      parallel (-j) builds may not work reliably with open Python files.
1>  File "c:\Python27\Scripts\scons.py", line 199, in <module>
1>  scons: Building targets ...

@corekase
Copy link
Author

corekase commented Jun 5, 2016

@Griefchief I am not receiving such an error at all. I would suggest you correct your Pywin32 installation.

Edit: Also, I have successfully built parallel builds about 4 more times with SCons 2.4.1.

@Griefchief
Copy link
Contributor

@headkase Thank you, this rules out Pywin32 probably... One fine folk from freenode # SCONS says that pywin32 is mostly about python file objects (generated by open() and file())... I guess it's probably not pywin32 related, we'll probably need investigate what gau-veldt mentioned.

@viktor-ferenczi
Copy link
Contributor

viktor-ferenczi commented Mar 11, 2018

During the investigation I made changes to the build code (Python) to properly close all files everywhere in the Python code base by using the Python with statement for file context wherever it did not need too much refactoring. Explicit close calls are added in all other cases. Unfortunately this does not fix the original issue, but it is good practice anyway.

Please find the fixes mentioned above in the PR below.

Fixing this race condition on Windows by waiting for the dependencies to be actually readable would be the task of SCons, so the relevant fix should go into their code base.

Based on all the investigations above I suggest closing this ticket.

@garyo
Copy link
Contributor

garyo commented Mar 12, 2018

Great work tracking this down in so much detail, viktor-ferenczi! If you would like to go a little further down this rabbit hole, try running a build while using Procmon: https://docs.microsoft.com/en-us/sysinternals/downloads/procmon -- that should show you exactly why the file isn't openable. Maybe virus scanner has it open, or backup, or something else. Or maybe as you suggest, python isn't closing it. On Windows, only one process can have a file open at a time (in most situations).

@viktor-ferenczi
Copy link
Contributor

viktor-ferenczi commented Mar 17, 2018

Godot build log (with timestamps):

2018-03-17T11:38:35.204000 [27392]: SHELL: u'C:\\WINDOWS\\System32\\cmd.exe'Compiling ==> main\default_controller_mappings.gen.cpp
2018-03-17T11:38:35.249000 [25892]: CMD: cl /Fomain\default_controller_mappings.gen.windows.tools.64.obj /c "main\default_controller_mappings.gen.cpp" /TP /nologo /Z7 /Od /EHsc /MT /Gd /GR /w /EHsc -DZSTD_STATIC_LINKING_ONLY -DFT2_BUILD_LIBRARY -DZLIB_DEBUG -DFREETYPE_ENABLED -DFT_CONFIG_OPTION_USE_PNG -DRECAST_ENABLED -DSVG_ENABLED -DGLAD_ENABLED -DGLES_OVER_GL /DDEBUG_MEMORY_ALLOC /DSCI_NAMESPACE /DDEBUG_ENABLED /DDEBUG_MEMORY_ENABLED /DD3D_DEBUG_INFO /DWINDOWS_ENABLED /DOPENGL_ENABLED /DRTAUDIO_ENABLED /DWASAPI_ENABLED /DTYPED_METHOD_BIND /DWIN32 /DMSVC "/D{'_WIN32_WINNT': '0x0601', 'WINVER': '0x0601'}" /D_WIN64 /DPTRCALL_ENABLED /DTOOLS_ENABLED /DGDSCRIPT_ENABLED /DMINIZIP_ENABLED /DXML_ENABLED /Icore /Icore\math /Ieditor /Idrivers /I. /Iplatform\windows "/IC:\Program Files (x86)\Windows Kits\10\Include" "/IC:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\ATLMFC\include" "/IC:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\...
default_controller_mappings.gen.cpp
c1xx: fatal error C1083: Cannot open source file: 'main\default_controller_mappings.gen.cpp': Permission denied
2018-03-17T11:38:37.749000 [1720]:   |   | | +-main\default_controller_mappings.gen.cpp
2018-03-17T11:39:06.805000 [1720]:   | +-[main\default_controller_mappings.gen.cpp]

Process Monitor events filtered on default_controller_mappings.gen.cpp:

11:38:34.7608493	python.exe	24660	QueryOpen	G:\godot\main\default_controller_mappings.gen.cpp	NAME NOT FOUND	
11:38:34.7727775	python.exe	24660	QueryOpen	G:\godot\main\default_controller_mappings.gen.cpp	NAME NOT FOUND	
11:38:34.8151715	python.exe	24660	CreateFile	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	Desired Access: Generic Write, Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created
11:38:34.8153579	python.exe	24660	QueryInformationVolume	G:\godot\main\default_controller_mappings.gen.cpp	BUFFER OVERFLOW	VolumeCreationTime: 17/03/2018 11:15:21, VolumeSerialNumber: C6DB-9B59, SupportsObjects: True, VolumeLabel: Ramৎ
11:38:34.8153733	python.exe	24660	QueryAllInformationFile	G:\godot\main\default_controller_mappings.gen.cpp	BUFFER OVERFLOW	CreationTime: 17/03/2018 11:38:34, LastAccessTime: 17/03/2018 11:38:34, LastWriteTime: 17/03/2018 11:38:34, ChangeTime: 17/03/2018 11:38:34, FileAttributes: A, AllocationSize: 0, EndOfFile: 0, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x2000000001a70, EaSize: 0, Access: Generic Write, Read Attributes, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Byte
11:38:34.8234762	python.exe	24660	WriteFile	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	Offset: 0, Length: 4,111, Priority: Normal
...
11:38:34.8995967	python.exe	24660	WriteFile	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	Offset: 258,846, Length: 4,111, Priority: Normal
11:38:34.8996605	python.exe	24660	ReadFile	G:\godot\main\default_controller_mappings.gen.cpp	END OF FILE	Offset: 262,144, Length: 32,768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
11:38:34.9006556	python.exe	24660	WriteFile	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	Offset: 262,957, Length: 4,113
...
11:38:34.9123225	python.exe	24660	WriteFile	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	Offset: 320,515, Length: 1,875
11:38:34.9716933	python.exe	24660	QueryOpen	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	CreationTime: 17/03/2018 11:38:34, LastAccessTime: 17/03/2018 11:38:34, LastWriteTime: 17/03/2018 11:38:34, ChangeTime: 17/03/2018 11:38:34, AllocationSize: 327,680, EndOfFile: 322,390, FileAttributes: A
11:38:34.9718483	python.exe	24660	QueryOpen	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	CreationTime: 17/03/2018 11:38:34, LastAccessTime: 17/03/2018 11:38:34, LastWriteTime: 17/03/2018 11:38:34, ChangeTime: 17/03/2018 11:38:34, AllocationSize: 327,680, EndOfFile: 322,390, FileAttributes: A
11:38:34.9721116	python.exe	24660	CreateFile	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: n/a, OpenResult: Opened
11:38:34.9721715	python.exe	24660	QueryInformationVolume	G:\godot\main\default_controller_mappings.gen.cpp	BUFFER OVERFLOW	VolumeCreationTime: 17/03/2018 11:15:21, VolumeSerialNumber: C6DB-9B59, SupportsObjects: True, VolumeLabel: Ramৎ
11:38:34.9721837	python.exe	24660	QueryAllInformationFile	G:\godot\main\default_controller_mappings.gen.cpp	BUFFER OVERFLOW	CreationTime: 17/03/2018 11:38:34, LastAccessTime: 17/03/2018 11:38:34, LastWriteTime: 17/03/2018 11:38:34, ChangeTime: 17/03/2018 11:38:34, FileAttributes: A, AllocationSize: 327,680, EndOfFile: 322,390, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x2000000001a70, EaSize: 0, Access: Generic Read, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Byte
11:38:34.9722100	python.exe	24660	ReadFile	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	Offset: 0, Length: 65,536, Priority: Normal
11:38:34.9723989	python.exe	24660	ReadFile	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	Offset: 65,536, Length: 65,536
11:38:34.9725565	python.exe	24660	ReadFile	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	Offset: 131,072, Length: 65,536
11:38:34.9727488	python.exe	24660	ReadFile	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	Offset: 196,608, Length: 65,536
11:38:34.9729153	python.exe	24660	ReadFile	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	Offset: 262,144, Length: 60,246
11:38:34.9729327	python.exe	24660	ReadFile	G:\godot\main\default_controller_mappings.gen.cpp	END OF FILE	Offset: 322,390, Length: 4,096
11:38:34.9730683	python.exe	24660	ReadFile	G:\godot\main\default_controller_mappings.gen.cpp	END OF FILE	Offset: 322,390, Length: 65,536
11:38:34.9731062	python.exe	24660	CloseFile	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	
11:38:35.1611191	python.exe	24660	CreateFile	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: n/a, OpenResult: Opened
11:38:35.1612588	python.exe	24660	QueryInformationVolume	G:\godot\main\default_controller_mappings.gen.cpp	BUFFER OVERFLOW	VolumeCreationTime: 17/03/2018 11:15:21, VolumeSerialNumber: C6DB-9B59, SupportsObjects: True, VolumeLabel: Ramৎ
11:38:35.1612716	python.exe	24660	QueryAllInformationFile	G:\godot\main\default_controller_mappings.gen.cpp	BUFFER OVERFLOW	CreationTime: 17/03/2018 11:38:34, LastAccessTime: 17/03/2018 11:38:34, LastWriteTime: 17/03/2018 11:38:34, ChangeTime: 17/03/2018 11:38:34, FileAttributes: A, AllocationSize: 327,680, EndOfFile: 322,390, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x2000000001a70, EaSize: 0, Access: Generic Read, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Byte
11:38:35.1612898	python.exe	24660	QueryInformationVolume	G:\godot\main\default_controller_mappings.gen.cpp	BUFFER OVERFLOW	VolumeCreationTime: 17/03/2018 11:15:21, VolumeSerialNumber: C6DB-9B59, SupportsObjects: True, VolumeLabel: Ramৎ
11:38:35.1612972	python.exe	24660	QueryAllInformationFile	G:\godot\main\default_controller_mappings.gen.cpp	BUFFER OVERFLOW	CreationTime: 17/03/2018 11:38:34, LastAccessTime: 17/03/2018 11:38:34, LastWriteTime: 17/03/2018 11:38:34, ChangeTime: 17/03/2018 11:38:34, FileAttributes: A, AllocationSize: 327,680, EndOfFile: 322,390, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x2000000001a70, EaSize: 0, Access: Generic Read, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Byte
11:38:35.1613175	python.exe	24660	ReadFile	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	Offset: 0, Length: 319,488, Priority: Normal
11:38:35.1613876	python.exe	24660	ReadFile	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	Offset: 319,488, Length: 2,902
11:38:35.1614067	python.exe	24660	ReadFile	G:\godot\main\default_controller_mappings.gen.cpp	END OF FILE	Offset: 322,390, Length: 4,096
11:38:35.1614622	python.exe	24660	CloseFile	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	
11:38:35.3236414	cl.exe	28964	QueryDirectory	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	Filter: default_controller_mappings.gen.cpp, 1: default_controller_mappings.gen.cpp
11:38:35.3237172	cl.exe	28964	CreateFile	G:\godot\main\default_controller_mappings.gen.cpp	SHARING VIOLATION	Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, AllocationSize: n/a
11:38:35.5890745	cmd.exe	24148	CloseFile	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	
11:38:35.6872005	python.exe	24660	QueryOpen	G:\godot\main\default_controller_mappings.gen.cpp	SUCCESS	CreationTime: 17/03/2018 11:38:34, LastAccessTime: 17/03/2018 11:38:34, LastWriteTime: 17/03/2018 11:38:35, ChangeTime: 17/03/2018 11:38:35, AllocationSize: 323,584, EndOfFile: 322,390, FileAttributes: A

G: is a ramdisk with no backup. G:\godot build folder was excluded from all Bitdefender Antivirus scanning, it was verified by seeing the lack of access from its vsserv.exe process.

BUFFER OVERFLOW is normal and harmless. Some repetitive WriteFile operations were removed for clarity. The very last QueryOpen comes from my extra check for the readability of the file.

Next steps:

  • What does the SHARING VIOLATION result exactly mean in this context?
  • There is a CloseFile from cmd.exe after cl.exe receives the above error, but there seem to be no related CreateFile or QueryOpen above. Why does cmd.exe opens the file?

@garyo
Copy link
Contributor

garyo commented Mar 17, 2018

Sharing violation just means that by default, only one process can have a file open on Windows. Python still has it open (for reading), and the compiler is trying to open it.
(https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx may help.)

So the question is why Python (SCons) still has it open. I know SCons has had this kind of problem in the past; here's a (very!) old discussion of something quite similar: http://scons-users.scons.narkive.com/pjMTzf5v/race-condition-in-scons-scheduler - you can see it could be a number of things, from python not closing the file until it collects garbage, to a bug in the python function that's creating default_controller_mappings.gen.cpp (could you print a timestamp when that finishes, to correlate with the procmon log?), to a problem with the include-file scanner.
Here's another old report of a similar thing: http://scons.1086193.n5.nabble.com/MSVC-Parallell-Builds-td23950.html

So if with your help we could nail this one, it would be amazing. I'm going to mention this on the SCons irc as well.

One other question: I know python made significant changes to the IO system in 3.x. I wonder if this problem would go away if you run SCons with python3? Oh, in any case you should use SCons 3.0.1. It's still compatible with Python 2.7.

@Calinou
Copy link
Member

Calinou commented Mar 17, 2018

One other question: I know python made significant changes to the IO system in 3.x. I wonder if this problem would go away if you run SCons with python3? Oh, in any case you should use SCons 3.0.1. It's still compatible with Python 2.7.

I'm still running into this issue with Python 3.6.4 and SCons 3.0.1 on Windows 10.

@viktor-ferenczi
Copy link
Contributor

@garyo

python not closing the file until it collects garbage

Fix for that was in my PR, thanks for merging it. I use latest master for my tests.

One other question: I know python made significant changes to the IO system in 3.x.

Python 2.7.14 is used in my tests to avoid any such issues. I think it is best to separate the problems as much as possible. When we nail this with Python 2.7, then I will give it a test on latest 3.x as well.

@viktor-ferenczi
Copy link
Contributor

viktor-ferenczi commented Mar 17, 2018

This time it happened with: splash.gen.h

Build log (with scons log timestamp hack):

2018-03-17T14:19:33.240000 [43524]: scons: building `main\splash.gen.h' because it doesn't exist
2018-03-17T14:19:33.240000 [43524]: Preparing target main\splash.gen.h...
2018-03-17T14:19:33.241000 [38480]: make_splash(["main\splash.gen.h"], ["main\splash.png"])
2018-03-17T14:19:34.264000 [38480]: Command execution time: main\splash.gen.h: 1.023000 seconds
main\main.cpp(45): fatal error C1083: Cannot open include file: 'splash.gen.h': No such file or directory

Process monitor events:

14:19:32.3982270	python.exe	52576	QueryOpen	G:\godot\main\splash.gen.h	NAME NOT FOUND	
14:19:32.3985416	python.exe	52576	QueryOpen	G:\godot\main\splash.gen.h	NAME NOT FOUND	
14:19:32.6686124	python.exe	52576	QueryOpen	G:\godot\main\splash.gen.h	NAME NOT FOUND	
14:19:33.2427370	python.exe	52576	CreateFile	G:\godot\main\splash.gen.h	SUCCESS	Desired Access: Generic Write, Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created
14:19:33.2429277	TGitCache.exe	7324	QueryOpen	G:\godot\main\splash.gen.h	SUCCESS	CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, AllocationSize: 0, EndOfFile: 0, FileAttributes: A
14:19:33.2429297	python.exe	52576	QueryInformationVolume	G:\godot\main\splash.gen.h	BUFFER OVERFLOW	VolumeCreationTime: 17/03/2018 11:15:21, VolumeSerialNumber: C6DB-9B59, SupportsObjects: True, VolumeLabel: Ram8胆뷛
14:19:33.2429414	python.exe	52576	QueryAllInformationFile	G:\godot\main\splash.gen.h	BUFFER OVERFLOW	CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, FileAttributes: A, AllocationSize: 0, EndOfFile: 0, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x4000000001a6e, EaSize: 0, Access: Generic Write, Read Attributes, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Byte
14:19:33.2431568	TGitCache.exe	7324	QueryOpen	G:\godot\main\splash.gen.h	SUCCESS	CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, AllocationSize: 0, EndOfFile: 0, FileAttributes: A
14:19:33.3142694	python.exe	52576	WriteFile	G:\godot\main\splash.gen.h	SUCCESS	Offset: 0, Length: 5,119, Priority: Normal
14:19:33.3143777	python.exe	52576	WriteFile	G:\godot\main\splash.gen.h	SUCCESS	Offset: 5,119, Length: 99
14:19:33.3690732	python.exe	52576	WriteFile	G:\godot\main\splash.gen.h	SUCCESS	Offset: 5,218, Length: 4,992, Priority: Normal
14:19:33.3930376	TGitCache.exe	7324	QueryOpen	G:\godot\main\splash.gen.h	SUCCESS	CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, AllocationSize: 16,384, EndOfFile: 10,210, FileAttributes: A
14:19:33.4052584	TGitCache.exe	7324	QueryOpen	G:\godot\main\splash.gen.h	SUCCESS	CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, AllocationSize: 16,384, EndOfFile: 10,210, FileAttributes: A
14:19:33.4236712	python.exe	52576	WriteFile	G:\godot\main\splash.gen.h	SUCCESS	Offset: 10,210, Length: 4,999
...
14:19:34.2640760	python.exe	52576	WriteFile	G:\godot\main\splash.gen.h	SUCCESS	Offset: 115,173, Length: 2,212
14:19:34.5245007	python.exe	52576	QueryOpen	G:\godot\main\splash.gen.h	SUCCESS	CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, AllocationSize: 196,608, EndOfFile: 117,385, FileAttributes: A
14:19:34.5247093	python.exe	52576	QueryOpen	G:\godot\main\splash.gen.h	SUCCESS	CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, AllocationSize: 196,608, EndOfFile: 117,385, FileAttributes: A
14:19:34.5250763	python.exe	52576	CreateFile	G:\godot\main\splash.gen.h	SUCCESS	Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: n/a, OpenResult: Opened
14:19:34.5251513	python.exe	52576	QueryInformationVolume	G:\godot\main\splash.gen.h	BUFFER OVERFLOW	VolumeCreationTime: 17/03/2018 11:15:21, VolumeSerialNumber: C6DB-9B59, SupportsObjects: True, VolumeLabel: Ram᭲
14:19:34.5251670	python.exe	52576	QueryAllInformationFile	G:\godot\main\splash.gen.h	BUFFER OVERFLOW	CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, FileAttributes: A, AllocationSize: 196,608, EndOfFile: 117,385, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x4000000001a6e, EaSize: 0, Access: Generic Read, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Byte
14:19:34.5251994	python.exe	52576	ReadFile	G:\godot\main\splash.gen.h	SUCCESS	Offset: 0, Length: 65,536, Priority: Normal
14:19:34.5253904	python.exe	52576	ReadFile	G:\godot\main\splash.gen.h	SUCCESS	Offset: 65,536, Length: 51,849
14:19:34.5254283	python.exe	52576	ReadFile	G:\godot\main\splash.gen.h	END OF FILE	Offset: 117,385, Length: 12,288
14:19:34.5256021	python.exe	52576	ReadFile	G:\godot\main\splash.gen.h	END OF FILE	Offset: 117,385, Length: 65,536
14:19:34.5256326	python.exe	52576	CloseFile	G:\godot\main\splash.gen.h	SUCCESS	
14:19:34.5393288	python.exe	52576	CreateFile	G:\godot\main\splash.gen.h	SUCCESS	Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: n/a, OpenResult: Opened
14:19:34.5394374	python.exe	52576	QueryInformationVolume	G:\godot\main\splash.gen.h	BUFFER OVERFLOW	VolumeCreationTime: 17/03/2018 11:15:21, VolumeSerialNumber: C6DB-9B59, SupportsObjects: True, VolumeLabel: Ram᮷
14:19:34.5394542	python.exe	52576	QueryAllInformationFile	G:\godot\main\splash.gen.h	BUFFER OVERFLOW	CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, FileAttributes: A, AllocationSize: 196,608, EndOfFile: 117,385, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x4000000001a6e, EaSize: 0, Access: Generic Read, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Byte
14:19:34.5394938	python.exe	52576	QueryInformationVolume	G:\godot\main\splash.gen.h	BUFFER OVERFLOW	VolumeCreationTime: 17/03/2018 11:15:21, VolumeSerialNumber: C6DB-9B59, SupportsObjects: True, VolumeLabel: Ramᮧ
14:19:34.5395311	python.exe	52576	QueryAllInformationFile	G:\godot\main\splash.gen.h	BUFFER OVERFLOW	CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, FileAttributes: A, AllocationSize: 196,608, EndOfFile: 117,385, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x4000000001a6e, EaSize: 0, Access: Generic Read, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Byte
14:19:34.5395981	python.exe	52576	ReadFile	G:\godot\main\splash.gen.h	SUCCESS	Offset: 0, Length: 114,688, Priority: Normal
14:19:34.5396528	python.exe	52576	ReadFile	G:\godot\main\splash.gen.h	SUCCESS	Offset: 114,688, Length: 2,697
14:19:34.5396742	python.exe	52576	ReadFile	G:\godot\main\splash.gen.h	END OF FILE	Offset: 117,385, Length: 4,096
14:19:34.5397086	python.exe	52576	CloseFile	G:\godot\main\splash.gen.h	SUCCESS	
14:19:35.2264735	cl.exe	3956	QueryDirectory	G:\godot\main\splash.gen.h	SUCCESS	Filter: splash.gen.h, 1: splash.gen.h
14:19:35.2265730	cl.exe	3956	CreateFile	G:\godot\main\splash.gen.h	SHARING VIOLATION	Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, AllocationSize: n/a
14:19:35.2266291	cl.exe	3956	QueryDirectory	G:\godot\core\math\splash.gen.h	NO MORE FILES	Filter: splash.gen.h
14:19:35.2267001	cl.exe	3956	QueryOpen	G:\godot\core\math\splash.gen.h	NAME NOT FOUND	
14:19:35.2267691	cl.exe	3956	QueryDirectory	C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\include\splash.gen.h	NO MORE FILES	Filter: splash.gen.h
14:19:35.2270577	cl.exe	3956	CreateFile	C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\include\splash.gen.h	NAME NOT FOUND	Desired Access: Read Attributes, Disposition: Open, Options: Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a
14:19:35.3311607	cl.exe	3956	QueryDirectory	C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\include\splash.gen.h	NO MORE FILES	Filter: splash.gen.h
14:19:35.3314719	cl.exe	3956	CreateFile	C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\include\splash.gen.h	NAME NOT FOUND	Desired Access: Read Attributes, Disposition: Open, Options: Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a
14:19:35.4549549	cmd.exe	1900	CloseFile	G:\godot\main\splash.gen.h	SUCCESS	
14:19:35.4552071	TGitCache.exe	7324	QueryOpen	G:\godot\main\splash.gen.h	SUCCESS	CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:35, ChangeTime: 17/03/2018 14:19:35, AllocationSize: 118,784, EndOfFile: 117,385, FileAttributes: A
14:19:35.4555465	TGitCache.exe	7324	QueryOpen	G:\godot\main\splash.gen.h	SUCCESS	CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:35, ChangeTime: 17/03/2018 14:19:35, AllocationSize: 118,784, EndOfFile: 117,385, FileAttributes: A
14:19:35.6042221	TGitCache.exe	7324	QueryOpen	G:\godot\main\splash.gen.h	SUCCESS	CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:35, ChangeTime: 17/03/2018 14:19:35, AllocationSize: 118,784, EndOfFile: 117,385, FileAttributes: A
14:19:35.6182531	TGitCache.exe	7324	QueryOpen	G:\godot\main\splash.gen.h	SUCCESS	CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:35, ChangeTime: 17/03/2018 14:19:35, AllocationSize: 118,784, EndOfFile: 117,385, FileAttributes: A
14:19:35.6784076	python.exe	52576	QueryOpen	G:\godot\main\splash.gen.h	SUCCESS	CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:35, ChangeTime: 17/03/2018 14:19:35, AllocationSize: 118,784, EndOfFile: 117,385, FileAttributes: A
14:19:35.6795939	python.exe	52576	QueryOpen	G:\godot\main\splash.gen.h	SUCCESS	CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:35, ChangeTime: 17/03/2018 14:19:35, AllocationSize: 118,784, EndOfFile: 117,385, FileAttributes: A

CL.EXE experiences the SHARING VIOLATION about 700ms later than Python closed the file. Time difference should not even count, only the order of events is important here.

14:19:35.2265730	cl.exe	3956	CreateFile	G:\godot\main\splash.gen.h	SHARING VIOLATION	Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, AllocationSize: n/a

Why on Earth cmd.exe is keeping the splash.gen.h file open?

Also, there is no CreateFile event for cmd.exe with Path splash.gen.h, so it did not open that file, only closed it. Strange...

14:19:35.4549549	cmd.exe	1900	CloseFile	G:\godot\main\splash.gen.h	SUCCESS	

TortoiseGit's cache seems to find it as well, but that happens only after cmd.exe closes it.

Next step is to figure whether TortoiseGit's cache have anything to do with it.

@viktor-ferenczi
Copy link
Contributor

viktor-ferenczi commented Mar 17, 2018

Turned off both TortoiseHg's and TortoiseGit's icon overlay cache and verified that their processes are not running anymore. Repeated the test and it failed with splash.gen.h again the exact same way. So the icon overlay caches did not cause the issue.

@garyo
Copy link
Contributor

garyo commented Mar 17, 2018

Interesting! How is cmd.exe even involved here... and why is there no CreateFile. interesting questions. SCons will try to remove the old version before building it, but I'm pretty sure that's done directly via os.unlink(). Also note SCons opened (CREATE_FILE) it with ShareMode read and write. A bit unusual, but OK. And also maybe why it worked at all, if cmd.exe had it open already.

If you still have the whole procmon log, maybe seeing how cmd.exe (pid 3956 there) got started would give a hint. Was it started from python/SCons? Must have been, I'd think. SCons will run cmd.exe to execute shell commands. --taskmastertrace=- might show something.

@viktor-ferenczi
Copy link
Contributor

viktor-ferenczi commented Mar 17, 2018

cmd.exe

Before each test I clean the build tree by running scons -c, so all the generated files are removed before the actual build. I've just verified it:

2018-03-17T15:20:35.963000 [51256]: Removed main\splash.gen.h

Also note SCons opened (CREATE_FILE) it with ShareMode read and write. A bit unusual, but OK.

That's caused by some extra code I hacked directly into SCons to explicitly verify that all target files generated can be opened and fully read. This is why python.exe appears to open the file again in ShareMode Read, Write, then read it all with a 64kB chunk size. So the splash.gen.h file was there and fully readable. But not for CL.EXE... Question is, why??

Also, if I run the exact same command line after the build fails, then CL.EXE can successfully read the file. So it might be something with the parent process (SCons's python.exe or some intermediate cmd.exe). It more and more looks like a generic SCons issue on Windows.

@garyo
Copy link
Contributor

garyo commented Mar 17, 2018

But not for CL.EXE... Question is, why??

One would assume it's because of cmd.exe having it open. But that would imply that whatever cmd.exe did to open it failed to show up in the procmon log. And it would imply that cmd.exe didn't have it open when SCons/python was generating it. (Oh, maybe you should run procmon as Adminstrator, if you're not already. Probably not the smoking gun, but best to be sure.)

@viktor-ferenczi
Copy link
Contributor

viktor-ferenczi commented Mar 17, 2018

Here is how cmd.exe gets into the picture:

SCons is apparently invoking CL.EXE using a modified (wrapped) version of os.spawnve to make it thread safe by using a lock. SCons is invoking os.spawnve with os.P_WAIT and passes over the full environment (too long to quote here).

See C:\Python27\scons-3.0.0\SCons\Platform\win32.py, search for: def spawnve(mode, file, args, env):

Pretty printed (space joined) command line of the failing CL.EXE invocation:

C:\WINDOWS\System32\cmd.exe /C "cl /Fomain\main.windows.tools.64.obj /c "main\main.cpp" /TP /nologo /Z7 /Od /EHsc /MT /Gd /GR /w /EHsc -DZSTD_STATIC_LINKING_ONLY -DFT2_BUILD_LIBRARY -DZLIB_DEBUG -DFREETYPE_ENABLED -DFT_CONFIG_OPTION_USE_PNG -DRECAST_ENABLED -DSVG_ENABLED -DGLAD_ENABLED -DGLES_OVER_GL /DDEBUG_MEMORY_ALLOC /DSCI_NAMESPACE /DDEBUG_ENABLED /DDEBUG_MEMORY_ENABLED /DD3D_DEBUG_INFO /DWINDOWS_ENABLED /DOPENGL_ENABLED /DRTAUDIO_ENABLED /DWASAPI_ENABLED /DTYPED_METHOD_BIND /DWIN32 /DMSVC "/D{'_WIN32_WINNT': '0x0601', 'WINVER': '0x0601'}" /D_WIN64 /DPTRCALL_ENABLED /DTOOLS_ENABLED /DGDSCRIPT_ENABLED /DMINIZIP_ENABLED /DXML_ENABLED /Icore /Icore\math /Ieditor /Idrivers /I. /Iplatform\windows "/IC:\Program Files (x86)\Windows Kits\10\Include" "/IC:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\ATLMFC\include" "/IC:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\include" "/IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "/IC:\Program Files (x86)\Windows Kits\10\Include\10.0.16299.0\ucrt" "/IC:\Program Files (x86)\Windows Kits\10\Include\10.0.16299.0\shared" "/IC:\Program Files (x86)\Windows Kits\10\Include\10.0.16299.0\um" "/IC:\Program Files (x86)\Windows Kits\10\Include\10.0.16299.0\winrt" /Ithirdparty\zstd /Ithirdparty\zstd\common /Ithirdparty\zlib /Ithirdparty\rtaudio /Ithirdparty\glad /Ithirdparty\freetype /Ithirdparty\freetype\include /Ithirdparty\libpng /Ithirdparty\recastnavigation\Recast /Ithirdparty\recastnavigation\Recast\Include /Ithirdparty\nanosvg"

This is originally passed as a list, which would be unreadable for us humans. Please note that it makes quoting incorrect above.

Invoking this same command line after the build does succeed, even with wrong quoting.

Since no generated header file is listed on the command line cmd.exe cannot know about it, unless it makes a directory listing (which I don't see on ProcMon).

Apparently cmd.exe keeps its working directory (G:\godot in my case) open and closes every single generated file (both cpp and h) we make.

@viktor-ferenczi
Copy link
Contributor

viktor-ferenczi commented Mar 17, 2018

Summary on all the workarounds attempted:

Changing the execution environment

  1. Disabling Antivirus, TortoiseHg and TortoiseGit icon overlay caches, disabling backup: Build still fails
  2. Invoking the whole build with env var set PYTHONUNBUFFERED=1 set: Build still fails
  3. Invoking the build with -j1 (single process): Build still fails, but makes the problem less likely

Changing SCons source code

  1. Adding gc.collect() before and after the spawnve call: Build still fails
  2. Adding time.sleep(1.0) before each spawnve call: Build completes, but awful slow.

Changing Godot build scripts

  1. Writing the target file with a .new suffix in the generator, closing the file, then renaming it. (Tested only in main/SCsub, since the files generated there frequently cause this build failure.) It resulted in the following error right on renaming:
2018-03-17T16:54:44.859000 [21436]: scons: *** [main\default_controller_mappings.gen.cpp] G:\godot\main\default_controller_mappings.gen.cpp.new: The process cannot access the file because it is being used by another process

Both os.rename and shutil.move were attempted with the same result.

So files written cannot be renamed right after closing them.

I think this is a step in the right direction to figure out what's going on.

@garyo
Copy link
Contributor

garyo commented Mar 17, 2018

That sounds right, in terms of how SCons invokes subprocesses. But as you say, it's unlikely that cmd.exe is the one opening the generated file. It just gets /c "some command" and executes it.
There's one other possibility, which I thought was stamped out years ago but maybe not: inherited file handles. See this python bug: https://bugs.python.org/issue19575. Basically, in the presence of redirection, all open file handles get inherited by the subprocess. So if SCons has a file open, and invokes cmd.exe, maybe cmd.exe inherits that handle.
I'm not saying it's definitely that, but it's a possibility.

I wonder why this never happens for me? I have a big AMD CPU and always build with -j30 (it's very nice), and haven't seen this happen. I'm using Win10, SSD, VS2017, python 3.6.3, SCons master branch from January. I'm building in my C drive, you're on G: -- probably not relevant, but who knows...

@garyo
Copy link
Contributor

garyo commented Mar 17, 2018

Here's one thing to try:
In SCons Platform/win32.py there is code to turn off file-handle inheritance -- it monkey-patches BufferedReader and others. Search for HANDLE_FLAG_INHERIT. Maybe try to use that logic in main/SCsub when opening the generated files?

@viktor-ferenczi
Copy link
Contributor

viktor-ferenczi commented Mar 17, 2018

Thank you for the ideas. I'm trying to use _scons_open from SCons's Platform/win32.py as you suggested.

If that does not work, then we can try to run all of our Python build (make_*) functions in subprocesses, which would effectively decouple them from the SCons process making inheriting anything from there impossible. Based on former tickets this approach worked in case of shaders.

AMD Ryzen 7 1800X CPU here with 8 physical cores + HT, so the -j16 for me. Could be -j32 due to I/O wait, tough.

@viktor-ferenczi
Copy link
Contributor

viktor-ferenczi commented Mar 17, 2018

HANDLE_FLAG_INHERIT seems to work.
UPDATE: It does not actually work. :(

Testing it in a loop many times, then creating a PR.

Higher build parallelism seems to decrease the likelihood of this issue. Reason can be the extra waiting time introduced between subsequent tasks by too much parallelism. Under-utilizing the CPU seems to maximize the likelihood of this issue. Going down to -j1 seems to decrease the probability, tough.

The less time is spent between finishing the Python generator and starting the CL.EXE compiler the more likely this problem is. It happens for sure if the gap is less than a few seconds. There seem to be a delay for closing files, even if we do it properly in Python.

@viktor-ferenczi
Copy link
Contributor

viktor-ferenczi commented Mar 17, 2018

The fix is not reliable, unfortunately. I was just lucky while testing the first 2-3 times, likely because of trying to have -j32 parallelism. Now it is running in a loop with -j16 and the build failed 20 out of 20 times.

Next: Trying to run all generator functions in their own subprocesses.

@viktor-ferenczi
Copy link
Contributor

viktor-ferenczi commented Mar 17, 2018

Running all builders (make_* functions) in subprocesses works well. I've seen this suggestion way above in the chat, actually. Too bad we don't know the exact reason for this problem, but at least there is a reliable workaround now.

Running 20-30 test builds, expect 100% success rate. I will also try with -j1 and -j32. Then a PR will follow.

@viktor-ferenczi
Copy link
Contributor

Build benchmark

Average successful build times in seconds by build parallelism (-j option):

-j1:  794s
-j2:  416s
-j4:  228s
-j8:  166s
-j16: 154s
-j32: 152s

All builds started from clean state and running on a ramdisk.
Cleaning was not counted for the build time.

Build machine
CPU: Ryzen 7 1800X
RAM: 32GB 2400MHz (suboptimal, I know)
Motherboard: ASUS ROG Crosshair VI Hero, Socket-AM4
SSD: Samsung 850 Pro 1TB (SATA)
OS: Windows 10 Pro 64bit ENU

@Griefchief
Copy link
Contributor

Griefchief commented Mar 18, 2018 via email

@bdbaddog
Copy link

Take a look at this: (Python's implementation of close)
https://github.com/python/cpython/blob/f320be77ffb73e3b9e7fc98c37b8df3975d84b40/Modules/_io/fileio.c#L106

Note the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS around the actual C call to close() :

        Py_BEGIN_ALLOW_THREADS
        _Py_BEGIN_SUPPRESS_IPH
        err = close(fd);
        if (err < 0)
            save_errno = errno;
        _Py_END_SUPPRESS_IPH
        Py_END_ALLOW_THREADS

My belief is that the GIL gets released while the file is closing, and randomly the written (or copied) file get's open()'d either by python or a subprocess before the close() call actually completes..
(Note I'm the other SCons Co-Manager along with @garyo )

We've had patches and SCons users swapping to use win32's FileCopy API instead of shutil.copy() with good results. In those cases users were mainly seeing file handle open issue with files pulled in from a CacheDir and/or being installed or Copy()'d by scons.

Again my hypothesis would hold as the FileCopy call via ctypes (or pywin32) would hold the GIL until it returns which shouldn't happen until the file copy has completed.

@lawnjelly
Copy link
Member

lawnjelly commented Aug 18, 2019

Following #20373 I'm not sure whether this issue is completely solved, but maybe it is effectively so rare now as to not be a problem. However I have some potentially relevant info.

Following my work adding unity build #13096 option to Godot, I was getting these include errors for the generated files each clean / rebuild cycle. Whether it was the same mechanism for the error I don't know however, I can show how I cured it in the unity build...

Using the SCsub in main as an example, currently my replacement is:
old:

env.add_source_files(env.main_sources, "*.cpp")

new:

if env['unity']:
    env.add_source_files(env.main_sources, "#misc/scu/SCU_main.cc")
    env.Depends("#misc/scu/SCU_main.cc", "#main/app_icon.gen.h")
    env.Depends("#misc/scu/SCU_main.cc", "#main/splash.gen.h")
    env.Depends("#misc/scu/SCU_main.cc", "#main/splash_editor.gen.h")
else:
    env.add_source_files(env.main_sources, "*.cpp")

That is, as I understand it, instead of relying on Scons to find the dependency information (and determine the build order), I am explicitly saying that these generated files are needed as a dependency before SCU_main.cc is compiled.

I am of course assuming that we were currently relying on Scons to determine build order without hints such as this. As to why it was occurring more in the unity build, I don't know (perhaps the number of files in the compilation unit were overwhelming the parser Scons uses to auto detect dependencies), or it was limiting the multithreading of the compilation.

This leads me to suspect that the same problem may be existing in the main build (Scons not knowing order of dependencies) but it is being masked from showing up in most circumstances, and only showing up 'randomly' on some people's machines. If this is the case then the same steps to fix the unity build might fix the main build.

Other dependencies needed were:
In editor:

    if env['unity']:
        env.add_source_files(env.editor_sources, "#misc/scu/SCU_editor.cc")
        env.Depends("#misc/scu/SCU_editor.cc", "#core/authors.gen.h")
        env.Depends("#misc/scu/SCU_editor.cc", "#core/donors.gen.h")
        env.Depends("#misc/scu/SCU_editor.cc", "#core/license.gen.h")
        env.Depends("#misc/scu/SCU_editor.cc", "#core/io/certs_compressed.gen.h")
        env.Depends("#misc/scu/SCU_editor.cc", "builtin_fonts.gen.h")
        env.Depends("#misc/scu/SCU_editor.cc", "doc_data_compressed.gen.h")
        env.Depends("#misc/scu/SCU_editor.cc", "translations.gen.h")
        env.Depends("#misc/scu/SCU_editor.cc", "editor_icons.gen.h")
    else:
        env.add_source_files(env.editor_sources, "*.cpp")

In modules/gdnative:

if env['unity']:
    env_gdnative.add_source_files(env.modules_sources, "#misc/scu/SCU_modules_gdnative.cc")
    env.Depends("#misc/scu/SCU_modules_gdnative.cc", "gdnative_api_struct.gen.cpp")
    env.Depends("#misc/scu/SCU_modules_gdnative.cc", "gdnative_api_struct.gen.h")
else:
    env_gdnative.add_source_files(env.modules_sources, "gdnative.cpp")
    env_gdnative.add_source_files(env.modules_sources, "register_types.cpp")
    env_gdnative.add_source_files(env.modules_sources, "android/*.cpp")
    env_gdnative.add_source_files(env.modules_sources, "gdnative/*.cpp")
    env_gdnative.add_source_files(env.modules_sources, "nativescript/*.cpp")
    env_gdnative.add_source_files(env.modules_sources, "gdnative_library_singleton_editor.cpp")
    env_gdnative.add_source_files(env.modules_sources, "gdnative_library_editor_plugin.cpp")

I think those were the main ones, I may have missed a couple. To modify this for the main build you might need to change the Depends sections so that the targets are cpp in the main build, or some approach like this.

Does this sound a plausible alternative explanation (in addition to the file locking thing)? If so I can have a think about how to apply the same dependencies to the main build. Has anyone still been having this problem since #17595 ?

@bdbaddog
Copy link

@lawnjelly - This shouldn't be necessary unless the names of the header files are defined via a preprocessor value like this:

#define MY_INCLUDE "abc.h"
#include MY_INCLUDE

My first guess would one of the following conditions apply:

  1. The builder creating those header files (assuming they are generated) doesn't properly inform scons that those files are being generated
  2. The CPPPATH doesn't include the path where those files are found
  3. Perhaps a #include "a/b/c/d.h"

In general if you have to add a Depends() for a header file something is wrong. This is not expected when using SCons.

BTW. You don't specify order to SCons, you specify dependencies and SCons determines order using that.

@lawnjelly
Copy link
Member

@lawnjelly - This shouldn't be necessary unless the names of the header files are defined via a preprocessor value like this:

#define MY_INCLUDE "abc.h"
#include MY_INCLUDE

Ah, then ignore me, this may be what is causing the issue on the unity build. I am doing exactly this (for easy changes, the path is currently specified in a macro). However it might potentially make things easier if I can remove the Depends() by specifying non-macrod paths, I will try and test this. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.