Python execution very slow #461

Closed
agners opened this issue Jan 28, 2019 · 15 comments

Comments

@agners
Contributor

agners commented Jan 28, 2019

Python scripts/programs appear to be very slow. Even just starting the Python interpreter is slower than usual. Running python3 -v shows that Python detects the bytecode files (pyc) as stale:

$ python3 -v
import _frozen_importlib # frozen
import _imp # builtin
import sys # builtin
import '_warnings' # <class '_frozen_importlib.BuiltinImporter'>
import '_thread' # <class '_frozen_importlib.BuiltinImporter'>
import '_weakref' # <class '_frozen_importlib.BuiltinImporter'>
import '_frozen_importlib_external' # <class '_frozen_importlib.FrozenImporter'>
import '_io' # <class '_frozen_importlib.BuiltinImporter'>
import 'marshal' # <class '_frozen_importlib.BuiltinImporter'>
import 'posix' # <class '_frozen_importlib.BuiltinImporter'>
import _thread # previously loaded ('_thread')
import '_thread' # <class '_frozen_importlib.BuiltinImporter'>
import _weakref # previously loaded ('_weakref')
import '_weakref' # <class '_frozen_importlib.BuiltinImporter'>
# installing zipimport hook
import 'zipimport' # <class '_frozen_importlib.BuiltinImporter'>
# installed zipimport hook
# bytecode is stale for 'encodings'
# code object from /usr/lib/python3.5/encodings/__init__.py
# could not create '/usr/lib/python3.5/encodings/__pycache__/__init__.cpython-35.pyc': OSError(30, 'Read-only file system')
# wrote '/usr/lib/python3.5/encodings/__pycache__/__init__.cpython-35.pyc'
# bytecode is stale for 'codecs'
# code object from /usr/lib/python3.5/codecs.py
# could not create '/usr/lib/python3.5/__pycache__/codecs.cpython-35.pyc': OSError(30, 'Read-only file system')
# wrote '/usr/lib/python3.5/__pycache__/codecs.cpython-35.pyc'
import '_codecs' # <class '_frozen_importlib.BuiltinImporter'>
import 'codecs' # <_frozen_importlib_external.SourceFileLoader object at 0xb6b5c130>
# bytecode is stale for 'encodings.aliases'
# code object from /usr/lib/python3.5/encodings/aliases.py
# could not create '/usr/lib/python3.5/encodings/__pycache__/aliases.cpython-35.pyc': OSError(30, 'Read-only file system')
# wrote '/usr/lib/python3.5/encodings/__pycache__/aliases.cpython-35.pyc'
import 'encodings.aliases' # <_frozen_importlib_external.SourceFileLoader object at 0xb6b669d0>
...
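
The "bytecode is stale" lines above come from the importer comparing the source mtime recorded in the .pyc header with the mtime of the .py file on disk. A minimal sketch of that comparison, assuming timestamp-based .pyc files (the helper names `pyc_header_mtime` and `is_stale` are made up for illustration, and the real importer also checks the magic number and the source size):

```python
import importlib.util
import os
import struct
import sys


def pyc_header_mtime(pyc_path):
    """Return the source mtime stored in a timestamp-based .pyc header."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    # Python 3.7+ (PEP 552) puts a 4-byte flags field after the magic number,
    # so the mtime field moves from offset 4 to offset 8.
    offset = 8 if sys.version_info >= (3, 7) else 4
    (mtime,) = struct.unpack_from("<I", header, offset)
    return mtime


def is_stale(source_path):
    """True if the cached bytecode is missing or its mtime field differs."""
    pyc_path = importlib.util.cache_from_source(source_path)
    if not os.path.exists(pyc_path):
        return True
    # The header stores only the low 32 bits of the integer mtime.
    source_mtime = int(os.stat(source_path).st_mtime) & 0xFFFFFFFF
    return pyc_header_mtime(pyc_path) != source_mtime
```

On an OSTree deployment the source files get mtime 0 while the .pyc headers still carry the build-time value, so a check like this would report every module as stale.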
@agners
Contributor Author

agners commented Jan 28, 2019

@ricardosalveti pointed me to the relevant OSTree issue:
ostreedev/ostree#1469

@lbonn
Contributor

lbonn commented Jan 28, 2019

Interesting...

There is a relevant new feature in upstream Python (https://bugs.python.org/issue33499), but you'd have to wait a bit for it to appear here. Then, the easiest but imperfect fix would be to make it refer to a temporary directory so that the bytecode would at least be cached between runs.

Theoretically it could be back-ported and used on current versions but it probably requires some non-trivial effort + maintenance overhead. Failing that, working on a solution with ostree maintainers seems like the best way to go.

@OYTIS
Contributor

OYTIS commented Jan 28, 2019

Part of it (judging from the logs) looks to be due to /usr being mounted read-only under OSTree. So together with fixing the mtime issue, we also need to make sure that all the *.py files that land under /usr are accompanied by their respective *.pyc files during the build process. Or wait until PYTHONPYCACHEPREFIX is implemented and use that.
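
For reference, a small sketch of how the cache prefix redirects bytecode away from a read-only /usr. It assumes Python 3.8 or newer, where the feature is available as `sys.pycache_prefix` (the PYTHONPYCACHEPREFIX environment variable sets the same value at startup); the source path below is only an example and does not need to exist:

```python
import importlib.util
import sys
import tempfile

# Assumption: Python 3.8+, where sys.pycache_prefix exists.
# A writable directory that takes the place of the __pycache__ directories.
cache_root = tempfile.mkdtemp(prefix="pycache-")
sys.pycache_prefix = cache_root

# Example source path (hypothetical); cache_from_source only computes the
# location where the bytecode for this file would be written.
pyc_path = importlib.util.cache_from_source("/usr/lib/python3/encodings/aliases.py")
print(pyc_path)
```

The printed path mirrors the source tree underneath the prefix, so nothing has to be written below /usr. Note that this only avoids the failed writes; the mtime mismatch would still cause a recompile on the first run.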

@agners
Contributor Author

agners commented Jan 28, 2019

@lbonn it seems that OSTree tries to push that to its users, see cgwalters' comment:

I think the ideal fix here is to change RPM to canonicalize those timestamps to 0 as well, basically bringing the libostree model there.

The Python community is working on a solution for reproducible builds, however this is available in Python 3.7 and newer only: https://www.python.org/dev/peps/pep-0552/

It allows using a hash-based comparison to check whether a source file has changed. Timestamps will still be the default, so we would have to opt in to the hash-based (or the unchecked hash) mechanism.
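
As a rough illustration of that opt-in, assuming Python 3.7 or newer: `py_compile.compile` accepts an `invalidation_mode` argument, and the resulting .pyc header carries a flags field that marks it as hash-based. The temporary module below is made up for the example:

```python
import os
import py_compile
import tempfile

# Create a throwaway source file to compile.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "hello.py")
with open(source, "w") as f:
    f.write("GREETING = 'hello'\n")

# Opt in to PEP 552 hash-based invalidation instead of the default timestamp.
pyc = py_compile.compile(
    source,
    invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
)

# Header layout in 3.7+: magic (4 bytes), flags (4 bytes), then 8 bytes that
# hold either mtime+size or the source hash.
with open(pyc, "rb") as f:
    header = f.read(16)
flags = int.from_bytes(header[4:8], "little")

# Bit 0 set: hash-based pyc. Bit 1 set: the source hash is checked on import.
print(bool(flags & 0b01), bool(flags & 0b10))
```

With `UNCHECKED_HASH` the second bit is cleared and the importer trusts the .pyc without reading the source, which is the cheaper option when the source files are known not to change after deployment.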

@agners
Contributor Author

agners commented Jan 28, 2019

One way to work around this is to just set the timestamp in the pyc files to what it will be for the source files. A Python script doing this is rather straightforward: update_bytecode_timestamps.py
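
The linked script is not reproduced in this thread. The following is only a sketch of the general idea, not the author's script: it overwrites the mtime field in each .pyc header with a fixed value. The function names are hypothetical, and it assumes timestamp-based .pyc files produced by the same Python version that runs the sketch:

```python
import os
import struct
import sys

# Offset of the 32-bit source mtime in the .pyc header:
# 4 on Python < 3.7, 8 on 3.7+ (after the PEP 552 flags field).
MTIME_OFFSET = 8 if sys.version_info >= (3, 7) else 4


def set_pyc_mtime(pyc_path, mtime=0):
    """Overwrite the recorded source mtime in a single .pyc file."""
    with open(pyc_path, "r+b") as f:
        f.seek(MTIME_OFFSET)
        f.write(struct.pack("<I", mtime & 0xFFFFFFFF))


def fix_tree(root, mtime=0):
    """Patch every .pyc below root and return how many files were touched."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".pyc"):
                set_pyc_mtime(os.path.join(dirpath, name), mtime)
                count += 1
    return count
```

Run against the rootfs before the OSTree commit with `mtime=0`, this would make the header match the mtime the source files end up with after deployment.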

Another option would be to just not deploy the py source files. This seems a valid Python3 deployment method and saves space too...
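
A sketch of what such a sourceless deployment step could look like, using the standard `compileall` module. The function name `make_sourceless` is made up; the key assumption is that the .pyc files must sit next to where the sources were (the "legacy" layout), because files under `__pycache__` are not imported when the source is missing:

```python
import compileall
import os


def make_sourceless(root):
    """Compile every module below root in place, then delete the sources."""
    # legacy=True writes foo.pyc beside foo.py instead of
    # __pycache__/foo.cpython-XY.pyc.
    ok = compileall.compile_dir(root, legacy=True, quiet=1)
    if not ok:
        raise RuntimeError("some files under %s failed to compile" % root)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                os.remove(os.path.join(dirpath, name))
```

Without a source file there is nothing to compare the recorded mtime against, so the staleness check does not apply. The trade-off is that tracebacks no longer show source lines.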

@ricardosalveti
Contributor

Another option would be to just not deploy the py source files. This seems a valid Python3 deployment method and saves space too...

That would be nice to see as well.

@agners
Contributor Author

agners commented Jan 28, 2019

Just realized that, thanks to the reproducible build effort, OpenEmbedded core actually already carries a patch which allows using SOURCE_DATE_EPOCH for that purpose:
http://git.openembedded.org/openembedded-core/tree/meta/recipes-devtools/python/python3/support_SOURCE_DATE_EPOCH_in_py_compile.patch

I think with this we just need to make sure to set SOURCE_DATE_EPOCH as we need it.

@agners
Contributor Author

agners commented Jan 29, 2019

My first try was adding a recipes-devtools/python/python3_%.bbappend which directly sets SOURCE_DATE_EPOCH:

export SOURCE_DATE_EPOCH ?= "0"

This works for the core libraries. However, all other libraries (e.g. python3-urllib3) still suffer from the issue. For my test program docker-compose, this cuts down stale bytecode warnings from 459 to 326 files, and execution time from 12.5s to 9.5s. With all files fixed (using the above script), I get an execution time of 4.5s...

I then tried setting export SOURCE_DATE_EPOCH ?= "0" in openembedded-core/meta/classes/python3-dir.bbclass. This uncovered another interesting issue: the recipe python3-vcversioner builds a Python egg file, which uses zip. Zip seems to have a restriction on the mtime it supports:

|   File "/home/ags/torizoncore/build-colibri-imx7/tmp-torizon/work/x86_64-linux/python3-vcversioner-native/2.16.0.0-r0/recipe-sysroot-native/usr/lib/python3.5/zipfile.py", line 338, in __init__
|     raise ValueError('ZIP does not support timestamps before 1980')
| ValueError: ZIP does not support timestamps before 1980

I am not sure why python3-vcversioner needs/is building this egg file. Removing the do_compile_append and do_install_append from the recipe seems to work fine and builds the rootfs successfully. I am not sure how to solve this properly. I think it would be good if OSTree has a way to define the mtime it is using, so we could use a timestamp somewhat more recent...
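
The restriction can be reproduced with the standard `zipfile` module alone: `ZipInfo` rejects any date before 1980, and the Unix epoch (timestamp 0) falls in 1970. The sketch below also shows one possible way a zip-writing tool could cope, by clamping the timestamp to the earliest date ZIP supports; `zip_safe_date_time` is a hypothetical helper and not something any recipe in this thread uses:

```python
import time
import zipfile

# Earliest date a ZIP entry can carry.
ZIP_EPOCH = (1980, 1, 1, 0, 0, 0)


def zip_safe_date_time(epoch_seconds):
    """Convert a Unix timestamp to a ZIP date tuple, clamped to 1980-01-01."""
    date_time = time.gmtime(epoch_seconds)[:6]
    # Tuples compare element by element, so max() picks the later date.
    return max(date_time, ZIP_EPOCH)


# SOURCE_DATE_EPOCH = 0 maps to 1970-01-01, which ZipInfo refuses.
try:
    zipfile.ZipInfo("demo.txt", date_time=time.gmtime(0)[:6])
    error = None
except ValueError as exc:
    error = str(exc)

# The clamped value is accepted.
info = zipfile.ZipInfo("demo.txt", date_time=zip_safe_date_time(0))
```

Clamping only helps for the zip members themselves; it does nothing for the .pyc timestamps, which still have to match the mtime that OSTree assigns.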

With that, I have 0 bytecode warnings, and docker-compose starts in 4.2s.

However, this is not ideal since we cannot easily alter python3-dir.bbclass from meta-updater. If anybody has an idea how to hook into all Python recipes from meta-updater, I would be interested to hear it.

At this point we might as well just use reproducible builds for the whole rootfs. There are probably even more advantages: E.g. currently, when clearing tmp/sstate and rebuilding, the resulting OSTree contains a lot of differences to the earlier version, just due to timestamp etc. So for OSTree, using reproducible builds also helps keeping the deltas small even when not reusing sstate...

@pattivacek
Collaborator

At this point we might as well just use reproducible builds for the whole rootfs. There are probably even more advantages: E.g. currently, when clearing tmp/sstate and rebuilding, the resulting OSTree contains a lot of differences to the earlier version, just due to timestamp etc. So for OSTree, using reproducible builds also helps keeping the deltas small even when not reusing sstate...

I like the sound of that! What are the drawbacks to that approach? I haven't read up on the topic in a while.

@OYTIS
Contributor

OYTIS commented Jan 29, 2019

@patrickvacek
I don't think there can be any drawbacks, we should try to be as close to reproducible builds as possible when using OSTree. We already had a recommendation to use static UIDs at some point in our docs, but I believe it has been lost somehow.

@agners
By using reproducible builds for the whole rootfs, do you mean specifying SOURCE_DATE_EPOCH = "0" globally, or do you have something else in mind?

@agners
Contributor Author

agners commented Jan 29, 2019

@OYTIS yeah OE/Yocto documentation is lacking in that domain. From how I understand reproducible_build.bbclass and reproducible_build_simple.bbclass, the following lines e.g. in local.conf should do the job:

INHERIT += "reproducible_build_simple"

export SOURCE_DATE_EPOCH ?= "0"
REPRODUCIBLE_TIMESTAMP_ROOTFS ?= "0"

I guess we could put this into classes/sota.bbclass or similar.

@OYTIS
Contributor

OYTIS commented Jan 29, 2019

@agners Thank you. I'm totally for it. But the problem with python eggs is still going to stay as far as I understand. And we can't set SOURCE_DATE_EPOCH to anything after 1980 because it will not solve the problem with *.pyc and libostree.

@agners
Contributor Author

agners commented Jan 29, 2019

@OYTIS yes, however, I think that egg generation is totally unnecessary. I sent a patch removing it upstream:

http://lists.openembedded.org/pipermail/openembedded-devel/2019-January/198220.html

Still, if there is a recipe which needs to pack some stuff into a zip file, we might run into trouble again. So I'd prefer to have OSTree support timestamps other than 0...

@OYTIS
Contributor

OYTIS commented Feb 11, 2019

@agners Is there something else to do after #467 has been merged?

@agners
Contributor Author

agners commented Feb 19, 2019

@OYTIS with #467 merged Python now makes use of the precompiled bytecode and is much faster again.

@agners agners closed this as completed Feb 19, 2019