Add instructions for creating small objects. #609

markshannon · 2023-07-28T15:49:36Z

When we use a small constant in Python code, we compile it as LOAD_CONST.
This looks efficient, but ignores the fact that the constant object has to be created by marshal, which is quite slow compared to the main interpreter.

This is fine for code in functions, which are expected to be executed many times, but for run-once code, such as module and class bodies, it is not so good.
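
To make the starting point concrete, here is a small sketch of what happens today for a module-level constant. The source string and the `<example>` filename are made up for illustration; only `compile`, `dis` and `marshal` from the standard library are used.

```python
import dis
import marshal

SOURCE = "x = 1000.5\n"  # module-level, run-once code (illustrative)

code = compile(SOURCE, "<example>", "exec")

# The float literal is stored in the code object's constants tuple ...
consts = code.co_consts

# ... and the bytecode refers to it through a LOAD_CONST-family instruction.
const_loads = [
    ins for ins in dis.get_instructions(code)
    if ins.opname.startswith("LOAD_CONST") and ins.argval == 1000.5
]

# A .pyc file holds the marshalled code object, so loading it rebuilds every
# constant before any instruction of the module body has run.
reloaded = marshal.loads(marshal.dumps(code))
```

For a module body that runs once, the object is therefore built by marshal, stored in `co_consts`, and then fetched exactly one time.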

Rather than have marshal create the object, to be put in a constants array to be used only once, we should create the objects in bytecode.

To do that, we want to add the following instructions:

LOAD_INT: Loads a small int (0-255).
MAKE_FLOAT: Makes a float.
MAKE_ASCII: Makes a string from ASCII data.
MAKE_UNICODE: Makes a string from UTF-8 data.
MAKE_LONG: Makes a long object from an array of bytes.
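
As a rough model of what each proposed instruction would build, the sketch below expresses them as plain Python functions. The function names are hypothetical, and the byte order, the 8-byte IEEE 754 float layout and the signedness of MAKE_LONG are assumptions, since the proposal does not specify an encoding.

```python
import struct


def load_int(oparg: int) -> int:
    """LOAD_INT: the value is the oparg byte itself."""
    if not 0 <= oparg <= 255:
        raise ValueError("oparg must fit in one byte")
    return oparg


def make_float(data: bytes) -> float:
    """MAKE_FLOAT: assumed here to be 8 bytes, little-endian IEEE 754."""
    (value,) = struct.unpack("<d", data)
    return value


def make_ascii(data: bytes) -> str:
    """MAKE_ASCII: every byte must be below 0x80."""
    return data.decode("ascii")


def make_unicode(data: bytes) -> str:
    """MAKE_UNICODE: the payload is UTF-8."""
    return data.decode("utf-8")


def make_long(data: bytes) -> int:
    """MAKE_LONG: assumed here to be little-endian two's complement."""
    return int.from_bytes(data, "little", signed=True)
```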

These bytecodes are needed for #566, so this would be a useful halfway-house.

Where to put the data?
In #566 I suggest a separate array.
That won't work here because there is no separate array, but there is no reason why it cannot follow the instruction.
It would make disassembly harder, but shouldn't matter for performance, as we don't expect to be branching much in the run-once code.
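
A toy model of data following the instruction might look like the sketch below. Everything about the layout is an assumption made for illustration: the opcode numbers are invented, the oparg of a MAKE_* instruction is taken to be the payload length in bytes, and payloads are padded to an even length so that the next instruction stays aligned to a 2-byte code unit.

```python
import struct

# Hypothetical opcode numbers, for illustration only.
LOAD_INT, MAKE_FLOAT, MAKE_ASCII = 1, 2, 3


def assemble(items):
    """Encode (opcode, value) pairs; MAKE_* payloads follow the instruction."""
    out = bytearray()
    for op, value in items:
        if op == LOAD_INT:
            out += bytes([op, value])
            continue
        payload = struct.pack("<d", value) if op == MAKE_FLOAT else value.encode("ascii")
        out += bytes([op, len(payload)]) + payload
        if len(payload) % 2:
            out.append(0)  # pad to a whole number of 2-byte code units
    return bytes(out)


def run(code: bytes) -> list:
    """Execute the toy stream, building each object straight from the bytecode."""
    stack = []
    pc = 0
    while pc < len(code):
        op, oparg = code[pc], code[pc + 1]
        pc += 2
        if op == LOAD_INT:
            stack.append(oparg)
            continue
        payload = code[pc:pc + oparg]
        pc += oparg + (oparg & 1)  # skip the payload and its padding
        if op == MAKE_FLOAT:
            stack.append(struct.unpack("<d", payload)[0])
        elif op == MAKE_ASCII:
            stack.append(payload.decode("ascii"))
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack


result = run(assemble([(LOAD_INT, 7), (MAKE_ASCII, "spam!"), (MAKE_FLOAT, 1.5)]))
```

The disassembly cost shows up in `run`: the position of the next instruction is only known after decoding the current one, so a tool cannot start reading from an arbitrary offset.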

For this to be efficient, we need to avoid copying and traversing the bytecode of the code object more often than necessary.
See ~~#608~~ ~~#462~~ #566 [Guido: I think that it's now #566, since #462 is closed] for how to deal with this.

This complements #583 which reduces code object size, and thus code object loading time.


gvanrossum · 2023-08-23T00:32:31Z

I am definitely in favor of LOAD_INT, since the data fits in the oparg byte.

Let's look at LOAD_FLOAT next. Here marshal v1 writes a decimal string (basically "%.17g" % x) but newer versions write a binary format -- however it's still a portable format, read using PyFloat_Unpack8. If we could just write the raw binary data (as if it's an 8-byte binary string), the cost of unmarshalling would be reduced to just a call to PyFloat_FromDouble, which is presumably the minimal cost. (Note that the data bytes aren't copied but delivered as a char* pointer, at least when unmarshalling from an in-memory buffer -- as is always the case because that's how importlib works.) This seems a little easier to accomplish (just bump the marshal version) than adding infrastructure for variable-length instructions (which I've tried, and found doable but complicated -- it's different from caches, which are special-cased in many places).
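
The two existing float encodings can be inspected from Python. This is only a sketch of the current on-disk formats, assuming a platform with IEEE 754 doubles; it does not show the proposed raw format, and the value 0.1 is an arbitrary example.

```python
import marshal
import struct

x = 0.1

# Marshal version 1 writes the float as decimal text.
text_form = marshal.dumps(x, 1)

# Version 2 and later write a type code followed by 8 bytes.
binary_form = marshal.dumps(x, 2)

# The 8 payload bytes are the little-endian IEEE 754 encoding of x.
payload = binary_form[1:]
raw = struct.pack("<d", x)
```

On a little-endian IEEE 754 machine the portable payload and the native in-memory double are the same 8 bytes, which is why the remaining cost is mostly the unpacking call rather than any real conversion.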

For short strings and medium-sized integers, we can store up to 4 bytes of data using EXTENDED_ARG without nominally complicating the instruction format. Not sure if that's enough. :-)
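
The 4-byte limit comes from how EXTENDED_ARG works: each prefix contributes one more high-order byte to the oparg of the instruction that follows, and at most three prefixes are allowed. The sketch below models that with (name, byte) pairs; the helper names and the use of MAKE_ASCII as the carrier instruction are assumptions for illustration.

```python
EXTENDED_ARG = "EXTENDED_ARG"


def encode(opname: str, value: int) -> list:
    """Split a 32-bit oparg into up to three EXTENDED_ARG prefixes plus the op."""
    if not 0 <= value < 2**32:
        raise ValueError("oparg must fit in 4 bytes")
    data = value.to_bytes(4, "big").lstrip(b"\x00") or b"\x00"
    return [(EXTENDED_ARG, b) for b in data[:-1]] + [(opname, data[-1])]


def decode(units):
    """Fold prefixes back into full opargs, as an interpreter loop would."""
    arg = 0
    for name, byte in units:
        arg = (arg << 8) | byte
        if name != EXTENDED_ARG:
            yield name, arg
            arg = 0


units = encode("MAKE_ASCII", int.from_bytes(b"spam", "big"))
(name, arg), = decode(units)
text = arg.to_bytes(4, "big").decode("ascii")
```

One wrinkle this exposes: leading zero bytes are dropped by the encoding, so a real design would still need a way to recover the payload length.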

jneb · 2023-08-25T09:11:34Z

Considering that there is already a table of frequent strings (I recall it containing the empty string and the one-character strings of printable characters), there could be a LOAD_FIXED_STR instruction that takes an index into this table.
This would encourage putting more entries into the table.
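
A minimal sketch of the table-index idea follows. The table contents here are an assumption built from `string.printable`; they are not the interpreter's actual table, and the helper names are hypothetical.

```python
import string

# Hypothetical table: the empty string, then one-character printable strings.
FIXED_STRINGS = ("",) + tuple(string.printable)
INDEX = {s: i for i, s in enumerate(FIXED_STRINGS)}


def encode_fixed_str(s: str):
    """Return the oparg for LOAD_FIXED_STR, or None if s is not in the table."""
    return INDEX.get(s)


def load_fixed_str(oparg: int) -> str:
    """LOAD_FIXED_STR: a plain table lookup, no object creation."""
    return FIXED_STRINGS[oparg]
```

With 101 entries the index fits in a single oparg byte, leaving room for the table to grow to 256 entries before EXTENDED_ARG would be needed.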

markshannon · 2024-10-28T15:31:16Z

We've added LOAD_SMALL_INT in python/cpython#125972.
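
Which instruction a given interpreter emits can be checked with `dis`. The helper below is a hypothetical convenience written for this check; on builds that include the change it should report LOAD_SMALL_INT, and on older builds LOAD_CONST.

```python
import dis


def opcode_for_small_int(value: int = 7):
    """Return (opname, argval) of the instruction that loads `value` in `x = value`."""
    code = compile(f"x = {value}\n", "<example>", "exec")
    instructions = list(dis.get_instructions(code))
    for prev, cur in zip(instructions, instructions[1:]):
        if cur.opname == "STORE_NAME" and cur.argval == "x":
            return prev.opname, prev.argval
    raise RuntimeError("no store to x found")


opname, argval = opcode_for_small_int()
print(opname, argval)
```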

markshannon mentioned this issue Jul 28, 2023

Compact the co_code attribute of code objects. #608

Closed
