
Compiling Python packages with LPython #992

Open · Tracked by #1600
czgdp1807 opened this issue Aug 19, 2022 · 33 comments

Comments

@czgdp1807 (Collaborator)

Following is the plan to achieve the goal mentioned in the issue title:

  1. Write down small, type-annotated test packages in Python syntax. The nesting can vary from just 1 folder/package having an __init__.py file to 3-4 folders/subpackages nested one inside the other, each level having an __init__.py. We can write packages for different sorting algorithms, graph algorithms, and some practical stuff like that. We should try not to use workarounds but write code in a way that feels natural to us (see the sketch after this list).
  2. Try to compile those test packages, starting from the easy ones (with just 1 folder) and moving to the difficult ones. Do sprints to quickly make them work, then send PRs, each having a single fix.
  3. Then proceed with small packages from PyPI (there must be many). Try to compile them and fix LPython as described in step 2.
  4. Once all of the above is done, move on to advanced Python packages.
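
For step 1, a minimal single-folder test package might look like this (all names here are hypothetical):

+-  test_sorts.py
+-  sorts (directory)
    |
    +- __init__.py
    +- bubble.py

with sorts/bubble.py containing natural, type-annotated code along these lines (a sketch, assuming the i32 annotation type from LPython's runtime module):

from lpython import i32

def bubble_sort(x: list[i32]) -> list[i32]:
    # Plain O(n^2) bubble sort written in the typed subset.
    n: i32 = len(x)
    i: i32
    j: i32
    tmp: i32
    for i in range(n):
        for j in range(n - i - 1):
            if x[j] > x[j + 1]:
                tmp = x[j]
                x[j] = x[j + 1]
                x[j + 1] = tmp
    return x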

Alongside this, we should also try to interface with CPython code (#703).

@certik What do you say?

czgdp1807 self-assigned this Aug 19, 2022
@certik (Contributor) commented Aug 19, 2022

Yes, we should get LPython working with actual Python packages that we write ourselves, to ensure we stick to the subset that LPython supports. The goal here is to fix the inevitable bugs that we either know about or that we will hit. After most things work, it will allow people to upload packages to PyPI, to have dependencies, etc. We can use regular Python tooling to install dependencies and download from / upload to PyPI. We have to improve LPython to be able to compile such a package with dependencies.

We probably also need to add a mode that compiles the library and all dependencies to some kind of a "mod" file that stores the ASR in it, for quick compilation, so that when you modify a script, only that script has to be parsed and compiled to ASR; all other modules can simply be deserialized. LFortran works this way, and it works great.

Finally, we should then start to write libraries of reusable Python code:

  • NumPy (we'll ship with LPython)
  • All the packages from the Python standard library (we'll ship with LPython)
  • Matplotlib
  • SciPy

Effectively we need "lite" versions of such libraries, written in pure LPython. We might later venture into web servers too (Flask, etc.), but I would start with numerical / scientific libraries.

We can start creating packages in the Python standard library as external 3rd-party packages, and then we'll simply download them / pull them into the LPython distribution, since we need to ship them by default. The same with NumPy.

Ideally LPython itself (AST->ASR) has support for all the features that are needed, including a small subset of NumPy, such that all of NumPy and all of the Python standard library can be implemented as a regular LPython package, with no magic or help from the compiler; equivalent to user code.
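
As a rough illustration, a "lite" NumPy written as a regular LPython package could contain functions like the following (a sketch only; the exact supported subset is an assumption):

from lpython import i32, f64

def zeros(n: i32) -> list[f64]:
    # Pure-LPython stand-in for numpy.zeros on 1-D float arrays.
    res: list[f64] = []
    i: i32
    for i in range(n):
        res.append(0.0)
    return res

def dot(a: list[f64], b: list[f64]) -> f64:
    # Pure-LPython stand-in for numpy.dot on 1-D arrays.
    s: f64 = 0.0
    i: i32
    for i in range(len(a)):
        s += a[i] * b[i]
    return s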

certik mentioned this issue Aug 19, 2022 (9 tasks)
Shaikh-Ubaid self-assigned this Aug 19, 2022
@czgdp1807 (Collaborator Author)

I think we can reuse the serialization-deserialization mechanism of LFortran here to compile Python modules to the ASR level. In fact, modules written by us right now compile (parse + AST->ASR) only when we import them, whereas LFortran pre-compiles them to ASR and loads the result on import. LPython should, I think, behave in a similar manner. Also, the question is how to integrate LPython with a build system (writing build files, in simple words) so that LPython compiles all the intrinsic modules defined in it when it is being built from source.

@certik (Contributor) commented Aug 20, 2022

Yes. I think CPython compiles every module into .pyc, so LPython would compile every module to .lpy (or even to .pyc, to reuse the extension), and the compiled module would just be serialized ASR, just like LFortran works with .mod files.

@czgdp1807 (Collaborator Author)

I see. Let me try implementing this idea then. Let's see how far we can get.

@czgdp1807 (Collaborator Author)

In order to achieve the first and second points in #992 (comment), I think we need to write a CMakeLists.txt for the small test package. For now we can use add_custom_command to generate LLVM object files and then link those object files together. So at the end, a static/shared binary will be generated for a single sub-package. Does this make sense?

@certik (Contributor) commented Aug 24, 2022

For end users, the main mode that we want to support is automatic compilation, just like CPython works.

Is the CMake-based mode just for testing that we can compile things by hand?

@czgdp1807 (Collaborator Author)

I think LPython won't be detected as a compiler by CMake, right? So add_custom_command will help in compiling those .py files to .o. Basically I am saying that the pipeline should be the same as for C++/C projects, because we are generating binaries for Python files, the same as what Clang/GCC do for C/C++ files.

> For end users, the main mode that we want to support is automatic compilation, just like CPython works.

By automatic compilation, do you mean that a module will be compiled to a .pyc file only when it is imported? AFAIK, CPython creates .pyc files only when we run a file that imports those modules. I am not sure about this though.

@certik (Contributor) commented Aug 24, 2022

We will probably support both modes:

  • use cmake
  • not use cmake: just do lpython a.py, and if a.py imports a package b, then package b gets automatically compiled (and we can support some command-line flag to compile just package b into a list of .pyc files)

I think most users would prefer the second approach.

Or to rephrase it: the cmake build system can be automatically generated by LPython, so why bother with it, rather than just compiling things directly? We know all the information.

@czgdp1807 (Collaborator Author)

I see. Makes sense. The second approach would be much easier. We will have to implement a timestamp strategy to check whether a file has been changed or not. Do you know how to fetch the timestamp of the last time a file was modified on disk, a.k.a. its modification time? https://stackoverflow.com/a/40504396 seems to be a promising way to get the modification time of a file, but if you know of anything better then we can go for that as well.
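
For illustration, the check itself is simple; in Python it would look something like this (LPython would do the equivalent in its C++ sources, e.g. via the approach in the linked answer):

import os

def needs_recompile(source: str, cached: str) -> bool:
    # Recompile if there is no cached artifact, or if the source
    # was modified after the cached artifact was written.
    if not os.path.exists(cached):
        return True
    return os.path.getmtime(source) > os.path.getmtime(cached)

print(needs_recompile("a/b.py", "a/b.pyc"))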

@czgdp1807 (Collaborator Author)

So I am planning to implement the strategy in my above comment. However, the problem is that when you auto-compile and reuse the auto-compiled modules, the symbol table IDs depend on the order in which you compile those modules. This has consequences for python run_tests.py -u, since it doesn't follow any particular order of compilation. Say tests a.py and b.py both use a module m.py; if a.py gets compiled first, then m.py will be different for b.py, and hence the ASR output of b.py will not match. Something similar happens if b.py gets compiled first. So how do we deal with this situation? Because once a module is compiled, it changes the symbol table IDs of the rest of the program as well.

@certik (Contributor) commented Aug 29, 2022

This is dealt with in the Fortran deserialization by constructing a new symbol table and resolving it correctly. So everything should just work, as long as we reuse all of that code, which I strongly recommend. :)

@certik (Contributor) commented Oct 4, 2022

To move forward:

  • This is a big issue and we have to split it into smaller tasks and implement them
  • I don't know all the details of what needs to be done, so we simply have to start working towards the goal, discover issues as we go, and work on fixing them by creating a good design

Issues that we need to tackle:

  • Looks like there are two modes, one compiles everything in memory (no saving to disk), the other mode saves to .pyc
  • In the mode that saves to .pyc, we simply save every module to .pyc, then load it

@certik (Contributor) commented Oct 4, 2022

Here is our goal:

  • Imagine a project with 10 dependencies, each of which has 10 dependencies, so a total of 200 packages have to be built
  • The first time, we build everything (say it takes 20s)
  • Then, when we modify one file in our project, the dependencies are not built again; only the file that changed is built, and our project is rebuilt/linked (ideally this would be immediate, a couple of milliseconds, or as fast as possible)

@certik (Contributor) commented Oct 4, 2022

So what pieces do we need to get to the goal?

  • We must be able to build 200 dependencies, and then reuse the build.

What does that mean?

  • We need to reuse as much as possible.
  • Consequently, we have several options:
    • compile each file to .pyc (serialized ASR), or (even better) compile each package to one file that contains serialized ASR for the whole package
    • compile each file, or (even better) each package, into both serialized ASR and object file(s)

The second option is more work for us, and we can tackle it later. We need something similar for LFortran as well, so we can work towards it as we go.

So far, the idea I like the most is: compile each package to one file that contains serialized ASR for the whole package.

@certik (Contributor) commented Oct 4, 2022

So the first task can be nested modules. Say we have:

+-  test_a.py
+-  a (directory)
    |
    +- __init__.py
    +- b.py
    +- c.py

And we need to get the following working in test_a.py:

from a.b import x
import a.b
from a import b

plus the relative import syntax in Python.
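
For example, inside a/c.py, the standard Python relative forms of the same imports would be:

from . import b
from .b import x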

We need to get lpython test_a.py working, and it needs to correctly import a etc.

One issue is how to tell LPython where to look for the "a" package if you do import a. There are several options. One clean option seems to be a -I compiler flag, so that lpython -I/some/path test_a.py would look into /some/path to find the package a when test_a.py does import a.
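
A sketch of the lookup this implies, in Python (a hypothetical helper for illustration, not LPython's actual code):

import os

def find_package(name: str, include_dirs: list[str]) -> str:
    # Return the path for package or module `name`, searching the
    # -I include directories in order.
    for d in include_dirs:
        candidate = os.path.join(d, name)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            return candidate  # a package directory
        if os.path.isfile(candidate + ".py"):
            return candidate + ".py"  # a plain module file
    raise ImportError("package or module not found: " + name)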

@certik (Contributor) commented Oct 4, 2022

Once the above works, we can think about how to build dependencies. I suggest we use pyproject.toml to specify the dependencies (Cargo style), see here for more background and links to other resources: https://stackoverflow.com/questions/62983756/what-is-pyproject-toml-file-for

@Thirumalai-Shaktivel (Collaborator)

@Shaikh-Ubaid are you working on this issue?

@certik (Contributor) commented Oct 6, 2022

Yes, @Shaikh-Ubaid is working on this issue, but if you are interested, @Thirumalai-Shaktivel, you can go ahead and also work on it; there are plenty of tasks to do.

@certik (Contributor) commented Nov 16, 2022

I just tested it; here is the latest status on this:

#1305


certik mentioned this issue Mar 21, 2023 (38 tasks)
@certik (Contributor) commented Mar 21, 2023

It's time to start writing LPython packages, upload them to PyPI, make them pip-installable, and see what breaks; see #992 (comment). Report the issues that break and we'll fix them.

@certik (Contributor) commented Mar 21, 2023

Here are three ideas that we can do right away:

  • Make it pip installable (from GitHub).

@Smit-create (Collaborator)

I'll start with the first one.

@certik (Contributor) commented May 12, 2023

@Shaikh-Ubaid I figured out the instructions for how to install a package using pip and use (compile) it with LPython:

conda create -n test1 python=3.11
conda activate test1
pip install lpynn lpdraw
lpython -I$CONDA_PREFIX/lib/python3.11/site-packages/ test_pkg_lnn_01.py

where test_pkg_lnn_01.py is taken from integration_tests, but with the import changed from lnn to lpynn. To run this via CPython, do:

conda install numpy
python test_pkg_lnn_01.py

It looks like we can do:

  • lpynn
  • lpdraw
  • nrp

Let's get a few more packages working.

Also let's iterate, perhaps something like this: lpython --include-conda-env a.py, or an even shorter option, perhaps just lpython --conda a.py. I would not allow environment variables by default yet; I want to get more experience using this first.

@Shaikh-Ubaid (Collaborator) commented May 12, 2023

How many packages do we currently support?

> It looks like we can do:

There is also nrp (https://github.com/Shaikh-Ubaid/lpython_packages/tree/main/nrp_pkg, https://pypi.org/project/nrp/).

> Let's get a few more packages working.
>
> Also let's iterate, perhaps something like this: lpython --include-conda-env a.py, or an even shorter option, perhaps just lpython --conda a.py. I would not allow environment variables by default yet; I want to get more experience using this first.

Sure.

@certik (Contributor) commented May 12, 2023

It looks like the lpython emulation works perfectly!

@certik (Contributor) commented May 12, 2023

I would version lpython-emulation with exactly the same version as the released LPython version. That way it is clear which one you have to use, depending on which features of LPython the script uses, and it avoids any backwards incompatibilities.

@certik (Contributor) commented May 12, 2023

I am upgrading this package: certik/mathfn#5. I can see a few things:

  • We should finish the lpython conda package to make it easier to install in CI
  • I want to depend on a few packages (maybe some terminal package, plus this mathfn package) and then create a full end-user application (binary)
  • Polish the experience, make everything very easy

@Shaikh-Ubaid (Collaborator)

> I would version lpython-emulation with exactly the same version as the released LPython version.

We recently released LPython 0.14.0. Do you mean we should version lpython_emulation as 0.14.0? Do we update lpython_emulation only when LPython has a new release?

@certik (Contributor) commented May 12, 2023

That's what I would do. Do you see a problem with it?

@Shaikh-Ubaid (Collaborator)

> That's what I would do. Do you see a problem with it?

Got it. Sure, we can do that. It seems lpython does not actually use/depend on src/runtime/lpython/lpython.py; lpython handles the types using if-else logic in AST->ASR. I wonder if there is a possibility of distributing src/runtime/lpython/lpython.py (that is, lpython_emulation) independently. For example, a use case could be a user who likes the lpython_emulation package and wants to use it to add typing information to his CPython code.
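
For instance (a sketch, assuming lpython_emulation exposes the same names as src/runtime/lpython/lpython.py), such a user could annotate ordinary CPython code with LPython's types:

from lpython import i32, f64

def total(x: list[f64]) -> f64:
    # Runs unchanged under CPython via the emulation module,
    # and the same file is compilable by LPython.
    s: f64 = 0.0
    i: i32
    for i in range(len(x)):
        s += x[i]
    return s

print(total([1.0, 2.0, 3.0]))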

@certik (Contributor) commented May 13, 2023

LPython depends on lpython.py via the integration_tests. We often upgrade a test, and then we change lpython.py and AST->ASR. If we separate lpython.py from LPython, then we have to keep them in sync, which I think is more tedious than the current approach. Maybe later, once LPython stabilizes, we can do that.

@Shaikh-Ubaid (Collaborator)

> LPython depends on lpython.py via the integration_tests. We often upgrade a test, and then we change lpython.py and AST->ASR. If we separate lpython.py from LPython, then we have to keep them in sync, which I think is more tedious than the current approach. Maybe later, once LPython stabilizes, we can do that.

Got it, let's focus on delivering lpython first.
