Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Python bindings #197

Merged
merged 4 commits into from
Aug 25, 2024

Conversation

zacikpa
Copy link
Contributor

@zacikpa zacikpa commented May 2, 2024

This PR adds Python bindings in the form of a complete Python package.

Features

The bindings are implemented via ctypes and currently include a subset of libcpuid functionality (mainly cpu_identify and get_cpu_list). I did not intend to provide a one-to-one API; instead, I aimed for a more Pythonic one.

Documentation

The documentation for the package can be build using Sphinx. Currently, it is live at https://libcpuid.readthedocs.io/en/latest/index.html (built from my fork). Ideally, the live docs could be built from this repository (once a repo is registered and linked to a readthedocs account, the build process of live docs requires no maintenance).

PyPi

I intend to publish the package also at PyPi. Currently, it is published at TestPyPi (https://test.pypi.org/project/libcpuid) and can be installed using pip install -i https://test.pypi.org/simple/ libcpuid.

Why?

As developers of TuneD (https://github.com/redhat-performance/tuned), we would like to use libcpuid functionality from Python to be able to restrict certain system tunings to e.g., systems with specific CPU models or their successors. In other words, we want to utilize cpu_identify and also the fact that get_cpu_list claims to return the list of CPU models in the order in which they were introduced.

@yarda
Copy link

yarda commented May 2, 2024

We would like to rely on the fact that the cpuid_get_cpu_list output is "in approximate chronological introduction order of the parts" [1].

[1] https://libcpuid.sourceforge.net/doxy/group__libcpuid.html#ga1b0131f6d12aca2d002175abcd9a1d20

@zacikpa zacikpa force-pushed the python-binding branch 3 times, most recently from 71b2bf0 to a29cf2d Compare May 3, 2024 13:59
@zacikpa zacikpa marked this pull request as draft May 3, 2024 14:01
@zacikpa
Copy link
Contributor Author

zacikpa commented May 3, 2024

I converted the PR to a draft because I'm still working on the tests, but feedback would be appreciated nonetheless: especially whether the binding could actually be accepted into this repository once polished.

@zacikpa
Copy link
Contributor Author

zacikpa commented Jun 10, 2024

Ping @TheTumultuousUnicornOfDarkness since there is no feedback yet: is it realistic to expect that this addition will eventually be reviewed and merged?

An alternative is maintaining the binding in a separate repository, but having it incorporated in libcpuid would make the packaging process much simpler in Fedora/RHEL (where the binding will be used by TuneD).

@yarda
Copy link

yarda commented Jun 10, 2024

The PR is still marked as a draft. I think the merge/review doesn't hurry, but it would be great to have a feedback from the libcpuid upstream maintainer whether we are on a good path with the PR and whether it's realistic to expect that the binding will be accepted into libcpuid.

@anrieff
Copy link
Owner

anrieff commented Jun 10, 2024

@zacikpa I cannot truly review the python-side of the work, but I will get someone I trust involved so we can provide more comprehensive feedback. Bearing that in mind,

  • the work is already quite polished and will onboard more developers into using libcpuid, and, importantly, providing CPUID dumps of unknown systems. In particular, be sure that you add bindings for cpuid_serialize_all_raw_data

  • in terms of the usage suggested: all seem very nice, except for CPUID.has_feature. The given example:

    print(cpuid.has_feature(libcpuid.enums.CPUFeature.FPU))

    is quite wordy. I would have liked print(cpuid.has_feature("fpu")) or even print("fpu" in cpuid.features) better.
    This is also quite in line with what the Linux kernel supplies in /proc/cpuinfo and what we print via cpuid_tool - see cpu_feature_str().

  • as per what the TuneD guys want: yes, the list returned by get_cpu_list is in approximately chronological order, but:

    • both Intel and AMD have models fit for different markets, an old example could be intel's Atom vs Nehalem architectures, which were introduced around the same years, but are radically different in terms of performance/caches and internal microarchitecture. In a sense, the models have different lineages. A list is not a
      very good representation of these historical developments: a tree would have been better. I'd suggest that you put more thought about what are you really after, and probably submit an issue so we can help you more.
    • both the Intel and AMD lists are quite verbose, have you seen what cpuid_tool --cpulist outputs? You'll likely want to filter that at least a bit
    • there are some idiosyncratic cases where the chronological order is insanely broken. E.g. Itanium and Itanium 2 are the last two entries in the Intel list, despite being introduced 20+ years ago.
  • before merging the python bindings with the main cpuid library, I'd probably want to automate some of the work that the future maintainers will need to do. E.g. the very long list of CPU flags (the CPUFeature IntEnum in your code) will probably need to be autogenerated from parsing libcpuid.h. We already do this for consistency checks - see libcpuid/check-consistency.py. Also, probably all C functions that the python bindings expose should be marked clearly as being "python-bound", i.e. changes to the API there should require attention to the Python-side too.

All in all, this is great work and I'd love seeing it in the main project.

@TheTumultuousUnicornOfDarkness
Copy link
Collaborator

Ping @TheTumultuousUnicornOfDarkness since there is no feedback yet: is it realistic to expect that this addition will eventually be reviewed and merged?

I see you edited your message. anrieff is the project leader, I am just a contributor. Personally, I am not against this feature, but I do not have experience in Python bindings so I do not think I am legitimate to review this PR.
Like anrieff, I think some stuff needs to be generated automatically to ensure that nothing is missing in bindings. For example, I am working on ARM CPU support, and I am adding a lot of new values (vendors, features...), meaning that I would have to update consts.py and enums.py files. So at least we need something to check discrepancies (maybe by updating libcpuid/check-consistency.py?).

@yarda
Copy link

yarda commented Jun 10, 2024

ARM support? This would be really great. It's a real mess with ARM :)

@TheTumultuousUnicornOfDarkness
Copy link
Collaborator

@zacikpa maybe there are programs that can generate Python bindings from C, like SWIG (I never tried).

@zacikpa zacikpa force-pushed the python-binding branch 2 times, most recently from 1a2bcc1 to da0de29 Compare July 1, 2024 10:07
@zacikpa
Copy link
Contributor Author

zacikpa commented Jul 1, 2024

@anrieff @TheTumultuousUnicornOfDarkness Thanks to both of you for the positive feedback.

I've just converted the bindings from ctypes (a built-in Python library) to cffi, which is a more powerful library for building C interfaces. Among other things, it allows me to address your concerns about the maintainability of the bindings. As an example, if you now look at enums.py, you see that its contents are auto-generated from the corresponding C enums.

in terms of the usage suggested: all seem very nice, except for CPUID.has_feature. The given example:

print(cpuid.has_feature(libcpuid.enums.CPUFeature.FPU))

is quite wordy. I would have liked print(cpuid.has_feature("fpu")) or even print("fpu" in cpuid.features) better.
This is also quite in line with what the Linux kernel supplies in /proc/cpuinfo and what we print via cpuid_tool - see cpu_feature_str().

As per your feedback, I updated the function to a property called CPUID.flags which corresponds to the C API, but returns a list (we could also make it a set). I would, however, stick to using the enums as compared to strings - it is a more modern Pythonic approach. Regarding the wordiness problem: you can resolve that using a more specific import:

import libcpuid
from libcpuid.enums import CPUFeature

cpuid = libcpuid.cpu_identify()
print(CPUFeature.FPU in cpuid.flags)

In particular, be sure that you add bindings for cpuid_serialize_all_raw_data.

I'm now working on extending the binding further, so I'll surely include it. However, I believe that the bindings do not have to be complete to be reviewed and merged. What's your opinion on that? I would like to make the PR ready to review after I finally add those tests 😄.

@zacikpa zacikpa force-pushed the python-binding branch 3 times, most recently from 8bd1c35 to 820b903 Compare July 24, 2024 07:01
@zacikpa
Copy link
Contributor Author

zacikpa commented Jul 24, 2024

The bindings now cover the entire functionality of libcpuid and there are some simple sanity tests, so I'm converting this PR to review-ready. I also did some refactoring of the interface, see the live docs for reference.

The PR includes two commits that slightly change the C library itself. Commit c3eeb48 may be breaking, so I'm open to discussing whether it's appropriate (if not, I can implement a workaround).

For now, manual testing can be done by installing the libcpuid C library at 9eb3cf3 normally (make sure that pkg-config can find it) and then running:

python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ libcpuid

This installs the Python package from the TestPyPI repository.

@zacikpa zacikpa marked this pull request as ready for review July 24, 2024 07:25
@zacikpa
Copy link
Contributor Author

zacikpa commented Jul 24, 2024

I just realized I did no respond to this comment:

@zacikpa maybe there are programs that can generate Python bindings from C, like SWIG (I never tried).

Yes, such tools exist, but I believe the resulting API would be very C-like. For example, you would have to manually free instances of cpu_raw_data_array_t. The CPURawDataArray class in the binding (which is basically a wrapper around the C structure) does the freeing for you in its destructor.

@TheTumultuousUnicornOfDarkness
Copy link
Collaborator

TheTumultuousUnicornOfDarkness commented Jul 26, 2024

About c3eeb48: cpu_feature_level_t is not part of any release yet, so I am OK with that change.

About 9eb3cf3: sizeof(logical_cpu_t) is 2 bytes for now, but it might evolve in a very long future (if there are systems with more than 65535 cores, of course). I dislike this kind of magic value, please write at comment a least.

@anrieff, I let you review/merge this PR if you agree.

This makes the option names of cpu_feature_level_t consistent
and it also makes it easier to list options of the cpu_feature_t
enum just by their prefix (because it now doesn't collide with
the prefix of cpu_feature_level_t options).
The bindings are implemented via python-cffi and cover all
current functionality of the library. They do not provide
a 1-to-1 mapping of the functionality, but a more "Pythonic"
API instead.

When new functionality is added to the libcpuid library
in the future, the changes must be manually added to the
bindings in order to appear in the Python interface.
However, if only a C enum is extended, the changes will
be automatically reflected in the Python interface.
Additionally, setup the documentation for
deployment to Read The Docs.
And include them in the CI together with pylint
and formatting checks.
@zacikpa
Copy link
Contributor Author

zacikpa commented Aug 12, 2024

Thank you for your feedback, @TheTumultuousUnicornOfDarkness.

Regarding 9eb3cf3: I also wasn't content with the change, so I reverted it and replaced it with another solution (though quite hackish). Essentially, the reason for the change was that CFFI cannot parse sizeof in supplied C declarations. When the binding is being built, it now does simple preprocessing to address this issue, replacing each sizeof with its value.

@TheTumultuousUnicornOfDarkness
Copy link
Collaborator

Nice! The sizeof() approach seems future-proof.

About the documentation, how does it work, is the documentation automatically updated?
The "Edit on GitHub" link is redirecting to your repository.

It looks good to me. I would like to see this PR merged before we release the next version (i.e. v0.7.0). @anrieff is probably on holiday, maybe I can merge this PR myself.

@zacikpa
Copy link
Contributor Author

zacikpa commented Aug 12, 2024

About the documentation, how does it work, is the documentation automatically updated?
The "Edit on GitHub" link is redirecting to your repository.

Yes, the documentation is automatically built with each commit, currently from my fork. If this gets merged, I would take it down to free the domain and either you or @anrieff could set it up on the main repo. The easiest setup is essentially just logging in to https://readthedocs.org via a GitHub account, plus a couple clicks to import the repo.

@TheTumultuousUnicornOfDarkness TheTumultuousUnicornOfDarkness merged commit 9915832 into anrieff:master Aug 25, 2024
19 checks passed
@zacikpa
Copy link
Contributor Author

zacikpa commented Aug 26, 2024

Thanks for merging, @TheTumultuousUnicornOfDarkness. I just released the domain libcpuid.readthedocs.io, so it should be possible to set up the documentation at the domain for the main repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants