reference: feat: replace v1 with v2 #1690
9b7f7d0 to 344276e
Removing the v1 Python layer is fairly easy. Need to think about the C++ layer; the Forth machine returns a |
Knowing that the split is coming up soon, we've been using the ForthMachine in ways that don't depend on the return values being The Anyway, the point is that we've been using the Footnotes
d3cd98e to 511356f
Phew. Now v1-less Awkward compiles and passes most tests, having removed the array and type machinery. I've removed the low-hanging fruit on the C++ side, and have therefore left a much reduced version of New bugs:
We should be able to remove |
I think it depends if we want to be passing around unbounded arrays. Actually, this is barely a point at the moment because the IIRC you never get an index directly; rather, you get the output and call Otherwise, I still need to remove the unused kernels... |
In the way AwkwardForth is used, the output of
Perhaps removing unused kernels should be a separate PR. We might even be able to find more that could be removed and haven't yet. Some of them might be equivalent to |
It's okay. But I think we can remove more, and fishing for them can be a different PR.
Cool. I found these ones by directly looking with (Xonsh):
import yaml
import re
# Load the kernel groups from the specification file
kernel_spec = yaml.safe_load(p"/home/angus/Git/awkward/kernel-specification.yml".read_text())['kernels']
# Every awkward_* identifier that ripgrep finds in the Python sources
in_use_specs = set($(rg r"awkward_\w+" src/awkward/contents src/awkward/_reducers.py -oNI).splitlines())
name_to_specs = {g['name']: [s['name'] for s in g['specializations']] for g in kernel_spec}
spec_to_name = {v: k for k, s in name_to_specs.items() for v in s}
# For cleaning up the spec file
in_use_names = (name_to_specs.keys() & in_use_specs)
# For removing kernels: delete each kernel source that mentions no in-use name
for f in pg`src/cpu-kernels/awkward*.cpp`:
    text = f.read_text()
    if not any(s in text for s in in_use_specs):
        f.unlink()
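For anyone without Xonsh, the scan above can be approximated in plain Python. This is only a sketch of the same idea, not the script that was actually run: the helper names `kernel_names_in_use` and `unused_kernel_files` are hypothetical, it only looks at `*.py` files inside directories (ripgrep would search every file), and it lists candidates rather than deleting them.

```python
import re
from pathlib import Path

# Same pattern that was passed to ripgrep in the session above
KERNEL_NAME = re.compile(r"awkward_\w+")


def kernel_names_in_use(source_paths):
    """Collect every awkward_* identifier mentioned in the given files or directories."""
    names = set()
    for root in map(Path, source_paths):
        # Assumption: only Python sources matter when a directory is given
        files = [root] if root.is_file() else sorted(root.rglob("*.py"))
        for path in files:
            names.update(KERNEL_NAME.findall(path.read_text()))
    return names


def unused_kernel_files(kernel_dir, in_use):
    """Return kernel source files whose text mentions none of the in-use names."""
    unused = []
    for path in sorted(Path(kernel_dir).glob("awkward*.cpp")):
        text = path.read_text()
        if not any(name in text for name in in_use):
            unused.append(path)
    return unused
```

Calling `unused_kernel_files("src/cpu-kernels", kernel_names_in_use(["src/awkward/contents", "src/awkward/_reducers.py"]))` would give a list to review before unlinking anything.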
Yeah, now that all the removed code is gone, it's easier to see that we don't need an object that holds both length and ptr (besides the output itself). The main "issue" is internally within the Forth machine, rather than the bindings. It should be possible to just access the (shared) pointer directly within the Forth routines, and wrap it in the bindings.
Agreed, there are several that I know of (not sure if they're still carried through in this PR, haven't checked) ;)
83f5a1a to 987ec50
OK, this now removes |
I haven't fully pulled out the kernel dispatch mechanism yet. I'll tentatively note that I'm not intimately familiar with this code, but my current understanding is that our kernel handling has all been superseded by the nplike dispatch mechanism.
Any kernel-dispatch in C++ is old. We have an entirely new kernel-dispatch system in Python, using ctypes. Anything you find in C++ about Another thing that can go is the dlpack git submodule. That was all for supporting CUDA arrays in C++, which is now handled by CuPy in nplike. |
It's already gone!
15de1cd to ec8390c
3546e43 to a992b09
OK, now we're at a point where there's not much else (anything?) that needs to be removed. The main changes are now described in this PR's description.
a992b09 to be6445a
I've added some rudimentary time handling. I originally tried employing If I calculate the local epoch using:
+std::chrono::system_clock::time_point local_epoch() {
+ std::tm tm = {
+ /* .tm_sec = */ 0,
+ /* .tm_min = */ 0,
+ /* .tm_hour = */ 0,
+ /* .tm_mday = */ 1,
+ /* .tm_mon = */ 1 - 1,
+ /* .tm_year = */ 1970 - 1900,
+ };
+ // Use local time zone DST
+ tm.tm_isdst = -1;
+ return std::chrono::system_clock::from_time_t(std::mktime(&tm));
+}
Then when I use this in these calculations, I get one-hour shifts for particular datetimes:
+ auto time = obj.cast<std::chrono::system_clock::time_point>();
+ int64_t time_since_epoch_us = std::chrono::duration_cast<std::chrono::microseconds>(
+ time - local_epoch()
+ ).count();
+ std::cout << "time since epoch " << time_since_epoch_us << std::endl;
+ std::cout << "time since epoch cpp " << std::chrono::duration_cast<std::chrono::microseconds>(time.time_since_epoch()).count() << std::endl;
+ self.datetime(time_since_epoch_us, "datetime64[us]");
I am currently under the impression that this is because the timezones differ between Python and C++. I'm looking into this.
The above code example is wrong - the epoch should not be specified in local time (via Now that I'm more familiar with
Actually, I didn't think about this - the timezone boundaries do not map onto distinct values: 01:00 and 02:00 (UTC) in my current timezone on 2022/03/27 are both the same local times (01:00):
Since NumPy does not deal with timezones (its times are timezone-naive, which is to say that knowledge of the timezone is not contained within the array, but is managed by the user in some external metadata somewhere), then we have to operate at the same level (also timezone-naive). Python's Probably the least-surprising way to turn timezone-aware This is not the worst thing to happen to people who deal with time-data... |
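To make the timezone-naive convention concrete, here is a minimal Python sketch of one way to reduce a `datetime.datetime` to microseconds since the UTC epoch. It is an illustration under assumptions, not necessarily what this PR implements: the names `EPOCH_UTC` and `to_naive_utc_microseconds` are hypothetical, timezone-aware values are normalized to UTC, and naive values have their wall-clock fields taken at face value. No local time zone (and so no DST rule) is consulted at any point.

```python
from datetime import datetime, timedelta, timezone

# The epoch is pinned to UTC rather than built from local-time fields
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_naive_utc_microseconds(dt: datetime) -> int:
    """Return microseconds since 1970-01-01T00:00:00, ignoring the local time zone."""
    if dt.tzinfo is None:
        # Assumption: a naive datetime's fields are used as written (treated as UTC)
        dt = dt.replace(tzinfo=timezone.utc)
    # Subtracting two aware datetimes compares absolute instants, so any
    # UTC offset carried by dt is accounted for here
    return (dt - EPOCH_UTC) // timedelta(microseconds=1)
```

The resulting integer is in the same unit as a `datetime64[us]` value, so it could be handed to `numpy.datetime64(value, "us")`.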
We're on the same page w.r.t whether we support timezones! In this PR, the issue I'm running into is concerned with the pybind11 bindings for Actually, writing this response has given me an idea for a "hack". These problems all go away if the localtime is UTC, because there are no DST periods. However, setting the |
@jpivarski NB I haven't touched "studies/" yet, because that should be a simple find-replace and I didn't want to grow this stage of the PR review yet.
I'll make this a todo for later review
The studies/ directory should be left as-is. It's just a record of the temporary work that we did to figure out the algorithms, and so removing v1 doesn't change anything there. The code in studies/ doesn't even need to be able to run. It's a historical record.
There's a lot of JSON-handling code that can now be removed, too. Would you mind if I That is, you're not working on something that would be hard to merge if I remove some C++? |
@jpivarski sure, which module is this? I thought I'd removed the unused parts. Perhaps some of src/libawkward/io/json.cpp is not required actually.
There's more, but it's not specifically called out. Also, I can remove the non-C++ template LayoutBuilder (i.e. the old one). I can do the JSON and the LayoutBuilder in two separate commits, for record-keeping.
I just did |
Ah yes - I left the old layout builder because it still works, and it can be used right now (whilst we don't have a jitted version of the new layout builder).
Are there tests with the old LayoutBuilder? The motivation for reimplementing LayoutBuilder was so that it could be optimized for use in compiled loops. A Python interface to such a thing doesn't make sense because you wouldn't be able to use it in compiled loops. The C++ LayoutBuilder will eventually be usable in C++ JIT-compiled code (cppyy) and another implementation is planned for Numba. But if we have any tests now with LayoutBuilder in a Python loop, we can just remove them without replacement, since future uses of LayoutBuilder won't involve anything like Python loops.
@jpivarski Yes, |
Ah good catch @jpivarski. And I hadn't realised that the ToJSON wasn't used any more. I wonder if it was used for the forms, which I had temporarily left in after removing the contents? 🤔
I don't think so; I think the forms built JSON with strings. As for the two versions of from-JSON: there would have been no way to know. I recently revamped the JSON-handling code, and when I needed a C++ function, I wrote a new one beside the old one so that it would be easier to remove the old one (what I just did). There wasn't an obvious indication that one was old and the other new. I'm almost done with the LayoutBuilder.
@@ -1,12 +1,12 @@
import awkward as ak
This file (being in the studies/ directory) didn't need to be changed, but it's not bad to drop all the "._v2"s.
LGTM!
As you know, I made some changes. Look it over and see if you agree/have any changes to make based on these. I have created a main-v1 branch. If you think this is done, you can do the honors and merge it into main as a single commit.
(After that, PR #1666 will have to be closed and re-applied to main and possibly main-v1. I don't see how it can be easily merged with this, and anyway it's mostly generated by a pre-commit configuration.)
Final (?) compilation times (pip install .) on a Mac with 8 cores:
- with v1 (old): 265.87s user 13.39s system 333% cpu 1:23.73 total
- this PR (new): 86.52s user 7.11s system 371% cpu 25.186 total
Compiled binary sizes:
- with v1 (old): 9.5M _ext.cpython-310-darwin.so, 1.3M libawkward-cpu-kernels.dylib, 6.1M libawkward.dylib
- this PR (new): 1.1M _ext.cpython-310-darwin.so, 729K libawkward-cpu-kernels.dylib, 921K libawkward.dylib
Loading (import awkward) times:
- with v1 (old): 95 ms
- this PR (new): 83 ms (I guess it wasn't dominated by loading compiled symbols)
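For a quick sense of scale, the figures listed above can be turned into old/new ratios. This is just arithmetic on the reported numbers: build time uses the wall-clock "total" column (1:23.73 is 83.73 s), and the three binaries are summed in megabytes, treating 729K and 921K as 0.729 and 0.921 (an assumption about the units the file listing used).

```python
# Reported measurements, before (with v1) and after (this PR)
old = {"build_seconds": 83.73, "binary_mb": 9.5 + 1.3 + 6.1, "import_ms": 95}
new = {"build_seconds": 25.186, "binary_mb": 1.1 + 0.729 + 0.921, "import_ms": 83}

# Ratio of old to new for each measurement (greater than 1 means this PR is smaller or faster)
ratios = {key: old[key] / new[key] for key in old}

for key, ratio in ratios.items():
    print(f"{key}: {ratio:.2f}x")
```

The build and binary-size ratios are both well above the import-time ratio, which fits the remark that import time was not dominated by loading compiled symbols.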
Anyway, awesome work! Disentangling v1 from v2 was not trivial!
I'm happy to merge this. I won't update the docs, because we need to do more than just replace There may well be some bugs / quirks in this branch, but I think at this point we pass the test suite, and it's had 1/2 other pairs of eyes on it, so I am happy to iterate in main for now.
Great job! Just a few minor comments. Please, check
Closed by #1721
This is just me playing around with removing v1. Should you want to do this yourself, no problem!
- malloc and free instead of Awkward allocators.
- kernel::malloc with malloc or new
- datetime.datetime in from_iter
- array_deleter to include/awkward/util.h
- pyobject_deleter with the appropriate pybind11 feature.
- LayoutBuilder/ArrayBuilder to _ext
Fixes ak._v2.from_iter should recognize Python datetimes/timedeltas #1701