fix: all AwkwardForth Forms now agree with awkward_form method output. #790

jpivarski · 2022-11-30T00:16:20Z

Quoting #754,

If they don't, make them the same.

They are now; the differences were only minor and some of the errors were in the non-Forth implementation.

The "uproot" parameters should be removed.

They were already gone before I started.

Some of the "__record__" parameters have the C++ name; this is a good idea. We should make sure that all of them do.

They all do now.

A user-visible difference is that we're excluding all of the TObject header and numbytes_version header stuff (used to be marked with @, now gone). It's still possible to get them back, but you have to run a non-Forth read and pass non-default parameters to the awkward_forth method. I don't think we'll ever need it.

jpivarski · 2022-11-30T00:23:39Z

src/uproot/interpretation/library.py

 def _awkward_json_to_array(awkward, form, array):
    if form["class"] == "NumpyArray":
+        form = awkward.forms.from_json(json.dumps(form))
+        dtype = awkward.types.numpytype.primitive_to_dtype(form.primitive)
        if isinstance(array, awkward.contents.EmptyArray):
-            form = awkward.forms.from_json(json.dumps(form))
-            dtype = awkward.types.numpytype.primitive_to_dtype(form.primitive)
-            return awkward.contents.NumpyArray(numpy.empty(0, dtype=dtype))
+            return awkward.contents.NumpyArray(
+                numpy.empty(0, dtype=dtype),
+                parameters=form.parameters,
+            )
        else:
-            return array
+            return awkward.contents.NumpyArray(
+                numpy.asarray(array.data, dtype=dtype),
+                parameters=form.parameters,
+            )


This corrects an error in the old, non-Forth method. The old method ran Python objects through ak.from_iter (which makes all Python integers int64 and floats float64), then fixes them to agree with the dtypes in the actual Form. Many node types were fixed, but not non-empty leaf nodes. Now those are fixed as well.

jpivarski · 2022-11-30T00:24:44Z

src/uproot/interpretation/__init__.py

        header=False,
-        tobject_header=True,
+        tobject_header=False,


The header data that users generally don't want to see used to be obscured as field names starting with @, but the Forth code doesn't even read them in, so the Form (by default) should not include them, either.

jpivarski · 2022-11-30T00:25:41Z

src/uproot/models/TRef.py

-                temp_aform = f'{{"class": "RecordArray", "contents": {{"fName": {{"class": "ListOffsetArray", "offsets": "i64", "content": {{"class": "NumpyArray", "primitive": "uint8", "inner_shape": [], "parameters": {{"__array__": "char"}}, "form_key": "node{form_keys[2]}"}}, "parameters": {{}}, "form_key": "node{form_keys[1]}"}}, "fSize": {{"class": "NumpyArray", "primitive": "int64", "inner_shape": [], "parameters": {{}}, "form_key": "node{form_keys[3]}"}}, "refs": {{"class": "ListOffsetArray", "offsets": "i64", "content": {{"class": "NumpyArray", "primitive": "int64", "inner_shape": [], "parameters": {{}}, "form_key": "node{form_keys[5]}"}}, "parameters": {{}}, "form_key": "node{form_keys[4]}"}}}}, "parameters": {{}}, "form_key": "node{form_keys[0]}"}}'
+                temp_aform = f'{{"class": "RecordArray", "contents": {{"fName": {{"class": "ListOffsetArray", "offsets": "i64", "content": {{"class": "NumpyArray", "primitive": "uint8", "inner_shape": [], "parameters": {{"__array__": "char"}}, "form_key": "node{form_keys[2]}"}}, "parameters": {{"__array__": "string"}}, "form_key": "node{form_keys[1]}"}}, "fSize": {{"class": "NumpyArray", "primitive": "int32", "inner_shape": [], "parameters": {{}}, "form_key": "node{form_keys[3]}"}}, "refs": {{"class": "ListOffsetArray", "offsets": "i64", "content": {{"class": "NumpyArray", "primitive": "int32", "inner_shape": [], "parameters": {{}}, "form_key": "node{form_keys[5]}"}}, "parameters": {{}}, "form_key": "node{form_keys[4]}"}}}}, "parameters": {{"__record__": "TRefArray"}}, "form_key": "node{form_keys[0]}"}}'
                forth_obj.add_form(json.loads(temp_aform))
                forth_stash.add_to_header(
-                    f"output node{form_keys[1]}-offsets int64\noutput node{form_keys[2]}-data uint8\noutput node{form_keys[3]}-data int64\noutput node{form_keys[4]}-offsets int64\noutput node{form_keys[5]}-data int64\n"
+                    f"output node{form_keys[1]}-offsets int64\noutput node{form_keys[2]}-data uint8\noutput node{form_keys[3]}-data int32\noutput node{form_keys[4]}-offsets int64\noutput node{form_keys[5]}-data int32\n"


This is a correction in Forth: the TRefArray data are 32-bit, but the Forth converted the 32-bit numbers into 64-bit numbers. Now it doesn't.

Also, a C++ __record__ name was missing.

It also adds a string __array__ parameter, which looks right (I don't know which key is the content and parent, but we do need a string and char array, which are now there.

jpivarski · 2022-11-30T00:26:40Z

src/uproot/streamers.py

-                    f'            content[{self.name!r}] = {{"class": "ListOffsetArray", "offsets": "i64", "content": {{ "class": "NumpyArray", "primitive": f"{{uproot._awkward_forth.convert_dtype(uproot._awkward_forth.symbol_dict[self._dtype{len(dtypes)}])}}", "inner_shape": [], "parameters": {{}}, "form_key": f"node{{key}}"}}, "form_key": f"node{{key2}}"}}',
+                    f'            content[{self.name!r}] = {{"class": "RegularArray", "size": {self.array_length}, "content": {{ "class": "NumpyArray", "primitive": f"{{uproot._awkward_forth.convert_dtype(uproot._awkward_forth.symbol_dict[self._dtype{len(dtypes)}])}}", "inner_shape": [], "parameters": {{}}, "form_key": f"node{{key}}"}}, "form_key": f"node{{key2}}"}}',


Regular-sized arrays were being read in as ListOffsetArrays. The Forth code must still be generating fake offsets; I didn't remove that. However, the output is now correct: we get RegularArrays.

jpivarski · 2022-11-30T00:28:11Z

tests/test_0637-setup-tests-for-AwkwardForth.py

@@ -18,6 +18,7 @@ def test_00(is_forth):
        py = branch.array(interp, library="ak", entry_stop=2)
        assert py[0]["0"][0] == "expskin_FluxUnisim"
        # py[-1] == <STLMap {'expskin_FluxUnisim': [0.944759093019904, 1.0890682745548674, ..., 1.1035170311451232, 0.8873957186284592], ...} at 0x7fbc4c1325e0>
+        assert py.layout.form == interp.awkward_form(branch.file)


Every test that can read data with library="ak" checks the array's final Form with awkward_form's predicted Form, and they're all exactly equal. (These are Forms without form_keys.)

…-awkward_form-method

jpivarski · 2022-11-30T00:32:16Z

@aryan26roy, if you want to take a look at this PR, please do, though I know you're busy.

@agoose77, I've annotated the changes to explain them.

When this is merged, Uproot will have all of the next-release updates in place, except for #788, which is just a name change adjustment to follow updates in Awkward. When this is in, I'll focus on Awkward.

agoose77

The changes look motivated enough to me. I don't yet have the deep knowledge to make a more reasoned contribution but equally, these are relatively small tweaks in the scheme of things :)

agoose77 · 2022-11-30T00:35:21Z

src/uproot/models/TRef.py

-                temp_aform = f'{{"class": "RecordArray", "contents": {{"fName": {{"class": "ListOffsetArray", "offsets": "i64", "content": {{"class": "NumpyArray", "primitive": "uint8", "inner_shape": [], "parameters": {{"__array__": "char"}}, "form_key": "node{form_keys[2]}"}}, "parameters": {{}}, "form_key": "node{form_keys[1]}"}}, "fSize": {{"class": "NumpyArray", "primitive": "int64", "inner_shape": [], "parameters": {{}}, "form_key": "node{form_keys[3]}"}}, "refs": {{"class": "ListOffsetArray", "offsets": "i64", "content": {{"class": "NumpyArray", "primitive": "int64", "inner_shape": [], "parameters": {{}}, "form_key": "node{form_keys[5]}"}}, "parameters": {{}}, "form_key": "node{form_keys[4]}"}}}}, "parameters": {{}}, "form_key": "node{form_keys[0]}"}}'
+                temp_aform = f'{{"class": "RecordArray", "contents": {{"fName": {{"class": "ListOffsetArray", "offsets": "i64", "content": {{"class": "NumpyArray", "primitive": "uint8", "inner_shape": [], "parameters": {{"__array__": "char"}}, "form_key": "node{form_keys[2]}"}}, "parameters": {{"__array__": "string"}}, "form_key": "node{form_keys[1]}"}}, "fSize": {{"class": "NumpyArray", "primitive": "int32", "inner_shape": [], "parameters": {{}}, "form_key": "node{form_keys[3]}"}}, "refs": {{"class": "ListOffsetArray", "offsets": "i64", "content": {{"class": "NumpyArray", "primitive": "int32", "inner_shape": [], "parameters": {{}}, "form_key": "node{form_keys[5]}"}}, "parameters": {{}}, "form_key": "node{form_keys[4]}"}}}}, "parameters": {{"__record__": "TRefArray"}}, "form_key": "node{form_keys[0]}"}}'
                forth_obj.add_form(json.loads(temp_aform))
                forth_stash.add_to_header(
-                    f"output node{form_keys[1]}-offsets int64\noutput node{form_keys[2]}-data uint8\noutput node{form_keys[3]}-data int64\noutput node{form_keys[4]}-offsets int64\noutput node{form_keys[5]}-data int64\n"
+                    f"output node{form_keys[1]}-offsets int64\noutput node{form_keys[2]}-data uint8\noutput node{form_keys[3]}-data int32\noutput node{form_keys[4]}-offsets int64\noutput node{form_keys[5]}-data int32\n"


It also adds a string __array__ parameter, which looks right (I don't know which key is the content and parent, but we do need a string and char array, which are now there.

jpivarski · 2022-11-30T00:39:45Z

You're right that a "char" NumpyArray was missing the "string" parameter on its containing ListOffsetArray. That was another one of the corrections.

aryan26roy · 2022-11-30T02:29:10Z

src/uproot/streamers.py

@@ -1064,7 +1064,7 @@ def class_code(
                    f'            forth_stash.add_to_header(f"output node{{key}}-data {{uproot._awkward_forth.convert_dtype(uproot._awkward_forth.symbol_dict[self._dtype{len(dtypes)}])}}\\n")',
                    '            forth_stash.add_to_header(f"output node{key2}-offsets int64\\n")',
                    '            forth_stash.add_to_init(f"0 node{key2}-offsets <- stack\\n")',
-                    f'            content[{self.name!r}] = {{"class": "ListOffsetArray", "offsets": "i64", "content": {{ "class": "NumpyArray", "primitive": f"{{uproot._awkward_forth.convert_dtype(uproot._awkward_forth.symbol_dict[self._dtype{len(dtypes)}])}}", "inner_shape": [], "parameters": {{}}, "form_key": f"node{{key}}"}}, "form_key": f"node{{key2}}"}}',
+                    f'            content[{self.name!r}] = {{"class": "RegularArray", "size": {self.array_length}, "content": {{ "class": "NumpyArray", "primitive": f"{{uproot._awkward_forth.convert_dtype(uproot._awkward_forth.symbol_dict[self._dtype{len(dtypes)}])}}", "inner_shape": [], "parameters": {{}}, "form_key": f"node{{key}}"}}, "form_key": f"node{{key2}}"}}',
                    f'            forth_stash.add_to_pre(f"{self.array_length} dup node{{key2}}-offsets +<- stack \\n stream #!{{uproot._awkward_forth.symbol_dict[self._dtype{len(dtypes)}]}}-> node{{key}}-data\\n")\n',


@jpivarski what I did here was hard-code the array length into the forth code and use it to read the file by putting and pulling it off the stack. I think this can be easily replaced with a similar hardcoded version without generating the offset array.

By "hard coded," do you mean that you took the value of self.array_length and included that as a literal in the Forth code? (That would be fine, of course.) You don't mean that you took the length of this particular example, such that it wouldn't work with a file with a different array length, right? (Just checking, because of the way it was worded.)

We can remove the offsets-generation at any time, because that counts as an optimization—the output won't change when it gets fixed.

The first one. It changes depending on the file.

Okay, great! (I thought so, just checking.)

And that reminds me that I should merge this now. Thanks for looking out over!

fix: all AwkwardForth Forms now agree with awkward_form method output.

13594f2

jpivarski linked an issue Nov 30, 2022 that may be closed by this pull request

Ensure that the Forms produced by the AwkwardForth generation agree with the awkward_form methods. #754

Closed

Oops; don't forget this test.

f6fec32

jpivarski commented Nov 30, 2022

View reviewed changes

jpivarski requested a review from agoose77 November 30, 2022 00:28

Merge branch 'main' into jpivarski/awkwardforth-forms-must-agree-with…

a5878c0

…-awkward_form-method

agoose77 approved these changes Nov 30, 2022

View reviewed changes

aryan26roy reviewed Nov 30, 2022

View reviewed changes

jpivarski merged commit 150cf6c into main Nov 30, 2022

jpivarski deleted the jpivarski/awkwardforth-forms-must-agree-with-awkward_form-method branch November 30, 2022 16:31

jpivarski mentioned this pull request Dec 22, 2022

fix: AsStridedObjects.awkward_form was still including the '@' members. #808

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: all AwkwardForth Forms now agree with awkward_form method output. #790

fix: all AwkwardForth Forms now agree with awkward_form method output. #790

jpivarski commented Nov 30, 2022

jpivarski Nov 30, 2022

jpivarski Nov 30, 2022

jpivarski Nov 30, 2022

agoose77 Nov 30, 2022

jpivarski Nov 30, 2022

jpivarski Nov 30, 2022

jpivarski commented Nov 30, 2022

agoose77 left a comment

agoose77 Nov 30, 2022

jpivarski commented Nov 30, 2022

aryan26roy Nov 30, 2022

jpivarski Nov 30, 2022

aryan26roy Nov 30, 2022

jpivarski Nov 30, 2022

		f' content[{self.name!r}] = {{"class": "ListOffsetArray", "offsets": "i64", "content": {{ "class": "NumpyArray", "primitive": f"{{uproot._awkward_forth.convert_dtype(uproot._awkward_forth.symbol_dict[self._dtype{len(dtypes)}])}}", "inner_shape": [], "parameters": {{}}, "form_key": f"node{{key}}"}}, "form_key": f"node{{key2}}"}}',
		f' content[{self.name!r}] = {{"class": "RegularArray", "size": {self.array_length}, "content": {{ "class": "NumpyArray", "primitive": f"{{uproot._awkward_forth.convert_dtype(uproot._awkward_forth.symbol_dict[self._dtype{len(dtypes)}])}}", "inner_shape": [], "parameters": {{}}, "form_key": f"node{{key}}"}}, "form_key": f"node{{key2}}"}}',

fix: all AwkwardForth Forms now agree with awkward_form method output. #790

fix: all AwkwardForth Forms now agree with awkward_form method output. #790

Conversation

jpivarski commented Nov 30, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jpivarski commented Nov 30, 2022

agoose77 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jpivarski commented Nov 30, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment