Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement native container types used by shared EventPipe code. #78852

Merged
merged 3 commits into from
Mar 10, 2023

Conversation

lateralusX
Copy link
Member

@lateralusX lateralusX commented Nov 25, 2022

Use eglib's list, slist, array, ptr_array, queue, hash table as a starting point creating a new set of native container types shared between native runtime components like EventPipe.

EventPipe shim container API used a STL like API, new container types follows STL style container API and match STL type names:

vector (dn_vector)
forward list (dn_fwd_list)
list (dn_list)
queue (dn_queue)
unordered map (dn_umap)

Native container types are still included in runtime build artifacts (through EventPipe cmake files), this can be changed/done differently depending on how each runtime would like to build/use the source files. In order to make that intent more clear, container and EventPipe cmake files have been renamed to not use CMakeList.txt, but .cmake.

A number of new native tests for added container types have been added into EventPipe native test runner.

Custom allocator supported for all container types, adds ability for EventPipe to keep use of local arrays and mainly do stack alloc. Implemented allocators support switching to dynamic allocations when running out of assigned stack space, all transparent to consumer of container type.

Iterators are supported for all container types, but in cases where the underlying type also support simple iteration patterns it is possible to use struct fields directly, like getting access to size and data on vector, head on list/fwd_list and tail on list and use them directly in for statements.

Fields in structs that are "public" are possible to access directly, for fields that have internal/private access, they are stored under a _internal struct and have their names prefixed with _, idea is that these fields can only be accessed by container type implementation and not directly by consumers. Full structs are part of header files to support both static as well as dynamic allocation patterns and each type supports alloc/free for dynamic allocation and init/dispose when type has been either static allocated or allocated using a different memory allocator.

Copy link
Member

@lambdageek lambdageek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look through the implementations, just took a high level pass through the code.

Main comment is just that I think it's ok to be more opinionated: if we like the _ex_ functions, just make them part of the new API. If we like STL-style names, no need to provide macros so that we have the old name and the new name (at least not in the common folder). Inevitably we'll end up with both used interchangeably and it will be more confusing in the long run to see a mix of dn_array_ex_erase and dn_array_remove_index, for example.

src/native/containers/dn-allocator.h Outdated Show resolved Hide resolved
src/native/containers/dn-array-ex.h Outdated Show resolved Hide resolved
src/native/containers/dn-array-ex.h Outdated Show resolved Hide resolved
src/native/containers/dn-malloc.h Outdated Show resolved Hide resolved
src/native/containers/dn-sort-frag.h Outdated Show resolved Hide resolved
src/native/containers/dn-utils.h Outdated Show resolved Hide resolved
@lateralusX
Copy link
Member Author

lateralusX commented Dec 1, 2022

I didn't look through the implementations, just took a high level pass through the code.

Main comment is just that I think it's ok to be more opinionated: if we like the _ex_ functions, just make them part of the new API. If we like STL-style names, no need to provide macros so that we have the old name and the new name (at least not in the common folder). Inevitably we'll end up with both used interchangeably and it will be more confusing in the long run to see a mix of dn_array_ex_erase and dn_array_remove_index, for example.

We probably need old API intact so we can change Mono to use these types instead of the once in eglib (and we should be able to drop them there as well). without rewriting a lot of code, alternative is to have and adaption API on Mono side, with glib -> new container API. If we think its OK to break and leave the glib API behind (put still having semantic similar API'sthat could be used in an adaption layer) it might be worth to clean up the API surface a little. Alternative is to just decide that we will keep Mono's eglib types around and then gradually switch over to new shared container types in Mono.

@lambdageek
Copy link
Member

We probably need old API intact so we can change Mono to use these types instead of the once in eglib (and we should be able to drop them there as well). without rewriting a lot of code, alternative is to have and adaption API on Mono side, with glib -> new container API. If we think its OK to break and leave the glib API behind (put still having semantic similar API's) it might be worth to clean up the API surface a little.

I'm not sure what the least disruptive way will be to adopt these in Mono. But I think if we need a mix of STL-style names and glib-style names in Mono, it's probably ok to have some macros in src/mono only. no need to make them part of the common container API.

@lateralusX
Copy link
Member Author

lateralusX commented Dec 1, 2022

We probably need old API intact so we can change Mono to use these types instead of the once in eglib (and we should be able to drop them there as well). without rewriting a lot of code, alternative is to have and adaption API on Mono side, with glib -> new container API. If we think its OK to break and leave the glib API behind (put still having semantic similar API's) it might be worth to clean up the API surface a little.

I'm not sure what the least disruptive way will be to adopt these in Mono. But I think if we need a mix of STL-style names and glib-style names in Mono, it's probably ok to have some macros in src/mono only. no need to make them part of the common container API.

Agree, should we build and extend glib API and extend with additional functions needed, or should we implement a STL like API (as initialized in current _ex files) for the new container types and make sure we have an upgrade path for Mono switching over to new container types? The new API should at least be a superset of existing glib API. So for example dn_array_t would then have the following API's (with corresponding glib API's):

dn_array_alloc
    g_array_new
dn_array_alloc_capacity
    g_array_sized_new
dn_array_free
    g_array_free
dn_array_push_back (several versions)
   dn_array_append_vals
   dn_array_insert_vals
dn_array_erase
    g_array_remove_index
dn_array_clear
dn_array_reserve
    g_array_set_size
dn_array_index
    g_array_index
dn_array_empty
dn_array_data
dn_array_size
    g_array_capacity

and since we now adapt stl vector and since our array is more like vector, maybe we should change name to dn_vector_t instead.

@lateralusX
Copy link
Member Author

lateralusX commented Jan 10, 2023

@jkotas @AaronRobinsonMSFT @lambdageek:

In order to get progress on the shared container types and use from EventPipe as drafted in this PR, we need to decide what API surface we would like to implement. The PR now includes two API surfaces, the original eglib as used by Mono as well as an STL style API surface more inline to what we had in EventPipe shim API. Here is an example of dn_array and if we switch to an stl style API, and if that case the type would probably be renamed dn_vector:

stl syle eglib style
dn_array_alloc g_array_new
dn_array_alloc_capacity g_array_sized_new
dn_array_free g_array_free
dn_array_push_back dn_array_append_vals
dn_array_insert dn_array_insert_vals
dn_array_erase g_array_remove_index
dn_array_clear dn_array_set_size(0)
dn_array_reserve dn_array_set_size
dn_array_index dn_array_index
dn_array_empty
dn_array_data a->data
dn_array_size a->len

We could just implement the STL style API surface and adjust structs and some types (use size_t instead of int32_t and uint32_t for array size), but then it won't be compatible with Mono eglib anymore, meaning we would need to rewrite Mono code to use new container types and keep duplicates of current eglib types in Mono until replaced (if that ever happens).

Alternative is to keep 100% compatible eglib API (except function and type names using dn_ and changes to C99 types). That way we could just do simple name and type adaption in static inline or defines in Mono headers and switch over to shared container types and drop implementation in Mono's eglib. This would be the most straightforward approach, but will also be "limited" and carry forward current design of eglib functions, struct layout and size of data types (like using int32_t for array size).

Thoughts?

@lambdageek
Copy link
Member

Couple of thoughts:

use new container types and keep duplicates of current eglib types in Mono until replaced (if that ever happens).

Would replacing uses of GArray by DNVector really be such a massive undertaking? git grep shows 75 uses of GArray (including in the eglib tests and the implementation itself). There are ~2200 uses of ->data in src/mono but that includes all data structures that happen to have a data field, not just GArray.

use size_t instead of int32_t and uint32_t for array size

Is dn_array_t really the right data structure to use for an array with more than 2 billion elements?


I think we should check in a container API that we prefer and update mono in a follow-up PR, if necessary. If we do it early in the release cycle I don't think it would be that destabilizing. And if there is really a sophisticated user of GArray hiding somewhere, we can keep that code as is using the eglib type. But I suspect in most cases the surrounding code doesn't expect the arrays to get too big, nor does it do anything sophisticated - at best it is relying on the ability to realloc the data to grow the array.


Personally I don't have a strong opinion on whether STL-style or glib-style names are better. As long as we're not mixing them and the naming gives a clear indication of where the inspiration came from (and thus, a hint to how the functions behave).

@lateralusX
Copy link
Member Author

lateralusX commented Jan 10, 2023

Is dn_array_t really the right data structure to use for an array with more than 2 billion elements?

Probably not, but at least use uint32_t instead of int32_t as length, I also know libraries that standardize on size_t, but by default the library only accepts uint32_t sizes and I believe its done that way for security reasons around under/owerflows checks in the implementation and stl standardize on size_type for all sizes of types, including vector and its normally represented as size_t, but by definition its defined as an unsigned integral type, so might (at least in theory) be smaller.

@lateralusX
Copy link
Member Author

lateralusX commented Jan 10, 2023

Would replacing uses of GArray by DNVector really be such a massive undertaking? git grep shows 75 uses of GArray (including in the eglib tests and the implementation itself). There are ~2200 uses of ->data in src/mono but that includes all data structures that happen to have a data field, not just GArray.

We could probably do it type by type, but so far this PR includes ports of g_array, g_byte_array, g_ptr_array, g_list, g_slist, g_queue and g_hash_table.

@jkotas
Copy link
Member

jkotas commented Jan 10, 2023

I think we should check in a container API that we prefer and update mono in a follow-up PR, if necessary.

+1

src/mono/mono/eventpipe/test/dn-array-tests.c Outdated Show resolved Hide resolved
src/native/containers/dn-allocator.c Outdated Show resolved Hide resolved
src/native/containers/dn-allocator.c Outdated Show resolved Hide resolved
if (capacity <= (uint64_t)(priv->capacity))
return true;

new_capacity = (capacity + (capacity >> 1) + 63) & ~63;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these numbers warrant a comment.

@lateralusX
Copy link
Member Author

lateralusX commented Jan 30, 2023

Started to refactored dn_array, dn_ptr_array and dn_byte_array -> dn_vector, dn_ptr_vector. Removed _ex API and replaced with stl style API only as described above, dn_vector should be "done", will now move on to the other types following the same pattern.

@lateralusX
Copy link
Member Author

lateralusX commented Feb 6, 2023

Worked through forward list, list and queue + tests and almost through with hash table type. Will push all changes coming days.

This PR is already rather big and NativeAOT EventPipe work have a dependency on it. At the same time we should probably work on adding even more tests validating API details and potentially extend API surface with more functions before putting the native container types under native/container folder. Could an option be to start out with the container types under eventpipe folder? That way NativeAOT will be unblocked and we could then move type by type into native/container folder in follow up PR's. Keeping it in eventpipe folder for now isolate the changes, and we can tweak API surface if needed going forward, add more tests, all without affecting consumers of eventpipe. Should I make that adjustment to the PR, @lambdageek, @jkotas, @AaronRobinsonMSFT , @LakshanF ?

@lateralusX
Copy link
Member Author

object libraries

On the other hand, I see that eventpipe in particular is using this SOURCES/HEADERS pattern quite extensively. So it might take a bit of work to untangle the cmake bits to do it another way. It's not really pertinent to this PR, so maybe it could be done as follow-up work.

Since we use the include of .cmake files in several other places where we share source files between different runtime components, maybe we could do any refactoring related to that in a follow up PR as suggested and keep things as is in this PR for now.

@lateralusX
Copy link
Member Author

@lambdageek So to address you feedback I'm planning to do the following:

  • Reduce the number of arguments needed when doing custom alloc/init and use a struct.
  • Add custom dispose_func to the pop, pop_front, pop_back functions for consistency.

Anything else (assuming we stick with current .cmake approach)?

@lambdageek
Copy link
Member

  • Reduce the number of arguments needed when doing custom alloc/init and use a struct.

Yes

  • Add custom dispose_func to the pop, pop_front, pop_back functions for consistency.

To clarify: add extra versions of these functions that take a dispose_func argument? Or modify the existing functions? I was suggesting adding.

Anything else (assuming we stick with current .cmake approach)?

No, I think that's it. Thanks @lateralusX

@lateralusX
Copy link
Member Author

To clarify: add extra versions of these functions that take a dispose_func argument? Or modify the existing functions? I was suggesting adding.

Yes adding custom functions for this as we do for all other functions that accept dispose_func argument, like dn_queue_custom_pop (dn_queue *queue, dn_queue_dispose_func_t dispose_func).

@AaronRobinsonMSFT
Copy link
Member

I think we should do a follow up PR later that decides how we should run these low level native tests and what test runner we should apply doing the job.

@lateralusX Let's file an issue on this.

@lateralusX
Copy link
Member Author

@lateralusX Let's file an issue on this.

#82396

@lateralusX
Copy link
Member Author

lateralusX commented Feb 20, 2023

@lambdageek Addressed your feedback + added a number of new tests to cover new functions + additional EventPipe scenarios.

@AaronRobinsonMSFT Are you happy with all the changes as well?

@noahfalk, @davmason I have looked over the EventPipe related changes many times, but might be good to get a pair of extra eyes looking at the EventPipe specific changes.

@AaronRobinsonMSFT
Copy link
Member

@AaronRobinsonMSFT Are you happy with all the changes as well?

I am. The API shape is what I would expect and I appreciate the testing we have for this.

One final thing that might be helpful before we check this in, is a quick before/after on some scenario benchmarks. There are two dimensions I would be interested in: (1) binary size on disk and (2) throughput for the events. I'm not saying we can't have any regressions but understanding the current state will help us understand if we need to revisit anything right now or can iterate over time. The matrix I would expect would be:

CoreCLR: Before CoreCLR: After Mono: Before Mono: After NativeAOT
Size on disk Change in size
Scenario (1)
...
Scenario (N)

@lateralusX
Copy link
Member Author

@AaronRobinsonMSFT Are you happy with all the changes as well?

I am. The API shape is what I would expect and I appreciate the testing we have for this.

One final thing that might be helpful before we check this in, is a quick before/after on some scenario benchmarks. There are two dimensions I would be interested in: (1) binary size on disk and (2) throughput for the events. I'm not saying we can't have any regressions but understanding the current state will help us understand if we need to revisit anything right now or can iterate over time. The matrix I would expect would be:

CoreCLR: Before CoreCLR: After Mono: Before Mono: After NativeAOT
Size on disk Change in size
Scenario (1)
...
Scenario (N)

@davmason I believe John had some performance related EventPipe tests that he ran from time to time. Any idea where they are and how they can be run?

Copy link
Member

@lambdageek lambdageek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good.

I have a couple of suggestions around making the interface a bit nicer to use from C with less formality.

The main suggestion is to ensure that zero-initializing customization structs gets the reasonable default behavior. The second suggestion/question is to just use the fully-customizable versions of the various functions as the interface from the public API into the implementation. (As opposed to adding _ declarations that explode the customization structs and are called by all the other forms. It's just adding noise to the headers, IMO, unless there's some peformance benefit i'm not seeing).

Finally, I think element_size on vector is not a customization parameter. it's a required parameter. and moreover it's required to be non-zero. so don't make it part of the params struct. pass it explicitly. (You already ignore it in some of the macros in favor of sizeof(type) - just more indication that it doesn't belong)

src/native/containers/dn-umap.h Outdated Show resolved Hide resolved
src/native/eventpipe/ep-file.c Outdated Show resolved Hide resolved
src/native/containers/dn-vector-t.h Outdated Show resolved Hide resolved
src/native/containers/dn-vector-t.h Outdated Show resolved Hide resolved
Copy link
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very nice to me :) I'm glad both for the conceptual simplification and the improved readability. I mostly just skimmed, I think the tests have a far better chance of finding a bug in container usage vs. my scrutiny.

One request is to add some comments in the container headers or a README in the same directory that gives a quick overview of using the types
for common tasks. For example show what it looks like to create a vector, add a few elements, iterate over it, and free it. Similar with the other container types.

return it.it == it._internal._vector->size;
}

#define DN_VECTOR_FOREACH_BEGIN(vector,var_type,var_name) do { \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do folks think of ordering the parameters the same way they would show up in a C#, Java, or javascript foreach statement? The collection is always the last parameter as in:

foreach(string s in list)
DN_VECTOR_FOREACH(var_type, var_name, vector)

I'm not sure if there is some other FOREACH precedent in C/C++. Pretty much all the code I always work with in C/C++ just uses an index into an array or pointer arithmetic but that may just be me using ancient conventions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable to me, mainly put it as first parameter since that what the rest of the API's uses and resembles the "this" pointer, but for the iterate helper macros we can go either way.

Use eglib's list, slist, array, ptr_array, queue, hash table as a
starting point creating a new set of native container types shared
between native runtime components like EventPipe.

EventPipe shim container API used a STL like API, new container types
follows STL style container API and match STL type names:

vector
forward list
list
queue
unordered map

Native container types are still included in runtime build artifacts
(through EventPipe cmake files), this can be changed/done differently
depending on how each runtime would like to build/use the source files.
In order to make that intent more clear, container and EventPipe cmake
files have been renamed to not use CMakeList.txt, but .cmake.

A number of new native tests for added container types have been added
into EventPipe native test runner.
@lateralusX lateralusX force-pushed the lateralusX/ep-container-types branch from dec556c to c9bf963 Compare March 2, 2023 20:41
@lateralusX
Copy link
Member Author

@lambdageek, @noahfalk, have addressed all of your latest feedback.

@AaronRobinsonMSFT will provide some stats tomorrow.

Copy link
Member

@lambdageek lambdageek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Copy link
Member

@LakshanF LakshanF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked from the NativeAOT EventPipe perspective, and these container types work without any problems! NativeAOT EventPipe port is still work in progress and have minimal containers for the code that is checked in to main currently.

NativeAOT EventPipe is taking a hard dependency on these containers and super happy of this work and getting it merged into main.

@lateralusX
Copy link
Member Author

CC @davmason

@lateralusX
Copy link
Member Author

Failure in runtime (Build Libraries Test Run release coreclr linux_musl x64 Debug) is unrelated and has been observed on many different PR's.

Copy link
Member

@davmason davmason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@lateralusX
Copy link
Member Author

I have collected requested stats, both size diff before/after this PR as well as running EventPipeStress from diagnostics repro:

https://github.com/dotnet/diagnostics/tree/main/src/tests/EventPipeStress

before/after this PR. I have also run the low level performance tests that exists in EventPipe's native unit tests on Mono:

{"test_check_buffer_perf", test_check_buffer_perf},

{"test_buffer_manager_perf", test_buffer_manager_perf},

{"test_write_event_perf", test_write_event_perf},

All stress tests are run on local laptop so values are not very scientific or stable, but I believe they least serve as good markers to detect potential regressions due to this PR. The EventPipeStress tests didn't indicate any performance regressions, on neither CoreCLR or Mono (Windows x64), both runtimes even showed a small performance increase, but well within error margin.

Low level EventPipe native unit tests indicated a small performance regression in one low level tests, test_buffer_manager_perf, but not visible when stress testing both lower as well as higher level native EventPipe API's, so I believe it falls into the error margin as well and doesn't have impact on end2end EventPipe performance. Having that said, I still did some profiling and have a couple of optimizations in dn_vector_t as well as dn_umap_t that close the gap, but I will do them in a follow up PR to not bloat this PR even more it doesn't affect the end2end EventPipe performance.

When it comes to size, there is, as expected, a small increase on both runtimes, mainly since we added new code wihtout removing any existing container implementations (except code in EventPipe shim). If/when Mono replace its eglib with new container classes I anticipate that diff to be eliminated, but on CoreCLR it will probably persist since we probably won't replace CoreCLR C++ container classes with the shared C based container types. The increase is however small 4K on CoreCLR and 5K on Mono, so its just a 0.08% increase on CoreCLR and 0.10% on Mono.

And here are the actual values, Mono's lower numbers on EventPipe throughput is mainly due to Mono's default JIT implementation not producing code on pair with CoreCLR.

Before After Diff Diff %
CoreCLR Size on disk (Bytes) 5449216 5453312 +4096 +0.08
Mono Size on disk (Bytes) 5097472 5102592 +5120 +0.10
CoreCLR EventPipeStress Events/Sec 613425 615559 +2134 +0.35
Mono EventPipeStress Events/Sec 355818 359331 +3513 +0.99

Unless someone have any additional comments, I will now merge this PR.

Copy link
Member

@AaronRobinsonMSFT AaronRobinsonMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@lateralusX lateralusX merged commit 1d5c424 into dotnet:main Mar 10, 2023
lateralusX added a commit that referenced this pull request Mar 15, 2023
During work collecting regressions statistics for #78852 I did some profiling on performance tests included in native EventPipe tests, https://github.com/dotnet/runtime/blob/main/src/mono/mono/eventpipe/test. This commit implements a couple of optimizations in the EventPipe native container classes as well as Mono's EventPipe implementation improving performance in low level native EventPipe performance tests.

Commit also includes a number of new native EventPipe tests covering optimizations done in dn_vector_ptr_t.
@ghost ghost locked as resolved and limited conversation to collaborators Apr 9, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants