Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow structs to be marked that they require being passed by reference #2372

Open
terrajobst opened this issue Mar 28, 2019 · 44 comments
Open

Comments

@terrajobst
Copy link
Member

For high-performance scenarios we're now in a position where we'd like to leverage mutable structs. The reason we'd like to use structs is to that they can be declared as ref struct so that they can store references to spans.

They are mutable because they represent writers, so they keep track how many bytes were written. Passing these structs by value is easy to do and creates broken code.

It would be nice if C# would allow us to mark a struct as requiring being passed by reference to methods. This could be an attribute or a keyword, like so:

[RequiresPassingByReference]
public struct MyMutableStruct
{
    // ...
}

This would mean that code like this would fail to compile:

public void Write(MyMutableStruct state)
{
}

ERROR: The type 'MyMutableStruct' requires to be passed by reference. Change the paramater 'state' to be 'ref'.

In other words, it's declaration site error, not a use site error.

@Logerfo
Copy link
Contributor

Logerfo commented Mar 28, 2019

That can be achieved with an analyzer, so I don't think it would be worth the language team effort.

@terrajobst
Copy link
Member Author

That can be achieved with an analyzer, so I don't think it would be worth the language team effort.

Maybe, but that can be said for many language constraints. If we believe it's core enough, doing it in the compiler is better, because it reduces moving parts.

@CyrusNajmabadi
Copy link
Member

I personally agree with @Logerfo . It's especially telling that this would be done with an attribute, as opposed to any specific language syntax. The pairing of an attribute with an analyzer is extremely simple to do and really fits in the bread-and-butter usecase for this sort of thing.

doing it in the compiler is better, because it reduces moving parts.

Also, much costlier :)

Are you guys already using analyzers for anything today? If not, i can see this being a higher up front investment on your part. However, if you've already started using analyzers to find and detect patterns you need ot avoid, then this should be a tiny additional marginal cost on top of that.

@YairHalberstadt
Copy link
Contributor

One disadvantage of an analyser, is that you can't treat warnings issued by the analyser as errors solution wide, only per project.

If that was made easier, I would find analysers a lot more useful.

@gafter
Copy link
Member

gafter commented Mar 28, 2019

Only the compiler knows exactly where defensive copies are produced. This is also requested as a compiler feature at dotnet/roslyn#26937

@terrajobst
Copy link
Member Author

One disadvantage of an analyser, is that you can't treat warnings issued by the analyser as errors solution wide, only per project.

That should have an easy fix: add a Directory.Build.props file and configure the warnings there.

@terrajobst
Copy link
Member Author

terrajobst commented Mar 28, 2019

Are you guys already using analyzers for anything today? If not, i can see this being a higher up front investment on your part. However, if you've already started using analyzers to find and detect patterns you need ot avoid, then this should be a tiny additional marginal cost on top of that.

Yes, but we're the platform. There is no NuGet package for our stuff. The only way we can make such an analyzer useful is by injecting it for everyone. We can, but my gut feel tells me what we're eventually getting dinged for overhead.

@YairHalberstadt
Copy link
Contributor

Is this meant to be an internal implementation detail of the platform, or exposed as a public struct?

@Joe4evr
Copy link
Contributor

Joe4evr commented Mar 28, 2019

I wonder if this is a case of XY Problem and there may be a more elegant solution that doesn't involve leaking such abstraction details.

@HaloFour
Copy link
Contributor

Is this really a property of the struct itself? I would think that a developer would want the ability to say that any struct shouldn't be copied and to be warned appropriately.

@CyrusNajmabadi
Copy link
Member

I would think that a developer would want the ability to say that any struct shouldn't be copied and to be warned appropriately.

Really? That would make working with ints, bools, and so many other structs incredibly painful. :D

(i kid, i kid. I know what you meant :D )

@Joe4evr
Copy link
Contributor

Joe4evr commented Mar 30, 2019

They are mutable because they represent writers, so they keep track how many bytes were written.

So one thing I thought up that could facilitate this end result without the fragile-looking approach proposed would be to bring a new feature exclusive to ref structs: being able to declare ref fields. I'll state upfront that I have no idea how feasible it even is, but I could imagine the syntax being something like this:

public ref partial struct SpanWriter
{
    private Span<byte> _buffer;
    private ref int _bytesWritten;

    public SpanWriter(Span<byte> buffer, ref int writeCounter)
    {
        _buffer = buffer;
        _bytesWritten = ref writeCounter;
    }
}

The main idea is that SpanWriter can get value-copied, but the _bytesWritten field will still point to the same underlying value, exactly like how value-copies of a Span will all still point to the same underlying value. The combination of restricting this to ref structs and using the Safe-to-return rules from ref returns should make this completely safe to use, provided I haven't overlooked anything.

@HaloFour
Copy link
Contributor

@Joe4evr

IIRC the CLR doesn't currently permit ref fields. The compiler might be able to emulate them with pointers but I think you might be opening up a can of worms having the compiler treat them as refs.

@Korporal
Copy link

For high-performance scenarios we're now in a position where we'd like to leverage mutable structs. The reason we'd like to use structs is to that they can be declared as ref struct so that they can store references to spans.

They are mutable because they represent writers, so they keep track how many bytes were written. Passing these structs by value is easy to do and creates broken code.

It would be nice if C# would allow us to mark a struct as requiring being passed by reference to methods. This could be an attribute or a keyword, like so:

[RequiresPassingByReference]
public struct MyMutableStruct
{
    // ...
}

This would mean that code like this would fail to compile:

public void Write(MyMutableStruct state)
{
}

ERROR: The type 'MyMutableStruct' requires to be passed by reference. Change the paramater 'state' to be 'ref'.

In other words, it's declaration site error, not a use site error.

@terrajobst - Can you explain why a class is not sufficient? surely defining a kind of struct that is always passed by reference is more or less the same as a class?

@HaloFour
Copy link
Contributor

@Korporal

surely defining a kind of struct that is always passed by reference is more or less the same as a class?

Except for heap allocations and the fact that you can't stick ref-likes (like Span<T>) on the heap.

I do agree that for most people writing "normal" programs that features like this aren't beneficial over using classes, and are probably more cumbersome to use. I'm curious about the specific use cases described in the original post.

@HaloFour
Copy link
Contributor

@CyrusNajmabadi

Really? That would make working with ints, bools, and so many other structs incredibly painful. :D

I know you're kidding, but yeah, I can see scenarios where you'd want to pass around even those primitive types as refs only, like as a shared counter or flag, and an accidental copy somewhere would completely break your logic and probably be very hard to find.

Attaching this feature to the struct itself I think creates more problems than it solves. It bifurcates the ecosystem and requires the developer to choose between incompatible types based on how they want to use it at that point in time.

@sharwell
Copy link
Member

For me, the main thing I'd like is the ability to implement something like unique_ptr<T> as a value type.

  1. References to the current location are fine
  2. Moving the current location to a new location is fine, as long as there are no remaining references to the current location

By-ref types show that the compiler is capable of handling this situation. However, it doesn't allow for the second situation, so if the type is defined as a ref struct then it's not useful in async methods or iterators even though those usages would not negatively impact semantics.

@Joe4evr
Copy link
Contributor

Joe4evr commented Mar 31, 2019

IIRC the CLR doesn't currently permit ref fields. The compiler might be able to emulate them with pointers but I think you might be opening up a can of worms having the compiler treat them as refs.

That's a bummer, but I'd still rather look at a solution in that space instead of taking the OP proposed approach which looks just too fragile. As you just so succinctly put it:

Attaching this feature to the struct itself I think creates more problems than it solves.

@juliusfriedman
Copy link

juliusfriedman commented Mar 31, 2019

If ref structs get ref fields then that solves his issue... See #1147

As indicated there ByReference gets you 90%.

You can also use IntPtr and make a property which returns a ref, do the unsafe cast using the IntPtr member.

Also just curious why a ref struct with ref property can't work for you in this example, I don't see the need for a ref field spelled out.. it could be contrived by the same arguments as it demerits

Also if you declare the method to take ref then there is no way for the caller to pass by value in the first place...

And @gafter, ? they are produced everywhere ref is not used... What's the mystery?

When I use ref or in there is no copy...

@Joe4evr, partial structs are already a construct...

@Joe4evr
Copy link
Contributor

Joe4evr commented Mar 31, 2019

@juliusfriedman

Also just curious why a ref struct with ref property can't work for you in this example

Consider the writer having the following method:

public void Write(ReadOnlySpan<byte> value)
{
    var slice = _span.Slice(_bytesWritten);
    value.CopyTo(slice);
    _bytesWritten += value.Length;
}

If the writer is value-copied into another method that executes Write(), and then returns to its caller, the _bytesWritten field is not updated, but the data written into the Span is, meaning that the next Write() isn't going to start in the right place. This is the problem that the framework team wants a solution for.

@Joe4evr, partial structs are already a construct...

I know, but that's not the point. When writing sample code, partial merely means "there is more API to this type than what is shown here, but it's not relevant to drive the argument".

@juliusfriedman
Copy link

juliusfriedman commented Mar 31, 2019

@Joe4evr

You called Slice there than you passed it to write by value....

Why did you not pass by ref?

Need I type more?

The solution is to not do those kinda things in the first place... someone used to say "if something hurts then don't do it", I have a feeling that person would say the same thing here.

@Joe4evr
Copy link
Contributor

Joe4evr commented Mar 31, 2019

Why did you not pass by ref?

Because 1) CopyTo() does not take its argument by ref, since 2) Spans don't need to be passed by ref to update the underlying data. See it in action right here.

@juliusfriedman
Copy link

juliusfriedman commented Mar 31, 2019

Thinking about this scientifically.... a solution to that is tantamount to asking for a solution to when to use ref instead of value anywhere.... that's a question alright...

Perhaps though we could answer like if there ever was an instance it becomes the preferred reference [default if you will]and is used in such cases however a programmer would then need a construct to determine when that is...

Which could also be allowed... e.g. With keyword or another syntax which i could specify the intricies of such... [ref, &, etc]

However how is that more useful than just checking for null or the desired state simply before accessing from that memory as one should be anyway for reference types I am not sure...

It would be useful for lambdas for sure as I could then begin to say what I wanted to capture, how, and what to do for certain state, null etc...

@juliusfriedman
Copy link

juliusfriedman commented Mar 31, 2019

@Joe4evr

That seems like an API change which needs to made or the need for a method which gets passed bytesWritten by ref and calls WriteTo, then update bytesWritten.

@sharwell
Copy link
Member

sharwell commented Mar 31, 2019

If ref structs get ref fields then that solves his issue...

This would not address the situations where non-copyable structs would be valuable for me. I'm essentially looking for StrongBox<T> behavior without forcing an additional allocation for the box.

@juliusfriedman
Copy link

juliusfriedman commented Mar 31, 2019

@sharwall, at this point your not asking for non copyable structs your asking for non copyable e.g. not reable memory e.g. that something is write only first of all.

You do that with 'out' or other encapsulation mechanisms.

Also I believe that is StrongBox<WeakReference> your looking for...

And Finally if you want somerhing for structs that are ref then you would use IntPtr, void, something anything but probably an IWriteOnly interface... (dotnet/coreclr#23586 (comment)) but since you can't use a ref struct and the interface without boxing then you might have to use those other less safe mechanisms...

Again qi bono ¿

Sincerely if I must contrive I can see a possible benefit for lambdas and anonymous Types by allowing enhanced capture syntax for those scenarios but beyond that...

The panicky thing for me is that no else cried use ref...

@sharwell
Copy link
Member

sharwell commented Mar 31, 2019

@juliusfriedman I believe you have misunderstood my request. I'm looking for a way to have a value type instance that can only exist at one location in memory at any given time. It can be moved from one location to another, but not copied such that it exists in two places at once.

A type with these characteristics would meet the goals of the original performance request, and also solve other problems related to correctness.

@juliusfriedman
Copy link

juliusfriedman commented Mar 31, 2019

@sharwall, How about a static struct? As for correctness... use ref...

@sharwell
Copy link
Member

sharwell commented Apr 1, 2019

How about a static struct?

This would limit to one instance. I'm looking for any number of instances, but an existing reference could not be copied to a new location without moving it.

As for correctness... use ref...

This prevents use as a field of a reference type and also prevents use in async methods, which are primary use cases.

@FiniteReality
Copy link

@juliusfriedman It seems like people are asking for a way to implement semantics similar to that of std::unique_ptr in C++: https://en.cppreference.com/w/cpp/memory/unique_ptr

In C++, something can be copy constructed or move constructed, which means copying the data of another object, or taking ownership of the memory allocated by another object. This is useful for making sure, at least in unique_ptr's case, that only the code with an active reference to the unique_ptr container at one time can access the value. As it can only be moved, passing it as a parameter to a method effectively "deactivates" the unique_ptr after it has been passed, so it does not refer to the value anymore.

I could see something like this being useful in C# too, such as flowing context throughout various places where spans are needed, as they can't be used in classes. Personally, I would use this in command parsing to avoid allocating strings to contain the user's tokenized (split into spaces w/ some special handling for quotes, etc.) message across various handlers, without introducing a huge amount of complexity into my project.

@juliusfriedman
Copy link

@FiniteReality, more accurately people are asking for an officially supported construct for such.
We can already access memmove (not really needed though as we can already write a class which implements the move semantics and enforces it for our application).

And not for nothing but how does unique_ptr guarantee move semantics?

There is NOTHING AFAIK (feel free to cite) in C++ which states that multiple unique_ptr 's cant exist and point to the same object....

@sharwell, you went from not wanting multiple instances to wanting them and the back to unique_ptr also, just make a Move method which gets access by reference and then if you want to move it move it and don't forget to set the other reference to null... or not depending on what you want, maybe allocate another there...

Remember this is a managed language which happens to support unsafe code, nothing is stopping us from providing that unsafe code which needs to implement move semantics or otherwise.

If you want something better then show you plan on using it and I bet that something even more useful would get developed...

Finally, I don't see how unique_ptr will help for the example @Joe4evr gave as the same argument the used for my 'solution' would be applicable, the method doesn't accept unique_ptr NOR how unique_ptr will fix the issue defined here.

That issue was about ref fields and we have gone wayyy of topic, and I am still not convinced of the reasoning here for ref fields although I am sure there are other good ones...

@sharwell
Copy link
Member

sharwell commented Apr 1, 2019

Finally, I don't see how unique_ptr will help for the example @Joe4evr gave ...

  • My comments don't provide a solution for users who want to store ref-type fields in ref struct
  • Non-copyable structs directly addresses the situation of mutable value types described by @Joe4evr in the object writer example

@sharwell, you went from not wanting multiple instances to wanting them and the back ...

No, I never suggested that multiple instances were a problem. I only said that multiple copies of the same instance are a problem.

Remember this is a managed language which happens to support unsafe code, nothing is stopping us from providing that unsafe code which needs to implement move semantics or otherwise.

Enforced non-copyable semantics are not supported by unsafe code either.

If you want something better then show you plan on using it and I bet that something even more useful would get developed...

dotnet/roslyn@master...sharwell:owned-disposable
dotnet/roslyn-analyzers@master...sharwell:non-copyable

@juliusfriedman
Copy link

Haven't looked at your code yet, I'm driving... but I must say... if memmove is not enough you have bigger problems, also you could use unsafe to do what you needed but you might not need to...

@yaakov-h
Copy link
Member

yaakov-h commented Apr 1, 2019

Please don't GitHub while driving.

Also, memmove requires explicitly moving an object, and doesn't allow for situations such as returning the struct from a method call.

@juliusfriedman
Copy link

juliusfriedman commented Apr 1, 2019

Just getting done driving.

memmove function moves memory by pointer and size.?. My function would call that and if successful return return instance = default; or orherwise as needed...

Guys I hate to break it to you but what is really a ref field?

Isn't it just a ref to a field... (isnt everything in a ref struct already such?)

If you guys want that and ByReference isn't enough and using ref in the declaration of the function is somehow not enough then guess what... ref fields won't help either... I could still Unsafe or unsafe cast it away just like you can cast away const or readonly.

Yea you can analyze it and detect someone is doing that even make a compiler error out if it with a #pragma to switch it on or off etc...

Finally, if there is something unsafe cannot achieve then you are saying the hardware itself cannot support what you want to achieve and after a few more enhancements are made to the Unsafe class you likely won't be able to express anything in IL which you wouldn't also be able to from code, especially when using unsafe.

You guys want something like Reference.. what difference there is besides the class has a typed ref field... I could implement that with a IntPtr or TypedReference and be done with it.

You lost me at var.... if you think you can stop someone from copying that memory Irl.... good luck.

@CyrusNajmabadi
Copy link
Member

Finally, if there is something unsafe cannot achieve then you are saying the hardware itself cannot support what you want to achieve and after a few more enhancements are made to the Unsafe class you likely won't be able to express anything in IL which you wouldn't also be able to from code, especially when using unsafe.

You guys want something like Reference.. what difference there is besides the class has a typed ref field... I could implement that with a IntPtr or TypedReference and be done with it.

I"m really not sure what any of this means.

if you think you can stop someone from copying that memory Irl.... good luck.

I don't see why we couldn't prevent someone from doing this from safe code. We can control what users of the language can do, just as we do today with other features under the 'ref' umbrella. For example all the compile-time safety we built into the language to support things like Span: https://github.com/dotnet/csharplang/blob/master/proposals/csharp-7.2/span-safety.md

@juliusfriedman
Copy link

juliusfriedman commented Apr 4, 2019

@CyrusNajmabadi , please see these Gists:

https://gist.github.com/juliusfriedman/c3de0e9b408add44652e10eee8c959bb

https://gist.github.com/juliusfriedman/52897453c54e23a4b0769c49cae72ba6

Currently we don't support readonly parameters and I think if we are going this far to support non copy-able memory and the fact we already support the in keyword I think this must be an oversight...

@CyrusNajmabadi
Copy link
Member

@juliusfriedman I'm not sure what that has to do with this topic

@juliusfriedman
Copy link

@CyrusNajmabadi, I'm not sure with a lot of the comments have to do with this topic...

None the less the subject matter is relevant when considered with those comments, tifwyw..

@CyrusNajmabadi
Copy link
Member

@CyrusNajmabadi, I'm not sure with a lot of the comments have to do with this topic...

The topic is preventing copying of structs... You said:

if you think you can stop someone from copying that memory Irl.... good luck.

I pointed out that there doesn't seem to be anything preventing the language from having such a concept. Similar to how it added rules for how people could work with ref structs to prevent doing certain things that would break 'safety' from teh perspective of the language.

You've responded with posts that seem to either be non-sequiturs, or which might be on topic, but are so vague as to not carry their point across on their own. I've asked you to clarify, but you don't seem to want to. So, until that happens, i'm going to have to think they're non-sequiturs.

@juliusfriedman
Copy link

Point out succinctly what you want an answer to and you will surely get one...

You can do whatever you want, e.g. force something to passed by ref or not but the fact still stands, if you don't have the same implementation available for users of the keywords which exist already how much are you really adding to the language and are you doing it in the correct way...

@CyrusNajmabadi
Copy link
Member

Point out succinctly what you want an answer to and you will surely get one...

I made it clear above. You stated:

if you think you can stop someone from copying that memory Irl.... good luck.

And i stated that it was very easy to that here:

I don't see why we couldn't prevent someone from doing this from safe code. We can control what users of the language can do, just as we do today with other features under the 'ref' umbrella. For example all the compile-time safety we built into the language to support things like Span: https://github.com/dotnet/csharplang/blob/master/proposals/csharp-7.2/span-safety.md

You can do whatever you want, e.g. force something to passed by ref or not but the fact still stands,

Again, i don't know what this means. I agree with can do whatever we want. We have done this very sort of thing in C#. The entire ref/span work was this sort of thing. It works well.

if you don't have the same implementation available for users of the keywords which exist already how much are you really adding to the language and are you doing it in the correct way...

We're adding a lot to the language. Just like with ref/span/etc. We can add rules to the language, and we can get a lot of benefits from it (esp. in the perf arena). I don't see why this request would be any different.

Again, this is not speculative. We showed this is possible already, and it worked out very well. This new request falls squarely in teh same bucket, and nothing you've said seems to actually demonstrate that this isn't doable or would be even be undesirable.

@CyrusNajmabadi
Copy link
Member

So, to address this:

Point out succinctly what you want an answer to and you will surely get one...

I would like an answer to the points made to you that this is completely feasible for the language to do, and doesn't have the sorts of problems you seem to ascribe to it.

@CyrusNajmabadi
Copy link
Member

Also, perhaps there is a language issue here, or we're just talking past each other. If you want, feel free to come over to gitter.im/dotnet/csharplang and we can discuss things further. Maybe you'll can clarify your position better and explain in depth what you think the actual problems are.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests