Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding memoryUnsafe and memorySafe effects #314

Closed
Clonkk opened this issue Jan 12, 2021 · 23 comments
Closed

Adding memoryUnsafe and memorySafe effects #314

Clonkk opened this issue Jan 12, 2021 · 23 comments

Comments

@Clonkk
Copy link

Clonkk commented Jan 12, 2021

The goal of this RFC is to add a memUnsafe effect to Nim to be able to trace unsafe operations in procs.

This needs to be an effect a not a tag because it has to be applied by keyword of the language (cast) and pragmas (importc, importcpp, emit etc.).

What is "unsafe" ?

Unsafe operations can be narrowed down in Nim to :

  • keyword: cast
  • system proc : addr, unsafeAddr
  • pragmas : importc, importcpp, importjs, importobjc, emit
  • derefenrecing a pointer through either [] or . syntax (ideally speaking, hidden dereferencing might be hard to track)
  • what the developer wants to explicitly mark as unsafe

Description

memUnsafe should not have any impact by default. No friction should be added for wrapping library or writing code when memory safety isn't an issue.

memUnsafe therefore relies on memSafe to disables compilations of any memUnsafeeffects.

proc myUnsafeProc() {.memUnsafe.} =
  discard

# normalProc can use myUnsafeProc without issues or any additional notation. 
# Side effect propagate and normalProc is marked memUnsafe but it has no effect until memSafe notation is used
proc normalProc() =
  myUnsafeProc()

# Does not compile
proc mySafeProc() {.memSafe.} =
  normalProc()

To remove the memUnsafe effect, {.cast(memSafe).} can be used exactly like gcSafe.

Thus we can make the above example compile :

proc myUnsafeProc() {.memUnsafe.} =
  discard

# normalProc is not unsafe since myUnsafeProc is used in a {.cast(memSafe).} scope
proc normalProc() =
  {.cast(memSafe).}:
    myUnsafeProc()
  echo "the rest of the proc"

# No problem
proc mySafeProc() {.memSafe.} =
  normalProc()

This would be useful for any context where memory safety is required (embedded target without an MPU comes to mind).

@dom96
Copy link
Contributor

dom96 commented Jan 12, 2021

Yes! Finally we're bringing more of Nim's effect system into regular use, I love it.

@Araq
Copy link
Member

Araq commented Jan 12, 2021

when memory safety isn't an issue.

That's a poor way of phrasing it, it always is an issue. The problem is whether tools like memUnsafe help more than they hurt. And IMO they hurt more than they help. You simply don't find many bugs with this mechanism, the bugs we should focus on are:

  • Array indexing out of bounds.
  • Integer overflows.
  • Derefrencing nil pointers.

This would be useful for any context where memory safety is required (embedded target without an MPU comes to mind).

What does that mean, memory safety is always important, but what is safe it defined by the language, not wether your target has an MPU.

@juancarlospaco
Copy link
Contributor

How is importjs memory unsafe ?.

@Clonkk
Copy link
Author

Clonkk commented Jan 13, 2021

That's a poor way of phrasing it, it always is an issue.

Probably, but communication is hard.
The point is that there are situations where an unsafe memory access can have critical consequences.

The problem is whether tools like memUnsafe help more than they hurt. And IMO they hurt more than they help.

I agree that adding more friction for a situational needs is not a good solution. Which is why memUnsafe doesn't do anything unless specifically marked memSafe. It shouldn't break existing code or add more work to someone wrapping a library.

You simply don't find many bugs with this mechanism, the bugs we should focus on are:
Array indexing out of bounds.
Integer overflows.
Derefrencing nil pointers.

This is subjective. I use FFI & cast regularly but never had a case of integer overflow. The list of what should be marked unsafe can be expanded if it's relevant.
These are not mutually exclusive and I don't see why limit the tools at our disposal (again, given it does not add more friction).

what is safe it defined by the language

So let's define it.

not wether your target has an MPU.

The consequences of an unsafe memory access depends (among other things) on your OS and target.

How is importjs memory unsafe ?.

I added importjs and importobjc mostly for uniformization with importc and importcpp. (I only use c & c++ backend). Depending on how pointers are handled in JS, it may not be relevant to define it as unsafe.

@Araq
Copy link
Member

Araq commented Jan 13, 2021

Yes, but DrNim can detect "array index out of bounds" and can do everything that Nim's effect system can do. I encourage contributions to DrNim... Maybe we can discuss these things on IRC/gitter/whatever suits you.

@Clonkk
Copy link
Author

Clonkk commented Jan 13, 2021

I haven't looked in details at DrNim but safety by formal verification does sound interesting.
I usually use Matrix, is there a DrNim channel to join ?

@HugoP707
Copy link

Personally, I don’t feel like memUnsafe/safe can be very useful, I’d rather have other effects like “writes” (whose discussion has stalled a lot https://nim-lang.org/araq/writetracking.html)

@mratsim
Copy link
Collaborator

mratsim commented Jan 13, 2021

I assume this is missing raw memory management like alloc, alloc0, allocShared, create, createShared, c_malloc, free, freeShared, dealloc, deallocShared.

Also newSeqUninitialized, everything that creates an uninitialized ref and, ptr and ptr UncheckedArray.

I agree with Araq though, this is might be cheap in terms of semantics but formal verification is what you want in embedded. See my RFC #222 first part.

@Araq
Copy link
Member

Araq commented Jan 13, 2021

Personally, I don’t feel like memUnsafe/safe can be very useful, I’d rather have other effects like “writes” (whose discussion has stalled a lot https://nim-lang.org/araq/writetracking.html)

Er, no, it's shipping via --experimental:strictFuncs

@juancarlospaco
Copy link
Contributor

@Clonkk
Copy link
Author

Clonkk commented Jan 14, 2021

I assume this is missing raw memory management like alloc, alloc0, allocShared, create, createShared, c_malloc, free, freeShared, dealloc, deallocShared.

I thought about that, but I think it's more relevant to mark the act of accessing memory as unsafe rather than the allocation / deallocation. i.e. I don't think a dangling pointer that you never access is an issue.

Also newSeqUninitialized, everything that creates an uninitialized ref and, ptr and ptr UncheckedArray.

UncheckedArray completly agree - I didn't explictly wrote it because I assumed accessing an unchecked array was the same as derefenrencing a pointer.
A case can be made for uninitialized ref, and seq with -d:danger (or when not compileOption("boundChecks") whichever is more relevant).

this is might be cheap in terms of semantics but formal verification is what you want

This RFC extend the existing effect system with notation that already exists and is documented. It's purely a semantic check with familiar syntax. It's a simple solution, easy to understand that shouldn't be a burden in maintenance (at least I hope).

There is no doubt that formal verification is a better, more rigorous solution in the long-run.
I haven't looked at DrNim in details yet, but it's likely more complex and more expensive (there's also the new syntax / documentation cost but that's manageable I think) than an effect based solution.
It's hard to have a clear idea of the cost of expanding and stabilizing DrNim into a "production ready" tool (and maintenaning it) :).

@Araq
Copy link
Member

Araq commented Jan 14, 2021

Well at least DrNim is a separate tool, so the complexity is "elsewhere" and doesn't affect Nim with more experimental features. More importantly though: I claim the additions are not "worth it", so every time you lose productivity you need to measure it and every time you win productivity because it found a bug. Then in the end we would have an objective measure. It's hard to do but right now we don't even have anectodes about how it helps/hurts. Now, this argues for "let's add it and see how it turns out" but in the past we never removed any new effect from Nim, I tried to remove some but failed. There always are voices which want to keep every little thing that we added, esp if it "could find a bug one day".

@juancarlospaco
Copy link
Contributor

Imagine if DrNim is just ship like Nimpretty, it would make memUnsafe almost redundant,
with help from not nil, strictFuncs, etc.

@disruptek
Copy link

This seems pretty harmless to me. We can always deprecate/transform them out of code later, and any discussion of DrNim by comparison seems disingenuous at best -- DrNim has far less support (good luck compiling it) and is practically invisible as far as users are concerned. Are the annotations needed even documented? Are there examples?

Gate this behind yet another flag if you must; I'm sure @timotheecour would be happy to provide a PR to do so.

@Araq
Copy link
Member

Araq commented Jan 15, 2021

This seems pretty harmless to me.

What about:

proc myUnsafeProc() {.memUnsafe.} =
  discard

let indirect: proc () = myUnsafeProc # allowed?

proc mySafeProc() {.memSafe.} =
  indirect() # loophole here?

@disruptek
Copy link

Fair. So what you want for composition is

type
  SafeProc = concept c
    c is proc
    c.hasEffect not(memUnsafe)

@Clonkk
Copy link
Author

Clonkk commented Jan 18, 2021

Now, this argues for "let's add it and see how it turns out" but in the past we never removed any new effect from Nim, I tried to remove some but failed.

Merging code that is already planned to be removed is not a good practice, I agree.

any discussion of DrNim by comparison seems disingenuous at best -- DrNim has far less support (good luck compiling it) and is practically invisible as far as users are concerned. Are the annotations needed even documented? Are there examples?

From what I understand, DrNim should/will be able to deal with checking properties better than the current effect system can, so it's not absurd to compare it to an effect-based solution.

DrNim being heavily in develoment shouldn't be a reason to not invest in it. If it has been decided to move towards formal verification and that DrNim is part of Nim's future, then efforts spent improving it are worth considering.

Of course, remains the unanswered question of the timing : how long to implement this feature vs how long to stabilize DrNim ?
Then again, it doesn't seem this RFC has sparked the interest of many people, so there's no rush.

@snej
Copy link

snej commented Jan 19, 2021

@Araq:

The problem is whether tools like memUnsafe help more than they hurt. And IMO they hurt more than they help. You simply don't find many bugs with this mechanism

[Citation Needed] ;-) In Nim I pretty often find myself resorting to cast and unsafeAddr, and I've run into bugs I caused through using them. I get the impression other people use them too, since I see them used in workarounds for common problems like "how do I pass a seq[byte] to a parameter that takes a string?"

Yes, but DrNim can detect "array index out of bounds" and can do everything that Nim's effect system can do.

I'm interested in "unsafe" annotations less as a way to detect bugs, and more as a way of auditing modules I import and code I write, to try to constrain unsafe behaviors to limited areas.

@Araq
Copy link
Member

Araq commented Jan 20, 2021

In Nim I pretty often find myself resorting to cast and unsafeAddr, and I've run into bugs I caused through using them.

And you would run into less bugs if only they were inside an unsafe code section? Come on, I know you don't believe that.

I'm interested in "unsafe" annotations less as a way to detect bugs, and more as a way of auditing modules I import and code I write, to try to constrain unsafe behaviors to limited areas.

When you audit code you can search for addr|cast|unsafeAddr much like you can look for unsafe. So constrain unsafe code to the procs that need it. I really dislike the idea of making Nim more complex just because you feel the need for more punishment for using the already ugly and verbose unsafe constructs.

An alternative proposal would be a tag for not allowing direct calls into importc'ed functions. That would be much more useful (IMHO) and would actually address the valid point you raised on the forum.

@mratsim
Copy link
Collaborator

mratsim commented Jan 20, 2021

[Citation Needed] ;-) In Nim I pretty often find myself resorting to cast and unsafeAddr, and I've run into bugs I caused through using them. I get the impression other people use them too, since I see them used in workarounds for common problems like "how do I pass a seq[byte] to a parameter that takes a string?"

You certainly shouldn't cast or you would have a string that doesn't end with \0 and it will break in a strange manner in a C library status-im/nim-http-utils#9

The short-term solution is to provide a conversion proc in sugar and the long-term solution is for all libraries that work on blobs to standardize on openarray[byte].

@timotheecour
Copy link
Member

timotheecour commented Jan 20, 2021

adding {.memSafe.}, {.memUnsafe.} would actually add a lot of complexity.

Here are a few problems: under this RFC:

  • every forward declaration needs to decide whether it's {.memSafe.}, {.memUnsafe.}
  • every importc proc and magic needs to be marked as {.memSafe.}, {.memUnsafe.}
  • it increases risks of breaking changes: if a proc is marked as {.memSafe.}, changing it to {.memUnsafe.} (eg because doing so enables better performance) would be a breaking change
  • it's unclear how you can handle callabacks:
proc fn1() {.memSafe.} = discard
proc fn2() {.memUnsafe.} = discard
proc bar(fn: proc()) = fn()
  • if bar is inferred as {.memSafe.}, it prevents calling it with fn2, which can prevent valid use cases (eg if bar represents sorting and fn represents a cmp), or if the annotation is added, it introduces a breaking change (previously valid code bar(fn2) would stop working)
  • if bar is inferred as {.memUnsafe.}, it prevents using it inside {.memSafe.} code even if it's only called with {.memSafe.} callbacks such as fn1.

So this forces you to have some kind of genericity in {.memSafe.} vs {.memUnsafe.}, but that has its own issues, eg for fwd declarations:

proc bar[Fn](fn: Fn) # should you annotate as {.memSafe.} or {.memUnsafe.}?
proc fn1() {.memSafe.} = discard
proc fn2() {.memUnsafe.} = discard
proc bar[Fn](fn: Fn) = fn()

and heuristics such as "if any argument is {.memUnsafe.}, mark it as {.memUnsafe.}" just don't cut it (too pessimistic).

note that some of those problems affect other effects too, but the problem is exacerbated with memory safety. In all likelihood, the benefit you'd get would be limited (I doubt this would help catch bugs that couldn't be caught otherwise), especially in comparison with the costs involved.

precedent: see D's analogous annotations @safe, @trusted, @System (https://dlang.org/spec/memory-safe-d.html#:~:text=Memory%20Safety%20for%20a%20program,never%20result%20in%20memory%20corruption.) which make the language significantly more complex for IMO limited benefit.

@mratsim
Copy link
Collaborator

mratsim commented Jan 21, 2021

That's function coloring. It is justified in many case because it helps read the codebase or ensure some degree of safety but too much coloring and you fall in http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/.

That said, I think coloring for async/await or {.gcsafe.} or even {.raises: [Defect].} is absolutely worth it especially from a maintenance point of view in large codebase (https://lukasa.co.uk/2016/07/The_Function_Colour_Myth/).

Having a type system is a form of coloring your functions already. We need a balance between convenience and overkill.

@Araq
Copy link
Member

Araq commented Feb 23, 2021

From https://arxiv.org/pdf/2007.00752.pdf

Across our dataset, we find that a majority of crates include functions which
are possibly unsafe. We also find that unsafe function calls are the
most common use of unsafeness, and the unsafeness is through
library dependencies rather than through the use of the keyword
unsafe. Perhaps nonintuitively, we find that the most downloaded
crates have more unsafe code than other crates. From these results,
it is difficult for users to know if their code is safe, and thus we
present recommendations for helping users understand when they
are using Unsafe Rust in their software.

Which supports my view, fwiw, not even in Rust it's easy to tell whether you used something "unsafe". Whatever that even means, of course eventually you use loads and stores into memory...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants