Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdlib concurrency-safety documentation, part three: arrays #11237

Merged

Conversation

Octachron
Copy link
Member

This PRs adds a section about concurrency-safety to the documentation of all four kind of mutable arrays in the standard library: arrays, byte sequences, bigarrays and Float.Array.t.

All sections follows the same uniform template with some optional subsections.

  • Arrays are safe in OCaml even in presence of data races
  • (bigarray only) Arrays are not safe in C in presence of data races
  • Data races should be avoided as much as possible
  • In case of data races, beware of:
    • non-linearizability
    • (float arrays and bigarrays only) tearing
    • (byte sequences only) mixed-size access are unspecified

with a concrete example for each case in the "beware of data races" subsection.

@gadmm
Copy link
Contributor

gadmm commented May 4, 2022

Is it not premature to document safety in the presence of data races for code that relies on bits written in C, in light of #10992 ? For instance, lots of array operations are written in C.

@kayceesrk
Copy link
Contributor

@gadmm for context, memory model and array operations was discussed in ocaml-multicore/ocaml-multicore#822.

Do you have a specific operation that is unsafe? We can focus on getting that operation right.

@xavierleroy
Copy link
Contributor

What definition of "safe" are we using?

If it's "a read returns a value of the expected type", then yes, all reads from strings, byte arrays, flat float arrays or bigarrays are safe, just because any bit vector of the appropriate size is an appropriate value of integer or floating-point type.

If it's "a read doesn't return a secret value that the program never got access to", then out-of-thin-air reads could break safety. I guess this is what @gadmm has in mind.

My reaction is that we will rule out OOTA accesses in the OCaml runtime system, by putting appropriate atomic / volatile annotations as in #10992 and/or by asking the compiler to refrain from silly optimizations (like we do for signed arithmetic overflow with the -fwrapv option or for type-based nonaliasing with the -fno-strict-aliasing option).

@gadmm
Copy link
Contributor

gadmm commented May 4, 2022

I am referring to parts of the stdlib written in C, for instance runtime/array.c which has a few occurrences of Field as an r-value. More generally, the point of #10992 beyond Field usage is that you first need to define what is the common memory model of C and OCaml before you can start saying something about the runtime/*.c parts for the stdlib. I am aware of past discussions about memory-safety for arrays, it is a different source of unsafety not specific to arrays, and I also do not think this has to do with OOTA (unless I misunderstood what you allude to).

If the latest proposal over there is accepted then there is a solution for the uses of Field in this file (and some work ahead to audit the stdlib). But at the moment it is just a proposal.

If the question is "let's write the doc according to how we hope OCaml 5.0 will be when it gets out" and if you think this will be solved before release, then yeah you can ignore this discussion. My current reaction is more about "whens" than "ifs", since on the one hand I do not know when you expect to release 5.0 and on the other hand the progress on the other issue seems slow.

@Octachron
Copy link
Member Author

My personal policy is that writing documentation draft is never too premature. It is both a question of task parallelization, and a way to have a good place to discuss an user-readable specification for OCaml 5 arrays.

I was using safe in this draft as meaning will not crash the rest of the program.
I think that for the standard library documentation, it is important to emphasize that data races should be avoided. The description of behaviors in presence of data races is for a good part here to give more weight to this advice.

@dra27 dra27 added this to the 5.0.0 milestone Jun 6, 2022
@kayceesrk kayceesrk self-assigned this Oct 25, 2022
Copy link
Contributor

@fpottier fpottier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, here is some feedback on this PR.

stdlib/array.mli Outdated Show resolved Hide resolved
stdlib/array.mli Outdated
considerably complicate the possible behaviors of programs.

A general caveat in presence of data races is that concurrent array
operations might be interleaved leading to results that cannot be obtained
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "concurrent array operations might be interleaved" is not clear. The problem is that "many operations on arrays are not atomic", which means that they are implemented by sequences of more basic operations, which (when executed by multiple concurrent threads) can be interleaved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we document the fact that Array.get and Array.set are regarded as non-atomic read and write events in the sense of the memory model, just like reading or writing a non-atomic reference? Or is this not true? I would imagine that it is true except perhaps for float arrays.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I would propose to extend the description to:

A general caveat in the presence of data races is that most functions on
arrays are not atomic. They consist in sequence of basic operations that
might be interleaved when executed in parallel on multiple domains. This
interleaving of basic operations can produce results which are impossible
to obtain with a sequential execution of the higher-level functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I would suggest also documenting which functions are atomic, unless this is already done somewhere else.

stdlib/array.mli Outdated Show resolved Hide resolved
stdlib/bigarray.mli Outdated Show resolved Hide resolved
stdlib/bigarray.mli Outdated Show resolved Hide resolved
stdlib/bigarray.mli Outdated Show resolved Hide resolved
stdlib/bigarray.mli Outdated Show resolved Hide resolved
stdlib/bigarray.mli Outdated Show resolved Hide resolved
stdlib/bigarray.mli Outdated Show resolved Hide resolved
@kayceesrk
Copy link
Contributor

kayceesrk commented Nov 2, 2022

The current text is a very good start. I feel like the text could be improved a bit. Here is my attempt at improvement:

(** {1:array_concurrency Arrays and concurrency safety}

    Care must be taken when concurrently accessing arrays from multiple domains. 

    {2:array_atomicity Atomicity}

    Any array operation that accesses more than one element is not guaranteed to
    be atomic in the presence of concurrent accesses. This includes iteration,
    scanning, sorting, splitting and combining arrays.

    For example, consider the following program:
{[let size = 100_000_000
let a = Array.make size 1
let d1 = Domain.spawn (fun () -> Array.iteri (fun i x -> a.(i) <- x + 1) a)
let d2 = Domain.spawn (fun () -> Array.iteri (fun i x -> a.(i) <- 2 * x + 1) a)
let () = Domain.join d1; Domain.join d2
]}

    After executing this code, each field of the array [a] is either [2], [3],
    [4] or [5]. If atomicity is required, then the user must implement their own
    synchronization (for example, using {!Mutex.t}).

    {2:array_data_race Data races}

    If two domains only access disjoint parts of the array, then the
    observed behaviour is the equivalent to some sequential interleaving of the
    operations from the two domains.

    If two domains access the same element of the array, then it is recommended
    that there is synchronization to mediate the accesses. A data race is said
    to occur when two domains access the same array element without
    synchronization and at least one of the accesses is a write.

    In the absence of data races, the observed behaviour is equivalent to some
    sequential interleaving of the operations from different domains. In the
    presence of data races, the observed behaviour may not be equivalent to any
    sequential interleaving of operations from different domains. However, even
    in the presence of data races, a read operation will return the value of
    some prior write to that location (with a few exceptions for float arrays).
    Importantly, programs with data races do not crash.

    {2:array_data_race_exceptions Float arrays}

    Float arrays have two supplementary caveats in the presence of data races.

    First, the blit operation might copy an array byte-by-byte. Data races
    between such a blit operation and another operation might produce surprising
    values due to tearing: partial writes interleaved with other operations can
    create float values that would not exist with a sequential execution.

    For instance, at the end of
 {[let zeros = Array.make size 0.
let max_floats = Array.make size Float.max_float
let res = Array.copy zeros
let d1 = Domain.spawn (fun () -> Array.blit zeros 0 res 0 size)
let d2 = Domain.spawn (fun () -> Array.blit max_floats 0 res 0 size)
let () = Domain.join d1; Domain.join d2
]}
    the [res] array might contain values that are neither [0.] nor [max_float].

    Second, on 32-bit architectures, getting or setting a field involves two
    separate memory accesses. In the presence of data races, the user may
    observe tearning on any operations.
*)

Copy link
Contributor

@kayceesrk kayceesrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be useful to do another iteration of the documentation. Would be happy to review that quickly once it is done.

@Octachron
Copy link
Member Author

I have pushed your version of the text to be able to comment on it more easily.

stdlib/array.mli Outdated
@@ -320,6 +320,71 @@ val of_seq : 'a Seq.t -> 'a array
(** Create an array from the generator
@since 4.07 *)

(** {1:array_concurrency Arrays and concurrency safety}

Care must be taken when concurrently accessing arrays from multiple domains.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This warning ought to be explained in the same paragraph, since documentation is often only scanned. I would add a short take-out message:

Care must be taken when concurrently accessing arrays from multiple domains:
accessing an array will never crash a program, but unsynchronized accesses might
yield surprising results. 

Copy link
Contributor

@kayceesrk kayceesrk Nov 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Care must be taken when concurrently accessing arrays from multiple domains: unsynchronized access to an array will never lead to crashes but may yield surprising results.

I wonder whether it is better to use the appropriate term here -- non-sequentially-consistent -- instead of beating around the bush and calling it surprising. Sequential consistency is well understood in concurrency literature and it is not a bad thing if our users also use the terms correctly. To do this, we will need to define sequential consistency. I am not holding to this too tightly. Hence, feel free to ignore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Giving more context to the vague "surprising" is better but I would keep that point as a footnote:

but may yield surprising (non-sequentially-consistent) results.

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

stdlib/array.mli Outdated
{2:array_atomicity Atomicity}

Any array operation that accesses more than one element is not guaranteed to
be atomic in the presence of concurrent accesses. This includes iteration,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all functions in the Array module are not atomic, I would simplify the sentence to:

All functions in the `Array` module that access more than one element are not atomic in the presence of unsynchronized accesses.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that "all ... are not" is not idiomatic. How about "Every array operation that accesses more than one array element is not atomic...".

Or, alternatively, give an explicit list of the operations that are atomic, and indicate that every function that does not appear in this list is not atomic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all functions in the Array module are not atomic, I would simplify the sentence to:

I think we have to be careful that there are two notions of atomicity here.

  1. Atomic memory locations in the memory model sense.
  2. Atomic operations which combine multiple non-atomic operations together (using say Mutex.t).

The phrase "All functions in the Array module are not atomic" is true, but Array.get is not atomic in the sense of (1) whereas Array.iter is not atomic in the sense of (2). Here, I am talking about (2).

If we just want to rephrase the sentence, then we should go with @fpottier's suggestion

Every array operation that accesses more than one array element is not atomic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me.

stdlib/array.mli Outdated
sequential interleaving of operations from different domains. However, even
in the presence of data races, a read operation will return the value of
some prior write to that location (with a few exceptions for float arrays).
Importantly, programs with data races do not crash.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the impression that there are too much back-and-forth in the three paragraphs. The structure is roughly:

  • Disjoint accesses are linearizable
  • Non-disjoint accesses should be mediated by synchronization
  • Definition of data races
  • Data-race free accesses are still linearizable
  • With data races, behavior are not linearizable but there are no values out-of-thin-air and no crashes

I would propose to have the description in increasing order of unsafety

  • Disjoint accesses are linearizable
  • Definition of data races
  • Data-race free accesses are still linearizable
  • Data races should really be avoided by using synchronization
  • With data races, behavior are not linearizable but there are no crashes and no values out-of-thin-air
If two domains only access disjoint parts of the array, then the
observed behaviour is the equivalent to some sequential interleaving of the
operations from the two domains.

A data race is said to occur when two domains access the same array element without synchronization
and at least one of the accesses is a write.
In the absence of data races, the observed behaviour is equivalent to some
sequential interleaving of the operations from different domains. 

Whenever possible, data races should be avoided by using synchronization to mediate the accesses to the array elements.
 
Indeed, in the presence of data races, programs will not crash but the observed behaviour may not be equivalent to any
sequential interleaving of operations from different domains. Nevertheless, even
in the presence of data races, a read operation will return the value of some prior write to that location
(with a few exceptions for float arrays).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

@Octachron
Copy link
Member Author

@kayceesrk , I have updated the text for all array types (with some minor variations for bytes) if you want to review it once again. (I think that it is fine for this part of the documentation to be merged before the first rc and not in the upcoming beta2).

stdlib/array.mli Outdated Show resolved Hide resolved
For example, consider the following program:
{[let size = 100_000_000
let a = Array.make size 1
let d1 = Domain.spawn (fun () -> Array.iteri (fun i x -> a.(i) <- x + 1) a)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example has to be updated to use ArrayLabels instead of Array.

create float values that would not exist with a sequential execution.

For instance, at the end of
{[let zeros = Array.make size 0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example has to be updated to use ArrayLabels instead of Array.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This required to update the syncing script between the two code. I am slightly worried that my changes are brittle. But let's see.

stdlib/float.mli Outdated Show resolved Hide resolved
@Octachron Octachron force-pushed the array_and_concurrency_documentation branch from a2ec82b to b51e53f Compare December 1, 2022 10:20
@Octachron
Copy link
Member Author

@kayceesrk the PR is still in the change requested mode, are you planning to do more review passes?

Copy link
Contributor

@kayceesrk kayceesrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for all the work!

@Octachron Octachron merged commit 693e9c4 into ocaml:trunk Dec 5, 2022
Octachron added a commit that referenced this pull request Dec 5, 2022
…ntation

stdlib concurrency-safety documentation, part three: arrays

(cherry picked from commit 693e9c4)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants