Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: label-break-value #2046

Merged
merged 7 commits into from
Feb 27, 2018
Merged

Conversation

ciphergoth
Copy link
Contributor

@ciphergoth ciphergoth commented Jun 26, 2017

Rendered

Tracking issue: rust-lang/rust#48594

Allow a break not only out of loop, but of labelled blocks with no loop. Like loop, this break can carry a value.

See also Pre-RFC discussion.

Status as of 2018-01-18.

@ciphergoth ciphergoth changed the title label-break-value RFC RFC: label-break-value Jun 26, 2017
@mark-i-m
Copy link
Member

This feels to much like a goto. I don't really want Rust == Fortran 😛

@scottmcm
Copy link
Member

@mark-i-m Is there something particular about this proposal that makes it feel too goto-like?

This seems to have the same restrictions as the labeled break we already have, as well as to normal break and return: cannot go backwards, cannot skip variable definitions, can only exit a syntactically-visible {} block. (Or, more abstractly, that it doesn't violate the "progress of the process remains characterized by a single textual index" property from Go To Considered Harmful.)

@eddyb
Copy link
Member

eddyb commented Jun 27, 2017

The problems with goto are more or less those of safety. Rust doesn't have the same issues.
Besides you can already do this with loop { break {...} }, the difference is purely in ergonomics.

@mark-i-m
Copy link
Member

that it doesn't violate the "progress of the process remains characterized by a single textual index" property from Go To Considered Harmful

This is a fair point, but I feel more strongly, I guess.

I personally don't like the labeled breaks we already have. IMHO, they usually just make control flow harder to follow/write/debug without really providing any benefit, which is the heart of Dijkstra's point. In fact, I would go so far as to say, that I generally dislike normal break/continue, too, but I accept them for lack of a better alternative. The way I see it, the more entry/exit points you have for a loop, the worse code quality is -- it's just becomes convoluted to reason about loop invariants.

@mark-i-m
Copy link
Member

I should add that by breaking out of the middle of a block, you have effictively made it into 2 blocks, one of which doesn't always happen. And that is not always syntactically obvious, IMHO...

@ciphergoth
Copy link
Contributor Author

Full text of Go To Statement Considered Harmful - very interesting!

@ciphergoth
Copy link
Contributor Author

I filed a Pre-RFC about this, see also discussion there.

 link to discussions in discussions instead.
@ciphergoth
Copy link
Contributor Author

Other discussions: I proposed this here. An identical proposal was part of the explanation for trait based exception handling.

@golddranks
Copy link

golddranks commented Jun 27, 2017

I think that making control flow more flexible is generally a good thing. As already argued by others Dijkstra's critique doesn't apply here; the critique is against obfuscating the program state using surprising control paths. In the context of this feature it's only allowed to break outwards from a scope, which doesn't allow witnessing uninitialised variables or skipping in the middle of some code that expects there to be a state set up by the earlier code. It does allow skipping some code in a possibly "surprising way" the sense that one can jump from an inner scope to the grandparent scope, but unlike exceptions, this is still a local feature that is well visible in the local context – so in the end, it hardly isn't actually surprising, and when used with discretion, can lead to cleaner code.

I'd argue, that without flexible and safe control flow constructs, people tend to store the "which path" information to local flags or use inner functions to be able to return early. These both feel like hacks to me. The problem with manually juggling control flow flags is that the compiler can't hardly reason about their state and the problem with functions is that they are a wrong abstraction – they come with a new stack frame and aren't easily able to access the parent frame unless the state is explicitly passed to them. They are too heavyweight. Having labelled breaks is a nice way to retain the "state machine-y" feeling of a function-local control flow but still allow more flexible flow that comes handy from time to time.

@eddyb
Copy link
Member

eddyb commented Jun 27, 2017

@ciphergoth In fact my first instinct for desugaring catch is that there's a label on it (or at least the compiler has a way to refer to it) and ? uses break 'innermost_catch err instead of return err.
I haven't looked much into how it ended up being implemented but AFAIK it's close enough.
The advantage of a proposal like this is that you can go to one of many labels instead of just one.

@ciphergoth
Copy link
Contributor Author

ciphergoth commented Jun 27, 2017

@eddyb The RFC that proposes catch invents pretty much exactly what I propose here in order to describe what it does. It also describes return in terms of a break to a special 'fn scope for the whole function.

In the pre-RFC discussion, nikomatsakis says:

The compiler already internally supports [labelled blocks] for use with catch { } (that is how the HIR represents catch).

@withoutboats withoutboats added the T-lang Relevant to the language team, which will review and decide on the RFC. label Jun 27, 2017
@mark-i-m
Copy link
Member

@golddranks

As already argued by others Dijkstra's critique doesn't apply here; the critique is against obfuscating the program state using surprising control paths.

Perhaps I will always just disagree on this... I suspect I am probably more extreme than most on this point. It looks like I am pretty outnumbered here, so I wont spam everyone more beyond this post, unless asked for more 😛

Basically, I can't imagine many useful situations where this is easier to follow

'block: {
    do_thing();
    if condition_not_met() {
        break 'block;
    }
    do_next_thing();
    if condition_not_met() {
        break 'block;
    }
    do_last_thing();
}

than this

do_thing();

if condition_met() {
    do_next_thing();
    if condition_met() {
        do_last_thing();
    }
}

In the first example it's not clear that the preconditions for do_last_thing are the conditions_met1() && conditions_met2(). It is also a bit annoying that syntactically non-obvious prefixes of a block might execute.

But in the second one, the curly braces (and formatting conventions) make it clear, which is what we expect because curly braces are the primary way rust indicates a block of code. Of course someone will argue that if you have 50 of them, you will have too much indenting. I thinks it's worth it, but I think that's really a matter of taste.

@est31
Copy link
Member

est31 commented Jun 28, 2017

@mark-i-m In my use case, I need to conditionally break from inside a deeply nested structure, the eno! invocation is 4 layers inside structure, with loops and if's outside. This can't just be simply refactored to use if.

Also, I especially like the pattern to break early if some condition is not met, so that there is no big rightward drift.

@mark-i-m
Copy link
Member

@est31 I'll take your word for it that it is hard to refactor (I haven't tried). I guess I can see the use case, but I still don't really like the break as a pattern... although, I don't have an alternative, other than major refactoring...

@scottmcm
Copy link
Member

@mark-i-m I actually agree with your for that example. Overall, though, I suspect that people won't reach for this except in cases where if and match are awkward too, so it doesn't scare me to have it.

The example I found quite compelling was this one:

let result = 'block: {
    for &v in first_container.iter() {
        if v > 0 { break 'block v; }
    }
    for &v in second_container.iter() {
        if v < 0 { break 'block v; }
    }
    0
}

Because a simple translation to normal constructs ends up something like this:

let mut result = None;
for &v in first_container.iter() {
    if v > 0 {
        result = Some(v);
        break;
    }
}
if result.is_none() {
    for &v in second_container.iter() {
        if v < 0 {
            result = Some(v);
            break;
        }
    }
}
let result = result.unwrap_or(0);

Which I find more awkward, as it obscures the symmetry and has the compiler less able to help with the initialization logic. (You could also do this one with the {||{ … }}() "operator" and return, but I'm not a fan of immediately-called-closure as a pattern either.)

@ciphergoth
Copy link
Contributor Author

Should I be changing the examples in the RFC to reflect discussion here?

@mark-i-m
Copy link
Member

I think it would be useful to include some of the motivating examples. And I would also like to see the disadvantages section updated, with some of the objections, even if the language is not strongly worded...

Various other small improvements
@ciphergoth
Copy link
Contributor Author

ciphergoth commented Jul 1, 2017

For the specific example given you could also do this:

first_container.iter().filter(|&&v| v > 0).chain(
        second_container.iter().filter(|&&v| v < 0)
    ).map(|&v| v).next().unwrap_or(0);

@mark-i-m
Copy link
Member

mark-i-m commented Jul 1, 2017 via email

@Ericson2314
Copy link
Contributor

Big fan of this; thanks for writing!

One nit is while the desugaring is correct, I don't think in long term we should implement/teach/think about the feature that way. I rather have:

  1. Break on block is fundamental

  2. Breaking out of loop is no different than breaking out of underlying block

  3. Break with no label is desugared to innermost loop instead of innermost block for historical reasons.

RFC language can be incremental I suppose, but o hope something like that makes the books.

@ciphergoth
Copy link
Contributor Author

Ericson2314 - how do we integrate continue into that teaching plan?

Maybe we could describe loops as being implicitly this:

'outer: {
    for i in container.iter() {
        'inner: {
            LOOP BODY
        }
    }
}

Then break means break 'outer and continue means break 'inner. continue 'label really means something like continue 'label_inner.

@mark-i-m
Copy link
Member

mark-i-m commented Mar 2, 2018

I think the question is inherently a bit subjective... Does this encourage less-clear-than-it-could-be control flow? (as opposed to an objective question: does this fundamentally introduce new control flow?)

Anyway, I think that the best thing to do now would be to focus on making the feature as good as it can be 🥇

@SoniEx2
Copy link

SoniEx2 commented Mar 2, 2018

Does this encourage less-clear-than-it-could-be control flow?

I mean, it allows replacing:

loop {
/* ... */
if cond { break val; }
/* ... */
break otherval;
}

with:

'block: {
/* ... */
if cond { break /*'block?*/ val; }
/* ... */
val
}

So I mean, if you find the former clearer... ;)

@Centril Centril mentioned this pull request Apr 21, 2018
@Centril Centril mentioned this pull request May 3, 2018
@SoniEx2 SoniEx2 mentioned this pull request May 4, 2018
@dtolnay dtolnay mentioned this pull request Sep 2, 2018
@Centril Centril added A-syntax Syntax related proposals & ideas A-control-flow Proposals relating to control flow. A-expressions Term language related proposals & ideas labels Nov 23, 2018
@ciphergoth ciphergoth deleted the label-break-value branch February 28, 2019 07:00
@derekdreery
Copy link

I'm a big fan of this idea for a number of reasons

  1. People are already doing this in hacky ways (loops that only ever run once etc.). I think this proves the need exists
  2. It's not goto. You can't jump arbitrarily around. goto is sometimes used like this in C, but IMO that's what it should be used for.
  3. It simplifies the language rather than complicating it. This is because it unifies concepts of early return and early ending of loops, so conceptually you can think of them as the same. I always like features that make the language simpler. (I accept this is partially subjective, hopefully I've put forward a convincing argument)

@Kimundi
Copy link
Member

Kimundi commented Jun 16, 2019

I also recently had to use the loop hack, so I'd really like to see this feature in Rust.

@ijackson
Copy link

Because the target of the break is ambiguous, code like the following will produce an error at compile time:

loop {
    'labelled_block: {
        if condition() {
            break;
        }
    }
}

I would strongly prefer this to be permitted, and resolved by only binding labelled breaks to labelled blocks.

Something like this is needed for macros which want to adjust the control flow. I came across one in if_chain. The loop hack can't be used there, because it's not possible to rethrow an inner break. The break can be caught, giving a value (which is possibly ()). But it can't be rethrown because the macro doesn't know whether it's in a for (where break () is forbidden) or a loop returning a non-unit (where break without a value is forbidden).

@ciphergoth
Copy link
Contributor Author

What you want can still be written as

'loop: loop {
    'labelled_block: {
        if condition() {
            break 'loop;
        }
    }
}

Would that meet your needs?

@ijackson
Copy link

Would that meet your needs?

No, sadly not. The if_chain macro is a good example here, but there are other similar situations.

Currently

  if_chain! {
    if let Some(a) = get_a();
    if let Some(b) = get_b();
    then { Left((a, b)) }
    else { Right(42) }
  }

expands to (roughly)

  if let Some(a) = get(a) {
    if let Some(b) = get_b() {
      Left((a, b))
    } else {
      Right(42)
    }
  } else {
    Right(42)
  }

This is bad because it repeats the else clause and, worse, the different copies of it can pick up differnet variable bindings (imagine if there were an a and b in the environment and we said Right((a, b))! I want to expand to something like this:

  'if_chain_outer: {
    'if_chain_else: {
      if let Some(a) = get(a) {
	if let Some(b) = get_b() {
	  break 'if_chain_outer Left((a, b));
	} else {
	  break 'if_chain_else;
      } else {
	break 'if_chain_else;
      };
      Right(42)
    }
  }

But this does not work correctly if the macro user writes break. Eg,

  for x in stuff {
    if_chain!{
      let spong = {
        if complicated { break }

According to the current RFC, this break is "ambiguous" and must be rejected. Of course it's not ambiguous in the user's code. The macro just needs that break to bypass all the named blocks.

I don't think having it bypass the named blocks is at all strange or confusing. Perl's named blocks work the same way.

This limitation is a real shame becaue there are a lot of cool things that can easily be done with macros that are able to define their own control flow like this, but they're lacking this primitive. You can't use the loop trick either for similar reasons to do with break.

@SoniEx2
Copy link

SoniEx2 commented Mar 23, 2021

This limitation would force user crate to break explicitly. How is that not a good thing? If anything this limitation can be used by macros to prevent user break from conflicting with macro break!

@ijackson
Copy link

(You might think that the macro could "catch" the user's break by surrounding the user's code with a loop, and then rethrow it. But the macro doesn't know if it is inside a loop, and if so whether that's a for loop where break () is forbidden or a loop loop with a value return where value-less break is forbidden, so it doesn't know whether to emit the rethrowing code.)

@SoniEx2
Copy link

SoniEx2 commented Mar 23, 2021

Do you want hygienic break?

@ijackson
Copy link

This limitation would force user crate to break explicitly. How is that not a good thing?

I'm not sure what you mean. The if_chain macro doesn't have anything to do with loops and the user doesn't expect it to "steal" break. In the user's code break is quite clearly referencing the loop, just as it would be if the if were an if rather than an if_chain!.

If anything this limitation can be used by macros to prevent user break from conflicting with macro break!

Err, if the macro is supposed by its function to embed the user's code in a loop, then the user's break ought to refer to the macro's loop. Presumably the macro would provide the user with a way to name the macro-defined loop.

Do you want hygienic break?

Essentially, yes. In fact the named break is already hygienic, I think. The labels are identifiers and I assume they are already hygienic so that if the macro and the user use the same name, it won't be treated as the same identifier.

The only difficulty is that the "unnamed label" is not hygienic because it inserts itself unconditionally.

@SoniEx2
Copy link

SoniEx2 commented Mar 23, 2021

Well, luckily wrapping the user code in a label-break-value causes it to error instead of miscompiling, thus alerting the user of how to fix it. That's a win, yeah?

You don't make the unnamed label hygienic. You introduce a mechanism that can invalidate the unnamed label. That mechanism is called label-break-value.

@alercah
Copy link
Contributor

alercah commented Mar 23, 2021

I think it might be worth arguing that break () ought to be permitted in for and while.

There's definitely also an argument for hygiene of unlabeled break.

This RFC is not really the right place to argue either, though, especially not in the now-closed RFC thread. This conversation should be on ILRO or Zulip.

@scottmcm
Copy link
Member

Looks like this predates the "Prior Art" section, but nitroll mentioned that emacs lisp has this, even using 'foo syntax: https://www.gnu.org/software/emacs/manual/html_node/elisp/Catch-and-Throw.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-control-flow Proposals relating to control flow. A-expressions Term language related proposals & ideas A-syntax Syntax related proposals & ideas final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.