Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Spec Glossary #1537

Closed
wants to merge 4 commits into from
Closed

Conversation

chorman0773
Copy link
Contributor

This adds some of the terms I used in #1523 to a separate top level section of the glossary.

Not too picky about changes to the formatting, I just wanted to throw something together to get T-spec eyes on it.

@rustbot label +T-spec +A-glossary

@chorman0773 chorman0773 self-assigned this Jul 18, 2024
@rustbot rustbot added A-glossary Area: glossary T-spec Team: spec labels Jul 18, 2024
@chorman0773
Copy link
Contributor Author

Based on some discussion in RPLCS, I'm likely going to change the term "ill-formed" here and in #1523 to be "fails to compile", but leave "shall" untouched. Not sure what do do about "Ill-formed - No Diagnostic Required" yet, but I'll make that change as well.

@@ -105,6 +129,7 @@ are also considered local. Fundamental type constructors cannot [cover](#uncover
Any time the term "covered type" is used,
the `T` in `&T`, `&mut T`, `Box<T>`, and `Pin<T>` is not considered covered.


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a stray newline?

@@ -1,5 +1,29 @@
# Glossary

## Terms used in the Specification
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you say more about why this would be separated?

I think that generally we shouldn't need to separate this out. I don't think there should be terms here that aren't used in the specification (unless they are aliases for ones that are).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm back and forth on this.

The main thing I would put in the section would be prescriptive terms, rather than semantic terms (IE. terms used to express the requirements of the spec, not terms that refer to a construct). IE. a term like "function" wouldn't go in this section, but would go into the rest of the glossary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be something to discuss tomorrow.

src/glossary.md Outdated
@@ -1,5 +1,29 @@
# Glossary

## Terms used in the Specification

### Ill-formed Program
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
### Ill-formed Program
### Ill-formed program

Headings should be in sentence case.

src/glossary.md Outdated

### Ill-formed Program

A Rust Program is ill-formed if it violates a statically set constraint placed on it by the language.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why some of these words are being capitalized. Also, I think this can be reworded to simplify a little.

Suggested change
A Rust Program is ill-formed if it violates a statically set constraint placed on it by the language.
A Rust program is ill-formed if it violates a static constraint of the language.

I'm also trying to figure out, what is a "static constraint"? How would this contrast with a "dynamic constraint"? Would be possible to state it as "violates a rule of the language"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's to separate out from dynamic constraints that lead to undefined behaviour (such as dereferencing a null pointer).

src/glossary.md Outdated
### Ill-formed Program

A Rust Program is ill-formed if it violates a statically set constraint placed on it by the language.
Unless otherwise statement explicitly an implementation shall issue an appropriate diagnostic for an ill-formed program and shall not translate such a program.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some awkward wording here.

Suggested change
Unless otherwise statement explicitly an implementation shall issue an appropriate diagnostic for an ill-formed program and shall not translate such a program.
Unless otherwise stated explicitly, an implementation shall issue an appropriate diagnostic for an ill-formed program and shall not translate such a program.

However, as an alternative, can this be stated as:

Suggested change
Unless otherwise statement explicitly an implementation shall issue an appropriate diagnostic for an ill-formed program and shall not translate such a program.
Unless otherwise stated, it is an error to have an ill-formed program.

Although I don't know when we would ever state that it is not an error. Can you give an example? Is this referring to things like undefined-behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See "No Diagnostic Required"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the proposed wording is a bit ambiguous here - you could argue it's always an error to have such a program (even if the constraint the program violated is one for which "no diagnostic is required").

src/glossary.md Outdated
A Rust Program is ill-formed if it violates a statically set constraint placed on it by the language.
Unless otherwise statement explicitly an implementation shall issue an appropriate diagnostic for an ill-formed program and shall not translate such a program.

#### No Diagnostic Required
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm still struggling with understanding what this means.

Is this trying to say that some rules may result in a warning, but otherwise isn't enforced?

If so, I don't think we have yet agreed on an approach for how to document warnings. In general, we have avoided all warnings in the reference. There are a few exceptions when it is directly relevant to the language (like the diagnostic chapter), or when we have felt the warning is very important (like unsafe_op_in_unsafe_fn, though that is arguable if it is appropriate.

Copy link
Contributor Author

@chorman0773 chorman0773 Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lints are separate from this.
The compiler can issue a hard error (fail to compile), warn (lint, fcw, etc.), or silently accept it. This isn't for things that you'd lint on, though, this is particularily for things may horribly break the program, but can't be easily (or at all) detected and diagnosed. For example, asm!(".section .data");, which can cause the rest of the function (and possibly even entire other functions) to be emitted in the .data section, far far away from where the code is expected to be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm still not understanding what this means. I don't see why there would be rules that we want to say "this may or may not compile".

I'm trying to understand, is this related to undefined behavior?

I see in the assembly chapter, asm.invocation.prefix-instr mentions this. I'm confused because the original text had that in a list of things the programmer must not do to avoid undefined behavior. The new text doesn't seem to be organized that way, and I'm having a hard time understanding why that changed or how it relates to the original text.

Copy link
Contributor Author

@chorman0773 chorman0773 Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's basically super UB. With UB it's fine to have it in the program as long as it's not evaluated*. With IF-NDR, it's not fine to have it in the program.

That particular list (in the original ASM chapter) is particularly bad, because it had quite a few things mixed together, including things that obviously are not fine in unevaluated code because they have potentially invalid global implications - for example, asm!(".section .data") as mentioned, which switches to a section called .data and depending on how inline asm is implemented, that could affect just the rest of the block, all inline assembly in the crate, or potentially even the rest of the function and all others in the crate.

For the same reason, trying to put a function into .data is likely fundamentally broken, because depending on the platform, the binary format, and the linker, that could end up in an RWX section, an RW section, and RX section (and mutable data is placed in the same RX section), or outright being rejected by the linker or loader (whether or not the function is even called), but it's difficult to diagnose this in the general case.

Comment on lines +21 to +23
### Shall

The word "shall" is a static constraint placed upon the program. A program that violates a constraint that uses "shall" is ill-formed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't entirely object to specifying this, but I just want to call out that it seems unusual to do so. At least, I have not seen any other languages do that. Is there a precedent that shows this needing definition or way this could be confusing otherwise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought C++ did but I checked and am incorrect.
I know C does so implicitly. I think it's useful to call out the term shall as specifically meaning a static constraint, because it could reasonably mean either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Ada standards do (since Ada 95; Ada 83 used "must"). See eg Ada 2012 §1.1.5:3.

@ehuss ehuss added S-blocked Status: Marked as blocked on further work. and removed S-blocked Status: Marked as blocked on further work. labels Jul 19, 2024
src/glossary.md Show resolved Hide resolved
@traviscross
Copy link
Contributor

In our lang-docs call, we discussed this PR. The PR does three things:

  • Adds a notion of an "ill-formed program".
  • Adds a notion of "no diagnostic required" as a kind of undefined behavior.
  • Defines the word "shall".

Taking these in order:

"ill-formed program": Our feeling is that we probably don't need to define an "ill-formed program" in this way, and that if we were to, we'd probably call it something else, such as the program being "invalid" or "not accepted".

"no diagnostic required": Our feeling is that we don't need to import this C++ definition. We've generally called this class of thing "undefined behavior" or maybe "compile-time undefined behavior" in Rust.

"shall": Our feeling is that there are one of two ways to go here. Either we import or normatively reference all the IETF RFC 2119 words, and start writing "SHALL" / "MAY" / "MUST" / etc., or that we write more normally, e.g. "the user must ... to avoid undefined behavior", "the compiler will ...", in which case it may be overkill to define the words at all.

Given that, our inclination is to close the PR as a matter of review, but as some of these things were discussed on a recent spec call, we wanted to give others the opportunity to weigh in here first.

cc @rust-lang/spec

@chorman0773
Copy link
Contributor Author

We've generally called this class of thing "undefined behavior" or maybe "compile-time undefined behavior" in Rust.

As a note, the reason I wanted to add the term is because "undefined behavior" is well understood to mean an execution result produced at runtime - that is, it is only has any effects on the execution if it would be eventually reached by that execution*. In particular, it is well-understood that for any construct has_ub!() with undefined behaviour, the program if false{ has_ub!()} println!("Foo"); is well defined to print "Foo".
To avoid confusion (that may lead to writing incorrect programs), I feel it's necessary to have a distinction between the terms so that it's clear when a construct is invalid to have in unreachable (or even statically unevaluated) code. "Compile-time undefined behaviour" also runs the risk of being confused with const-eval undefined behaviour, which I believe is likewise not permitted to produce arbitrary runtime results if the appropriate constant is not ever evaluated (though it is allowed to produce a compile-time diagnostic and fail translation).

*Or any other execution, for the same input, that results from any valid sequence of results from calls to specr pick from the appropriate Minirust.

@traviscross
Copy link
Contributor

traviscross commented Jul 31, 2024

Thanks @chorman0773 for that background. Agreed those are interesting and reasonable questions. And in particular, I agree about the potential of ambiguity with "compile-time UB" with respect to const-eval.

However, it's not clear to me how the NDR term proposed here addresses the case that you mention:

In particular, it is well-understood that for any construct has_ub!() with undefined behaviour, the program if false{ has_ub!()} println!("Foo"); is well defined to print "Foo".

To avoid confusion..., I feel it's necessary to have a distinction between the terms so that it's clear when a construct is invalid to have in unreachable (or even statically unevaluated) code.

As defined in this PR, if a program is invalid in the NDR sense, then the behavior of running that program is undefined. The case you mention seems to call for a term that instead denotes that the program source violates some validity rule or property even if the resulting program has defined behavior.

Perhaps we should instead just say directly what rule or property is being violated. E.g., we might say that the expression below is not well typed:

fn main() {
    if false {
        union U { x: f32, y: f64 }
        _ = unsafe { U { x: 42 }.y };
    }
}

@chorman0773
Copy link
Contributor Author

chorman0773 commented Jul 31, 2024

As defined in this PR, if a program is invalid in the NDR sense, then the behavior of running that program is undefined

We don't call it "undefined behavior" though. It's called out in a deliberately different way.

well-typed/ill-typed is also a misnomer, because the rules that would cause a program to be ill-formed (much less the ones that cause it to be Ill-formed; no diagnostic required) aren't just type checking rules, or even semantic validity rules. Would you call

static struct: i32 = 0;

Ill-typed?

The point of defining the terms is that we don't have to elaborate the definition every place its used. Same goes with shall (and I don't agree we need full RFC 2119 - most language specifications I've read don't use 2119. What even would be a "SHOULD" or "MAY" in Rust?).

@traviscross
Copy link
Contributor

The point of defining the terms is that we don't have to elaborate the definition every place its used.

The discussion here -- that we're struggling to converge on the purpose and meaning of these terms -- suggests to me that what's in the PR wouldn't yet fill that role.

r[requirement.error]

A Rust program is ill-formed if it violates a static constraint of the language.
Unless otherwise stated (See ["no diagnostic required"][requirement.ndr]), it is an error to have an ill-formed program.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is an error to have an ill-formed program.

Should that be

compiling an ill-formed program results in an error diagnostic.

or something like that?

"it is an error" sounds like something the user isn't supposed to do, not necessarily like something that results in an error message.

@traviscross
Copy link
Contributor

traviscross commented Aug 1, 2024

As discussed above, and after discussing this on the spec call today, we're going to close this. If other glossary items come up, we can open separate PRs or issues for those, and that would help in terms of keeping discussion and review focused.

In general, we'd prefer to see the need for a new bit of jargon, in terms of seeing that we've repeated ourselves too often, before we go out of our way to introduce that jargon.

@traviscross traviscross closed this Aug 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-glossary Area: glossary T-spec Team: spec
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants