Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC for module redesign. #2108

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 34 additions & 0 deletions 0000-modules/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
- Feature Name: modules
- Start Date: 2017-08-07
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

This is a redesign of the Rust module system, intended to improve its
ergonomics, learnability, and locality of reasoning. Because this is a
relatively large proposal, it has been broken into multiple text files.

# Table of Contents

* **[Motivation][motivation]** - why we propose to make this change
* **[Overview][overview]** - a high level overview of what it will be like to
use the new system as a whole
* **Detailed design** - the details of the proposal, broken into multiple
sections:
* **[Loading Files][loading-files]**
* **[Paths][paths]**
* **[Visibility and Re-exporting][visibility]**
* **[Migration][migration]** - this proposal involves migrating from one
system to another, and this section describes it in detail.

Each of the detailed design subsections contains its own description of
drawbacks and alternatives.

[motivation]: motivation.md
[overview]: overview.md
[loading-files]: detailed-design/loading-files.md
[paths]: detailed-design/paths.md
[visibility]: detailed-design/visibility-and-reexport.md
[migration]: detailed-design/migration.md
194 changes: 194 additions & 0 deletions 0000-modules/detailed-design/loading-files.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,194 @@
# Loading Files

When building a Rust project, rustc will and parse some files as Rust code in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will and parse

addition to the root module. These will be used to construct a module tree. By
default, cargo will generate the list of files to load in this way for you,
though you can generate such a list yourself and specify it in your
`Cargo.toml`, or you can generate the list in another way for your non-cargo
build system.

This eliminates the need to write `mod` statements to add new files to your
project. Instead, files will be picked up automatically as a part of your
module hierarchy.

## Detailed design

### Processing the `--module` list (rustc)

rustc takes a new argument called `--module`. Each `--module` argument passes
the name of a file to treat as a module. rustc will attempt to open and parse
all of these files, and report any errors if it is unable to. It will mount
these files as a tree of Rust modules using rules which mirror the current
rules for looking up modules.

It will not attempt to open or parse files if the paths meet these conditions:

* The file name is not a valid Rust identifier followed by `.rs`.
* The file is not a subdirectory of the directory containing the root module.
* Any of the subdirectories of the root module in the path to this file are not
valid Rust identifiers.

(Cargo's default system will not pass any files that would be ignored by these
conditions, but if they are passed by some other system, they are ignored
regardless.)

Rust will mount files as modules using these rules:

* If a file is named `mod.rs`, it will mount it as the module for the name of
directory which contains it (the directory containing the root module cannot
contain a `mod.rs` file; this is an error).
* Otherwise, it will mount it at a module with the name of the file prior to
the `.rs`.

All modules mounted this way are visible to the entire crate, but are not (by
default) visible in the external API of this crate.

If, during parsing, a `mod` statement is encountered which would cause Rust to
load a file which was a part of the `--module` list, Rust does not
attempt to load that file. Instead, a warning is issued that that `mod`
statement is dead code. (See [migrations][migrations] for more info.)

If a module is mounted multiple times, or there are multiple possible files
which could define a module, that continues to be an error.

Another result of this design is that the naming convention becomes slightly
more flexible. Prior to this RFC, if a module file is going to have submodule
files, it must be located at `mod.rs` in the directory containing those
submodules - e.g. `src/foo/mod.rs`. As a result of this RFC, users can instead
locate it at `src/foo.rs`, but still have submodules in the `foo` directory.
Some users have requested this functionality because their tooling does not
easily support distinguishing files with the same name, such as all of their
`mod.rs` files.

In fact, in this design, it is not necessary to have a `foo.rs` or `foo/mod.rs`
in order to have modules in the `foo` directory. Without such a file, `foo`
will just have no items in it other than the automatically loaded submodules.
For example:

```
/foo
bar.rs
baz.rs
lib.rs
```

This mounts a submodule `foo` with two items in it: submodules `bar` and `baz`.
There is no compiler error.

### Gathering the `--module` list (cargo)

#### Library and binary crates

When building a crate, cargo will collect a list of paths to pass to rustc's
`--module` argument. It will only gather files for which the file name
has the form of a valid Rust identifier, followed by the characters `.rs`.

cargo will recursively walk the directory tree, gathering all appropriate
files, beginning with the directory which contains the crate root file. It will
ignore these files and directories:

* The crate root file itself.
* Any directory with a name which is not a valid Rust identifier.
* If the crate root is in the `src` subdirectory of the Cargo manifest
directory, and there is a directory called `src/bin`, cargo will ignore that
subdirectory.

In short, cargo will include all appropriately named files inside the directory
which contains the crate root, except that it will ignore the `src/bin`
directory.

Packages containing multiple crates which wish to use the default module list
will need to make sure that they do not have multiple crates rooted in the same
directory, or within a subdirectory of another crate. The most likely
problematic crates today are those which have both a `src/lib.rs` and a
`src/main.rs`. We recommend those crates move their binary crate to the
`src/bin` directory solution.

While gathering the default module list, cargo will determine if any other
crate is rooted in a directory which would be collected by the default module
list, and will instead not pass a `--module` list and issue a warning in
that case, informing users that they need to rearrange their crates or provide
a list of modules themselves.

(**Note:** These projects will receive a warning, but will not immediately be
broken, because the `mod` statements they already contain will continue to pick
up files.)

#### Tests, examples, and benchmarks

Test, example, and benchmark crates follow a different set of rules. If the
crate is located in the appropriate top-level directory (`tests`, `examples`,
and so on), no `--module` list will be collected by default. However,
subdirectories of these directories will be treated as individual binary
crates: a `main.rs` file will be treated as the root module, and all other
appropriately named files will be passed as `--module`, using the same
rules described above.

So if you have an examples directory like this:

```
examples/
foo.rs
bar/
main.rs
baz.rs
```

This contains two examples, a `foo` example and a `bar` example, and the `bar`
crate will have `baz.rs` as a submodule.

The reason for this is that today, cargo will treat every file in `tests`,
`examples`, and `benches` as independent crates, which is a well-motivated
design. Usually, these are small enough that a single file makes sense.
However, today, cargo does not make it particularly easy to have tests,
examples, or benchmarks that are multiple files. This design will create a
pattern to enable users to do this.

#### Providing your own module list

The target section of the `Cargo.toml` will gain a new item called `modules`.
This item is expected to be an array of strings, which are relative paths from
the cargo manifest directory to files containing Rust source code. If this item
is specified, cargo will canonicalize these paths and pass them to rustc as the
`--module` argument when building this target.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify-- if any module list is passed here, Cargo won't do any directory searching, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes


## Drawbacks

The RFC authors believe that making mod statements unnecessary is a *net* win,
but we must acknowledge that it is not a *pure* win. There are several
advantages that mod statements bring which will not be fully replicated in the
new system.

Some workflows have been convenienced by the fact that statements need to be
added to the source code to add new modules to files. For example, it makes it
easier for users to leave their src directories a little bit dirty while
working, such as through an incomplete `git stash`. If users wish to comment
out a module, it can be easier to comment out the `mod` statement than to
comment out the module file. In general, it enables users to leave code which
would not compile in their src directory without explicitly commenting it out.

Some users have expressed strong concerns that by deriving the module structure
from the file system, without making additional syntactic statements, they will
not be able to as easily find the information they need to navigate and
comprehend the codebases they are reading or working on. To partly ease their
concern, the RFC allows users to explicitly specify their module lists at the
build layer, instead of the source layer. This has some disadvantages, in that
users may prefer to not have to open the build configuration either.

This will require migrating users away from `mod` statements toward the new
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also make building "modern" Rust programs much harder without cargo - you can no longer hardcode the list of dependencies in a variable and just call rustc on lib.rs.

Also, might run against the command-line limit for huge crates, but I'm not sure how much of a concern is that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also make building "modern" Rust programs much harder without cargo - you can no longer hardcode the list of dependencies in a variable and just call rustc on lib.rs.

I'd be eager to get user reports from people actually not passing --extern arguments today. We can also support an analogous modules path env var.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least I often collect the --extern arguments in an env var and call rustc explicitly when debugging (that includes running rustc under gdb and running rustc under valgrind, which aren't easy from cargo).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't expect normal users to be debugging rustc, and you can always copy-paste the arguments using cargo build --verbose, that's one annoyance.

Copy link
Contributor

@arielb1 arielb1 Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, running cargo build --verbose, editing the argument line, then rebuilding. This will get much much more annoying if argument lines contain a hundred files.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like having the option to use a file or an environment variable to specify external dependencies may resolve these concerns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't cargo support a way to change how it invokes rustc? Why doesn't that work for running rustc under a debugger?

Because with gdb you have to run it as

$ gdb $RUSTC
gdb> set args ARG1 ARG2
gdb> run
$

Also, sometimes editing the command-line to remove some flags, which is sometimes needed when poking around.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't gdb --args $RUSTC ARG1 ARG2 work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it does?

Copy link
Contributor

@gnzlbg gnzlbg Aug 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arielb1 If rustc would also allow you to pass a path to a file listing modules to include, would that appease your concerns?

This is how C++ modules work. There one has a modules.modulemap file, that maps files and sets of files to modules. The compiler can take many of these files, and will then include all modules listed in those files (but they can also include specific files directly).

Since I don't like having to include modules in the Cargo.toml file (I find this "unclean"), maybe we could adopt a similar approach to C++, where we have a canonical modules.modulesmap file in the crate directory (just like build.rs), and if it's there cargo detects it and uses it, and if not, cargo will generate one automatically for you from the filesystem structure in the temporary build directory [*]. That way, you would be able to copy-paste the command of cargo build --verbose, those wanting to specify the module file hierarchy differently can still write their own .modulemap files, and my OCD is happy as well because Cargo.toml remains clean.

[*] we can also allow specifying paths to multiple module map files in Cargo.toml, but experience with build.rs has shown that we probably should try if having a canonical location for it is enough. Since modules.modulemap files can specify paths to other modules.modulemap files, worst case users would need to add a dummy file that points somewhere else, and we don't need any Cargo.toml property for this at all.


EDIT: the module.modulemaps file in C++ allow also specifying things like item-level visibility and stuff like that. It might be worth to explore if allowing something like this in Rust could be used to remove boilerplate. As syntax we can use JSON or TOML to just specify a map of relative file paths to modules or modulemap files. But these things are details.

system.

## Alternatives

An alternative is to do nothing, and continue to use `mod` statements.

We could also put the file-lookup in rustc, instead of cargo, and have rustc
perform its own directory walk. We believe this would be a bad choice of
layering.

During the design process, we considered other, more radical redesigns, such as
making all files "inlined" into their directory and/or using the filename to
determine the visibility of a module. We've decided not to steps that are this
radical right now.

[migrations]: migrations.md
167 changes: 167 additions & 0 deletions 0000-modules/detailed-design/migration.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,167 @@
# Migration

The changes proposed by this RFC are significant breaking changes to the
language. To complete them entirely we will need to pass a checkpoint/epoch
boundary as described in [RFC #2052][checkpoints]. However, we will begin with
a multi-phase deprecation process to ease the transition.

We will also introduce a tool that will not only automatically move a crate
onto a new system, but also *improve clarity* of the user's code by
intentionally drawing the distinction between the `export` and `pub`
visibilities that today does not firmly exist.

## Initial, compatible mode

In the first release supporting this RFC, we will begin supporting all of the
feature set of this RFC, without breakages to existing functionality. That is:

* `--extern` and `--module` arguments will be automounted, and cargo will pass
`--module` arguments in its expected way.
* The `crate::` path prefix will be accepted.
* The `export` visibility will be accepted.
* The "vis path" re-export syntax will be accepted.

However, existing semantics will not be broken, meaning:

* Module-internal paths will be accepted from the crate root without the
`crate::` prefix.
* `pub` will continue to have its current semantics if a crate does not ever
use the `export` visibility

In other words, as soon as this RFC is implemented, users will be able to
transition to the new system, but they will not be immediately *required* to.

The only breakage during this phase arises if users have files with valid Rust
module names inside of the directory than defines a crate, which is not a
compilable module of that crate. Users are encouraged to fix that by renaming
those files so that they will not be picked up by cargo, or else deleting them
if they are not necessary. **Note** that this does not include overlapping
crates which are both managed by cargo, as when cargo detects that it will
instead pass no `--module` arguments.

### Using `export` changes semantics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a way to annoy people during the transition period. Is there a reason we can't have:

export pub pub(crate)
today error global crate-local
checkpoint 1 global global (deprecated) crate-local
checkpoint 2 global crate-local crate-local (deprecated)
checkpoint 3 global crate-local error

The disadvantage is that in checkpoint 1 people will have to mark all of their pub imports as either pub(crate) or export, and they might be annoyed by pub(crate) being ugly. I think that's better than having pub mean export in all "old" crates and pub(crate) in all "new" crates, with no way to incrementally make the change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it might be nice to have the private-export lint only start in Ckp2, so that in Ckp1 you could just change all pub to export and not have to go over them one-by-one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@withoutboats and I talked about such a transition strategy, but:

  • It means that code must undergo multiple periods of churn (changing to pub(crate) and then to pub)
  • That's particularly painful for apps, where the distinction doesn't matter
  • It greatly prolongs the overall transition

On the other hand:

  • we can provide a rustfix to automatically convert to the new pub/export model, to make it one painless jump
  • I think in practice it will be highly obvious when you encounter code that is using the old pub semantics, and that with rustfix the transition can and will happen pretty quickly.

Another option, as @withoutboats mentioned, is to move to a different keyword altogether, like public. I for one would be sad to see pub go :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't mind seeing pub go, but I do think its very valuable that if a stack overflow example happens to have pub in it, it will still compile - in cases like that (e.g. documentation), the semantic change here hopefully doesn't matter very often (that is, I hope you aren't copying your external API from old stack overflow answers 😉).

In other words, even though this "change semantics if we ever encounter extern" sounds like the most disruptive mechanism possible, when I played out the scenarios in my head, I found that it seemed like the least disruptive thing we could do.

Copy link
Member

@cramertj cramertj Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some existing precedent here with pub(restricted). Using pub(restricted) anywhere made the "public in private" warning a hard error (see here).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a breaker, just annoying when you have a project of 20 crates, half of them use the old semantics, and half use the new semantics.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's possible to use crate keyword (currently only used in extern crate) instead of pub. This would avoid issues caused by changing well established meaning of pub keyword. Say...

crate fn hello() {
    println!("Hello, world!");
}

crate struct Example {
    crate a: i32,
    b: i64,
}


One aspect of this RFC requires a bit more of a complex transition than simply
"add new features, then deprecate the old ones." That is the change in the
semantics of the `pub` keyword.

Under the current checkpoint/epoch, the `pub` keyword will continue to mean
exactly what it means prior to this RFC **unless** the crate under compilation
contains at least one `export` statement. If an `export` statement is found in
the crate, `pub` is interpreted to have its semantics defined under this RFC
(public to this crate).

## Deprecations

Deprecation will proceed in two phases:

1. First, we will deprecate code which is now superfluous as a result of this
RFC.
2. Then, at a later point, we will issue more opinionated deprecations, pushing
users to write code which is compatible with the future checkpoint transition.

### Phase 1 - deprecating dead code

#### Dead `extern crate`

We will issue deprecation warnings on extern crate statements which do not have
a `#[macro_use]` attribute. In the majority of cases, users will need to delete
these statements and take no other actions. In some cases, they will also need
to do something else as well:

* If they used `extern crate as` syntax, they will need to add an alias to
their dependency object in their Cargo.toml.
* If they mounted the extern crate somewhere other than the crate root, they
will need to use an appropriate import at that location.

#### Dead `mod`

We will issue deprecation warnings on mod declarations for which there is a
corresponding `--module` argument. For private or `pub(crate)` modules without
attributes, users will need to delete these statements and nothing else. In
other cases, they will need to do other things as well:

* If they used a visibility modifier on the module, they will need a re-export.
* If they used an attribute, they will need to move it inside the module.

Note that modules that do not have a `--module` argument are *not* considered
dead code, and continue to be used to load additional files. If cargo was not
able to generate a module list for a crate, and the user hasn't specified one,
mod statements will continue to be the way to find modules.

#### Dead `pub(crate)`

If the user is using `export` semantics, `pub` becomes equivalent to
`pub(crate)`. We will issue dead code warnings on the `(crate)` restriction if
the crate is being compiled under this RFC's semantics.

### Cargo errors when unable to generate the module list

When cargo recognizes that a crate contains the source of another inside of it,
it will not generate a module list. When it does so, it issues a warning,
encouraging users to do one of two things:

* Set their module list manually in the `Cargo.toml`.
* Rearrange their package so that their crates do not have overlapping source
directories.

### Phase 2 - opinionated deprecations

#### Deprecating all `extern crate` and `mod` statements

All `extern crate` and `mod` statements (without blocks) will be deprecating.

Users will need to be using cargo's default module loading or a manual module
list to avoid warnings.

### Deprecating crate paths not through `crate::`

Absolute paths to items in this crate which do not pass through `crate` will
receive a warning.

### Warning to move to `export` in libraries

When compiling a library, if still using the pre-`export` semantics, users will
receive a warning providing them documentation on the change to `export` and
encouraging them to make that transition.

#### Blockers on phase 2

In the second phase, all `extern crate` statements will be deprecated. Phase 2
as proposed by this RFC is, for this reason, blocked on deprecating the
`#[macro_use]` attribute on extern crates.

#### Staggering phase 2

Its possible that some of the phase 2 deprecations will be issuable sooner than
others. Its not necessary that phase 2 be initiated as a single unit, but
instead its deprecations could be introduced across multiple releases.

### Phase 3 - next checkpoint

Once phase 2 has been completely implemented, in the next checkpoint, all of
the phase 2 warnings will be made into hard errors.

## Tooling

Concurrent with the first release under this RFC, we will also release a tool
which will automatically transform projects to be consistent with this RFC (and
therefore work, without warnings, on both the current and future checkpoint).

The exact interface and distribution of this tool is left unspecified for the
implementation, but as a rough sketch, it would perform these transformations:

* If a package contains both a `lib.rs` and a `main.rs`, it will move the
binary crate into the `bin` directory.
* It will remove unnecessary `mod` and `extern crate` statements.
* It will transform `use` imports and absolute paths to include the `crate`
prefix when necessary.
* It will selectively replace `pub` with `extern` *only* when the type is
exposed in the actual external API of the crate. (For binaries, it will not
perform this change at all.) This way, the tool will automatically make code
more self-documenting.

Our aspiration is that most users will have access to this tool and be able to
use it, making this transition automatic for most users.

[checkpoints]: https://github.com/rust-lang/rfcs/pull/2052
Loading