Shebang support with flakes #5189

Closed
wants to merge 8 commits into from
44 changes: 44 additions & 0 deletions doc/manual/src/release-notes/rl-next.md
@@ -1,2 +1,46 @@
# Release X.Y (202?-??-??)

* The experimental `nix` command can now be used as a `#!`-interpreter. The
  contents of any `#! nix` lines, followed by the script's location, are
  appended into a single `nix` call.
  Some examples:
```
#!/usr/bin/env nix
#! nix shell --file "<nixpkgs>" hello cowsay --command bash

hello | cowsay
```
or with flakes:
```
#!/usr/bin/env nix
#! nix shell nixpkgs#bash nixpkgs#hello nixpkgs#cowsay --command bash

hello | cowsay
```
or
```bash
#! /usr/bin/env nix
#! nix shell --impure --expr
#! nix "with (import (builtins.getFlake ''nixpkgs'') {}); terraform.withPlugins (plugins: [ plugins.openstack ])"
#! nix --command bash

terraform "$@"
```
or
```
#!/usr/bin/env nix
//! ```cargo
//! [dependencies]
//! time = "0.1.25"
//! ```
/*
#!nix shell nixpkgs#rustc nixpkgs#rust-script nixpkgs#cargo --command rust-script
*/
fn main() {
for argument in std::env::args().skip(1) {
println!("{}", argument);
};
println!("{}", std::env::var("HOME").expect(""));
println!("{}", time::now().rfc822z());
}
// vim: ft=rust
```
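To make the "appended to a single call" wording concrete, here is a minimal sketch of the resulting argument order, based on how the `args.cc` hunk in this PR rebuilds the command line: words from the `#! nix` lines come first, then the script path, then whatever arguments the user passed to the script. The helper name `assembleShebangCmdline` and its parameters are hypothetical and exist only for illustration; they are not part of the PR.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not in the PR): build the argument list that nix
// ends up parsing when it is invoked as a #!-interpreter.
std::vector<std::string> assembleShebangCmdline(
    const std::vector<std::string> & shebangWords, // words from all "#! nix" lines
    const std::string & scriptPath,                // the script being executed
    const std::vector<std::string> & userArgs)     // arguments given to the script
{
    std::vector<std::string> cmdline(shebangWords);
    cmdline.push_back(scriptPath);
    cmdline.insert(cmdline.end(), userArgs.begin(), userArgs.end());
    return cmdline;
}
```

Under this reading, running `./greet world` with the line `#! nix shell nixpkgs#hello --command bash` would be parsed as if the arguments were `shell nixpkgs#hello --command bash ./greet world`.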
42 changes: 42 additions & 0 deletions src/libutil/args.cc
@@ -1,6 +1,9 @@
#include "args.hh"
#include "hash.hh"

#include <fstream>
#include <string>
#include <regex>
#include <glob.h>

#include <nlohmann/json.hpp>
@@ -62,6 +65,12 @@ static std::optional<std::string> needsCompletion(std::string_view s)
}

void Args::parseCmdline(const Strings & _cmdline)
{
// Default via 5.1.2.2.1 in C standard
Args::parseCmdline(_cmdline, false);
}

void Args::parseCmdline(const Strings & _cmdline, bool allowShebang)
{
Strings pendingArgs;
bool dashDash = false;
@@ -77,6 +86,39 @@ void Args::parseCmdline(const Strings & _cmdline)
}

bool argsSeen = false;

// Heuristic to see if we're invoked as a shebang script, namely,
Member

Since this is very similar to the code in nix-build.cc, it would be nice to factor out the commonality.

Member

This requires quite a lot of refactoring in nix-build.cc, which seems disproportionate. I'm not convinced that the resulting abstraction would be useful, and coupling to the very stable nix-build.cc may well be counterproductive.

That said, factoring out some static functions / private methods will probably make the code easier to follow.

// if we have at least one argument, it's the name of an
// executable file, and it starts with "#!".
Strings savedArgs;
if (allowShebang){
auto script = *cmdline.begin();
try {
std::ifstream stream(script);
char shebang[3]={0,0,0};
stream.get(shebang,3);
if (strncmp(shebang,"#!",2) == 0){
for (auto pos = std::next(cmdline.begin()); pos != cmdline.end();pos++)
savedArgs.push_back(*pos);
cmdline.clear();

std::string line;
std::getline(stream,line);
static const std::string commentChars("#/\\%@*-");
while (std::getline(stream,line) && !line.empty() && commentChars.find(line[0]) != std::string::npos){
line = chomp(line);

std::smatch match;
if (std::regex_match(line, match, std::regex("^#!\\s*nix\\s(.*)$")))
for (const auto & word : shellwords(match[1].str()))
cmdline.push_back(word);
Contributor Author

@roberth I considered an “else, break out of scanning” condition here.

Contributor Author

What would you think about doing a regex check for ^[^#\/\\\*-] and breaking out?

Member

That might be a bit aggressive.
The Rust ones here are already quite interesting https://nixos.wiki/wiki/Nix-shell_shebang
I think I've seen a stack run + nix-shell one once, which was pretty weird.
Your suggestion does happen to work, but I can't help feeling that it's by coincidence.
Perhaps breaking out on the first empty line would be sensible enough? I think those only tend to occur after the shebang lines. I could be wrong, and it'd be an easy mistake to add a newline between two tools' shebang stuff. I guess Nix would be the first one, so it all works out?

Worst case we can add extra instructions to let the script configure shebang scanning later.

Member

The current implementation seems to strike a good balance. We may learn more in the experimental phase.
Resolving this thread.

}
cmdline.push_back(script);
for (auto pos = savedArgs.begin(); pos != savedArgs.end();pos++)
cmdline.push_back(*pos);
}
} catch (SysError &) { }
}
for (auto pos = cmdline.begin(); pos != cmdline.end(); ) {

auto arg = *pos;
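The scanning heuristic in this hunk can be read in isolation as the sketch below. It follows the same rules as the diff (the first line must start with `#!`, scanning continues while lines are non-empty and begin with one of the characters in `commentChars`, and only lines matching `#!\s*nix\s` contribute). The function name `collectNixShebangLines` is hypothetical, and the sketch leaves out the PR's `chomp` and `shellwords` steps, so it returns the raw text after `#! nix` rather than split words.

```cpp
#include <istream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not in the PR): collect the text that follows
// "#! nix" on each leading comment-like line of a script.
std::vector<std::string> collectNixShebangLines(std::istream & in)
{
    static const std::regex nixLine("^#!\\s*nix\\s(.*)$");
    static const std::string commentChars("#/\\%@*-");
    std::vector<std::string> result;
    std::string line;

    // The first line is the kernel-level shebang; without it, do nothing.
    if (!std::getline(in, line) || line.rfind("#!", 0) != 0)
        return result;

    // Stop at the first empty line or the first line that does not start
    // with a comment-like character.
    while (std::getline(in, line)
           && !line.empty()
           && commentChars.find(line[0]) != std::string::npos)
    {
        std::smatch match;
        if (std::regex_match(line, match, nixLine))
            result.push_back(match[1].str());
    }
    return result;
}
```

Lines that look like comments but are not `#! nix` lines are skipped rather than ending the scan, which is what lets the Rust example in the release notes wrap its `#!nix` line in a `/* ... */` block.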
8 changes: 5 additions & 3 deletions src/libutil/args.hh
@@ -25,9 +25,11 @@ public:
*/
void parseCmdline(const Strings & cmdline);

/**
* Return a short one-line description of the command.
*/
/* Parse the command line with argv0, throwing a UsageError if something
goes wrong. */
void parseCmdline(const Strings & _cmdline, bool allowShebang);

/* Return a short one-line description of the command. */
virtual std::string description() { return ""; }

virtual bool forceImpureByDefault() { return false; }
44 changes: 44 additions & 0 deletions src/libutil/util.cc
@@ -14,6 +14,7 @@
#include <future>
#include <iostream>
#include <mutex>
#include <regex>
#include <sstream>
#include <thread>

@@ -1459,6 +1460,49 @@ std::string shellEscape(const std::string_view s)
return r;
}

/* Recreate the effect of the perl shellwords function, breaking up a
* string into arguments like a shell word, including escapes
*/
std::vector<std::string> shellwords(const std::string & s)
{
std::regex whitespace("^(\\s+).*");
auto begin = s.cbegin();
std::vector<std::string> res;
std::string cur;
enum state {
sBegin,
sQuote
};
state st = sBegin;
auto it = begin;
for (; it != s.cend(); ++it) {
if (st == sBegin) {
std::smatch match;
if (regex_search(it, s.cend(), match, whitespace)) {
cur.append(begin, it);
res.push_back(cur);
cur.clear();
it = match[1].second;
begin = it;
}
}
switch (*it) {
case '"':
cur.append(begin, it);
begin = it + 1;
st = st == sBegin ? sQuote : sBegin;
break;
case '\\':
/* perl shellwords mostly just treats the next char as part of the string with no special processing */
cur.append(begin, it);
begin = ++it;
break;
Comment on lines +1490 to +1499
Member

Sorry to comment on moved code, but I think shellwords isn't quite appropriate for this use case. I think we should fork it to be more predictable and future proof.
By restricting the syntax, we can be more compatible with the shell (in the shebang -> shell direction only of course; wouldn't want to reimplement a shell - in the limit). Enforcing this is perhaps convenient but most of all helps with understanding. Furthermore each syntax we forbid is trivially forward compatible.
No need to go overboard with this, so what I've tried to narrow it down to are the following changes based on shellwords:

  • In the unquoted state, reject single quotes. We don't need them and putting them in the parsed string would confuse readers.
  • Similarly, we should reject/reserve at least these chars in the unquoted state: \, $, `, <, >; perhaps also *.
  • Fail for lines that have an unterminated quoted string.
  • Use single quotes, because we won't be doing variable substitution (certainly for now). Reject unescaped $ in double-quoted strings?

The backtick, <, and unterminated quoted string cases are particularly interesting, because they could serve as extension points for multiline strings, which I think would be very useful, but whose implementation could be scoped out of this PR.
OT?: I think double backticks would make a nice analog for nixlang's double single quote strings, except they'd be entirely verbatim.

These changes aren't compatible with nix-shell, so this would be new code and the move can be reverted.

Contributor Author

@roberth This is still pending.

Member

I think trying to conform with existing syntaxes and conventions is a far harder problem than coming up with a useful syntax that works well for our use case.
New proposal:

  • Use double backtick quotes as the only supported type of quotation
  • Disallow any shell-like syntax outside the quoted strings, and be eager to reject syntax in general

This way we keep the code fairly simple, leaving reserved syntax for possible later extension, leaving our options open, while solving "our own use case" well: postponing good inline expression support seems like a mistake the more I think about it.

I'll have a go at implementing this.

}
}
cur.append(begin, it);
if (!cur.empty()) res.push_back(cur);
return res;
}
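For comparison with `shellwords`, the stricter syntax proposed in the review thread (double-backtick quoting as the only quotation, and eager rejection of shell-like syntax elsewhere) might look roughly like the sketch below. Everything here is an assumption made for illustration: the function name `parseStrictShebangLine`, the exact set of reserved characters, and the use of `std::invalid_argument` are not taken from the PR or from any later implementation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: split one "#! nix" line into words. Words are
// separated by spaces or tabs, ``...`` quotes text verbatim, and shell-like
// characters outside quotes are rejected so they stay reserved.
std::vector<std::string> parseStrictShebangLine(const std::string & s)
{
    static const std::string reserved("'\"\\$`<>*"); // assumed reserved set
    std::vector<std::string> words;
    std::string cur;
    bool inWord = false;

    for (std::size_t i = 0; i < s.size(); ) {
        char c = s[i];
        if (s.compare(i, 2, "``") == 0) {
            // Verbatim quoted section: copy everything up to the closing ``.
            auto end = s.find("``", i + 2);
            if (end == std::string::npos)
                throw std::invalid_argument("unterminated `` quote");
            cur.append(s, i + 2, end - (i + 2));
            inWord = true;
            i = end + 2;
        } else if (c == ' ' || c == '\t') {
            // Whitespace ends the current word, if there is one.
            if (inWord) { words.push_back(cur); cur.clear(); inWord = false; }
            ++i;
        } else if (reserved.find(c) != std::string::npos) {
            throw std::invalid_argument(std::string("reserved character: ") + c);
        } else {
            cur.push_back(c);
            inWord = true;
            ++i;
        }
    }
    if (inWord) words.push_back(cur);
    return words;
}
```

Because the quoted section is copied verbatim, a Nix expression containing `"`, `<`, or `$` can be passed without further escaping, while the same characters outside quotes fail loudly instead of being silently reinterpreted.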

void ignoreException(Verbosity lvl)
{
11 changes: 7 additions & 4 deletions src/libutil/util.hh
@@ -694,10 +694,13 @@ std::string toLower(const std::string & s);
std::string shellEscape(const std::string_view s);


/**
* Exception handling in destructors: print an error message, then
* ignore the exception.
*/
/* Recreate the effect of the perl shellwords function, breaking up a
string into arguments like a shell word, including escapes. */
std::vector<std::string> shellwords(const std::string & s);


/* Exception handling in destructors: print an error message, then
ignore the exception. */
void ignoreException(Verbosity lvl = lvlError);


44 changes: 0 additions & 44 deletions src/nix-build/nix-build.cc
@@ -29,50 +29,6 @@ using namespace std::string_literals;

extern char * * environ __attribute__((weak));

/* Recreate the effect of the perl shellwords function, breaking up a
* string into arguments like a shell word, including escapes
*/
static std::vector<std::string> shellwords(const std::string & s)
{
std::regex whitespace("^(\\s+).*");
auto begin = s.cbegin();
std::vector<std::string> res;
std::string cur;
enum state {
sBegin,
sQuote
};
state st = sBegin;
auto it = begin;
for (; it != s.cend(); ++it) {
if (st == sBegin) {
std::smatch match;
if (regex_search(it, s.cend(), match, whitespace)) {
cur.append(begin, it);
res.push_back(cur);
cur.clear();
it = match[1].second;
begin = it;
}
}
switch (*it) {
case '"':
cur.append(begin, it);
begin = it + 1;
st = st == sBegin ? sQuote : sBegin;
break;
case '\\':
/* perl shellwords mostly just treats the next char as part of the string with no special processing */
Member

Off-topic: did we stop taking perl as an example?

cur.append(begin, it);
begin = ++it;
break;
}
}
cur.append(begin, it);
if (!cur.empty()) res.push_back(cur);
return res;
}

static void main_nix_build(int argc, char * * argv)
{
auto dryRun = false;
5 changes: 4 additions & 1 deletion src/nix/main.cc
@@ -17,6 +17,7 @@
#include <ifaddrs.h>
#include <netdb.h>
#include <netinet/in.h>
#include <regex>

#include <nlohmann/json.hpp>

@@ -397,7 +398,9 @@ void mainWrapped(int argc, char * * argv)
});

try {
args.parseCmdline(argvToStrings(argc, argv));
auto isNixCommand = std::regex_search(programName, std::regex("nix$"));
auto allowShebang = isNixCommand && argc > 1;
args.parseCmdline(argvToStrings(argc, argv),allowShebang);
} catch (UsageError &) {
if (!args.helpRequested && !completions) throw;
}
114 changes: 114 additions & 0 deletions src/nix/shell.md
@@ -51,4 +51,118 @@ R""(
provides the specified [*installables*](./nix.md#installable). If no command is specified, it starts the
default shell of your user account specified by `$SHELL`.

# Use as a `#!`-interpreter

You can use `nix` as a script interpreter to allow scripts written
in arbitrary languages to obtain their own dependencies via Nix. This is
done by starting the script with the following lines:

```bash
#! /usr/bin/env nix
#! nix shell installables --command real-interpreter
```

where *real-interpreter* is the “real” script interpreter that will be
invoked by `nix shell` after it has obtained the dependencies and
initialised the environment, and *installables* are the attribute names of
the dependencies in Nixpkgs.

The lines starting with `#! nix` specify options (see above). Note that you
cannot write `#! /usr/bin/env nix shell --command ...` because many operating systems
only allow one argument in `#!` lines.

For example, here is a Python script that depends on Python and the
`prettytable` package:

```python
#! /usr/bin/env nix
#! nix shell github:tomberek/-#python3With.prettytable --command python

import prettytable

# Print a simple table.
t = prettytable.PrettyTable(["N", "N^2"])
for n in range(1, 10): t.add_row([n, n * n])
print(t)
```

Similarly, the following is a Perl script that specifies that it
requires Perl and the `HTML::TokeParser::Simple` and `LWP` packages:

```perl
#! /usr/bin/env nix
#! nix shell github:tomberek/-#perlWith.HTMLTokeParserSimple.LWP --command perl -x

use HTML::TokeParser::Simple;

# Fetch nixos.org and print all hrefs.
my $p = HTML::TokeParser::Simple->new(url => 'http://nixos.org/');

while (my $token = $p->get_tag("a")) {
my $href = $token->get_attr("href");
print "$href\n" if $href;
}
```

Sometimes you need to pass a simple Nix expression to customize a
package like Terraform:

```bash
#! /usr/bin/env nix
#! nix shell --impure --expr
#! nix "with (import (builtins.getFlake ''nixpkgs'') {}); terraform.withPlugins (plugins: [ plugins.openstack ])"
Member

Suggested change
#! nix "with (import (builtins.getFlake ''nixpkgs'') {}); terraform.withPlugins (plugins: [ plugins.openstack ])"
#! nix 'with (import (builtins.getFlake "nixpkgs") {}); terraform.withPlugins (plugins: [ plugins.openstack ])'

This will then change a bit.

#! nix --command bash

terraform "$@"
```

> **Note**
>
> You must use double quotes (`"`) when passing a simple Nix expression
> in a nix shell shebang.

Finally, because multiple `#! nix` lines are merged into a single command
line, the following Haskell script can pin a specific branch of
Nixpkgs/NixOS (the 21.11 stable branch):

```haskell
#!/usr/bin/env nix
#!nix shell --override-input nixpkgs github:NixOS/nixpkgs/nixos-21.11
#!nix github:tomberek/-#haskellWith.download-curl.tagsoup --command runghc

import Network.Curl.Download
import Text.HTML.TagSoup
import Data.Either
import Data.ByteString.Char8 (unpack)

-- Fetch nixos.org and print all hrefs.
main = do
resp <- openURI "https://nixos.org/"
let tags = filter (isTagOpenName "a") $ parseTags $ unpack $ fromRight undefined resp
let tags' = map (fromAttrib "href") tags
mapM_ putStrLn $ filter (/= "") tags'
```

If you want to be even more precise, you can specify a specific revision
of Nixpkgs:

    #!nix shell --override-input nixpkgs github:NixOS/nixpkgs/eabc38219184cc3e04a974fe31857d8e0eac098d

You can also use a Nix expression to build your own dependencies. For example,
the Python example could have been written as:

```python
#! /usr/bin/env nix
#! nix shell --impure --file deps.nix --command python
```

where the file `deps.nix` in the same directory as the `#!`-script
contains:

```nix
with import <nixpkgs> {};
python3.withPackages (ps: with ps; [ prettytable ])
```


)""