This repository has been archived by the owner on Mar 21, 2019. It is now read-only.

Networked Expressions #12

Open
CMCDragonkai opened this issue Apr 15, 2018 · 7 comments

Comments

@CMCDragonkai
Member

CMCDragonkai commented Apr 15, 2018

Architect expressions are addressable from any other expression. This avoids the need to have a package manager for Architect expressions.

Dhall

This idea derives from the Dhall language, which can import expressions from IPFS paths.

For example, in Dhall the Prelude module is available at: https://ipfs.io/ipfs/QmdtKd5Q7tebdo6rXfZed4kN6DXmErRQHJ4PsNCtca9GbB/Prelude

If you want to acquire the not function, you just have to reference: https://ipfs.io/ipfs/QmdtKd5Q7tebdo6rXfZed4kN6DXmErRQHJ4PsNCtca9GbB/Prelude/Bool/not

If you check the contents, it's just an expression:

let not : Bool → Bool
    =   λ(b : Bool) → b == False
in  not

In your own Dhall file, you can later write:

let Bool/not = https://ipfs.io/ipfs/QmdtKd5Q7tebdo6rXfZed4kN6DXmErRQHJ4PsNCtca9GbB/Prelude/Bool/not
in  Bool/not True

We want to take this idea further such that every top-level definition can be addressed by other expressions. To prevent remote code execution, addresses must be content addresses.

Content Addressed Source Code

The concept of making things content addressed is not new. For example see: https://joearms.github.io/published/2015-03-12-The_web_of_names.html

But Architect would be the first truly content addressed programming language.

So what do we mean by addressable?

Well, consider the Automaton declaration:

A = Automaton {
  protocol = Protocol [
    -- ACTIONS
  ]
}

This expression may live in a file on your filesystem.

Usually programming languages allow composition of different files together. Sometimes those files are considered modules in that language; in other languages those files are just text that gets "transcluded" (https://en.wikipedia.org/wiki/Transclusion).

Let's talk about some of the disadvantages of these approaches.

In the case where the files are modules, composition of files tends to be really restrictive. For example in Haskell, one can only include other modules at the top of the file. It is impossible to include modules later in the file. It is impossible to overwrite definitions that have already been defined. Note that some of these issues are worked around in the GHCi system.

In the case where files are just transcluded, this is a very primitive system. There's no way to track data (and function) provenance (https://en.wikipedia.org/wiki/Data_lineage). It is very easy to introduce all sorts of bugs, and there's no way to explicitly manage namespaces, as everything is just brought into scope. In imperative languages it can be quite unsafe, because malicious code can execute on import.

The Nix system is slightly different. It makes imports first class. Rather than special-casing imported expressions as a special module that can only be used in a specific place, they can be used anywhere an expression is expected. Of course this works mainly because Nix expressions always result in data structures. For example let x = 3; in x always results in the value 3. You can also just return a function, which is also a first class value. And if you want to return a set of functions, then they would have to be wrapped into an attribute set.

In Nix one can do things like:

let x = (import ./some-function.nix) 4; in x

Here's another example, but using a networked path. Upon evaluation, fetchTarball downloads the tarball into a fixed store path, and the function also returns that path, which is then used by the import. It's important to realise that while Nix is pure, the interpretation of the functions may result in side effects in the /nix/store. However these side effects don't really affect the functioning of the language. One could easily imagine that fetchTarball instead returns an in-memory data structure, rather than storing things in a directory on disk.

let pkgs = import (fetchTarball https://github.com/NixOS/nixpkgs-channels/archive/00e56fbbee06088bf3bf82169032f5f5778588b7.tar.gz) {}; in pkgs

Files are just pointers to some well-formed expression. It's simple and works interactively. It is still modular, we are able to track data provenance, we still get namespacing, and we can explicitly shadow and overwrite past expression bindings.

Let's extend this idea so we can use content addressed paths. In Nix this is apparently called a fixed-output derivation. Basically it is possible to do:

let src = fetchurl {
  url = https://path/to/something;
  sha256 = "...";
}; in src

It is important to be able to do this because otherwise we may suffer from a file inclusion vulnerability: https://en.wikipedia.org/wiki/File_inclusion_vulnerability The hash ensures cryptographically that we are getting the content that we originally expected.
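The check that a pinned hash makes possible can be sketched in a few lines of Python. This is only an illustration of the idea, not how Nix implements fetchurl; the names verify_sha256 and HashMismatch and the example payload are made up for this sketch.

```python
import hashlib


class HashMismatch(Exception):
    """Raised when fetched bytes do not match the pinned digest."""


def verify_sha256(content: bytes, expected_hex: str) -> bytes:
    """Return content unchanged if its SHA-256 digest matches, else raise."""
    actual = hashlib.sha256(content).hexdigest()
    if actual != expected_hex.lower():
        raise HashMismatch(f"expected {expected_hex}, got {actual}")
    return content


# Hypothetical payload standing in for a downloaded file.
payload = b"let x = 3; in x"
# In practice the pinned digest is recorded ahead of time by the importer.
pinned = hashlib.sha256(payload).hexdigest()
verified = verify_sha256(payload, pinned)
```

Whoever serves the bytes cannot substitute different content without the digest changing, which is what makes the source of the bytes irrelevant.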

The problem with the fetchurl approach is that it's still quite cumbersome. There are multiple steps required to acquire the hash of something you want. And the syntax is too verbose for a simple expression import.

The other issue is the granularity of imports. Right now this sort of inclusion system includes files only. To get something within the file, your module system has to expose addressable definitions. This is easily achieved with attribute sets. Is it possible instead to have a path that goes straight to an addressable expression?

Compare Nix to Dhall:

let x = https://ipfs.io/ipfs/QmdtKd5Q7tebdo6rXfZed4kN6DXmErRQHJ4PsNCtca9GbB/Prelude/Bool/not
in x True

let x = (import ./path/to/Prelude/Bool).not; in x True

Ignoring the fact that one uses a networked path and the other a local file path, the key difference is that in Dhall the not function is addressed directly within the networked path, while the Nix expression requires importing the file, parsing it into an expression object, and then accessing a property of that object.

However that's just the surface syntax. What really happened is that IPFS allows paths into "directories", and the Dhall inventor just created a file called not in the Prelude/Bool directory.

I call this the file path to expression path impedance mismatch (compare https://en.wikipedia.org/wiki/Object-relational_impedance_mismatch). The reason I call it this is that, just like translating SQL into objects, what we are really trying to do is import expressions, but we have to mediate our import operations through something designed to deal with files and directories.

So what am I proposing?

I'm proposing that we understand our Architect source code not as just a file of text, but as a structured document (syntax tree) that allows deep structured access (introspection and reflection) in a first class manner within the language. Doing this will not only help make our import system elegant, but will also help later IDE efforts. See "structure editors" (https://en.wikipedia.org/wiki/Structure_editor) for more information about this. Or these other links:

So when we write our expressions, the A that is assigned there is just a namespace-local alias. If you were to introspect what A really is, it should be a content hash of its contents.

A = Automaton {
  protocol = Protocol [
    -- ACTIONS
  ]
}

But how do you find what that content hash is? The interpreter tells you. Once you feed the above expression into the system, you can ask what A is. The interpreter may then tell you sha256:abc1234.... One can think of A as a local alias, and of the hash itself as a global name.
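One way an interpreter could derive such a hash from a syntax tree is Merkle style: each node's hash covers its own kind and label plus the hashes of its children, so every sub-expression gets its own address. The following Python sketch assumes a made-up Node layout and serialisation; nothing here is Architect's actual format, and comments such as -- ACTIONS are assumed not to be part of the tree.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """A syntax-tree node: a kind, an optional label, and child nodes."""
    kind: str
    label: str = ""
    children: Tuple["Node", ...] = ()


def content_hash(node: Node) -> str:
    """Hash a node from its kind, label, and its children's hashes."""
    h = hashlib.sha256()
    parts = [node.kind, node.label] + [content_hash(c) for c in node.children]
    for part in parts:
        data = part.encode("utf-8")
        # Length-prefix each part so different splits cannot collide.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return "sha256:" + h.hexdigest()


# Hypothetical tree for: Automaton { protocol = Protocol [] }
protocol = Node("Protocol")
automaton = Node("Automaton", children=(Node("field", "protocol", (protocol,)),))

print(content_hash(automaton))  # global name of the whole expression
print(content_hash(protocol))   # the nested Protocol is addressable too
```

Two structurally identical trees get the same hash regardless of which file they were written in, which is the property the global name relies on.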

Subsequent expressions can refer to the hash specifically.

let A = sha256:abc1234...
a = deploy A

We will probably need some sort of special syntax that says that this is a "global name" for an Architect expression. We could use a special constructor. We could also make use of the IPFS Multihash project to allow our hash algorithm to be changed in the future.
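A rough Python sketch of the alias versus global name split, with the hash algorithm carried inside the name so it can change later. The algorithm:digest text prefix is only in the spirit of Multihash; it is not the actual Multihash byte format, and ExpressionStore, global_name and the example source are hypothetical.

```python
import hashlib

# Algorithms this sketch knows about; the prefix selects one of them.
SUPPORTED = {"sha256": hashlib.sha256, "sha512": hashlib.sha512}


def global_name(source: str, algorithm: str = "sha256") -> str:
    """Build a self-describing name of the form '<algorithm>:<hex digest>'."""
    digest = SUPPORTED[algorithm](source.encode("utf-8")).hexdigest()
    return f"{algorithm}:{digest}"


class ExpressionStore:
    """Maps global names to expression source text."""

    def __init__(self):
        self._by_name = {}

    def put(self, source: str, algorithm: str = "sha256") -> str:
        name = global_name(source, algorithm)
        self._by_name[name] = source
        return name

    def get(self, name: str) -> str:
        algorithm, _, _digest = name.partition(":")
        if algorithm not in SUPPORTED:
            raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
        source = self._by_name[name]
        # Re-check the content against the name before handing it out.
        if global_name(source, algorithm) != name:
            raise ValueError("stored content does not match its name")
        return source


store = ExpressionStore()
source = "Automaton { protocol = Protocol [] }"
aliases = {"A": store.put(source)}  # A is only a local alias
print(aliases["A"])                 # the global name
```

The same source stored under sha512 gets a different global name, and both names keep resolving, so old references survive an algorithm change.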

Implementation

To implement this we probably have to integrate IPFS into the Architect language, such that every top-level expression can represent an addressable object. This may mean our expressions aren't really IPFS URIs, but we can borrow the idea.

Another important idea is how the P2P network of hashes is actually formed. It is formed through the Matrix Node network.

Your localhost is the first node. After that, each subsequent Node is also part of the network. If we imagine the single operator, their laptop is the first node. If you want to acquire expressions from other members of your company, then those other members will need to enter into the same Matrix node network. This ensures that you get a P2P sharing of code, but you can still keep your own code private or have company-local Architect expressions.
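The lookup order described here (local node first, then the nodes you have joined) can be sketched in Python. PeerNode, publish and resolve are hypothetical names, the peers are plain in-process objects rather than a real network, and the Matrix Node network itself is not modelled.

```python
import hashlib


def name_of(source: str) -> str:
    """Content address of an expression's source text."""
    return "sha256:" + hashlib.sha256(source.encode("utf-8")).hexdigest()


class PeerNode:
    """A node holding expressions locally and knowing some peers."""

    def __init__(self, label):
        self.label = label
        self.local = {}
        self.peers = []

    def publish(self, source: str) -> str:
        name = name_of(source)
        self.local[name] = source
        return name

    def resolve(self, name: str, _seen=None):
        """Look locally first, then ask peers; return None if nobody has it."""
        seen = _seen if _seen is not None else set()
        if id(self) in seen:
            return None  # already asked this node on this lookup
        seen.add(id(self))
        if name in self.local:
            return self.local[name]
        for peer in self.peers:
            source = peer.resolve(name, seen)
            # Never trust a peer blindly: the content must match the name.
            if source is not None and name_of(source) == name:
                self.local[name] = source  # keep a local copy
                return source
        return None


laptop = PeerNode("laptop")
colleague = PeerNode("colleague")
outsider = PeerNode("outsider")  # not part of the company network
laptop.peers.append(colleague)
colleague.peers.append(laptop)

name = colleague.publish("Automaton { protocol = Protocol [] }")
print(laptop.resolve(name))    # fetched from the colleague's node
print(outsider.resolve(name))  # None: the expression stays company-local
```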

For global sharing across all Matrix customers, one can imagine that your Node can bridge into a global namespace and acquire expressions from a centralised published source. Or we can take ideas from Usenet and have a distributed news feed (that is replicated on the web) to allow everybody to learn of new Architect expressions and their associated artifacts. One must be aware that just because you can acquire the expression, you may not have access to the Artifact.

One can also ask questions about where the state is stored, and whether it's possible to lose Architect expressions. State is stored on whichever node requires it. It is possible to lose an expression if nobody pins it. These same problems exist in IPFS. You can have Nodes of last resort to pin these expressions.
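How an unpinned expression gets lost can be shown with a toy Python store. This is a sketch of the general pin-and-collect idea only; NodeStore and its methods are hypothetical and do not mirror the IPFS API.

```python
import hashlib


class NodeStore:
    """Expressions held by one node; only pinned ones survive collection."""

    def __init__(self):
        self.blocks = {}
        self.pinned = set()

    def add(self, source: str) -> str:
        name = "sha256:" + hashlib.sha256(source.encode("utf-8")).hexdigest()
        self.blocks[name] = source
        return name

    def pin(self, name: str) -> None:
        if name not in self.blocks:
            raise KeyError(name)
        self.pinned.add(name)

    def collect_garbage(self):
        """Drop every unpinned expression and return the names removed."""
        removed = [name for name in self.blocks if name not in self.pinned]
        for name in removed:
            del self.blocks[name]
        return removed


last_resort = NodeStore()
kept = last_resort.add("Automaton { protocol = Protocol [] }")
lost = last_resort.add("Automaton { protocol = Protocol [ ping ] }")
last_resort.pin(kept)
removed = last_resort.collect_garbage()
```

After collection only the pinned expression remains on this node; if no other node held the second one, it is gone from the network.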

@CMCDragonkai
Member Author

Another key advantage is having a single universal & portable address, whether you are using it from the local machine or fetching it remotely. This avoids the problem with centralised package managers, where the address to fetch is always a remote address. That address doesn't work when the centralised package manager is down, or when your packages are not on the package manager and you want to acquire them from the local filesystem, but you don't want to use absolute/relative file paths because they are not portable.

@CMCDragonkai
Member Author

#8 (comment)

@CMCDragonkai
Member Author

The ability to refer to any expression over a content addressed network also requires us to resolve the filesystem to network impedance mismatch.

We want to avoid a Smalltalk situation where you have to enter all code into a single system image.

Instead Architect expressions should still be edited as text files; this makes them more portable and amenable to existing file editing conventions.

However this means we need some way of mapping Architect files into the Architect interpreter/orchestrator which understands Architect expressions as content addressed resources.

Here are some possibilities:

@CMCDragonkai
Member Author

Context for understanding why having networked expressions:

@CMCDragonkai
Member Author

Getting past the impedance mismatch between the filesystem domain and the content addressed network domain requires some isomorphic mapping. (For discussions on the domain of discourse, see the issues with navigating monorepos vs multirepos).

Imagine being able to parse expression files into a cyclic tree. Circular imports are ok as long as there isn't infinite recursive evaluation. This is ok when the language is lazy (laziness is always possible through lambda abstraction).
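To make the laziness point concrete, here is a small Python sketch where two "files" import each other. Each import is a thunk that is only forced on use, and the mutual reference sits inside a lambda, so evaluation terminates. The Thunk class and the module names are made up for illustration.

```python
class Thunk:
    """A delayed computation that is evaluated at most once."""

    def __init__(self, compute):
        self._compute = compute
        self._state = "pending"
        self._value = None

    def force(self):
        if self._state == "done":
            return self._value
        if self._state == "running":
            # Forced again while still being evaluated: unproductive cycle.
            raise RecursionError("infinite recursive evaluation")
        self._state = "running"
        try:
            self._value = self._compute()
        except BaseException:
            self._state = "pending"
            raise
        self._state = "done"
        return self._value


modules = {}

# "even" and "odd" refer to each other, but only inside lambda bodies.
modules["even"] = Thunk(
    lambda: lambda n: True if n == 0 else modules["odd"].force()(n - 1))
modules["odd"] = Thunk(
    lambda: lambda n: False if n == 0 else modules["even"].force()(n - 1))

# These two need each other's value immediately, so they can never finish.
modules["loop_a"] = Thunk(lambda: modules["loop_b"].force())
modules["loop_b"] = Thunk(lambda: modules["loop_a"].force())

is_even = modules["even"].force()
print(is_even(10))
```

Forcing loop_a raises RecursionError instead of hanging, which is the kind of cycle the interpreter would have to reject.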

Basically while the filesystem represents a hierarchy of files (disregarding hardlinks for portability reasons) where each file represents a unit of expression, the networked representation instead has a flat domain of expressions, each addressable by any other. To map between these two, the interpreter would have multiple points of entry. It can "import" and be made aware of files at multiple file paths, and it maintains a link between each file and an in-memory handle to an expression in the expression network.

Changes have to be isomorphically synchronised. This requires maintaining the syntactic form of the files themselves. Thus the order index tree may be useful here.
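A minimal Python sketch of the link between file paths and expression handles, kept in both directions so an edit to a file can be reflected in the flat expression domain. Workspace and import_file are hypothetical names, the file text is passed in directly instead of being read from disk, and the order index tree is not modelled.

```python
import hashlib


class Workspace:
    """Tracks which file paths point at which content-addressed expressions."""

    def __init__(self):
        self.handle_of_path = {}    # file path -> handle
        self.paths_of_handle = {}   # handle -> set of file paths
        self.source_of_handle = {}  # handle -> expression text

    def import_file(self, path: str, text: str) -> str:
        """Register a file, or re-register it after an edit."""
        old = self.handle_of_path.get(path)
        if old is not None:
            self.paths_of_handle[old].discard(path)
        handle = "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
        self.handle_of_path[path] = handle
        self.paths_of_handle.setdefault(handle, set()).add(path)
        self.source_of_handle[handle] = text
        return handle


ws = Workspace()
first = ws.import_file("services/a.arch", "Automaton { protocol = Protocol [] }")
second = ws.import_file("vendor/copy.arch", "Automaton { protocol = Protocol [] }")
edited = ws.import_file("services/a.arch", "Automaton { protocol = Protocol [ ping ] }")
```

Two files with the same text collapse to one handle, and editing one of them moves only that path to a new handle while the old expression stays addressable.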

@CMCDragonkai
Member Author

Nix converts open expressions (expressions with import ./a.nix, or references to the impure environment variable $NIX_PATH via <nixpkgs>) into closed expression ".drv" files in ATerm format before performing the actual build. That has implications for "staging" of determinism.
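The general idea of closing an open expression can be sketched in Python by replacing every relative-path import with the content hash of the (already closed) file it points to. This is an illustration of the concept only and is not how Nix instantiates .drv files; the regular expression, close_expression and the two example files are assumptions of this sketch, and circular imports would recurse forever here.

```python
import hashlib
import re

# Matches 'import ./some/path' in this toy syntax.
IMPORT = re.compile(r"import\s+(\./[\w./-]+)")


def close_expression(source: str, files: dict) -> str:
    """Rewrite path imports into content addresses, dependencies first."""

    def pin(match):
        path = match.group(1)
        closed_dep = close_expression(files[path], files)
        digest = hashlib.sha256(closed_dep.encode("utf-8")).hexdigest()
        return "import sha256:" + digest

    return IMPORT.sub(pin, source)


# Hypothetical working directory contents.
files = {
    "./a.nix": "x: x + 1",
    "./b.nix": "(import ./a.nix) 4",
}
closed = close_expression(files["./b.nix"], files)
print(closed)
```

The closed form no longer depends on what happens to be at ./a.nix at evaluation time, so the filesystem lookup happens in an earlier stage than the evaluation itself.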

@CMCDragonkai
Member Author

CMCDragonkai commented Aug 11, 2018

This has a relationship with "multistaged evaluation" and at what point purity is enforced. Basically how are parameters substituted, at what stage they are substituted, and whether old parameters can be overwritten with new parameters, and whether substitution involves IO at an earlier stage:

In one way, multistaging is about "code generation", in another way multistaging is about partial evaluation.
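The two views can be put side by side in a short Python sketch using a power function whose exponent is known in an early stage and whose base arrives later. The function names are made up; note that functools.partial only fixes the argument, whereas a real partial evaluator would also simplify the body.

```python
from functools import partial


def power(exponent: int, base: int) -> int:
    """Unstaged version: both parameters arrive at the same time."""
    result = 1
    for _ in range(exponent):
        result *= base
    return result


# Partial evaluation view: substitute the early parameter, keep a function.
cube_partial = partial(power, 3)


def generate_power(exponent: int) -> str:
    """Code generation view: stage one emits source text for stage two."""
    body = " * ".join(["base"] * exponent) or "1"
    return f"lambda base: {body}"


cube_source = generate_power(3)     # source text, with the loop already unrolled
cube_generated = eval(cube_source)  # stage two turns the text into a function

print(cube_source)
print(cube_partial(2), cube_generated(2))
```

In the generated version the exponent can no longer be overwritten after stage one, which is one concrete form of the question of when parameters are substituted.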

@CMCDragonkai CMCDragonkai added the research Requires research label Aug 14, 2018