Introduce custom string literal syntax? #328

adam-antonik · 2020-01-27T12:20:08Z

Firstly, Hobbes is great, thanks.
Working with hobbes in an interactive environment, Unqualifiers are the natural way to load data, but typing

stuff = (readCsv :: (CSV "file name.csv" _) => _)

or whatever repeatedly feels like too much boilerplate. I'd like to suggest the ability to register custom literal prefixes to allow one to write

stuff = csv:"file name.csv"

and have that be interpreted as the first expression. Having hacked this in locally, it does seem to work rather nicely. It would also, for example, allow printf or Python-style format strings:

:t printf:"%s %010d"
([char] * double) -> [char]
printf:"%s %010d"("hi", 123)
hi 0000000123

kthielen · 2020-01-27T14:18:00Z

I like your idea, and I agree with you that this syntax is awkward. I think that most people want to be able to avoid talking about types if possible (hence the popularity of type inference), and in this case it definitely seems possible (especially with the _ _) => _ that reads like "blah blah blah"). Avoiding boilerplate is really important, and this particular pattern comes up a lot (loading structured data files and making connections to remote processes are also examples in common use).

I'm not sure if you chose the printf example deliberately for this purpose (it's a great example), but it's one of the common arguments for dependent types, like in Cayenne (see the example here):
https://en.wikipedia.org/wiki/Cayenne_(programming_language)

Basically, the syntax that you describe works well when the argument(s) are "clearly constant", and by making it a syntax distinguished from ordinary function calls you might avoid the issue of dependent types because you could justify having some kind of "illegal argument: not a constant" error message (or even a vanilla parse error).

When arguments aren't "clearly constant" (e.g. variables whose values are determined later) then, if we were very principled about it, they would create a new "stage" in compilation. We stop talking about "parse-time" versus "type-inference-time" versus "compile-time" versus "run-time" and instead start talking about "N-time", where N is one of a (finite?) set of stages at which we might need the full compiler (which is fine; this project is intentionally an embedded JIT compiler for this reason).

Does that make sense? So if you don't know the format string to printf up front but instead read it from a file, then you won't know what the type should be until you've got the file to read (which you might deliberately want to stage later than "compile-time").
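To make the staging point concrete, here is a minimal sketch in Python rather than hobbes. It computes the argument types that a format string demands, which can only happen once the string itself is known. The names `signature_of` and `check_args` and the conversion table are assumptions made for illustration; they are not hobbes's printf rules.

```python
import re

# Hypothetical mapping from conversion characters to argument types.
# This table is an assumption for illustration, not hobbes's behavior.
CONVERSIONS = {"s": str, "d": int, "f": float}

# A literal "%%", or flags, width, optional precision and one conversion char.
SPEC = re.compile(r"%(?:%|[-+ 0#]*\d*(?:\.\d+)?([a-zA-Z]))")

def signature_of(fmt):
    """Return the tuple of argument types that a format string demands."""
    types = []
    for m in SPEC.finditer(fmt):
        conv = m.group(1)
        if conv is None:          # matched a literal "%%"
            continue
        if conv not in CONVERSIONS:
            raise ValueError("unsupported conversion: %" + conv)
        types.append(CONVERSIONS[conv])
    return tuple(types)

def check_args(fmt, args):
    """Reject arguments whose types disagree with the format string."""
    expected = signature_of(fmt)
    return len(args) == len(expected) and all(
        isinstance(a, t) for a, t in zip(args, expected))
```

With a constant string, `signature_of` can run once, early. With a string read from a file, the same call has to wait until the file has been read, which is the later "stage" described above.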

Sorry if this is more nuance than you wanted to think about, and if your proposal is to support just a syntax for these macros on constant strings that's totally fine and maybe we should do that (can't hurt, it's an easy special case in a dependently-typed setting anyway).

But I wanted to give you an idea of some of the thinking I've done in this general area because I'm curious to know what you think of it. Dependent types can be a little "mind bending" for some people (at least, they have been for me) but there are a lot of useful things that you can do with them. Actually that's an understatement, you can get to a foundation for all mathematics this way (https://en.wikipedia.org/wiki/Homotopy_type_theory) but it's also an active area of research.

What do you think? Do we take the blue pill and have a special syntax accepting just string literals, or do we take the red pill? :)

adam-antonik · 2020-01-27T14:58:23Z

No, printf wasn't a coincidence. I'm fully on board the dependent-type bandwagon, and I see from the code comments that this is where you think this needs to go too. However, I don't know when they're going to land in hobbes, and I certainly wasn't going to be able to make that happen in an evening. Adding an extra lexer token and a little logic to deal with it, though, greatly improves the ergonomics from my point of view, and it seems like a reasonable price to pay in the meantime.

kthielen · 2020-01-27T15:35:30Z

You said that you made a prototype. Is it something that could be made into a PR? What does it look like to introduce one of these macros? Is it something that gets hard-coded in the parser?

adam-antonik · 2020-01-27T17:59:56Z

I did say hack. I haven't got a full implementation for a PR, just a lexer rule introducing a new token

[a-zA-Z_][a-zA-Z0-9_]*/:\" { SAVE_STR;  return TSTRINGLIT; }

and in yacc under l6expr

| "stringlit" ":" "stringV" { if (Expr * decoded = strlit(*$1, *$3, m(@1, @3))) {$$ = decoded;} else {yyerror("unknown literal");};}

where strlit is a function that hard-codes a handful of prefixes and either returns something like

ConstraintPtr c(new Constraint("CSV", list(TString::make(val), TVar::make("q"))));
return new Assump(ExprPtr(varCtorFn("readCsv", la)), QualTypePtr(new QualType(list(c), TVar::make("q"))),la);

or a lovely nullptr if not there. I think it should be possible to replace this with a call to a single unqualifier of the form (literal :: (StringLiteral _) => _), on which one can register maps from a prefix to a pair of (variable name, *Unqualifier), and which would then forward its methods to the registered handlers, but I haven't got around to trying to do it properly yet.
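The registration idea can be sketched as a small table from prefix to handler. This is a Python model only, not hobbes code: `register_literal` and `decode_literal` are hypothetical names, and the handler simply produces the long-hand expression text instead of building an expression tree.

```python
# Hypothetical registry: literal prefix -> handler taking the string body.
_handlers = {}

def register_literal(prefix, handler):
    """Associate a prefix such as "csv" with a handler function."""
    _handlers[prefix] = handler

def decode_literal(prefix, body):
    """Dispatch prefix:"body"; return None for an unknown prefix."""
    handler = _handlers.get(prefix)
    return handler(body) if handler is not None else None

# Example handler: rewrite csv:"f" into the long-hand expression text.
register_literal(
    "csv", lambda body: '(readCsv :: (CSV "%s" _) => _)' % body)
```

Returning `None` for an unregistered prefix mirrors the nullptr case, where the parser reports "unknown literal".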

kthielen · 2020-01-27T18:57:52Z

Fair enough, we could probably make that a little bit more convenient for users (to avoid hacking on the parser each time). I agree with you that this is worthwhile. If you want to work it up to a PR, that'd be awesome. If you'd rather not, I could take a look.

adam-antonik · 2020-01-27T19:45:04Z

I'd need to get sign-off for a PR, but I can have a look at that.

adam-antonik · 2020-01-27T21:12:35Z

BTW, the ww project looks very nice, although I'm still undecided as to whether future-me truly trusts past-me with an extensible grammar. On the other hand I do appreciate how hackable the hobbes code base is, for instance I've also redefined "if" as

class If c i e r | c i e -> r where
    (if) :: (c * i * e) -> r

Which wasn't difficult, but would have been cleaner with an extensible (and rebindable) grammar.

kthielen · 2020-01-28T00:34:13Z

Yes, that same uncertainty has made it hard to convince others that this way would be worthwhile (https://github.com/kthielen/ww for anyone reading this thread outside the context of the parser issue, closed now). Also the eventual prospect of IDE support is much more complicated that way (as far as I know, all common IDE tooling assumes a fixed grammar).

Your type class for if is interesting (I guess also complicated by eager evaluation forcing both branches, unlike the lazy setting in Haskell).

Out of curiosity, if you don't mind, did you set up this If c i e r constraint to allow i and e to convert into r? I've had the thought before that we could replace unification errors with conversion into a common type, to transparently handle this sort of thing (some PLs have limited support for this idea like e.g. C/C++). But I worried that this might make error messages even more inscrutable than they are already (several people have complained about the difficulty of deciphering error messages).

But there's also a lot of value in automated conversion especially for structural types. I set up the Convert class to represent this directed conversion. We have used it for automating message versioning decisions (as e.g. a sending process sending {x:int,y:int,z:int} to a process that expects {z:long,x:double} can have such a conversion function automatically derived). It's just a little annoying to have to explicitly write convert(x) to get this behavior where we'd otherwise rather just write x.
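As a rough illustration of deriving such a conversion from the two record types, here is a Python sketch. The `WIDENINGS` table and `derive_convert` are assumptions for this example and do not describe how the hobbes Convert class is implemented.

```python
# Hypothetical widening rules: (from, to) pairs a field may convert across.
WIDENINGS = {("int", "long"): int, ("int", "double"): float,
             ("long", "double"): float}

def derive_convert(src, dst):
    """Build a record converter from field types, or return None.

    src and dst map field names to type names, e.g. {"x": "int"}.
    """
    steps = {}
    for field, want in dst.items():
        have = src.get(field)
        if have == want:
            steps[field] = lambda v: v
        elif (have, want) in WIDENINGS:
            steps[field] = WIDENINGS[(have, want)]
        else:
            return None           # missing field or no known conversion
    return lambda record: {f: fn(record[f]) for f, fn in steps.items()}
```

For the record types in the example above, `derive_convert({"x": "int", "y": "int", "z": "int"}, {"z": "long", "x": "double"})` yields a function that drops `y` and widens `x` and `z`.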

Anyway, just another dimension in PL design space I guess. My intuition is to make these things we've discussed optional but there are also folks who are more conservative.

adam-antonik · 2020-01-28T10:00:39Z

I set up the If typeclass to allow broadcasting of pointwise expressions over collections of points, but one could equally use it to allow conversions. I wouldn't recommend it at the moment as I haven't worked out how to get

if true then 1 else newPrim()

to work, that is, for the compiler to notice that "If bool int b a" can be unified with "If bool a a a", for which an instance (the original if op) exists.

kthielen · 2020-01-28T14:35:50Z

It sounds like you've got an unbound type variable at the type inference step. In your example, since you've changed the meaning of "if" so that the two branches don't necessarily need to have the same type, then 1 isn't enough to determine the type of newPrim() (ie: the 1 is inferred to have type int but the newPrim() is inferred to have type a).

So you're in that state, and then you go into type class instance selection -- this is where we might get unstuck, since refine in constraints can make decisions for type inference. As implemented here, functional dependencies (fundeps) are the way that type class constraints will decide how to make decisions for type inference. Your fundep says c i e -> r which means "for unique types c, i, and e, there is one unique r". In your expression if true then 1 else newPrim() you do have unique c and i but not e and so this constraint can't make progress.

You might say "but I do have an instance If bool a a a that should say that r = e = i". But unfortunately that's actually not what the instance says (given your fundep). It actually says "if e = i then r = e (or equally r = i)".

Does that make sense? Basically, it's just the fundeps that say how the class can advance type inference. Patterns in instance heads are "one way" (not unifying).
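The difference between one-way matching under the fundep and two-way unification can be shown with a toy model in Python. This is not the hobbes implementation: types are plain strings, `'a` marks a variable in an instance head, `?e` marks a type that inference has not determined yet, and both function names are hypothetical.

```python
def is_pattern_var(t):
    return t.startswith("'")      # variable in an instance head, e.g. 'a

def is_unknown(t):
    return t.startswith("?")      # not-yet-inferred type in a constraint

def refine_by_fundep(head, constraint, n_inputs=3):
    """Model the fundep c i e -> r: match inputs one-way, then read off r."""
    binding = {}
    for p, c in zip(head[:n_inputs], constraint[:n_inputs]):
        if is_unknown(c):
            return None           # an input is undetermined: no progress
        if is_pattern_var(p):
            if binding.setdefault(p, c) != c:
                return None       # e.g. 'a already bound to another type
        elif p != c:
            return None
    out = head[n_inputs]
    return binding.get(out, out)

def unify(head, constraint):
    """Two-way alternative: may also assign unknowns in the constraint."""
    subst = {}
    def walk(t):
        while t in subst:
            t = subst[t]
        return t
    for p, c in zip(head, constraint):
        p, c = walk(p), walk(c)
        if p == c:
            continue
        if is_pattern_var(p) or is_unknown(p):
            subst[p] = c
        elif is_unknown(c):
            subst[c] = p
        else:
            return None
    return {k: walk(k) for k in subst}
```

With the head `("bool", "'a", "'a", "'a")`, the constraint `("bool", "int", "?e", "?r")` gets nothing from `refine_by_fundep` because `e` is still unknown, while `unify` would assign `int` to both `?e` and `?r`. Once `e` is known to be `int`, `refine_by_fundep` returns `int` for `r`.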

I'm not trying to say how it must be, just how it is. This is also a reason I made the Unqualifier interface distinct from type classes (ie: type classes are just one kind of Unqualifier). It's possible to define alternate ways to resolve constraints other than type classes.

FWIW, on the conversion point and how it might be related to what you're trying to do, maybe this helps:

$ cat q.hob
instance Convert a [:a|n:] where
    convert p = [p | _ <- newPrim()::r]::r
                                                                                                                                                                        
$ hi -s q.hob
> t = newPrim()::[:int|42:]
> selectInto([1..100], t)
> t
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42]
> if true then 42 else t
stdin:1,1-2: Cannot unify types: int != [:int|42L:]
1 if true then 42 else t
> if true then convert(42) else t
[42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42]
>

Here the convert is explicit to make it work, but you can see that I just put it where the unification failure was. We might possibly say that rather than raising an error where there's a unification failure, we could insert a convert patch. If we did that, then this if true then 42 else t expression would work (given the Convert t [:t|n:] definition). But it'd require some experimentation/research, we wouldn't want to produce less comprehensible error messages than the ones we started with.
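A minimal Python sketch of that idea, under loud assumptions: types are strings, `CONVERT_INSTANCES` is a hypothetical table standing in for the available Convert instances, and `reconcile` is an invented name. It only decides which branch would be wrapped in convert and keeps the original error when no conversion applies.

```python
# Hypothetical table of available conversions: (from, to) pairs.
CONVERT_INSTANCES = {("int", "[:int|42:]")}

def reconcile(then_type, else_type):
    """Return (result_type, branch_to_wrap) or raise on a real mismatch.

    branch_to_wrap is None, "then", or "else".
    """
    if then_type == else_type:
        return then_type, None
    if (then_type, else_type) in CONVERT_INSTANCES:
        return else_type, "then"  # wrap the then-branch in convert
    if (else_type, then_type) in CONVERT_INSTANCES:
        return then_type, "else"  # wrap the else-branch in convert
    raise TypeError("Cannot unify types: %s != %s" % (then_type, else_type))
```

Because the fallback is the same unification error as before, a scheme like this would not by itself make the failing case any harder to read, although real error reporting would need more care.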

adam-antonik · 2020-05-24T18:44:44Z

I withdraw this proposal. I should be using quoted strings as arguments in my own unqualifiers to get something simple (see the printf example in #362), and with that same PR I can write a function that takes a quoted string and wires up e.g. inputFile if I so wanted.

kthielen · 2020-05-24T20:06:33Z

Let's keep it open as a use-case for that PR then, good idea. There's a small amount of boilerplate around things like inputFile and connection, and it'd be great to eliminate that.
