This morning I've been thinking about the design of a shell, and specifically about the programming language aspect of it. This is going to take a lot of inspiration from nushell, but there are some things I want to be different as well.
## Input/stdin
In nushell, as far as I can tell, function/command inputs and stdin are two different things. This can make some things confusing/annoying, in my opinion. A lot of builtins work on the concept of getting data piped in, not passed. So you can do `[1,2,3] | first`, and get `1`, but `first [1,2,3]` is not allowed.
I think instead, something I'd like better is if functions did not directly specify taking something from stdin, and you'd just have a bunch of positional arguments (in addition to subcommands and flags). By default, you'd be able to omit the last positional argument and take it in from a pipeline. That would make the pipe operator basically work as function composition. Maybe it could just literally be function composition. I think it should still be a language builtin though, and not like purescript's `$` or `#`, which are defined functions/operators. Custom operators are probably off the table anyway.
But yeah, in purescript, you can do `[1,2,3] # head` to get the first element of the list (haskell uses `&` instead of `#`). But you can also do `head [1,2,3]` and it works exactly the same way (well, not exactly: `#` is a function, so the type inference happens slightly differently).
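To make the idea concrete, here's a rough sketch in Python (not the shell's actual syntax, obviously) of what "the pipeline fills the last positional argument" would mean; `pipe`, `cmd`, `first`, and `sub` are made-up stand-ins:

```python
from functools import reduce

def first(xs):
    # like purescript's `head`: the first element of a list
    return xs[0]

def sub(x, y):
    return x - y

def cmd(f, *args):
    # writing `sub 3` in the shell leaves the last positional
    # argument open; the pipeline supplies it
    return lambda piped: f(*args, piped)

def pipe(value, *stages):
    # `a | f | g` is then just function composition: g(f(a))
    return reduce(lambda acc, stage: stage(acc), stages, value)

# `[1,2,3] | first` gives 1, the same as `first [1,2,3]`
assert pipe([1, 2, 3], cmd(first)) == first([1, 2, 3]) == 1
# `10 | sub 3` fills the last argument: sub 3 10 = -7
assert pipe(10, cmd(sub, 3)) == -7
```

The point being that the pipe itself is dumb: every stage is just a function of one argument, and the "omit the last positional argument" rule is what turns a command into such a function.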
There should probably also be a way to have stdin fill a different positional argument. In purescript you can do `5 # \x -> sub x 3`, or `5 # (_ - 3)` if using an operator. I'd probably want slightly more ergonomic syntax for this, like a placeholder that can also be used for normal command calls. Maybe like `5 | sub $ 3`, where the `$` is acting like a placeholder. However, this gets tricky when you want to run something on the input value before sending it to the function, while also having higher order functions. What does the `$` mean here: `someVal | someF (add $ 1)`? Is `(add $ 1)` a function (lambda) that gets passed to `someF`, or does `$` take on the value of `someVal`, and the result of the pipeline is `someF` applied to `add someVal 1`? I guess if you wanted something like this, you'd have two options: either include it as part of the pipeline, like this: `someVal | add $ 1 | someF`, or make a lambda: `someVal | (\x -> someF (add x 1))`
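The two readings of `someVal | someF (add $ 1)` can be spelled out in Python; `apply_twice` and `plus_ten` are invented stand-ins for `someF` (note they'd have to have different types, which is exactly why the ambiguity matters):

```python
def add(x, y):
    return x + y

def apply_twice(f):
    # a stand-in higher-order `someF` that expects a function
    return f(f(0))

def plus_ten(x):
    # a stand-in first-order `someF` that expects a value
    return x + 10

some_val = 5

# reading 1: `(add $ 1)` is a lambda, `$` is its argument,
# so `someF` receives a function
reading_1 = apply_twice(lambda x: add(x, 1))

# reading 2: `$` takes on the piped value `someVal`, so `someF`
# receives the already-computed `add someVal 1`
reading_2 = plus_ten(add(some_val, 1))

assert reading_1 == 2   # apply_twice adds 1 twice to 0
assert reading_2 == 16  # (5 + 1) + 10
```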
## Speaking of lambdas
I think it would be nice to be able to make a lambda without having to name the argument. It's nicely suited for quick one-time shell commands, for example if you want to loop over something: `[1,2,3] | map { print "number: $" }` (here, `$` refers to the input to the lambda, which is surrounded by `{}`, and `$` would get substituted for the input value in the string). I think nushell doesn't quite have this? You can say `$in` to refer to stdin, but that's not the same as function input, so I think to get something similar to the last example, you'd have to do `[1,2,3] | each {|n| print $"number: ($n)"}` or something. There might be a way to do this without naming `n`, but I'm not sure.
So, if you wanted to have a `$` that acts as the input, it'd need to be clear what it's the input of. Maybe `{}` could introduce a lambda, allowing you to use `$` in its body to refer to the input of that lambda. If `$` is used outside a lambda, it acts as the stdin of the program or of that segment of the pipeline. So `someVal | someF (add $ 1)` is equivalent to `someF (add someVal 1)`. Oough, what about `{ someVal | someF (add $ 1) }`? Does the `$` scope to the lambda or the pipeline? Surely the pipeline is what makes the most sense, but it does make it a little bit tricky and maybe not that intuitive. This will need some more thinking about.
## But what about streaming?
Streaming data is very important in shells. You can for example do `find . | grep .nu` to find all files with `.nu` in their name that are in or under the current directory. But it will print these as it finds them, because `find` streams its output to stdout, and `grep` streams its input and output, so that if something `find` outputs matches `.nu`, it will be printed right away, and you don't need to wait for the entire operation to finish.
My idea is for streams to just be a data type, so any input parameter can be a stream. Let's say you had an external program `count` that just prints `123` to stdout, with a 1 second delay between each character. You could make a function `interleave` that takes in two streams and interleaves them character by character, outputting a stream as well. You could write `count | interleave 321`, and you'd get something like `31..22..13` on the output stream, where `..` indicates one second of delay. `interleave 321 (count)` would be equivalent. (Not sure if this parenthesis syntax would be best, but generally it's kinda hard to distinguish commands from arguments when you have raw strings, which are kinda necessary for a shell. Note that `()` doesn't mean I'm running `count` in a subshell, but that in this case, since it's an external command, I end up spawning a process and attaching the pipe to the stream input of `interleave`'s second argument. I think that's how this should/would work, anyway.) I guess `321` here acts as a string literal, which is automatically turned into a stream before being sent to `interleave`, or something like that.
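The `interleave` behavior can be sketched with Python generators (no real timing here, just the character-by-character interleaving; `count`, `as_stream`, and `interleave` are all hypothetical names):

```python
def count():
    # stand-in for the external `count` program: yields "1", "2",
    # "3" one character at a time (the 1-second delays omitted)
    yield from "123"

def as_stream(value):
    # a plain string argument like `321` gets wrapped into a
    # character-at-a-time stream automatically
    yield from value

def interleave(a, b):
    # pull one character from each stream in turn; a character
    # can be emitted as soon as it arrives on either side
    for x, y in zip(a, b):
        yield x
        yield y

out = "".join(interleave(as_stream("321"), count()))
assert out == "312213"  # i.e. 31..22..13 with the delays removed
```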
With this in place, there's nothing in the way of streaming other things than strings. Wanna stream a list of numbers? A table? Sure, why not! I'm wondering how to denote this though. Should some types just have streaming support (like `string`, `list<a>`, `table`), and you could have a `readline` that works on strings, and just grabs the next line that's available? Or should there be a `string-stream` type, which is what `readline` works on? Or maybe it should be `stream<string>`, but now it seems like you're streaming multiple strings, when actually you're just streaming one string. Denoting it like `stream<char>` would be weird too, because you're not necessarily getting one char at a time.
Maybe string streams and list streams are fundamentally kinda different? With string streams, you can either read everything as it comes in and handle it then, or you can read each line as it comes in, or something. List streams, on the other hand, are kinda like iterators: you're probably just going to be consuming one item at a time. Tables could be list streams as well. I think in nushell, tables are just lists of records. But what if it was slightly different, and a table stream contained all the column names upfront, and then you'd stream the rows?
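A quick Python sketch of that table-stream shape (the names and the `(columns, rows)` pairing are just one possible encoding): the column names are available immediately, while the rows come from a generator that could be fed by a pipe.

```python
def table_stream():
    # column names are known upfront...
    columns = ["name", "size"]
    def rows():
        # ...while rows trickle in later (e.g. from a pipe)
        yield ["foo.nu", 120]
        yield ["bar.nu", 64]
    return columns, rows()

cols, rows = table_stream()
assert cols == ["name", "size"]  # usable before any row arrives
# a consumer can still recover the nushell-style list of records:
records = [dict(zip(cols, r)) for r in rows]
assert records[0] == {"name": "foo.nu", "size": 120}
```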
I think the big question here is whether types like `string` and `list<a>` should support streaming by default, or if there should be dedicated `string-stream` and `iterator<a>` types, for example. If they support streaming by default, that could lead to kinda weird behavior, I think. Imagine you do this:
```
# let's say $str is a string
let line = readline $str
let my-str = $str + "hi"
```
What's the value of `$my-str` here? Does it include the line we read earlier, or not? Or maybe this is still streaming, so we'll just get `hi` appended onto the end of this stream. I couldn't actually think of any string operations that wouldn't be possible to do on a string stream, other than like, sorting it in some way, I guess. But in general, this leads to the problem that you don't really know whether your string is a stream anymore or not. It would be annoying to write a function, and then at one point accidentally use a function that only works on non-streamed strings, so your whole output is no longer a stream. It would be nicer to have that be typechecked, but then have most string functions work on both streams and non-streams.
When defining a function, you could maybe denote whether you want a `string-stream` or just a `string`, and if it's the latter but you get passed a stream, it just buffers the whole stream for you before calling your function? But that seems kinda annoying, because then if you don't really think about it, and declare your function as accepting a string, but you're actually just using streaming-compatible functions, you'll be buffering for no reason. So maybe it should just automatically stream for you when it can, and provide a function that just buffers the whole string, if that's what you want.
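In Python terms, that "buffer only when you explicitly ask" idea might look like this (the `buffer` name and the chunked representation are assumptions):

```python
def chunks():
    # a string-stream: one logical string, arriving in pieces
    yield "hel"
    yield "lo, wo"
    yield "rld"

def buffer(stream):
    # the explicit opt-in: pull the whole stream into memory
    return "".join(stream)

def upper(stream):
    # a streaming-compatible string operation: works chunk by
    # chunk, so the output is still a stream
    for chunk in stream:
        yield chunk.upper()

# streaming all the way through, no buffering anywhere:
assert "".join(upper(chunks())) == "HELLO, WORLD"
# buffering only at the point where you need the whole string:
assert buffer(chunks()) == "hello, world"
```

Something like sorting the string's characters would be the odd one out: it has no choice but to call `buffer` first.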
I guess it's kinda like `Aff` (the async effect monad) in purescript. There's no type information or anything that tells you whether a given action is async or sync. But internally I think it's two different constructors (one with a callback, and one which just runs the sync action directly). Similarly, with streams, you don't know whether your input is coming in chunks or if it's all in memory already. I still don't know if this is the right thing to do either.
Finally, external programs would have the type `string-stream -> string-stream`, representing stdin and stdout. There should probably be a special builtin that lets you run one while grabbing stderr too, maybe by making the type into `string-stream -> {stdout: string-stream, stderr: string-stream}`.
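As a rough analogy, an external process really is close to a `string-stream -> string-stream` function already; here's a minimal Python sketch using `subprocess` (the `run_external` name is invented, and a Python one-liner stands in for a real external command):

```python
import subprocess
import sys

def run_external(argv, stdin_stream):
    # an external program as a stream -> stream function: feed
    # chunks into stdin, yield chunks back from stdout.
    # (writing all of stdin before reading is fine for small
    # inputs; a real shell would pump both ends concurrently)
    proc = subprocess.Popen(
        argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    for chunk in stdin_stream:
        proc.stdin.write(chunk)
    proc.stdin.close()
    for line in proc.stdout:
        yield line
    proc.wait()

# a stand-in external command that uppercases its stdin
upcase = [sys.executable, "-c",
          "import sys; sys.stdout.write(sys.stdin.read().upper())"]
out = "".join(run_external(upcase, iter(["hi ", "there\n"])))
assert out == "HI THERE\n"
```

The stderr-grabbing builtin would then just add `stderr=subprocess.PIPE` and hand back a second stream alongside stdout.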