This morning I've been thinking about the design of a shell, and specifically about the programming language aspect of it. This is going to take a lot of inspiration from nushell, but there are some things I want to be different as well.
## Input/stdin
In nushell, as far as I can tell, function/command inputs and stdin are two different things. This can make some things confusing/annoying, in my opinion. A lot of builtins work on the concept of getting data piped in, not passed. So you can do `[1,2,3] | first`, and get `1`, but `first [1,2,3]` is not allowed.
I think instead, something I'd like better is if functions did not directly specify taking something from stdin, and you'd just have a bunch of positional arguments (in addition to subcommands and flags). By default, you'd be able to omit the last positional argument and take it in from a pipeline. That would make the pipe operator basically work as function composition. Maybe it could just literally be function composition. I think it should still be a language builtin though, and not like purescript's `$` or `#`, which are defined functions/operators. Custom operators are probably off the table anyway.
But yeah, in purescript, you can do `[1,2,3] # head` to get the first element of the list (haskell uses `&` instead of `#`). But you can also do `head [1,2,3]` and it works exactly the same way (well, not exactly: `#` is a function, so the type inference happens slightly differently).
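To make the idea concrete, here's a rough sketch in Python (not the shell's actual syntax, obviously) of what "the pipeline fills the last positional argument" would mean; `pipe`, `cmd`, `first`, and `sub` are made-up stand-ins:

```python
from functools import reduce

def first(xs):
    # like purescript's `head`: the first element of a list
    return xs[0]

def sub(x, y):
    return x - y

def cmd(f, *args):
    # writing `sub 3` in the shell leaves the last positional
    # argument open; the pipeline supplies it
    return lambda piped: f(*args, piped)

def pipe(value, *stages):
    # `a | f | g` is then just function composition: g(f(a))
    return reduce(lambda acc, stage: stage(acc), stages, value)

# `[1,2,3] | first` gives 1, the same as `first [1,2,3]`
assert pipe([1, 2, 3], cmd(first)) == first([1, 2, 3]) == 1
# `10 | sub 3` fills the last argument: sub 3 10 = -7
assert pipe(10, cmd(sub, 3)) == -7
```

The point being that the pipe itself is dumb: every stage is just a function of one argument, and the "omit the last positional argument" rule is what turns a command into such a function.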
There should probably also be a way to have stdin fill a different positional argument. In purescript you can do `5 # \x -> sub x 3`, or `5 # (_ - 3)` if using an operator. I'd probably want slightly more ergonomic syntax for this, like a placeholder that can also be used for normal command calls. Maybe like `5 | sub $ 3`, where the `$` is acting like a placeholder. However, this gets tricky when you want to run something on the input value before sending it to the function, while also having higher order functions. What does the `$` mean here: `someVal | someF (add $ 1)`? Is `(add $ 1)` a function (lambda) that gets passed to `someF`, or does `$` take on the value of `someVal`, and the result of the pipeline is `someF` applied to `add someVal 1`? I guess if you wanted something like this, you'd have two options: either include it as part of the pipeline, like this: `someVal | add $ 1 | someF`, or make a lambda: `someVal | (\x -> someF (add x 1))`
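The two readings of `someVal | someF (add $ 1)` can be spelled out in Python; `apply_twice` and `plus_ten` are invented stand-ins for `someF` (note they'd have to have different types, which is exactly why the ambiguity matters):

```python
def add(x, y):
    return x + y

def apply_twice(f):
    # a stand-in higher-order `someF` that expects a function
    return f(f(0))

def plus_ten(x):
    # a stand-in first-order `someF` that expects a value
    return x + 10

some_val = 5

# reading 1: `(add $ 1)` is a lambda, `$` is its argument,
# so `someF` receives a function
reading_1 = apply_twice(lambda x: add(x, 1))

# reading 2: `$` takes on the piped value `someVal`, so `someF`
# receives the already-computed `add someVal 1`
reading_2 = plus_ten(add(some_val, 1))

assert reading_1 == 2   # apply_twice adds 1 twice to 0
assert reading_2 == 16  # (5 + 1) + 10
```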
## Speaking of lambdas
I think it would be nice to be able to make a lambda without having to name the argument. It's nicely suited for quick one-time shell commands, for example if you want to loop over something: `[1,2,3] | map { print "number: $" }` (here, `$` refers to the input to the lambda, which is surrounded by `{}`, and `$` would get substituted for the input value in the string). I think nushell doesn't quite have this? You can say `$in` to refer to stdin, but that's not the same as function input, so I think to get something similar to the last example, you'd have to do `[1,2,3] | each {|n| print $"number: ($n)"}` or something. There might be a way to do this without naming `n`, but I'm not sure.
So, if you wanted to have a `$` that acts as the input, it'd need to be clear what it's the input of. Maybe `{}` could introduce a lambda, allowing you to use `$` in its body to refer to the input of that lambda. If `$` is used outside a lambda, it acts as the stdin of the program or of that segment of the pipeline. So `someVal | someF (add $ 1)` is equivalent to `someF (add someVal 1)`. Oough, what about `{ someVal | someF (add $ 1) }`? Does the `$` scope to the lambda or the pipeline? Surely the pipeline is what makes the most sense, but it does make it a little bit tricky and maybe not that intuitive. This will need some more thinking about.
## But what about streaming?
Streaming data is very important in shells. You can for example do `find . | grep .nu` to find all files with `.nu` in their name that are in or under the current directory. But it will print these as it finds them, because `find` streams its output to stdout, and `grep` streams its input and output, so that if something `find` outputs matches `.nu`, it will be printed right away, and you don't need to wait for the entire operation to finish.
My idea is for streams to just be a data type, so any input parameter can be a stream. Let's say you had an external program `count` that just prints `123` to stdout, with a 1 second delay between each character. You could make a function `interleave` that takes in two streams and interleaves them character by character, outputting a stream as well. You could write `count | interleave 321`, and you'd get something like `31..22..13` on the output stream, where `..` indicates one second of delay. `interleave 321 (count)` would be equivalent. (Not sure if this parenthesis syntax would be best, but generally it's kinda hard to distinguish commands from arguments when you have raw strings, which are kinda necessary for a shell. Note that `()` doesn't mean I'm running `count` in a subshell, but that in this case, since it's an external command, I end up spawning a process and attaching the pipe to the stream input of `interleave`'s second argument. I think that's how this should/would work, anyway.) I guess `321` here acts as a string literal, which is automatically turned into a stream before being sent to `interleave`, or something like that.
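The `interleave` behavior can be sketched with Python generators (no real timing here, just the character-by-character interleaving; `count`, `as_stream`, and `interleave` are all hypothetical names):

```python
def count():
    # stand-in for the external `count` program: yields "1", "2",
    # "3" one character at a time (the 1-second delays omitted)
    yield from "123"

def as_stream(value):
    # a plain string argument like `321` gets wrapped into a
    # character-at-a-time stream automatically
    yield from value

def interleave(a, b):
    # pull one character from each stream in turn; a character
    # can be emitted as soon as it arrives on either side
    for x, y in zip(a, b):
        yield x
        yield y

out = "".join(interleave(as_stream("321"), count()))
assert out == "312213"  # i.e. 31..22..13 with the delays removed
```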
With this in place, there's nothing in the way of streaming other things than strings. Wanna stream a list of numbers? A table? Sure, why not! I'm wondering how to denote this though. Should some types just have streaming support (like `string`, `list<a>`, `table`), and you could have a `readline` that works on strings, and just grabs the next line that's available? Or should there be a `string-stream` type, which is what `readline` works on? Or maybe it should be `stream<string>`, but now it seems like you're streaming multiple strings, when actually you're just streaming one string. Denoting it like `stream<char>` would be weird too, because you're not necessarily getting one char at a time.
Maybe string streams and list streams are fundamentally kinda different? With string streams, you can either read everything as it comes in and handle it then, or you can read each line as it comes in, or something. List streams, on the other hand, are kinda like iterators: you're probably just going to be consuming one item at a time. Tables could be list streams as well. I think in nushell, tables are just lists of records. But what if it was slightly different, and a table stream contained all the column names upfront, and then you'd stream the rows?
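A quick Python sketch of that table-stream shape (the names and the `(columns, rows)` pairing are just one possible encoding): the column names are available immediately, while the rows come from a generator that could be fed by a pipe.

```python
def table_stream():
    # column names are known upfront...
    columns = ["name", "size"]
    def rows():
        # ...while rows trickle in later (e.g. from a pipe)
        yield ["foo.nu", 120]
        yield ["bar.nu", 64]
    return columns, rows()

cols, rows = table_stream()
assert cols == ["name", "size"]  # usable before any row arrives
# a consumer can still recover the nushell-style list of records:
records = [dict(zip(cols, r)) for r in rows]
assert records[0] == {"name": "foo.nu", "size": 120}
```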
I think the big question here is whether types like `string` and `list<a>` should support streaming by default, or if there should be dedicated `string-stream` and `iterator<a>` types, for example. If they support streaming by default, that could lead to kinda weird behavior, I think. Imagine you do this:
```
# let's say $str is a string
let line = readline $str
let my-str = $str + "hi"
```
What's the value of `$my-str` here? Does it include the line we read earlier, or not? Or maybe this is still streaming, so we'll just get `hi` appended onto the end of this stream. I couldn't actually think of any string operations that wouldn't be possible to do on a string stream, other than like, sorting it in some way, I guess. But in general, this leads to the problem that you don't really know whether your string is a stream anymore or not. It would be annoying to write a function, and then at one point accidentally use a function that only works on non-streamed strings, so your whole output is no longer a stream. It would be nicer to have that be typechecked, but then have most string functions work on both streams and non-streams.
When defining a function, you could maybe denote whether you want a `string-stream` or just a `string`, and if it's the latter but you get passed a stream, it just buffers the whole stream for you before calling your function? But that seems kinda annoying, because then if you don't really think about it, and declare your function as accepting a string, but you're actually just using streaming-compatible functions, you'll be buffering for no reason. So maybe it should just automatically stream for you when it can, and provide a function that just buffers the whole string, if that's what you want.
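In Python terms, that "buffer only when you explicitly ask" idea might look like this (the `buffer` name and the chunked representation are assumptions):

```python
def chunks():
    # a string-stream: one logical string, arriving in pieces
    yield "hel"
    yield "lo, wo"
    yield "rld"

def buffer(stream):
    # the explicit opt-in: pull the whole stream into memory
    return "".join(stream)

def upper(stream):
    # a streaming-compatible string operation: works chunk by
    # chunk, so the output is still a stream
    for chunk in stream:
        yield chunk.upper()

# streaming all the way through, no buffering anywhere:
assert "".join(upper(chunks())) == "HELLO, WORLD"
# buffering only at the point where you need the whole string:
assert buffer(chunks()) == "hello, world"
```

Something like sorting the string's characters would be the odd one out: it has no choice but to call `buffer` first.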
I guess it's kinda like `Aff` (the async effect monad) in purescript. There's no type information or anything that tells you whether a given action is async or sync. But internally I think it's two different constructors (one with a callback, and one which just runs the sync action directly). Similarly, with streams, you don't know whether your input is coming in chunks or if it's all in memory already. I still don't know if this is the right thing to do either.
Finally, external programs would have the type `string-stream -> string-stream`, representing stdin and stdout. There should probably be a special builtin that lets you run one while grabbing stderr too, maybe by making the type into `string-stream -> {stdout: string-stream, stderr: string-stream}`.
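As a rough analogy, an external process really is close to a `string-stream -> string-stream` function already; here's a minimal Python sketch using `subprocess` (the `run_external` name is invented, and a Python one-liner stands in for a real external command):

```python
import subprocess
import sys

def run_external(argv, stdin_stream):
    # an external program as a stream -> stream function: feed
    # chunks into stdin, yield chunks back from stdout.
    # (writing all of stdin before reading is fine for small
    # inputs; a real shell would pump both ends concurrently)
    proc = subprocess.Popen(
        argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    for chunk in stdin_stream:
        proc.stdin.write(chunk)
    proc.stdin.close()
    for line in proc.stdout:
        yield line
    proc.wait()

# a stand-in external command that uppercases its stdin
upcase = [sys.executable, "-c",
          "import sys; sys.stdout.write(sys.stdin.read().upper())"]
out = "".join(run_external(upcase, iter(["hi ", "there\n"])))
assert out == "HI THERE\n"
```

The stderr-grabbing builtin would then just add `stderr=subprocess.PIPE` and hand back a second stream alongside stdout.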