When writing a shell, you're going to need to deal with both external and
internal/builtin commands. Calling external commands feels like it's kinda the
whole point of the shell to begin with, and internal commands are needed for
stuff like `cd`, which modifies the shell state, or for other things that are
just practical to be able to do without spawning a whole new process.
External commands work in a certain way. They take in ARGV (a list of strings)
and stdin (a byte stream). They output to stdout and stderr (also byte
streams), and they return an exit code (an integer from 0 to 255 on POSIX
systems). They also receive and respond to signals. Shells must respect this
interface in order to work with external commands.
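To make that interface concrete, here's a minimal sketch in Python of what a shell has to hand to an external command and what it gets back. The helper name `run_external` and the little `CHILD` program are made up for illustration. The child is run through the current Python interpreter so the example doesn't depend on any particular system command being installed.

```python
import subprocess
import sys

def run_external(argv, stdin_bytes=b""):
    """Run an external command and return (stdout, stderr, exit_code)."""
    result = subprocess.run(
        argv,                    # ARGV: a list of strings
        input=stdin_bytes,       # stdin: a byte stream
        stdout=subprocess.PIPE,  # stdout: captured as bytes
        stderr=subprocess.PIPE,  # stderr: captured as bytes
        check=False,             # a non-zero exit code is data, not an error
    )
    return result.stdout, result.stderr, result.returncode

# Hypothetical child program: uppercases stdin, writes its first argument
# to stderr, and exits with code 3.
CHILD = (
    "import sys; "
    "data = sys.stdin.buffer.read(); "
    "sys.stdout.buffer.write(data.upper()); "
    "sys.stderr.write(sys.argv[1]); "
    "sys.exit(3)"
)

out, err, code = run_external([sys.executable, "-c", CHILD, "warning"], b"hello")
```

Everything the parent learns about the child comes back through those three channels, which is exactly the constraint internal commands don't have.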
Internal commands, however, are unrestricted, and it's completely up to the shell how they're handled, and how they work. Most shells go for an approach where internal commands work like external ones. In some cases you might not even know whether a given command you use a lot is a shell builtin or an external command.
How different shells do this
Disclaimer: I'm not an expert on these shells, i might be wrong on some things. Please let me know if something's wrong or if you have more info.
POSIX/bash/zsh/etc.
Builtins act and are called the same way as externals.
Nushell
Same as above, but it's a little more interesting. Nushell has richer typed
data than just strings, and provides a lot of builtins to work with this data.
These builtins can be defined with positional arguments and flags of any type,
and they additionally take in a type (or `nothing`) on stdin and give a type
(or `nothing`) on stdout. A single function can also define multiple
`stdin -> stdout` pairs, so you can have a function that returns a `string` if
it gets a `string`, returns a `list<string>` if it gets a `list<string>`, and
so on. External commands fit into this by acting like functions that take in
any number of positional arguments and flags, take a `string` on stdin, and
give a `string` on stdout. I don't know how stderr fits into this. I'm sure you
can capture the stderr of an external command, but I'm not sure if you can
define your own function that emits stderr in the same way.
Due to this, and due to nushell being very FP-focused, it leads to the following (unfortunate, imo) result:
- You will often have pipelines like `a | b | c`, where most or all of the functions used are builtins.
- This looks, feels, and acts a lot like a chain of partially applied functions, like you would find in haskell for example: `a & b & c` (same as `c (b a)`).
- But there's a crucial difference! nushell doesn't have currying or partial application. For a function to accept an argument from a pipeline, it needs to be defined specifically to take pipeline input. This is how most functions in nushell are defined. They take their "main" input as stdin, and don't accept it as a positional argument. So, for example, you can't do `math max [1 2 3]`, you have to do `[1 2 3] | math max`.
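For comparison, here's a rough Python sketch of the haskell-style reading of a pipeline, where `a & b & c` just means `c (b a)` and the stages can be partially applied ordinary functions. The names `pipe_value`, `take`, `double` and `increment` are made up for the example. This models the haskell behaviour, not how nushell works.

```python
from functools import partial, reduce

def pipe_value(value, *funcs):
    """Feed `value` through `funcs` left to right, like `value & f & g`."""
    return reduce(lambda acc, f: f(acc), funcs, value)

def take(n, xs):
    """Return the first n items; the "main" input comes last, so it curries well."""
    return xs[:n]

def double(x):
    return x * 2

def increment(x):
    return x + 1

# `[3, 1, 2] & sorted & take 2`: the last stage is a partial application.
result = pipe_value([3, 1, 2], sorted, partial(take, 2))
```

Because `take` accepts its main input as a normal argument, the same function works both as `take(2, xs)` and as a pipeline stage, which is the flexibility that's missing when pipeline input is a separate kind of parameter.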
YSH (/oils?)
YSH gets a bit more creative with it. External commands are `proc`s, which can
be used in the "command language". You can also define your own procs. You'll
be doing most of your data manipulation in the "expression language", though,
where you have `func`s. These work more like python functions, and can accept
and return data of different types. There are builtin ones, and you can define
your own. This lets functions be pretty normal (as in, similar to something
you'd find in python), while still having full support for normal external
process invocations too. I think this works pretty well, since procs and funcs
appear in completely separate parts of the syntax, so you always know which
you're dealing with. It is a bit unfortunate, though, to have to deal with and
think about two different types of functions/commands/processes that do similar
things but are actually very different.
Links:
- Oils page about procs and funcs
- Oils page about interior and exterior (i don't fully understand the distinction yet)
But i want to have my cake and eat it too
Okay, so: I want external commands to work like any other function in the language, while still being able to do all the normal shell things, and while also allowing functions to be partially applied and chained in a pipeline. Let's make that a little more specific. I want the following to be possible:
- `extA sub --flag=1 | extB -f4 | extC a -v c`: External commands with parameters chain in the expected way. If `extA` reads stdin, it will be given it, and its stdout will be passed to the stdin of `extB`, and so on.
- If you have a function `add1` that adds 1 to a number, you should be able to do both `add1 5` and `5 | add1`, and they should mean the same thing.
- `let x = extA param1 param2` should call `extA`, give it no `stdin`, and save its `stdout` to `x`. It shouldn't save a function that takes in `stdin` as a parameter.
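The `add1 5` / `5 | add1` requirement can be mimicked in Python by overloading `|`, just to show that the two spellings can share one definition. This is only a sketch: the `Pipeable` wrapper is a made-up name, and it sidesteps the real typing questions that come later.

```python
class Pipeable:
    """Wrap a function so it can be called normally or fed from a pipe."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args):
        # Normal application: add1(5)
        return self.func(*args)

    def __ror__(self, value):
        # Pipeline application: 5 | add1
        # Python falls back to this when the left operand can't handle `|`.
        return self.func(value)

@Pipeable
def add1(n):
    return n + 1

called = add1(5)
piped = 5 | add1
chained = 5 | add1 | add1
```

Both spellings end up in the same `func`, so there's no separate "pipeline input" parameter to define.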
This is how I think this can be done:
- We have normal haskell-style curried functions, but with an important addition: You can have "rest"/"spread" arguments anywhere you want. A normal example would be `f :: A -> ...B -> C`, which you could call like `f a b b b` to get a result of type `C`. But, you could also have `g :: ...A -> ...B -> C -> D`, which you could call like `((g a) b b) c`. So, you have to delimit each "set" of rest arguments to show that you're done with that run of arguments. This gets a little awkward if you want to pass none, like, I guess you have to say `((g)) c` or something. Maybe there should be another way to signify that this run of arguments is "done". But, it also needs to be able to happen implicitly, because:
- The type of an external command is `...String -> ByteStream -> ByteStream`. It takes any number of string arguments (argv), then an stdin, and returns an stdout. For capturing stderr and exit codes, the return type should probably be slightly different, but it gets automatically "reduced" to just the stdout stream if it's being piped somewhere else.
- When let-binding an expression, it gets automatically passed an empty stdin, if it expects one. `let files = ls` should fill 0 `...String` arguments, so that we're left with `ByteStream -> ByteStream`. Then, since it's being bound, something kicks in, sees that the `ByteStream` parameter can be satisfied with an empty stream (or binding it to `/dev/null`? however that works), and gives the `ByteStream` output.
- There should probably be something you can put around a type (like `Nullable` for example), that signifies that it can be "coerced away". So instead of taking in a `ByteStream`, externals should probably take in a `Nullable ByteStream`, and `let` would "coerce" it away.
- So, it seems `|` should have the type `forall a b c. (a -> b) -> (b -> c) -> (a -> c)`. It takes in two functions, and returns a function where the output of the first is passed to the input of the second. This should work fine with everything that's been discussed so far. If you did `let x = a | b | c` (where the functions are all externals), the pipeline would have the type `Nullable ByteStream -> ByteStream`, and the `let` would get rid of the `Nullable ByteStream` so that the result is just a `ByteStream`.
- In nushell, you often call functions like `[1, 2, 3] | math max`, where the left side is just a value that gets passed in, not a function to compose. Initially, I thought I wanted to support this, so that `a b` is the same as `b | a`. On second thought, I'm not sure it's so necessary to have, and if you wanted to write it this way, you could do `const b | a` or something. I guess it makes most sense in long pipelines where you start with a value and then apply a bunch of functions to it. A way to support this would be for `|` to also accept regular values on the left side, with some automatic coercions. Not entirely sure what the best way to make that work is, though.
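Here's a rough Python model of the pieces above: an external as `...String -> Nullable ByteStream -> ByteStream`, `|` as plain function composition, and a `let` that coerces the nullable stdin away. All the names (`external`, `pipe`, `let_bind`, `py`) are hypothetical, and the "externals" are tiny child programs run through the current Python interpreter. It also cheats by buffering each stage's output as `bytes` instead of streaming it, so treat it as a sketch of the types rather than of a real implementation.

```python
import subprocess
import sys
from typing import Callable, Optional

ByteStream = bytes  # assumption: a fully buffered stand-in for a real stream
Command = Callable[[Optional[ByteStream]], ByteStream]

def external(*argv: str) -> Command:
    """Fill the `...String` run; what's left is Nullable ByteStream -> ByteStream."""
    def run(stdin: Optional[ByteStream] = None) -> ByteStream:
        done = subprocess.run(
            list(argv),
            input=stdin if stdin is not None else b"",  # "no stdin" = empty stream
            stdout=subprocess.PIPE,
            check=True,
        )
        return done.stdout
    return run

def pipe(first: Command, second: Command) -> Command:
    """`|` as composition: (a -> b) -> (b -> c) -> (a -> c)."""
    return lambda stdin=None: second(first(stdin))

def let_bind(command: Command) -> ByteStream:
    """`let x = ...`: coerce the Nullable stdin away by passing nothing."""
    return command(None)

def py(code: str, *args: str) -> Command:
    """Helper for the example: an 'external' that is a small Python program."""
    return external(sys.executable, "-c", code, *args)

emit = py("import sys; sys.stdout.write(sys.argv[1])", "hello")
upper = py("import sys; sys.stdout.write(sys.stdin.read().upper())")
exclaim = py("import sys; sys.stdout.write(sys.stdin.read() + '!')")

# let x = emit hello | upper | exclaim
x = let_bind(pipe(pipe(emit, upper), exclaim))
```

The pipeline itself never runs anything until `let_bind` supplies the missing stdin, which matches the idea that `a | b | c` is still a function of type `Nullable ByteStream -> ByteStream`.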
A slightly different idea (rough concept draft)
Not sure why I didn't think of this before, but the idea is based on
haskell/purescript, where you actually have two different kinds of "save this
value" (`x <- val` and `let x = val`).
- To run an external command (or something else that's "effectful"), bind its result with `<-`. So you could do `myFiles <- ls -la`, for example.
- To save a value, which can be either a pure value or an effect (which then gets saved as the effect), use `let ll = ls -la`. This will not run `ls`, but will save `ll` as an alias that can be called later.
- Maybe have different builtin function/parameter types. Like, in haskell you have `a -> b` which means "this is a function from `a` to `b`", but it also can be seen as "this is a `b`, which you need to pass an `a` to access". So you could have different function types like "this can be passed a variable number of arguments" or "this can be passed a string to denote a sub-function" or "this can be passed a single argument". Not sure if "this can be passed an stdin" should be its own thing, because then you end up where nushell is, with not being able to do both `[1 2 3] | first` and `first [1 2 3]`.
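The `let` versus `<-` split can be sketched in Python by treating an effect as a value that describes a command without running it. The `Effect` class and its `run` method are made up for this example, and the "command" is again a small program run through the current Python interpreter.

```python
import subprocess
import sys

class Effect:
    """A description of a command. Building one doesn't run anything."""

    def __init__(self, *argv):
        self.argv = list(argv)

    def run(self):
        """Actually spawn the process and return its stdout as bytes."""
        done = subprocess.run(self.argv, stdout=subprocess.PIPE, check=True)
        return done.stdout

# let ll = <command>   (saved as an alias, nothing runs yet)
ll = Effect(sys.executable, "-c", "import sys; sys.stdout.write('listing')")

# myFiles <- ll        (this is the point where the process is spawned)
my_files = ll.run()
```

Since `ll` is just a value, it can be passed around or run several times, and each `run` spawns a fresh process.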