External and internal commands

When writing a shell, you're going to need to deal with both external and internal/builtin commands. Calling external commands is kinda the whole point of a shell to begin with. Internal commands are needed for stuff like cd, which modifies the state of the shell itself, and for other things that are just practical to do without spawning a whole new process.
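To make the cd point concrete, here's a small Python sketch (an illustration, not how any real shell is implemented). A child process that changes its own working directory has no effect on the parent, so an external cd program could never work. The Python interpreter is only used here as a portable stand-in for an external command.

```python
import os
import subprocess
import sys

before = os.getcwd()

# Run a child process that changes *its own* working directory, then exits.
# sys.executable stands in for a hypothetical external `cd` program.
subprocess.run(
    [sys.executable, "-c", "import os, tempfile; os.chdir(tempfile.gettempdir())"],
    check=True,
)

# The parent's working directory is untouched, which is why a shell has to
# implement cd itself (as a builtin) instead of spawning a process for it.
after = os.getcwd()
print(before == after)  # True
```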

External commands work in a fixed way. They take in ARGV (a list of strings) and stdin (a byte stream). They output to stdout and stderr (also byte streams), and they return an exit code (on POSIX systems, an integer from 0 to 255). They can also receive signals, and they inherit environment variables and a working directory from whatever started them. Shells must respect this interface in order to work with external commands.
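As a rough illustration of that interface, here's a Python sketch that starts a child process and touches each part of it: argv, stdin, stdout, stderr and the exit code. The child program is a made-up example, and the Python interpreter is used as the external command only so the snippet is portable.

```python
import subprocess
import sys

# Hypothetical child program: echoes its arguments and its stdin to stdout,
# writes a message to stderr, and exits with status 3.
child_source = (
    "import sys\n"
    "sys.stdout.write(' '.join(sys.argv[1:]) + '|' + sys.stdin.read())\n"
    "sys.stderr.write('warning')\n"
    "sys.exit(3)\n"
)

result = subprocess.run(
    [sys.executable, "-c", child_source, "a", "b"],  # argv: a list of strings
    input=b"hello",          # stdin: a byte stream
    stdout=subprocess.PIPE,  # stdout: a byte stream
    stderr=subprocess.PIPE,  # stderr: a separate byte stream
)

print(result.stdout)      # b'a b|hello'
print(result.returncode)  # 3
```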

Internal commands, however, are unrestricted: it's completely up to the shell how they're called and how they work. Most shells go for an approach where internal commands work like external ones. In some cases you might not even know whether a given command you use a lot is a shell builtin or an external command.
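Here's a minimal sketch of that "builtins look like externals" approach, written in Python. Everything in it (the Builtin type, BUILTINS, run_command) is a hypothetical name made up for illustration, and streams are simplified to in-memory bytes.

```python
import io
import os
import subprocess
import sys
from typing import BinaryIO, Callable, Dict, List, Tuple

# A builtin gets the same things an external command gets:
# argv, a stdin stream and a stdout stream. It returns an exit code.
Builtin = Callable[[List[str], BinaryIO, BinaryIO], int]

def builtin_cd(argv: List[str], stdin: BinaryIO, stdout: BinaryIO) -> int:
    try:
        os.chdir(argv[1] if len(argv) > 1 else os.path.expanduser("~"))
        return 0
    except OSError:
        return 1

def builtin_pwd(argv: List[str], stdin: BinaryIO, stdout: BinaryIO) -> int:
    stdout.write(os.getcwd().encode() + b"\n")
    return 0

BUILTINS: Dict[str, Builtin] = {"cd": builtin_cd, "pwd": builtin_pwd}

def run_command(argv: List[str], stdin: bytes = b"") -> Tuple[int, bytes]:
    """Run one command; the caller can't tell builtins and externals apart."""
    if argv[0] in BUILTINS:
        out = io.BytesIO()
        code = BUILTINS[argv[0]](argv, io.BytesIO(stdin), out)
        return code, out.getvalue()
    # Anything else is looked up and spawned as an external process.
    proc = subprocess.run(argv, input=stdin, stdout=subprocess.PIPE)
    return proc.returncode, proc.stdout

print(run_command(["pwd"]))
print(run_command([sys.executable, "-c", "print('hi')"]))
```

From the caller's side, both calls return an exit code and some stdout bytes, so nothing reveals which one spawned a process.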

How different shells do this

Disclaimer: I'm not an expert on these shells, so i might be wrong about some things. Please let me know if something's wrong or if you have more info.

POSIX/bash/zsh/etc.

Builtins act and are called the same way as externals.

Nushell

Same as above, but with a twist. Nushell has richer typed data than just strings, and provides a lot of builtins to work with this data. These builtins can be defined with positional arguments and flags of any type, and they additionally take in a type (or nothing) on stdin and give a type (or nothing) on stdout. A single function can also define multiple stdin -> stdout pairs, so you can have a function that returns a string if it gets a string, returns a list<string> if it gets a list<string>, and so on. External commands fit into this by acting like functions that take in any number of positional arguments and flags, take a string on stdin, and give a string on stdout. I don't know how stderr fits into this. I'm sure you can capture the stderr of an external command, but i'm not sure if you can define your own function that emits stderr in the same way.
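The "multiple stdin -> stdout pairs" idea can be loosely imitated in Python with functools.singledispatch. This is only an analogy, not nushell's actual mechanism, and upcase is a made-up stand-in command.

```python
from functools import singledispatch

@singledispatch
def upcase(pipeline_input):
    # No input/output pair is declared for this type.
    raise TypeError(f"unsupported pipeline input: {type(pipeline_input).__name__}")

@upcase.register
def _(pipeline_input: str) -> str:
    # string -> string
    return pipeline_input.upper()

@upcase.register
def _(pipeline_input: list) -> list:
    # list<string> -> list<string>
    return [item.upper() for item in pipeline_input]

print(upcase("abc"))         # ABC
print(upcase(["ab", "cd"]))  # ['AB', 'CD']
```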

Due to this, and due to nushell being very FP-focused, it leads to the following (unfortunate, imo) result:

You will often have pipelines like a | b | c, where most or all of the functions used are builtins
This looks, feels, and acts a lot like a chain of partially applied functions, like you would find in haskell for example: a & b & c (same as c (b a))
But there's a crucial difference! nushell doesn't have currying or partial application. For a function to accept an argument from a pipeline, it needs to be defined specifically to take pipeline input. This is how most functions in nushell are defined. They take their "main" input as stdin, and don't accept it as a positional argument. So, for example, you can't do math max [1 2 3], you have to do [1 2 3] | math max.
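To illustrate the restriction described in the points above, here's a toy Python model of a command whose main input can only arrive through the pipeline. PipelineCommand is a hypothetical class made up for this sketch, and Python's | operator is borrowed to play the role of the pipe.

```python
class PipelineCommand:
    """Toy model of a command whose main input only comes from a pipeline."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, pipeline_input):
        # `value | command` feeds the value in as pipeline input.
        return self.func(pipeline_input)

    def __call__(self, *args):
        # `command value` is rejected: there is no positional slot for it.
        raise TypeError("main input must come from the pipeline, not an argument")

math_max = PipelineCommand(max)

print([1, 2, 3] | math_max)  # 3

try:
    math_max([1, 2, 3])
except TypeError as error:
    print("rejected:", error)
```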

YSH (/oils?)

YSH gets a bit more creative with it. External commands are invoked from the "command language", alongside procs, which are shell-style functions that you can also define yourself. You'll be doing most of your data manipulation in the "expression language", though, where you have funcs. These work more like python functions, and can accept and return data of different types. There are builtin ones, and you can define your own. This lets functions be pretty normal (as in, similar to something you'd find in python), while still having full support for normal external process invocations too. I think this works pretty well, since procs and funcs appear in completely separate parts of the syntax, so you always know which you're dealing with. It is a bit unfortunate, though, to have to deal with and think about two different types of functions/commands/processes that do similar things but are actually very different.

Links:

Oils page about procs and funcs
Oils page about interior and exterior (i don't fully understand the distinction yet)

But i want to have my cake and eat it too

Okay, so: I want external commands to work like any other function in the language, while still being able to do all the normal shell things, and also allowing functions to be partially applied and chained in a pipeline. Let's make that a little more specific. I want the following to be possible:

extA sub --flag=1 | extB -f4 | extC a -v c: External commands with parameters chain in the expected way. If extA reads stdin, it will be given the pipeline's stdin, and its stdout will be passed to the stdin of extB, and so on.
If you have a function add1 that adds 1 to a number, you should be able to do both add1 5 and 5 | add1, and they should mean the same thing.
let x = extA param1 param2 should call extA, give it no stdin, and save its stdout to x. It shouldn't save a function that takes in stdin as a parameter.

This is how i think this can be done:

We have normal haskell-style curried functions, but with an important addition: You can have "rest"/"spread" arguments anywhere you want. A normal example would be f :: A -> ...B -> C, which you could call like f a b b b to get a result of type C. But, you could also have g :: ...A -> ...B -> C -> D, which you could call like ((g a) b b) c. So, you have to delimit each "set" of rest arguments to show that you're done with that run of arguments. This gets a little awkward if you want to pass none, like, i guess you have to say ((g)) c or something. Maybe there should be another way to signify that this run of arguments is "done". But, it also needs to be able to happen implicitly, because:
The type of an external command is ...String -> ByteStream -> ByteStream. It takes any number of string arguments (argv), then a stdin, and returns a stdout. For capturing stderr and exit codes, the return type should probably be slightly different, but it gets automatically "reduced" to just the stdout stream if it's being piped somewhere else.
When let-binding an expression, it gets automatically passed an empty stdin, if it expects one. let files = ls should fill 0 ...String arguments, so that we're left with ByteStream -> ByteStream. Then, since it's being bound, something kicks in, sees that the ByteStream parameter can be satisfied with an empty stream (or binding it to /dev/null? however that works), and gives the ByteStream output.
There should probably be something you can put around a type (like Nullable for example), that signifies that it can be "coerced away". So instead of taking in a ByteStream, externals should probably take in a Nullable ByteStream, and let would "coerce" it away.
So, it seems | should have the type forall a b c. (a -> b) -> (b -> c) -> (a -> c). It takes in two functions, and returns a function where the output of the first is passed to the input of the second. This should work fine with everything that's been discussed so far. If you did let x = a | b | c, (where the functions are all externals), the pipeline would have the type Nullable ByteStream -> ByteStream, and the let would get rid of the Nullable ByteStream so that the result is just a ByteStream.
In nushell, you often call functions like [1, 2, 3] | math max, where the left side is just a value that gets passed in, not a function to compose. Initially, i thought i wanted to support this, so that a b is the same as b | a. On second thought, i'm not sure it's so necessary to have, and if you wanted to write it this way, you could do const b | a or something. I guess it makes most sense in long pipelines where you start with a value and then apply a bunch of functions to it. A way to support this would be for | to also accept regular values on the left side, with some automatic coercions. Not entirely sure what the best way to make that work is, though.
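The points above can be roughly modelled in Python. This is a sketch under heavy assumptions, not a design for the real thing: streams are fully buffered bytes instead of real streams, None plays the role of the "Nullable" stdin, and external, pipe, let and py are all hypothetical names. Note how external takes its whole run of argv strings in one call, and stdin in the next call, which mirrors ...String -> Nullable ByteStream -> ByteStream.

```python
import subprocess
import sys
from typing import Callable, Optional

# Stand-in for `Nullable ByteStream -> ByteStream`.
Stage = Callable[[Optional[bytes]], bytes]

def external(*argv: str) -> Stage:
    """First call: the run of argv strings. Second call: stdin -> stdout."""
    def run(stdin: Optional[bytes] = None) -> bytes:
        # None means "no stdin", which is modelled as an empty stream.
        proc = subprocess.run(list(argv), input=stdin or b"", stdout=subprocess.PIPE)
        return proc.stdout
    return run

def pipe(first, *rest: Stage) -> Stage:
    """The `|` operator: compose stages left to right.

    A non-callable `first` is treated as a constant (the `const b | a` idea).
    """
    def composed(stdin: Optional[bytes] = None) -> bytes:
        data = first(stdin) if callable(first) else first
        for stage in rest:
            data = stage(data)
        return data
    return composed

def let(value):
    """Binding coerces a remaining Nullable stdin away by passing None."""
    return value(None) if callable(value) else value

def py(source: str) -> Stage:
    # Hypothetical helper: use the Python interpreter as a portable external.
    return external(sys.executable, "-c", source)

emit = py("import sys; sys.stdout.write('a b c')")
upper = py("import sys; sys.stdout.write(sys.stdin.read().upper())")
count = py("import sys; sys.stdout.write(str(len(sys.stdin.read().split())))")

x = let(pipe(emit, upper))        # like: let x = emit | upper
n = let(pipe(b"one two", count))  # like: let n = const "one two" | count
print(x, n)  # b'A B C' b'2'
```

Without let, pipe(emit, upper) is still just a function waiting for a stdin, which matches the idea that a pipeline has the type Nullable ByteStream -> ByteStream until something binds it.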

A slightly different idea (rough concept draft)

Not sure why i didn't think of this before, but the idea is based on haskell/purescript, where you actually have two different kinds of "save this value" (x <- val and let x = val).

To run an external command (or something else that's "effectful"), bind its result with <-. So you could do myFiles <- ls -la, for example.
To save a value, which can be either a pure value or an effect (which then gets saved as the effect), use let ll = ls -la. This will not run ls, but will save ll as an alias that can be called later.
Maybe have different builtin function/parameter types. Like, in haskell you have a -> b which means "this is a function from a to b", but it also can be seen as "this is a b, which you need to pass an a to access". So you could have different function types like "this can be passed a variable number of arguments" or "this can be passed a string to denote a sub-function" or "this can be passed a single argument". Not sure if "this can be passed a stdin" should be its own thing, because then you end up where nushell is, not being able to do both [1 2 3] | first and first [1 2 3].
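The difference between the two kinds of binding can be sketched in Python by treating an effect as a plain value that only does something when it is explicitly run. Effect and command are hypothetical names, and the Python interpreter stands in for ls so the snippet runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Effect:
    """A description of an external command that has not been run yet."""
    argv: Tuple[str, ...]

    def run(self) -> bytes:
        # Only here does a process actually get spawned.
        return subprocess.run(list(self.argv), stdout=subprocess.PIPE).stdout

def command(*argv: str) -> Effect:
    return Effect(tuple(argv))

# let ll = ...   saves the effect itself; nothing runs yet.
ll = command(sys.executable, "-c", "import sys; sys.stdout.write('listing')")

# myFiles <- ll  runs the effect and binds its result.
my_files = ll.run()
print(my_files)  # b'listing'
```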