
When writing a shell, you're going to need to deal with both external and internal (builtin) commands. Calling external commands feels like it's kinda the whole point of a shell to begin with, and internal commands are needed for things like cd, which modify the shell's own state, or for other things that are just practical to do without spawning a whole new process.
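To make the cd point concrete, here is a minimal Python sketch (not taken from any real shell) showing that a child process changing its working directory has no effect on the parent. The function name is made up for illustration, and the "external command" is just the current Python interpreter running a one-liner.

```python
import os
import subprocess
import sys
import tempfile


def child_chdir_leaves_parent_alone() -> bool:
    """Return True if a child's chdir did not change this process's cwd."""
    before = os.getcwd()
    with tempfile.TemporaryDirectory() as target:
        # The child changes its *own* working directory, then exits.
        subprocess.run(
            [sys.executable, "-c", "import os, sys; os.chdir(sys.argv[1])", target],
            check=True,
        )
        # The parent (our stand-in for the shell) is still where it was.
        return os.getcwd() == before
```

This is why an external cd could never work: the state it changes dies with the child. A builtin runs inside the shell process, so it can change the shell's own working directory.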

External commands work in a fixed way. They take in ARGV (a list of strings) and stdin (a byte stream). They write to stdout and stderr (also byte streams) and return an exit code (on POSIX, effectively an 8-bit unsigned int, 0 to 255). They can also receive and respond to signals. A shell must respect this interface in order to work with external commands.
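As a rough illustration of that interface, the following Python sketch runs an external command and collects everything it hands back. The helper run_external and the CHILD program are assumptions made up for this example. The child is run through the current Python interpreter so the sketch does not depend on any particular tool being installed.

```python
import subprocess
import sys

# A tiny stand-in for an external program: it upper-cases stdin onto stdout,
# reports its arguments on stderr, and exits with status 3.
CHILD = (
    "import sys\n"
    "data = sys.stdin.buffer.read()\n"
    "sys.stdout.buffer.write(data.upper())\n"
    "sys.stderr.buffer.write(b'args: ' + ' '.join(sys.argv[1:]).encode())\n"
    "sys.exit(3)\n"
)


def run_external(argv: list[str], stdin: bytes) -> tuple[bytes, bytes, int]:
    """Run argv with the given stdin bytes; return (stdout, stderr, exit code)."""
    proc = subprocess.run(argv, input=stdin, capture_output=True)
    return proc.stdout, proc.stderr, proc.returncode


out, err, code = run_external([sys.executable, "-c", CHILD, "a", "b"], b"hello")
```

Here out is b"HELLO", err is b"args: a b", and code is 3. Everything that crosses the process boundary is either a list of strings, a byte stream, or a small integer, which is exactly the restriction described above.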

Internal commands, however, are unrestricted: it's completely up to the shell how they're handled and how they work. Most shells go for an approach where internal commands work like external ones. In some cases you might not even know whether a command you use a lot is a shell builtin or an external command.

How different shells do this

Disclaimer: I'm not an expert on these shells, i might be wrong on some things. Please let me know if something's wrong or if you have more info.

POSIX/bash/zsh/etc.

Builtins are called the same way as externals and behave the same way: ARGV in, byte streams in and out, exit status back.

Nushell

Same as above, but with a twist. Nushell has richer typed data than just strings, and provides a lot of builtins to work with this data. These builtins can be defined with positional arguments and flags of any type. They additionally take in a type (or nothing) on stdin, and give a type (or nothing) on stdout. A single function can also define multiple stdin -> stdout pairs, so you can have a function that returns a string if it gets a string, returns a list<string> if it gets a list<string>, and so on. External commands fit into this by acting like functions that take any number of positional arguments and flags, take a string on stdin, and give a string on stdout. I don't know how stderr fits into this. I'm sure you can capture the stderr of an external command, but i'm not sure if you can define your own function that emits stderr in the same way.
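The "multiple stdin -> stdout pairs" idea can be sketched in Python as a command that picks its implementation from the type of the pipeline input. This is not how Nushell implements it, and it is not Nushell syntax. The Command class, its on decorator, and the upcase command are all hypothetical names used only to show the shape of the idea.

```python
from typing import Any, Callable


class Command:
    """A command with one implementation per pipeline input type (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._impls: dict[type, Callable[..., Any]] = {}

    def on(self, input_type: type) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Register an implementation for one input type."""
        def register(fn: Callable[..., Any]) -> Callable[..., Any]:
            self._impls[input_type] = fn
            return fn
        return register

    def __call__(self, pipeline_input: Any, *args: Any, **flags: Any) -> Any:
        # Dispatch on the exact type of whatever arrives "on stdin".
        impl = self._impls.get(type(pipeline_input))
        if impl is None:
            raise TypeError(
                f"{self.name}: unsupported input type {type(pipeline_input).__name__}"
            )
        return impl(pipeline_input, *args, **flags)


upcase = Command("upcase")


@upcase.on(str)
def _(s: str) -> str:
    # string in, string out
    return s.upper()


@upcase.on(list)
def _(items: list) -> list:
    # list of strings in, list of strings out
    return [s.upper() for s in items]
```

With this, upcase("ab") gives "AB" and upcase(["a", "b"]) gives ["A", "B"], while any other input type is rejected. An external command would fit in as a Command with a single string -> string pair.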

Due to this, and due to nushell being very FP-focused, it leads to the following (unfortunate, imo) result:

YSH (Oils)

YSH gets a bit more creative with it. External commands sit alongside procs in the "command language": both are invoked like ordinary shell commands, and you can define your own procs. You'll be doing most of your data manipulation in the "expression language", though, where you have funcs. These work more like python functions, and can accept and return data of different types. There are builtin ones, and you can define your own. This lets functions be pretty normal (as in, similar to something you'd find in python), while still having full support for normal external process invocations too. I think this works pretty well, since procs and funcs appear in completely separate parts of the syntax, so you always know which you're dealing with. It is a bit unfortunate, though, to have to deal with and think about two different kinds of functions/commands/processes that do similar things but are actually very different.
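To show the difference in shape between the two, here is the same operation written twice in Python, once "proc-shaped" and once "func-shaped". This is only an analogy, not YSH syntax or semantics, and both function names are invented for the example.

```python
import io
from typing import BinaryIO


def proc_count_lines(argv: list[str], stdin: BinaryIO, stdout: BinaryIO) -> int:
    """Proc-shaped: strings in argv, bytes on streams, integer exit status back."""
    n = sum(1 for _ in stdin)          # iterate over the input line by line
    stdout.write(f"{n}\n".encode())    # the result leaves as bytes on stdout
    return 0                           # the return value is only a status code


def func_count_lines(text: str) -> int:
    """Func-shaped: a typed argument in, a typed value out."""
    return len(text.splitlines())


# Calling the proc means wiring up streams and parsing the output afterwards.
buf = io.BytesIO()
status = proc_count_lines(["count-lines"], io.BytesIO(b"a\nb\n"), buf)

# Calling the func is an ordinary expression.
count = func_count_lines("a\nb\n")
```

Both compute the same thing, but the proc's answer comes back as bytes plus a status code, while the func's answer is a plain integer. That is the "similar things, but actually very different" tension.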

Links:

But i want to have my cake and eat it too

Okay, so: I want external commands to work like any other function in the language, while still being able to do all the normal shell things, and while also allowing functions to be partially applied and chained in a pipeline. Let's make that a little more specific. I want the following to be possible:

This is how i think this can be done:

A slightly different idea (rough concept draft)

Not sure why i didn't think of this before, but the idea is based on Haskell/PureScript, where do-notation actually has two different kinds of "save this value": x <- val runs val as an action and names its result, while let x = val just names the value itself.
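For readers who don't know Haskell, the distinction can be approximated in Python with zero-argument functions standing in for actions. This sketch only illustrates the Haskell/PureScript behaviour mentioned above, not the design that follows from it. The say helper and the log list are made up for the example.

```python
from typing import Callable

log: list[str] = []


def say(msg: str) -> Callable[[], str]:
    """Build an action; nothing is logged until the action is called."""
    def action() -> str:
        log.append(msg)
        return msg
    return action


# Like `let x = val`: name the action itself, without running it.
greet = say("hello")
log_after_let = list(log)   # snapshot: no effect has happened yet

# Like `x <- val`: run the action and name its result.
result = greet()
```

After the first binding the log is still empty. Only the second kind of binding actually performs the effect and yields "hello".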