Testing Gurp
30 July 2025

When you’re writing a tool whose sole purpose is to run as root and change things, you’ve got to be pretty confident it’s going to do what you think it’s going to do. This means tests. Loads of tests.

Unit Tests for Vanity

Gurp’s backend – the bit which actually changes things – is all written in Rust, and it doesn’t have a huge amount of test coverage.

As an absolute Rust novice I wrote a lot of unit tests. I was in the habit, having come from a Ruby background, and it helped me understand the basics of the language. But as I became more able, I found I had more confidence in my code than I’ve had with any other language. Unit tests seemed necessary only if I was implementing some hard-to-understand, complex logic.

If you look at Gurp’s Janet code, however, you’ll find that pretty much every function and every macro has tests. I think behaviour has to be pinned down somehow, and a quick unit test is a good way to do that. (I’m not strongly opinionated on static vs dynamic typing. They both have pros and cons, all of which are usually overstated.)

Functional Tests for Sanity

What Gurp lacked was strong functional tests.

The file-line doer is pure Rust and OS-agnostic, so it can be tested thoroughly from a mod test {} block.
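To illustrate the shape of that — this is not Gurp’s actual code, and ensure_line is a hypothetical stand-in for the real file-line logic — a pure function like this can be pinned down entirely in-process, with no filesystem access and no root:

```rust
/// Hypothetical stand-in for a file-line doer: return the contents with
/// `line` present exactly once, appending it if it is missing.
fn ensure_line(contents: &str, line: &str) -> String {
    if contents.lines().any(|l| l == line) {
        contents.to_string()
    } else {
        let mut out = contents.to_string();
        if !out.is_empty() && !out.ends_with('\n') {
            out.push('\n');
        }
        out.push_str(line);
        out.push('\n');
        out
    }
}

fn main() {
    // Appends the line, because it is not already there.
    println!("{}", ensure_line("root:x:0:\n", "gurp:x:999:"));
}

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn appends_missing_line() {
        assert_eq!(ensure_line("a\n", "b"), "a\nb\n");
    }

    #[test]
    fn is_idempotent() {
        let once = ensure_line("a\n", "b");
        assert_eq!(ensure_line(&once, "b"), once);
    }
}
```

Because the function never touches a real file, the tests run anywhere cargo does.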

file and directory are harder because, even though they don’t shell out, they can change file ownership, which cargo, running as a normal user, (probably) can’t do. GitHub Actions lets us run things in containers, so we could run as root and write tests that perform real actions on real files, then use assert_fs and a bit of metadata inspection to check things worked. But running as root is dirty. We’re better than that.
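For what it’s worth, the metadata-inspection half needs nothing beyond the standard library. A sketch, with an arbitrary path and mode (ownership checks would go the same way via uid() and gid(), but are only interesting when run as root):

```rust
use std::fs;
use std::os::unix::fs::{MetadataExt, PermissionsExt};

/// Return just the permission bits of a path, the way a functional test
/// would inspect them. (Sketch only; real checks would also compare
/// uid()/gid() against the expected owner.)
fn mode_of(path: &std::path::Path) -> std::io::Result<u32> {
    Ok(fs::metadata(path)?.mode() & 0o7777)
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("gurp-meta-demo");
    fs::write(&path, "x")?;
    fs::set_permissions(&path, fs::Permissions::from_mode(0o640))?;

    // Ask the filesystem what actually happened, rather than trusting
    // the code which made the change.
    assert_eq!(mode_of(&path)?, 0o640);

    fs::remove_file(&path)
}
```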

How about something like SMF? We can test that sensible-looking manifests are generated from user input, but how do we ensure that when those manifests are imported, they do the right thing? We can’t run SMF in Docker, so the best we can do is assert on the manifests Gurp generates and the commands it runs. That introduces a lot of messy mocking, and it’s never going to be the same as interacting with a real OS.

Proper functional tests can only run on an illumos box. Fine. That’s where I do 90% of my development. Gurp itself can spin up a zone and run a command in that zone, so it seems like it could be used to test itself. But what should it run inside that zone?

Gurp should be idempotent, so a simple test would be to run it once, then run it again and assert that it makes no changes. That proves some amount of the internal logic, but it does not prove it did what it said it would. It may have got owner and group the wrong way round on every file both times.
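The shape of that weaker test is easy to sketch in shell. Here apply() is a stand-in for a real Gurp run, and the fingerprint covers content and permission bits but not ownership (stat -c is the GNU-style flag; none of this is Gurp’s actual test harness):

```shell
#!/bin/sh
# Sketch of a naive idempotence check: run twice, and insist the second
# run leaves the target's fingerprint unchanged.
set -eu
target=$(mktemp -d)
trap 'rm -rf "$target"' EXIT

apply() {
    # stand-in for "run gurp": create a file with known content and mode
    printf 'hello\n' >"$target/motd"
    chmod 0644 "$target/motd"
}

fingerprint() {
    # checksum plus permission bits of every file (stat -c is GNU-style)
    (cd "$target" && find . -type f -exec cksum {} + -exec stat -c '%a %n' {} + | sort)
}

apply
before=$(fingerprint)
apply
after=$(fingerprint)

[ "$before" = "$after" ] && echo idempotent
```

As the post says, this proves the second run changed nothing; it proves nothing about whether the first run was right.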

The obvious approach is to write a tester, probably in Rust. But I know that if I did, I’d end up re-using code from Gurp itself, or at least taking a very similar approach, and I might make the same mistake in both places. Then there’s the question of how I would define the desired state: it feels a lot like I’d end up writing a second Gurp. And what if there were some obscure bug in Rust which always set files to 2755 instead of 0755 on illumos? Admittedly it’s a forced example, but I don’t like the idea of using something to test itself. I need to look at another language.

In the past I wrote a lot of ServerSpec. It’s a decent tool, but it’s rather old, and it requires Ruby. Ruby is a big package, with a lot of files, and I don’t want to have to drop it into my test zone every time. I could build a reference zone with Ruby and all the required gems, and have Gurp clone a fresh zone from that every time it wanted to run the tests, but I don’t know how good ServerSpec’s illumos support might be.

I like Ruby, and I considered a from-scratch Ruby checker, but there’s no way around that Ruby runtime. I even thought about Crystal, but the illumos support for it was extremely sketchy the last time I looked.

Through gritted teeth, I looked at the YAML-driven goss. It fell at the first hurdle, having dependencies that don’t build on illumos, and even if I got it working, I’d have to implement support for things like svcprop myself.

So why not write something a little like goss myself, in Go? Well, this is supposed to be a fun project, which rules Go out completely. And, again, I don’t want to write TWO config management tools.

When I set down my requirements, I realised I already had the tools I needed.

Judge

Judge is the Janet testing framework I already use in Gurp. It is particularly good at testing macro expansion, but the feature of interest here is that it generates expected test values for you.

Say you put this in a file:

(use judge)

(test (-> ["P" "R" "U" "G"] (reverse) (string/join) (string/ascii-upper)))

and run judge, you see

# a.janet

(test (-> ["P" "R" "U" "G"] (reverse) (string/join) (string/ascii-upper)))
(test (-> ["P" "R" "U" "G"] (reverse) (string/join) (string/ascii-upper)) "GURP")

0 passed 1 failed

Notice that Judge has filled in the second half of the comparison. And if you run judge -a, it will write that value into the file. If you are happy with the value, leave it there (you can of course put the value in yourself), and when you re-run judge, the test will pass. Given that collecting the values for the assertions was going to be half the work, this is a big win.

janet-sh

The other part of the puzzle is this terrific library which makes shell scripting just about as nice as shell scripting can be.

($ ls /etc | grep pass)

Does exactly what you think it will. And ($< ls /etc | grep pass) will capture stdout, allowing you to use it in scripts.

Putting the two together, let’s say I want to test that my publisher got added as it should.

(use judge)
(use sh)

(deftest "test-sysdef-publisher-was-created"
  (test
    ($< pkg publisher)))

I run judge -a, and it inserts into the (test) call the stdout of the pkg publisher command. I eyeball it, it looks good, and I have an infinitely repeatable test. Minimum effort, maximum satisfaction.

The ($) macro is extremely capable, and can handle flags, pipes, errors, and anything else you could wish for. So it’s super easy to write targeted shell commands. Or if it’s neater, I can capture output, parse it in Janet with a PEG or whatever, and assert on some aspect of that parsed output.

Implementation

I made a new project, called Merp. It is in my home directory, which is accessible from my main dev zone and also from the global zone of the machine which hosts it. From the dev zone, which has Janet installed, I run a script which copies in the janet binary, and uses jpm (the Janet package manager) to install the judge and sh modules.

In the global zone, I run a script setup.sh, which installs a clean zone and copies into it janet and the aforementioned modules.

A (controller-for) macro spits out Gurp config to clone that zone and bootstrap it with a given role. It’s called like this:

(controller-for "pkg-server" :remove-after false
                             :test-basenode true
                             :with-dataset true)

basenode is a module all my zones get, and which most of them need to do their thing. Adding the :test-basenode option applies and tests basenode in the cloned zone.

If :with-dataset is truthy, Gurp also creates a ZFS dataset which is delegated to the test zone.

:remove-after tells Gurp to destroy the zone and, if there was one, the dataset, after the tests have run. Normally you’d want that, but leaving everything in place makes it simpler to develop new tests.

The zone config generated by controller-for includes a (zone-fs) resource, which mounts the merp directory in the zone’s /var/tmp.

I write a skeleton test of all the things I want to check, like

(deftest "test-zone"
  (test ($< /bin/stat -c "%U:%G %A" /export/home/backup))

  (test ($< /bin/svcs -Ho state svc:/sysdef/telegraf:default)))

then run a little run-tests.sh wrapper. This calls Gurp, which clones the zone, bootstraps it, and calls zlogin to run the tests. They obviously fail, because they have no expected values. So I zlogin to the zone myself, cd /var/tmp/tests/judge, run judge -a test-module.janet, and the file above becomes

(deftest "test-zone"
  (test ($< /bin/stat -c "%U:%G %A" /export/home/backup)
    "root:root drwxr-xr-x\n")

  (test ($< /bin/svcs -Ho state svc:/sysdef/telegraf:default)
    "online\n"))

If the results look good, I’m done. The next time I run-tests.sh the tests will pass. Assuming Gurp works, of course.

I like this solution so much I’m toying with the idea of writing a tiny ServerSpec-style tool around it.
