DRAFT: The Elements of Zig Style

This is an overview of my personal Zig coding style. The high-level elements of this style are:

  1. Return early
  2. Use break and continue
  3. Use iterators
  4. Use the workspace pattern
  5. @panic at the disco
  6. Prefer to split out functions by domain-level concept

None of these are particularly exciting…but taken together I find the resulting code very easy to come back to and understand. Some of these apply more generally than Zig, but the Zig language and build system make them particularly pleasant.

1. Return early

A very high proportion of functions have preconditions that, if violated, cause the function either to not run to completion or to return an error.

Prefer:

// Assumes `const std = @import("std");` at the top of the file.
fn foo(optional_bar: ?usize, optional_baz: ?usize) void {
    const bar = optional_bar orelse return;
    const baz = optional_baz orelse return;

    std.debug.print("{d}\n", .{bar + baz});
}

To:

fn foo(optional_bar: ?usize, optional_baz: ?usize) void {
    if (optional_bar) |bar| {
        if (optional_baz) |baz| {
            std.debug.print("{d}\n", .{bar + baz});
        }
    }
}

I hope the benefit is obvious:

In a codebase with this pattern it becomes very quick to figure out what the meat of a function is doing…it will tend to be towards the bottom of the function and at the outermost indentation level. This return-early pattern minimises indentation…several of the other elements also reduce indentation, so this is maybe a more general theme.

Note also that in the above example the orelse is doing something very similar to try when calling a function that can error:

fn foo() !usize {
    const bar = try getBar();
    const baz = try getBaz();

    return bar + baz;
}

Also of note, the first foo above returns void, so a bare return; is enough, but I will frequently have code that returns null or an error:

fn foo(optional_bar: ?usize, optional_baz: ?usize) !?usize {
    const bar = optional_bar orelse return error.ExpectedBar;
    const baz = optional_baz orelse return null;

    return bar + baz;
}
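To see the three outcomes side by side, here is a minimal sketch that repeats the function above under the hypothetical name sumOrNull and exercises it with std.testing. The name and the test values are mine, chosen purely for illustration.

```zig
const std = @import("std");

// Same shape as `foo` above, renamed (hypothetically) so the example stands alone.
fn sumOrNull(optional_bar: ?usize, optional_baz: ?usize) !?usize {
    // A missing bar is treated as a caller error...
    const bar = optional_bar orelse return error.ExpectedBar;
    // ...while a missing baz just means "no result".
    const baz = optional_baz orelse return null;

    return bar + baz;
}

test "each early return has its own outcome" {
    try std.testing.expectError(error.ExpectedBar, sumOrNull(null, 2));
    try std.testing.expect((try sumOrNull(1, null)) == null);
    try std.testing.expect((try sumOrNull(1, 2)).? == 3);
}
```

Running zig test on a file containing this should exercise all three exits: the error, the null, and the happy path at the bottom.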


2. Use break and continue

Okay, you are now using the return-early pattern. Excellent! The cool thing is we can continue this style into loops. In doing so we’ll make our loops easier to read in the same way as return early.

This leads us to:

break and continue are to loops what return is to functions

You use return, right? When exiting the scope of the current loop iteration, one of two things can happen: we either break out of the loop or continue back to the top of the loop body. So we need two keywords for the loop equivalent of return.

Just as some people rail against return-early, there seems to be similar noise about avoiding break and continue.

Prefer:

fn foo(bars: []?Bar) usize {
    var count: usize = 0;

    for (bars) |optional_bar| {
        const bar = optional_bar orelse continue;

        count += bar.count();
    }

    return count;
}

To:

fn foo(bars: []?Bar) usize {
    var count: usize = 0;

    for (bars) |optional_bar| {
        if (optional_bar) |bar| {
            count += bar.count();
        }
    }

    return count;
}

In the example above you can see continue act towards the loop iteration as return would towards a function.
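The examples above only show continue. As a sketch of the break half of the analogy, here is a loop that stops at the first missing element. sumUntilNull and its inputs are hypothetical names of mine, not from any real codebase.

```zig
const std = @import("std");

// Sums values up to, but not including, the first null.
fn sumUntilNull(values: []const ?usize) usize {
    var total: usize = 0;

    for (values) |optional_value| {
        // `break` leaves the loop entirely, as `return` would leave a function.
        const value = optional_value orelse break;

        total += value;
    }

    return total;
}

test "break stops at the first null" {
    const values = [_]?usize{ 1, 2, null, 4 };
    try std.testing.expect(sumUntilNull(&values) == 3);
}
```

Swapping break for continue in that one line would instead skip the null and carry on, giving 7 for the same input.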

3. Use iterators

…you don’t have much choice! Zig doesn’t have closures or lambda syntax, so you can’t conveniently pass a lambda to, say, a map or filter function (though I’ve seen some attempts to force functional paradigms into Zig code…but it’s really not pretty and is going against the essence of the language).

Right enough, if you have some []T you can use a for (xs) |x|, but for more complicated data structures you are going to use or write iterators a lot.

While above we were trying very hard to avoid the for / if / while (x) |y| syntax to reduce indentation, this syntax makes working with iterators really pleasant:

// `BarIterator` is a hypothetical iterator whose next() yields `?Bar` items
fn foo(it: *BarIterator) usize {
    var count: usize = 0;

    while (it.next()) |optional_bar| {
        const bar = optional_bar orelse continue;

        count += bar.count();
    }

    return count;
}

I actually think iterators are underrated, which is maybe a topic for another post. Briefly, iterators give you laziness…the caller is in control of when the next element in a collection is accessed and, importantly, that element is returned in the scope of the caller. I think this last point gets less thought than it should…if I return in a lambda I pass to map, I return in the scope of the lambda, not the scope of the caller. This can be particularly important in Zig, for example, if I want to return an error whilst iterating, or in JavaScript where, because of async/await and function colouring, we may want to await in the caller’s scope.
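As a sketch of that last point, here is a small hand-written iterator and a caller that returns an error from inside the loop. EvenIterator, sumEvens and error.TooLarge are hypothetical names made up for this example.

```zig
const std = @import("std");

// A hypothetical iterator over the even numbers in a slice.
const EvenIterator = struct {
    items: []const usize,
    index: usize = 0,

    // Returns the next even item, or null once the slice is exhausted.
    fn next(self: *EvenIterator) ?usize {
        while (self.index < self.items.len) {
            const item = self.items[self.index];
            self.index += 1;

            if (item % 2 != 0) continue;

            return item;
        }

        return null;
    }
};

// The loop body runs in the caller's scope, so `return error.TooLarge`
// leaves `sumEvens` itself rather than some inner lambda.
fn sumEvens(items: []const usize, limit: usize) !usize {
    var it = EvenIterator{ .items = items };
    var sum: usize = 0;

    while (it.next()) |item| {
        if (item > limit) return error.TooLarge;

        sum += item;
    }

    return sum;
}

test "sumEvens" {
    try std.testing.expect((try sumEvens(&[_]usize{ 1, 2, 3, 4 }, 10)) == 6);
    try std.testing.expectError(error.TooLarge, sumEvens(&[_]usize{ 2, 100 }, 10));
}
```

Note that the iterator itself uses return early and continue, so the filtering logic stays flat too.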

4. Use the workspace pattern

Rust (via cargo) and JavaScript (well, yarn, pnpm etc.) enable this cool “workspace” pattern. The way I think of this feature is normalising your code in the same way that you normalise data in a database, effectively flattening what would otherwise be a tree. Sharing modules between your local code now works the same way as pulling in third-party dependencies…and I really think it helps you structure your code in a sensible fashion and really consider the relationships between your different modules.

The Zig package management stuff doesn’t have anything explicitly called workspaces, but the Zig build system lets you build essentially the same thing. I have started to organise all my Zig repos in that fashion, for example: https://github.com/malcolmstill/foxwhale.

Each module has its own build.zig and build.zig.zon and then, in this example, the top-level build.zig.zon defines these dependencies:

    .dependencies = .{
        .foxwhale_gen = .{
            .url = "https://github.com/malcolmstill/foxwhale-gen/archive/21c0f17c7d2258fcdf0819866d4cf4c784f00213.tar.gz",
            .hash = "122006b5217a03a29dab5ffc6040814e9cf3da85de6553177d72cd00a7d61584634b",
        },
        .foxwhale_epoll = .{ .path = "foxwhale-epoll" },
        .foxwhale_pool = .{ .path = "foxwhale-pool" },
        .foxwhale_subset_pool = .{ .path = "foxwhale-subset-pool" },
        .foxwhale_iterable_pool = .{ .path = "foxwhale-iterable-pool" },
        .foxwhale_wayland = .{ .path = "foxwhale-wayland" },
        .foxwhale_backend = .{ .path = "foxwhale-backend" },
        .foxwhale_animation = .{ .path = "foxwhale-animation" },
        .foxwhale_ease = .{ .path = "foxwhale-ease" },
    },

And the build.zig exposes those to the main exe with the following:

    const epoll = b.dependency("foxwhale_epoll", .{ .target = target, .optimize = optimize });
    const pool = b.dependency("foxwhale_pool", .{ .target = target, .optimize = optimize });
    const subset_pool = b.dependency("foxwhale_subset_pool", .{ .target = target, .optimize = optimize });
    const iterable_pool = b.dependency("foxwhale_iterable_pool", .{ .target = target, .optimize = optimize });
    const wayland = b.dependency("foxwhale_wayland", .{ .target = target, .optimize = optimize });
    const backend = b.dependency("foxwhale_backend", .{ .target = target, .optimize = optimize });
    const animation = b.dependency("foxwhale_animation", .{ .target = target, .optimize = optimize });
    const ease = b.dependency("foxwhale_ease", .{ .target = target, .optimize = optimize });

    const exe = b.addExecutable(.{
        .name = "foxwhale",
        .root_source_file = .{ .path = "src/main.zig" },
        .target = target,
        .optimize = optimize,
        .single_threaded = true,
    });

    exe.root_module.addImport("foxwhale-epoll", epoll.module("foxwhale-epoll"));
    exe.root_module.addImport("foxwhale-pool", pool.module("foxwhale-pool"));
    exe.root_module.addImport("foxwhale-subset-pool", subset_pool.module("foxwhale-subset-pool"));
    exe.root_module.addImport("foxwhale-iterable-pool", iterable_pool.module("foxwhale-iterable-pool"));
    exe.root_module.addImport("foxwhale-wayland", wayland.module("foxwhale-wayland"));
    exe.root_module.addImport("foxwhale-backend", backend.module("foxwhale-backend"));
    exe.root_module.addImport("foxwhale-animation", animation.module("foxwhale-animation"));
    exe.root_module.addImport("foxwhale-ease", ease.module("foxwhale-ease"));
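For the other side of that relationship, here is a rough sketch of what a sub-module’s own build.zig might look like so that pool.module("foxwhale-pool") resolves. I have not copied this from the foxwhale repo: the src/main.zig path is an assumption, and the API shown matches the same Zig version as the snippets above (the one with root_module.addImport and .path lazy paths).

```zig
const std = @import("std");

// Hypothetical foxwhale-pool/build.zig.
pub fn build(b: *std.Build) void {
    // Declaring these options is what lets the top-level build pass
    // `.target` and `.optimize` through `b.dependency(...)`.
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // Expose a module under the name the top-level build looks up
    // with `pool.module("foxwhale-pool")`.
    _ = b.addModule("foxwhale-pool", .{
        .root_source_file = .{ .path = "src/main.zig" }, // assumed path
        .target = target,
        .optimize = optimize,
    });
}
```

The sub-module’s build.zig.zon then lists only that module’s own dependencies, which is what gives the flattened, normalised structure.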


5. @panic at the disco

Assertions are cool. Assertions are the ultimate comment. Use them everywhere.

Let’s say we have this function:

/// Do foo
fn foo(bar: []usize) void {
    // ...
}

Unfortunately foo requires that bar be sorted to operate properly. But we currently rely on the caller to ensure that they have done this. What can we do instead?

/// Do foo
///
/// `bar` needs to be sorted in ascending order
fn foo(bar: []usize) void {
    // ...
}

Okay, we’ve added a comment…that’s a marginal improvement…the person writing the caller might read the documentation…and they might ensure the order. But chances are they won’t.

How about:

/// Do foo
fn foo(bar: []usize) void {
    const sorted_bar = sort(bar);

    // ...
}

Okay, this is better because we ensure internally that we end up with a sorted array. However this will incur some performance penalty…and if we are being consistent we’d add this same sorting call all the way down a stack. So let’s not do that.

Maybe:

/// Do foo
fn foo(bar: []usize) !void {
    if (!isSorted(bar)) return error.BarNotSorted;

    // ...
}

Better, we don’t incur the penalty of sorting, but we will now get an error. But we don’t actually want this to be an error that can be handled. We just want to assert:

/// Do foo
fn foo(bar: []usize) void {
    std.debug.assert(isSorted(bar));

    // ...
}

If we don’t pass in a sorted bar our program will panic (in the safe build modes, Debug and ReleaseSafe) and we will know we’ve made a programming mistake. And we can see that this assertion is basically an executable version of our attempt to add a comment.

Don’t forget to defer postconditions.
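Here is a minimal sketch of what deferring a postcondition can look like, together with one possible isSorted. Both isSorted and insertSorted are hypothetical helpers written for this example; the point is only that defer runs the assertion on every exit path.

```zig
const std = @import("std");

// Hypothetical helper: true if `items` is in ascending order.
fn isSorted(items: []const usize) bool {
    var i: usize = 1;
    while (i < items.len) : (i += 1) {
        if (items[i - 1] > items[i]) return false;
    }

    return true;
}

// Hypothetical example: insert `value` into the sorted prefix `buf[0..len]`
// and return the new length.
fn insertSorted(buf: []usize, len: usize, value: usize) usize {
    // Preconditions: there is room, and the existing prefix is sorted.
    std.debug.assert(len < buf.len);
    std.debug.assert(isSorted(buf[0..len]));

    // Postcondition, checked however we leave: still sorted after the insert.
    defer std.debug.assert(isSorted(buf[0 .. len + 1]));

    // Shift larger items one slot right, then drop `value` into the gap.
    var i = len;
    while (i > 0 and buf[i - 1] > value) : (i -= 1) {
        buf[i] = buf[i - 1];
    }
    buf[i] = value;

    return len + 1;
}
```

Placing the deferred assertion right after the preconditions keeps the whole contract at the top of the function, above the meat.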


6. Prefer to split out functions by domain-level concept

…rather than function length.

It’s tempting to split a large function into smaller pieces when it gets too big. I tend to avoid doing that too early, preferring to live with the long function.

If I do have a long function, I will try and delimit different parts of the function within a block scope.
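As a small sketch of what I mean by delimiting with block scopes, here is a function whose two phases each live in a labelled block. Summary and summarise are hypothetical names for illustration; a real long function would have far more going on in each block.

```zig
const std = @import("std");

// Hypothetical result type for the example.
const Summary = struct {
    range: usize,
    mean: usize,
};

fn summarise(values: []const usize) ?Summary {
    if (values.len == 0) return null;

    // Phase 1: find the range. `min` and `max` cannot leak into phase 2.
    const range = blk: {
        var min = values[0];
        var max = values[0];
        for (values[1..]) |value| {
            min = @min(min, value);
            max = @max(max, value);
        }
        break :blk max - min;
    };

    // Phase 2: compute the (integer) mean. `sum` is local to this block.
    const mean = blk: {
        var sum: usize = 0;
        for (values) |value| sum += value;
        break :blk sum / values.len;
    };

    return Summary{ .range = range, .mean = mean };
}
```

Each block exposes exactly one value to the rest of the function, so if a phase later turns out to be a real domain concept it is easy to lift out.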

If I do ultimately split the function out, I’d much rather do it where there are different domain concepts. If there are such concepts, you’ll find that naming the function is very easy. If you split arbitrarily, you’ll find you struggle to come up with a name.

Final thoughts

I’m not sure if any of the above is particularly interesting or useful to anyone…but it lets me get these ideas out of my head and into a place for future reference.