Today I managed to fix 3 issues.

Issues:

  • weston-subsurfaces was leaking regions. This was apparent from rapidly resizing the window.
  • cheese was leaking…something. Apparent from the object IDs always increasing.
  • gedit would display a subsurface, and when the subsurface was dismissed, the window would stop responding.

I had realised there was an issue with weston-subsurfaces. Namely, one of the subsurfaces would, briefly, appear in the wrong place (not where set_position had put it). I reasoned that I was either mapping the window too early, not dealing with synchronisation properly, or a combination of the two.

This led me to sort out the flip() function (window.zig) so that it also flips synchronised subwindows (after the parent itself flips). I also changed commit (wl_surface.zig) so that it only maps a window once a buffer is attached. The combination of the two seemed to do the trick, and the window now appears all at once.
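
To illustrate the shape of that logic, here's a minimal sketch. The field and function names (synchronized, children, mapped) are illustrative, not the actual foxwhale code:

const Buffer = struct {};

const Window = struct {
    pending_buffer: ?*Buffer = null,
    current_buffer: ?*Buffer = null,
    mapped: bool = false,
    synchronized: bool = false,
    children: []const *Window = &[_]*Window{},

    fn flip(self: *Window) void {
        // Promote the pending buffer to the current one.
        self.current_buffer = self.pending_buffer;
        self.pending_buffer = null;

        // After the parent flips, flip synchronised subwindows too, so
        // the whole tree becomes visible in a single frame.
        for (self.children) |child| {
            if (child.synchronized) child.flip();
        }
    }

    fn commit(self: *Window) void {
        // Only map once a buffer has actually been attached; an empty
        // commit must not make the window visible.
        if (self.pending_buffer != null) self.mapped = true;
    }
};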

Interestingly, weston-subsurfaces desynchronises its children after setup, but I reasoned that if you resize the window it must set everything back to synchronous (otherwise how would you avoid one surface lagging behind the others?). And right enough, resizing causes a temporary set_sync.

What this then revealed was that I was leaking Regions. I was a bit confused about this because I was pretty sure I was cleaning up regions properly (especially after adding some logic for a case I hadn’t handled). But still the leak persisted.

It was around this time that I realised there was an issue with gedit, and this, together with the outstanding cheese issue, put me into a period of self-doubt.

Which is why I’m pleased to say that I fixed all three issues!

Regions

  • The regions issue came about because sometimes a region gets created but is then never set on a particular window. Because all of the existing cleanup code ran in relation to a window, these regions were never released. The solution, then, was to store a ?*Window against the region, so that if destroy() is called on a region whose window is still null we can release it immediately (see the sketch after this list). See this, this and that.

  • This then revealed another issue: I was also leaking ShmPools! This was a similar issue to the one above, where a wl_shm_pool was created but no buffer was ever created from that pool before the pool was destroyed. In that case we need to deinit the pool immediately on destroy(). See this commit.
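
Both fixes share the same pattern. As a rough sketch (the types and names here are illustrative, not the actual commits): if the resource was never associated with a window, destroy() is the last chance to release it.

const Window = struct {};

const Region = struct {
    window: ?*Window = null, // set once the region is actually used by a window

    fn destroy(self: *Region) void {
        // Never set on a window: no window cleanup path will ever
        // release this region, so release it right now.
        if (self.window == null) self.deinit();
        // Otherwise the owning window's cleanup releases it later.
    }

    fn deinit(self: *Region) void {
        _ = self; // return the region to its pool, free its state, etc.
    }
};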

With that, the weston-subsurfaces issues were fixed.

Gedit

The gedit issue was the one I was dreading most. But it actually turned out to be not so bad: it boiled down to not removing the subsurface from the window tree when it was destroyed.

To help explain, here’s output from foxwhalectl:

➜  foxwhale git:(master) ✗ bin/foxwhalectl window-tree
window[2 ^ null] @16 (xdg_toplevel) (0, 0) (617, 399):
    window[6 ^ 2] @20 (wl_subsurface) (157, 149) (233, 194) [2, -1] [-1, -1]
    window[2 ^ null] @16 (xdg_toplevel) (0, 0) (617, 399) [-1, -1] [-1, 6]

The unindented line is a toplevel window. The indented lines below it show the order that the (sub)windows are rendered in: the window at the bottom is rendered first, the window at the top is rendered last. [6 ^ 2] means the window with index 6 is a child of the window with index 2.

And this makes sense here: we draw our main toplevel window first (2) and then draw the “popup” subsurface above that.

But what was happening when we clicked away from that subsurface (so that it disappeared, leaving only the main window) was this:

➜  foxwhale git:(master) ✗ bin/foxwhalectl window-tree
window[2 ^ null] @16 (xdg_toplevel) (0, 0) (617, 399):
    window[6 ^ null] @20 (wl_surface) (0, 0) (233, 194) [-1, -1] [-1, -1]

What this implies is that we attempt to draw window 2 but we seem to get stuck and can only draw window 6 (which shouldn’t even exist any more). Note that we SHOULD always have the toplevel window (2) as one of the indented rows, but here it’s gone.

This boiled down to not detaching that subsurface window, so window 2 was still pointing to it even though it was gone. As we iterated through the (incorrect) drawing order we (incorrectly) moved to window 6, but because window 6's next / prev pointers had been deinitialised, we could no longer see the toplevel window.

Again, the solution was just to make sure we detach() inside the deinit() function.
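
In sketch form (field names assumed, not the actual foxwhale code), the fix looks something like this:

const Window = struct {
    prev: ?*Window = null,
    next: ?*Window = null,

    fn detach(self: *Window) void {
        // Point the neighbours past us...
        if (self.prev) |p| p.next = self.next;
        if (self.next) |n| n.prev = self.prev;
        // ...and clear our own links.
        self.prev = null;
        self.next = null;
    }

    fn deinit(self: *Window) void {
        // Detach *before* tearing down the rest of the state; otherwise
        // anything iterating the chain walks into dead pointers, exactly
        // as in the broken window-tree output above.
        self.detach();
        // ... wipe the rest of the window's state ...
    }
};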

This then gives the correct output after the subwindow disappears:

➜  foxwhale git:(master) ✗ bin/foxwhalectl window-tree
window[2 ^ null] @16 (xdg_toplevel) (0, 0) (617, 399):
    window[2 ^ null] @16 (xdg_toplevel) (0, 0) (617, 399) [-1, -1] [-1, -1]

Nice!

Cheese

I had already been aware of this issue, and it had been a consistent, though relatively quiet, noise in my brain.

I had been worried that it was something in the underlying wire protocol implementation: that delete_id events were being delayed until after a callback's done event, such that the client would go and take a new ID rather than reuse an old one.

However, having looked at some of the other destroy() functions, and with the wl_shm_pool stuff fresh in my mind, I realised that we were never releasing the wl_shm_pool object. A quick fix later and cheese no longer causes leaks!

Memory leaks

One of the cool things that's come out of using fixed arrays of resources is that it's absolutely fantastic for detecting memory leaks. If you set the size sensibly (large enough for a well-behaved program) then you get a nice crash (in a debug build) when the available objects are exhausted.
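
To make that concrete, here's a minimal sketch of the idea; this is a hypothetical pool, not foxwhale's actual Stalloc:

fn FixedPool(comptime T: type, comptime capacity: usize) type {
    return struct {
        items: [capacity]T = undefined,
        in_use: [capacity]bool = [_]bool{false} ** capacity,

        fn acquire(self: *@This()) !*T {
            for (self.in_use, 0..) |used, i| {
                if (!used) {
                    self.in_use[i] = true;
                    return &self.items[i];
                }
            }
            // A leak shows up here: leaked slots are never released, so
            // even a well-behaved workload eventually exhausts the pool
            // and fails loudly instead of quietly growing forever.
            return error.OutOfMemory;
        }

        fn release(self: *@This(), item: *T) void {
            // Recover the slot index from the pointer.
            const i = (@intFromPtr(item) - @intFromPtr(&self.items)) / @sizeOf(T);
            self.in_use[i] = false;
        }
    };
}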

If I was just malloc’ing resources then I think it would be much harder to track things down.

One thing I have been considering is that resources that belong to a Client should not be global, as they currently are. Rather, each Client should have a fixed-size pool of, say, Windows that it can allocate from. That way, if we run out of memory because a Client is misbehaving, that Client will be killed, rather than some unfortunate client that happens to request a Window just after we've run out.
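
Reusing the hypothetical FixedPool from the sketch above, a per-client budget might look like this (the names and the size are again illustrative):

const Window = struct { id: u32 = 0 };

const Client = struct {
    windows: FixedPool(Window, 64) = .{}, // an arbitrary per-client budget

    fn createWindow(self: *Client) !*Window {
        return self.windows.acquire() catch |err| {
            // This client blew its own budget, so this client is the one
            // that gets killed, not an innocent bystander that happens to
            // ask for a Window a moment later.
            self.disconnect();
            return err;
        };
    }

    fn disconnect(self: *Client) void {
        _ = self; // close the connection and release this client's resources
    }
};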

I also think that perhaps we should do some dynamic allocation, so that we keep our memory usage to a minimum but still have some fixed maximum. I imagine this would be very similar to Stalloc, except we would grow (realloc) the underlying array up to a certain maximum capacity. We'd still get our nice block of resources, but we could start small and grow when we need to, while still killing bad clients if they get excessive.

If we're doing this, we'll need to (and I actually quite like this) use an index rather than a pointer to a resource (e.g. *Window). This means that the allocator can quite happily reallocate the array and copy the existing contents, and all our references will still work.
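
A sketch of how that might fit together (an assumed design, not existing foxwhale code; freed-slot reuse is omitted for brevity):

const std = @import("std");

fn GrowPool(comptime T: type, comptime max: usize) type {
    return struct {
        allocator: std.mem.Allocator,
        items: []T = &[_]T{},
        len: usize = 0,

        fn acquire(self: *@This()) !u32 {
            if (self.len == self.items.len) {
                // Grow, doubling up to the hard cap; a client that needs
                // more than `max` slots is misbehaving and gets an error.
                if (self.items.len >= max) return error.OutOfMemory;
                const new_cap = @min(@max(self.items.len * 2, 8), max);
                self.items = if (self.items.len == 0)
                    try self.allocator.alloc(T, new_cap)
                else
                    try self.allocator.realloc(self.items, new_cap);
            }
            self.len += 1;
            // Hand out an index, not a pointer: realloc may move the
            // array, but indices stay valid across a move.
            return @intCast(self.len - 1);
        }

        fn get(self: *@This(), index: u32) *T {
            return &self.items[index];
        }
    };
}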

Perhaps we could swap out which type of allocation is used at build time, such that a debug build would use a fixed-size array that is quite small (but big enough that well-behaved clients don't trigger OutOfMemory), so that memory leaks become easily apparent, while a release build grows dynamically up to some maximum size.
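
In Zig that choice could be as small as keying the capacity off the build mode (a sketch; the numbers are made up):

const builtin = @import("builtin");

// Debug builds get a deliberately tight pool so leaks crash early;
// release builds get enough headroom for real workloads.
const max_windows: usize = if (builtin.mode == .Debug) 64 else 4096;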

Generally I just really like that Zig makes memory allocation quite an explicit thing. By that I mean that there’s no assumption of just malloc and that it encourages the right type of allocation for a particular problem.