Server Allocated Buffers in Mir

  2013-03-14


…Or possibly server owned buffers

One of the significant differences in design between Mir and Wayland compositors¹ is the buffer allocation strategy.

Wayland favours a client-allocated strategy. In this, the client code asks the graphics driver for a buffer to render to, and then renders to it. Once it's done rendering it gives a handle² to this buffer to the Wayland compositor, which merrily goes about its job of actually displaying it to the screen.

In Mir we use a server-allocated strategy. Here, when a client creates a surface, Mir asks the graphics driver for a set of buffers to render to. Mir then sends the client a reference to one of the buffers. The client renders away, and when it's done it asks Mir for the next buffer to render to. At this point, Mir sends the client a reference to another buffer, and displays the first buffer.

You can see this in action - in the software-rendered case - in demo_client_unaccelerated.c (I know the API here is a bit awkward; I'm agitating for a better one).
The meat of it is:


mir_wait_for(mir_surface_next_buffer(surface, surface_next_callback, 0));
which asks the server for the next buffer and blocks until the server has replied. This also tells the server that the current buffer is available to display.
Then

mir_surface_get_graphics_region(surface, &graphics_region);
gets a CPU-writeable block of memory from the current buffer. You'll note that we don't have a mir_wait_for around this; that's because we've already got the buffer from the server above; this just accesses it.

The code then writes an interesting or, in fact, highly boring pattern to the buffer and loops back to next_buffer to display that rendering and get a new buffer, ad infinitum.

Great. So why are you doing it, again?

The need for server-allocated buffers is primarily driven by the ARM world and Android graphics stack, an area in which I'm blissfully unaware of the details. So, as with my second-hand why Mir post, you, gentle reader, get my own understanding.

The overall theme is: server-allocated buffers give us more control over resource consumption.

ARM devices tend to be RAM constrained, while at the same time having higher resolution displays than many laptops. Applications also tend to be full-screen. Window buffers are on the order of 4 MB for mid-range screens, 8 MB for the fancy 1080p displays on high end phones, and 16 MB for the Nexus 10s. Realistically you need to at least double-buffer, and it's often worth triple-buffering, so you're eating 8-12MB on the low end and 32-48MB on the high end, per window. Have a couple of applications open and this starts to add up on devices with between 512MB and 2GB of RAM and no dedicated VRAM.

In addition to their lack of dedicated video memory, ARM GPUs also tend to be much less flexible. On a desktop GPU you can generally ask for as many scanout-capable³ buffers as you like. ARM GPUs tend to have several restrictions on scanout buffers. Often they need to be physically contiguous, which in practice means allocated out of a limited block of memory reserved at boot. Sometimes they can only scanout of particular physical addresses. I'm sure there are other oddities out there.

So scanout buffers are a particularly precious resource. However, you really want clients to be able to use scanout-capable buffers - in the common case where you've got a single, opaque, fullscreen window, you don't have to do any compositing, so having the application draw directly to the scanout buffer saves you a copy; a significant performance win. Desktop aficionados might recognise this as the same effect as Compiz's unredirect fullscreen windows option.

So, there's the problem domain. What do server-allocated buffers buy us?

The server can control access to the limited scanout buffers.
This one's pretty self-explanatory.

The server can steal buffers from inactive clients.
When applications aren't visible, they don't really need their window buffers, do they? The server can either destroy the inactive client's buffers, or clear them and hand them off to a different application.

When applications are idle for an extended period we'll want to clear up more resources - maybe destroy their EGL state, and eventually kill them entirely. However, applications will be able to resume faster if we've just killed their window buffers than if we've killed their whole EGL state.

The server can easily throttle clients.
There's an obvious way for the server to prevent a client from rendering faster than it needs to; delay handing out the next buffer.

Finally:

Having a single allocation model is cleaner.
It's cleaner to have just one allocation model. We need to do at least some server-allocation, so that's the one allocation model. Plus, even though desktops have lots more memory than phones, people still care about memory usage ☺.

But doesn't X do server-allocated buffers, and it is terrible?

Yes. But we're not doing most of the things that make server-allocated buffers terrible. In X's case
  • X allocates all sorts of buffers, not just window surfaces. We leave the client to allocate pixmap-like things - that is, GL textures, framebuffer objects, etc - itself. Notably, we're not allocating ancillary buffers in the server, just the colour buffers. We shouldn't need to change protocol to support new types of buffers (HiZ springs to mind), like has happened with DRI2.
  • X isn't particularly integrated. When you resize an X window, the client gets an DRI2 Invalidate event, indicating that it needs to request new buffers. It also gets a separate ConfigureNotify event about the actual resize. We won't have this duplication; clients will automatically get buffers of the new size on the next buffer advance, and these buffers know their size.
  • X has more round-trips than necessary. After submitting rendering with SwapBuffers, clients receive an Invalidate event. They then need to request new, valid buffers. We won't have as many roundtrips; the server responds to a request to submit the current rendering with the new buffer.

A possibly significant disadvantage we can't mitigate is that the server no longer knows how big the client thinks its window is, so we may display garbage in the window if the client thinks its window is smaller than the server's window. I'm not sure how much of a problem that will be in practise.

Summary

Many of the benefits of server-allocation could also be gained with sufficiently smart clients and some extra protocol messages - although that falls down once you start suspending idle clients, as it requires the client to actually be running. We think that doing it in the server will be easier, cleaner, and more reliable.

I'm less convinced that server-allocation is the right choice than the rest of Mir's architecture. I'm not convinced that it's a mistake, either, and some of the benefits are attractive. We'll see how it goes!


¹: Yes, I know you can write a server-allocated buffer model with a Wayland protocol. All the Wayland compositors for which we have source, however, use a client-allocated model, and that's what existing Wayland client code expects. Feel free to substitute “existing Wayland compositors” whenever I say “Wayland”.

²: Actually, in the common (GPU-accelerated) case the client itself basically only gets a handle to the buffer; the GEM handle, which it gets from the kernel.

³: scanout is the process of reading from video memory and sending to an output encoder, like HDMI or VGA. A scanout-capable buffer is the only thing the GPU is capable of putting on a physical display.