David Nalley's Blog

Secondary storage scalability configuration and tuning in CloudStack


Someone asked about this recently on the cloudstack-users mailing list, and I figure it is both an interesting design point of CloudStack and a specific piece of knowledge that others will find useful.

So first, let’s talk about what Secondary Storage is, especially if you are not familiar with CloudStack’s hierarchy. Secondary storage is where templates (aka non-running disk images) and snapshots are stored. The Secondary Storage VM (SSVM) handles a number of things, most importantly shuttling images and snapshots between various primary storage (where running disk images are stored) resources and the secondary storage resources. The SSVM also handles receiving uploaded disk images and aging/scavenging snapshots. The SSVM is really just some CloudStack agent software running atop a Linux VM, but bears the name SSVM because of the functionality that it is providing.

So why run this storage management service as a stateless virtual machine instead of incorporating it into some centralized all-in-one management server? First, your management server might not be network-proximate to your secondary storage resource, particularly if you have multiple geographically disparate datacenters with resources being managed from just a few locations. Second, all-in-one isn’t very scalable. With the stateless ‘worker VMs’, CloudStack can dynamically scale up the number of VMs to deal with load. At a minimum, a single SSVM will exist in each zone. CloudStack uses two different configuration options to determine when to scale up and add more:

secstorage.capacity.standby is the minimum number of command execution sessions that the system should be able to serve immediately. secstorage.session.max is the maximum number of command execution sessions that a single SSVM can handle. By manipulating these settings (making the former higher and the latter lower) you can trigger faster automatic scaling at lower load, or you can let the defaults do their job.
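To make the interplay between the two settings a bit more concrete, here is a minimal sketch of how such a scale-up check could work. It is a simplified model and not CloudStack's actual code: the function name, the rule itself (spare capacity compared against the standby threshold), and the numbers in the usage lines are all assumptions for illustration.

```python
def needs_more_ssvms(running_ssvms, active_sessions, session_max, capacity_standby):
    """Return True when spare session capacity drops below the standby reserve.

    Hypothetical model, not CloudStack source:
      session_max      ~ secstorage.session.max (sessions one SSVM can handle)
      capacity_standby ~ secstorage.capacity.standby (sessions kept in reserve)
    """
    total_capacity = running_ssvms * session_max
    spare_capacity = total_capacity - active_sessions
    return spare_capacity < capacity_standby


# Illustrative numbers only: one SSVM, 45 busy sessions.
print(needs_more_ssvms(1, 45, session_max=50, capacity_standby=10))
# Same load, but a lower per-SSVM limit and a higher standby reserve.
print(needs_more_ssvms(1, 20, session_max=25, capacity_standby=20))
```

In this model, raising the standby value or lowering the per-SSVM session limit both shrink the headroom, which is why that combination triggers scaling at lower load.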

Of course that’s the new-fangled cloud way of doing things – just add more workers to do the work, but that’s not the only way of tuning or making your SSVM more scalable.

First, you can just make the SSVM bigger. The default service offering is for a 500 MHz, single vCPU, 256 MB of RAM machine. That is pretty small, and works well for environments where the rate of change is low, snapshots are infrequent, and the environment itself is small. You can of course edit that service offering or define a new one to be beefier. If you define a new offering, you need to point CloudStack at it with the secstorage.service.offering parameter. Not only can you scale out, but you can also scale up.

Of course, nothing really trumps architecting things right in the first place. It’s pretty trivial to throw up a cloud computing environment with a single NIC on every host and let management, guest, public, and storage traffic traverse a single physical network. There isn’t anything inherently wrong with that, but if you really need to maximize efficiency, you can always have a dedicated storage network. Even better, enable jumbo frames on your networking hardware as well as your hypervisors, your storage hardware, and yes, your SSVM as well. You can configure that on the SSVM with the secstorage.vm.mtu.size parameter.
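As a rough sketch of pulling these knobs together, the snippet below collects the global settings mentioned in this post and builds one parameter set per setting for CloudStack's updateConfiguration API command (which takes a name and a value). The values are hypothetical and only there for illustration, the helper name is my own, and request signing and sending are deliberately left out. secstorage.service.offering is omitted because it depends on a service offering that is specific to your installation.

```python
# Hypothetical values for illustration only; tune them for your own environment.
SSVM_SETTINGS = {
    "secstorage.capacity.standby": "20",
    "secstorage.session.max": "25",
    "secstorage.vm.mtu.size": "9000",  # a common jumbo-frame MTU
}


def update_configuration_params(settings):
    """Build one parameter dict per setting for the updateConfiguration command.

    Only the request parameters are produced here; signing the request and
    sending it to the management server are out of scope for this sketch.
    """
    return [
        {"command": "updateConfiguration", "name": name, "value": value}
        for name, value in sorted(settings.items())
    ]


for params in update_configuration_params(SSVM_SETTINGS):
    print(params)
```

Keep in mind that a larger MTU on the SSVM only helps if every hop on the storage network is configured for jumbo frames as well.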

Even if you don’t have intense levels of snapshot work and template work, you may still want to tune your SSVM to make it more time-efficient.