One of the hard problems in a distributed storage system is figuring out how to place data effectively within the storage cluster. Swift has a “unique-as-possible” placement algorithm which ensures that data is placed efficiently and with as much protection from hardware failure as possible.
Swift places data into distinct availability zones to ensure both high durability and high availability. An availability zone is a distinct set of physical hardware with unique failure mode isolation. In a large deployment, availability zones may be defined as unique facilities in a large data center campus. In a single-DC deployment, the availability zones may be unique rooms, separated by firewalls and powered with different utility providers. A multi-rack cluster may choose to define availability zones as a rack and everything behind a single top-of-rack switch. Swift allows a deployer to choose how to define availability zones based on the particular details of the available infrastructure.
When Swift was first released, deployers were required to have at least as many availability zones as replicas of their data. This data placement method did not work well for most deployments: deployers were forced into convoluted deployment patterns that did not match their underlying hardware. Regardless of the actual details of the deployment, clusters were required to have at least three availability zones, and ideally four or five for handoff purposes. (When data cannot be immediately placed in one of its primary locations, Swift will choose a handoff node, if available, to ensure that data is fully replicated.) Often, this lack of flexibility in the system caused deployers to do odd things. For example, a small cluster of two servers would be required to carve out some drives from each server to serve as a third availability zone.
Something better was needed. Commit bb509dd8 last April updated Swift’s data placement method to use “unique-as-possible” placement. With this new method, deployers are no longer required to force Swift’s semantics onto hardware that doesn’t exactly match.
Swift’s unique-as-possible placement works like this: data is placed into tiers: first the availability zone, then the server, and finally the storage volume itself. Replicas of the data are placed so that each replica has as much separation as the deployment allows.
When Swift chooses how to place each replica, it will first choose an availability zone that hasn’t been used. If all availability zones have been chosen, the data will be placed on a unique server in the least-used availability zone. Finally, if all servers in all availability zones have been used, Swift will place replicas on unique drives within those servers.
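The tier-by-tier selection above can be sketched in a few lines of Python. This is an illustrative model, not Swift’s actual ring-builder code; the `place_replicas` function and the device dictionaries are assumptions made for the example.

```python
from collections import Counter

def place_replicas(devices, replica_count):
    """Choose one drive per replica, preferring an unused availability
    zone first, then an unused server, and finally an unused drive."""
    used_zones = Counter()
    used_servers = Counter()
    used_drives = set()
    placements = []
    for _ in range(replica_count):
        # Never reuse a drive; among the rest, the sort key prefers the
        # least-used zone, and within a zone, the least-used server.
        candidates = [d for d in devices
                      if (d['server'], d['drive']) not in used_drives]
        dev = min(candidates,
                  key=lambda d: (used_zones[d['zone']],
                                 used_servers[d['server']]))
        used_zones[dev['zone']] += 1
        used_servers[dev['server']] += 1
        used_drives.add((dev['server'], dev['drive']))
        placements.append(dev)
    return placements

# Two availability zones, two servers each, two drives per server:
devices = [{'zone': z, 'server': s, 'drive': d}
           for z, s in [(1, 'A'), (1, 'B'), (2, 'C'), (2, 'D')]
           for d in ('sda', 'sdb')]

replicas = place_replicas(devices, 3)
print([(d['zone'], d['server']) for d in replicas])
```

With three replicas and only two zones, the sketch lands one replica in each zone first, then falls back to a unique server in the least-used zone, so no two replicas ever share a server.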
As an example, suppose you are storing three replicas, and you have two availability zones, each with two servers.
In this example, you can see that there is at least one copy in each availability zone, but no two replicas are on the same server. If, for example, Server C became unavailable for some reason, new writes would use Server B as a handoff node (rather than reusing Server A or Server D), thus maintaining good separation of the replicas and protecting durability.
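The handoff choice in this example can be modeled the same way as primary placement: pick the available device that maximizes separation from the replicas that are still reachable. This is a hypothetical sketch, not Swift’s actual handoff code; the `pick_handoff` function and device layout are assumptions for illustration.

```python
from collections import Counter

def pick_handoff(devices, live_replicas):
    """Choose a handoff device: prefer the least-used zone among the
    still-reachable replicas, then the least-used server."""
    zone_use = Counter(d['zone'] for d in live_replicas)
    server_use = Counter(d['server'] for d in live_replicas)
    return min(devices, key=lambda d: (zone_use[d['zone']],
                                       server_use[d['server']]))

# Primaries were Servers A (zone 1), C (zone 2), and D (zone 2);
# Server C is unavailable, so it is absent from the candidate list.
devices = [{'zone': 1, 'server': 'A'},
           {'zone': 1, 'server': 'B'},
           {'zone': 2, 'server': 'D'}]
live = [{'zone': 1, 'server': 'A'},
        {'zone': 2, 'server': 'D'}]

print(pick_handoff(devices, live)['server'])  # → B
```

Both zones already hold one live replica, so the tiebreaker falls to the server tier, and Server B wins because it holds no replica yet, matching the behavior described above.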
The unique-as-possible placement in Swift gives deployers the flexibility to organize their infrastructure as they choose. Swift can be configured to take advantage of what has been deployed, without requiring that the deployer conform the hardware to the application running on that hardware.