The goal of SwiftStack’s core storage is for apps to store and retrieve data without worrying about hard problems like durability, failure handling, bit rot detection, capacity scaling, and concurrent access. With SwiftStack, an app can write bytes into the storage system and later read them back out. No need to worry about anything else.
Storage administrators, the operators responsible for keeping the storage systems available and healthy, don’t have the same “fire and forget” luxuries as app developers. To start with, storage admins need to do appropriate cluster sizing. Of course, figuring out the right number of storage devices is important, but there’s more to sizing than just calculating the bytes on disk.
In addition to understanding how many bytes are needed in the storage system, storage admins must also understand how many objects need to be stored. There’s quite a difference between 1PB of 1GB objects and 1PB of 1KB objects! In a traditional storage system, you would not want to try to put a few billion files in a single directory.
However, in an object storage system, storing billions of objects in a single “directory” (called a bucket in the S3 API and a container in the Swift API) happens on a regular basis. Applications solving problems for connected cities, media and entertainment, and IoT need a storage system that can handle billions of objects, organized however the app sees fit.
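To make the sizing point concrete, here is a small back-of-the-envelope sketch of how many objects each case implies. It assumes decimal (SI) units, which this post does not specify, and the helper name `object_count` is purely illustrative.

```python
# Assumption: decimal (SI) units; the post does not say SI or binary.
KB = 10**3
GB = 10**9
PB = 10**15


def object_count(total_bytes: int, object_size_bytes: int) -> int:
    """Return how many objects of one size fit in total_bytes."""
    return total_bytes // object_size_bytes


large = object_count(PB, GB)  # 1PB made up of 1GB objects
small = object_count(PB, KB)  # 1PB made up of 1KB objects

print(f"1PB of 1GB objects: {large:,} objects")  # 1,000,000
print(f"1PB of 1KB objects: {small:,} objects")  # 1,000,000,000,000
```

The same number of bytes on disk means a million listing entries in one case and a trillion in the other, which is why object count has to be part of cluster sizing.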
A New Architecture
To support these applications, SwiftStack worked as part of the upstream Swift community to design and implement a feature called container sharding. Objects in a Swift cluster are distributed throughout the cluster, but each object is also tracked according to where it falls in the logical namespace. For example, in an account `myacct` with `container_a` and `container_b`, every object stored in `container_a` has a corresponding entry in that container's listing, and aggregated stats from both containers are kept in the account. This tracking and rollup allows for enumerating the objects in each container and tracking usage for billing, chargebacks, and other accounting. At some point (depending on hardware), the number of items tracked in a container can become prohibitively large. Eventually, the listing will grow large enough that it exceeds the capacity of a single storage device. The container sharding feature implements a way for this listing information to be split across the entire storage cluster.
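The idea of splitting one large listing into smaller pieces can be sketched in a few lines. This is a toy illustration, not Swift's implementation: it assumes the listing is simply a sorted list of object names that is cut into contiguous name ranges, and the functions `split_listing` and `find_shard` and the `rows_per_shard` parameter are hypothetical names invented for this example.

```python
from bisect import bisect_left


def split_listing(names, rows_per_shard):
    """Cut a listing into contiguous, sorted name ranges.

    Returns (uppers, shards). uppers[i] is the last name held by
    shards[i]; the final shard has no upper bound.
    """
    ordered = sorted(names)
    shards = [
        ordered[start:start + rows_per_shard]
        for start in range(0, len(ordered), rows_per_shard)
    ]
    uppers = [chunk[-1] for chunk in shards[:-1]]
    return uppers, shards


def find_shard(uppers, name):
    """Return the index of the shard whose name range covers name."""
    return bisect_left(uppers, name)


# Ten hypothetical object names, split into shards of at most four rows.
names = [f"obj-{i:04d}" for i in range(10)]
uppers, shards = split_listing(names, rows_per_shard=4)

index = find_shard(uppers, "obj-0005")
print(uppers)         # ['obj-0003', 'obj-0007']
print(shards[index])  # the shard holding obj-0004 through obj-0007
```

Because each piece covers a contiguous range of names, a lookup or an ordered listing only needs the boundaries to know which piece to consult, and the pieces themselves can live on different devices.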
Splitting up a large listing, storing the sub-listings somewhere else, and tracking where the sub-listings live doesn’t sound like a very complicated feature. However, in Swift, to ensure durability and high availability, each listing is stored in multiple places, with no central coordination point. The replicas communicate amongst themselves to converge on a consistent listing despite any partial failures that may exist in the cluster. These durability protections greatly complicate the work needed to implement container sharding. First, since different replicas of a listing may not be exactly the same, we must ensure that any resulting split is the same across all replicas of that listing. Second, the splitting process must be performed on active production clusters with no interruption in data access or cluster availability.
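To see why replicas can differ and still converge, consider a heavily simplified sketch. It assumes each replica is a mapping from object name to a `(timestamp, deleted)` pair and that the newest timestamp wins; the `merge_rows` function and this row layout are illustrative assumptions, not Swift's actual data model or replication code.

```python
def merge_rows(local, remote):
    """Merge two replica listings, keeping the newest row per name.

    Each listing maps an object name to (timestamp, deleted).
    """
    merged = dict(local)
    for name, row in remote.items():
        if name not in merged or row[0] > merged[name][0]:
            merged[name] = row
    return merged


# Two replicas that saw different updates during a partial failure.
replica_1 = {"a": (1.0, False), "b": (2.0, False)}
replica_2 = {"b": (3.0, True), "c": (1.5, False)}

# Merging in either direction yields the same listing.
converged_1 = merge_rows(replica_1, replica_2)
converged_2 = merge_rows(replica_2, replica_1)
assert converged_1 == converged_2
```

Until such a merge has happened, the two replicas hold different rows, so a split computed independently on each could put the boundaries in different places. That is the first complication described above.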
A full write-up of this architecture is available here: https://docs.openstack.org/swift/latest/overview_container_sharding.html
Operations at Scale
At SwiftStack, we have been using container sharding to great success, with several of our customers already storing multiple billions of objects in a single bucket. SwiftStack makes it simple for operators to automate the entire process of using this capability to manage and monitor these extremely large records.
More and more use cases depend not only on the amount of storage but also on the number of objects that can be stored. Container sharding gives applications much more flexibility to write data how they want to write it, rather than needing to work within the constraints of the infrastructure. With the Swift architecture, many millions of containers can be created for an account and billions of objects can be stored in a single bucket / container.