A Blueprint: Multi-Region, Multi-Cloud, Metadata-Searchable Storage
Fred Hutchinson Cancer Research Center. HudsonAlpha Institute for Biotechnology. A top-10 global pharmaceutical. Counsyl. Oklahoma Medical Research Foundation. Ploid. University of Virginia. Georgia Tech. SURFsara. And more. These companies and institutions range from leading pharmaceutical companies to globally recognized not-for-profit researchers and the many organizations that provide specialized services to the industry. All of them have been moving rapidly into the future of scientific computing and research, and for each of them, managing ever-growing repositories of data has been a challenge.
This is the story we hear all too often: Individual departments or researchers maintain their own servers and storage; some use Amazon’s or Google’s cloud services; some leverage a central IT department; most use some combination of these. Important data gets shipped around the world on disk drives, so collaboration moves at the pace of freight shipments, and extra copies take up unnecessary space. The sum of all the laptop, workstation, GPFS/Lustre, NAS/SAN, and AWS/Google storage keeps growing, but no one has a clear view into what is stored where. Most of the applications still require the NFS or SMB/CIFS protocols, but the scalable, flexible, cost-optimized cloud storage vendors either support only the S3 or Swift APIs or try to bridge the gap with an expensive gateway that can’t scale like the storage itself. Centralized search of the “whole archive of data” is a pipe dream; metadata must be a part of the answer, but there are no standards for what metadata to generate or how to store it to make global search a reality. Meanwhile, the promise of things like personalized medicine only raises expectations for fast turnaround times, so project deadlines squeeze closer, and no one has the time to think about overhauling an entire infrastructure to proactively address all of this for the future.
If that resonates, you are not alone; every company mentioned above, and many others, has been where you are. By itself, SwiftStack can’t cure cancer or prevent heart attacks or understand juvenile diabetes or help prepare new parents to care for a special-needs child, but we have made it a fundamental part of our business to simplify storage and data management so that you can. This blueprint introduces a strategy and architecture that have worked well for many SwiftStack clients to date.
Modernizing Data Management in Scientific Research
Who else is using SwiftStack for this? What problems are they solving?
The paper includes many details about solutions we deployed in close collaboration with our clients, along with ideas for how those solutions might help you as well. If you would prefer a conversation, we’d love to speak with you directly.