Architecting large-scale Artificial Intelligence/Deep Learning data pipelines
In part 1 of this series (https://bit.ly/2Taw9WH) we discussed how Artificial Intelligence/Deep Learning applications and workflows are inherently different from traditional file-based applications. In this post, we will examine what alternatives are available to customers and why SwiftStack is uniquely positioned to meet these requirements.
Is the current approach the right approach?
Storage vendors have been trying to architect AI/DL environments the same way they architect typical file-based workloads: using expensive flash storage arrays for ingest, training, and inferencing, even though there is no need to replicate the memory and flash already present in the compute layer. The economics also make this exorbitantly expensive, as these datasets start at 10-15 petabytes with the expectation of growing to 50+ petabytes. For example, building a 15-petabyte storage tier out of all-flash storage arrays would cost approximately $22.5M, assuming 500TB all-flash arrays at a $1.50/gigabyte street price.
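The arithmetic behind that figure is straightforward; a quick sketch (using the street price cited above, which is of course an assumption that varies by vendor and time):

```python
# Back-of-the-envelope cost of a 15 PB all-flash tier at the
# $1.50/GB street price cited above (an assumed figure).
FLASH_PRICE_PER_GB = 1.50   # USD per gigabyte, assumed street price
CAPACITY_PB = 15
GB_PER_PB = 1_000_000       # decimal petabytes

total_gb = CAPACITY_PB * GB_PER_PB
flash_cost = total_gb * FLASH_PRICE_PER_GB
print(f"All-flash tier: ${flash_cost / 1e6:.1f}M")  # All-flash tier: $22.5M
```

At 50+ PB the same math pushes well past $75M, which is why the economics alone rule out an all-flash tier at this scale.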
Additionally, flash storage arrays based on distributed filesystems have limited ability to add metadata or labels to DL workflows, nor do they have the cloud connectivity for leveraging hybrid or multi-cloud workflows. This forces the introduction of a separate archival object tier in the pipeline to provide metadata enrichment services and cloud/hybrid workflows. Moving data between disparate storage tiers results in delayed business outcomes, defeating the very purpose of AI/DL pipelines.
Can we do this differently?
What is expected of the storage layer is massive parallelism, hundreds of GB/s of throughput to keep GPUs busy, and scalability to accommodate data sets from ten to several hundred petabytes. SwiftStack object storage, because of its performance advantage, can provide this very effectively, even with high-capacity spinning drives, at better than "cloud economics." The annualized failure rate (AFR) of HDDs is mitigated by several features that ensure extremely high durability, such as erasure coding, high availability, and geographic dispersion.
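Erasure coding is a large part of why HDD-based object storage can be both durable and economical: it protects against multiple drive failures at a fraction of replication's raw-capacity cost. A minimal sketch of the overhead math (the 8+4 scheme below is illustrative, not a stated SwiftStack default):

```python
# Storage overhead of erasure coding vs. triple replication.
# An m+k scheme splits each object into m data fragments plus k
# parity fragments and survives the loss of any k fragments.
def ec_overhead(data_frags: int, parity_frags: int) -> float:
    """Raw bytes stored per usable byte."""
    return (data_frags + parity_frags) / data_frags

print(ec_overhead(8, 4))  # 1.5 -> 1.5x raw storage, tolerates 4 lost fragments
print(3.0)                # 3.0x raw storage for 3x replication, tolerates 2 lost copies
```

So an 8+4 policy tolerates more simultaneous failures than triple replication while consuming half the raw capacity, which is what makes high-capacity spinning drives viable at these petabyte scales.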
SwiftStack middleware provides policy-based actions and the ability to add custom metadata and labels for supervised workflows. The SwiftStack 1space cloud connector provides a global namespace across core and cloud, while the 1space NAS connector provides cloud-native access to existing NAS data sets. Consequently, the entire data pipeline's requirements are met by a single storage tier, enabling quicker time to insight and actionable intelligence.
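Because the underlying store speaks the OpenStack Swift API, custom labels can be attached directly to objects as `X-Object-Meta-*` headers rather than kept in a separate database. A minimal sketch (the label keys and values here are hypothetical examples, not a SwiftStack-defined schema):

```python
# Illustrative: translate training labels into Swift object-metadata
# headers. Swift stores any header prefixed with X-Object-Meta- as
# user metadata on the object itself.
def make_metadata_headers(labels: dict) -> dict:
    """Build X-Object-Meta-* headers from a dict of labels."""
    return {f"X-Object-Meta-{key}": value for key, value in labels.items()}

headers = make_metadata_headers({"label": "pedestrian", "sensor": "cam0"})
print(headers)
# {'X-Object-Meta-label': 'pedestrian', 'X-Object-Meta-sensor': 'cam0'}
```

These headers would be sent on an object PUT or POST (e.g. via python-swiftclient), so the labels travel with the data through every stage of the pipeline.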
As evidence, multiple SwiftStack customers in the autonomous vehicle market are using this architecture today. We have at least three customers, including NVIDIA’s autonomous vehicle training project Maglev, using similar data pipelines.
Clement Farabet, VP of AI Infrastructure at NVIDIA, eloquently articulated their autonomous vehicle pipelines and how SwiftStack is leveraged as part of their datacenter design at GTC 2019: https://developer.nvidia.com/gtc/2019/video/S9649
Comparison based on Gartner selection criteria for AI/DL storage requirements
In fact, if we look at the selection criteria laid down by Gartner for AI/DL storage requirements, and map the available alternatives, we see how SwiftStack provides the best architecture for these workflows.
Data-driven applications leveraging AI/ML/DL pipelines do not necessarily need external shared all-flash storage systems. These stacks leverage the accelerated compute layer, which has enough memory and flash to accelerate frameworks like GPU-enabled RAPIDS or distributed caching, while the storage layer is expected to provide massive parallelism and throughput to keep the compute saturated. SwiftStack's deep expertise in Deep Learning architectures has helped us enroll in the coveted NVIDIA Partner Network (NPN) as a solution advisor for these new applications and workflows, and SwiftStack has built reference architectures with NVIDIA and Cisco using this expertise.
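How much throughput does "keeping the compute saturated" actually demand? A back-of-the-envelope sketch (the per-GPU ingest rate and cluster size below are assumed workload figures, not measured SwiftStack numbers):

```python
# Aggregate read throughput needed to keep a GPU cluster fed with
# training data. Per-GPU ingest rate depends heavily on the model
# and data format; 2 GB/s is an illustrative assumption.
def required_throughput_gb_s(num_gpus: int, gb_per_sec_per_gpu: float) -> float:
    """Aggregate storage read throughput, in GB/s."""
    return num_gpus * gb_per_sec_per_gpu

# e.g. a 64-GPU training cluster, each GPU consuming ~2 GB/s
print(required_throughput_gb_s(64, 2.0))  # 128.0 GB/s aggregate
```

Scaling the cluster or the per-GPU ingest rate quickly pushes the requirement into the hundreds of GB/s cited above, which is why massive parallelism in the storage layer matters more than per-device latency.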