Storage and Data Management for AI

The Platform for Intelligent Data

The only proven storage and data management solution that can accelerate deep learning at ultra-scale.

SwiftStack AI Architecture

White Paper

The SwiftStack AI Architecture is essential for deep learning at ultra-scale. This customer-proven architecture stack satisfies the storage performance needed by GPU compute complexes and modern AI frameworks accessing and processing hundreds of petabytes of data.

Infrastructure challenges are the primary inhibitor for broader adoption of AI/ML workflows. SwiftStack’s multi-cloud data management solution is the first of its kind in the industry and effectively handles storage I/O challenges faced by edge-to-core-to-cloud, large-scale AI/ML data pipelines.

Amita Potnis

Research Director at IDC’s Infrastructure Systems Platforms and Technologies Group

Featured Use Case:

Autonomous Vehicle Development

Working closely with NVIDIA, SwiftStack powers the data factories and Deep Neural Network (DNN) training of leaders in autonomous vehicle development.

Workflow

Autonomous Vehicle Workflow Graphic
  • Survey cars generate terabytes of data per day, which is quickly transferred to SwiftStack for curation in the Data Factory.
  • Curated data from SwiftStack feeds thousands of GPUs for training, validation, and replay testing.
  • Real-world and simulated data is version-controlled and retained immutably for safety and traceability of DNN improvement over time.

By The Numbers

Training a single complex Deep Neural Network (DNN) will require at least three million images, and refining it to be production-ready will take countless more. Vehicles aimed at SAE autonomy level 2+ will need at least 10 DNNs, and level 5 will require 20 or more. Refer to the table below to see basic storage capacity requirements for this training.

  • 1 survey car, driving 8 hours per day, 250 days per year: 2,000 hours per year
  • 5 cameras at 30 frames per second: 1 billion images per year
  • 2 megapixels (2 MB/image): 2 PB raw data per year
  • 33% useful images after data factory cleansing: 300 million training images, 660 TB training data
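
The figures above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name and parameter names are hypothetical, and it assumes decimal units (1 PB = 1,000 TB = 10^9 MB). Unrounded, the inputs give about 1.08 billion images, 2.16 PB of raw data, and roughly 713 TB of training data. The list rounds to 1 billion images before the later steps, which is how it arrives at 2 PB and 660 TB.

```python
def survey_car_storage(hours_per_day=8, days_per_year=250, cameras=5,
                       fps=30, mb_per_image=2, useful_fraction=0.33):
    """Estimate yearly image counts and storage for one survey car.

    All defaults come from the list above; decimal units are an assumption.
    """
    hours = hours_per_day * days_per_year                  # driving hours per year
    images = hours * 3600 * fps * cameras                  # frames captured per year
    raw_pb = images * mb_per_image / 1e9                   # MB -> PB
    training_images = images * useful_fraction             # kept after cleansing
    training_tb = training_images * mb_per_image / 1e6     # MB -> TB
    return {
        "hours": hours,
        "images": images,
        "raw_pb": raw_pb,
        "training_images": training_images,
        "training_tb": training_tb,
    }


if __name__ == "__main__":
    for name, value in survey_car_storage().items():
        print(f"{name}: {value:,.2f}")
```

Changing `cameras`, `fps`, or `useful_fraction` shows how quickly capacity requirements grow for a fleet rather than a single car.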

Architecture

AV Architecture Diagram
  • The SwiftStack AI Architecture is an engineered and customer-proven edge-to-core-to-cloud storage and data management solution.
  • Core architectural capabilities like ultra-scale performance, distributed flash-based cache, data immutability, standards-based APIs, and workflow integration make deep learning at scale possible.
  • SwiftStack ingests, retains, and provides data to thousands of GPUs working in parallel, “feeding the beast” to train and retrain the deep neural networks necessary for AV.

Helping the AI Team

Leveraging AI successfully is a company-wide effort. We believe SwiftStack’s technology and experience can help many of the team members involved.

Executives & Business Leaders know that data is valuable and want to use it for business insight or competitive advantage, but they are rightfully cautious about investing heavily in unproven technologies or strategies to do so.
How can SwiftStack help? SwiftStack is trusted to manage hundreds of petabytes in the world’s biggest companies—like Verizon, Blizzard, eBay, NVIDIA, Cisco, and AstraZeneca. In AI and Autonomous Vehicles in particular, NVIDIA chose SwiftStack as a core storage platform. And when it’s needed, our award-winning support ranks a perfect 5 out of 5 according to Gartner Peer Insights. If you are well down the path toward AI or new to it but already seeing some value from Big Data (e.g., Hadoop), Operational Intelligence (e.g., Splunk), or Business Intelligence, we would love to share some of our experience in Artificial Intelligence. We believe we offer a unique combination of performance, flexibility, and TCO that may be helpful to your business.
Data Scientists & Engineers face the challenge of capturing, creating, and/or using piles of raw data to generate something of value, such as a business insight, threat detection, an intelligent prediction algorithm, or even complex DNNs for self-driving cars. Progress slows when tools and infrastructure can’t keep up.
How can SwiftStack help? Only SwiftStack’s shared-nothing architecture was designed from its inception to scale linearly to infinite levels of throughput, concurrency, and capacity, which means only SwiftStack can ingest, retain, and feed training data fast enough to fully utilize thousands of parallel GPUs working to produce today’s most intelligent DNNs and ML algorithms. If you are using GPUs or pressing for an investment in them, you also need storage that can feed data fast enough to utilize them efficiently and can retain your training data immutably for replay and regression testing down the road. Starting with flash or the public cloud is reasonable, but both will prove to be overly expensive at scale and will eat into your budget for additional development tools and compute horsepower. We believe we offer a unique combination of performance, flexibility, and TCO that will be a perfect fit for your AI development.
IT & Storage Architects & Admins face the challenge of providing the right tools to enable Data Science teams. They have to balance the cost, complexity, and flexibility of on-prem and cloud investments in storage, networking, and compute (both CPU and GPU), and they are rightfully careful to minimize risk in adopting new technologies.
How can SwiftStack help? Compared to incumbent technologies and alternatives, only SwiftStack offers maximum performance with an optimal TCO. SwiftStack is trusted to manage hundreds of petabytes in the world’s biggest companies—like Verizon, Blizzard, eBay, NVIDIA, Cisco, and AstraZeneca. And when it’s needed, our award-winning support ranks a perfect 5 out of 5 according to Gartner Peer Insights. Only SwiftStack’s shared-nothing architecture was designed from its inception to scale linearly to infinite levels of throughput, concurrency, and capacity, which means only SwiftStack can ingest, retain, and feed training data fast enough to fully utilize thousands of parallel GPUs working to produce today’s most intelligent DNNs and ML algorithms. Starting with flash or the public cloud is reasonable, but both will prove to be overly expensive and cumbersome to manage at scale, which will eat into your budget for additional development tools and compute horsepower. We believe we offer a unique combination of performance, flexibility, and TCO that will enable your data science teams today and scale with you tomorrow.

Solving Real-World Challenges

AI Data Challenges

Speed of Initial Data Ingest

A lot of data from the survey cars has to be moved and stored quickly.

Cost, Performance, and Scale in DNN Development

The “working set” for training data runs into the tens of PB, which makes public-cloud egress costs prohibitive.

Thousands of GPUs need to be fed subsets of that data simultaneously to be leveraged effectively—which exceeds what alternative storage architectures can deliver.

Long-Term Data Retention, Traceability, and Immutability

Data captured by survey cars must be retained, organized, labeled, and accessible indefinitely to regression-test DNNs as they are developed, which directly impacts safety and confidence in the algorithms.

Infrastructure Complexity

Silos of storage are growing independently on edge devices, in core data centers using traditional flash and NAS systems, and in clouds as well, which makes it difficult for data scientists and GPUs to access the data and for IT teams to manage it.

SwiftStack Provides

Ultra-Scale Performance

Only SwiftStack’s shared-nothing distributed architecture was designed from its inception to scale linearly to infinite levels of throughput, concurrency, and capacity, which means only SwiftStack can ingest, retain, and feed training data fast enough to fully utilize thousands of parallel GPUs working to produce today’s most intelligent DNNs and ML algorithms.

Elasticity from Edge-to-Core-to-Cloud

Only SwiftStack’s 1space makes it possible for applications to ingest and access data anywhere: on edge devices, in core data centers, and/or in public cloud infrastructure. Policy-based data movement and the ability to merge data namespaces enable applications to move and scale beyond one cloud or one set of on-premises infrastructure, so they can be deployed with ultimate flexibility, and training isn’t slowed by relocating data.

Data Immutability

The unique combination of Versioning, Access Control, and SwiftStack’s Delete Protection means that data can be retained and referenced indefinitely as it was originally written—enabling traceability, accountability, confidence, and safety throughout the life of a DNN.

Optimal TCO

When compared to the cost of frequently accessed public cloud storage or PBs of on-premises flash, the TCO of SwiftStack software deployed on industry-standard hardware offers compelling savings—approximately ⅓ the cost of public cloud storage and 90% less than flash—which can be quickly invested in advancing top-level AI initiatives!

Real-World Confidence

In addition to dozens of large-scale customers each storing tens of petabytes on SwiftStack for other applications, notable AI deployments for autonomous vehicle development include (1) an industry-leading robo-taxi company seamlessly scaling DNN training and storage across Amazon, Google, and on-premises infrastructure, (2) NVIDIA’s internal at-scale reference architecture for AV development—with proven throughput over 100 GBytes/sec from SwiftStack to GPUs, and (3) a Big-3 automaker developing DNNs for their future vehicles. In every case, SwiftStack “just works”—it deploys easily on common servers, scales linearly and infinitely, supports the industry-standard S3 API, and seamlessly merges on-premises and public-cloud storage to enable GPUs in any location.

Choosing the Right Storage Strategy

Many AI strategies start with public cloud resources or a few on-premises GPUs and some flash storage, but—as you prepare to scale—it pays to consider what storage strategy will optimize performance, cost, and flexibility.

We have about a PB of data—more than a PB of data—that is coming in per month and about 15 PB that is curated. For any large-scale company, this is not a huge amount of data, but what makes it tricky and hard is that all this 15 PB is the active training data set, so all of the internal teams are accessing it all of the time every day; every model that you’re working on is going to be using that data.

Senior Engineer, AI Infrastructure

Reference Stack: Storage for Deep Learning Development

Whether you are focused on autonomous vehicles or some other application of deep learning, the challenges are familiar: You need to create or capture huge amounts of training data, label it, retain it indefinitely, and use it to feed hundreds or thousands of GPUs. SwiftStack’s reference stack for AI is an industry-proven storage architecture for deep learning.

Many of the leading AI frameworks and toolsets—including TensorFlow—can now leverage public- and private-cloud data directly with the S3 API.
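
As a rough illustration of what direct S3 API access can look like, the sketch below lists and reads training objects from an S3-compatible endpoint using the boto3 library. It is not taken from SwiftStack documentation: the helper names, bucket, prefix, and endpoint URL are all hypothetical, and a real pipeline would add credential management, retries, and parallel reads.

```python
def make_client(endpoint_url, access_key, secret_key):
    """Create a boto3 S3 client pointed at an S3-compatible endpoint.

    The endpoint URL and credentials are placeholders supplied by the caller.
    """
    import boto3  # imported lazily so the helper below works with any client object

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


def iter_training_objects(client, bucket, prefix=""):
    """Yield (key, bytes) for every object under a prefix, page by page."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for item in page.get("Contents", []):
            key = item["Key"]
            body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
            yield key, body
```

A caller might write `client = make_client("https://storage.example.com", key_id, secret)` and loop over `iter_training_objects(client, "training-data", "curated/")`, where the URL, bucket, and prefix are made-up values. In TensorFlow, a generator like this could be wrapped with `tf.data.Dataset.from_generator` to feed a training loop.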

DNN training is accelerated with NVIDIA’s industry-leading GPUs in NVIDIA-branded DGXs or OEM systems like the Cisco C480ML or Dell DSS 8440.

SwiftStack software deploys easily on industry-standard servers from partners like Cisco, Dell, HPE, and Supermicro.

NVIDIA Preferred Storage Solution Advisor Partner

SwiftStack is proud to be an esteemed storage solution advisor partner in the NVIDIA Partner Network.

SwiftStack for AI, ML and DL

StorageSwiss
Briefing Note

SwiftStack Announces World’s First Multi-Cloud AI/ML Data Management Solution

Press Release

SwiftStack adds a disk object-based AI reference architecture

TechTarget
News

SwiftStack’s Vision

SwiftStack’s assumption is that every organization is moving to a multi-cloud reality. We have a vision that data should be easily managed across clouds and made available to applications regardless of where they are executing. This aligns perfectly with the trends we see driving the advancement of AI workflows in the industry.

Container- and microservice-based applications are changing the landscape, as they are deploying transparently using on-premises GPU resources and/or public cloud GPU farms. Meanwhile, the scale of capacity and throughput required to fully utilize these GPU resources is growing well beyond what traditional architectures can provide. The answer can only be a cloud-native data-management and storage architecture designed to scale infinitely and span both edge and on-premises infrastructure and multiple public clouds.

Sincerely,

SwiftStack AI Team

Visioneers

Jon Kelly

Director of AI Architecture

jkelly@swiftstack.com

Shailesh Manjrekar

Head of AI Products and Solutions

smanjrekar@swiftstack.com

Chris Nelson

VP, Solutions at SwiftStack Inc.

cnelson@swiftstack.com

Latest AI Blog Posts

Do I need All-Flash Storage for AI/DL? – Part 2

Blog Post

Architecting large-scale Artificial Intelligence/Deep Learning data pipelines. In part 1 of this series...

Do I need All-Flash Storage for AI/DL? – Part 1

Blog Post

Architecting large-scale Artificial Intelligence/Deep Learning data pipelines. Businesses adopting AI/DL data pipelines for computer vision,...

Tutorial: Loading Data into TensorFlow via S3 API

Blog Post

When I was looking for code examples of loading data into TensorFlow via the S3 API, the results were pretty sparse...

Resources

Solution Brief

Multi-Cloud Data Management Solution for AI/ML

IDC Market Note

SwiftStack Stakes Its Play in the AI/ML Market

Cisco Solution Brief

Accelerate Deep Learning with an Edge to Core to Cloud Data Management Solution

Article

Looking at AI/ML Edge-Core-Cloud Workflows

News

Experts from Google, T-Mobile and other tech frontiers weigh in on the future of AI

Webinar

SwiftStack Data Analytics Solution with Alluxio

Briefing Note

StorageSwiss Briefing Note: SwiftStack for AI, ML and DL

Data Sheet

Multi-Cloud Data Management for at-scale AI/ML

News

SwiftStack adds a disk object-based AI reference architecture

Demo Video

Valohai TensorFlow demo on MNIST leveraging SwiftStack Data Management layer

Press Release

SwiftStack Announces World’s First Multi-Cloud AI/ML Data Management Solution
