According to Amita Potnis, IDC Research Director, Platform and Technologies group, “Infrastructure challenges are the primary inhibitor for broader adoption of AI/ML workflows. SwiftStack’s multi-cloud data management solution is the first of its kind in the industry and effectively handles storage I/O challenges faced by edge-to-core-to-cloud, large-scale AI/ML data pipelines.”

Over the last several months, SwiftStack has been busy helping two large autonomous vehicle (AV) customers. These AV customers’ data pipelines are distributed from the edge (vehicle sensors) to the core (data center) to multi-cloud locations, and they face the challenge of ingesting, labeling, training on, running inference on, and retaining data at large scale.

SwiftStack understands that building software-defined cars and AVs is hard. Every neural net in the software stack needs to handle thousands of conditions and geolocations. Several aspects matter for a successful AV platform, but the following considerations are at the forefront:

  • Safety is the primary objective and cannot be compromised. Models must be tested on huge datasets to build confidence in them, and teams must be able to iterate quickly while still producing well-tested models.
  • Collecting enormous amounts of data under innumerable scenarios is key to building good AV models. This data needs to be labeled, accessible, and managed over its lifecycle.
  • The accuracy and predictability of trained models are only as good as the diversity of the data available to them.
  • Simulation and re-simulation become extremely important, as physically driving billions of miles is not practical. Hardware-in-the-loop systems need access to large data sets to simulate several extreme scenarios as well as for regression testing.
  • Inference running on the vehicle (the edge) is typically limited by the available hardware. Finding the right model without losing performance requires access to large amounts of data.
  • Reproducibility is important, with proper version control of data sets, models, and experimentation, to understand why a model behaved a certain way.

Data, and access to large data sets, is the cornerstone of building a production-grade AV platform. One of the AV customers we are working with launched with a 15 PB active dataset and 100 GB/s of throughput feeding 4,000 GPUs, and is scaling to 3,200 nodes and a data set of hundreds of petabytes. The effort involves 1,500 labelers, who label 20M objects per month to feed 20 DNNs. The solution also uses 1 PB of DGX-1 in-rack object cache.
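
To put those launch figures in perspective, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 PB = 1,000,000 GB), an even spread of throughput across GPUs and of work across labelers, and it ignores the in-rack cache. The function names are illustrative and not part of any SwiftStack tooling.

```python
# Back-of-the-envelope averages from the figures quoted above.
# Assumptions (not from the customer): decimal units, perfectly even
# distribution across GPUs and labelers, no caching effects.

GB_PER_PB = 1_000_000
MB_PER_GB = 1_000


def per_gpu_bandwidth_mb_s(throughput_gb_s: float, gpus: int) -> float:
    """Average storage bandwidth available to each GPU, in MB/s."""
    return throughput_gb_s * MB_PER_GB / gpus


def full_pass_hours(dataset_pb: float, throughput_gb_s: float) -> float:
    """Hours needed to stream the whole active dataset once."""
    seconds = dataset_pb * GB_PER_PB / throughput_gb_s
    return seconds / 3600


def objects_per_labeler(objects_per_month: int, labelers: int) -> float:
    """Average number of objects each labeler handles per month."""
    return objects_per_month / labelers


print(f"{per_gpu_bandwidth_mb_s(100, 4000):.1f} MB/s per GPU")
print(f"{full_pass_hours(15, 100):.1f} hours per full pass")
print(f"{objects_per_labeler(20_000_000, 1500):.0f} objects per labeler per month")
```

Under these assumptions, the launch configuration works out to about 25 MB/s per GPU, roughly 42 hours to read the 15 PB active dataset once at full throughput, and a little over 13,000 labeled objects per labeler per month.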

SwiftStack is uniquely positioned to handle these challenges and provides a differentiated solution for each stage of the AV data pipelines. Here’s how:

  • SwiftStack accelerates AV data pipelines – Due to its massive parallelism, SwiftStack is able to ingest huge volumes of data accumulated by radar, LIDAR, and computer vision sensors. SwiftStack is also able to handle the massive neural net training throughput needed by several hundred GPU cores in compute servers. The containerized frameworks (TensorFlow, MXNet, and others) are orchestrated with Kubernetes integration.
  • SwiftStack 1space brings cloud and AI to your data – SwiftStack’s 1space connector provides a single global namespace. Multi-cloud data management tools enable cloud bursting, if and when more economical GPU cycles are available in the cloud. SwiftStack leverages version control to enable reproducibility of models, data, and experimentation. SwiftStack works in tandem with pipeline management systems to adhere to GDPR specifications for these regulated use cases.
  • SwiftStack enables labeling and selection of the right datasets – SwiftStack’s sophisticated labeling middleware can enrich the data during ingest, enabling quick metadata search and supervised learning workflows.
  • SwiftStack abstracts access to data whether you are using file or object APIs – SwiftStack ensures seamless universal access to a global namespace using file or object APIs, and with near-infinite scale, so the data scientist, CDO, or CAO can focus on the AI/ML strategy instead of the infrastructure.
  • SwiftStack provides the right economics to build a production-ready, multi-petabyte AV data platform. SwiftStack provides industry-best storage efficiency with the choice of replicas or erasure coding. SwiftStack is sold via SaaS subscription pricing, which allows you to start small and grow to PB scale. Our core Swift engine is built on an open-source model, helping you benefit from community innovation.
  • SwiftStack is working with other best-of-breed partners in the AI/ML ecosystem. Providing turnkey solutions makes it easier for customers like ours to adopt multi-cloud workflows and frameworks, and scale to petabytes of storage and hundreds of gigabytes of bandwidth to meet their unique requirements.
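
The choice between replicas and erasure coding mentioned above comes down to how much raw capacity is needed per usable petabyte. The following is a minimal sketch of that arithmetic. The 3-replica and 8+4 erasure-coding schemes are illustrative assumptions, not SwiftStack defaults or the customer's actual configuration, and the function names are hypothetical.

```python
# Raw-capacity overhead of replication versus erasure coding.
# The schemes below (3 replicas, 8 data + 4 parity fragments) are
# illustrative assumptions chosen only to show the arithmetic.


def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per usable byte when keeping full copies."""
    return float(replicas)


def erasure_coding_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw bytes stored per usable byte for a data+parity scheme."""
    return (data_fragments + parity_fragments) / data_fragments


def raw_capacity_pb(usable_pb: float, overhead: float) -> float:
    """Raw capacity needed to hold a given usable capacity."""
    return usable_pb * overhead


usable_pb = 15  # the active dataset size quoted earlier in this post
print(raw_capacity_pb(usable_pb, replication_overhead(3)), "PB raw with 3 replicas")
print(raw_capacity_pb(usable_pb, erasure_coding_overhead(8, 4)), "PB raw with 8+4 erasure coding")
```

With these example parameters, 15 PB of usable data needs 45 PB of raw capacity under three-way replication but 22.5 PB under 8+4 erasure coding. Replicas are generally simpler and faster to read, while erasure coding trades extra compute for capacity savings, so the right mix depends on the workload.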

SwiftStack’s Head of AI/ML Product and Solutions Marketing will speak about this use case at the upcoming Global AI Conference in Seattle (http://www.globalbigdataconference.com/seattle/global-artificial-intelligence-conference-115/speaker-details/shailesh-manjrekar-73314.html) and will moderate a keynote panel on the future of AI and the importance of data (https://www.linkedin.com/feed/update/urn:li:activity:6523988917295947776). We look forward to offering attendees more insights on this topic.

About the Author

Shailesh Manjrekar

Shailesh has deep experience in infrastructure across storage and networking (EMC, NetApp, HGST, Brocade). As a thought leader on how AI/ML/DL impacts infrastructure, Shailesh has worked on the changes needed in the data center, the edge, and the cloud. Shailesh serves as the Head of Product and Solutions Marketing for SwiftStack.