Leading businesses and organizations are transforming their industries by becoming increasingly data-driven. Over the past few decades, software has changed the world we live in. Now, data is changing the world.
Data is not only being stored at unprecedented scale; it is also being used far more intensively, with new ways to extract value from it such as advanced analytics and deep learning. These modern workflows are also expanding well beyond a single data center or cloud, spanning from the edge to the core and up into the cloud.
We are seeing this happen today in industries such as automobile manufacturing, personalized medicine, and transportation, for uses like autonomous vehicle development, compound drug discovery, and intelligent routing of trucks and planes.
These new demands on data require a new data storage and management architecture. These workflows need an architecture where:
- The control and data planes are separated, so that data management and control of the platform are independent of data movement across the infrastructure and within client applications.
- All data infrastructure has a true scale-out design that reaches previously unattainable levels of both capacity and performance.
- Data is accessible in a single addressable namespace so all applications and users can extract value from the data that is being retained.
- Data services such as metadata indexing, search, security, and analytics are distributed across the platform.
For example, let’s quickly look at the demands on the data platform when developing and training deep neural networks (DNNs) for vehicles that have level 2+ autonomy (think Tesla Autopilot or GM Super Cruise).
This use case quickly creates a hot dataset of 15 PB of labeled image and sensor data. To keep the GPU compute complex busy when training these DNNs, a subset of the data needs to be delivered at over 100 GB/s while the platform handles requests from thousands of workers.
With a hot dataset growing to multiple petabytes, a one-size-fits-all approach to data storage and management becomes very difficult. For example, if you use an all-around high-performance storage system such as an all-flash array, its advantages quickly break down once the dataset has to be spread across many silos.
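To put these figures in perspective, the back-of-the-envelope calculation below uses the 15 PB and 100 GB/s numbers from the text with decimal units. The worker count of 2,000 is a hypothetical stand-in for "thousands of workers", and the per-worker figure assumes bandwidth is shared evenly, which real training jobs rarely do exactly.

```python
# Back-of-the-envelope sizing for the autonomous-driving training example.
# Decimal (SI) units: 1 PB = 10**15 bytes, 1 GB = 10**9 bytes.
PB = 10**15
GB = 10**9

dataset_bytes = 15 * PB             # hot dataset size from the text
throughput_bytes_per_s = 100 * GB   # delivery rate from the text
workers = 2000                      # hypothetical: "thousands of workers"

# Time to stream the entire hot dataset once at the target rate.
epoch_seconds = dataset_bytes / throughput_bytes_per_s

# Average bandwidth each worker gets if the aggregate is split evenly.
per_worker_mb_s = throughput_bytes_per_s / workers / 10**6

print(f"Full pass over 15 PB at 100 GB/s: {epoch_seconds:,.0f} s (~{epoch_seconds / 3600:.1f} h)")
print(f"Average share per worker: {per_worker_mb_s:.0f} MB/s")
```

Even at 100 GB/s, a single full read of the 15 PB dataset takes roughly 42 hours, which is why only a subset is kept hot near the GPUs at any one time.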
As an alternative, you can use the ultra-high throughput of SwiftStack to feed a thin layer of flash inside the GPU servers, meeting the overall demands of the workflow while always keeping data in a single namespace. This model satisfies the performance requirements and offers a compelling total cost of ownership (TCO).
What’s new in SwiftStack 7
Over the past year, we have been developing the new capabilities of SwiftStack 7 to make it a platform for intelligent data that enables advanced analytics and deep learning workflows at ultra-scale. Let me quickly walk through these new features; I will go into more detail in future articles and demo videos.
Ultra-scale Performance Architecture
In customer production environments, SwiftStack is delivering previously unattained throughput at multi-petabyte scale while handling thousands of simultaneous workers accessing data. In addition, independent third-party testing on similar hardware and workloads shows that SwiftStack 7 is multiple times faster than competitors, reaching production throughput of over 100 GB/s.
The test environment used for this performance analysis was smaller than the production environments, but it showed linear scaling.
The simple, open architecture allows both performance and capacity to scale linearly, well beyond traditional storage technologies. Beyond an architecture designed from day one to scale for unstructured data, the SwiftStack team has put a lot of work into ensuring the platform behaves properly under the heavy load of these new workloads.
ProxyFS Edge
ProxyFS Edge extends SwiftStack’s file services to the edge, close to the application, for high-throughput, data-intensive use cases. The containerized ProxyFS agent at the edge provides caching to minimize latency for file-native applications while delivering massive throughput from large datasets stored at the core.
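The general idea behind an edge cache in front of a large core store can be illustrated with a read-through LRU cache. This is a minimal sketch of the technique only, not the ProxyFS agent's implementation: the class name `ReadThroughCache`, the byte-budget eviction policy, and the `fetch` callable (standing in for a read from the core object store) are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Dict


class ReadThroughCache:
    """Hypothetical edge cache: serves hot objects locally, fetches misses from the core."""

    def __init__(self, fetch: Callable[[str], bytes], capacity_bytes: int) -> None:
        self._fetch = fetch                  # stand-in for a read from the core store
        self._capacity = capacity_bytes      # local flash budget, in bytes
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        data = self._fetch(key)              # cache miss: read from the core
        if len(data) <= self._capacity:      # only cache objects that fit at all
            self._entries[key] = data
            self._used += len(data)
            while self._used > self._capacity:
                _, evicted = self._entries.popitem(last=False)  # evict least recently used
                self._used -= len(evicted)
        return data


# Usage: a dict plays the role of the core store; the cache holds 8 bytes.
core: Dict[str, bytes] = {"a": b"xxxx", "b": b"yyyy", "c": b"zzzz"}
cache = ReadThroughCache(core.__getitem__, capacity_bytes=8)
for key in ["a", "b", "a", "c", "a", "b"]:
    cache.get(key)
print(f"hits={cache.hits} misses={cache.misses}")
```

Because "a" is read repeatedly, it stays resident while the less recently used "b" and then "c" are evicted; a production edge cache would also have to handle writes, consistency with the core, and concurrent access.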
ProxyFS Edge is an extension of the existing ProxyFS clustered filesystem, which provides both file and object API access to the same datasets.
1space File Connector
The newest component of SwiftStack 1space, the 1space File Connector, brings existing enterprise file data into the cloud namespace so that cloud-native applications can access it via the S3 or Swift object APIs without a complex migration. The containerized 1space File Connector interacts with network-attached storage (NAS) and can scale out to provide high-performance access to files for modern applications.
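To make "the same file, reachable through S3 or Swift" concrete, the sketch below derives an object key from a NAS path and builds the corresponding Swift and S3 path-style URLs. Two parts are standard conventions: Swift object URLs take the form `<storage_url>/<container>/<object>` with a storage URL such as `.../v1/AUTH_<account>`, and Swift's s3api middleware maps an S3 bucket to a Swift container. Everything else is an assumption for illustration: the path-to-key rule in `object_key_for` is hypothetical and not the documented behavior of the 1space File Connector, and the hostnames, account, and export paths are made up.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def object_key_for(export_root: str, file_path: str) -> str:
    # Hypothetical mapping: the path relative to the NAS export root becomes the object key.
    return PurePosixPath(file_path).relative_to(export_root).as_posix()


def swift_url(storage_url: str, container: str, key: str) -> str:
    # Swift convention: <storage_url>/<container>/<object>, path segments percent-encoded.
    return f"{storage_url.rstrip('/')}/{quote(container)}/{quote(key)}"


def s3_path_style_url(endpoint: str, bucket: str, key: str) -> str:
    # S3 path-style convention: <endpoint>/<bucket>/<key>.
    return f"{endpoint.rstrip('/')}/{quote(bucket)}/{quote(key)}"


# Usage with hypothetical names: one NAS file exposed as one object.
key = object_key_for("/mnt/nas/projects", "/mnt/nas/projects/lidar/run 1/frame.bin")
print(swift_url("https://swift.example.com/v1/AUTH_demo", "projects", key))
print(s3_path_style_url("https://s3.example.com", "projects", key))
```

Keeping the relative path as the key preserves the directory structure that file-native tools expect, while object clients see ordinary bucket/container names and keys.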
The 1space File Connector allows businesses and organizations to modernize their workflows incrementally and apply 1space data placement and protection policies (copy, move, tier) to file data when needed.
SwiftStack 7 will be released in November and will include the ultra-scale performance enhancements and the 1space File Connector. ProxyFS Edge is currently in beta and will be generally available in early 2020.
If you are interested in learning more, I recommend taking a look at the SwiftStack Platform web pages, and if you would like to try SwiftStack, you can do so for free. Also, please feel free to reach out to any of us on the SwiftStack team at any time.