With the availability of SmartStore, Splunk's optimized data management model in Splunk Enterprise 7.2 and Splunk Enterprise 7.3, Cisco and SwiftStack set out to create a reference stack for the compute, storage, and networking infrastructure required to run Splunk at scale. The resulting document, a Cisco Validated Design (or "CVD"), is a comprehensive implementation and best-practices guide for IT architects and administrators.
At over 200 pages (including screenshots and tables), this CVD covers the nuts and bolts of deploying, tuning, and scaling the infrastructure for Splunk Enterprise. However, the document is more of a reference manual than a start-to-finish read, so this blog post offers a "highlight reel" with links to the supporting material in the CVD. These highlights explain what SmartStore is, how it works, and how it performs.
Here are the top 5 takeaways from the Cisco & SwiftStack reference design for Splunk Enterprise with SmartStore:
1. A Scale-Out Architecture at Every Tier
At its core, SmartStore decouples the compute and storage tiers in a Splunk Enterprise environment. Compute resources can now be sized to the ingest load placed on the Splunk indexers, while storage is allocated according to the ingest rate and data retention period. The goals of SmartStore are to reduce hardware costs, minimize management complexity, and allow the infrastructure behind Splunk Enterprise to scale efficiently as Splunk usage grows.
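To make the decoupling concrete, here is a minimal Python sketch that sizes the two tiers independently. The function name is hypothetical, and so are both defaults: the per-indexer ingest capacity and the ratio of on-disk size to raw ingest are placeholders for illustration, not figures from the CVD. Substitute values from your own testing or from Splunk's sizing guidance.

```python
import math


def size_tiers(daily_ingest_gb: float, retention_days: int,
               gb_per_indexer_per_day: float = 300.0,
               disk_to_raw_ratio: float = 0.5) -> tuple[int, float]:
    """Size compute and storage separately, as SmartStore allows.

    Both default parameters are hypothetical assumptions for illustration.
    Returns (number of indexers, retained storage in TB).
    """
    # Compute tier: driven only by the daily ingest load on the indexers.
    indexers = math.ceil(daily_ingest_gb / gb_per_indexer_per_day)
    # Storage tier: driven by ingest rate multiplied by retention period.
    storage_tb = daily_ingest_gb * retention_days * disk_to_raw_ratio / 1000
    return indexers, storage_tb


print(size_tiers(1000, 365))  # one year of retention at 1 TB/day
print(size_tiers(1000, 730))  # same ingest, twice the retention
```

With these placeholder values, 1 TB/day kept for 365 days gives 4 indexers and 182.5 TB of storage. Doubling retention to 730 days doubles storage to 365.0 TB while the indexer count stays at 4, which is the independence between tiers that SmartStore is meant to provide.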
Here are the tiers of the Cisco Validated Design for Splunk Enterprise with SmartStore:
- Compute: Cisco HyperFlex with All Flash Nodes. The Splunk indexers run in VMs on Cisco HyperFlex, a hyperconverged system. The system uses flash storage for ingesting data into Hot buckets and for the Warm bucket cache.
→ Scale Splunk compute by adding more Cisco HyperFlex Nodes
- Storage: SwiftStack & Cisco UCS S3260. Using high-capacity disk drives in Cisco UCS S3260 storage servers, this scale-out storage tier holds all Warm data from the time it is created until the data retention period expires. With SmartStore, 80-90% of all Splunk data should reside on this tier. SmartStore also accesses the scale-out storage through the S3 API, the de facto standard for cloud-native data access.
→ Scale Splunk storage by adding more SwiftStack/UCS nodes
Cisco HyperFlex and SwiftStack on Cisco UCS storage underpin a Splunk Enterprise infrastructure that can scale (and scale more) as ingest rates increase and data retention periods lengthen. More details about how the CVD configuration is specified and sized and how Splunk Enterprise with SmartStore uses compute and storage resources can be found here. Table 17 is particularly instructive for scaling the infrastructure as Splunk usage grows.
2. An End-to-End Cisco Infrastructure for Splunk Enterprise
The Cisco and SwiftStack engineers who conducted the testing and authored the CVD geared the level of detail to people like themselves: those responsible for making things work in the data center. Given SmartStore's potential, they wanted to show, with comprehensive supporting documentation, how this two-tier architecture for Splunk Enterprise is implemented and optimized.
Every step of the deployment process for Cisco HyperFlex, Cisco UCS, and SwiftStack can be found here. The end result is a compute/networking/storage stack that is ready for a Splunk workload. Configuring the major elements of Splunk Enterprise, including indexers, search heads, and the SmartStore feature, is documented here.
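As an illustration of what the SmartStore portion of that configuration looks like, the Python sketch below renders a minimal indexes.conf fragment that defines a remote volume and points an index at it. The setting names (storageType, path, remote.s3.endpoint, remotePath, homePath, coldPath, thawedPath) are standard indexes.conf settings, but the function name, volume name, bucket, endpoint URL, and index name are hypothetical placeholders. The CVD's configuration section has the values actually used in testing.

```python
import configparser
import io


def smartstore_indexes_conf(index: str, bucket: str, endpoint: str) -> str:
    """Render a minimal SmartStore indexes.conf fragment (illustrative only)."""
    # interpolation=None so Splunk's '$' variables are written literally.
    cfg = configparser.ConfigParser(interpolation=None)
    # Preserve key case: Splunk setting names such as storageType are camelCase.
    cfg.optionxform = str
    # Remote volume backed by S3-compatible scale-out storage.
    cfg["volume:remote_store"] = {
        "storageType": "remote",
        "path": f"s3://{bucket}",
        "remote.s3.endpoint": endpoint,
    }
    cfg[index] = {
        # Local paths for hot buckets and the cache on indexer storage.
        "homePath": f"$SPLUNK_DB/{index}/db",
        "coldPath": f"$SPLUNK_DB/{index}/colddb",
        "thawedPath": f"$SPLUNK_DB/{index}/thaweddb",
        # Warm buckets are uploaded under this remote prefix.
        "remotePath": "volume:remote_store/$_index_name",
    }
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


print(smartstore_indexes_conf("web_logs", "splunk-smartstore",
                              "https://s3.swiftstack.example.com"))
```

Splunk expands `$_index_name` itself, so the same volume prefix works for every index that points at the volume. Credentials (`remote.s3.access_key`, `remote.s3.secret_key`) are deliberately left out of this sketch.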
3. Functional Testing Reveals More Ways to Protect Splunk Data
The functional tests illustrate several of the advancements SmartStore brings to traditional Splunk Enterprise operations. One of the key changes relates to data protection. With SmartStore, Splunk replication now applies only to the Hot buckets and the Warm bucket cache, both of which are located in local indexer storage. For the Warm buckets retained long term on the scale-out tier (i.e., SwiftStack on Cisco UCS), the underlying storage system is responsible for data resiliency. Because 80% to 90% of all Splunk data resides on this tier once SmartStore is enabled, and this tier can protect data with space-efficient techniques such as erasure coding rather than full copies, less raw disk capacity is needed.
The functional test results, which start at an ingest rate of 1 TB per day and scale up from there, are found here.
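A quick back-of-the-envelope calculation shows why erasure coding reduces raw capacity. The sketch below assumes a hypothetical 100 TB of Warm data and compares keeping three full copies with a hypothetical 8+4 erasure coding layout (8 data fragments plus 4 parity fragments). These parameters are illustrative only; the CVD does not necessarily use them.

```python
def raw_tb_replicated(usable_tb: float, copies: int) -> float:
    # Full replication stores every byte `copies` times.
    return usable_tb * copies


def raw_tb_erasure_coded(usable_tb: float, data_frags: int,
                         parity_frags: int) -> float:
    # k+m erasure coding stores (k + m) / k raw bytes per usable byte.
    return usable_tb * (data_frags + parity_frags) / data_frags


warm_tb = 100  # hypothetical amount of Warm data
print(raw_tb_replicated(warm_tb, 3))        # three full copies
print(raw_tb_erasure_coded(warm_tb, 8, 4))  # 8+4 erasure coding
```

Three copies need 300 TB of raw disk, while the 8+4 layout needs 150 TB and still tolerates the loss of any 4 fragments (three copies tolerate the loss of 2). The exact savings depend on the layout the storage system is configured to use.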
4. Validation Testing Unpacks Splunk's Hot-to-Warm Bucket Roll with SmartStore
How SmartStore moves data between storage tiers inside Splunk Enterprise may be the most important characteristic of the feature. Hot buckets and cached Warm buckets live in indexer storage (i.e., Cisco HyperFlex). When a bucket rolls from Hot to Warm, however, a copy of the bucket is also written to scale-out storage (i.e., SwiftStack on Cisco UCS).
Depending on search patterns and cache size, SmartStore's cache manager eventually evicts the Warm bucket from the cache, so that the data resides only on the scale-out storage tier, which then holds the master copy. When an indexer needs to search a Warm bucket that it does not have cached, the cache manager retrieves the bucket from scale-out storage and brings it into the local cache.
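The bucket lifecycle described above can be modeled in a few lines of Python. This is a simplified toy model, not Splunk's cache manager: the class and method names are hypothetical, cache capacity is counted in buckets rather than bytes, and eviction uses a plain least-recently-used (LRU) policy, whereas Splunk's actual cache manager has its own eviction settings.

```python
from collections import OrderedDict


class ToySmartStoreIndexer:
    """Toy model of the SmartStore Hot-to-Warm roll and cache behavior."""

    def __init__(self, cache_capacity: int):
        self.cache_capacity = cache_capacity
        self.local_cache = OrderedDict()  # bucket_id -> data, in LRU order
        self.remote_store = {}            # stands in for S3-compatible storage
        self.remote_fetches = 0

    def roll_hot_to_warm(self, bucket_id: str, data: bytes) -> None:
        # On roll, a copy is uploaded to remote storage; the local copy stays cached.
        self.remote_store[bucket_id] = data
        self._cache_put(bucket_id, data)

    def search(self, bucket_id: str) -> bytes:
        if bucket_id in self.local_cache:
            # Cache hit: mark the bucket as most recently used.
            self.local_cache.move_to_end(bucket_id)
            return self.local_cache[bucket_id]
        # Cache miss: fetch the bucket from remote storage into the local cache.
        self.remote_fetches += 1
        data = self.remote_store[bucket_id]
        self._cache_put(bucket_id, data)
        return data

    def _cache_put(self, bucket_id: str, data: bytes) -> None:
        self.local_cache[bucket_id] = data
        self.local_cache.move_to_end(bucket_id)
        # Evict least recently used buckets; the remote copy is kept.
        while len(self.local_cache) > self.cache_capacity:
            self.local_cache.popitem(last=False)


idx = ToySmartStoreIndexer(cache_capacity=2)
for b in ("b1", "b2", "b3"):
    idx.roll_hot_to_warm(b, b.encode())
idx.search("b1")  # b1 was evicted earlier, so this is a remote fetch
```

After the three rolls, all three buckets are in remote storage but only b2 and b3 remain cached. Searching b1 triggers one remote fetch and evicts b2, mirroring the evict-then-retrieve cycle described above.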
When a thin layer of low-latency storage (higher $/TB) can work seamlessly with a much larger tier of scale-out storage (lower $/TB), the infrastructure can be constructed more efficiently. This is how costs are lowered and management complexity is reduced with Splunk SmartStore.
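The cost argument is weighted arithmetic. The sketch below compares an all-flash layout with a two-tier layout in which 15% of retained data sits on flash and the rest on scale-out storage, consistent with the 80-90% figure for the scale-out tier. The function name, the 500 TB total, and both $/TB prices are made-up placeholders chosen only to show the shape of the calculation; they are not quotes or results from the CVD.

```python
def blended_cost(total_tb: float, flash_fraction: float,
                 flash_cost_per_tb: float, bulk_cost_per_tb: float) -> float:
    """Cost of storing total_tb with flash_fraction of it on flash.

    All prices passed in are hypothetical placeholders.
    """
    flash_tb = total_tb * flash_fraction
    bulk_tb = total_tb - flash_tb
    return flash_tb * flash_cost_per_tb + bulk_tb * bulk_cost_per_tb


all_flash = blended_cost(500, 1.0, flash_cost_per_tb=1000, bulk_cost_per_tb=200)
two_tier = blended_cost(500, 0.15, flash_cost_per_tb=1000, bulk_cost_per_tb=200)
print(all_flash, two_tier)
```

With these placeholder prices, the two-tier layout costs about $160,000 versus $500,000 for all flash. The real ratio depends entirely on actual hardware pricing and on how much cache the search workload needs.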
This section of the CVD covers how Hot-to-Warm bucket rolls in a SmartStore setup were tested and validated.
5. Search Performance with SmartStore Remains High, Even as Costs Get Lower
The most common response we receive after describing SmartStore is “the savings and efficiencies sound great, but what about search performance?” The CVD project would not be complete without fully exploring this topic. To the user, Splunk searches happen as they always have. The pivotal factor is understanding how a SmartStore architecture affects search performance, if at all.
We structured a series of scenarios ranging from an optimal case, in which all of the data is located on low-latency storage (SSD in our tests), to a non-optimal case, in which all of the data resides on scale-out storage (high-capacity hard drives in our tests). The search itself was also demanding and resource-intensive: 260 million events over a 4-day period.
Was there a performance impact when searches ran against SmartStore-enabled indexes whose data had to be retrieved from scale-out storage? Yes, but it was slight, and much smaller than many have theorized. The intelligence of Splunk SmartStore's caching engine, combined with the optimized infrastructure stack from Cisco and SwiftStack, kept search performance very reasonable, even in the most taxing test cases.
The performance testing methodology and results are provided here. Be sure to examine the "Summary of the Cache Testing Results" table for a holistic view of search performance with SmartStore.
Through our engagement with Splunk Enterprise users, we've heard a common theme: they love the Splunk tool and want to do more with it. Many want to keep Splunk data much longer than they do today, both to gain insights through deep searches and to meet compliance requirements. The SmartStore feature is a game-changer for these organizations, as they can scale Splunk in line with business and operational needs without breaking the infrastructure budget or overburdening IT.
With this CVD, Cisco and SwiftStack have provided the blueprint for turning SmartStore’s considerable promise into deployment reality.
If you have further questions after reviewing this blog and the referenced sections of the CVD, please reach out so we can set up a conversation with a Splunk technical expert.