Flash-Based AFM Cache Optimization

Explore the potential of flash-based AFM caches in analytic compute environments to enhance data access efficiency. Learn about opportunities and cautions for utilizing AFM caches to accelerate workflows in various industries. Discover strategies to maximize performance and overcome limitations in storage solutions.

  • Flash AFM
  • Data Optimization
  • Analytic Ecosystems
  • AFM Caches
  • Performance Enhancement


Presentation Transcript


  1. Flash-based AFM caches in compute environments. Highlight: Spectrum Scale in analytic ecosystems

  2. Introduction This presentation focuses on how flash-based AFM caches can play an integral part in analytic compute environments by optimizing access to common data sets. It assumes that everyone in the audience knows what AFM is, what its modes are, and the benefits of each mode. It assumes that not everyone is aware of the limitations AFM has. It also assumes that the potential customer cannot identify a workload that needs to be elevated to a higher-performing pool of storage by using ILM. This is not a comparison with burst buffers (IME, DataWarp, L2ARC, etc.), and it is not focused on Disaster Recovery or Business Continuity.

  3. The flash-based AFM cache opportunity Using AFM in Local Update (LU) or Single Writer (SW) mode as a flash-based cache in front of an AFM home enables transparent acceleration for large datasets by letting them operate out of flash media. It insulates the AFM home from transient data workloads, keeping that media less fragmented over time. AFM RO cache: flash media is fast, so localized flash can be configured as an AFM cache adjacent to a compute cluster to give it the highest levels of performance on input datasets (using other storage as scratch). A rack-adjacent flash SAN is a related use case, assuming it is cost-competitive. This accelerates the following workflows tremendously. Media: video transcoding, nDVR/cDVR applications. FSI: SAS Grid, Informatica, backtesting (up to monthly data, depending on how large the cache is); special note that Kx kdb+ uses mmap, but it benefits heavily from flash if the dataset can fit into it. Life Sciences: processing sequenced data, using the AFM home for long-term archive.
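A minimal configuration sketch of what an AFM RO cache fileset on a flash filesystem might look like. All names here (flashfs, rocache, homeserver, the paths, and the list file) are hypothetical, and the exact option spellings should be checked against the mmcrfileset, mmlinkfileset, and mmafmctl documentation for your Spectrum Scale release:

    # Create a read-only (ro) AFM cache fileset on the flash filesystem,
    # pointing at the NFS-exported AFM home (all names hypothetical).
    mmcrfileset flashfs rocache \
        -p afmMode=ro,afmTarget=nfs://homeserver/gpfs/homefs/datasets \
        --inode-space new

    # Link the fileset into the namespace where the compute cluster expects it.
    mmlinkfileset flashfs rocache -J /gpfs/flashfs/rocache

    # Optionally warm the cache ahead of a job by prefetching the input
    # file list (one path per line in the list file).
    mmafmctl flashfs prefetch -j rocache --list-file /tmp/input-files.list

Because the fileset is read-only at the cache, input data is never written back to flash by the application, which also sidesteps the write-amplification cautions on the next slide.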

  4. The flash-based AFM cache: cautions / lessons learned Flash in general has garbage collection; when it kicks in, you know it and feel it in the performance of the machine. You must be very careful about write amplification: align the AFM cache block size with the FSW of the underlying storage, and the underlying storage needs to have good flash alignment. Using AFM RO mode can help avoid this, since the input data is not modified when it is read-only. Some workloads do not benefit from flash: single-shared-file access with smaller IO sizes does not work well in ANY parallel file system, and for these modes you need an alternate solution. One workaround is to serve these workloads through a protocol node (PN). Tuning for the PN server should be aggressive, and these nodes should have very good network connectivity. Keep in mind that a single-port EDR InfiniBand export over IPoIB may deliver orders of magnitude higher performance than a cluster of 32 dual-port RDMA NSD servers using native GPFS; this can be attributed to the number of clients accessing a common set of data, particularly when IO sizes are small. Only one of the PN servers should be mounted for these bad workflows at a time, and the IP should fail over between the PNs to provide HA. These bad workflows are common: a single shared file, or small IO in many large shared files. Note that LROC can further assist with performance improvements for these workflows.
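A minimal sketch of the last two points, LROC on a protocol node and a floating export IP for HA. The device, node, stanza-file, and IP names are hypothetical, and the parameters should be verified against the Spectrum Scale documentation for your release:

    # LROC: dedicate a local flash device on the protocol node as a read cache.
    # Hypothetical NSD stanza in lroc.stanza; usage=localCache marks the
    # device for LROC:
    #   %nsd: device=/dev/nvme0n1 nsd=pn1_lroc usage=localCache
    mmcrnsd -F lroc.stanza

    # Enable LROC caching of file data on the protocol node (node name hypothetical).
    mmchconfig lrocData=yes -N pn1

    # HA for the "bad" workflows: keep a single floating export IP that CES
    # fails over between protocol nodes; to steer it manually:
    mmces address move --ces-ip 10.0.0.50 --ces-node pn2

With one CES address active on one PN at a time, clients mount a single export for the small-IO workflows while the second node stands by for failover.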
