
High-Performance Shared Memory Communication Solutions
Explore cutting-edge shared memory (SHM) support options, integrated transparency levels, proposed SHM architecture, primitive message exchange mechanisms, and examples for small, medium, and large message handling. Enhance your understanding of SHM primitives and how they enable efficient communication between processes.
Presentation Transcript
OFI Shared Memory OFIWG
SHM Support Options
  - SHM provider
      - Utility provider using SHM primitives
      - Separate SHM domain
  - Integrated SHM support
      - Native provider using SHM primitives
      - Provider is using 2 protocols
www.openfabrics.org

Integrated SHM support
  - Level of transparency
      - Automatically use
      - Can explicitly disable
      - Enabled via interface: fi_control, cap, flag, protocol bit, ...
  - Automatically fall back?
  - Selectable per operation?

Proposed SHM Architecture
  - Primitives
      - SHM structures
      - No protocol
  - Utilities
      - Protocol to implement libfabric interfaces
      - Msg, tagged, RMA, atomics
      - Coordinate with other utility protocols
  - Provider
      - Use utilities or primitives with own protocol
      - Allow for single protocol engine and coordination with provider HW
  [Diagram labels: Provider, SHM Utilities, SHM Primitives]

SHM Primitives
  - Support simple message exchange between two (or more) processes
  - Control block
      - Used to set up communication
  - Rx command/control queues
      - Small, fixed-size entries (~64 B)
      - Control queue used for ACKs
  - Tx inject buffers
      - Pool of small buffers for msg data
  [Diagram labels: Control Block, Rx Command Queue, Rx Control Queue, Tx Inject Buffers]

Small Message Example
  - Tx side writes Rx entry
      - Rx entry = msg header + msg data
      - Only very small messages fit into Rx entry
  - Rx side decodes header and processes msg
      - Data is retrieved directly from Rx entry
  [Diagram labels: Tx, CMD, Rx Command Queue]

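The small-message path can be sketched in C as follows. This is a minimal illustration, not the OFIWG implementation: the header fields, the exact 64-byte entry size, and the names `msg_hdr`, `rx_entry`, `tx_write_small`, and `rx_read_small` are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_ENTRY_SIZE 64 /* slide says "~64 B"; the exact size is assumed */

/* Hypothetical header; the slides do not define its fields. */
struct msg_hdr {
    uint32_t op;
    uint32_t size; /* payload bytes stored inline */
    uint64_t tag;
};

/* One fixed-size Rx command queue entry: header followed by inline data. */
struct rx_entry {
    struct msg_hdr hdr;
    uint8_t data[RX_ENTRY_SIZE - sizeof(struct msg_hdr)];
};

_Static_assert(sizeof(struct rx_entry) == RX_ENTRY_SIZE, "entry must be 64 B");

/* Tx side: write header and payload into the peer's Rx entry.
 * Returns 0 on success, -1 if the payload does not fit inline. */
static int tx_write_small(struct rx_entry *e, const void *buf, size_t len)
{
    assert(e != NULL && buf != NULL);
    if (len > sizeof(e->data))
        return -1;
    e->hdr.op = 0;
    e->hdr.size = (uint32_t)len;
    e->hdr.tag = 0;
    memcpy(e->data, buf, len);
    return 0;
}

/* Rx side: decode the header and copy the payload out of the entry.
 * Returns the number of bytes copied (at most cap). */
static size_t rx_read_small(const struct rx_entry *e, void *out, size_t cap)
{
    size_t n = e->hdr.size < cap ? e->hdr.size : cap;
    memcpy(out, e->data, n);
    return n;
}
```

With this assumed layout only 48 bytes of payload fit inline, which matches the slide's point that only very small messages can take this path.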
Medium Message Example
  - Tx side writes data into Tx inject buffer
  - Tx side writes msg header to Rx entry
  - Rx side decodes header and processes msg
      - Data is retrieved from Tx inject buffer
  [Diagram labels: Tx, CMD, Inject Buffer, Rx Command Queue]

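A rough C sketch of the medium-message path is shown below. The pool depth, buffer size, header fields, and the way the header refers to the inject buffer (by index) are assumptions for illustration; the slides only state that data goes through a Tx inject buffer and the header goes into the Rx entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INJECT_SIZE  4096 /* assumed buffer size */
#define INJECT_COUNT 4    /* assumed pool depth */

/* Tx-owned pool that the peer has mapped into its address space. */
struct inject_pool {
    uint8_t buf[INJECT_COUNT][INJECT_SIZE];
    uint8_t in_use[INJECT_COUNT];
};

/* Hypothetical header written into the peer's Rx command queue entry. */
struct med_hdr {
    uint32_t op;
    uint32_t size;
    uint32_t buf_idx; /* which inject buffer holds the payload */
};

/* Tx side: stage data in a free inject buffer, then publish the header.
 * Returns 0 on success, -1 if the data is too large or the pool is empty. */
static int tx_send_medium(struct inject_pool *pool, struct med_hdr *rx_entry,
                          const void *src, size_t len)
{
    if (len > INJECT_SIZE)
        return -1;
    for (uint32_t i = 0; i < INJECT_COUNT; i++) {
        if (pool->in_use[i])
            continue;
        pool->in_use[i] = 1;
        memcpy(pool->buf[i], src, len);
        rx_entry->op = 0;
        rx_entry->size = (uint32_t)len;
        rx_entry->buf_idx = i;
        return 0;
    }
    return -1;
}

/* Rx side: decode the header and copy the data out of the Tx inject buffer. */
static size_t rx_recv_medium(const struct inject_pool *tx_pool,
                             const struct med_hdr *e, void *dst, size_t cap)
{
    size_t n = e->size < cap ? e->size : cap;
    memcpy(dst, tx_pool->buf[e->buf_idx], n);
    return n;
}

/* Tx side: return the buffer to the pool once the Rx side is done with it. */
static void tx_release(struct inject_pool *pool, uint32_t idx)
{
    assert(idx < INJECT_COUNT);
    pool->in_use[idx] = 0;
}
```

How the Tx side learns that a buffer can be released is not spelled out on the slide; an ACK on the control queue is one plausible mechanism.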
Large Message Example
  - Tx side writes msg header to Rx entry
  - Rx side decodes header and processes msg
      - Data is pulled from Tx process using CMA
  - Rx side writes ACK msg back to Tx side
  [Diagram labels: Tx, CMD, CMA Buffer, Rx Command Queue]

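On Linux, CMA (cross memory attach) is exposed through `process_vm_readv`. The sketch below shows how the Rx side might pull a large payload given the Tx process ID and source address taken from the message header. The wrapper name `rx_pull_large` is hypothetical, and the header exchange and the ACK are omitted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Rx side: copy len bytes from tx_addr in process tx_pid into dst.
 * Returns the number of bytes copied, or -1 with errno set.
 * The call may transfer fewer bytes than requested, so a real
 * implementation would need to loop or report the short count. */
static ssize_t rx_pull_large(pid_t tx_pid, void *dst,
                             const void *tx_addr, size_t len)
{
    struct iovec local  = { .iov_base = dst,             .iov_len = len };
    struct iovec remote = { .iov_base = (void *)tx_addr, .iov_len = len };

    assert(dst != NULL && tx_addr != NULL);
    return process_vm_readv(tx_pid, &local, 1, &remote, 1, 0);
}
```

The call is subject to the same permission checks as ptrace attach, so it can fail with `EPERM` between unrelated processes or inside restricted containers.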
Completion Handling
  - For small to medium sized messages
      - Tx may complete after updating Rx queue
      - Requires connection shutdown coordination to ensure Rx side has processed all operations
  - For large messages or delivery complete semantics
      - Tx does not complete until it has processed an ACK from the Rx side

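The completion rule above reduces to a small decision function. The enum and function names here are hypothetical and exist only to make the rule concrete.

```c
#include <assert.h>
#include <stdbool.h>

enum msg_class { MSG_SMALL, MSG_MEDIUM, MSG_LARGE };
enum tx_state  { TX_COMPLETE, TX_WAIT_ACK };

/* State of a Tx operation right after its header is written to the Rx queue. */
static enum tx_state tx_state_after_post(enum msg_class cls,
                                         bool delivery_complete)
{
    if (cls == MSG_LARGE || delivery_complete)
        return TX_WAIT_ACK; /* must wait for the Rx side's ACK */
    return TX_COMPLETE;     /* small/medium: done once the queue is updated */
}

/* State after the Tx side processes an ACK for a pending operation. */
static enum tx_state tx_state_on_ack(enum tx_state s)
{
    assert(s == TX_WAIT_ACK);
    return TX_COMPLETE;
}
```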
Design Discussion
  - SHM provider
      - Basis for endpoint addresses as stand-alone provider
      - Can use stringified address as integrated provider
  - Do we handle application crashes?
      - Potential for deadlock if resources aren't released
  - Use of CPU-specific instructions for sync?
      - Likely available on any platform

Design Discussion
  - Number of Rx command queues
  - Allocate one per peer
      - Simpler synchronization
      - Could require thousands of Rx queues per node
      - Requires polling across multiple queues
  - One per process
      - Need IPC synchronization (semaphore or CPU instructions)
      - Potential for starvation

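For the one-queue-per-process option, several senders would have to agree on who owns which slot. One way to do that with CPU instructions is a compare-and-swap loop on the tail index, sketched here with C11 atomics. The structure and function names are hypothetical, and the queue size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Indices for a single Rx queue shared by many senders (illustrative). */
struct shared_rxq {
    _Atomic uint32_t head; /* advanced by the receiver */
    _Atomic uint32_t tail; /* advanced by senders via compare-and-swap */
    uint32_t size;         /* number of slots, assumed power of two */
};

/* Sender: reserve one slot. Returns the slot index, or -1 if the queue
 * looks full. A sender that keeps losing the CAS race can starve, which
 * is the risk the slide mentions. */
static int rxq_reserve(struct shared_rxq *q)
{
    for (;;) {
        /* Read head before tail so that tail - head never underflows. */
        uint32_t h = atomic_load(&q->head);
        uint32_t t = atomic_load(&q->tail);

        if (t - h >= q->size)
            return -1;
        if (atomic_compare_exchange_weak(&q->tail, &t, t + 1))
            return (int)(t % q->size);
    }
}

/* Receiver: release the oldest slot after processing it. */
static void rxq_consume(struct shared_rxq *q)
{
    atomic_fetch_add(&q->head, 1);
}
```

Reserving a slot is only half of the problem: a real queue would also need a per-entry flag so the receiver can tell when a reserved slot has actually been filled.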
Portability
  - SHM is disabled on non-Linux platforms (no support for CMA)
  - SHM can be extended later to avoid using CMA
      - Add bounce buffer pool
      - New command indicating use of bounce buffer pool
      - Protocol extended to handle partial transfers with resume capability

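The bounce-buffer extension is only proposed on the slide, so the following is a speculative sketch of what "partial transfers with resume capability" could look like: the sender tracks an offset and refills a fixed-size bounce buffer until the whole message has been moved. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sender-side progress for one large transfer (illustrative). */
struct bounce_xfer {
    const uint8_t *src;
    size_t total;
    size_t offset; /* bytes already staged; the transfer resumes from here */
};

/* Tx side: stage the next chunk in the shared bounce buffer.
 * Returns the chunk length, or 0 when the transfer is finished. */
static size_t tx_fill_bounce(struct bounce_xfer *x, uint8_t *bounce,
                             size_t bounce_size)
{
    size_t left = x->total - x->offset;
    size_t n = left < bounce_size ? left : bounce_size;

    assert(x->offset <= x->total);
    memcpy(bounce, x->src + x->offset, n);
    x->offset += n;
    return n;
}

/* Rx side: copy one staged chunk to its place in the destination buffer. */
static void rx_drain_bounce(uint8_t *dst, size_t dst_off,
                            const uint8_t *bounce, size_t n)
{
    memcpy(dst + dst_off, bounce, n);
}
```

In a real protocol each chunk would be announced by a command and acknowledged before the bounce buffer is reused; that handshake is left out here.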
Implementation Details: SHM Primitives
  - tx_inject_bufs (util_buf_pool)
      - mtu
      - mem_limit
      - Buffers are aligned on MTU
      - Inject buffers mmapped by peers
  - tx_cmd_ctx (freeque)
      - datatype
      - queue_size
      - Entries aligned on sizeof(datatype)
  - rx_cmd_queue / rx_ctrl_queue (cirque)
      - datatype
      - queue_size
      - Rx queues mmapped by peers

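The slide names `cirque` as the structure behind the Rx queues but does not show it. A generic single-producer, single-consumer circular queue of 64-byte entries, as might be used for a per-peer Rx command queue, could look like the sketch below. It is not the libfabric utility itself; the names, the queue depth, and the use of C11 atomics are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CIRQUE_SIZE 8 /* assumed depth; must be a power of two */

struct cmd_entry {
    uint8_t raw[64];
};

struct cirque {
    _Atomic uint32_t head; /* next entry to read, written by the consumer */
    _Atomic uint32_t tail; /* next entry to write, written by the producer */
    struct cmd_entry e[CIRQUE_SIZE];
};

/* Producer (Tx side): append one entry. Returns false if the queue is full. */
static bool cirque_push(struct cirque *q, const struct cmd_entry *in)
{
    uint32_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
    uint32_t h = atomic_load_explicit(&q->head, memory_order_acquire);

    if (t - h == CIRQUE_SIZE)
        return false;
    q->e[t & (CIRQUE_SIZE - 1)] = *in;
    /* Release so the consumer sees the entry before the new tail. */
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);
    return true;
}

/* Consumer (Rx side): remove one entry. Returns false if the queue is empty. */
static bool cirque_pop(struct cirque *q, struct cmd_entry *out)
{
    uint32_t h = atomic_load_explicit(&q->head, memory_order_relaxed);
    uint32_t t = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (h == t)
        return false;
    *out = q->e[h & (CIRQUE_SIZE - 1)];
    atomic_store_explicit(&q->head, h + 1, memory_order_release);
    return true;
}
```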
Implementation Details: SHM Primitives
  - ctrl_block
      - protocol_version
      - pid
      - lock_var
      - total_mem_size
      - cmd_queue_offset
      - ctrl_queue_offset
      - peer_size
      - peer_ctrl[]
  - Use CPU compare-swap instructions for synchronization

Implementation Details: SHM Primitives
  - shm_class
      - create(shm_class, struct shm_attr *attr)
      - setname(shm_class, char *name)
      - connect(shm_class, char *name, int *id)
      - close(shm_class)
  - Access primitives (buffer pool, cirque, etc.) for data transfers

Implementation Details: SHM Utilities
  - cmd (size = 64 B with 2 IOVs)
      - Field types listed on the slide: u32, u32, u64, u16, u16, u8, u8, u16
      - Field names listed on the slide: req_id, flags, data[2], op, ctrl, rx_id, count, size, status
      - union { u64; char *tx_data[]; u64 buf; struct iov[]; struct ioc[] }
  - op: msg, tagged, rma, atomic, ctrl
  - ctrl: inline, inject, iov, ext_iov, ack
      - ctrl selects Tx buffering scheme
      - ext_iov uses > 1 cmd slot; used with sub-commands
  - Rx invokes processing as: function_table[cmd.ops][cmd.ctrl](cmd)

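The two-level dispatch `function_table[cmd.ops][cmd.ctrl](cmd)` can be illustrated with a table of function pointers. The slide's field table did not survive extraction, so the `struct cmd` layout below is one assumed arrangement that happens to total 64 bytes with room for two IOVs. The handler names and return values are hypothetical, and only two table slots are filled in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum cmd_op   { OP_MSG, OP_TAGGED, OP_RMA, OP_ATOMIC, OP_MAX };
enum cmd_ctrl { CTRL_INLINE, CTRL_INJECT, CTRL_IOV, CTRL_EXT_IOV, CTRL_ACK,
                CTRL_MAX };

/* Assumed layout; field names are from the slide, types are guesses. */
struct cmd {
    uint32_t req_id;
    uint32_t flags;
    uint64_t data[2];
    uint16_t rx_id;
    uint16_t count;
    uint8_t  op;
    uint8_t  ctrl;
    uint16_t size;
    union {
        uint8_t inline_data[32];
        struct { uint64_t addr; uint64_t len; } iov[2];
    } u;
};

_Static_assert(sizeof(struct cmd) == 64, "cmd should fill one 64 B slot");

typedef int (*cmd_handler)(const struct cmd *);

/* Placeholder handlers: a real one would deliver data to a receive buffer. */
static int rx_msg_inline(const struct cmd *c) { return (int)c->size; }
static int rx_msg_iov(const struct cmd *c)    { return (int)c->count; }

/* Unlisted combinations stay NULL and are rejected by rx_dispatch(). */
static const cmd_handler function_table[OP_MAX][CTRL_MAX] = {
    [OP_MSG][CTRL_INLINE] = rx_msg_inline,
    [OP_MSG][CTRL_IOV]    = rx_msg_iov,
};

/* Rx side: pick the handler from (op, ctrl) and invoke it.
 * Returns the handler's result, or -1 for an unknown combination. */
static int rx_dispatch(const struct cmd *c)
{
    assert(c != NULL);
    if (c->op >= OP_MAX || c->ctrl >= CTRL_MAX)
        return -1;
    cmd_handler h = function_table[c->op][c->ctrl];
    return h ? h(c) : -1;
}
```

Keeping `op` and `ctrl` as small integers lets the receiver select a handler with two array lookups instead of a nested switch.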
Implementation Details: SHM Utilities
  - RMA and atomic commands consume 2 cmd slots
  - rma_cmd
      - struct rma_iov[]
  - atomic_cmd
      - datatype: data[1], bits 63-32
      - op: data[1], bits 31-0
      - struct rma_ioc[]
  - tag_cmd
      - tag: data[1]