
Ordering Rules for FI_MSG and FI_RDM Endpoints
Dive into the concept of ordering in libfabric endpoints, exploring the need for simple rules to ensure performance and correctness. Learn about the various aspects of ordering, from expectations of API consumers to concise ordering rules in libfabric, and the specifics of FI_MSG and FI_RDM endpoints. Discover how libfabric handles ordering guarantees through operations posted to queue pairs in this insightful exploration.
Presentation Transcript
Request ordering for FI_MSG and FI_RDM endpoints
29 April 14

Something needed so consumers of libfabric stay sane
- A few types of endpoints with simple ordering rules that are reasonably easy to understand.
- The ordering rules should allow sufficient flexibility so that different providers can deliver maximum performance while also ensuring program correctness.

Background: what is ordering?
Ordering is an end-to-end concept and may include some or all of the following:
- Expectations of the API consumer w.r.t. the order of execution of operations posted to the fabric provider
- Execution of operations as expressed on the wire
- Ordering of information as packets/flits/msgs cross the wire
- Ordering of inbound operations (both inbound requests and inbound RDMA operations)
- Order in which inbound data is placed on the memory bus
- Order in which inbound data is written to memory by the memory controller
- Expectations of the API consumer w.r.t. the order in which operations are completed/notified

IB: The concise ordering rules
- Operations on the SEND queue are transmitted on the wire in order.
- Operations at the RESPONDER side are executed in the order received.
- A SEND or RDMA WRITE may be executed before an RDMA READ!
- Operations on the SEND queue are completed in the order in which they were posted.
[Figure: SEND 1, RDMA RD 2, and SEND 3 travel from the requester to the responder; ACK 1, ACK 3, and READ DATA travel back.]
www.openfabrics.org 6/14/2011
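
The last rule (completions are reported in posting order, even when acknowledgements arrive out of order) can be illustrated with a toy model. This is only a sketch: the function name and the string labels are made up for illustration and are not part of any IB verbs or libfabric API.

```python
def completion_order(posted, ack_arrivals):
    """Toy model of in-order completion on a send queue.

    posted:       work-request labels in the order they were posted.
    ack_arrivals: the same labels in the order their ACK / read data arrives.
    Returns the order in which completions are reported to the consumer.
    """
    done = set()
    reported = []
    next_idx = 0  # index of the oldest work request not yet reported
    for wr in ack_arrivals:
        done.add(wr)
        # Report every consecutive finished request, starting from the oldest.
        while next_idx < len(posted) and posted[next_idx] in done:
            reported.append(posted[next_idx])
            next_idx += 1
    return reported


posted = ["SEND 1", "RDMA RD 2", "SEND 3"]
# ACK 3 comes back before the read data, as in the slide's diagram.
arrivals = ["SEND 1", "SEND 3", "RDMA RD 2"]
print(completion_order(posted, arrivals))
```

In this model the completion for SEND 3 is held back until the read data for RDMA RD 2 has arrived, so the consumer still sees completions in posting order.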

What does man -l man/fi_getinfo.3 have now?
- FI_MSG - Provides reliable, in-order message based communication, with data transfers maintaining message boundaries.
  Hmmm. Okay, so if you bought an adaptive network, you wasted your money.
- FI_RDM - Provides reliable datagram communication without ordering guarantees.
  Hmmm. Okay, does PSM really work this way? MPI can't use this mode easily.

It's worse than that
- IB provides ordering only on operations posted to a given QP.
- The QP construct binds together operations of different types in order to provide ordering guarantees between different operations, e.g. between message and RDMA operations.
- How is that accomplished using the fabric interfaces?

What would be nicer
- FI_MSG - Provides reliable, message based communication, with data transfers maintaining message boundaries. Messages are ordered by default, with relaxed order being optionally supported on a per message basis.
- FI_RDM - Provides reliable datagram communication. By default, ordering is not guaranteed, although for datagrams targeting a given network endpoint, a sequence of datagrams can be specified as an ordered sequence.

Using relaxed order in MPI: rendezvous example
[Figure: work requests WR0 and WR1 are posted toward a responder, with send completions SndCmp0 and SndCmp1; some pairs of operations are labeled "no ordering dependency" and others "ordering required".]

PCI-e transaction order rules (for a given TC, source, target)
Producer/consumer model. Row pass column? (rows: second op; columns: first op)

Second op \ First op           Posted request    Non-posted read request   Non-posted AMO req.
Posted request (RDMA write)    a) no  b) y/n     yes                       yes
Non-posted read request        a) no  b) y/n     y/n                       y/n
Non-posted AMO request         a) no  b) y/n     y/n                       y/n

a) Strict producer/consumer model, i.e. strict order
b) I'm feeling lucky and have set the RO bit in the second request
Requesting relaxed order doesn't mean you'll get it, hence y/n. If you're a HW designer, don't count on relaxed order.
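
One way to read the "row pass column" table is as a lookup keyed by (second op, first op). The sketch below encodes only the entries shown on this slide. The names `PCIE_STRICT` and `pcie_may_pass` and the short keys are invented for illustration, and the code is not a model of the full PCI Express ordering rules.

```python
# Rows are the second (later) op, columns the first (earlier) op.
# "y/n" means passing is permitted but not guaranteed.
PCIE_STRICT = {
    "posted": {"posted": "no", "read": "yes", "amo": "yes"},
    "read":   {"posted": "no", "read": "y/n", "amo": "y/n"},
    "amo":    {"posted": "no", "read": "y/n", "amo": "y/n"},
}


def pcie_may_pass(second, first, relaxed_order=False):
    """May `second` pass the earlier `first`?

    relaxed_order=True corresponds to case b) on the slide: the RO bit is set
    in the second request, which turns a strict "no" into "y/n".
    """
    answer = PCIE_STRICT[second][first]
    if answer == "no" and relaxed_order:
        return "y/n"
    return answer


print(pcie_may_pass("posted", "posted"))
print(pcie_may_pass("posted", "posted", relaxed_order=True))
```

Returning "y/n" instead of a boolean keeps the slide's point visible: requesting relaxed order permits reordering but never promises it.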

IB transaction ordering rules (RC)
Row pass column? (rows: second op; columns: first op)

Second op \ First op    Send/RDMA Write    RDMA Read         Non-posted AMO req.
Send/RDMA Write         no                 a) yes  b) no     a) yes  b) no
RDMA Read               no                 no                no
Atomic Op.              no                 a) yes  b) no     a) yes  b) no

a) No ordering guarantee: don't count on order if you want correctness
b) I need order and have set the fence bit in the second transaction
Strict producer-consumer model
March 30 - April 2, 2014 #OFADevWorkshop

Libfabric FI_MSG now (depending on interpretation of fi_getinfo.3)
Row pass column? (rows: second op; columns: first op)

Second op \ First op    Send    RDMA Write    RDMA Read    Non-posted AMO req.
Send                    no      no            no           no
RDMA Write              no      no            no           no
RDMA Read               no      no            no           no
Atomic Op.              no      no            no           no

Libfabric FI_MSG with optional relaxed order (proposal)
Row pass column? (rows: second op; columns: first op)

Second op \ First op    Send              RDMA Write        RDMA Read         Non-posted AMO req.
Send                    a) no  b) y/n     a) no  b) y/n     a) yes  c) no     a) yes  c) no
RDMA Write              a) no  b) y/n     a) no  b) y/n     a) yes  c) no     a) yes  c) no
RDMA Read               a) no  b) y/n     a) no  b) y/n     a) no  b) y/n     a) no  b) y/n
Atomic Op.              a) no  b) y/n     a) no  b) y/n     a) yes  c) no     a) yes  c) no

a) IB RC like behavior (default)
b) Relaxed order bit set in flag (SendMsg, etc.)
c) Fence bit set in flag
Provider free to ignore b), must observe c)
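
The proposal table can be condensed into a small decision function. The sketch below is one reading of the table, not libfabric code: `RELAXED` and `FENCE` are plain strings standing in for the proposed flag bits, and letting the fence win when both are set is an assumption (the slide only says a provider must observe c) and may ignore b)).

```python
RELAXED = "relaxed"  # stand-in for the proposed relaxed-order bit, case b)
FENCE = "fence"      # stand-in for the fence bit, case c)


def fi_msg_may_pass(second, first, flags=frozenset()):
    """May `second` pass the earlier `first` on an FI_MSG endpoint?

    Ops are "send", "write", "read", "atomic"; `flags` are set on the second op.
    Returns "yes", "no", or "y/n" (provider may or may not reorder).
    """
    if FENCE in flags:
        # Assumption: the fence takes priority over a relaxed-order request.
        return "no"
    # Default a): IB RC like. Anything except a read may pass an earlier
    # read or atomic; nothing passes an earlier send or write.
    if second != "read" and first in ("read", "atomic"):
        return "yes"
    return "y/n" if RELAXED in flags else "no"


print(fi_msg_may_pass("send", "read"))
print(fi_msg_may_pass("send", "read", {FENCE}))
print(fi_msg_may_pass("write", "send", {RELAXED}))
```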

Ordering bits for FI_MSG
- Add new flag bit for sendmsg/writemsg - FI_RELAXED_ORDER - If this bit is set, the MSG, RMA, or AMO operation may be completed ahead of pending MSG, RMA, or AMO ops in the EP's send queue. Messages may appear to complete out of order when this bit is set.
- Add new flag bit for sendmsg/writemsg - FI_FENCE_GLOBAL - If this bit is set, the operation posted to the EP will not be initiated until all MSG, RMA, and AMO ops previously posted to the EP have completed globally. fi_ep_sync sounds blocking; this is a potentially non-blocking way to do a fence.
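
The intended effect of the two proposed bits can be sketched with a toy model of an endpoint's send queue. Nothing here calls libfabric: the function names are hypothetical, the booleans stand in for the proposed FI_RELAXED_ORDER and FI_FENCE_GLOBAL bits, and the rule that a non-relaxed op waits for every earlier op is one interpretation of the slide.

```python
def report_completions(posted, finish_order):
    """posted: list of (name, relaxed) pairs in posting order.
    finish_order: names in the order the provider actually finishes them.
    Returns names in the order completions become visible to the consumer.
    """
    relaxed = dict(posted)
    order = [name for name, _ in posted]
    finished, reported = set(), []
    for name in finish_order:
        finished.add(name)
        progress = True
        while progress:
            progress = False
            for i, op in enumerate(order):
                if op in reported or op not in finished:
                    continue
                # A relaxed op may complete ahead of pending earlier ops;
                # any other op waits until all earlier ops are reported.
                if relaxed[op] or all(e in reported for e in order[:i]):
                    reported.append(op)
                    progress = True
    return reported


def may_initiate(index, fenced, completed):
    """fenced[i]: op i carries the fence bit; completed[i]: op i has completed
    globally. A fenced op may start only once every earlier op has completed."""
    return (not fenced[index]) or all(completed[:index])


posted = [("rts_send", False), ("bulk_write", True), ("fin_send", False)]
print(report_completions(posted, ["bulk_write", "fin_send", "rts_send"]))
print(may_initiate(2, [False, False, True], [True, False, False]))
```

In the example the relaxed `bulk_write` is reported as soon as it finishes, while `fin_send` is held until `rts_send` has been reported. The fence check is a pure predicate, which mirrors the slide's point that a fence flag need not block the caller the way fi_ep_sync appears to.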

Libfabric FI_RDM now (depending on interpretation of fi_getinfo.3)
Row pass column? (rows: second op; columns: first op)

Second op \ First op    Send    RDMA Write    RDMA Read    Non-posted AMO req.
Send                    yes     yes           yes          yes
RDMA Write              yes     yes           yes          yes
RDMA Read               yes     yes           yes          yes
Atomic Op.              yes     yes           yes          yes

Not enough order? fi_ep_sync seems kind of heavyweight.

Libfabric FI_RDM suggestion: HyperTransport ordered sequences
Row pass column? (rows: second op; columns: first op)

Second op \ First op    Send              RDMA Write        RDMA Read         Non-posted AMO req.
Send                    a) yes  b) no     a) yes  b) no     a) yes  b) no     a) yes  b) no
RDMA Write              a) yes  b) no     a) yes  b) no     a) yes  b) no     a) yes  b) no
RDMA Read               a) yes  b) no     a) yes  b) no     a) yes  b) no     a) yes  b) no
Atomic Op.              a) yes  b) no     a) yes  b) no     a) yes  b) no     a) yes  b) no

a) Default, as in man page. App must use fi_ep_sync for ordering.
b) If second op has the same order sequence. Ops must be back-to-back.

Ordering bits for FI_RDM
- Add new flag bit for sendmsg/writemsg - FI_ORDERED_SEQ - If this bit is set, the message, RMA, or AMO operation is treated as part of an ordered sequence. The sequence number is specified in the flow field of the fi_msg, etc. argument.
- MSG, RMA, and AMO requests within an ordered sequence must be posted sequentially to a given endpoint, with no intervening requests that are not part of the ordered sequence. The operations must all target the same target address.
- Some providers may be able to do this efficiently; otherwise the behavior is as if fi_ep_sync were invoked internally between each operation.
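
The posting rules for an ordered sequence (back-to-back ops, as the FI_RDM suggestion slide puts it, all aimed at one target) lend themselves to a simple checker. The sketch below is hypothetical: the function name and the (target, seq) tuples are invented for illustration, and treating a sequence number that reappears after an interruption as an error is an assumption.

```python
def ordered_sequence_errors(ops):
    """ops: list of (target, seq) tuples in posting order; seq is None for
    operations that are not part of any ordered sequence.
    Returns a list of human-readable rule violations (empty if none).
    """
    errors = []
    first_target = {}  # seq -> target of the first op in that sequence
    closed = set()     # sequences already interrupted by another request
    prev_seq = None
    for i, (target, seq) in enumerate(ops):
        if prev_seq is not None and seq != prev_seq:
            closed.add(prev_seq)  # the previous sequence has ended
        if seq is not None:
            if seq in closed:
                errors.append(f"op {i}: sequence {seq} is not back-to-back")
            if seq in first_target and first_target[seq] != target:
                errors.append(f"op {i}: sequence {seq} changes target")
            first_target.setdefault(seq, target)
        prev_seq = seq
    return errors


ok = [("peer1", 7), ("peer1", 7), ("peer2", None)]
bad = [("peer1", 7), ("peer2", None), ("peer1", 7)]
print(ordered_sequence_errors(ok))
print(ordered_sequence_errors(bad))
```

A provider that cannot order such a sequence natively would, per the slide, behave as if fi_ep_sync were invoked between the operations. A check like this would sit in front of either path.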