
Kernel Fabric Interface Direction Check with Linux Kernel Maintainers
"Explore the Kernel Fabric Interface direction check with Linux Kernel Maintainers in the OFA OpenFabrics Interfaces Project Data Storage subgroup. Understand the objectives, positioning in the Linux kernel, the need for a new kernel mode API, current state of development, and next steps."
Presentation Transcript
kfi - Kernel Fabric Interface: Direction Check with Linux Kernel Maintainers
OFA OpenFabrics Interfaces Project, Data Storage/Data Access subgroup
December 2015
Agenda
- Objective: a direction check
- Background: kfi mission and objectives
- Positioning kfi in the Linux kernel
- Why a new kernel mode API?
- State of development today (GitHub repo, etc.)
- Conclusions, next steps
Objective for today
- Describe the approach being taken by the DS/DA subgroup
- Feedback/direction check: is this an acceptable direction that merits further development?
Background
- OpenFabrics Interfaces project (OFI) created by the OFA in August 2013
- Charter is to develop, test, and distribute:
  1. An extensible, open source framework that provides access to high-performance fabric interfaces and services
  2. Extensible, open source interfaces aligned with ULP and application needs for high-performance fabric services
- In short, deliver I/O stack(s) that maximize application effectiveness
- OFI takes a consumer-centric view of the API
  - Focus is on meeting the requirements of consumers of network services
  - Thus, OFI is organized by classes of consumers (see next slide)
- OFI currently comprises two working groups:
  - OFI WG: user mode APIs for distributed and parallel computing
  - Data Storage/Data Access WG: user and kernel mode APIs for storage
kfabric, libfabric relationship
- kfabric is: kernel modules for storage and data access
- kfabric is not: the kernel component of libfabric
- libfabric is a user-mode library for distributed and parallel computing
- libfabric access to kernel services is performed by the provider(s) using the provider's kernel drivers
kfabric Mission
- Create network APIs to support kernel-based storage: filesystems, object I/O, block storage
- Incorporate high performance storage interfaces
  - Focus on emerging storage technologies, e.g. NVM
- Transport independence, consumer portability
  - Define an API which is not derived from a specific network technology
  - Base the API on a higher level abstraction built on message passing semantics
- Emphasis on performance and scalability
  - Minimize code paths to device functionality
  - Focus on optimizing critical code paths; eliminate code branches from critical paths wherever possible
- Smooth transition path to emerging fabrics and new use cases
  - NVM as persistent memory
  - NVM as persistent storage
  - Independent of any particular network technology
kfi objectives
- Network agnostic
  - kfi presents a consumer-oriented abstraction of the network
  - Support for new networks does not require emulating an existing one
  - Device drivers are typically based on a specific network technology
- Support for emerging use cases: NVM
  - NVM for storage and for memory access (see upcoming slide)
- Support for emerging fabrics
  - Allows for innovation with new networks as they emerge
  - Avoids dependencies on the underlying network
- Support for existing networks
  - kfi is designed to run over existing network technologies
Positioning kfi in the Linux kernel
- kfi is intended to be a peer to the kernel sockets stack
- Rationale:
  - Sockets is stream-oriented, whereas kfabric is block-oriented. Blocks can be conveyed over a stream, but it's clumsy and requires things like markers, or the use of an unreliable protocol, e.g. UDP
  - Sockets does not include the notion of one-sided operations; mechanisms (e.g. memory protection) to support one-sided operations are absent
  - Sockets tends to be synchronous: if a given process has more than one operation in flight, the process itself has to ensure synchronization of the operations
  - kfabric contains a richer set of completion semantics
Why another kernel API?
- kfi is intended to be an abstract I/O API, as compared to existing kverbs, which is a low-level device driver model
- kverbs is based on a QP-based device; it is difficult to write to a non-QP-based device without emulating a QP
- kfi begins as a device-agnostic RMA interface
Support for NVM
(put some of the results of the NVM use case / access model slide deck here)
KFI Framework
[Block diagram: the KFI API sits above the KFI providers (Verbs provider, Sockets provider, new providers**), which in turn sit above kernel verbs, kernel sockets, and device drivers for the underlying devices (NIC, iWARP, InfiniBand, RoCE, new devices). Red = new kernel components, ** = e.g. NVM]
KFI API: the details
- KFI interfaces form a cohesive set, not simply a union of disjoint interfaces
- The interfaces are logically divided into two groups:
  - Control interfaces: operations that provide access to local communication resources
  - Communication interfaces: expose particular models of communication and fabric functionality, such as message queues, remote memory access, and atomic operations. Communication operations are associated with fabric endpoints
- kfi applications typically use control interfaces to discover local capabilities and allocate resources. They then allocate and configure a communication endpoint to send and receive data, or perform other types of data transfers, with storage endpoints
KFI API
The KFI API exports interfaces up to the consumer and down to the provider.
- Consumer APIs: kfi_getinfo(), kfi_fabric(), kfi_domain(), kfi_endpoint(), kfi_cq_open(), kfi_ep_bind(), kfi_listen(), kfi_accept(), kfi_connect(), kfi_send(), kfi_recv(), kfi_read(), kfi_write(), kfi_cq_read(), kfi_cq_sread(), kfi_eq_read(), kfi_eq_sread(), kfi_close()
- Provider APIs:
  - kfi_provider_register(): during kfi provider module load, a call to kfi_provider_register() supplies the kfi API with a dispatch vector for the kfi_* calls
  - kfi_provider_deregister(): during kfi provider module unload/cleanup, kfi_provider_deregister() destroys the kfi_* runtime linkage for the specific provider (reference counted)
GitHub, repo directory structure
(insert Frank's slides)
- Who's the maintainer? Are we talking to the right kernel maintainer?
  - Dave Miller? (networking)
  - Doug Ledford? (RDMA)
- Find a place in the body of the presentation for the following concepts:
  - The objective is to promote RDMA independent of (above?) InfiniBand as an implementation of RDMA
  - Really want a different programming paradigm, a la sockets
  - Continue to live within the drivers framework for now (baby step)
KFI Naming
- Repo naming: net/kfabric/ or drivers/kfabric/
- API naming: kfi_*()
- Module naming:
  - Framework: kfabric.ko
  - Providers: kfi_xxx.ko
  - Test: kfi_test_xxx.ko
KFI Repo Layout
kfabric/
  kfi/ (framework): kfabric.c (kfi.c)
  prov/ (providers): ibverbs, sockets, others
  include/
  Makefile/kbuild
  Documentation/
  tests/: ibverbs, sockets
NVM: two main use cases
- Storage: kernel and user mode accesses
  - NVM accessed through a file system
  - Block I/O, File I/O, Object I/O
  - Via an I/O fabric, e.g. PCIe using non-transparent bridging
  - Via a network (Ethernet, IB, emerging)
- Persistent Memory: kernel and user mode accesses
  - Memory semantics: load/store to local or remote persistent memory
Storage
- Storage data mirroring use cases
  - Direct memory-like access to local or remote NVM through:
    - PCIe fabric (non-transparent bridging)
    - Ethernet fabric
    - Proprietary implementation
  - Prefer the one-sided / lightweight operation offered by kfi
- Storage block access use cases
  - Direct block access to local or remote NVM
    - PCIe fabric: NVMe devices already contain queues; no need to layer IB queues on top of those existing queues
  - kfi does not begin from the perspective of a classical connection-oriented protocol, e.g. NVM requires a lighter-weight connection protocol
Why kfi for NVM?
- NVM doesn't fit well under the verbs API
  - ibverbs cannot be used to directly access NVM
  - ibverbs assumes a fairly heavyweight connection mechanism
  - This may not be appropriate for NVM, particularly for local accesses
Taxonomy
- OFI created a taxonomy for classes of consumers
  - Objective is to focus on defining the requirements for each class
  - Two working groups launched to focus on the first two classes
- Data Storage, Data Access (DS/DA WG): filesystems, object storage, block storage, distributed storage, storage at a distance
- Distributed Computing (OFI WG): message passing (MPI middleware), shared memory (PGAS, languages such as SHMEM, UPC)
- Legacy apps (sockets, IP): sockets apps, IP apps
- Data Analysis: structured data, unstructured data
All classes sit on top of OpenFabrics Interfaces (OFI)
KFI Provider
kfi_provider_register(uint version, struct kfi_provider *provider)
kfi_provider_deregister(struct kfi_provider *provider)

struct kfi_provider {
        const char *name;
        uint32_t version;
        int (*getinfo)(uint32_t version, const char *node, const int service,
                       uint64_t flags, struct kfi_info *hints,
                       struct kfi_info **info);
        int (*freeinfo)(struct kfi_info *info);
        int (*fabric)(struct kfi_fabric_attr *attr, struct fid_fabric **fabric,
                      void *context);
};
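To make the registration flow concrete, below is a minimal sketch of how a provider module might use these calls at load/unload time. The registration calls and the struct kfi_provider layout are taken from this slide; the header name, the KFI_VERSION macro, and the my_* callback names are assumptions for illustration, and the callback bodies are stubs.

/* Hypothetical provider module skeleton; header name and KFI_VERSION
 * are assumptions, the registration calls mirror the slide above. */
#include <linux/module.h>
#include <linux/errno.h>
#include <kfi/kfabric.h>        /* assumed framework header */

static int my_getinfo(uint32_t version, const char *node, const int service,
                      uint64_t flags, struct kfi_info *hints,
                      struct kfi_info **info)
{
        /* Fill *info with this provider's capabilities (omitted). */
        return -ENOSYS;
}

static int my_freeinfo(struct kfi_info *info)
{
        /* Release resources allocated by my_getinfo() (omitted). */
        return 0;
}

static int my_fabric(struct kfi_fabric_attr *attr, struct fid_fabric **fabric,
                     void *context)
{
        /* Instantiate a fabric object for this provider (omitted). */
        return -ENOSYS;
}

static struct kfi_provider my_provider = {
        .name     = "kfi_example",
        .version  = 1,
        .getinfo  = my_getinfo,
        .freeinfo = my_freeinfo,
        .fabric   = my_fabric,
};

static int __init kfi_example_init(void)
{
        /* Hand the framework a dispatch vector for kfi_* calls. */
        return kfi_provider_register(KFI_VERSION(1, 0), &my_provider);
}

static void __exit kfi_example_exit(void)
{
        /* Tear down the kfi_* runtime linkage (reference counted). */
        kfi_provider_deregister(&my_provider);
}

module_init(kfi_example_init);
module_exit(kfi_example_exit);
MODULE_LICENSE("GPL");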
KFI Application Flow
- Initialization
- Server connection setup (if required)
- Client connection setup (if required)
- Connection finalization (if required)
- Data transfer
- Shutdown
KFI Initialization
- kfi_getinfo(&fi): acquire a list of desirable/available fabric providers; select the appropriate fabric (traverse the provider list)
- kfi_fabric(fi, &fabric): create a fabric instance based on the fabric provider selection
- kfi_domain(fabric, fi, &domain): create a fabric access domain object
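A minimal sketch of this initialization sequence follows. The call names and argument order are taken from the abbreviated forms on the slide and are assumptions rather than final signatures; struct fid_domain is assumed by analogy with struct fid_fabric shown earlier, and the assumed framework header from the provider sketch is implied.

/* Sketch only: argument lists follow the slide's abbreviations. */
static int kfi_example_init_fabric(struct kfi_info **fi,
                                   struct fid_fabric **fabric,
                                   struct fid_domain **domain)
{
        int ret;

        /* Acquire the list of available fabric providers and select one. */
        ret = kfi_getinfo(fi);
        if (ret)
                return ret;

        /* Create a fabric instance for the selected provider. */
        ret = kfi_fabric(*fi, fabric);
        if (ret)
                return ret;

        /* Create a fabric access domain object. */
        return kfi_domain(*fabric, *fi, domain);
}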
kfi Endpoint setup
- kfi_ep_open(domain, fi, &ep): create a communications endpoint
- kfi_cq_open(domain, attr, &cq): create/open a completion queue
- kfi_ep_bind(ep, cq, send/recv): bind the CQ to an endpoint
- kfi_enable(ep): enable endpoint operation (e.g. QP->RTS)
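The sketch below strings these calls together. The struct fid_ep, struct fid_cq, and struct kfi_cq_attr type names and the KFI_SEND | KFI_RECV bind flags are assumptions for illustration; only the call order and abbreviated argument lists come from the slide.

/* Endpoint setup sketch; types and bind flags are assumed. */
static int kfi_example_open_ep(struct fid_domain *domain, struct kfi_info *fi,
                               struct fid_ep **ep, struct fid_cq **cq)
{
        struct kfi_cq_attr cq_attr = { 0 };   /* assumed attribute struct */
        int ret;

        /* Create the communications endpoint. */
        ret = kfi_ep_open(domain, fi, ep);
        if (ret)
                return ret;

        /* Create a completion queue for data transfer completions. */
        ret = kfi_cq_open(domain, &cq_attr, cq);
        if (ret)
                return ret;

        /* Bind the CQ to the endpoint for send and receive completions. */
        ret = kfi_ep_bind(*ep, *cq, KFI_SEND | KFI_RECV);
        if (ret)
                return ret;

        /* Enable the endpoint (roughly analogous to QP -> RTS). */
        return kfi_enable(*ep);
}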
kfi connection components
- kfi_listen(): listen for a connection request
- kfi_bind(): bind a fabric address to an endpoint
- kfi_accept(): accept a connection request
- kfi_connect(): post an endpoint connection request
- kfi_eq_sread(): blocking read for connection events
- kfi_eq_error(): retrieve connection error information
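As one possible client-side flow, the sketch below posts a connection request and then blocks on the event queue for the connection event. The kfi_connect() and kfi_eq_sread() argument lists, the struct kfi_eq_cm_entry type, and the KFI_CONNECTED constant are assumptions modeled on the equivalent user-space libfabric flow; only the call names come from the slide.

/* Client-side connection sketch; signatures are assumptions. */
static int kfi_example_connect(struct fid_ep *ep, struct fid_eq *eq,
                               const void *dest_addr)
{
        struct kfi_eq_cm_entry entry;
        uint32_t event;
        ssize_t ret;

        /* Post the connection request toward the remote endpoint. */
        ret = kfi_connect(ep, dest_addr, NULL, 0);
        if (ret)
                return ret;

        /* Block until a connection management event arrives. */
        ret = kfi_eq_sread(eq, &event, &entry, sizeof(entry), -1, 0);
        if (ret < 0)
                return ret;

        return (event == KFI_CONNECTED) ? 0 : -ECONNREFUSED;
}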
KFI Reliable Datagram transfer
- kfi_sendto(): post a Reliable Datagram send request
- kfi_recvfrom(): post a Reliable Datagram receive request
- kfi_cq_sread(): synchronous/blocking read of CQ event(s)
- kfi_cq_read(): non-blocking read of CQ event(s)
- kfi_cq_error(): retrieve data transfer error information
- kfi_close(): close any kfi-created object
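A minimal Reliable Datagram send sketch, assuming the kfi_sendto() and kfi_cq_sread() argument lists mirror their user-space libfabric counterparts and that struct kfi_cq_entry is the CQ completion format; these are assumptions, only the call names come from the slide.

/* Reliable Datagram sketch: post a send, then reap the completion. */
static int kfi_example_dgram_send(struct fid_ep *ep, struct fid_cq *cq,
                                  void *buf, size_t len, uint64_t dest_addr)
{
        struct kfi_cq_entry comp;
        ssize_t ret;

        /* Post the Reliable Datagram send toward dest_addr. */
        ret = kfi_sendto(ep, buf, len, NULL, dest_addr, NULL);
        if (ret)
                return ret;

        /* Block until the send completion shows up on the CQ. */
        ret = kfi_cq_sread(cq, &comp, 1, NULL, -1);
        return (ret < 0) ? ret : 0;
}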
KFI message data transfer
- kfi_mr_reg(domain, &mr): register a memory region
- kfi_close(mr): release a registered memory region
- kfi_send(ep, buf, len, fi_mr_desc(mr), ctx): post an async send-from-memory request
- kfi_recv(ep, buf, len, fi_mr_desc(mr), ctx): post an async receive-into-memory request
- kfi_sendmsg(): post a send using fi_msg (kvec + immediate data)
- kfi_readmsg(): post a read using fi_msg (kvec + immediate data)
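The sketch below combines memory registration, an async send, and completion reaping. Argument lists follow the abbreviated forms on the slide and are assumptions, as are struct kfid_mr for the memory-region object and the CQ read used to wait for the completion.

/* Message transfer sketch: register, send, reap completion, release. */
static int kfi_example_send(struct fid_domain *domain, struct fid_ep *ep,
                            struct fid_cq *cq, void *buf, size_t len)
{
        struct kfid_mr *mr;
        struct kfi_cq_entry comp;
        ssize_t ret;

        /* Register the buffer so the provider can DMA from it. */
        ret = kfi_mr_reg(domain, &mr);
        if (ret)
                return ret;

        /* Post the asynchronous send; completion is reported on the CQ. */
        ret = kfi_send(ep, buf, len, fi_mr_desc(mr), NULL);
        if (!ret)
                ret = kfi_cq_sread(cq, &comp, 1, NULL, -1);

        /* Release the registered memory region. */
        kfi_close(mr);
        return (ret < 0) ? ret : 0;
}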
KFI RDMA data transfer
- kfi_write(): post an RDMA write
- kfi_read(): post an RDMA read
- kfi_writemsg(): post an RDMA write msg (kvec)
- kfi_readmsg(): post an RDMA read msg (kvec)
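One-sided transfers look like the sketch below: the initiator pushes a local buffer into a remote memory window without the target posting a receive. The remote address/key parameters and the kfi_write() argument order are assumptions modeled on user-space fi_write(); only the call names come from the slide.

/* RDMA write sketch; signature is an assumption. */
static int kfi_example_rdma_write(struct fid_ep *ep, struct fid_cq *cq,
                                  void *buf, size_t len, void *desc,
                                  uint64_t remote_addr, uint64_t rkey)
{
        struct kfi_cq_entry comp;
        ssize_t ret;

        /* Post the RDMA write; no receive is posted on the remote side. */
        ret = kfi_write(ep, buf, len, desc, remote_addr, rkey, NULL);
        if (ret)
                return ret;

        /* Wait for local completion of the write. */
        ret = kfi_cq_sread(cq, &comp, 1, NULL, -1);
        return (ret < 0) ? ret : 0;
}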
KFI message data transfer
- kfi_send(): post a send
- kfi_recv(): post a receive
- kfi_sendmsg(): post a send msg (kvec + immediate data)
- kfi_recvmsg(): post a receive msg (kvec + immediate data)
- kfi_sendv(), kfi_recvv(): post a send/receive with a kvec
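The kvec variants allow a single posted operation to cover discontiguous buffers, as in the sketch below. struct kvec is the standard kernel scatter/gather element; the kfi_sendv() argument list is an assumption modeled on user-space fi_sendv().

/* Scatter/gather sketch: send a header and payload in one operation. */
#include <linux/uio.h>          /* struct kvec */

static int kfi_example_sendv(struct fid_ep *ep, void *hdr, size_t hdr_len,
                             void *payload, size_t payload_len)
{
        struct kvec iov[2] = {
                { .iov_base = hdr,     .iov_len = hdr_len },
                { .iov_base = payload, .iov_len = payload_len },
        };

        /* One posted operation covers both segments of the message. */
        return kfi_sendv(ep, iov, NULL, 2, NULL);
}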
Bonepile (to be deleted prior to use)