
Queue-Proportional Sampling: A Better Approach to Crossbar Scheduling
"Learn about Queue-Proportional Sampling, a new approach to crossbar scheduling for input-queued switches. Explore the proposed algorithm, simulation results, and conclusions presented in the research paper. Understand the challenges and constraints associated with scheduling for input-queued crossbar switches. Discover existing scheduling methods and their performance evaluations. Stay informed about the advancements in switch scheduling technology. Published on March 18, 2025, in SIGMETRICS 2017."
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Queue-Proportional Sampling: A Better Approach to Crossbar Scheduling for Input-Queued Switches Long Gong, Paul Tune, Liang Liu, Sen, Jun (Jim) Xu
Contents 1 Background 2 Related Work 3 Proposed Algorithm 4 Simulation Results 5 Conclusion March 18, 2025 SIGMETRICS 2017 2
Contents 1 Background 2 Related Work 3 Proposed Algorithm 4 Simulation Results 5 Conclusion March 18, 2025 SIGMETRICS 2017 3
Input-Queued Crossbar Switches Packets Input 1 VOQ 1 Output 1 Segment Assemble VOQ N Crossbar Input N VOQ 1 Output N Segment Assemble VOQ N Segment variable-size packets into fixed-size Reassemble packets before they leave March 18, 2025 SIGMETRICS 2017 4
Scheduling for Input-Queued Crossbar Switches Abstraction Which input connects to which output? Input 1 Output 1 VOQ 1 Input 1 Output 1 Output 2 Input 2 VOQ N ? Crossbar Input N VOQ 1 Output N Output N Input N VOQ N Constraints (Physical Limits of the Crossbar) Each input can only connect to a single output Each output can only be connected by a single input March 18, 2025 SIGMETRICS 2017 5
Scheduling for Input-Queued Crossbar Switches: Formulation Problem: Compute a one-to- one matching between input and output ports Quality of the matching Model: Weighted Complete Bipartite Graph Output 1 Input 1 Time to compute the matching Output 2 Input 2 Challenge: matchings (i.e., high through- put & low delay) at high speed high quality Output N Input N March 18, 2025 SIGMETRICS 2017 6
Contents 1 Background 2 Related Work 3 Proposed Algorithm 4 Simulation Results 5 Conclusion March 18, 2025 SIGMETRICS 2017 7
Existing Scheduling for Input- Queued Crossbar Switches Empirically optimal Quality of the matching Time to compute the matching ?(?3) Maximum Weighted Matching (MWM) Quality of the matching Good delay performance Time to compute the matching iSLIP [McKeown99] Quality of the matching ?(???2?) Time to compute the matching Close to MWM ?(?) Serena [Giaccone03] March 18, 2025 SIGMETRICS 2017 8
Why Need New Scheduling Algorithms Reality Achieve Better Tradeoff 1K ports & 1Tb/s in the future [DeCusatis2013] Improve Decrease Badly Need Quality of the matching Time to compute the matching Switches Can Connect A Large Number of Ports and Operate at Very High Speed March 18, 2025 SIGMETRICS 2017 9
Contents 1 Background 2 Related Work 3 Proposed Algorithm 4 Simulation Results 5 Conclusion March 18, 2025 SIGMETRICS 2017 10
Queue-Proportional Sampling (QPS): Overview ??: length of the ?? VOQ ?: total length of VOQs Proposing (at any input port) ?? ? Step 1: Sample an output port ? with probability Step 2: Send ??to output port ? (assume ? is sampled is Step 1) Accepting (at any output port) Accept the one with the largest VOQ length if receiving one or more proposals Note that, besides the longest VOQ first accepting strategy, we also investigate proportional accepting March 18, 2025 SIGMETRICS 2017 11
Queue-Proportional Sampling (QPS): Example VOQ ID 1 2 3 4 VOQ Length 3 7 6 2 Sample a VOQ 2 3 6 Input 1 7 VOQ 1 Output 1 VOQ 4 Crossbar Proposals VOQ Length Input N Input ID 1 VOQ 1 Output 4 3 4 1 VOQ 4 March 18, 2025 SIGMETRICS 2017 12
QPS-Augmented Schemes iSLIP Request Grant Accept March 18, 2025 SIGMETRICS 2017 13
QPS-Augmented Schemes QPS-iSLIP Only those un-matched ports in QPS QPS Request Grant Accept SERENA Arrival graph Full matching Merge Final Matching Matching in the previous time slot March 18, 2025 SIGMETRICS 2017 14
QPS-Augmented Schemes QPS-iSLIP Only those un-matched ports in QPS QPS Propose Grant Accept QPS-SERENA QPS Full matching Merge (Arrival graph) Final Matching Matching in the previous time slot March 18, 2025 SIGMETRICS 2017 15
QPS-Serena: Stable? Stability of Serena Non-degenerative Property P Current matching won t be worse than previous time slot, i.e., ? ? ,?(?) ? ? 1 ,?(?) At any time slot, there is a probability to achieve the weights of MWM the one at positive QPS-Serena does not satisfy March 18, 2025 SIGMETRICS 2017 16
QPS-Serena: Stable! Stability of QPS-Serena Property P Non-degenerative Relax to (?,?)-MWM Current matching won t be worse than previous time slot, i.e., ? ? ,?(?) ? ? 1 ,?(?) the one at At any time slot, ? > 0, ? > 0, such that Pr[? (1 ?)? ] ? Theorem: For any switching scheduling algorithm that satisfies the above two properties, under any i.i.d. admissible traffic, the joint queueing and scheduling process ? ? ,?(?) is an ergodic Markov chain, and the queueing process {? ? } converges in distribution to a random vector ?, such that ? ? 1< . March 18, 2025 SIGMETRICS 2017 17
Basic Steps of Stability Proof based on Lyapunov Function Abstract Key Properties MWM No Abstraction SERENA/TASS Property P Non-Degenerative QPS [McKeown 99] [Tassiulas 98] ?,? ??? Non-Degenerative Design Appropriate Lyapunov Function QPS-SERENA SERENA/TASS ? = ? MWM ? = ? [Tassiulas 98] [McKeown 99] +2 2 2 2+ ?? ?,? 2+ ? ?? ?,? ? = ? Derive the Negative Drift ? ? ? ? + 1 If ? ? ? ? ? ? | ?(?) ? ? ? March 18, 2025 SIGMETRICS 2017 18
Intuition behind The Proofs Design Appropriate Lyapunov Function MWM ? = ? SERENA/TASS ? = ? QPS-SERENA [McKeown 99] [Tassiulas 98] 2 +2 2 2+ ?? ?,? 2+ ? ?? ?,? ? = ? ?(? + 1)2 ?(?)2| ?(?) ? ?(?) + 2 ?? ? ? ? ,? ? ? + ?1 ? ?2? ? + 1 ?2? ? | ?(?) ??2? ? + ?2 ?2? ? + ?3 2 Where ? = (?,?) and ?2? = ?: VOQ length vector ?: Matching from the scheduling algorithm ??: MWM with respect to ? ?? ?,? +?4Q(t)2 March 18, 2025 SIGMETRICS 2017 19
Intuition behind The Proofs Design Appropriate Lyapunov Function MWM ? = ? TASS 2+ QPS-SERENA [McKeown 99] [Tassiulas 98] +2 2 2 ? = ? ?? ?,? 2+ ? ?? ?,? ? = ? ?(? + 1)2 ?(?)2| ?(?) ? ?(?) + 2 ?? ? ? ? ,? ? ? + ?1 ++ ? 1 ? ?(?) + 2 ??? ? ? ? ,? ? ?: VOQ length vector ?: Matching from the scheduling algorithm ??: MWM with respect to ? ? < 1 is a constant March 18, 2025 SIGMETRICS 2017 20
Intuition behind The Proofs Design Appropriate Lyapunov Function MWM ? = ? TASS 2+ QPS-SERENA [McKeown 99] [Tassiulas 98] +2 2 2 ? = ? ?? ?,? 2+ ? ?? ?,? ? = ? ++ ? 1 ?(? + 1)2 ?(?)2| ?(?) ? ?(?) + 2 ? ?? ? ? ? ,? ? ? ? ?2? ? + 1 ?2? ? | ?(?) ? ?2? ? + ? 2 ?2? ? + ? 3 +2 Where ? = (?,?) and ?2? = ? ?? ?,? ?: VOQ length vector ?: Matching from the scheduling algorithm ??: MWM with respect to ? ?: maximum normalized load here ? =1+? 2 March 18, 2025 SIGMETRICS 2017 21
Contents 1 Background 2 Related Work 3 Proposed Algorithm 4 Simulation Results 5 Conclusion March 18, 2025 SIGMETRICS 2017 22
Simulation Setup Parameters Values 32 N, number of input/output ports T, number of time slots simulated 6,000X?2 Arrival Processes Bernoulli i.i.d., ON-OFF Bursty Traffic patterns uniform, Quasi-diagonal, Log-diagonal, Diagonal March 18, 2025 SIGMETRICS 2017 23
Maximum Achievable Throughput QPS-iSLIP vs. iSLIP 0 0.1768 0.1261 0.0489 Traffic Uniform Quasi-diagonal Log-Diagonal Diagonal iSLIP 100.00% 81.70% 83.85% 83.47% QPS- iLSIP 100.00% 99.38% 96.46% 88.35% SIGMETRICS 2017 March 18, 2025 24
Mean Delay (Bernoulli i.i.d.) QPS-SERENA vs. SERENA March 18, 2025 SIGMETRICS 2017 25
Mean Delay (Burst) QPS-SERENA vs. SERENA More results are available in our paper (95thpercentile delays, mean delays scale with port number ) March 18, 2025 SIGMETRICS 2017 26
Contents 1 Background 2 Related Work 3 Proposed Algorithm 4 Simulation Results 5 Conclusion March 18, 2025 SIGMETRICS 2017 27
Conclusion Proposed a simple yet effective approach to crossbar scheduling Designed an simple data structure that can carry out QPS in O(1) per port Derived a novel and stronger theorem for proving the stability of crossbar scheduling Simulation results demonstrated that QPS can boost the performance (throughput or delay or both) of iSLIP and SERENA March 18, 2025 SIGMETRICS 2017 28
Queue-Proportional Sampling (QPS): O(1) Data Structure VOQs at Input Port i Main Data Structure Array of linked lists Each link list => VOQ B j E j A j F j VOQ j ?? One-to-one correspondence Auxiliary Data Structure Array of all packets G B A C D Packets in All of VOQs at Input Port i March 18, 2025 SIGMETRICS 2017 29
Queue-Proportional Sampling (QPS): O(1) Data Structure VOQs at Input Port i Main Data Structure Array of linked lists Each link list => VOQ E j A j F j VOQ j ?? One-to-one correspondence Auxiliary Data Structure Array of all packets G D A C Packets in All of VOQs at Input Port i March 18, 2025 SIGMETRICS 2017 30
Head-of-Line Blocking 3 2 1 A 1 A B C 3 2 1 B A B C 2 3 2 1 A B C C 3 March 18, 2025 SIGMETRICS 2017 31
QPS-iSLIP: Overview Output Port Input Port iSLIP (Multiple Iterations) Accept Request Grant VOQ Not Empty Round Robin (Accept pointer) Round Robin (Grant pointer) QPS SIGMETRICS 2017 March 18, 2025 32
QPS-iSLIP: Example Starter Matching Request Grant 1 1 2 4 1 1 2 4 1 1 2 4 2 2 2 3 2 2 2 3 2 2 2 3 3 3 4 1 3 3 4 1 3 3 4 1 4 4 3 3 4 4 3 3 4 4 3 3 Accept 1 1 2 4 2 2 2 3 3 3 4 1 4 4 3 3 SIGMETRICS 2017 March 18, 2025 33
QPS-Serena: Overview Starter Matching (QPS) Merge Populate Populate matching matching by pairing unmatched input and output ports round robin manner the starter to a Generate matching with Queue- Proportional Sampling (QPS) a starter full Merge the populated full matching with the matching in previous time slot in a March 18, 2025 SIGMETRICS 2017 34
QPS-Serena: Example Matching in Previous Time Slot Starter Matching 1 1 1 1 1 1 Populate 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 Merge VOQ (1,1) (2,2) (2,4) (3,2) (3,3) (4,3) (4,4) Length 3 8 9 2 1 7 4 1 1 1 1 sub-matching in each cycle Pick heavier 2 2 2 2 3 3 3 3 4 4 4 4 March 18, 2025 SIGMETRICS 2017 35
QPS-iSLIP: Example Starter Matching Request Grant 1 1 1 1 2 4 2 4 1 1 2 4 2 2 2 2 2 3 2 3 2 2 2 3 3 3 3 3 4 1 4 1 3 3 4 1 4 4 4 4 3 3 3 3 4 4 3 3 Accept 1 1 2 2 1 1 2 2 2 2 1 4 2 2 1 4 One More Iteration 3 3 1 1 3 3 1 1 4 4 3 3 4 4 3 3 SIGMETRICS 2017 March 18, 2025 36
Four Traffic Patterns Uniform Quasi-diagonal 1 2(? 1) 1 2 1 2(? 1) 1 2 1 1 1/? 1/? 1/? 1/? 1/? 1/? 1/? 1/? 1/? 2(? 1) 1 2(? 1) 1 2 2(? 1) 1 2(? 1) Log-diagonal 2? 2 2? 1 2? 1 2? 1 2? 3 2? 1 2? 1 2? 1 1 2? 1 2? 2 2? 1 1 Diagonal 2? 1 2 2? 1 2? 1 2? 1 2/3 0 1/3 1/3 2/3 0 0 0 2/3 SIGMETRICS 2017 March 18, 2025 37
ON-OFF Bursty Arrivals ? ON OFF 1 ? 1 ? ? SIGMETRICS 2017 March 18, 2025 38
Mean Delay (Bernoulli i.i.d.) QPS-iSLIP vs. iSLIP SIGMETRICS 2017 March 18, 2025 39
Mean Delay (Burst) QPS-iSLIP vs. iSLIP SIGMETRICS 2017 March 18, 2025 40
Q1: Starvation Issue We believe the primary mission of a crossbar scheduling algorithm is to deliver excellent performance under admissible workloads; such grace under fire (proportional fairness and lack of starvation even when severely overloaded) is a secondary consideration and can be better achieved through other knobs or levers" orthogonal to switching such as congestion scheduling, or traffic policing/shaping. control, packet March 18, 2025 SIGMETRICS 2017 41
Q2: Why Only Synthetic? There are two reasons for that. First, almost all work on input-queued switch scheduling in the literature (e.g., references [2,3,10,16,17,27] in this work) used roughly the same set of synthetic traffic loads (as used in this work) for performance evaluations, since it is not known whether we can meaningfully combine N ``scalar (per-port) packet-level traces into an N-dimensional switch-wide traffic workload, in a principled manner. Second, it appears to us the only additional information we can glean from trace-driven scheduling context, is how the scheduling algorithm reacts to the (TCP/UDP) flow dynamics -- most notably skewness and burstiness -- that are captured in the trace(s). In our simulation studies, we have thoroughly evaluated the impact of both skewness and burstness on the delay performances of QPS- augmented algorithms, using the 4 different traffic patterns and the bursty ON-OFF traffic arrivals respectively. simulation, in this crossbar March 18, 2025 SIGMETRICS 2017 42
Q2: Why Only Simulations? We were not able to perform such an analysis, for the following two reasons: There is theoretical delay analysis on neither of the underlying algorithms (iSLIP and SERENA), as both are very hard to analyze. The interactions (resulting in complicated dependency structures) between QPS and the underlying algorithms further complicate the task of analyzing the delay performances of the QPS- augmented algorithms. This said, we believe this is an interesting topic that can be pursued in our future work. March 18, 2025 SIGMETRICS 2017 43
Input-Queued Crossbar Switches Packet Packet Line Card 1 CPU Line Card 1 CPU CPU B U S B U S Line Card 2 Line Card 2 Add More CPU CPU Line Card N Line Card N Bus, Shared CPU Bus, Shared CPUs CPU FE Line Card 1 CPU B U S Line Card 1 FE Line Card 1 Line Card 2 CPU FE Line Card 1 Line Card N Crossbar, per-line-card Forwarding Engine Bus, per-line-card CPUs March 18, 2025 SIGMETRICS 2017 44
Stability of QPS-Serena: Proof Sketch Design Lyapunov Function 2 ?1? = ?2 ?2? = ([ ? ?? ?,? ]+)2 ? ? = ?1? + ?2(?) Bound Drift of ?2 Bound Drift of ?1 ? ?1? ? + 1 ?1? ? | ?1(? ? ) ? ?2? ? + 1 ?2? ? | ?2(? ? ) ? ? ? ? + 1 ? ? ? | ?(? ? ) < ? ?2 Theorem: For any switching scheduling algorithm that satisfies the above two properties, under any i.i.d. admissible traffic, the joint queueing and scheduling process ? ? ,?(?) is an ergodic Markov chain, and the queueing process {? ? } converges in distribution to a random vector ?, such that ? ? 1< . March 18, 2025 SIGMETRICS 2017 45