Queue-Proportional Sampling: A Better Approach to Crossbar Scheduling

queue proportional sampling a better approach n.w
1 / 45
Embed
Share

"Learn about Queue-Proportional Sampling, a new approach to crossbar scheduling for input-queued switches. Explore the proposed algorithm, simulation results, and conclusions presented in the research paper. Understand the challenges and constraints associated with scheduling for input-queued crossbar switches. Discover existing scheduling methods and their performance evaluations. Stay informed about the advancements in switch scheduling technology. Published on March 18, 2025, in SIGMETRICS 2017."

  • Queue-Proportional Sampling
  • Crossbar Scheduling
  • Input-Queued Switches
  • Scheduling Algorithms
  • SIGMETRICS

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Queue-Proportional Sampling: A Better Approach to Crossbar Scheduling for Input-Queued Switches Long Gong, Paul Tune, Liang Liu, Sen, Jun (Jim) Xu

  2. Contents 1 Background 2 Related Work 3 Proposed Algorithm 4 Simulation Results 5 Conclusion March 18, 2025 SIGMETRICS 2017 2

  3. Contents 1 Background 2 Related Work 3 Proposed Algorithm 4 Simulation Results 5 Conclusion March 18, 2025 SIGMETRICS 2017 3

  4. Input-Queued Crossbar Switches Packets Input 1 VOQ 1 Output 1 Segment Assemble VOQ N Crossbar Input N VOQ 1 Output N Segment Assemble VOQ N Segment variable-size packets into fixed-size Reassemble packets before they leave March 18, 2025 SIGMETRICS 2017 4

  5. Scheduling for Input-Queued Crossbar Switches Abstraction Which input connects to which output? Input 1 Output 1 VOQ 1 Input 1 Output 1 Output 2 Input 2 VOQ N ? Crossbar Input N VOQ 1 Output N Output N Input N VOQ N Constraints (Physical Limits of the Crossbar) Each input can only connect to a single output Each output can only be connected by a single input March 18, 2025 SIGMETRICS 2017 5

  6. Scheduling for Input-Queued Crossbar Switches: Formulation Problem: Compute a one-to- one matching between input and output ports Quality of the matching Model: Weighted Complete Bipartite Graph Output 1 Input 1 Time to compute the matching Output 2 Input 2 Challenge: matchings (i.e., high through- put & low delay) at high speed high quality Output N Input N March 18, 2025 SIGMETRICS 2017 6

  7. Contents 1 Background 2 Related Work 3 Proposed Algorithm 4 Simulation Results 5 Conclusion March 18, 2025 SIGMETRICS 2017 7

  8. Existing Scheduling for Input- Queued Crossbar Switches Empirically optimal Quality of the matching Time to compute the matching ?(?3) Maximum Weighted Matching (MWM) Quality of the matching Good delay performance Time to compute the matching iSLIP [McKeown99] Quality of the matching ?(???2?) Time to compute the matching Close to MWM ?(?) Serena [Giaccone03] March 18, 2025 SIGMETRICS 2017 8

  9. Why Need New Scheduling Algorithms Reality Achieve Better Tradeoff 1K ports & 1Tb/s in the future [DeCusatis2013] Improve Decrease Badly Need Quality of the matching Time to compute the matching Switches Can Connect A Large Number of Ports and Operate at Very High Speed March 18, 2025 SIGMETRICS 2017 9

  10. Contents 1 Background 2 Related Work 3 Proposed Algorithm 4 Simulation Results 5 Conclusion March 18, 2025 SIGMETRICS 2017 10

  11. Queue-Proportional Sampling (QPS): Overview ??: length of the ?? VOQ ?: total length of VOQs Proposing (at any input port) ?? ? Step 1: Sample an output port ? with probability Step 2: Send ??to output port ? (assume ? is sampled is Step 1) Accepting (at any output port) Accept the one with the largest VOQ length if receiving one or more proposals Note that, besides the longest VOQ first accepting strategy, we also investigate proportional accepting March 18, 2025 SIGMETRICS 2017 11

  12. Queue-Proportional Sampling (QPS): Example VOQ ID 1 2 3 4 VOQ Length 3 7 6 2 Sample a VOQ 2 3 6 Input 1 7 VOQ 1 Output 1 VOQ 4 Crossbar Proposals VOQ Length Input N Input ID 1 VOQ 1 Output 4 3 4 1 VOQ 4 March 18, 2025 SIGMETRICS 2017 12

  13. QPS-Augmented Schemes iSLIP Request Grant Accept March 18, 2025 SIGMETRICS 2017 13

  14. QPS-Augmented Schemes QPS-iSLIP Only those un-matched ports in QPS QPS Request Grant Accept SERENA Arrival graph Full matching Merge Final Matching Matching in the previous time slot March 18, 2025 SIGMETRICS 2017 14

  15. QPS-Augmented Schemes QPS-iSLIP Only those un-matched ports in QPS QPS Propose Grant Accept QPS-SERENA QPS Full matching Merge (Arrival graph) Final Matching Matching in the previous time slot March 18, 2025 SIGMETRICS 2017 15

  16. QPS-Serena: Stable? Stability of Serena Non-degenerative Property P Current matching won t be worse than previous time slot, i.e., ? ? ,?(?) ? ? 1 ,?(?) At any time slot, there is a probability to achieve the weights of MWM the one at positive QPS-Serena does not satisfy March 18, 2025 SIGMETRICS 2017 16

  17. QPS-Serena: Stable! Stability of QPS-Serena Property P Non-degenerative Relax to (?,?)-MWM Current matching won t be worse than previous time slot, i.e., ? ? ,?(?) ? ? 1 ,?(?) the one at At any time slot, ? > 0, ? > 0, such that Pr[? (1 ?)? ] ? Theorem: For any switching scheduling algorithm that satisfies the above two properties, under any i.i.d. admissible traffic, the joint queueing and scheduling process ? ? ,?(?) is an ergodic Markov chain, and the queueing process {? ? } converges in distribution to a random vector ?, such that ? ? 1< . March 18, 2025 SIGMETRICS 2017 17

  18. Basic Steps of Stability Proof based on Lyapunov Function Abstract Key Properties MWM No Abstraction SERENA/TASS Property P Non-Degenerative QPS [McKeown 99] [Tassiulas 98] ?,? ??? Non-Degenerative Design Appropriate Lyapunov Function QPS-SERENA SERENA/TASS ? = ? MWM ? = ? [Tassiulas 98] [McKeown 99] +2 2 2 2+ ?? ?,? 2+ ? ?? ?,? ? = ? Derive the Negative Drift ? ? ? ? + 1 If ? ? ? ? ? ? | ?(?) ? ? ? March 18, 2025 SIGMETRICS 2017 18

  19. Intuition behind The Proofs Design Appropriate Lyapunov Function MWM ? = ? SERENA/TASS ? = ? QPS-SERENA [McKeown 99] [Tassiulas 98] 2 +2 2 2+ ?? ?,? 2+ ? ?? ?,? ? = ? ?(? + 1)2 ?(?)2| ?(?) ? ?(?) + 2 ?? ? ? ? ,? ? ? + ?1 ? ?2? ? + 1 ?2? ? | ?(?) ??2? ? + ?2 ?2? ? + ?3 2 Where ? = (?,?) and ?2? = ?: VOQ length vector ?: Matching from the scheduling algorithm ??: MWM with respect to ? ?? ?,? +?4Q(t)2 March 18, 2025 SIGMETRICS 2017 19

  20. Intuition behind The Proofs Design Appropriate Lyapunov Function MWM ? = ? TASS 2+ QPS-SERENA [McKeown 99] [Tassiulas 98] +2 2 2 ? = ? ?? ?,? 2+ ? ?? ?,? ? = ? ?(? + 1)2 ?(?)2| ?(?) ? ?(?) + 2 ?? ? ? ? ,? ? ? + ?1 ++ ? 1 ? ?(?) + 2 ??? ? ? ? ,? ? ?: VOQ length vector ?: Matching from the scheduling algorithm ??: MWM with respect to ? ? < 1 is a constant March 18, 2025 SIGMETRICS 2017 20

  21. Intuition behind The Proofs Design Appropriate Lyapunov Function MWM ? = ? TASS 2+ QPS-SERENA [McKeown 99] [Tassiulas 98] +2 2 2 ? = ? ?? ?,? 2+ ? ?? ?,? ? = ? ++ ? 1 ?(? + 1)2 ?(?)2| ?(?) ? ?(?) + 2 ? ?? ? ? ? ,? ? ? ? ?2? ? + 1 ?2? ? | ?(?) ? ?2? ? + ? 2 ?2? ? + ? 3 +2 Where ? = (?,?) and ?2? = ? ?? ?,? ?: VOQ length vector ?: Matching from the scheduling algorithm ??: MWM with respect to ? ?: maximum normalized load here ? =1+? 2 March 18, 2025 SIGMETRICS 2017 21

  22. Contents 1 Background 2 Related Work 3 Proposed Algorithm 4 Simulation Results 5 Conclusion March 18, 2025 SIGMETRICS 2017 22

  23. Simulation Setup Parameters Values 32 N, number of input/output ports T, number of time slots simulated 6,000X?2 Arrival Processes Bernoulli i.i.d., ON-OFF Bursty Traffic patterns uniform, Quasi-diagonal, Log-diagonal, Diagonal March 18, 2025 SIGMETRICS 2017 23

  24. Maximum Achievable Throughput QPS-iSLIP vs. iSLIP 0 0.1768 0.1261 0.0489 Traffic Uniform Quasi-diagonal Log-Diagonal Diagonal iSLIP 100.00% 81.70% 83.85% 83.47% QPS- iLSIP 100.00% 99.38% 96.46% 88.35% SIGMETRICS 2017 March 18, 2025 24

  25. Mean Delay (Bernoulli i.i.d.) QPS-SERENA vs. SERENA March 18, 2025 SIGMETRICS 2017 25

  26. Mean Delay (Burst) QPS-SERENA vs. SERENA More results are available in our paper (95thpercentile delays, mean delays scale with port number ) March 18, 2025 SIGMETRICS 2017 26

  27. Contents 1 Background 2 Related Work 3 Proposed Algorithm 4 Simulation Results 5 Conclusion March 18, 2025 SIGMETRICS 2017 27

  28. Conclusion Proposed a simple yet effective approach to crossbar scheduling Designed an simple data structure that can carry out QPS in O(1) per port Derived a novel and stronger theorem for proving the stability of crossbar scheduling Simulation results demonstrated that QPS can boost the performance (throughput or delay or both) of iSLIP and SERENA March 18, 2025 SIGMETRICS 2017 28

  29. Queue-Proportional Sampling (QPS): O(1) Data Structure VOQs at Input Port i Main Data Structure Array of linked lists Each link list => VOQ B j E j A j F j VOQ j ?? One-to-one correspondence Auxiliary Data Structure Array of all packets G B A C D Packets in All of VOQs at Input Port i March 18, 2025 SIGMETRICS 2017 29

  30. Queue-Proportional Sampling (QPS): O(1) Data Structure VOQs at Input Port i Main Data Structure Array of linked lists Each link list => VOQ E j A j F j VOQ j ?? One-to-one correspondence Auxiliary Data Structure Array of all packets G D A C Packets in All of VOQs at Input Port i March 18, 2025 SIGMETRICS 2017 30

  31. Head-of-Line Blocking 3 2 1 A 1 A B C 3 2 1 B A B C 2 3 2 1 A B C C 3 March 18, 2025 SIGMETRICS 2017 31

  32. QPS-iSLIP: Overview Output Port Input Port iSLIP (Multiple Iterations) Accept Request Grant VOQ Not Empty Round Robin (Accept pointer) Round Robin (Grant pointer) QPS SIGMETRICS 2017 March 18, 2025 32

  33. QPS-iSLIP: Example Starter Matching Request Grant 1 1 2 4 1 1 2 4 1 1 2 4 2 2 2 3 2 2 2 3 2 2 2 3 3 3 4 1 3 3 4 1 3 3 4 1 4 4 3 3 4 4 3 3 4 4 3 3 Accept 1 1 2 4 2 2 2 3 3 3 4 1 4 4 3 3 SIGMETRICS 2017 March 18, 2025 33

  34. QPS-Serena: Overview Starter Matching (QPS) Merge Populate Populate matching matching by pairing unmatched input and output ports round robin manner the starter to a Generate matching with Queue- Proportional Sampling (QPS) a starter full Merge the populated full matching with the matching in previous time slot in a March 18, 2025 SIGMETRICS 2017 34

  35. QPS-Serena: Example Matching in Previous Time Slot Starter Matching 1 1 1 1 1 1 Populate 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 Merge VOQ (1,1) (2,2) (2,4) (3,2) (3,3) (4,3) (4,4) Length 3 8 9 2 1 7 4 1 1 1 1 sub-matching in each cycle Pick heavier 2 2 2 2 3 3 3 3 4 4 4 4 March 18, 2025 SIGMETRICS 2017 35

  36. QPS-iSLIP: Example Starter Matching Request Grant 1 1 1 1 2 4 2 4 1 1 2 4 2 2 2 2 2 3 2 3 2 2 2 3 3 3 3 3 4 1 4 1 3 3 4 1 4 4 4 4 3 3 3 3 4 4 3 3 Accept 1 1 2 2 1 1 2 2 2 2 1 4 2 2 1 4 One More Iteration 3 3 1 1 3 3 1 1 4 4 3 3 4 4 3 3 SIGMETRICS 2017 March 18, 2025 36

  37. Four Traffic Patterns Uniform Quasi-diagonal 1 2(? 1) 1 2 1 2(? 1) 1 2 1 1 1/? 1/? 1/? 1/? 1/? 1/? 1/? 1/? 1/? 2(? 1) 1 2(? 1) 1 2 2(? 1) 1 2(? 1) Log-diagonal 2? 2 2? 1 2? 1 2? 1 2? 3 2? 1 2? 1 2? 1 1 2? 1 2? 2 2? 1 1 Diagonal 2? 1 2 2? 1 2? 1 2? 1 2/3 0 1/3 1/3 2/3 0 0 0 2/3 SIGMETRICS 2017 March 18, 2025 37

  38. ON-OFF Bursty Arrivals ? ON OFF 1 ? 1 ? ? SIGMETRICS 2017 March 18, 2025 38

  39. Mean Delay (Bernoulli i.i.d.) QPS-iSLIP vs. iSLIP SIGMETRICS 2017 March 18, 2025 39

  40. Mean Delay (Burst) QPS-iSLIP vs. iSLIP SIGMETRICS 2017 March 18, 2025 40

  41. Q1: Starvation Issue We believe the primary mission of a crossbar scheduling algorithm is to deliver excellent performance under admissible workloads; such grace under fire (proportional fairness and lack of starvation even when severely overloaded) is a secondary consideration and can be better achieved through other knobs or levers" orthogonal to switching such as congestion scheduling, or traffic policing/shaping. control, packet March 18, 2025 SIGMETRICS 2017 41

  42. Q2: Why Only Synthetic? There are two reasons for that. First, almost all work on input-queued switch scheduling in the literature (e.g., references [2,3,10,16,17,27] in this work) used roughly the same set of synthetic traffic loads (as used in this work) for performance evaluations, since it is not known whether we can meaningfully combine N ``scalar (per-port) packet-level traces into an N-dimensional switch-wide traffic workload, in a principled manner. Second, it appears to us the only additional information we can glean from trace-driven scheduling context, is how the scheduling algorithm reacts to the (TCP/UDP) flow dynamics -- most notably skewness and burstiness -- that are captured in the trace(s). In our simulation studies, we have thoroughly evaluated the impact of both skewness and burstness on the delay performances of QPS- augmented algorithms, using the 4 different traffic patterns and the bursty ON-OFF traffic arrivals respectively. simulation, in this crossbar March 18, 2025 SIGMETRICS 2017 42

  43. Q2: Why Only Simulations? We were not able to perform such an analysis, for the following two reasons: There is theoretical delay analysis on neither of the underlying algorithms (iSLIP and SERENA), as both are very hard to analyze. The interactions (resulting in complicated dependency structures) between QPS and the underlying algorithms further complicate the task of analyzing the delay performances of the QPS- augmented algorithms. This said, we believe this is an interesting topic that can be pursued in our future work. March 18, 2025 SIGMETRICS 2017 43

  44. Input-Queued Crossbar Switches Packet Packet Line Card 1 CPU Line Card 1 CPU CPU B U S B U S Line Card 2 Line Card 2 Add More CPU CPU Line Card N Line Card N Bus, Shared CPU Bus, Shared CPUs CPU FE Line Card 1 CPU B U S Line Card 1 FE Line Card 1 Line Card 2 CPU FE Line Card 1 Line Card N Crossbar, per-line-card Forwarding Engine Bus, per-line-card CPUs March 18, 2025 SIGMETRICS 2017 44

  45. Stability of QPS-Serena: Proof Sketch Design Lyapunov Function 2 ?1? = ?2 ?2? = ([ ? ?? ?,? ]+)2 ? ? = ?1? + ?2(?) Bound Drift of ?2 Bound Drift of ?1 ? ?1? ? + 1 ?1? ? | ?1(? ? ) ? ?2? ? + 1 ?2? ? | ?2(? ? ) ? ? ? ? + 1 ? ? ? | ?(? ? ) < ? ?2 Theorem: For any switching scheduling algorithm that satisfies the above two properties, under any i.i.d. admissible traffic, the joint queueing and scheduling process ? ? ,?(?) is an ergodic Markov chain, and the queueing process {? ? } converges in distribution to a random vector ?, such that ? ? 1< . March 18, 2025 SIGMETRICS 2017 45

More Related Content