Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat

Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat
Slide Note
Embed
Share

This poster presents research on the automatic synthesis of parallel Unix commands and pipelines using KumQuat, a system developed by Jiasi Shen, Martin Rinard, and Nikos Vasilakis from MIT CSAIL. The work, presented at PPoPP '22, explores efficient ways to automatically generate parallelized operations in Unix environments.

  • Parallel Computing
  • Unix Commands
  • KumQuat
  • Automation
  • Pipeline Synthesis

Uploaded on Mar 07, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat Jiasi Shen, Martin Rinard, Nikos Vasilakis MIT CSAIL POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 1

  2. Stream-processing programs Input unit Input unit Input unit Input unit Input unit Input unit Input unit Stream of input units Sliding window Program Output POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 2

  3. Stream-processing pipelines Input unit Input unit Input unit Input unit Input unit Input unit Input unit Widely used in data analytics Often as Unix shell scripts (close to file system, pipes) Old Pipeline Program 1 Program 2 Program 3 Output POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 3

  4. Example: Count word frequencies cat $IN | tr cs A-Za-z \n | tr A-Z a-z | sort | uniq -c | sort -rn tr -cs A-Za-z \n tr A-Z a-z sort uniq -c sort -rn POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 4

  5. Example: Count word frequencies cat $IN | tr cs A-Za-z \n | tr A-Z a-z | sort | uniq -c | sort -rn tr -cs A-Za-z \n Break document into lines of words tr A-Z a-z sort uniq -c sort -rn POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 5

  6. Example: Count word frequencies cat $IN | tr cs A-Za-z \n | tr A-Z a-z | sort | uniq -c | sort -rn tr -cs A-Za-z \n Break document into lines of words Translate words into lower case tr A-Z a-z sort uniq -c sort -rn POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 6

  7. Example: Count word frequencies cat $IN | tr cs A-Za-z \n | tr A-Z a-z | sort | uniq -c | sort -rn tr -cs A-Za-z \n Break document into lines of words Translate words into lower case tr A-Z a-z sort Sort the lower-case words uniq -c sort -rn POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 7

  8. Example: Count word frequencies cat $IN | tr cs A-Za-z \n | tr A-Z a-z | sort | uniq -c | sort -rn tr -cs A-Za-z \n Break document into lines of words Translate words into lower case tr A-Z a-z sort Sort the lower-case words Remove duplicates and prepend counts uniq -c sort -rn POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 8

  9. Example: Count word frequencies cat $IN | tr cs A-Za-z \n | tr A-Z a-z | sort | uniq -c | sort -rn tr -cs A-Za-z \n Break document into lines of words Translate words into lower case tr A-Z a-z sort Sort the lower-case words Remove duplicates and prepend counts uniq -c Sort words on their counts (reverse order) sort -rn POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 9

  10. Stream-processing pipelines Input unit Input unit Input unit Input unit Input unit Input unit Input unit Widely used in data analytics Often as Unix shell scripts (close to file system, pipes) Often involve multiple languages/platforms Old Pipeline Program 1 Program 2 Program 3 Output POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 10

  11. Automatic data parallelism New Program Old Old Old KumQuat Program (Black Box) Program Copy Program Copy Merge outputs? Prior work requires manual merge implementation/annotation POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 11

  12. Automatic data parallelism New Program Choose inputs Old Old Old KumQuat Program (Black Box) Program Copy Program Copy Observe outputs Synthesized Merge outputs? Combiner Automatic synthesis of combiner Black box dynamic analysis Active learning on the program behavior POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 12

  13. Domain-specific language (DSL) for data-parallel combiners Combiner operators: Combine potentially structured streams (add, concat, stitch, stitch2, offset, merge) Select among multiple streams (first, second) Recursively map another combiner across stream components (front, back, fuse) Concatenate multiple streams and rerun the program (rerunf) Key characteristics: Expressive enough to capture useful combiner computations Restricted enough to enable our theoretical results POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 13

  14. Choose inputs Unix shell script that invokes program Split inputs Old Old Old KumQuat Program (Black Box) Program Copy Program Copy Observe outputs POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 14

  15. Choose inputs Unix shell script that invokes program Split inputs Old Old Old KumQuat Program (Black Box) Program Copy Program Copy Observe outputs POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 15

  16. Choose inputs Unix shell script that invokes program Split inputs New Program Old Old Old Old Old KumQuat Program (Black Box) Program Copy Program Copy Program Copy Program Copy Synthesized Combiner Observe outputs POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 16

  17. Parallelize each stage in pipeline stage Input Break document into lines of words tr -cs A-Za-z \n Translate words into lower case tr A-Z a-z tr A-Z a-z tr A-Z a-z Sort the lower-case words sort Remove duplicates and prepend counts uniq -c Sort words on their counts (reverse order) sort -rn Output POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 17

  18. Parallelize each stage in pipeline stage Input Break document into lines of words tr -cs A-Za-z \n Translate words into lower case tr A-Z a-z tr A-Z a-z tr A-Z a-z concat Sort the lower-case words sort Remove duplicates and prepend counts uniq -c Sort words on their counts (reverse order) sort -rn Output POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 18

  19. Parallelize each stage in pipeline stage Input Break document into lines of words tr -cs A-Za-z \n Translate words into lower case tr A-Z a-z tr A-Z a-z tr A-Z a-z concat Sort the lower-case words sort sort sort Remove duplicates and prepend counts uniq -c Sort words on their counts (reverse order) sort -rn Output POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 19

  20. Parallelize each stage in pipeline stage Input Break document into lines of words tr -cs A-Za-z \n Translate words into lower case tr A-Z a-z tr A-Z a-z tr A-Z a-z concat Sort the lower-case words sort sort sort merge Remove duplicates and prepend counts uniq -c Sort words on their counts (reverse order) sort -rn Output POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 20

  21. Parallelize each stage in pipeline stage Input Break document into lines of words tr -cs A-Za-z \n Translate words into lower case tr A-Z a-z tr A-Z a-z tr A-Z a-z concat Sort the lower-case words sort sort sort merge Remove duplicates and prepend counts uniq -c uniq -c uniq -c Sort words on their counts (reverse order) sort -rn Output POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 21

  22. Parallelize each stage in pipeline stage Input Break document into lines of words tr -cs A-Za-z \n Translate words into lower case tr A-Z a-z tr A-Z a-z tr A-Z a-z concat Sort the lower-case words sort sort sort merge Remove duplicates and prepend counts uniq -c uniq -c uniq -c (stitch2 add first) Sort words on their counts (reverse order) sort -rn Output POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 22

  23. Eliminate intermediate combiner Input Break document into lines of words tr -cs A-Za-z \n Translate words into lower case tr A-Z a-z tr A-Z a-z tr A-Z a-z concat Sort the lower-case words sort sort sort merge Remove duplicates and prepend counts uniq -c uniq -c uniq -c (stitch2 add first) Sort words on their counts (reverse order) sort -rn Output POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 23

  24. Benchmark data-processing pipelines 70 open-source Unix shell scripts Mass-transit analytics during COVID-19 Natural language processing Classic Unix One-liners Unix50 from Bell Labs POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 24

  25. Script name 1.sh (vehicles per day) 2.sh (vehicle days on road) 3.sh (vehicle hours on road) bi-grams.sh diff.sh nfa-regex.sh set-diff.sh sort.sh spell.sh top-n.sh wf.sh 1_1.sh (count_words) 2_1.sh (merge_upper) 3_1.sh (sort) 3_2.sh (sort_words_by_folding) 3_3.sh (sort_words_by_rhyming) 4_3.sh (bigrams) 4_3b.sh (count_trigrams) 6_1_2.sh (uppercase_by_type) 6_2.sh (4letter_words) 6_3.sh (words_no_vowels) 6_4.sh (1syllable_words) 6_5.sh (2syllable_words) 7_2.sh (count_consonant_seq) 8.2_1.sh (vowel_sequencies_gr_1K) 8.2_2.sh (bigrams_appear_twice) 8.3_2.sh (find_anagrams) 8.3_3.sh (compare_exodus_genesis) 8_1.sh (sort_words_by_n_syllables) 14.sh (6.1: order bodies) 21.sh (8.4: longest words w/o hyphens) unix50 23.sh (9.1: extract word PORT) 28.sh (9.6: follow directions) Benchmark Parallelized stages analytics-mts 7/7 (7/7) analytics-mts 8/8 (8/8) analytics-mts 8/8 (8/8) oneliners 3/5 (3/5) oneliners 4/7 (0/1,2/2,2/2,0/1,0/1) 478 s oneliners 2/2 (2/2) oneliners 5/8 (0/1,3/3,2/2,0/1,0/1) 1308 s 879 s (1.5 x) oneliners 1/1 (1/1) oneliners 6/8 (6/8) oneliners 4/6 (4/6) oneliners 4/5 (4/5) poets 4/6 (4/6) poets 5/7 (5/7) poets 5/7 (5/7) poets 5/7 (5/7) poets 7/9 (7/9) poets 4/8 (2/4,0/1,2/3) poets 4/9 (2/4,0/1,0/1,2/3) poets 4/6 (4/6) poets 7/11 (3/5,4/6) poets 5/7 (5/7) poets 5/8 (5/8) poets 5/8 (5/8) poets 5/7 (5/7) poets 5/8 (5/8) poets 4/9 (2/4,0/1,2/3,0/1) poets 7/9 (2/4,1/1,1/1,3/3) poets 6/10 (3/5,1/2,2/3) poets 6/10 (3/5,2/2,1/3) unix50 3/3 (3/3) 3/3 (3/3) unix50 6/6 (6/6) unix50 6/10 (6/10) Serial 376 s 379 s 427 s 1007 s 668 s (1.5 x) 325 s (1.5 x) 391 s 389 s (1.0 x) Default Unix pipe KumQuat 16-way parallel 333 s (1.1 x) 40 s (9.4 x) 335 s (1.1 x) 41 s (9.3 x) 408 s (1.0 x) 51 s (8.4 x) 118 s (8.6 x) 98 s (4.9 x) 26 s (14.9 x) 144 s (9.1 x) 273 s (1.4 x) 39 s (10.0 x) 427 s (1.7 x) 78 s (9.5 x) 372 s (1.7 x) 63 s (9.9 x) 2089 s 1155 s (1.8 x) 637 s 360 s (1.8 x) 547 s 307 s (1.8 x) 665 s 391 s (1.7 x) 681 s 402 s (1.7 x) 699 s 415 s (1.7 x) 915 s 635 s (1.4 x) 1049 s 862 s (1.2 x) 635 s 330 s (1.9 x) 647 s 327 s (2.0 x) 235 s 220 s (1.1 x) 542 s 433 s (1.3 x) 443 s 397 s (1.1 x) 678 s 475 s (1.4 x) 573 s 417 s (1.4 x) 921 s 645 s (1.4 x) 724 s 237 s (3.1 x) 656 s 334 s (2.0 x) 653 s 346 s (1.9 x) 185 s 143 s (1.3 x) 733 s 428 s (1.7 x) 202 s 111 s (1.8 x) 188 s 87 s (2.2 x) Experimental results (Serial time >= 3 mins) 389 s 736 s 622 s 196 s (10.7 x) 84 s (7.6 x) 79 s (6.9 x) 89 s (7.4 x) 94 s (7.3 x) 100 s (7.0 x) 173 s (5.3 x) 275 s (3.8 x) 64 s (10.0 x) 80 s (8.1 x) 32 s (7.4 x) 57 s (9.5 x) 48 s (9.2 x) 80 s (8.5 x) 73 s (7.9 x) 177 s (5.2 x) 102 s (7.1 x) 74 s (8.8 x) 69 s (9.5 x) 31 s (6.0 x) 64 s (11.4 x) 23 s (8.8 x) 54 s (3.5 x) Parallelizes 131 of 121 unique commands (325 of 427 pipeline stages) Median 8.5x speedup with 16-way parallelism (11.3x with optimization) POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat, PPoPP '22 4/5/2022 25

More Related Content