Introduction to Parallelism in CSE 344: February 23rd Overview


This lecture for CSE 344 (February 23rd) introduces parallelism in database systems: why we compute in parallel, performance metrics for parallel DBMSs (speedup and scaleup), linear vs. non-linear speedup and the causes of sub-linear behavior, the shared-memory, shared-disk, and shared-nothing architectures for parallel databases, horizontal data partitioning, parallel execution of relational operators (grouping and joins), and the hypercube join.

  • Parallel Computing
  • CSE 344
  • Performance Metrics
  • Architecture
  • Sub-linear Speedup


Presentation Transcript


  1. CSE 344, FEBRUARY 23RD: INTRO TO PARALLELISM

  2. ADMINISTRIVIA
     • OQ5 due tonight (11:00)
     • HW6 due next Wednesday (Feb 28)
     • HW4 grades out
     • Cost-estimation problems will be on the final exam; look at previous midterms early

  3. WHY COMPUTE IN PARALLEL?
     • Multi-cores: most processors have multiple cores, and this trend will likely increase in the future
     • Big data: too large to fit in main memory
     • Distributed query processing on 100x-1000x servers, widely available now using cloud services
     • Recall HW3 and HW6

  4. PERFORMANCE METRICS FOR PARALLEL DBMSs
     • Nodes = processors, computers
     • Speedup: more nodes, same data → higher speed
     • Scaleup: more nodes, more data → same speed
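
A common way to make these two metrics precise (the slide gives only the intuition, so treat the formulas below as conventional definitions rather than course notation):

      speedup(P) = T(1 node, data of size D) / T(P nodes, data of size D)        -- linear speedup means speedup(P) = P
      scaleup(P) = T(1 node, data of size D) / T(P nodes, data of size P × D)    -- linear (ideal) batch scaleup means scaleup(P) = 1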

  5. LINEAR VS. NON-LINEAR SPEEDUP
     [Figure: speedup as a function of # nodes (=P)]

  6. LINEAR VS. NON-LINEAR SCALEUP
     [Figure: batch scaleup (ideal vs. actual) as a function of # nodes (=P) and data size]

  7. WHY SUB-LINEAR SPEEDUP AND SCALEUP?
     • Startup cost: cost of starting an operation on many nodes
     • Interference: contention for resources between nodes
     • Skew: the slowest node becomes the bottleneck

  8. ARCHITECTURES FOR PARALLEL DATABASES
     • Shared memory
     • Shared disk
     • Shared nothing

  9. SHARED MEMORY
     • Nodes share both RAM and disk
     • Dozens to hundreds of processors
     • Example: SQL Server runs on a single machine and can leverage many threads to speed up a query (check your HW3 query plans)
     • Easy to use and program
     • Expensive to scale: last remaining cash cows in the hardware industry
     [Figure: processors (P) connected by an interconnection network to a global shared memory and disks (D)]

  10. SHARED DISK
     • All nodes access the same disks
     • Found in the largest "single-box" (non-cluster) multiprocessors
     • Example: Oracle
     • No need to worry about shared memory
     • Hard to scale: existing deployments typically have fewer than 10 machines
     [Figure: processors (P) with private memories (M), connected by an interconnection network to shared disks (D)]

  11. SHARED NOTHING
     • Cluster of commodity machines on a high-speed network; called "clusters" or "blade servers"
     • Each machine has its own memory and disk: lowest contention
     • Example: Google
     • Because all machines today have many cores and many disks, shared-nothing systems typically run many "nodes" on a single physical machine
     • Easy to maintain and scale; most difficult to administer and tune
     • We discuss only shared nothing in class
     [Figure: processors (P), each with its own memory (M) and disk (D), connected only by an interconnection network]

  12. APPROACHES TO PARALLEL QUERY EVALUATION
     • Inter-query parallelism: one transaction per node; good for transactional workloads
     • Inter-operator parallelism: one operator per node; good for analytical workloads
     • Intra-operator parallelism: one operator on multiple nodes; good for both?
     • We study only intra-operator parallelism: most scalable
     [Figure: three query plans over Customer, Product, and Purchase (joins on cid and pid), one per approach]

  13. DISTRIBUTED QUERY PROCESSING
     • Data is horizontally partitioned on many servers
     • Operators may require data reshuffling
     • First let's discuss how to distribute data across multiple nodes / servers

  14. SINGLE NODE QUERY PROCESSING (REVIEW) Given relations R(A,B) and S(B,C), no indexes:
     • Selection σ_{A=123}(R): scan file R, select records with A=123
     • Group-by γ_{A,sum(B)}(R): scan file R, insert into a hash table using A as key; when a new key is equal to an existing one, add B to the value
     • Join R ⋈ S: scan file S, insert into a hash table using B as key; then scan file R, probing the hash table using B
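
The three single-node algorithms above fit in a few lines of code. The following is a minimal Python sketch (a relation is just a list of dicts); the function and attribute names are made up for illustration and are not the course's reference implementation.

      # Minimal single-node sketches; a relation is a list of dicts, e.g. {"A": 1, "B": 2}.
      def select_eq(R, attr, value):
          # sigma_{attr=value}(R): one scan of R
          return [t for t in R if t[attr] == value]

      def group_sum(R, key, agg):
          # gamma_{key, sum(agg)}(R): one scan, hash table keyed on `key`
          groups = {}
          for t in R:
              groups[t[key]] = groups.get(t[key], 0) + t[agg]
          return groups

      def hash_join(R, S, attr):
          # R join S on `attr`: build a hash table on S, then probe it while scanning R
          table = {}
          for s in S:
              table.setdefault(s[attr], []).append(s)
          return [{**r, **s} for r in R for s in table.get(r[attr], [])]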

  15. HORIZONTAL DATA PARTITIONING
     [Figure: one table with columns K, A, B and P servers numbered 1 through P]

  16. HORIZONTAL DATA PARTITIONING Which tuples go to what server?
     [Figure: the table split into P chunks of (K, A, B) tuples, one chunk per server]

  17. HORIZONTAL DATA PARTITIONING
     • Block partition: partition tuples arbitrarily s.t. size(R1) ≈ … ≈ size(RP)
     • Hash partitioned on attribute A: tuple t goes to chunk i, where i = h(t.A) mod P + 1 (recall: calling hash functions is free in this class)
     • Range partitioned on attribute A: partition the range of A into −∞ = v0 < v1 < … < vP = ∞; tuple t goes to chunk i if v(i−1) < t.A ≤ vi
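
A minimal Python sketch of the three partitioning schemes, assuming tuples are dicts and using Python's built-in hash as the hash function h; the helper names are invented for illustration.

      # Block, hash, and range partitioning of relation R into P chunks (1-indexed, as on the slide).
      def block_partition(R, P):
          chunks = {i: [] for i in range(1, P + 1)}
          for n, t in enumerate(R):
              chunks[n % P + 1].append(t)            # round-robin keeps chunk sizes roughly equal
          return chunks

      def hash_partition(R, attr, P, h=hash):
          chunks = {i: [] for i in range(1, P + 1)}
          for t in R:
              chunks[h(t[attr]) % P + 1].append(t)   # tuple t goes to chunk h(t.A) mod P + 1
          return chunks

      def range_partition(R, attr, boundaries):      # boundaries = [v1, ..., v(P-1)], sorted
          P = len(boundaries) + 1
          chunks = {i: [] for i in range(1, P + 1)}
          for t in R:
              i = 1
              while i <= len(boundaries) and t[attr] > boundaries[i - 1]:
                  i += 1                             # first chunk i with t.attr <= v_i
              chunks[i].append(t)
          return chunks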

  18. UNIFORM DATA VS. SKEWED DATA Let R(K,A,B,C); which of the following partition methods may result in skewed partitions?
     • Block partition: uniform
     • Hash-partition on the key K: uniform, assuming a good hash function
     • Hash-partition on the attribute A: may be skewed; e.g., when all records have the same value of attribute A, all records end up in the same partition
     Keep this in mind in the next few slides.
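
To make the skew point concrete, here is a small, self-contained Python check (toy data, made-up names) showing that hash-partitioning on a constant-valued attribute A sends every tuple to one chunk, while hash-partitioning on the key K stays roughly uniform.

      # R(K, A) where every tuple has the same value of A.
      R = [{"K": k, "A": 42} for k in range(1000)]
      P = 4

      def chunk_sizes(R, attr, P):
          sizes = [0] * P
          for t in R:
              sizes[hash(t[attr]) % P] += 1
          return sizes

      print(chunk_sizes(R, "K", P))   # roughly [250, 250, 250, 250]: uniform
      print(chunk_sizes(R, "A", P))   # one chunk gets all 1000 tuples: maximally skewed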

  19. PARALLEL EXECUTION OF RA OPERATORS: GROUPING Data: R(K,A,B,C) Query: γ_{A,sum(C)}(R) How do we compute the group-by if:
     • R is hash-partitioned on A?
     • R is block-partitioned?
     • R is hash-partitioned on K?

  20. PARALLEL EXECUTION OF RA OPERATORS: GROUPING Data: R(K,A,B,C) Query: γ_{A,sum(C)}(R) When R is block-partitioned or hash-partitioned on K:
     • Reshuffle R on attribute A
     • Run the grouping on the reshuffled partitions
     [Figure: chunks R1, R2, …, RP reshuffled on A into R'1, R'2, …, R'P]
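
A single-process Python simulation of this two-phase plan (reshuffle on A, then group locally); the server lists and the helper name are invented for the sketch, and Python's hash stands in for the hash function h.

      # Simulated parallel group-by gamma_{A, sum(C)}(R) over P "servers" (plain lists).
      def parallel_group_by(partitions, P):
          # Phase 1: reshuffle every tuple to server hash(t.A) mod P.
          reshuffled = [[] for _ in range(P)]
          for chunk in partitions:
              for t in chunk:
                  reshuffled[hash(t["A"]) % P].append(t)
          # Phase 2: each server groups its own tuples; all tuples with the same A
          # now live on the same server, so no further communication is needed.
          results = []
          for server in reshuffled:
              groups = {}
              for t in server:
                  groups[t["A"]] = groups.get(t["A"], 0) + t["C"]
              results.append(groups)
          return results   # the union of the per-server dictionaries is the final answer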

  21. SPEEDUP AND SCALEUP Consider the query γ_{A,sum(C)}(R); for runtime, only consider I/O costs.
     • If we double the number of nodes P, what is the new running time? Half (each server holds half as many chunks)
     • If we double both P and the size of R, what is the new running time? Same (each server holds the same # of chunks)
     • But only if the data is without skew!

  22. SKEWED DATA R(K,A,B,C) Informally, we say that the data is skewed if one server holds much more data than the average. E.g., we hash-partition on A, and some value of A occurs very many times ("Justin Bieber"); then the server holding that value will be skewed.

  23. PARALLEL EXECUTION OF RA OPERATORS: PARTITIONED HASH-JOIN Data: R(K1, A, B), S(K2, B, C) Query: R(K1,A,B) ⋈ S(K2,B,C) Initially, both R and S are partitioned on K1 and K2.
     • Reshuffle R on R.B and S on S.B
     • Each server computes the join locally
     [Figure: chunks (R1,S1), …, (RP,SP) reshuffled into (R'1,S'1), …, (R'P,S'P)]
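
A single-process Python simulation of the partitioned hash-join (reshuffle both inputs on B, then join locally on each server). The helper names are invented and Python's hash stands in for h; this is a sketch of the idea, not the lecture's code.

      # R(K1, A, B) join S(K2, B, C) across P "servers" (plain lists).
      def partitioned_hash_join(R_parts, S_parts, P):
          R_shuf = [[] for _ in range(P)]
          S_shuf = [[] for _ in range(P)]
          for chunk in R_parts:                       # reshuffle R on R.B
              for r in chunk:
                  R_shuf[hash(r["B"]) % P].append(r)
          for chunk in S_parts:                       # reshuffle S on S.B
              for s in chunk:
                  S_shuf[hash(s["B"]) % P].append(s)
          out = []
          for i in range(P):                          # each server joins its local chunks
              table = {}
              for s in S_shuf[i]:                     # build on the local S chunk
                  table.setdefault(s["B"], []).append(s)
              for r in R_shuf[i]:                     # probe with the local R chunk
                  for s in table.get(r["B"], []):
                      out.append({**r, **s})
          return out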

  24. PARALLEL JOIN ILLUSTRATION Data: R(K1, A, B), S(K2, B, C) Query: R(K1,A,B) ⋈ S(K2,B,C)
     Partition:
       M1: R1 = {(K1=1, B=20), (K1=2, B=50)}, S1 = {(K2=101, B=50), (K2=102, B=50)}
       M2: R2 = {(K1=3, B=20), (K1=4, B=20)}, S2 = {(K2=201, B=20), (K2=202, B=50)}
     Shuffle on B:
       M1 (B=20): R'1 = {(1, 20), (3, 20), (4, 20)}, S'1 = {(201, 20)}
       M2 (B=50): R'2 = {(2, 50)}, S'2 = {(101, 50), (102, 50), (202, 50)}
     Local join on each machine.

  25. BROADCAST JOIN Data: R(A, B), S(C, D) Query: R(A,B) ⋈_{B=C} S(C,D)
     • Broadcast S (no need to reshuffle R on R.B)
     • Why would you want to do this?
     [Figure: chunks R1, …, RP; S is sent in full to every server, giving (R'1, S), …, (R'P, S)]
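
A single-process Python simulation of the broadcast join, assuming S is small; the names are invented and this is a sketch rather than the course's implementation. The large relation R is joined in place, chunk by chunk, against a full copy of S.

      # R(A, B) join S(C, D) on B = C; every "server" receives all of S.
      def broadcast_join(R_parts, S):
          table = {}
          for s in S:                                  # build a hash table on S.C once
              table.setdefault(s["C"], []).append(s)
          out = []
          for chunk in R_parts:                        # each server probes with its local R chunk
              for r in chunk:
                  for s in table.get(r["B"], []):
                      out.append({**r, **s})
          return out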

  26. EXAMPLE PARALLEL QUERY PLAN Order(oid, item, date), Line(item, …) Find all orders from today, along with the items ordered. SELECT * FROM Order o, Line i WHERE o.item = i.item AND o.date = today()
     [Figure: logical plan — scan Order o, select date = today(), then join with a scan of Item i on o.item = i.item]

  27. PARALLEL QUERY PLAN Order(oid, item, date), Line(item, …)
     [Figure: on each of Nodes 1-3, scan Order o, select date = today(), then hash-partition the output on h(o.item) and send it to the join nodes]

  28. PARALLEL QUERY PLAN Order(oid, item, date), Line(item, …)
     [Figure: on each of Nodes 1-3, scan Item i, then hash-partition the output on h(i.item) and send it to the join nodes]

  29. EXAMPLE PARALLEL QUERY PLAN Order(oid, item, date), Line(item, …)
     [Figure: each of Nodes 1-3 runs the join on o.item = i.item; node k contains all orders and all lines where hash(item) = k]

  30. A CHALLENGE Have P servers (say P=27 or P=1000). How do we compute this Datalog query in one step? Q(x,y,z) :- R(x,y), S(y,z), T(z,x)

  31. A CHALLENGE Have P servers (say P=27 or P=1000). How do we compute this Datalog query in one step? Q(x,y,z) :- R(x,y), S(y,z), T(z,x)
     • Organize the P servers into a cube with side P^(1/3)
     • Thus, each server is uniquely identified by (i,j,k), with i,j,k ≤ P^(1/3)
     [Figure: a cube of side P^(1/3) with one server at each point (i,j,k)]

  32. HYPERCUBE JOIN Have P servers (say P=27 or P=1000). How do we compute this Datalog query in one step? Q(x,y,z) :- R(x,y), S(y,z), T(z,x) Organize the P servers into a cube with side P^(1/3); each server is uniquely identified by (i,j,k), with i,j,k ≤ P^(1/3).
     • Step 1: each server sends R(x,y) to all servers (h(x),h(y),*); each server sends S(y,z) to all servers (*,h(y),h(z)); each server sends T(z,x) to all servers (h(x),*,h(z))
     [Figure: the R(x,y) tuples land in a line of the cube along the third dimension]

  33. HYPERCUBE JOIN Q(x,y,z) :- R(x,y), S(y,z), T(z,x) (same setup and Step 1 as the previous slide)
     • Final output: each server (i,j,k) computes the query R(x,y), S(y,z), T(z,x) locally

  34. HYPERCUBE JOIN Q(x,y,z) :- R(x,y), S(y,z), T(z,x) (same setup, Step 1, and final output as the previous slides)
     • Analysis: each tuple R(x,y) is replicated at most P^(1/3) times
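
A toy, single-process Python simulation of the hypercube join, assuming P = c^3 servers addressed by coordinates (i,j,k) and using Python's hash as h; the names are invented for the sketch and this is not the lecture's code.

      from itertools import product

      # Q(x,y,z) :- R(x,y), S(y,z), T(z,x) on a c x c x c cube of "servers".
      def hypercube_join(R, S, T, c, h=hash):
          servers = {(i, j, k): {"R": [], "S": [], "T": []}
                     for i, j, k in product(range(c), repeat=3)}
          # Step 1: route each tuple, replicating it along its missing dimension (c copies).
          for (x, y) in R:                              # R(x,y) -> all (h(x), h(y), *)
              for k in range(c):
                  servers[(h(x) % c, h(y) % c, k)]["R"].append((x, y))
          for (y, z) in S:                              # S(y,z) -> all (*, h(y), h(z))
              for i in range(c):
                  servers[(i, h(y) % c, h(z) % c)]["S"].append((y, z))
          for (z, x) in T:                              # T(z,x) -> all (h(x), *, h(z))
              for j in range(c):
                  servers[(h(x) % c, j, h(z) % c)]["T"].append((z, x))
          # Step 2: every server joins its local fragments; no further communication.
          out = set()
          for frags in servers.values():
              for (x, y) in frags["R"]:
                  for (y2, z) in frags["S"]:
                      for (z2, x2) in frags["T"]:
                          if y == y2 and z == z2 and x == x2:
                              out.add((x, y, z))
          return out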

  35. HYPERCUBE JOIN Q(x,y,z) :- R(x,y), S(y,z), T(z,x)
     [Figure: worked example — chunks of R(x,y), S(y,z), and T(z,x) on servers P1, P2, P3 are shuffled as in Step 1 and joined locally, producing P1: (1, 2, 7), P2: (1, 2, 3), P3: (3, 2, 3)]

  36. HYPERCUBE JOIN Q(x,y,z) :- R(x,y), S(y,z), T(z,x)
     [Figure: the same partitioned data, now asking: what if the hash function h(x) maps 1 and 3 to the same value, i.e. h(1) = h(3)?]

  37. HYPERCUBE JOIN Q(x,y,z) :- R(x,y), S(y,z), T(z,x)
     [Figure: with h(1) = h(3), the R tuples with x=1 and x=3 are shuffled to the same servers, so those servers hold more data, but the local joins still produce P1: (1, 2, 7), P2: (1, 2, 3), P3: (3, 2, 3)]
